  1. Mar 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Han et al use sophisticated genetic approaches to investigate leptin-responsive neural circuits. Overall, this is an impressive series of studies that provide fairly convincing evidence for a key inhibitory pathway downstream of AGRP neurons. A few data sets require additional validation or explanation.

      We appreciate the reviewer’s strong interest in and support of this manuscript, as well as the valuable comments below. We have revised the manuscript to incorporate the reviewer’s suggestions and critiques.

      Reviewer #2 (Public Review):

      Using a novel genetic system to conditionally ablate Lepr from Agrp neurons in adults, the authors discovered that leptin-AgRP neuron signaling strongly modulates the DMH and sought to understand the DMH targets and mechanisms of action in the response to AgRP neuron signaling. GABA signaling likely underlies the effects of AgRP neuron-mediated hyperphagia (etc). DMH Mc4R neurons appear to lie downstream of Agrp neurons. GABA in the DMH appears to mediate many of the effects of AgRP neurons on feeding and body weight. Furthermore, deletion of Lepr from AgRP neurons increases DMH GABA-ARa3, and modulation of this receptor in the DMH alters food intake and the response to leptin.

      Unfortunately, there is little quantification or other validation data from many of the systems deployed, and the analysis jumps around a fair amount, without really uniting the results in a way that paints a convincing picture of the final model that they build.

      Thank you for these positive comments on our studies. In the revised manuscript, we have added a substantial amount of new experimental data, more controls, and data validation, which significantly strengthen our proposed model.

      Reviewer #3 (Public Review):

      The manuscript by Han et al characterizes a pathway from AgRP(LepR) neurons to DMH(MC4R) neurons that is involved in energy balance control. They use a conditional knockout strategy to show that AgRP(LepR) knockout increases body weight and this effect was reversible by blocking GABA signaling. They also showed that activation of AgRP-DMH projection increases food intake, and highlighted a role for alpha3-GABAA receptor signaling in the DMH for regulating feeding behavior. While these data highlight a potential circuit that modulates feeding, there are concerns about the paper in its current form that diminish enthusiasm. The lack of proper controls in many of the experiments raises doubts about the findings.

      Strengths: The authors use new tools to characterize a new circuit for leptin-mediated energy balance control. The conditional knockout has several advantages over previous techniques that are described within the manuscript. Further, the authors use combinations of different techniques (gene knockout, optogenetic manipulation, in vivo activity monitoring) to make observations at multiple levels of analysis.

      Weaknesses: Several experiments within the paper have worrisome caveats or lack proper controls, raising concerns about the overall conclusions made.

      We appreciate the reviewer’s positive comments. We added more control and validation data in our updated manuscript to support our conclusion.

    1. Author Response

      Reviewer #1 (Public Review):

      Demographic inference is a notoriously difficult problem in population genetics, especially for non-model systems in which key population genetic parameters are often unknown and where the reality is always a lot more complex than the model. In this study, Rose et al. provided an elegant solution to these challenges in their analysis of the evolutionary history of human specialization in Ae. aegypti mosquitoes. They first applied state-of-the-art statistical phasing methods to obtain haplotype information in previously published mosquito sequences. Using this phased data, they conducted cross-coalescent and isolation-with-migration analyses, and they innovatively took advantage of a known historical event, i.e., the spread of Ae. aegypti to South America, to infer the key model parameters of generation time and mutation rate. With these parameters, they were able to confirm a previous hypothesis, which suggests that human specialists evolved at the end of the African Humid Period around 5,000 years ago when Ae. aegypti mosquitoes in the Sahel region had to adapt to human-derived water storage as their breeding sites during intense dry seasons. The authors further carried out an ancestry tract length analysis, showing that human specialists have recently introgressed into Ae. aegypti population in West African cities in the past 20-40 years, likely driven by rapid urbanization in these cities.

      Given all the complexities and uncertainties in the system, the authors have done an outstanding job coming up with well-informed research questions and hypotheses, carrying out analyses that are most appropriate to their questions, and presenting their findings in a clear and compelling fashion. Their results reveal the deep connections between mosquito evolution and past climate change as well as human history and demonstrate that future mosquito control strategies should take these important interactions into account, especially in the face of ongoing climate change and urbanization. Methodologically, the analytical approach presented in this paper will be of broad interest to population geneticists working on demographic inference in a diversity of non-model organisms.

      In my opinion, the only major aspect that this paper can still benefit from is more explicit and in-depth communication and discussion about the assumptions made in the analyses and the uncertainties of the results. There is currently one short paragraph on this in the discussion section, but I think several other assumptions and sources of uncertainties could be included, and a few of them may benefit from some quantitative sensitivity analyses. To be clear, I don't think that most of these will have a huge impact on the main results, but some explicit clarification from the authors would be useful.

      Below are some examples:

      Thank you very much for your kind words and your feedback! We have expanded our discussion of assumptions and uncertainties – we have responded to each point below:

      1) Phasing accuracy: statistical phasing is a relatively new tool for non-model species, and it is unclear from the manuscript how accurate it is given the sample size, sequencing depth, population structure, genetic diversity, and levels of linkage disequilibrium in the study system. If authors would like to inspire broader adoption of this workflow, it would be very helpful if they could also briefly discuss the key characteristics of a study system that could make phasing successful/difficult, and how sensitive cross-coalescent analyses are to phasing accuracy.

      We agree that this is an important topic to expand on. We have clarified as follows:

      Results, Page 4, last paragraph: “Over 95% of prephase calls had maximal HAPCUT2 phred-scaled quality scores of 100 and prephase blocks (i.e. local haplotypes) were 728bp long on average (interquartile range 199-1009bp). We then used SHAPEIT4.2 to assemble the prephase blocks into chromosome-level haplotypes, using statistical linkage patterns present across our panel of 389 individuals (25).”

      Discussion, Page 8, last paragraph: “Overall linkage disequilibrium is relatively low in Ae. aegypti, dropping off quickly over a few kilobases and reaching half its maximum value within about 50kb (37); this is likely sufficient for assembling shorter, high-confidence prephase blocks into longer haplotypes in many cases. However, phase-switch errors may be common across longer distances – potentially affecting inferences in the most recent time windows. Nevertheless, the similar results we obtain using different proxy populations (and thus different input haplotype structures) for human-specialist and generalist lineages (see Figure S1) suggest that our results are robust to potential mistakes in long-range haplotype phasing.”

      Discussion, Page 9, paragraph 2: “Here, we take advantage of a continent-wide set of genomes, combined with read-based prephasing and population-wide statistical phasing to develop a phasing panel that should enable future studies in Ae. aegypti with a lower barrier to entry. The same approach may work for other study organisms with similar population genomic properties; high levels of diversity are helpful for prephasing and at least moderate levels of linkage disequilibrium are important for the assembly of prephase blocks.”

      2) Estimation of mutation rate and generation time: the estimation of these important parameters is made based on the assumption that they should maximize the overlap between the distribution of estimated migration rate and the number of enslaved people crossing the Atlantic, but how reasonable is this assumption, and how much would the violation of this assumption affect the main result? Particularly, in the MSMC-IM paper (Wang et al. 2020, Fig 2A), even with a simulated clean split scenario, the estimated migration rate would have a wide distribution with a lot of uncertainty on both sides, so I believe that the exact meaning and limitations of such estimated migration rate over time should be clarified. This discussion would also be very helpful to readers who are thinking about using similar methods in their studies. Furthermore, the authors have taken 15 generations per year as their chosen generation time and based their mutation rate estimates on this assumption, but how much will the violation of this assumption affect the result?

      This is a great point. We have expanded our discussion of how this assumption affects our conclusions (see Discussion page 9, first paragraph): “Furthermore, we chose a scaling factor that maximized overlap between the peak of estimated Ae. aegypti migration and the peak of the Atlantic Slave Trade (Fig. 2B). If we instead consider alternative scenarios where peak migration occurred at the very beginning of the slave trade era, around 1500, then our inferred mutation rate would be lower (about 2.4e-9, assuming 15 generations per year), pushing back the split of human-specialist lineages to about 10,000 years before present. This scenario seems less plausible, in part because our isolation-with-migration analyses suggest a gradual onset of migration between continents rather than a single, early-pulse model. It would also make it harder to explain the timing of the bottleneck we see in invasive populations; the first signs of this bottleneck occur at the beginning of the slave trade (~500 years ago) with our current calibration (Fig. S1A), but would be pushed to a pre-trade date in this alternative scenario. We can also consider a scenario in which peak Ae. aegypti migration occurred more recently, perhaps around 1850, corresponding to increased global shipping traffic outside the slave trade alone. In this case, our inferred mutation rate would be higher (or generation time lower), and the split of human-specialist lineages would be placed at about 3,000 years ago. Overall, the best match between the existing literature and our data corresponds to our main estimates, but alternative scenarios could gain support if future research finds evidence for a different time course of invasion than is suggested by the epidemiological literature.”

      We have slightly expanded our description of calibration in Results, page 5, last paragraph: “The fact that we see good overlap between the two distributions (yellow–white color) across a wide range of reasonable mutation rates and generation times for Ae. aegypti is consistent with our understanding of the species’ recent history and supports our approach. For example, if we take the common literature value of 15 generations per year (0.067 years per generation) (17, 20), the de novo mutation rate that maximizes correspondence between the two datasets is 4.85 × 10⁻⁹ (black dot in Figure 2A, used in Figure 2B), which is on the order of values documented in other insects. We chose to carry forward this calibrated scaling factor (corresponding to any combination of mutation rate and generation time found along the line in Figure 2A) into subsequent analyses.”
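The dependence of the inferred dates on the calibration can be illustrated with a short calculation. The sketch below assumes the usual coalescent scaling, in which a date in years is inversely proportional to the product of the per-generation mutation rate and the number of generations per year. The function names are hypothetical, and the split date of about 5,000 years, the rate of 4.85e-9 and the alternative rate of 2.4e-9 are taken from the text above; the rate printed for the 3,000-year scenario is only what this proportionality implies and is not a value reported by the authors.

```python
# Illustrative sketch only: helper names are hypothetical, not from the manuscript.
GENS_PER_YEAR = 15          # common literature value quoted in the response
MU_MAIN = 4.85e-9           # calibrated de novo mutation rate from the response
SPLIT_MAIN_YEARS = 5_000    # approximate split time under the main calibration


def rescale_date(date_years, mu_old, mu_new,
                 gens_old=GENS_PER_YEAR, gens_new=GENS_PER_YEAR):
    """Rescale a date assuming years are proportional to 1 / (mu * generations per year)."""
    return date_years * (mu_old * gens_old) / (mu_new * gens_new)


def implied_mu(target_years, date_years=SPLIT_MAIN_YEARS, mu=MU_MAIN):
    """Mutation rate implied by moving a date to target_years at fixed generation time."""
    return mu * date_years / target_years


if __name__ == "__main__":
    # Early-peak scenario: lower mutation rate pushes the split back in time.
    print(round(rescale_date(SPLIT_MAIN_YEARS, MU_MAIN, 2.4e-9)))
    # Late-peak scenario: rate implied by a split at about 3,000 years (assumption).
    print(implied_mu(3_000))
    # Same scaling factor, different split between rate and generation time.
    print(rescale_date(SPLIT_MAIN_YEARS, MU_MAIN, 2 * MU_MAIN, 15, 7.5))
```

Because only the product of mutation rate and generations per year enters the calculation, doubling the rate while halving the number of generations per year leaves every date unchanged, which is the sense in which a single calibrated scaling factor corresponds to a line of parameter combinations.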

      We have also expanded on the uncertainty of our analyses (see Discussion page 8, last paragraph): “First, the temporal resolution of our inferences is relatively low, and both previously published simulations (39) and our own bootstrap replicates (Figure 2B–D, grey lines) suggest relatively wide bounds for the precise timing of events.”

      3) The effect of selection: all analyses in this paper assume that no selection is at play, and the authors have excluded loci previously found to be under selection from these analyses, but how effective is this? In the ancestry tract length analysis, in particular, the authors have found that the human-specialist ancestry tends to concentrate in key genomic regions and suggested that selection could explain this, but doesn't this mean that excluding known loci under selection was insufficient? If the selection has indeed played an important role at a genome-wide level, how would it affect the main results (qualitatively)?

      We have clarified that we excluded those loci from our timing estimates for both MSMC and ancestry tract analyses, but then re-ran the ancestry tract analysis with all regions included to visualize and assess how tracts were distributed along chromosomes. See Methods, page 12, paragraph 2: “Since selection associated with adaptation to urban habitats could shape lengths of admixture tracts, we masked regions previously identified as under selection between human-specialists and generalists when estimating admixture timing—namely, the outlier regions in (2). However, we used an unmasked analysis to determine and visualize the genome-wide distribution of ancestries (Fig. 3).”

      We have also added additional discussion of the expected effects of selection on our analyses (see Discussion, page 9, last paragraph): “Positive selection during adaptive introgression can increase tract lengths and make admixture appear to be more recent than it actually is. For this reason, we masked regions of the genome thought to underlie adaptation to human habitats before running our analysis. Nevertheless, if selection has acted outside these regions, admixture may be somewhat older than we estimate.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have tried to correlate changes in the cellular environment by means of altering temperature, the expression of key cellular factors involved in the viral replication cycle, and small molecules known to affect key viral protein-protein interactions with some physical properties of the liquid condensates of viral origin. The ideas and experiments are extremely interesting as they provide a framework to study viral replication and assembly from a thermodynamic point of view in live cells.

      The major strengths of this article are the extremely thoughtful and detailed experimental approach; although this data collection and analysis are most likely extremely time-consuming, the techniques used here are so simple that the main goal and idea of the article become elegant. A second major strength is that in order to understand some of the physicochemical properties of the viral liquid inclusion, they used stimuli that have been very well studied, and thus one can really focus on a relatively easy interpretation of most of the data presented here.

      There are three major weaknesses in this article. The way it is written, especially at the beginning, is extremely confusing. First, I would suggest authors should check and review extensively for improvements to the use of English. In particular, the abstract and introduction are extremely hard to understand. Second, in the abstract and introduction, the authors use terms such as "hardening", "perturbing the type/strength of interactions", "stabilization", and "material properties", for just citing some terms. It is clear that the authors do know exactly what they are referring to, but the definitions come so late in the text that it all becomes confusing. The second major weakness is that there is a lack of deep discussion of the physical meaning of some of the measured parameters like "C dense vs inclusion", and "nuclear density and supersaturation". There is a need to explain further the physical consequences of all the graphs. Most of them are discussed in a very superficial manner. The third major weakness is a lack of analysis of phase separations. Some of their data suggest phase transition and/or phase separation, thus, a more in-depth analysis is required. For example, could they calculate the change of entropy and enthalpy of some of these processes? Could they find some boundaries for these transitions between the "hard" (whatever that means) and the liquid?

      The authors have achieved almost all their goals, with the caveat of the third weakness I mentioned before. Their work presented in this article is of significant interest and can become extremely important if a more detailed analysis of the thermodynamics parameters is assessed and a better description of the physical phenomenon is provided.

      We thank you for the comments and, in particular, for being so positive regarding the strengths of our manuscript and for raising concerns that will surely improve it. We have taken the following actions to address your concerns:

      1) Extensive revisions have been made to the use of English, particularly in the abstract and introduction. Key terms are defined as they are introduced in the text to enhance the clarity of the argument. This is a significant revision that is highlighted within the text, but it is too extensive to detail here.

      2) In the results section, we improved and extended the discussion of our graphs to the extent possible. However, we found that attempting to explain the graphs' meanings more thoroughly would detract from our manuscript's main focus: identifying thermodynamic changes that could potentially lead to alterations in material properties, specifically aspect ratio, size, and Gibbs free energy. As a result, we introduced the type of information we could obtain from our analyses in the introduction (Lines 112-125) and briefly commented on it in the ‘results’ section (Lines 304-306, sentences below).

      From introduction – lines 112-125:

      “In addition, other parameters like nucleation density determine how many viral condensates are formed per area of cytosol. Overall, the data will inform us if changing one parameter, e.g. the concentration, drives the system towards larger condensates with the same or more stable properties, or more abundant condensates that are forced to maintain the initial or a different size on account of available nucleation centres (Riback et al., 2020; Snead, 2022). It will also inform us if liquid viral inclusions behave like a binary or a multi-component system. In a binary mixture, Cdilute is constant (Klosin et al., 2020). However, in multi-component systems, Cdilute increases with bulk concentration (Riback et al., 2020). This type of information could have direct implications about the condensates formed during influenza infection. As the 8 different genomic vRNPs have a similar overall structure, they could, in theory, behave as a binary system between units of vRNPs and Rab11a. However, a change in Cdilute with concentration would mean that the system behaves as a multi-component system. This could raise the hypothesis that the differences in length, RNA sequence and valency that each vRNP has may be relevant for the integrity and behaviour of condensates.”

      From results lines 304-306:

      “This indicates that the liquid inclusions behave as a multi-component system and allows us to speculate that the differences in length, RNA sequence and valency that each vRNP has may be key for the integrity and behaviour of condensates.”

      3) The reviewer has drawn our attention to the absence of phase separation analysis in our study. We believe that the formation of influenza A virus condensates is governed by phase separation (or percolation coupled to phase separation). However, we must exercise caution at this point because the condensates we are studying are highly complex, and the physics of our cellular system may not be adequate to claim phase separation without being validated by an in vitro reconstitution system. IAV inclusions contain a variety of cellular membranes, different vRNPs, and Rab11a. While we have robust data to propose a model in which the liquid-like properties of IAV inclusions arise from a network of interacting vRNPs that bridge multiple cognate vRNP-Rab11 units on flexible membranes, similar to what occurs in phase-separated vesicles in neurological synapses, our model for this system still lacks formal experimental validation. As a note, the data supporting our model include: the demonstration of the liquid properties of our liquid inclusions (Alenquer et al. 2019, Nature Communications, 10, 1629); and impairment of recycling endocytic activity during IAV infection (Bhagwat et al. 2020, Nat Commun, 11, 23; Kawaguchi et al. 2012, J Virol, 86, 11086-95; Vale-Costa et al. 2016, J Cell Sci, 129, 1697-710). This leads to aggregated vesicles seen by correlative light and electron microscopy (Vale-Costa et al. 2016, J Cell Sci, 129, 1697-710) and by immunofluorescence and FISH (Amorim et al. 2011, J Virol 85, 4143-4156; Avilov et al. 2012, Vaccine 30, 7411-7417; Chou et al. 2013, PLoS Pathog 9, e1003358; Eisfeld et al. 2011, J Virol 85, 6117-6126; and Lakdawala et al. 2014, PLoS Pathog 10, e1003971).

      To explore the significance of the liquid material properties of IAV inclusions, we used the strategy described in the current work. By developing an effective method to manipulate the material properties of IAV inclusions, we provide evidence that controlled phase transitions can be induced, resulting in decreased vRNP dynamics in cells and a negative impact on progeny virion production. This suggests that the liquid character of liquid inclusions is important for their function in IAV infection. We have improved our explanation addressing this concern in the limitations of our study (as outlined below in the box and in the manuscript in lines 857-872).

      We are currently establishing an in vitro reconstitution system to formally demonstrate, in an independent publication, that IAV inclusions are formed by phase separation (or percolation coupled to phase separation). For this future work, we teamed up with Pablo Sartori, a theoretical physicist, to derive an in-depth analysis of the thermodynamics of the viral liquid condensates in the in vitro reconstituted system and compare it to results obtained in the cell. This will provide a means to establish comparisons. We think that cells have too many variables to derive meaningful physical parameters (such as entropy and enthalpy) and models, and that cell-based data need to be complemented by in vitro systems. For example, increasing the concentration inside a cell is not a simple endeavour as it relies on cellular pathways to deliver material to a specific place. At the same time, the 8 vRNPs, as mentioned above, have different size, valency and RNA sequence and can behave very differently in the formation of condensates and maintenance of their material properties. Ideally, they should be analysed individually or in selected combinations. For the future, we will combine data from in vitro reconstitution systems and cells to address this very important point raised by the reviewer.

      From the paper on the section ‘Limitations of the study’:

      “Understanding condensate biology in living cells is physiologically relevant but complex because the systems are heterotypic and away from equilibria. This is especially challenging for influenza A liquid inclusions that are formed by 8 different vRNP complexes, which although sharing the same structure, vary in length, valency, and RNA sequence. In addition, liquid inclusions result from an incompletely understood interactome where vRNPs engage in multiple and distinct intersegment interactions bridging cognate vRNP-Rab11 units on flexible membranes (Chou et al., 2013, Gavazzi et al., 2013, Sugita et al., 2013, Shafiuddin and Boon, 2019, Haralampiev et al., 2020, Le Sage et al., 2020). At present, we lack an in vitro reconstitution system to understand the underlying mechanism governing demixing of vRNP-Rab11a-host membranes from the cytosol. This in vitro system would be useful to explore how the different segments independently modulate the material properties of inclusions, explore if condensates are sites of IAV genome assembly, determine thermodynamic values and thresholds accurately, perform rheological measurements for viscosity and elasticity and validate our findings. The results could be compared to those obtained in cell systems to derive thermodynamic principles happening in a complex system away from equilibrium. Using cells to map how liquid inclusions respond to different perturbations provides the answer of how the system adapts in vivo, but has limitations.”

      Reviewer #2 (Public Review):

      During Influenza virus infection, newly synthesized viral ribonucleoproteins (vRNPs) form cytosolic condensates, postulated as viral genome assembly sites and having liquid properties. vRNP accumulation in liquid viral inclusions requires its association with the cellular protein Rab11a directly via the viral polymerase subunit PB2. Etibor et al. investigate and compare the contributions of entropy, concentration, and valency/strength/type of interactions, on the properties of the vRNP condensates. For this, they subjected infected cells to the following perturbations: temperature variation (4, 37, and 42 °C), the concentration of viral inclusion drivers (vRNPs and Rab11a), and the number or strength of interactions between vRNPs using nucleozin, a well-characterized vRNP sticker. Lowering the temperature (i.e. decreasing the entropic contribution) leads to a mild growth of condensates that does not significantly impact their stability. Altering the concentration of drivers of IAV inclusions impacts their size but not their material properties. The most spectacular effect on condensates was observed using nucleozin. The drug dramatically stabilizes vRNP inclusions acting as a condensate hardener. Using a mouse model of influenza infection, the authors provide evidence that the activity of nucleozin is retained in vivo. Finally, using a mass spectrometry approach, they show that the drug affects vRNP solubility in a Rab11a-dependent manner without altering the host proteome profile.

      The data are compelling and support the idea that drugs that affect the material properties of viral condensates could constitute a new family of antiviral molecules as already described for the respiratory syncytial virus (Risso Ballester et al. Nature. 2021)

      Nevertheless, there are some limitations in the study. Several of them are mentioned in a dedicated paragraph at the end of a discussion. This includes the heterogeneity of the system (vRNP of different sizes, interactions between viral and cellular partners far from being understood), which is far from equilibrium, and the absence of minimal in vitro systems that would be useful to further characterize the thermodynamic and the material properties of the condensates.

      There are other ones.

      We thank Reviewer 2 for highlighting specific details that need improving and for raising such interesting questions to validate our findings. We have addressed the comments of Reviewer 2 and performed the experiments as described (in blue) below each point raised.

      1) The concentrations are mostly evaluated using antibodies. This may be correct for Cdilute. However, measurement of Cdense should be viewed with caution as the antibodies may have some difficulty accessing the inner of the condensates (as already shown in other systems), and this access may depend on some condensate properties (which may evolve along the infection). This might induce artifactual trends in some graphs (as seen in panel 2c), which could, in turn, affect the calculation of some thermodynamic parameters.

      The concern about using antibodies to calculate Cdense is valid, and we consider it very important. We addressed this concern by performing the same analyses using a fluorescently tagged virus that has mNeonGreen fused to the viral polymerase PA (PA-mNeonGreen PR8 virus). Like NP, PA is a component of vRNPs and labels viral inclusions, colocalising with Rab11 when vRNPs are in the cytosol. However, there is only one molecule of PA per vRNP, whereas there are 37-96 molecules of NP, depending on the size of the vRNP. As predicted, we did observe changes in Cdilute, Cdense and nucleation density. However, the values obtained for Gibbs free energy, size and aspect ratio followed the same trend whether viral inclusions were detected with fluorescently tagged vRNPs or by antibody staining, which allows us to validate our conclusion that major changes in Gibbs free energy occur solely when there is a change in the valency/strength of interactions but not in temperature or concentration (Figure 1 below). Given the extent of these data, we show the results here but, in the manuscript, we will describe the limitations of using antibodies in our study within the section ‘Limitations of the study’ (lines 881-894). Given the importance of the question regarding the pros and cons of the different systems for analysing thermodynamic parameters, we have decided to systematically assess and explore these differences in detail in a future manuscript.

      For more information: the reviewer may ask why we did not use the PA-fluorescent virus in the first place to evaluate inclusion thermodynamics and avoid the accessibility problems that antibodies may have in reaching deep into large inclusions. Our answer is that no system is perfect. In the case of the PA-fluorescent virus, the caveats revolve around the fact that the virus is attenuated (Figure 1a below), exhibiting a delayed infection as demonstrated by reduced levels of viral proteins (Figure 1b below). Consistently, it shows differences in the accumulation of vRNPs in the cytosol: viral inclusions form later in infection, and the amount of vRNPs in the cytosol does not reach the levels observed with the PR8-WT virus. After their emergence, inclusions behave as in the wild-type virus (PR8-WT), fusing and dividing (Figure 1c below) and displaying liquid properties.

      As the overarching goal of this manuscript is to evaluate the best strategies to harden liquid IAV inclusions and given that one of the parameters we were testing is concentration, we reasoned that using PR8-WT virus for our analyses would be reasonable.

      In conclusion, both systems have caveats that are important to assess systematically, and these differences may shift or alter thermodynamic parameters such as nucleation density, inclusion maturation rate, Cdense and Cdilute, in particular by varying the total concentration. As a note, to validate all our results using the PA-mNeonGreen PR8 virus, we considered the delayed kinetics and applied our thermodynamic analyses up to 20 hpi rather than 16 hpi.

      However, because the reviewer asked which solution best mitigates errors induced by using antibodies, we re-checked all our data. We not only compared the data from the attenuated fluorescently tagged virus with our antibody-based data, but also compared images acquired from Z stacks (as used for concentration and for type/strength of interactions) with those acquired as 2D images. Our analysis revealed a very good match between antibody staining and the fluorescent vRNP virus when images were acquired as Z stacks and analysed as Z projections. Therefore, we re-analysed all our temperature-related thermodynamic data using images acquired from Z stacks and entirely revised Figure 2. We believe that all these comparisons and analyses have greatly improved the manuscript and hence we thank all reviewers for their input.
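To make the comparison of Cdense, Cdilute and Gibbs free energy across labelling methods more concrete, here is a minimal sketch. It assumes that the free energy is computed as a transfer free energy, ΔG = −RT ln(Cdense/Cdilute), which is a common definition in the condensate literature; the response itself does not spell out the formula, so both the definition and the function name are assumptions made for illustration.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def transfer_free_energy(c_dense, c_dilute, temp_celsius=37.0):
    """Assumed definition: dG = -R * T * ln(c_dense / c_dilute), in J/mol.

    c_dense and c_dilute may be in any common unit (for example mean
    fluorescence intensity), because only their ratio enters the formula.
    """
    if c_dense <= 0 or c_dilute <= 0:
        raise ValueError("concentrations must be positive")
    temp_kelvin = temp_celsius + 273.15
    return -R * temp_kelvin * math.log(c_dense / c_dilute)


if __name__ == "__main__":
    # Hypothetical intensities: a tenfold enrichment in the dense phase.
    print(transfer_free_energy(10.0, 1.0))
    # Scaling both phases by the same factor leaves the result unchanged.
    print(transfer_free_energy(370.0, 37.0))
```

Under this assumed definition, a label that changes signal by the same factor in both phases (for instance one PA molecule versus many NP molecules per vRNP) would leave ΔG unchanged, whereas a bias that affects only the dense phase, such as limited antibody access to the interior of large inclusions, would shift it.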

      Figure 1 – The PA-mNeonGreen virus is attenuated in comparison to the WT virus and data obtained is consistent for Gibbs free energy with analyses done with images processed with antibody fluorescent vRNPs. A. Representation of the PA-mNeonGreen virus (PA-mNG; Abbreviations: NCR: non coding region). B. Cells (A549) were transfected with a plasmid encoding mCherry-NP and co-infected with PA-mNeonGreen virus for 16h, at an MOI of 10. Cells were imaged under time-lapse conditions starting at 16 hpi. White boxes highlight vRNPs/viral inclusions in the cytoplasm in the individual frames. The dashed white and yellow lines mark the cell nucleus and the cell periphery, respectively. The yellow arrows indicate the fission/fusion events and movement of vRNPs/ viral inclusions. Bar = 10 µm. Bar in insets = 2 µm. C-D. Cells (A549) were infected or mock-infected with PR8 WT or PA-mNG viruses, at a multiplicity of infection (MOI) of 3, for the indicated times. C. Viral production was determined by plaque assay and plotted as plaque forming units (PFU) per milliliter (mL) ± standard error of the mean (SEM). Data are a pool from 2 independent experiments. D. The levels of viral PA, NP and M2 proteins and actin in cell lysates at the indicated time points were determined by western blotting. (E-G) Biophysical calculations in cells infected with the PA-mNeonGreen virus upon altering temperature (at 10 hpi, evaluating the concentration of vRNPs (over a time course) in conditions expressing native amounts of Rab11a or overexpressing low levels of Rab11a and upon altering the type/strength of vRNP interactions by adding nucleozin at 10 hpi during the indicated time periods. All data: Ccytoplasm/Cnucleus; Cdense, Cdilute, area aspect ratio and Gibbs free energy are represented as boxplots. 
Above each boxplot, same letters indicate no significant difference between them, while different letters indicate a statistical significance at α = 0.05 using one-way ANOVA, followed by Tukey multiple comparisons of means for parametric analysis, or Kruskal-Wallis Bonferroni treatment for non-parametric analysis.

      2) Although the authors have demonstrated that vRNP condensates exhibit several key characteristics of liquid condensates (they fuse and divide, they dissolve upon hypotonic shock or upon incubation with 1,6-hexanediol, FRAP experiments are consistent with a liquid nature), their aspect ratio (with a median above 1.4) is much higher than the aspect ratio observed for other cellular or viral liquid compartments. This is intriguing and might be discussed.

      IAV inclusions have been shown to interact with microtubules and the endoplasmic reticulum, which confer movement, and to undergo fusion and fission events. We propose that these interactions and movements exert forces that deform inclusions, making them less spherical. To test this assumption, we compared the aspect ratio of viral inclusions in the absence and presence of nocodazole (which abrogates microtubule-based movement). The data in Figure 2 show that, in the presence of nocodazole, the aspect ratio decreases from 1.42 ± 0.36 to 1.26 ± 0.17, supporting our assumption.

      Figure 2 – Treatment with nocodazole reduces the aspect ratio of influenza A virus inclusions. Cells (A549) were infected with PR8 WT for 8 h and treated with nocodazole (10 µg/mL) for 2 h, after which the movement of influenza A virus inclusions was captured by live cell imaging. Viral inclusions were segmented, and the aspect ratio was measured in ImageJ, then analysed and plotted in R.
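
      For readers unfamiliar with the measurement, the aspect ratio of a segmented object can be sketched as the ratio of the major to the minor axis of the ellipse that has the same second moments as the object. The code below is an illustrative approximation in plain Python, not the ImageJ implementation used for the figure; the function name and the synthetic shapes are hypothetical.

```python
import math


def aspect_ratio(pixels):
    """Major/minor axis ratio of the moment-equivalent ellipse.

    `pixels` is a list of (x, y) coordinates belonging to one segmented
    object. The ratio is sqrt(l_major / l_minor), where l_major and l_minor
    are the eigenvalues of the 2x2 coordinate covariance matrix.
    """
    n = len(pixels)
    if n < 3:
        raise ValueError("need at least three pixels")
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    l_major = half_trace + root
    l_minor = half_trace - root
    if l_minor <= 0:
        raise ValueError("object is degenerate (all pixels on one line)")
    return math.sqrt(l_major / l_minor)


def filled_ellipse(semi_x, semi_y):
    """Synthetic test object: pixels inside an axis-aligned ellipse."""
    return [
        (x, y)
        for x in range(-semi_x, semi_x + 1)
        for y in range(-semi_y, semi_y + 1)
        if (x / semi_x) ** 2 + (y / semi_y) ** 2 <= 1.0
    ]


round_object = aspect_ratio(filled_ellipse(10, 10))      # close to 1
elongated_object = aspect_ratio(filled_ellipse(20, 10))  # close to 2
```

      A perfectly round object gives a value near 1, so a population median above 1.4 indicates noticeably elongated inclusions.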

      3) Similarly, the fusion event presented at the bottom of figure 3I is dubious. It might as well be an aggregation of condensates without fusion.

      We have changed this (see Fig 5A and B in the manuscript). Thank you for the suggestion.

      4) The authors could have more systematically performed FRAP/FLAPh experiments on cells expressing fluorescent versions of both NP and Rab11a to investigate the influence of condensate size, time after infection, or global concentrations of Rab11a in the cell (using the total fluorescence of overexpressed GFP-Rab11a as a proxy) on condensate properties.

      We have included a new figure (Figure 5) with the suggested data.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors present evidence from studies of biopsies from human subjects and muscles from young and older mice that the enzyme glutathione peroxidase 4 (GPx4) is expressed at reduced levels in older organisms associated with elevated levels of lipid peroxides. A series of studies in mice established that genetic reduction of GPx4 and hindlimb unloading each elevated lipid peroxide levels and reduced muscle contractility in young animals. Overexpression of GPx4 or N-acetylcarnosine blocked atrophy and loss of force generating capacity resulting from hindlimb unloading in young mice. Cell culture experiments in C2C12 myotubes were used to develop evidence linking elevated lipid peroxide levels to atrophy using genetic and pharmacologic approaches. Links between autophagy and atrophy were suggested.

      Experiments on GPx4 expression levels, lipid peroxide levels, muscle mass and muscle force generating capacity were internally consistent and convincing. I thought the experiments supporting the view that autophagy contributed to atrophy were convincing. The hypothesis that altered lipidation of autophagy factors contributed was tested or supported in my view. Evidence for muscle atrophy in response to genetic or pharmacologic manipulations is a bit inconsistent throughout the paper, possibly because the small N of some experiments does not provide sufficient power to detect observed numeric differences in the means. The pattern of muscle fiber atrophy by fiber type is consistent throughout the paper but there is variability in which comparisons reached the threshold for significance, again, possibly because of the small N of the experiments. I agree with the authors that altered activity of enzymes in the contractile apparatus provides one explanation for the observed weakness but respectfully wish to point out there are others such as impaired excitation-contraction coupling which is well known to occur in aging.

      We thank Dr. Cardozo for taking the time to carefully review our manuscript, and for providing enthusiastic feedback on the significance of our work. We are grateful for the additional suggestions and have modified our manuscript accordingly.

      Reviewer #2 (Public Review):

      This is a well-written paper that reports that the accumulation of LOOH with age and disuse contributes to the loss of skeletal muscle mass and strength. Moreover, the authors report that LOOH neutralization attenuates muscle atrophy and weakness. The mechanism via which LOOH contributes to these phenotypes remains unclear but seems to be mediated by the autophagy-lysosomal axis. In addition, the paper also reports the efficacy of N-acetylcarnosine treatment in ameliorating muscle atrophy in mice.

      We thank Reviewer 2 for their positive response to our manuscript. Very much appreciated! Please find our responses to your specific comments below.

      The authors should consider the following points to improve the manuscript:

      • The authors showed that inhibition of the autophagy-lysosome axis by ATG3 deletion or BafA1 was sufficient to reduce LOOH levels induced by GPx4 deletion, erastin, or RSL3. Moreover, they found that 4-HNE co-localizes with LAMP2. However, it remains unclear the precise mechanism via which LOOH contributes to muscle atrophy and how it is amplified by the autophagy-lysosomal axis. The authors could further test the functional interaction of 4-HNE with LAMP2 with additional experiments such as immunoprecipitation.

      Thank you for these comments. We agree with the reviewer that our observations on the autophagy-lysosomal axis are not yet backed by a tangible mechanism. To clarify, we show 4-HNE and LAMP2 colocalization only to demonstrate that they are in proximity to each other. We do not necessarily claim that LAMP2 is the protein that becomes 4-HNE-ylated. We are currently developing a proteomic platform to detect 4-HNE conjugations on peptides, and this should hopefully shed light on the nature of the interaction between LOOH and the autophagy-lysosomal axis. We now include additional discussion of the autophagy-lysosomal axis and LOOH in lines 280-291.

      • A weak point of the paper is not having performed the experiments on 24-month-old-mice. At 20 months of age, the mice do not display any muscle wasting and myofiber atrophy compared to young mice that have completed postnatal muscle growth (=6-month-old-mice). It would be interesting to see the levels of 4-HNE in 24- or 30-month-old mice, and if N-acetylcarnosine treatment in older mice is able to rescue muscle atrophy induced by aging.

      This is a nuanced but very important point. We initially set out to study 24-month-old mice, but these mice did not tolerate the hindlimb unloading procedure well, so we ended up using 20-month-old mice instead. While mice at this age tolerated our HU procedure well, they did not manifest a significant reduction in muscle mass compared to young mice. We included additional discussion in lines 298-300 and 310-314. To address this point, we are currently performing a 6-month N-acetylcarnosine intervention in 24-month-old mice and examining the effect of this compound on aging (without HU) in multiple organ systems. We have thus far completed 2 cohorts for this preclinical trial. Results on the effects of long-term N-acetylcarnosine treatment on muscle will be included in a separate manuscript.

      Previous studies have shown that inhibition of autophagy accelerates (rather than protects from) sarcopenia, and that autophagy is required to maintain muscle mass (Masiero 2009, PMID: 19945408; Castets 2013, PMID: 23602450; Carnio 2014, PMID: 25176656). On this basis, the authors should test whether their findings are valid only in the context of disuse atrophy or also in the context of sarcopenia (=24-30-month-old mice).

      We agree with the reviewer that the relationship between autophagy and muscle mass is likely complex. In the current study, we only showed that a SHORT-TERM inhibition of autophagy by ATG3 deletion prevents muscle atrophy induced by a SHORT-TERM disuse intervention. Long-term inhibition of the autophagic machinery will likely be detrimental and, as shown in the references provided by the reviewer, accelerates sarcopenia. We now include this discussion in lines 280-287. We respectfully request that the experiments in 24-30-month-old ATG3-MKO mice be considered beyond the scope of the study. As discussed above, there is much more to study regarding the nature of the interaction between the autophagy-lysosomal axis and LOOH.

      • In Fig.2 the authors report that GPx4 KD, erastin, and RSL3 reduce the diameter of myotubes. For how long and when was the treatment done? Looking at the images, it seems that there are some myoblasts in the cultures treated with GPx4 KD, erastin, and RSL3. Is it possible that these compounds reduce myotube size by inhibiting myoblast fusion rather than by inducing myotube atrophy?

      Thank you for pointing this out. We now provide further details in the Methods section (lines 439-443). For KD experiments, we treat myoblasts with virus at the same time as differentiation is induced, because infection efficiency is lower in myotubes. This is certainly a caveat. However, erastin and RSL3 experiments were done on fully differentiated myotubes. It is common to have non-differentiated myoblasts under differentiated myotubes.

      • MDA quantification was done in the gastrocnemius although all the experiments in this paper were performed in the soleus and EDL. It would be good if the authors could explain the reason for this.

      MDA quantification and 4-HNE western blotting were done on gastrocnemius for all mouse models because some soleus and EDL muscles weigh less than 7 mg and provide insufficient material for the MDA or 4-HNE assays. Soleus and EDL were used for contractile experiments (gastrocnemius cannot be used for this experiment) and for histological analyses.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Jigo et al. measured the entire contrast sensitivity function and manipulated eccentricity and stimulus size to assess changes in contrast sensitivity and acuity for different eccentricities and polar angles. They found that CSFs decreased with eccentricity, but to a lesser extent after M scaling, while compensating for striate-cortical magnification around the polar angle of the visual field did not equate contrast sensitivity.

      In this article, the authors used classic psychophysical tests and a simple experimental design to answer the question of whether cortical magnification underlies polar angle asymmetries of contrast sensitivity. Contrast sensitivity is considered to be the most fundamental measure of spatial vision and is important for both normal individuals and clinical patients in ophthalmology. The parametric contrast sensitivity model and the extraction of key CSF attributes help to compare the effect of M scaling at different angles. This work can provide a new reference for the study of normal and abnormal spatial vision.

      The conclusions of this paper are mostly well supported by data, but some aspects of data collection and analysis need to be clarified and extended.

      1) In addition to the key CSF attributes used in this paper, the area under the CSF curve is a common, global parameter to figure out how contrast sensitivity changes under different conditions. An analysis of the area under the CSF curve is recommended.

      – We have added the area under the CSF (AULCSF) [lines 305-319, Fig 5 E-F; lines 339-343, Fig 6 E-F]. Differences for non-magnified and magnified stimuli are not eliminated.
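
      To make the added measure concrete, here is a minimal sketch of how an area under the log CSF can be computed: trapezoidal integration of log10 sensitivity over log10 spatial frequency, with sensitivities below 1 clipped to zero on the log axis. The function name and the example values are hypothetical and do not reproduce the fitted CSFs or the exact procedure in the manuscript.

```python
import math


def aulcsf(spatial_freqs_cpd, sensitivities):
    """Area under the log contrast sensitivity function.

    Integrates max(0, log10(sensitivity)) over log10(spatial frequency)
    with the trapezoidal rule. Inputs may be given in any order.
    """
    if len(spatial_freqs_cpd) != len(sensitivities):
        raise ValueError("inputs must have the same length")
    if len(spatial_freqs_cpd) < 2:
        raise ValueError("need at least two spatial frequencies")
    points = sorted(zip(spatial_freqs_cpd, sensitivities))
    log_sf = [math.log10(sf) for sf, _ in points]
    # Sensitivity below 1 (threshold contrast above 100%) contributes no area.
    log_cs = [max(0.0, math.log10(cs)) for _, cs in points]
    area = 0.0
    for i in range(1, len(points)):
        area += 0.5 * (log_cs[i] + log_cs[i - 1]) * (log_sf[i] - log_sf[i - 1])
    return area


# Hypothetical sensitivities for a band-pass CSF sampled at six frequencies.
example_sf = [0.5, 1.0, 2.0, 4.0, 8.0, 11.3]
example_cs = [20.0, 35.0, 40.0, 25.0, 6.0, 1.5]
example_area = aulcsf(example_sf, example_cs)
```

      Because the measure summarizes the whole curve in one number, it complements attributes such as peak sensitivity and cutoff SF rather than replacing them.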

      2) In Figure 2, CRFs are given for several SFs, but were the CRFs at the cutoff-SF well-fitted? The authors should have provided the CRF results and corresponding fits to make their results more solid.

      – As reported in Fig 4A,C,E, the group data fits were very high (≥.98).

      3) The authors suggested that the apparent decrease in HVA extent at high SF may be due to the lower cutoff-SF of the perifoveal VM. Analysis of the correlation between the change in HVA and cutoff SF after M scaling may help to draw more comprehensive conclusions.

      – We have rephrased our explanation [lines 453-460]. As per your suggestion, we correlated the change in HVA with the cutoff SF after M scaling and found these correlations to be non-significant.

      4) In Figure 6, it would be desirable to add panels of exact values of HVA and VMA effects for key CSF attributes at different eccentricities, as shown in Figures 4B, D, and F, to make the results more intuitive.

      – We have added these panels [FIG 6] and the corresponding analysis in the text [lines 321-343]

      5) More discussions are needed to interpret the results. 1) Due to the different testing distances in VM and HM, their retinae will be in a different adaptation state, making any comparison between VM and HM tricky. The author should have added a discussion on this issue.

      – Note that the mean luminance of the display (from retina to monitor) was 23 cd/m2 at 57cm and 19 cd/m2 at 115 cm. The pupil size difference for these two conditions is relatively small (< 0.5 mm) and should not significantly affect contrast sensitivity (Rahimi-Nasrabadi et al., 2021) [lines 483-491]. Moreover, the differences we get here are consistent with the asymmetries we (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Himmelberg, Winawer & Carrasco, 2020) and many others (e.g., Baldwin et al., 2012; Pointer & Hess, 1989; Regan and Beverley, 1983; Rijsdijk et al., 1980; Robson and Graham, 1981; Rosén et al., 2014; Silva et al., 2008) have observed for contrast sensitivity when the vertical and horizontal meridian are tested simultaneously at the same distance.

      6) In Figure 4, the HVA extent appears to change after M-scaling, although the analysis shows that M-scaling only affects the HVA extent at high SF. In contrast, the range of VMA was almost unchanged. The authors could have discussed more how the HVA and VMA effects behave differently after M-scaling.

      – We had commented on this pattern and have further clarified it [lines 436-451]

      7) The results in Figure 4 also show that at 11.3 cpd, the measurement may be inaccurate. This might lead to an inaccurate estimate of the M scaling effect at 11.3 cpd. The authors should discuss this issue more.

      – We have explained why this data point is at chance [FIG 4 caption]

      8) The different neural image-processing capabilities among locations, which is referred to as the "Qualitative hypothesis", is the main hypothesis explaining the differences around the polar angle of the visual field. To help the reader better understand this concept, the author should provide further discussions.

      – We have expanded the discussion of the qualitative hypothesis of differences in polar angle (lines 86-92; lines 476-481).

      9) The authors should also provide more details about their measures. For example, high grayscale is crucial in contrast sensitivity measurements, and the authors should clarify whether the monitor was calibrated with high grayscale or only with 8-bit. Since the main experiment was measuring CS at different locations, it should also be clarified whether the global uniformity of the display was calibrated.

      – The monitor was calibrated with 8-bit at the center of the display [line 607].

      – Regarding global uniformity, although we only calibrated at the center of the display, please note that the asymmetries are not due to the particular monitor we used. We have obtained these asymmetries in contrast sensitivity in numerous studies using multiple monitors over 20 years (e.g., Carrasco, Talgar & Cameron, 2001; Cameron, Tai & Carrasco, 2002; Fuller, Park & Carrasco, 2009; Abrams, Nizam & Carrasco, 2012; Corbett & Carrasco, 2012; Hanning et al., 2022a; Himmelberg et al., 2020) and other groups have reported these visual asymmetries as well (Baldwin et al., 2012; Pointer and Hess, 1989; Rosén et al., 2014). Also important, as we had mentioned in the Introduction [lines 55-59], the HVA and VMA asymmetries shift in-line with egocentric referents, corresponding to the retinal location of the stimulus, not with the allocentric location (Corbett & Carrasco, 2011).

      10) In addition, their method of data analysis relies on parametric contrast sensitivity model fitting. One of the concerns is whether there are enough trials for each SF to measure the threshold. The authors should have included in their method the number of trials corresponding to each SF in each CSF curve.

      – We have specified the number of trials [lines 637-644].

      Reviewer #2 (Public Review):

      This is an interesting manuscript that explores the hypothesis that inhomogeneities in visual sensitivity across the visual field are not solely driven by cortical magnification factors. Specifically, they examine the possibility that polar angle asymmetries are subserved by differences not necessarily related to the neural density of representation. Indeed, when stimuli were cortically magnified, pure eccentricity-related differences were minimized, whereas applying that same cortical magnification factor had less of an effect on mitigating polar angle visual field anisotropies. The authors interpret this as evidence for qualitatively distinct neural underpinnings. The question is interesting, the manuscript is well written, and the methods are well executed.

      1) The crux of the manuscript appears to lean heavily on M-scaling constants, to determine how much to magnify the stimuli. While this does appear to do a modest job compensating for eccentricity effects across some spatial frequencies within their subject pool, it of course isn't perfect. But what I am concerned about is the degree to which the M-scaling that is then done to adjust for presumed cortical magnification across meridians is precise enough to rely on entirely to test their hypothesis. That is, do the authors know whether the measures of cortical magnification across a polar angle that are used to magnify these stimuli are as reliable across subjects as they tend to be for eccentricity alone? If not, then to what degree can we trust the M-scaled manipulation here? In an ideal world, the authors could have empirically measured cortical surface area for their participants, using a combination of retinotopy and surface-based measures, and precisely compensated for cortical magnification, per subject. It would be helpful if the authors better unpacked the stability across subjects for their cortical magnification regime across polar angles.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      2) Related to this previous point, the description of the cortical magnification component of the methods, which is quite important, could be expanded on a bit more, or even placed in the body of the main text, given its importance. Incidentally, it was difficult to figure out what the references were in the Methods because they were indexed using a numbering system (formatted for perhaps a different journal), so I could only make best guesses as to what was being referred to in the Methods. This was particularly relevant for model assumptions and motivation.

      –– We now detail M-scaling in the Introduction [lines 115-135], and we have fixed the references in the Methods section.

      3) Another methodological aspect of the study that was unclear was how the fitting worked. The authors do a commendably thorough job incorporating numerous candidate CSF models. However, my read on the methods description of the fitting procedure was that each participant was fitted with all the models, and the best model was then used to test the various anisotropy models afterwards. What was the motivation for letting each individual have their own qualitatively distinct CSF model? That seems rather unusual.

      Related to this, while the peak of the CSF is nicely sampled, there was a lack of much data in the cutoff at higher spatial frequencies, which at least in the single subject data that was shown made the cutoff frequency measure seem like it would be unreliable. Did the authors find that to be an issue in fitting the data?

      –– We have further clarified that we fit all 9 models to the grouped data [lines 177-178] and in Methods [lines 693, 716, 725], and that the fit in Figure 3 corresponds to the grouped data [Fig 3 caption]. As reported in Fig 4A,C,E, the group data fits were very high (≥.98). Please note that the cutoff spatial frequency is reliable. The data point (11.3 cpd) in the differences which does not follow the same function (Fig 4D,F) reflects the fact that for both magnified and not-magnified stimuli, performance was at chance, consistent with the fact that high SF are harder to discriminate at peripheral locations [Fig 4 caption].

      4) The manuscript concludes that cortical magnification is insufficient to explain the polar angle inhomogeneities in perceptual sensitivity. However, there is little discussion of what the authors believe may actually underlie these effects then. It would be productive if they could offer some possible explanation.

      –– We have expanded the discussion of qualitative hypothesis of differences in polar angle [lines 86-92; lines 476-481].

      –– We have expanded the discussion of possible mechanisms [lines 496-529].

      –– We have explained why having assessed the VM and HM at different distances does not significantly influence our measures [lines 483-491].

      –– We have expanded the discussion of how the HVA and VMA effects behave differently after M-scaling [lines 435-450].

      –– We have clarified that the fits are reliable and made explicit that the highest SF data point is at chance in both conditions [FIG 4 caption].

      Reviewer #3 (Public Review):

      Jigo, Tavdy & Carrasco used visual psychophysics to measure contrast sensitivity functions across the visual field, varying not only the distance from fixation (eccentricity) but also the angular position (meridian). Both parameters have been shown to affect visual sensitivity: spatial visual acuity generally falls off with eccentricity, it is now widely accepted that acuity is better along the horizontal than the vertical meridian, and there may also be differences between the upper and lower visual field, although this anisotropy is typically less pronounced. The eccentricity-dependent decrease in performance is thought to be due to reduced cortical magnification in peripheral compared to central vision; that is, the amount of brain tissue devoted to mapping a fixed amount of visual space. The authors, therefore, include a crucial experimental condition in which they scale the size of their stimuli to account for reduced cortical magnification. They find that while this corrects for reduced performance related to stimulus eccentricity, it does not fully explain the variation in performance at different visual field meridians. They argue that this suggests other neural mechanisms than cortical magnification alone underlie this intra-individual variability in visual perception.

      The experiments are done to an extremely high technical standard, the analysis is sound, and the writing is very clear. The main weakness is that as it stands the argument against cortical magnification as the factor driving this meridional variability in visual performance is not entirely convincing. The scaling of stimulus size is based on estimates in previous studies. There are two issues with this: First, these studies are all quite old and therefore used methods that cannot be considered state-of-the-art anymore. In turn, the estimates of cortical magnification may be a poor approximation of actual differences in cortical magnification between meridians.

      –– We note that the equations by Rovamo and Virsu are commonly used to cortically magnify stimulus size. This paper has many citations, and the conclusions of many studies are based on those calculations [lines 115-128].

      –– In response to Reviewer 3’s comment, “In lieu of carrying out new measurements, it could also suffice to compare individual cortical magnification factors to the performance to quantify the contribution to the psychophysical performance”, we found a significant correlation between the surface area and contrast sensitivity measures at the horizontal, upper-vertical and lower-vertical meridians. However, we found no significant correlation between cortical surface area and the difference in contrast sensitivity for fixed-size and magnified stimuli at 6 deg at each meridian. These findings suggest that surface area plays a role but that individual magnification is unlikely to equalize contrast sensitivity [lines 366-380; Fig 7; lines 511-529].

      Second, we now know that this intra-individual variability is rather idiosyncratic (and there could be a wider discussion of previous literature on this topic). Since these meridional differences, especially between upper and lower hemifields, are relatively weak compared to the variance, a scaling factor based on previous data may simply not adequately correct these differences. In fact, the difference in scaling used for the upper and lower vertical meridian is minute, 7.7 vs 7.68 degrees of visual angle, respectively. This raises the question of whether such a small difference could really have affected performance.

      That said, there have been reports of meridional differences in the spatial selectivity of the human visual cortex (Moutsiana et al., 2016; Silva et al., 2017) that may not correspond one-to-one with cortical magnification. This could be a neural substrate for the differences reported here. This possibility could also be tested with their already existing neurophysiological data. Or perhaps, there could be as-yet undiscovered differences in the visual system, e.g., in terms of the distribution of cells between the ventral and dorsal retina. As such, the data shown here are undoubtedly significant and these possibilities are worth considering. If the authors can address this critique either by additional experiments, analyses, or by an explanation of why this cannot account for their results, this would strengthen their current claims; alternatively, the findings would underline the importance of these idiosyncrasies in the visual cortex.

      We now include discussion of the different points that the reviewer raised here in our new section 'What mechanism might underlie perceptual polar angle asymmetries' [lines 497-530].

    1. Author Response

      Reviewer #1 (Public Review):

      • The statistical procedures used are not completely described and may not be appropriate.

      We revised the text in Methods and Results sections to give more details about the methods used.

      • As only two levels of delay were tested, it is not possible to directly test whether the subjective discounting function is hyperbolic or exponential and hence whether the delay is encoded subjectively or objectively.

      We agree with the reviewer. A higher number of task parameters may offer a better resolution to evaluate the discounting functions. Fortunately, this does not affect our main results.

      • The task has several variable interval lengths (hold in: 1.2-2.8 s, short delay: 1.8-2.3 s, long delay: 3.5-4s) that frustrate interpretation. The distribution of these delays is not described, for example as it reads it seems possible that some long delay rewards are delivered with shorter latency between cue and reward than some short delay rewards (1.2 + 3.5 = 4.7s vs. 2.8+2.3 = 5.1 s).

      We revised the text to address that ambiguity. In the new version of the manuscript, we describe short versus long delays in terms of the total delay interval between instruction cue onset and reward delivery [short delay (3.5-5.6 s) and long delay (5.2-7.3 s)]. Within each delay category, individual delays followed a Gaussian distribution such that the two delay ranges overlapped for 9% of trials. These details are now described in the revised Methods section (pg. 22).
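
      The overlap between the two total-delay ranges quoted above can be checked with a few lines of arithmetic. This is only a sketch of the range comparison: the helper name is hypothetical, and the fraction of trials falling in the shared window depends on the delay distributions, which are not modelled here.

```python
def interval_overlap(first, second):
    """Return the (start, end) window shared by two closed intervals, or None."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return (start, end) if start < end else None


# Total cue-to-reward delays, in seconds, as stated in the response.
SHORT_TOTAL_S = (3.5, 5.6)
LONG_TOTAL_S = (5.2, 7.3)

shared = interval_overlap(SHORT_TOTAL_S, LONG_TOTAL_S)
shared_width_s = shared[1] - shared[0] if shared else 0.0
print(shared, round(shared_width_s, 3))
```

      The shared window is narrow relative to either range, which is in line with only a small fraction of trials being ambiguous between the two delay categories.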

      • The authors have not considered that if the delay value is encoded, then the value, both objectively and subjectively, may be changing as the delay elapses. The variation of these task intervals may have an effect on the value of delay.

      In the present study, we report a dynamic integration between the desirability of the expected reward and the imposed delay to reward delivery across the waiting period. Our results (e.g. see Fig. 6) do not fit with simple linear (or logarithmic) effects corresponding to continuous regular changes as the delay elapses. We found different types of interactions (Discounting± and Compounding±) at different periods of the hold period and in different single units. We did not find a way to model all these types of interactions with this type of approach.

      Reviewer #2 (Public Review):

      • Plots of "rejection rate" (trials where the monkeys failed to wait until the rewards) as a function of delay and reward size seem to indicate that the monkeys understood the visual cue. The rejection rates were very low (less than 4% for almost all conditions) which indicates that the monkeys did not have a hard time inhibiting their behavior. It also meant that the authors could not compare trials where the monkeys successfully waited with trials where they failed to wait. This missing comparison weakens the link between the neurophysiological observations and the conclusions the authors made about the signals they observed.

      Here, our main goal was to describe the dynamic STN signals engaged during the waiting period without studying action-related activities. In the discussion (pg. 20), we clearly wrote ‘Further research is needed to determine whether the neural signals identified here causally drive animals’ behavior or rather just participate to reflect or evaluate the current situation.’ Consequently, our conclusions were already tempered by that point.

      In addition, we address the same limitation by writing (pg. 20): “An important avenue for future research will be to determine how STN signals, such as those described here, change when animals run out of patience and finally decide to stop waiting. To do this, however, smaller reward sizes and longer delays might be used to promote more escape behaviors during the delay interval.”

      • The authors examined the STN activity aligned to the start of the delay and also aligned to the reward. Most of the "delay encoding" in the STN activity was observed near the end of the waiting period. The trouble with the analysis is that a neuron that responded with exactly the same response on short and long trials could appear to be modulated by delay. This is easiest to see with a diagram, but it should be easy to imagine a neural response that quickly rose at the time of instruction and then decayed slowly over the course of 2 seconds. For long trials, the neuron's activity would have returned to baseline, but for short trials, the activity would still be above baseline. As such, it is not clear how much the STN neurons were truly modulated by delay.

      We agree with the reviewers. Our original analyses using two time windows had the potential to introduce biases in the detection of neuronal activities modulated by the delay. To overcome this issue, we modified the time frame of all of our analyses (neuronal activity, eye position, EMG). Now, the revised version of the manuscript only reports activities across a single time window aligned to the time of instruction cue delivery (i.e., -1 to 3.5s relative to instruction cue onset). This time frame corresponds to the minimum possible interval between instruction cues and reward delivery. We have revised all of the figures and we re-calculated all of the statistics using that one analysis window. Despite these major modifications, our key findings were not changed substantially. We found the same pattern in STN activities, with a strong encoding of reward (48% of neurons) preceding a late encoding of delay (39% of neurons). We also updated the text in Methods and Results sections to reflect the revised analyses.
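
      The alignment confound raised in this exchange can be illustrated with a toy sketch. It is not the analysis code used in the study: the function names, the exponential decay shape, the time constant, and the reward times (4.0 s and 6.0 s after the cue) are all hypothetical choices made for illustration. The simulated neuron responds identically on every trial, yet a reward-aligned window makes it look delay-modulated, whereas a cue-aligned window does not.

```python
import math


def rate(t_since_cue, baseline=10.0, gain=20.0, tau=2.0):
    """Firing rate (Hz) of a hypothetical neuron whose response is the same
    on every trial: a step at cue onset followed by exponential decay."""
    if t_since_cue < 0:
        return baseline
    return baseline + gain * math.exp(-t_since_cue / tau)


def mean_rate(start, stop, step=0.01):
    """Average rate over [start, stop), times in seconds relative to the cue."""
    n = int(round((stop - start) / step))
    return sum(rate(start + (i + 0.5) * step) for i in range(n)) / n


def compare(short_reward=4.0, long_reward=6.0, window=1.0, cue_window=3.5):
    """Return ((short, long) reward-aligned means, (short, long) cue-aligned means).

    short_reward and long_reward are assumed cue-to-reward latencies.
    """
    reward_aligned = (
        mean_rate(short_reward - window, short_reward),
        mean_rate(long_reward - window, long_reward),
    )
    # The cue-aligned window covers the same post-cue times for both trial
    # types, so an identical response yields identical averages.
    cue_aligned = (mean_rate(0.0, cue_window), mean_rate(0.0, cue_window))
    return reward_aligned, cue_aligned


if __name__ == "__main__":
    reward_aligned, cue_aligned = compare()
    print("reward-aligned (short, long):", reward_aligned)
    print("cue-aligned    (short, long):", cue_aligned)
```

      In this sketch the reward-aligned average is higher on short trials only because less time has elapsed since the cue, which is the artefact that a single cue-aligned window removes.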

      • Another concern is the presence of eye movement variables in the regressions that determine whether a neuron is reward or delay encoding. If the task variables modulated eye movements (which would not be surprising) and if the STN activity also modulated eye movements, then, even if task variables did not directly modulate STN activity, the regression would indicate that it did. This is commonly known as "collider bias". This is, unfortunately, a common flaw in neuroscience papers.

      Because the presence of eye variables did not influence how neurons were selected by the GLM, we do not think it likely that our analysis was susceptible to “collider bias”. Nonetheless, to control for that possibility directly, we have now repeated the GLM analyses with eye movement variables excluded. Results are shown in a new figure (Fig.4 – supplementary 1). Exclusion of eye parameters produced results that are very similar to those from the GLM that included eye parameters (differences <3 degrees). We have added text to the manuscript describing this added control analysis.
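
      For readers unfamiliar with the concern, the following toy simulation sketches how a collider can create a spurious effect, and why refitting without the eye variables is a direct control. It is a minimal illustration under stated assumptions, not the GLM used in the study: the variable names, effect sizes, and noise levels are hypothetical, and the partial regression coefficient is obtained by residualisation (the Frisch-Waugh approach) rather than with a statistics library.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x (intercept included)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing out x (intercept included)."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=0):
    """Return the task coefficient without and with the eye regressor."""
    rng = random.Random(seed)
    # Hypothetical task variable, e.g. short (0) versus long (1) delay.
    task = [rng.choice([0.0, 1.0]) for _ in range(n)]
    # Firing rate generated independently of the task (no true effect).
    firing = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Eye variable driven by both task and firing rate: the collider.
    eye = [t + f + rng.gauss(0.0, 0.5) for t, f in zip(task, firing)]

    without_eye = slope(task, firing)
    # Coefficient of task in the regression firing ~ task + eye, computed by
    # residualising both firing and task on eye.
    with_eye = slope(residuals(eye, task), residuals(eye, firing))
    return without_eye, with_eye


if __name__ == "__main__":
    without_eye, with_eye = simulate()
    print("task coefficient, eye excluded:", round(without_eye, 3))
    print("task coefficient, eye included:", round(with_eye, 3))
```

      With these assumed parameters the task coefficient should be near zero when the eye variable is excluded and clearly negative when it is included, even though the simulated firing rate never depends on the task.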

    1. Author Response

      Reviewer #2 (Public Review):

      The work integrated genomic and transcriptomic data to reconstruct the origin of the svPDE gene from the ancestral ENPP3 gene. The authors also analyzed the expression of svPDE along different snake lineages and different tissues in three species of venomous snakes. Finally, they purified an svPDE from the venom of Naja atra and analyzed its crystallographic structure and enzymatic function. The experiments are adequately designed and carefully planned and the conclusions made by the authors are well supported by evidence.

      I have the following suggestions:

      1) I could not find a section where the authors provided information regarding the origin of the analyzed venom and tissues. i.e. muscle tissue from Naja atra and venom for purification of svPDE. It is important to include this information.

      We thank the reviewer for mentioning this.

      The information for the venom purification has been described in Results (LINE 116) as “This svPDE was directly purified from the crude venom of Naja atra captured in Taiwan”. The information for the tissues of sequencing data has been included in Results (LINE 117) as “… with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (SI Appendix, Table S1)”, and Material and Methods (Line 403) as “DNA was extracted from the muscle tissue of a male Naja atra …”.

      Also, the SI Appendix Table S1 summarized all samples used for sequence analysis with their tissue origins.

      We are still grateful for this comment and have updated the text to make it clearer as follows:

      “The target genomes included the draft one of Naja atra sequenced from a muscle tissue (ongoing internal project, see Material and Methods for detail) and the complete one of its sister species, Naja naja, from the public data (Suryamohan et al., 2020).”

      We have also updated the text where the comparative genomic and transcriptomic analysis is first mentioned, to indicate where this information is described.

      “To test our hypothesis, we comprehensively de novo assembled transcriptomes from the species across 13 clades of Toxicofera (Fig. 1B) with publicly available RNA-Seq data and compared them with the corresponding genomes available in the NCBI Assembly database (see SI Appendix, Table S1 for sample details).”

      2) The authors mention (Line 156) that "the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae.". As I understand it, the family Typhlopidae is included in the Suborder Serpentes. The conclusions stand of course, but I believe it is worth revising, for accuracy.

      We thank the reviewer for noticing this issue.

      We have updated the text as follows to avoid confusion:

      From “the genomic sequences of svPDE-E1a were present in all species of Serpentes but not in the species of Dactyloidae, Varanidae, and Typhlopidae. This suggests an early emergence of svPDE-E1a in the common ancestor of Serpentes and became …”

      To

      “the genomic sequences of svPDE-E1a were present in all species of Serpentes except for the earliest diverged Typhlopidae. This suggests an early emergence of svPDE-E1a in the Serpentes evolution and became …”

      3) During the discussion (Line 315), it is stated that the expression of svPDE in Lamprophiidae is probably associated with the adaptation of prey selection as a dietary generalist compared to Viperidae and Elapidae. Provided that both of these clades have several species considered dietary generalists, I believe this statement is not strongly supported.

      We agree with the reviewer that we overstated this point without solid support. However, we believe it is worth mentioning, as a hint for future studies, that Lamprophiidae, a less-known clade, expresses svPDE at a level not lower than that of several species of Elapidae. Therefore, we have revised this paragraph to include the finding without further speculation.

      “Comparative transcriptomics is a powerful tool to reveal species-specific or tissue-specific novel transcripts, providing new insights for further studies. For example, the svPDE expression of Lamprophiidae, even higher than several species of Elapidae, indicates the worth of further study for this less-known clade to fill the knowledge gap.”

      4) Also in the discussion (Line 320), the authors mention that Colubridae is traditionally regarded as a non-venomous clade. This statement is far from accurate given that Colubridae is a very diverse clade and several species within it have been shown to be at least moderately venomous. Various species have been shown to produce secretions comparable to those of front-fanged snakes. Furthermore, despite their difference in morphology, I believe there is little to no evidence that suggests Duvernoy's glands in colubrids have any functions differing from the venom glands of front-fanged snakes.

      We thank the reviewer for this comment, which prompted us to revise the interpretation. This paragraph has been rewritten as follows:

      “Interestingly, the svPDE expression in Duvernoy’s glands of Colubridae, although low, several species within the diverse Colubridae clade have been shown to be moderately venomous. The expression of svPDE in the Duvernoy’s glands also highlights its potential function despite that Duvernoy’s glands exhibit morphological difference from the venom glands of front-fanged snakes”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript "Interplay between PML NBs and HIRA for H3.3 dynamics following type I interferon stimulus" by Kleijwegt and colleagues describes a study that's set out to explore the details of the PML-HIRA axis in H3.3 deposition at ISGs upon IFN-I stimulation. First, the authors establish that HIRA colocalized at PML NBs upon TNFa and TNFb treatment. This process is SUMO-dependent and facilitated by at least one of the identified SIM domains of HIRA. Next, the authors set out to determine whether interferon responsive genes (ISGs) are dependent on HIRA or PML. By knocking-down either HIRA or PML, only an effect on ISGs was observed when PML was knocked down. In fact, immune-FISH showed that PML NBs are in close proximity of ISGs upon TNFb treatment. To address the histone chaperone function of HIRA, the deposition of the replication-independent H3.3 on ISGs is tested. In specific, the enrichment of H3.3 across the ISG gene body. ChIP-seq data (Fig 5B) showed an enrichment around the TES, whereas qPCR (Fig 5A) showed less convincing enrichment (for details see below). When either HIRA or PML are knocked down, a mild loss of H3.3 enrichment was observed (Fig 5E). Interestingly, when HIRA is sequestered away from PML NBs by Sp100, an increased enrichment of H3.3 was observed. To understand the interplay between H3.3 deposition and HIRA's role in this process in the presence of PML NBs, H3.3 was overexpressed. Two population of cells were observed: low or high levels of H3.3. In the former, HIRA formed foci and the latter, HIRA did not form foci. Surprisingly, when HIRA is overexpressed, PML NBs form in the absence of TNFb. Finally, a two-sided model is proposed, where PML NBs is required for ISG transcription promoting H3.3 loading. The second side is that PML NBs function as a "storage center" for HIRA to regulate its availability.

      Overall, the model is intriguing, but the data presented seem insufficient to support the current claims.

      We thank the reviewer for his/her constructive comments. We want to point out that there is a confusion in the reviewer's statement (highlighted in red here above) between TNFb and IFNb, because it is IFNb that was mostly used in our study. We assume this is a typo. The sentence "when HIRA is overexpressed, PML NBs form in the absence of TNFb" is also inaccurate. Indeed, PML NBs are present in our cells with or without IFNb treatment. Overexpression of HIRA triggers accumulation of the ectopic HIRA in the PML NBs in the absence of IFNb, probably as part of a buffering mechanism.

      Major concerns:

      • The suggested function of HIRA at the PML NBs as storage is interesting. Ideally, this would be tested by real-time single molecule tracking.

      While surely interesting, real-time single molecule tracking is, we believe, beyond the scope of our article. In addition, under our hypothesis that PML NBs act as buffering places for HIRA, HIRA might move in and out of PML NBs depending on its concentration and/or the availability of free binding sites, so single molecule tracking might not be informative about possible long-term storage functions of PML NBs.

      • The link between PML NBs containing HIRA and H3.3 deposition is very intriguing and indeed the ChIP-seq data shown in Figure 5B shows a clear increase in the H3.3 signal around the TES. This distribution is very intriguing as recent work (Fang et al 2018 Nat Comm) showed that H3.3 deposition across the gene body was diverse and dynamic. Ideally, the qPCR of some select ISGs would confirm the ChIP-seq data. Here a more complex picture emerges. Just as with the ChIP-seq, a modest decrease of H3.3 at the TSS was observed, but only in 2 of the 3 genes shown is H3.3 enriched at the TES and only in 1 gene (ISG54) is H3.3 enriched at the gene body. As qPCR is later used in the manuscript (Fig 5E and 5G), it is essential that the results of two different techniques give similar results. With regards to Fig 5E and 5G, it is unclear why certain gene regions are shown, but not others.

      We agree with the reviewer that the distribution of H3.3 on active genes follows a diverse and dynamic pattern. H3.3 is enriched on gene bodies, but several papers have shown an important increase of H3.3 loading on the TES region of actively transcribed genes (Tamura et al. 2009; Sarai et al. 2013). Our ChIP-qPCR data (Figure 6A) and our ChIP-Seq data (Figure 6B) are consistent and show a moderate increase of H3.3 on gene bodies, e.g., on the MX1 mid or ISG54 mid regions shown by qPCR in Figure 6A (this enrichment is reproducible but not necessarily statistically significant) and on gene bodies of the 48 core ISGs as shown in our ChIP-Seq data (see the light blue line between TSS and TES in Figure 6B). In addition, our ChIP-qPCR and ChIP-Seq data also consistently show a higher enrichment of H3.3 on the TES regions of ISGs (see the significant enrichment found in ChIP-qPCR in the TES regions of MX1, OAS1 and ISG54, as well as the strong increase in H3.3 deposition with IFN seen by the light blue line for ChIP-Seq data in Figure 6B).

      Since the strongest enrichment for H3.3 was found on the TES region, we focused on this region to evaluate the impact of HIRA or PML knock-down. Our ChIP-Seq data (now added in main Figure 6F for the whole ISG region, or with a zoom on the TES region in Figure 6G) show that the strongest effect of HIRA or PML knock-down is indeed visible in the TES region of ISGs. Our ChIP-qPCR data presented in Figure 6E fully support this effect.

      Overall, the link between HIRA and PML in H3.3 loading is only mildly affected (Fig 5E and 5F). The conclusion that HIRA and PML are essential (Page 12, line 8) is not represented by the presented data. The authors propose that DAXX could play a role. Indeed, work on another H3 variant, CENP-A, showed that non-centromeric localization is dependent on both HIRA and DAXX (Nye et al 2018 PLoS ONE). It would be interesting to learn if a double knock-down of HIRA and DAXX can prevent the enrichment of H3.3 at TES of ISGs upon TNFb treatment.

      To address the first part of the comment, we have now added three things:

      (1) we have toned down our conclusion by saying that HIRA and PML are 'important' for the long-lasting deposition of H3.3 on ISGs,

      (2) we provide new data of time-ChIP qPCR experiments suggesting that HIRA is important for H3.3 recycling during transcription of ISGs. We believe that these results strengthen the importance of HIRA for the global H3.3 enrichment on ISGs (by acting both in the de novo deposition and/or recycling of H3.3).

      We agree with the reviewer that it could be interesting to study the impact of the double knock-down of DAXX and HIRA on H3.3 enrichment at ISGs. However, we decided to focus our attention on SP100 since it could help us to better tease apart the role of HIRA localization in PML NBs, versus its role in H3.3 deposition at ISGs. In addition, since SP100 knock-down unleashes ISGs transcription, it also provided us with the opportunity to study the impact of an elevated ISGs transcription on H3.3 deposition and whether this is also mediated by HIRA.

      (3) we thus now also provide data of the double knock-down of SP100 and HIRA showing that the increase in H3.3 loading on ISGs seen upon SP100 knock-down is mediated by HIRA. This new result also strengthens the importance of HIRA for H3.3 enrichment on ISGs upon transcription.

      • In Figure 6B, two versions of HIRA are overexpressed and the authors conclude that the number of PML NBs goes up. Earlier in the manuscript, the authors showed that PML NB formation upon IFNb exposure brings HIRA into the PML NBs via a SUMO-dependent mechanism. Is overexpression of HIRA and its accumulation in PML NBs also SUMO-dependent or SUMO-independent? Overexpressing the SIM mutants from Figure 3F would address this question. In addition, the link between the proposed HIRA being stored at PML NBs could be strengthened by overexpressing HIRA and see at both short and late time points whether H3.3 is enriched on ISG genes.

      We want to clarify the first point: we do not conclude that the number of PML NBs goes up upon overexpression of HIRA. The number of PML NBs seems stable, although we have not quantified it. The aim of Figure 4A (previously Figure 6B) is to show that upon overexpression, ectopic forms of HIRA localize in PML NBs without IFN-I treatment, as part of a buffering mechanism.

      The SIM mutant of HIRA from Figure 3F is indeed overexpressed and does not localize in PML NBs upon IFN-I treatment. We have now added an IF (Figure 3- figure supplement 1C) showing that it does not localize in PML NBs in untreated cells either. Thus, accumulation of ectopic HIRA in PML NBs is SUMO-SIM-dependent regardless of IFN-I treatment.

      • BJ cells are known to senesce rather easily. Did the authors double-check what fraction of their cells were in senescence and whether this correlated with the high or low expression of ectopic H3.3?

      BJ cells can indeed enter into senescence, but they are less prone to senesce than other human primary cells such as IMR90. Nevertheless, we checked EdU incorporation both in BJ cells (Figure 1 - Figure supplement 1F) and BJ eH3.3i cells with expression of ectopic H3.3, with or without IFN-I treatment (Figure R2 for reviewer). We could clearly see that in our conditions (Dox addition for 24h maximum, IFNb at 1000U/mL for 24h), there is no significant difference in the number of EdU+ cells (i.e., proliferating cells), thus excluding effects due to senescence entry. As a positive control, we treated BJ cells with etoposide, a known senescence-inducing drug (Kosar et al., 2013; Tasdemir et al., 2016), which indeed reduces the number of EdU-positive cells. We have now added a sentence in the main text as well to underscore that cells are not senescent.

      • In Figure 6 - figure supplement D, it appears that the levels of HIRA go up upon TSA and IFNb treatment. Rather than relying on visual inspection, ideally, all Western blots should be quantified to confirm the assessment that protein levels are not affected by different experimental procedures.

      We now provide quantification of all WBs below each WB. In addition, we have removed data on TSA since it could appear too preliminary.

      Reviewer #2 (Public Review):

      HIRA chaperone complex has been previously shown to localize at PML Nuclear Bodies upon various stress or stimuli (senescence, viral infections, interferon treatment). The authors have previously unraveled an anti-viral role of PML NBs through the chromatinization of HSV-1 viral genome by H3.3 chaperones. Here, the authors identify SUMOylation, as well as a SIM-like sequence in HIRA, as drivers for HIRA recruitment at PML Nuclear Bodies upon interferon-I treatment. These HIRA-containing PML NBs localize close to interferon-stimulated gene (ISG) loci. Although the functional role of HIRA/PML interaction is not yet resolved, HIRA and PML regulate H3.3 loading at transcriptional end sites of ISGs upon Interferon-I treatment. The authors propose that PML NBs play a buffering role for HIRA, regulating its availability depending on H3.3 level or chromatin dynamics.

      Strength:

      The authors used primary human diploid BJ fibroblasts, a relevant cell line for studying physiological regulation upon inflammatory cytokines. The role of SUMO/SIM on HIRA localization upon interferon beta treatment was assessed using interesting - but already described - tools, such as SUMO-specific affimers. The authors provide convincing results on the requirement of PML SUMOylation and a putative SIM sequence on HIRA for its localization at PML Nuclear Bodies. Other interesting observations are described, such as some PML or HIRA-dependent long-lasting H3.3 loading at transcription end site of ISGs upon interferon beta treatment, as shown by ChIP analyses of ISG loci, but also by endogenous H3.3 ChIPseq analysis.

      Weakness:

      The authors claim HIRA partitioning at PML NBs correlates with increase in "PML valency" upon interferon-I. The "valency" refers to the number of interaction domains, but the number of SUMOs conjugated on PML is not explored here (nor the number of SIMs on HIRA). Although the authors have proposed interesting hypotheses and discussion, the inhibitory role of H3.3 overexpression or acetylation inhibition on HIRA localization at PML Nuclear Bodies clearly deserves further investigation.

      More generally, the manuscript explores many directions, but the links between the various observations remain unclear and limit firm conclusions.

      We thank the reviewer for his/her constructive comments.

      We have now addressed these 3 weaknesses pointed out by the reviewer.

      • Our claims on PML valency have been removed. We have now underscored the link between HIRA accumulation in PML NBs and the increase in PML and SP100 protein levels, without lingering on the valency aspects which was not the focus of our paper.

      • The role of H3.3 overexpression in inhibition of HIRA localization in PML NBs has been moved to the first part of the paper, which describes the mechanism of HIRA accumulation in PML NBs. We feel that these data are still of importance and support the role of PML NBs as a buffering place for HIRA depending on DAXX levels (new data) as well as H3.3 levels.

      We agree that the acetylation inhibition would deserve further investigations and we have thus removed the part on TSA treatment.

      • Thanks to the reviewer's comments, we have now remodeled the article to better convey two main conclusions: (1) PML NBs serve as a buffering site for HIRA. Accumulation of HIRA in PML NBs depends both on PML and SP100 concentration (and on PML SUMOylation) as well as DAXX and H3.3 levels and (2) upon IFN-I treatment, PML regulates ISGs transcription and thus indirectly regulates HIRA loading on ISGs, which controls H3.3 deposition and recycling during transcription. HIRA-mediated H3.3 deposition/recycling is highly dependent on ISGs transcription levels and is thus increased upon SP100 knock-down which unleashes ISGs transcription.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides the first cellular analysis of how neuronal activity in axons (in this case the optic nerve) regulates the diameter of nearby blood vessels and hence the energy supply to neuronal axons and their associated cells. This is an important subject because, in a variety of neurological disorders, there is damage to the white matter that may result from a lack of sufficient energy supply, and this paper will stimulate work on this important subject.

      Axonal spiking is suggested to release glutamate which activates NMDA receptors on myelin-making oligodendrocytes wrapped around the axons: the oligodendrocytes - either directly or indirectly via astrocytes - then generate prostaglandin E2 which relaxes pericytes on capillaries, thus decreasing the resistance of the vascular bed and (presumably) increasing blood flow in the nerve.

      Strengths of the paper

      The paper identifies some important characteristics of axon-vascular coupling, notably its slow temporal development and long-lasting nature, the involvement of PgE2 in an oxygen-dependent manner, and a role for NMDARs. Rigorous criteria (constriction and dilation of capillaries by pharmacological agents) are used to select functioning pericytes for analysis.

      Weaknesses of the paper

      The study focuses exclusively on pericytes. It would have been interesting to assess whether arteriolar SMCs also contribute to regulating blood flow

      We thank reviewer #1 for his/her positive comment on our manuscript. We also share the future interest in the optic nerve’s arteriole (there is only one main arteriole covered by SMC). However, it is not always visible in the preparation due to the orientation of the nerve - if not on the surface and directly under the microscope it is not possible to image it.

      Reviewer #2 (Public Review):

      This paper describes a new concept of "axo-vascular coupling" whereby action potential traffic along white matter axons induces vasodilation in the mouse optic nerve. This is an initial report dissecting some of the mechanisms that are undoubtedly complex as in gray matter NVC. I like the novel AVC concept.

      We really appreciate the reviewer’s positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a systematic study of the cortical propagation patterns of human beta bursts (~13-35Hz) generated around simple finger movements (index and middle finger button presses).

      The authors deployed a sophisticated and original methodology to measure the anatomical and dynamical characteristics of the cortical propagation of these transient events. MEG data from another study (visual discrimination task) was repurposed for the present investigation. The data sample is small (8 participants). However, beta bursts were extracted over a +/- 2s time window about each button press, from single trials, yielding the detection and analysis of hundreds of such events of interest. The main finding consists of the demonstration that the cortical activity at the source of movement related beta bursts follows two main propagation patterns: one along an anteroposterior direction (predominantly originating from pre central motor regions), and the other along a medio-lateral (i.e., dorso lateral) direction (predominantly originating from post central sensory regions). Some differences are reported, post-hoc, in terms of amplitude/cortical spread/propagation velocity between pre and post-movement beta bursts. Several control tests are conducted to ascertain the veracity of those findings, accounting for expected variations of signal-to-noise ratio across participants and sessions, cortical mesh characteristics and signal leakage expected from MEG source imaging.

      One major perceived weakness is the purely descriptive nature of the reported findings: no meaningful difference was found between bursts traveling along the two different principal modes of propagation, and importantly, no relation with behavior (response time) was found. The same stands for pre vs. post motor bursts, except for the expected finding that post-motor bursts are more frequent and tend to be of greater amplitude (yielding the observation of a so-called beta rebound, on average across trials).

      Overall, and despite substantial methodological explorations and the description of two modes of propagation, the study falls short of advancing our understanding of the functional role of movement related beta bursts.

      For these reasons, the expected impact of the study on the field may be limited. The data is also relatively limited (simple button presses), in terms of behavioral features that could be related to the neurophysiological observations. One missed opportunity to explain the functional role of the distinct propagation patterns reported would have been, for instance, to measure the cortical "destination" of their respective trajectories.

      In response to this comment, we would like to highlight two important points.

      First, our work constitutes the first non-invasive human confirmation of invasive work in animals (Balasubramanian et al., 2020; Best et al., 2016; Roberts et al., 2019; Rubino et al., 2006; Rule et al., 2018; Takahashi et al., 2011, 2015) and patients (Takahashi et al., 2011). Thus, these results bridge between recordings limited to the size of multielectrode arrays (roughly 0.16 cm2; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Takahashi et al., 2011, 2015) and human EEG recordings spanning large areas of the cortex and several functionally distinct regions (Alexander et al., 2016; Stolk et al., 2019). The ability to access these neural signatures non-invasively is important for cross-species comparison. It further enables us to provide an in-depth analysis of the spatiotemporal diversity of human MEG signals and a detailed characterisation of the two propagation directions, which significantly extends previous reports. We note that their functional role remains undetermined in these animal studies as well, but being able to identify these signals in humans can provide a stepping stone for identifying their role.

      Second, and related, the reviewers are correct that we did not observe distinct propagation directions between pre- and post-movement bursts, nor a relationship with reaction time. However, such a null result is relevant, in our view, towards understanding what the functional relevance of these signals, if any, might be. Recent work in macaques indicates that the spatiotemporal patterns of high-gamma activity carry kinematic information about the upcoming movement (Liang et al., 2023). The functional role of beta may therefore be more complex and not relate to reaction times or kinematics in a straightforward manner. We believe this is a relevant observation, and in keeping with the continued efforts to identify how sensorimotor beta relates to behaviour. It is increasingly clear that spatiotemporal diversity in animal recordings and human E/MEG and intracranial recordings can constitute a substantial proportion of the measured dynamics. As such, our report is relevant in narrowing down what these signals may reflect.

      Together, we think that our work provides new insights into the multidimensional and propagating features of burst activity. This is important for the entire electrophysiology community, as it transforms how we commonly analyse and interpret these important brain signals. We anticipate that our work will guide and inspire future work on the mechanistic underpinnings of these dominant neural signals. We are confident that our article has the scope to reach out to the diverse readership of eLife.

      Reviewer #2 (Public Review):

      The authors devised novel and interesting experiments using high precision human MEG to demonstrate the propagation of beta oscillation events along two axes in the brain. Using careful analysis, they show different properties of beta events pre- and post-movement, including changes in amplitude. Due to beta's prominent role in motor system dynamics, these changes are therefore linked to behavior and offer insights into the mechanisms leading to movement. The linking of wave-like phenomena and transient dynamics in the brain offers new insight into two paradigms about neural dynamics, offering new ways to think about each phenomenon on its own.

      Although there is a substantial, and recent, body of literature supporting the conclusions that beta and other neural oscillations are transient, care must be taken when analyzing the data and the resulting conclusions about beta properties in both time and space. For example, modifying the threshold at which beta events are detected could alter their reported properties and expression in space and time. The authors should therefore perform parameter sweeps on, e.g., the thresholds for detection of oscillation bursts to determine whether their conclusions on beta properties and propagation hold. If this additional analysis does not change their story, it would lend confidence in the results/conclusions.

      We thank the reviewing team for this comment. As suggested, we evaluated the effect of different burst thresholds on the burst parameters.

      The threshold in the main analysis was determined empirically from the data, as in previous work (Little et al., 2019). Specifically, trial-wise power was correlated with the burst probability across a range of different threshold values (from median to median plus seven standard deviations (std), in steps of 0.25, see Figure 6-figure supplement 1). The threshold value that retained the highest correlation between trial-wise power and burst probability was used to binarize the data.
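      The empirical threshold selection described above can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB code: the function name `select_burst_threshold`, the input layout (trials by time samples), and the use of the fraction of supra-threshold samples as a stand-in for burst probability are all assumptions made for this example.

```python
import numpy as np


def select_burst_threshold(amplitude, k_values=None):
    """Pick the threshold (median + k * std) whose burst probability
    correlates best with trial-wise power.

    amplitude: array of shape (n_trials, n_times) holding a beta
    amplitude envelope (hypothetical input layout).
    """
    if k_values is None:
        # Median to median + 7 std, in steps of 0.25 std.
        k_values = np.arange(0.0, 7.0 + 1e-9, 0.25)

    med = np.median(amplitude)
    sd = np.std(amplitude)
    trial_power = np.mean(amplitude ** 2, axis=1)

    correlations = np.full(len(k_values), np.nan)
    for i, k in enumerate(k_values):
        above = amplitude > med + k * sd
        # Assumption: burst probability approximated per trial as the
        # fraction of samples above threshold.
        burst_prob = above.mean(axis=1)
        if burst_prob.std() > 0:  # correlation undefined for constants
            correlations[i] = np.corrcoef(trial_power, burst_prob)[0, 1]

    best = int(np.nanargmax(correlations))
    return k_values[best], correlations
```

      A robustness check of the kind requested by the reviewer would then rerun the downstream burst analysis at the selected value shifted by a few fractions of a standard deviation and compare the resulting burst properties.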

      We repeated our original analysis using four additional thresholds, i.e., original threshold - 0.5 std, -0.25 std, +0.25 std, +0.5 std. As one would expect, burst threshold is negatively related to the number of bursts (i.e., higher thresholds yield fewer bursts, Figure R4a [top]), and positively related to burst amplitude (i.e., higher thresholds yield higher burst amplitudes, Figure R4a [bottom]).

      Similarly, the temporal duration of bursts and apparent spatial width are modulated by the burst threshold: lowering the threshold leads to longer temporal duration and larger apparent spatial width, while increasing the threshold leads to shorter temporal duration and smaller apparent spatial width (Figure R4b). Note that for the temporal and spectral burst characteristics, the difference to the original threshold can be numerically zero, i.e., changing the burst threshold did not lead to changes exceeding the temporal and spectral resolution of the applied time-frequency transformation (i.e., 200 ms and 1 Hz, respectively).

      Importantly, across these threshold values, the propagation direction and propagation speed remain comparable.

      We now include this result as Figure 6-figure supplement 2 and refer to this analysis in the manuscript (page 28 line 717).

      “To explore the robustness of the results analyses were repeated using a range of thresholds (Figure 6-figure supplement 2).”

      Determining the generators of beta events at different locations is a tricky issue. The authors mentioned a single generator that is responsible for propagating beta along the two axes described. However, it is not clear through what mechanism the beta events could travel along the neural substrate without additional local generators along the way. Previous work on beta events examined how a sequence of synaptic inputs to supra and infragranular layers would contribute to a typical beta event waveform. Although it is possible other mechanisms exist, how might this work as the beta events propagate through space? Some further explanation/investigation on these issues is therefore warranted.

      Based on this and other comments (i.e., comments 7 and 8) we re-evaluated the use of the term ‘generator’ in this manuscript.

      While the term generator can be used across scales, from micro- to macroscale, for the purpose of the present paper, we believe one should differentiate at least two concepts: a) generator of beta bursts, and b) generator of travelling waves.

      We realised that in the previous version of the manuscript the term ‘generator’ was at times used without context. We removed the term where no longer necessary.

      Further, the previous version of the manuscript discussed putative generators of travelling waves (page 19f.) but not generators of beta bursts. We now address this as follows:

      “Studies using biophysical modelling have proposed that beta bursts are generated by a broad infragranular excitatory synaptic drive temporally aligned with a strong supragranular synaptic drive (Law et al., 2022; Neymotin et al., 2020; Sherman et al., 2016; Shin et al., 2017) whereby layer specific inhibition acts to stabilise beta bursts in the temporal domain (West et al., 2023). The supragranular drive is thought to originate in the thalamus (E. G. Jones, 1998, 2001; Mo & Sherman, 2019; Seedat et al., 2020), indicating thalamocortical mechanisms (page 22f).”

      Once the mechanisms have been better understood, a further question is how much the results generalize to other oscillation frequencies and other brain areas. On the first question of other oscillation frequencies, the authors could easily test whether nearby frequency bands (alpha and low gamma) have similar properties. This would help to determine whether the observations/conclusions are unique to beta, or more generally applicable to transient bursts/waves in the brain. On the second issue of applicability to other brain areas, the authors could relate their work to transient bursts and waves recorded using ECoG and/or iEEG. Some recent work on traveling waves at the brain-wide level would be relevant for such comparisons.

      We appreciate the enthusiasm and the suggestions. To comment on the frequency specificity of the observed effects we conducted the same analysis focusing on the gamma frequency range (60-90 Hz). For computational reasons, we limited this analysis to one subject. Figure R1 shows the polar probability histogram for the beta frequency range (left) and the gamma frequency range (right). In contrast to the beta frequency range, no dominant directions were observed for the gamma range and von Mises functions did not converge. These preliminary results suggest some frequency specificity of the spatiotemporal pattern in sensorimotor beta activity. We believe this paves the way for future analysis mapping propagation direction across frequency and space.
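      As an illustration of how a dominant propagation direction can be distinguished from a flat directional profile, the sketch below computes a polar probability histogram and the mean resultant length of a set of angles. It is a generic circular-statistics example in Python under stated assumptions (function names, bin count, and simulated angles are hypothetical) and does not reproduce the von Mises fitting used in the study.

```python
import numpy as np


def polar_probability_histogram(directions_rad, n_bins=36):
    """Probability of each direction bin over [0, 2*pi)."""
    wrapped = np.mod(directions_rad, 2 * np.pi)
    edges = np.linspace(0.0, 2 * np.pi, n_bins + 1)
    counts, _ = np.histogram(wrapped, bins=edges)
    return counts / counts.sum(), edges


def mean_resultant_length(directions_rad):
    """Near 1 for tightly clustered angles, near 0 for a flat profile."""
    return np.abs(np.mean(np.exp(1j * np.asarray(directions_rad))))


rng = np.random.default_rng(0)
# Simulated angles only: one clustered set and one uniform set.
clustered = rng.vonmises(mu=np.pi / 4, kappa=4.0, size=5000)
uniform = rng.uniform(0.0, 2 * np.pi, size=5000)

probabilities, bin_edges = polar_probability_histogram(clustered)
r_clustered = mean_resultant_length(clustered)
r_uniform = mean_resultant_length(uniform)
```

      Note that the mean resultant length is only a summary for unimodal data; two opposite propagation directions would cancel in this measure, which is one reason a mixture of von Mises functions is the more appropriate description for bimodal direction profiles.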

      Here we did not investigate the spatial specificity of the effects, as the beta frequency range is dominant in sensorimotor areas. Investigating beta bursts in other cortical areas would have likely resulted in very few bursts. We discuss our results across spatial scales in the section: Distinct anatomical propagation axes of sensorimotor beta activity. However, please note that most of the previous literature operates on a different spatial scale (roughly 4mm; Balasubramanian et al., 2020; Best et al., 2016; Rubino et al., 2006; Rule et al., 2018; Takahashi et al., 2011, 2015) and different species (e.g., non-human primates). Non-invasive recordings in humans capture temporospatial patterns of a very different scale, i.e., often across the whole cortex (Alexander et al., 2016; Roberts et al., 2019). Comparing spatiotemporal patterns across different spatial scales is inherently difficult. Work investigating different spatial scales simultaneously, such as Sreekumar et al. 2020, is required to fully unpack the relationship between mesoscopic and macroscopic spatiotemporal patterns.

      Figure R1: Spatiotemporal organisation for the beta (β, 13-30 Hz) and gamma (γ, 60-90 Hz) frequency range for one exemplar subject. Same as Figure 4a, but for one exemplar subject.

      If the source code could be provided on github along with documentation and a standard "notebook" on use other researchers would benefit greatly.

      All analyses are performed using freely available tools in MATLAB. The code carrying out the analysis in this paper can be found here: [link provided upon acceptance]. The 3D burst analyses can be very computationally intensive even on a modern computer system. The analyses in this paper were computed on a MacBook Pro with a 2.6 GHz 6-Core Intel Core i7 and 32 GB of RAM. Details on the installation and setup of the dependencies can be found in the README.md file in the main study repository.

      This information has been added to the paper in the methods section on page 35.

    1. Author Response

      Reviewer #2 (Public Review):

      Understanding the molecular mechanism of obesity-associated OA is in high clinical demand. Overall, the current study is well-designed and illustrates that down-regulated GAS6 impairs synovial macrophage efferocytosis and promotes obesity-associated osteoarthritis. Based on the patient samples, the data indicated that synovial tissues are highly hyperplastic in obese OA patients and infiltrated with more polarized M1 macrophages than in non-obese OA patients. The authors further showed that obesity promotes synovial M1 macrophage accumulation and that GAS6 was inhibited in synovitis during OA development in mouse models. The sample size, data collection, and quality of the IHC and immunofluorescent histological sections are outstanding. The results were well presented with appropriate interpretation. But the following major questions should be addressed.

      Major:

      1) Animal model: Ten-week-old animals received DMM surgery and were fed a standard diet or HFD for 4 or 8 weeks prior to specimen harvest. Wang J and other studies have shown that when male ApoE(-/-) and C57BL/6J wild-type (WT) mice were fed a high-fat diet for 12 or 24 weeks, the ApoE(-/-) mice gained less body weight and had less fat mass and lower triglyceride levels, with better insulin sensitivity and lower levels of inflammatory markers in skeletal muscle, than WT (Wang J, et al. Atherosclerosis. 2012 Aug;223(2):342-9. PMID: 22770993; Hofmann SM, et al. Diabetes. 2008 Jan;57(1):5-12. PMID: 17914034; Kypreos KE et al. J Biomed Res. 2017 Nov 1;32(3):183-90. PMID: 29770778). Thus, it is very important to provide the data on the final body weight gained in your groups and to provide relevant background on the chosen animal model in the introduction or discussion. Please explain why the ApoE-/- mouse model was chosen and how this animal model is clinically relevant. Is high-fat diet-induced obese OA available in C57BL/6 WT?

      Thank you for your valuable comment. We have added the body weight change data for each group of mice in Revised Figure 2-figure supplement 3. We also provided a relative background of the animal model in paragraph 2 of the Discussion section, which reads, “ApoE plays an important role in maintaining the normal levels of cholesterol and triglycerides in serum by transporting lipids in the blood. Mice lacking ApoE function develop hypercholesterolemia, increased very low-density lipoprotein (VLDL) and decreased high-density lipoprotein (HDL), exhibiting chronic inflammation in vascular disease and nonalcoholic steatohepatitis.”.

      Epidemiological study results suggest obesity is an independent risk factor for OA pathological progression. Gierman et al. found that increased plasma cholesterol levels play a vital role in the development of OA [1,2]. ApoE-deficient mice showed naturally high levels of LDL cholesterol independent of gender and age, which could additionally be increased by a cholesterol-rich diet [3,4]. Moreover, recent studies found that ApoE-/- mice fed an HFD gained more body weight than those fed a standard chow diet [5–7]. We have re-analyzed the body weight statistics and found that ApoE-/- mice fed an HFD (19.81±1.33g) gained more body weight than the controls (16.89±0.75g). These studies indicated that feeding HFD to ApoE-/- mice for a short period could accelerate the increase in LDL cholesterol levels and cause more body weight gain. ApoE-/- mice may be partially clinically relevant to pathological progression in obese osteoarthritis patients with elevated plasma LDL cholesterol levels. As Reviewer #2 mentioned, HFD-induced obesity is available in C57BL/6 WT according to our weight gain data. However, the effect of obesity on OA progression in these two kinds of animals deserves further study.

      References:

      1. Gierman LM, Kühnast S, Koudijs A, et al. Osteoarthritis development is induced by increased dietary cholesterol and can be inhibited by atorvastatin in APOE*3Leiden.CETP mice—a translational model for atherosclerosis. Ann Rheum Dis. 2014;73(5):921-927.

      2. Gierman LM, van der Ham F, Koudijs A, et al. Metabolic stress-induced inflammation plays a major role in the development of osteoarthritis in mice. Arthritis Rheum. 2012;64(4):1172-1181.

      3. Wu D, Sharan C, Yang H, et al. Apolipoprotein E-deficient lipoproteins induce foam cell formation by downregulation of lysosomal hydrolases in macrophages. J Lipid Res. 2007;48(12):2571-2578.

      4. Naura AS, Hans CP, Zerfaoui M, et al. induces lung remodeling in ApoE-deficient mice: an association with an increase in circulatory and lung inflammatory factors. Lab Invest. 2009;89(11):1243-1251.

      5. Tung MC, Lan YW, Li HH, et al. Kefir peptides alleviate high-fat diet-induced atherosclerosis by attenuating macrophage accumulation and oxidative stress in ApoE knockout mice. Sci Rep. 2020;10(1):8802.

      6. Bao M hua, Luo H qing, Chen L hua, et al. Impact of high fat diet on long non-coding RNAs and messenger RNAs expression in the aortas of ApoE(−/−) mice. Sci Rep. 2016;6(1):34161.

      7. Cao X, Guo Y, Wang Y, et al. Effects of high-fat diet and Apoe deficiency on retinal structure and function in mice. Sci Rep. 2020;10(1):18601.

      2) Control group: The DMM surgery was performed on the right leg, and the contralateral knee joint should be used as a baseline to show the level of M1 macrophage infiltration under the obese microenvironment.

      Thank you for this insightful comment. We used the right lower limb as the control group mainly because we considered the impact of right knee surgery on the left lower limb. A book published in 2014 described a series of methods for inducing mouse osteoarthritis models; the authors noted that sham-operated left knee joints would develop OA-like symptoms after right knee joints received DMM. Thus, Lorenz et al. strongly recommend using a separate control group for sham surgeries.

      References:

      1. Lorenz, J., Grässel, S. (2014). Experimental Osteoarthritis Models in Mice. In: Singh, S., Coppola, V. (eds) Mouse Genetics. Methods in Molecular Biology, vol 1194. Humana Press, New York, NY.
    1. Author Response

      Reviewer #1 (Public Review):

      The goal of this study was to investigate the mechanisms that lead to the release of photosynthetically fixed carbon from symbiotic dinoflagellate algae to their coral host. The experimental approach involved culturing free-living Breviolum sp. dinoflagellates under "Normal" and "Low pH" conditions (respective target pH of 7.8 and 5.50) and measuring the following parameters: (Fig. 1) cell growth rate over ~28 days, photosynthetic activity, glucose and galactose secretion at day 1; (Fig. 2) cell clustering, external morphology (using SEM), and internal morphology (using TEM) after 3 weeks; (Fig. 3) transcriptomic analyses at days 0 and 1; and (Fig. 4) glucose and galactose concentration in Normal culturing medium after 24h incubation with a putative cellulase inhibitor (PSG).

      The paper reports decreased growth at Low pH coupled with decreased photosynthetic rates and increased glucose and galactose release in 1-day Breviolum sp. cultures. At this same time point, genes related to cellulase were upregulated, and after 3 weeks morphological changes on the cell wall were reported. The addition of the cellulase inhibitor PSG to cells in pH 7.8 media decreased the release of glucose and galactose.

      The paper concludes that acidic conditions mimicking those reported for the coral symbiosome -the intracellular organelle that hosts the symbiotic algae- upregulate algal cellulases, which in turn degrade the algal cell wall releasing glucose and galactose that can be used as a source of food by the coral host. However, there are some methodological issues that hamper the interpretation of results and conclusions.

      We appreciate your helpful comments and apologize for the confusion caused by insufficient descriptions in the previous manuscript. In the revised manuscript we clarify what we originally intended to demonstrate, including the following:

      (1) Most analyses including SEM and TEM were done at day 0 and 1, except for a few, i.e. growth rate over 28 days and cell clumping assay done 3 weeks after the inoculation, which is summarized as a schematic panel and clarified in the revised manuscript.

      (2) The inhibitor assay for secreted cellulases was done at pH 5.5.

      (3) We do not intend to suggest that low pH medium mimics symbiosomes, as these organelles are far more complex than simple culture media, and how symbiosomes are maintained and what the interior environment is like are not fully understood in general. Based on previous studies, they are presumably characterized by low pH, high CO2, and host-derived nutrients. Among these, we focus on low pH, which is a stressor that dinoflagellates experience not only in symbiosomes but also in natural environments, e.g. the animal gut.

      In this study, we clarified how algae respond to low pH as an environmental stressor, which can also provide insights into how they interact with the host inside the guts as well as symbiosomes.

      Reviewer #2 (Public Review):

      Ishii and colleagues investigated the process of monosaccharide release from algae in low-pH environmental conditions, mimicking the acidic lysosomal-like intracellular compartment where the algae reside symbiotically and transfer nutrients to their hosts, namely corals and other animals. Upon exposure of cultured algae to low pH, subsequent physiological changes as well as the increased presence of glucose and galactose were measured in the surrounding media. Concurrently, photosynthetic activity was decreased, and further experiments employing the photosynthetic inhibitor DCMU to cultures also replicated the increased monosaccharide release. Transcriptomic comparison of algae in low pH to controls showed differential expression in glycolytic pathways and, interestingly, a strong upregulation of signal-peptide-containing isoforms of cellulases. Finally, the elegant use of a cellulase inhibitor on the cultured algae revealed a decrease in monosaccharides in the media. This led the authors to propose a pathway of sugar release in which acidic conditions trigger a cellulase-driven cascade of cell wall degradation in the algae and their consequent release of monosaccharides. These results have interesting implications on the molecular mechanisms of coral-algae symbiosis, contributing to the understanding of how these important symbioses function on the cellular level.

      Overall the conclusions of this manuscript are supported by the data presented, but clarification and elaboration are needed to fully justify the proposed mechanisms and better situate the results in a broader context of the field.

      We thank the reviewer for the positive comments. In the revised manuscript we show that the results could be better explained with the proposed mechanisms in a broader context.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Mechanistic details of how FCA regulates FLC have been extensively studied, and both transcriptional and co-transcriptional regulations occur. I understand that FCA affects the 3'end processing of antisense COOLAIR RNAs, which regulate FLC. FCA also physically interacts with COOLAIR RNAs and other proteins, including chromatin-modifying complexes, which establish epigenetic repression of FLC regardless of vernalisation. In addition, FCA appears to function to resolve R-loop at the 3' end FLC, and FLC preferentially interacts with m6A-modified COOLAIR by forming liquid condensates. FCA is also alternatively spliced in an autoregulatory manner, and fca-1 mutant was reported to be a null allele as fca-1 cannot produce the functional form of FCA transcripts (r-form).

      However, I could not find any information on the fca-3 allele, which was reported to exhibit a weaker phenotype in terms of flowering time (Koornneef et al., 1991). In this manuscript, the authors showed that the level of FLC expression is lower than fca-1 and higher than Ler WT, but I could not find any other relevant information on the nature of the fca-3 allele. Given the known details on the function of FCA, the authors should explain how fca-3 shows an "intermediate" phenotype, which is highly relevant to the argument for an "analog" mode of regulation in fca-3. Therefore, the nature of the fca-3 mutant should be described in detail.

      We thank the reviewers for pointing out this omission. We have added much more information on the genotypes in the methods of the manuscript. We emphasise, however, that the rationale for selecting fca-3 as an intermediate mutant was empirical: namely, it generates an intermediate level of FLC expression (Fig. 1C and Fig. 1S1).

      2) The authors used a transgene (FLC-venus) in which an FLC fragment from ColFRI was used. Both fca-1 and fca-3 are in the Ler background, where FLC sequence variations are known. I understand that the authors introgressed the transgene into the Ler background to avoid the transgene effect, but it is not known whether fca-1 or fca-3 mutations have the same impact on Col-FLC.

      We tested the expression of both endogenous (Ler) and FLC-Venus (Col-FLC) copies in these mutants by qPCR and found similar results (Fig. 1S1C,D), indicating that the fca-1 and fca-3 mutations have similar effects in both cases.

      3) Fig. 3A: I understand that Fig 3A is the qRT-PCR data using whole seedlings, and the gradual reduction of FLC from 7 DAG to 21 DAG was used to test the "analog" vs. "digital" mode of gene regulation in fca-1 and fca-3. I am not sure whether this is biologically relevant.

      Indeed, Ler is the only line that has transitioned to flowering during the experiment, with both fca lines being late flowering mutants. We totally agree that for Ler, later timepoints may be biologically irrelevant. It is used in this case as a negative control for the imaging, since FLC in Ler was already mostly OFF from the first timepoint and no biological conclusions are drawn from the later times. We have added a comment to this effect in the results section, also clarifying in the discussion that our focus is on the early regulation of FLC. Therefore, by looking at the young seedling in wildtype Ler, as we and others have previously, we are already looking too late to capture the switching of FLC to OFF. However, we expect that this combination of analog and digital regulation will be highly relevant to FLC regulation in wild-type plants in different accessions, partly leading to the differences in autumn FLC levels that were shown to be so important in the wild (Hepworth et al. 2020).

      3-a) The authors wrote that "This experiment revealed a decreasing trend in fca-3 and Ler (Fig. 3A)". But, I do also see a "decreasing trend" in fca-1 as well (although I understand that they may not be statistically significant). I also noticed that the level of FLC in fca-1 at 7 day has a greater variation. Is there any explanation?

      The level of FLC in fca-1 at 7 days is indeed more variable in these experiments. However, in a new second experiment, this is not the case (Fig. 3S2). In addition, a similar effect has not been observed in the ColFRI genotype (Fig. S9F of Antoniou-Kourounioti et al. 2018). Therefore, we believe this greater variation in one data set may simply be due to random fluctuations.

      For the decreasing trend in fca-1 in Fig. 3A, as the reviewer says, this is not significant. However, in the second experiment, we again see a decrease, which is now slow but significant. The decrease could be due to a subset of fca-1 ON cells switching off (in tissue that we have not imaged) and we comment on this slow decrease in the text.

      3-b) The decreasing trend observed in Ler (although the expression of FLC is already relatively low in Ler) may be the basis for the biological relevance. But Fig. 3D shows that the FLC-venus intensity in Ler root is not "decreasing". The authors interpreted that "root tip cells in Ler could switch off early, while ON cells still remain at the whole plant level that continue to switch off, thereby explaining the decrease in the qPCR experiment." Does this mean that the root tip system with FLC-venus cannot recapitulate other parts of plants (especially at the shoot tip where FLC function is more relevant)?

      The authors utilize the root system with transgenes in mutant backgrounds to observe and model the gene repression (transgene repression, to be exact). If the root tip cells behave differently from other parts of plants, how could the authors use data obtained from the root tip system?

      We now show that FLC-Venus in Ler, fca-3, fca-1 in young leaves have similar expression patterns to roots, thus validating the root system as an appropriate one to study the switching dynamics, see response to Essential comment 3. Nevertheless, in Fig. 3A, we show that FLC expression declines even in Ler. However, the levels here are low, so if it is indeed a subfraction of late-switching cells that are responsible, these cells cannot form a large proportion of the plant. We now make this clear in the text.

      4) I do see both fca-1 and fca-3 can express FCA at a comparable level (Fig. 3B); thus, I guess that the authors are measuring total FCA transcripts and that fca-3 may result in different levels of "functional form" of FCA. But this is not clearly discussed.

      We have now added yellow boxes in Fig. 2S3 to show additional examples of short files of ON cells in fca-3 and fca-4. To further improve the interpretation of this image (and all others in the manuscript) we have changed the presentation of the imaging using a different colourmap to enhance clarity.

      5) Quantification based on image intensity needs to be carefully controlled. Ideally, a threshold to call "ON" or "OFF" state should be based on the comparison to internal control and it is not clear to me how the authors determined which cells are ON or OFF based on image intensity (especially in fca-3).

      For the wild-type and fca-1 situations there is no switching in the model, and hence no dynamical changes in the FLC protein levels. As the FLC levels in the ON or OFF states are simply fit to the data using log-normal distributions, this would simply be a fitting exercise for fca-1 and Ler, and little would be learnt. Hence, we have not pursued this line of analysis.

      6) In many parts, I had to guess how the experiments were performed with what kind of tissues/samples. The methods section can benefit from a more thorough description.

      We have now gone through and added the missing information.

      Related to Public review #2. What is the phenotype (flowering time) of FLC-venus in fca-1 and fca-3? In addition, how many independent lines were used? Do they behave similarly?

      It was observed that with the additional FLC gene (in the form of the FLC-Venus), flowering is delayed as expected. However, this was not quantified in this work. Instead, we validated that the expression of the transgene was equivalent to the endogenous gene between genotypes, as shown in Fig. 1S1, supporting that this is an appropriate readout for FLC expression. One line for each genotype was selected and used in this work. In addition, we also now use fca-4, which has similar expression to fca-3, and where FLC-Venus also behaves similarly to the fca-3 case (Fig. 1S1, 2S3).

      Reviewer #3 (Public Review):

      1) The way the authors define ON and OFF cells sounds a bit arbitrary to me and, in my understanding, can greatly affect the outcomes and derived conclusions. The authors define ON cells as those cells having more than one transcript, or those above the value of 0.5 of the Venus intensity measure - what would happen if the thresholds were slightly above these levels? And why should such thresholds be the same for the studied lines Ler, fca-3 and fca-1? By looking at the distributions of mRNAs and Venus intensities in Ler and fca-3 plants, one could argue that all cells are in an OFF, 'silent' state, and that what is measured is some 'leakage', noise or simply cell heterogeneity in the expression levels. If there is a digital regulation, I would expect to see this bimodality more clearly at some point, as it was captured in Berry et al (2015) - perhaps cells in fca-1 show a certain level of bimodality? When seeing bimodality, one could separate ON and OFF states by unmixing gaussians, or something along these lines that makes the definition less arbitrary and more robust.

      As explained in Essential comment 5, we have removed arbitrary thresholding from the manuscript and only used absolute thresholds from smFISH (now changed to >3, and shown that our results are robust to varying these thresholds, Fig. 2S2). If all cells are in the OFF state and fca-3 just has higher noise/heterogeneity, then this does not explain the reduction in expression over time. Nor can such heterogeneity explain the short files of ON cells and longer files of OFF cells in Fig. 2S3: the cells should just be a random mix of varying FLC levels. Our results are much more compatible with switching into a heritable silenced state. Finally, with bimodality, this is difficult to see as clearly as before due to the wide levels of expression in fca-3, but we believe it is present: a well-defined OFF state together with a broad ON state. This broadness makes extracting the ON cells quite difficult as a completely rigorous unmixing of the two states is just not possible.

      2) The authors use means in all their plots for histograms and data, and perform tests that rely on these means. However, many of these plots are skewed right distributions, meaning that mean is not a good measure of center. I think using median would be more appropriate, and statistical tests should be rather done on medians instead of means. If tests using medians were performed, I believe that some of the pointed results will be less significant, and this will affect the conclusions of this work.
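      The reviewer's point about right-skewed distributions can be illustrated with simulated data. The following Python sketch is not an analysis from the manuscript: the log-normal samples, the function name `median_permutation_test`, and the number of permutations are assumptions chosen for the example. It shows that the mean of a right-skewed sample sits above its median, and how a two-sample comparison can be based on medians through a permutation test.

```python
import numpy as np


def median_permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in medians."""
    rng = np.random.default_rng(seed)
    observed = np.median(a) - np.median(b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random (in place)
        diff = np.median(pooled[:len(a)]) - np.median(pooled[len(a):])
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (count + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
# Simulated right-skewed "expression" values (log-normal), two groups.
low = rng.lognormal(mean=0.0, sigma=1.0, size=300)
high = rng.lognormal(mean=1.0, sigma=1.0, size=300)

mean_low, median_low = low.mean(), np.median(low)
observed_diff, p_value = median_permutation_test(low, high)
```

      In this simulation the long right tail pulls the mean of each group above its median, so a test on means and a test on medians can weigh the same data quite differently.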

      Highly expressing FLC lines and mutants, such as ColFRI and fca-9, often used for vernalization studies, are late flowering, but do eventually flower even with no decrease in FLC levels (and so no switching). This is not an artifact of using roots versus shoots, and presumably arises from there being multiple inputs into the flowering decision which can allow the FLC-mediated flowering inhibition to eventually be overcome.

      3) Some data might require more repeats, together with its quantification. For instance, the expression levels for fca-1 in Fig 2E and Fig 3D at 7 days after sowing look qualitatively different to me - not just the mean looks different, but also the distribution; fca-1 in Fig 3D looks more monomodal, while in Fig 2E it looks it shows more a bimodal distribution. Having these two different behaviours in these two repeats indicates that, more ideally, three repeats might be needed, together with their quantification. Fig. 2C would also need some repeats. In Fig 1S1 C and D, it would be good to clarify in which cases there are 2 or more repeats -3 repeats might be needed for those cases in Fig 1S1 C-D that have large error bars.

      The data in Figs. 2C and 2E are both based on two independent experiments, with the results combined. The data in Fig. 3D is almost entirely based on three independent experiments. We have now stated this in the legend. The Venus imaging was performed on separate microscopes for Fig. 2 and Fig. 3 and this possibly accounts for some of the observed differences. However, we do not think that the data in Fig 2E for fca-1 supports a bimodal distribution: the slight peak at higher levels is, we believe, much more likely to be a statistical fluctuation. For Fig. 1S1 C and D, we now clarify in the legend that n=2 biological replicates for fca-3 and n=3 for others.

      Also, when doing the time courses, I find it would be very beneficial to capture an earlier time point for all the lines, to see whether it is easier to capture the digital nature of the regulation. Note that the authors have already pointed that 7 days after sowing might be too late for Ler line to capture the switch.

      We agree that capturing earlier time points for Ler in particular is interesting and important. However, we have found that this requires specialist imaging in the embryo and we feel that this is really beyond the scope of this manuscript and will instead form the basis of a future publication.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use what is potentially a novel method for bootstrapping sequence data to evaluate the extent to which SARS-CoV-2 transmissions occurred between regions of the world, between France and other European countries, and between some distinct regions within France. Data from the first two waves of SARS-CoV-2 in Europe were considered, from 2020 into January 2021. The paper provides more detail about the specific spread of the virus around Europe, specifically within France, than other work in this area of which I am aware.

      First of all, we would like to thank reviewer #1 for their evaluation and their various comments which, in our opinion, have allowed us to considerably improve the manuscript.

      An interesting facet of the methodology used is the downsampling of sequence data, generating multiple bootstraps each of around 500-1000 sequences and conducting analysis on each one. This has the strength of sampling, in total, a large number of sequences, while reducing the overall computational cost of analysis on a database that contains in total several hundred thousand sequences. A question I had about the results concerns the extent of downsampling versus the rate of viral migration: If between-country movements are rapid, a reduced sample could be misleading, for example characterising a transmission path from A to B to C as being from A to C by virtue of missing data. I acknowledge that this would be a problem with any phylogeographic analysis relying on limited data. However, in this case, how does the rate of migration between locations compare to the length of time between samples in the reduced trees? Along these lines, I was unclear to what extent the reported proportions of intra- versus inter-regional transmissions (e.g. line 223) would be vulnerable to sampling effects.

      This question is indeed a very important one. The between-country movement rate can be high, but the contagious period for a SARS-CoV-2-infected individual is short (a bit less than two weeks on average). In our subsamples, the dated trees have a median branch length of around 20 days. To ensure that our subsamples did not introduce errors in estimating the exchange events between locations, we conducted a simulation. Briefly, we generated a tree of 1,000,000 tips with a five-state discrete trait. We then took 100 subsampled 1,000-leaf trees, reconstructed the ancestry for the discrete trait and assessed transitions between states. The error rate is less than 3% on average: it comprises the missing data, as you pointed out, and the errors in reconstructing the ancestry for the trait deeper in the tree.

      We think that overall, less than 3% is a satisfying error rate.

      The results of this specific simulation were added to the paper (lines 150-157) and as Figure 2—figure supplement 1.
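      The missing-data effect discussed above (a path from A to B to C read as A to C once the intermediate samples are dropped) can be illustrated on a toy tree. This is only a sketch under stated assumptions: it uses Fitch parsimony rather than the maximum likelihood reconstruction used in the study, the tree and tip names are hypothetical, and it does not reproduce the 1,000,000-tip simulation.

```python
# A tree is a leaf name (str) or a pair of subtrees (tuple).
# Hypothetical tips sampled in locations A, B and C.
states = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
full_tree = ("a1", ("a2", ("b1", ("b2", ("c1", "c2")))))

def prune(tree, keep):
    """Return the subtree induced by the leaves in `keep` (None if empty)."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def fitch_sets(tree):
    """Bottom-up Fitch pass: (candidate state set, annotated children)."""
    if isinstance(tree, str):
        return ({states[tree]}, [])
    kids = [fitch_sets(c) for c in tree]
    sets = [k[0] for k in kids]
    common = set.intersection(*sets)
    return (common if common else set.union(*sets), kids)

def count_transitions(node, parent_state=None, counts=None):
    """Top-down pass: assign states and count changes along branches."""
    counts = {} if counts is None else counts
    state_set, kids = node
    # Keep the parent's state when allowed; otherwise break ties by name.
    state = parent_state if parent_state in state_set else min(state_set)
    if parent_state is not None and state != parent_state:
        key = (parent_state, state)
        counts[key] = counts.get(key, 0) + 1
    for kid in kids:
        count_transitions(kid, state, counts)
    return counts

full = count_transitions(fitch_sets(full_tree))
# Drop every tip sampled in B, as a sparse subsample might.
sub_tree = prune(full_tree, {"a1", "a2", "c1", "c2"})
sub = count_transitions(fitch_sets(sub_tree))
print(full, sub)
```

      On the full toy tree the reconstruction recovers one A to B and one B to C transition. Once the B tips are removed, the same lineage is read as a single A to C transition, which is the kind of error the simulation described above is designed to quantify.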

      A further question around the methodology was the use of an artificially high fixed clock rate in the phylogenetic analysis so as to date the tree in an unbiased way. Although I understood that the stated action led to the required results, given the time available for review I was unable to figure out why this should be so. Is this an artefact of under-sampling, or of approximations made in the phylogenetic inference? Is this a well-known phenomenon in phylogenetic inference?

      We thank reviewer #1, who, like reviewer #2 and the editor, was troubled by the use of an artificially fast and fixed molecular clock. It was a workaround for a mistake in our code, which has now been fixed. See the answer to point (3) of the editor.

      The value of this kind of research is highlighted in the paper, in that genomic data can be used to assess and guide public health measures (line 64). This work elucidates several facts about the geographical spread of SARS-CoV-2 within France and between European countries. The more clearly these facts can be translated into improved or more considered public health action, through the evaluation of previous policy actions, or through the explication of how future actions could lead to improved outcomes, the more this work will have a profound and ongoing impact.

      This is a very interesting point to emphasize indeed. We are currently discussing with public health specialists in our institution on how to assess past public health actions using phylodynamics data in a statistically valid manner.

      Reviewer #2 (Public Review):

      This study represents an important contribution to our understanding of SARS-CoV-2 transmission dynamics in France, Europe and globally during the early pandemic in 2020 and the authors should be congratulated for tackling this important question. Through evaluation of the contributions of intra- and inter-regional transmission at global, continental, and domestic levels, the authors provided compelling, although as of yet correlative and incomplete, evidence towards how international travel restrictions reduced inter-regional transmission while permitting increased transmission intra-regionally. Unfortunately, however this work suffers from a number of serious analytical shortcomings, all of which can be overcome in a major revision and re-analysis.

      We would like to thank reviewer #2 for their evaluation and their various comments. We want to point out that reviewer #2 was contacted for advice on the strategy for the molecular clock, since she performed a study on a similar topic describing SARS-CoV-2 epidemics in Canada during 2020. We strongly believe that all of reviewer #2's comments contributed substantially to improving the quality of this work.

      With this genomic epidemiology analysis, the authors disentangled the relative contributions of different geographic levels to transmission events in France and in Europe in the first two COVID-19 waves of 2020. By partitioning the analysis into three complementary, but distinct, geographic levels, the migration flows in and out of continents, countries in Europe, and regions in France were inferred using maximum likelihood ancestral state reconstruction. The major strengths of this paper were the inclusion of multiple geographic levels, the comparison of different rate symmetries in the ancestral character estimation, and the comprehensive qualitative descriptions of comparisons over time and geographies. However, there were also major weaknesses that need to be addressed and are described in more detail below. They include summing across replicates that were drawn with replacement and were not independent; inadequate justification for excluding underrepresented geographies; the assertion that positive correlation between intra-regional transmission and deaths validates the accuracy of the analysis; considering the framework the authors have chosen for this analysis the analysis would accommodate and benefit strongly from increasing the size of the sequence sets selected for analysis in each replicate; and the sparsity of quantitative (over qualitative or exploratory) comparisons and statistics in the reporting of results. In particular, it would greatly strengthen the paper if the authors could better evaluate the effect of travel restrictions on importations and exportations by testing hypotheses, quantifying changes in the presence of restrictions, or estimating inflection points in importation rates.

      We are grateful for this comprehensive listing of the strengths and weaknesses of our study. Regarding the limitations of this study, these will be detailed specifically for each dedicated remark of the reviewer. We would like to emphasize that all the remarks and limitations reported here by reviewer #2 are in our opinion fully justified. We hence have tried to bring additional analyses (study of the Pango lineages, averaging of the subsamples, simulation study to justify the size of the sampling), a modification of the methodology (in particular concerning the molecular clock) and a thorough rewriting of the “Results” section.

      General comments on the Background: Need to elaborate on how this study fits into the big picture in the first paragraph. Should discuss how phylodynamics contributes to understanding of viral outbreaks, SARS-CoV-2 epidemiology and viral evolution.

      We have added in the “Introduction” section some elements to better understand why phylodynamics is an important field in the epidemiology of SARS-CoV-2 and its evolution.

      The authors should consider a hypothesis driven framework for their analyses, for example considering the geographically central position of France what hypotheses stem from this considering sources of viral importations and destinations of exportations from/to Europe vs other international? Or other a priori expectations.

      We agree with reviewer #2 about this remark. Indeed, given the central position of France, we can hypothesize that it has strongly participated in the dissemination of the virus within Europe. This hypothesis has been included in the "Introduction" section of the revised version (lines 102-105).

      To address the computational limits of phylogenetic reconstruction, 100 replicates of fewer than 1000 sequences each were sampled for each epidemic wave at each level. The inter- and intra-regional transmissions were averaged and then summed across replicates in order to compare the relative roles played by each geography towards transmission. While we see the logic in using the sum across replicates, this is highly likely to bias results, especially since in the methods, this is described as sampling with replacement between replicates (LX). The validity of summing replicates needs to be discussed and are likely most appropriately presented as mean or median. Also, these samples are quite small considering the computational capacity of the maximum likelihood tools being used. We recommend repeating the analysis with a substantially larger number of sequences per sample.

      We thank reviewer #2 for this relevant remark. We initially summed the subsamples, a strategy that may bias the results. In the new version of the manuscript, we averaged the subsamples by region and by week as recommended (and stated in the methods, lines 536-537).

      About the size of our subsamples, it made no difference whether we used 1,000, 2,000 or 5,000 genomes in each subsample. To get a more definitive and scientifically sound answer, we performed a simulation study that has been included in the manuscript and is shown in what is now Figure 2 (and Figure 2—figure supplement 1). These simulations show that our subsampling strategy allows for an accurate estimate of transition rates for a discrete parameter (lines 107-160).
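      The change from summing to averaging replicates, described above, can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name, the region labels and the counts are hypothetical, and a region-week pair absent from a replicate is assumed to count as zero.

```python
from statistics import mean

def average_by_region_week(replicates):
    """Average transition counts over replicates for each (region, week).

    `replicates` is a list of dicts mapping (region, week) -> count.
    A key missing from a replicate is treated as a count of 0 (assumption).
    """
    keys = set().union(*replicates)
    return {k: mean(r.get(k, 0) for r in replicates) for k in keys}

# Hypothetical counts from two subsampled replicates.
replicates = [
    {("IDF", 10): 3},
    {("IDF", 10): 5, ("ARA", 10): 2},
]

averaged = average_by_region_week(replicates)
summed = {k: v * len(replicates) for k, v in averaged.items()}
print(averaged, summed)
```

      The averaged values stay on the scale of a single subsample, whereas the summed values grow with the number of replicates drawn, which is one reason the sum is harder to interpret when replicates overlap.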

    1. Author Response

      Reviewer #1 (Public Review):

      The paper addresses an interesting question - how genetic changes in Y. pestis have led to phenotypic divergence from Y. pseudotuberculosis - and provides strong evidence that the frameshift mutation in rcsD is involved. Overall, I found the data to be clearly presented, and most of the conclusions well supported by the data. The authors convincingly show that (i) the frameshift mutation in rcsD alters the regulation of biofilm formation, (ii) this effect depends upon expression of a small protein that corresponds to the C-terminal portion of RcsD, and (iii) the frameshift mutation in rcsD prevents loss of the pgm locus. I felt that the discussion/conclusions about what phosphorylates/dephosphorylates RcsB and how this impacts biofilm formation are overstated, as there are no experiments that directly address this question. I also felt that the authors' model for what phosphorylates/dephosphorylates RcsB in Y. pestis should be more clearly articulated, even if it is only presented as speculation. Lastly, the authors propose that full-length RcsD is made in Y. pestis and contributes to phosphorylation of RcsB, but the evidence for this is weak (faint band in Figure 2d). It may be that the N-terminal domain of RcsD is functional. I recommend either softening this conclusion or testing this hypothesis further, e.g., by introducing an in-frame stop codon early in rcsD after the frame-shift.

      Thanks for your comments. We have provided a model and revised the discussion about phosphorylation/dephosphorylation of RcsB and how this impacts biofilm formation (Figure 8 and Supplementary Figure 4). In addition, we have introduced an in-frame stop codon in rcsD before the frameshift and showed that full-length RcsD is only made in wildtype Y. pestis but not in the rcsDpe-stop mutant (Supplementary Figure 1g).

      Reviewer #2 (Public Review):

      Guo et al. have investigated the consequences of a frameshift mutation in the rcsD gene in the Yersinia pseudotuberculosis progenitor that is conserved in modern Y. pestis strains. Interestingly, they identify a start codon with a ribosome binding site that enables production of an Hpt-domain protein from the C-terminus in Y. pestis. Targeted deletion of this Hpt-domain increased biofilm production in Y. pestis. They find that the ancestral RcsDpstb (full length) is a positive regulator of biofilm in Y. pestis while the Hpt-domain version (RcsDYP) represses biofilm in vitro. When fleas were infected with Y. pestis expressing the ancestral RcsDPSTB protein, there was no difference in bacterial survival or rate of proventricular blockage. This strain also killed mice at the same rate (in a different Y. pestis strain background). However, replacing RcsDYP with RcsYPTB dramatically increases the frequency of pgm locus deletion (containing Hms ECM and yersiniabactin genes) during flea infection. The authors predict that this would reduce the invasiveness of the bacteria in mammals and/or flea blockage in subsequent flea-rodent-flea transmission cycles. They also measured global gene expression differences between RcsDPSTB compared to the wild-type strain. They argue that the frameshift of RcsD maintaining the Hpt-domain (RcsDYP) was needed to regulate biofilm while limiting loss of the pgm locus.

      Loss of the pgm locus was not tested in the Y. pestis rcsD mutant strain (lacking the entire gene or just the C-terminal Hpt domain). Therefore, the claim that maintaining the Hpt-domain protein was important lacks convincing evidence. Additionally, it is possible that the population of rcsDpe::rcsDpstb after in vitro growth for 6 days would still be proficient at infecting and blocking fleas, even though many of the bacteria would have lost the pgm locus. Production of Hms polysaccharide by pgm+ could trans-complement those that are pgm-. The nature of the pgm locus loss is assumed to be due to recombination between IS elements. This is certainly the likeliest explanation but not the only one. The authors checked for pgm loss by phenotype (CR binding) and by two sets of primers, one targeting the hmsS gene and another set that is unspecified. Loss of the entire pgm (especially yersiniabactin genes) should be clarified.

      Thanks for your comments. We have now provided data showing that deletion of RcsD-Hpt resulted in increased loss of the pgm locus (Figure 5d), which strengthens the claim that maintenance of the Hpt-domain is significant for retention of the pgm locus. We also agree that 6-day-old cultures of a mixture of pgm+ and pgm- rcsDpe::rcsDpstb will still be capable of infecting and blocking fleas. However, these strains will be less efficient at causing disease in the vertebrate host in the absence of the pgm locus. We agree that recombination between IS elements might not be the only cause of loss of the pgm locus. To verify the loss of the pgm locus, we have used two sets of primers. One set targets the hmsS gene and another set targets the upstream and downstream sequences of the pgm locus (Supplementary Table 3). We have clarified this in the revised manuscript (Lines 610-613).

      Reviewer #3 (Public Review):

      The Rcs phosphorelay plays an important role in regulating gene expression in bacteria; most of the current knowledge about the Rcs proteins is from E. coli. Yersinia pestis, carrying mutations in two central components of the Rcs machinery, provides an interesting example of how evolution has shaped this system to fit the life cycle of this bacteria. In bacteria other than Y. pestis, most Rcs activating signals are sensed via the outer membrane lipoprotein RcsF; from there, signalling depends on inner membrane protein IgaA, a negative regulator of RcsD. Histidine kinase RcsC is the source of the phosphorylation cascade that goes from the histidine kinase domain of RcsC to the response regulator domain of RcsC, from there to the histidine phosphotransfer (Hpt) domain of RcsD, and finally to the response regulator RcsB. RcsB, alone or with other proteins, regulates transcription of many genes, both positively and negatively. These authors have previously shown that RcsA, a co-regulator that acts with RcsB at some promoters, is functional in Y. pseudotuberculosis but mutant in Y. pestis, and that this leads to increased biofilm in the flea. The authors also noted that rcsD in Y. pestis contains a frameshift after codon 642 in this 897 aa protein; in theory that should eliminate the Hpt domain from the expressed protein. However, they found evidence that the frame-shifted gene had a role in regulation. This paper investigates this in more depth, providing clear evidence for expression of the Hpt domain (without the N-terminal domain), and demonstrating a critical role for this domain in repressing biofilm formation. The Y. pseudotuberculosis RcsD does not express a detectable amount of the Hpt domain nor does it repress biofilm formation. The ability of the Hpt domain protein to keep biofilm formation low explains most of what is observed for the full-length frame-shifted protein.

      1) The authors provide a substantial amount of data supporting the expression of the C-terminus of RcsD is sufficient and necessary for low biofilm levels, and that this is dependent upon the active site His in the RcsD Hpt domain (H844A) as well as other components of the basic phosphorelay (RcsC and RcsB). However, it is only possible to see this protein by Western blot in 100-fold "Enriched" lysates (Figure 2). No small protein was detected in the RcsDpstb strain, although the enriched lysate was not shown for this. Without that experiment, it is not possible to evaluate whether the small protein is also made from the rcsDpstb gene. Either answer would be interesting, and would allow other conclusions to be drawn. Is the RBS and start codon the same for the HPT region of this rcsD gene (it could be added to Supplementary Table 6). If the small protein is made, is its ability to function blocked by the excess full length protein in terms of interactions with RcsC? Or is the expression of the small protein dependent upon loss of overlapping translation from the upstream start?

      The small Hpt protein may be produced from expression of the epitope-tagged rcsDpstb gene, as it can be detected in an enriched isolation of this sample (Supplementary Figure 1f). Because only a small amount of RcsD-Hpt is produced from the rcsDpstb substitution, it might only function at low levels in the presence of large amounts of RcsDpstb. The RBS and start codon are the same for RcsD-Hpt in Y. pestis and Y. pseudotuberculosis; we have added them to Supplementary Table 6. In addition, we have provided a model to show the function and regulation of RcsD and Hpt (Supplementary Figure 4).

      2) In many phosphorelays, the protein kinase also acts as a phosphatase, and which direction P flows is critical for regulation. It is often difficult to follow what the model for this is in this paper, and that is important to understand for evaluating the results. Most of this paper uses two assays, biofilm formation and crystal violet staining (also related to biofilm formation) to assess the functioning of the Rcs phosphorelay. Based on the behavior of the rcsB mutant, it would seem that functional Yersinia pestis Rcs (RcsDpe) represses this behavior, and this correlates with RcsB phosphorylation (Figure4). What is the basis (Line 443-44) for saying that RcsD phosphorylates RcsB while RcsDHpt dephosphorylates? Yersinia pseudotuberculosis RcsD(pstb) shows no difference with the rcsB mutant. Doesn't that suggest that RcsDpstb is no longer repressing (phosphorylating)? In the presence of the RcsDpstb as well as multicopy RcsF, an activating signal in other organisms, RcsDpstb seems able to phosphorylate. This all suggests that the full-length protein, like the Hpt domain, is capable of phosphorylating, but that it may be doing nothing in the absence of signal (or dephosphorylating). Given these results, saying that RcsDpstb is positively regulating biofilm formation (Fig.1 title, and elsewhere) is somewhat misleading. What it presumably does is prevent the Hpt domain, expressed from the chromosomal locus in Figure1b, from signalling to RcsB. By itself, it is not clear it is doing anything. Understanding this clearly is important for interpreting this system and the tested mutants. A clear model and how phosphate is flowing in the various situations would help a lot. Currently Supplementary Figure3 seems to reflect the appropriate directional arrows, but the text does not. Moving the rcsB data earlier in the paper (after Figure1, 2, or maybe earlier, before Figure3) would certainly help.

      RcsD dephosphorylates RcsB while RcsD-Hpt phosphorylates RcsB. Expression of RcsDpstb in the wild type strain and the N-term deletion mutant resulted in increased biofilm, indicating RcsB is less phosphorylated (Figure 1b and 1c). While over-expression of RcsD-Hpt resulted in decreased biofilm formation, indicating RcsB is more phosphorylated. In addition, the Phos-tag experiments showed that the RcsDpstb strain has a lower level of phosphorylated RcsB (Figure 4b). Expression of RcsDpstb in the wild type strain showed similar results as a rcsB mutant indicating a lower level of phosphorylated RcsB in the presence of RcsDpstb.

      It is possible that the RcsDpstb interferes with the ability of RcsD-Hpt to phosphorylate RcsB. However, plasmid expression of the rcsDpstb-H844A mutant in the Y. pestis rcsDN-term deletion mutant formed significantly less biofilm than wild type rcsDpstb, indicating H844 might be important for RcsD to dephosphorylate RcsB (Supplementary Figure 2b and Lines 180-183). In addition, it is known that RcsD plays a dual role in phosphorylation and dephosphorylation of RcsB in other organisms (Majdalani N, et al., 2005, J. Bacteriol., https://doi.org/10.1128/JB.187.19.6770-6778.2005; Wall EA, et al., 2020, Plos Genetics, https://doi.org/10.1371/journal.pgen.1008610; Takeda S, et al., 2001, Mol. Microbiol., https://doi.org/10.1046/j.1365-2958.2001.02393.x). We therefore think it is safe to say that the full length RcsD might function to dephosphorylate RcsB. We have modified the model in the revised manuscript (Supplementary Figure 4 and Figure 8). Regulation of RcsB has been investigated previously. The main finding of our manuscript is regulation of RcsB by the mutated RcsD (RcsD-Hpt). Thus, we have moved the known rcsB deletion mutant data to Figure 1 in the revised manuscript as suggested. We kept the rest of data in Figure 4 the same. We think it might be better to first show the mutation of rcsD alters Rcs signaling and then show how this occurs (by affecting RcsB phosphorylation).

      3) The authors show (in their pull-down) that there is a bit of full-length RcsD even in the frame-shifted protein. Is there any clear evidence this does anything here? Does the N-terminus (truncated after the frame-shift) have a function?

      We have introduced a stop codon in rcsDpe and showed that full-length RcsD is made by rcsDpe but not by rcsDpe with the stop codon (Supplementary Figure 1g). RcsDN-term does not appear to have a function under our tested conditions (Figure 1e).

      4) While the RNA seq data is useful addition here, it is difficult to interpret without a bit more data on the strain used for the RNA seq, including the biofilm phenotypes of the WT and mutant derivatives, as well as the relevant rcsD sequences, and maybe expression of a few genes or proteins (Hms or hmsT). Are these similar in the parallel strains used earlier in the paper and the one for RNA seq, in WT, rcsB- and the RcsDpstb derivative? It would appear that rcsB- and rcsDpstb have opposite effects, at least at 25°C, while in Figure4, these two derivatives have similar effects on biofilm. Is this due to temperature, strains, or biofilm genes that are not shown here? It is certainly possible that the ability of the full-length RcsD changes its kinase/phosphatase balance as a function of temperature, or dependent on other differences in these Y. pestis strains.

      The strain used for RNA seq is a derivative of the biovar Microtus strain 201, which has a similar in vitro phenotype to the strain KIM6+ (Lines 297-298). We used this strain for RNA seq because it has the virulence plasmid pCD1 and we wanted to analyze the gene expression of this plasmid, which is required for virulence, as well. RNAseq data showed that rcsB- and rcsDpstb have opposite effects on mRNA level of some genes. However, no significant change in expression of biofilm genes was noted in the RNAseq data set. In fact, our previous data has shown that the biofilm related (hmsT and hmsD) genes are only moderately (less than 2-fold change between wild type and rcsB mutant) regulated by RcsB based on RT-PCR and β-gal analysis (Sun YC, et al., 2012, J. Bacteriol., https://doi.org/10.1128/JB.06243-11; Guo XP, et al., 2015, Sci. Rep., https://doi.org/10.1038/srep08412; and Figure 4c).

    1. Author Response

      Reviewer #1 (Public Review):

      Sex determination and dosage compensation are two fundamental mechanisms in organisms with distinct sexes. These mechanisms vary greatly across the various model organisms in which they have been studied. Comparisons across more closely related members of the same genus have already proven productive in the past, to understand how these essential mechanisms evolve. In this study, the authors compare some aspects of the dosage compensation and sex determination mechanisms across two Caenorhabditis species that diverged ~15-30 MYA.

      Previously, the authors have studied dosage compensation and sex determination extensively in C. elegans. Here, they first identify the homologs of some key factors in C. briggsae, a species that independently evolved hermaphroditism. The authors show that some of the key players in these processes play the same roles in C. briggsae as they do in C. elegans. Namely, they show that the nematode-specific SDC-2 protein plays a role in both dosage compensation and sex determination also in C. briggsae, they find the homologs of some of the SMC protein complex that performs dosage compensation also in C. elegans and they study the binding specificity on the X chromosome.

      Overall, the work is thorough and compelling and is very clearly presented. The authors generate a number of genetic tools in C. briggsae and the careful genetic analyses together with a number of binding assays in vivo and in vitro, support the authors' main conclusions: that the main players and genetic regulatory hierarchy are conserved between these two nematodes, but the binding sites for the DCC on the X chromosome have diverged and the mode of binding has changed as well. Whereas in C. elegans the DCC binds sites in the X chromosome that contain multiple sequence motifs in a synergistic manner, in briggsae they seem to do so additively. This latter point is supported by the data, but it could be explored a bit more deeply using the available ChIP-seq data that the authors have generated. In addition, it would be interesting to discuss the possible implications of this difference.

      One minor weakness of this work is that it could be better put in the context of other related comparisons of these mechanisms. For example, the comparison of sex determination pathway by Haag et al. in Genetics 2008, and the comparison of dosage compensation across Drosophila species (Ellison and Bachtrog, Plos Genetics, 2019), and possibly others. The other point that the authors could provide deeper insight into, is the rate of divergence of proteins like SDC-2 (which is thought to be the protein that contacts DNA), versus some other proteins in the DCC and in general other proteins not involved in sex determination or dosage compensation (this doesn't need to be limited to comparing elegans and briggsae as there are numerous Caenorhabditis genomes available). This would provide a more complete view of the evolution of these processes.

      Regarding the comparison of our studies to those of the C. briggsae sex determination pathway described by Haag and others, we have included the following in our revised manuscript:

      Pages 8-9. "Within the Caenorhabditis genus, similarities and differences occur in the genetic pathways governing the later stages of sex determination and differentiation (Haag, 2005). For example, three sex-determination genes required for C. elegans hermaphrodite sexual differentiation but not dosage compensation, the transformer genes tra-1, tra-2, and tra-3, are conserved between C. elegans and C. briggsae and play very similar roles. Mutation of any one gene causes virtually identical masculinizing somatic and germline phenotypes in both species (Kelleher et al., 2008). Moreover, the DNA binding motif for both Cel and Cbr TRA-1 (Berkseth et al., 2013), a Ci/GLI zinc-finger transcription factor that acts as the terminal regulator of somatic sexual differentiation (Zarkower and Hodgkin, 1992), is conserved between the two species.

      At the opposite extreme, the mode of sexual reproduction, hermaphroditic versus male/female, dictated the genome size and reproductive fertility of Caenorhabditis species diverged by only 3.5 million years (Yin et al., 2018; Cutter et al., 2019). Species that evolved self-fertilization (e.g. C. briggsae or C. elegans) lost 30% of their DNA content compared to male/female species (e.g. C. nigoni or C. remanei), with a disproportionate loss of male-biased genes, particularly the male secreted short (mss) gene family of sperm surface glycoproteins (Yin et al., 2018). The mss genes are necessary for sperm competitiveness in male/female species and are sufficient to enhance it in hermaphroditic species. Thus, sex has a pervasive influence on genome content. In contrast to these later stages of sex determination and differentiation, the earlier stages of sex determination and differentiation had not been analyzed in C. briggsae."

      Regarding the comparison to Drosophila dosage compensation, including the work of Ellison and Bachtrog (2019), we added the following to the Discussion of our revised manuscript (page 22) and included related remarks in the abstract.

      "In contrast to the divergence of X-chromosome target specificity between Caenorhabditis species, X-chromosome target specificity has been conserved among Drosophila species. A 21-bp GA-rich sequence motif on X is utilized across Drosophila species to recruit the dosage compensation machinery, although it may not be the sole source of X target specificity (Alekseyenko, 2008; Kuzu, 2016; Ellison, 2008; Alekseyenko, 2013)."

      Regarding a comparison of our work to that of other rapidly evolving processes, we have made the following revision to our Discussion (page 22):

      "Conservation of DNA target specificity among species is also a common theme among developmental regulatory proteins that participate in multiple, unrelated developmental processes, such as Drosophila Dorsal in body-plan specification (Schloop et al., 2020) or Caenorhabditis TRA-1 in hermaphrodite sexual differentiation and male neuronal differentiation (Berkseth et al., 2013; Bayer et al., 2020). Typically, for such multi-purpose proteins, target-site specificity is evolutionarily constrained: protein function is changed far more by changes in the number and location of conserved cis-acting target sequences than by changes in the target sequences themselves (Carroll, 2008; Nitta et al., 2015). Hence, the divergence in X-chromosome target specificity across the Caenorhabditis genus is atypical among developmental regulatory complexes with highly diverse target genes and could have been an important factor for establishing reproductive isolation between species. Our finding is reminiscent of the discovery that centromeric sequences and their corresponding centromere-binding proteins have co-evolved rapidly as a consequence of hybrid incompatibilities (Malik and Henikoff, 2001; Henikoff et al., 2001; Talbert and Henikoff, 2022). Occurrence of rapidly changing DNA targets and their corresponding DNA-binding proteins (see also Lienard et al., 2016; Ting et al., 1998; Ting et al., 2004; Sun et al., 2004) is an increasingly dominant theme contributing to reproductive isolation."

      A brief comment about all three comparisons is also made in the beginning of the Discussion on page 18.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to extend modeling of bispecific engager pharmacology through explicit modelling of the search of T cells for tumour cells, the formation of an immunological synapse and the dissociation of the immunological synapse to enable serial killing. These features have not been included in prior models and their incorporation may improve the predictive value of the model.

      Thank you for the positive feedback.

      The model provides a number of predictions that are of potential interest- that loss of CD19, the target antigen, to 1/20th of its initial expression will lead to escape and that the bone marrow is a site where the tumour cells may have the best opportunity to develop loss variants due to the limited pressure from T cells.

      Thank you for the positive feedback.

      A limitation of the model is that adhesion is only treated as a 2D implementation of the blinatumomab mediated bridge between T cell and B cells- there is no distinct parameter related to the distinct adhesion systems that are critical for immunological synapse formation. For example, CD58 loss from tumours is correlated with escape, but it is not related to the target, CD19. While they begin to consider the immunological synapse, they don't incorporate adhesion as distinct from the engager, which is almost certainly important.

      We agree that adhesion molecules play critical roles in cell-cell interaction. In our model, we assumed that these adhesion molecules are constant (that is, they do not differ across cell populations). This assumption allowed us to focus on the BiTE-mediated interactions.

      Revision: To clarify this point, we added a couple of sentences in the manuscript.

      “Adhesion molecules such as CD2-CD58, integrins and selectins, are critical for cell-cell interaction. The model did not consider specific roles played by these adhesion molecules, which were assumed constant across cell populations. The model performed well under this simplifying assumption”.

      In addition, we acknowledged the fact that “synapse formation is a set of precisely orchestrated molecular and cellular interactions. Our model merely investigated the components relevant to BiTE pharmacologic action and can only serve as a simplified representation of this process”.

      While the random search is a good first approximation, T cell behaviour is actually guided by stroma and extracellular matrix, which are non-isotropic. In a lymphoid tissue the stroma is optimised for a search that can be approximated as Brownian, or more accurately, a correlated random walk, but in other tissues, particularly tumours, the Brownian search is not a good approximation and other models have been applied. It would be interesting to look at observations from bone marrow or other sites to determine the best approximation for the search related to BiTE targets.

      We agree that tissue stromal factors greatly influence T cell search patterns. Our current model considered Brownian motion as a good first approximation for two reasons: 1) we define tissues as homogeneous compartments to attain unbiased evaluations of factors that influence BiTE-mediated cell-cell interaction, such as T cell infiltration, T:B ratio, and target expression. The stromal factors were not considered in the model, as they require spatially resolved tissue compartments to represent the gradients of stromal factors; 2) our model was primarily calibrated against in vitro data obtained from a “well-mixed” system that does not recapitulate specific considerations of tissue stromal factors. We did not obtain tissue-specific data to support the prediction of T cell movement. This is under current investigation in our lab. Therefore, we are cautious about assuming different patterns of T cell movement in the model when translating into in vivo settings. We acknowledged this limitation of our model, namely that it does not consider the more physiologically relevant T cell searching strategies.

      Revision: In the Discussion, we added a limitation of our model: “We assumed Brownian motion in the model as a good first approximation of T cell movement. However, T cells often take other more physiologically relevant searching strategies closely associated with many stromal factors. Because of these stromal factors, the cell-cell encounter probabilities would differ across anatomical sites.”
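
      To illustrate the distinction raised above between a Brownian search and a correlated random walk, the following is a minimal sketch only and is not the agent-based model used in the manuscript. The function names (`simulate_walk`, `mean_squared_displacement`) and the `persistence` parameter are hypothetical and introduced purely for illustration. Turning angles are drawn uniformly and scaled by `1 - persistence`, which is an assumed, simplistic way to introduce directional memory.

```python
import math
import random


def simulate_walk(n_steps, step_len=1.0, persistence=0.0, rng=None):
    """Return the list of (x, y) positions of a 2D walker.

    persistence = 0.0 -> each heading is independent (Brownian-like search).
    persistence -> 1.0 -> heading barely changes (correlated random walk).
    This parameterisation is an illustrative assumption.
    """
    rng = rng or random.Random()
    x = y = 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        # Turning angle: uniform in (-pi, pi), narrowed by the persistence.
        turn = rng.uniform(-math.pi, math.pi) * (1.0 - persistence)
        heading += turn
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
        path.append((x, y))
    return path


def mean_squared_displacement(n_walks, n_steps, persistence, seed=0):
    """Average squared end-to-end distance over many independent walkers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x, y = simulate_walk(n_steps, persistence=persistence, rng=rng)[-1]
        total += x * x + y * y
    return total / n_walks


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.8):
        msd = mean_squared_displacement(2000, 100, persistence=p)
        print(f"persistence={p:.1f}  MSD after 100 steps = {msd:.1f}")
```

      For the uncorrelated case the mean squared displacement should be close to the number of steps times the squared step length, whereas a persistent walker covers much more ground in the same number of steps. In a sketch like this, that difference is what would change cell-cell encounter probabilities between a well-mixed system and a tissue with directional cues.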

      Reviewer #3 (Public Review):

      Liu et al. combined mechanistic modeling with in vitro experiments and data from a clinical trial to develop an in silico model to describe response of T cells against tumor cells when bi-specific T cell engager (BiTE) antigens, a standard immunotherapeutic drug, are introduced into the system. The model predicted responses of T cell and target cell populations in vitro and in vivo in the presence of BiTEs where the model linked molecular level interactions between BiTE molecules, CD3 receptors, and CD19 receptors to the population kinetics of the tumor and the T cells. Furthermore, the model predicted tumor killing kinetics in patients and offered suggestions for optimal dosing strategies in patients undergoing BiTE immunotherapy. The conclusions drawn from this combined approach are interesting and are supported by experiments and modeling reasonably well. However, the conclusions can be tightened further by making some moderate to minor changes in their approach. In addition, there are several limitations in the model which deserve some discussion.

      Strengths

      A major strength of this work is the ability of the model to integrate processes from the molecular scales to the populations of T cells, target cells, and the BiTE antibodies across different organs. A model of this scope has to contain many approximations and thus the model should be validated with experiments. The authors did an excellent job in comparing the basic and the in vitro aspects of their approach with in vitro data, where they compared the numbers of engaged target cells with T cells as the numbers of the BiTE molecules, the ratio of effector and target cells, and the expressions of the CD3 and CD19 receptors were varied. The agreement of the model with the data was excellent in most cases, which led to several mechanistic conclusions. In particular, the study found that target cells with lower CD19 expressions escape the T cell killing.

      The in vivo extension of the model showed reasonable agreements with the kinetics of B cell populations in patients where the data were obtained from a published clinical trial. The model explained differences in B cell population kinetics between responders and non-responders and found that the differences were driven by the differences in the T cell numbers between the groups. The ability of the model to describe the in vivo kinetics is promising. In addition, the model leads to some interesting conclusions, e.g., the model shows that the bone marrow harbors tumor growth during the BiTE treatment. The authors then used the model to propose an alternate dosage scheme for BiTEs that needed a smaller dose of the drug.

      Thank you for the positive comments.

      Weaknesses

      There are several weaknesses in the development of the model. Multiscale models of this nature contain parameters that need to be estimated by fitting the model with data. Some of these parameters are associated with model approximations or not measured in experiments. Thus, a common practice is to estimate parameters with some 'training data' and then test model predictions using 'test data'. Though Supplementary file 1 provides values for some of the parameters that appeared to be estimated, it was not clear which datasets were used for training and which for testing. The confidence intervals of the estimated parameters and the sensitivity of the proposed in vivo dosage schemes to parameter variations were unclear.

      We agree with the reviewer on the model validation.

      Revision: To ensure reproducibility, we summarized the model assumptions and parameter values/sources in Supplementary file 1. To mimic tumor heterogeneity and the evolution process, we applied stochastic agent-based models, which are challenging to optimize globally against the data. The majority of key parameters were obtained or derived from the literature. Details have been provided in the response to Reviewer 3 - Question 1. In our modeling process, we manually optimized the sensitive coefficient (β) for the base model using pilot in vitro data, and the sensitive coefficient (β) for the in vivo model by re-calibrating against the in vitro data at a low BiTE concentration. BiTE concentrations in patients (mostly < 2 ng/ml) are relevant only to the lower bound of the concentration range we investigated in vitro (0.65-2000 ng/ml). We have added some clarification of this approach and its limitations in the text (details are provided in the following question). We understand the concerns, but the agent-based nature of the model prevents us from performing global optimization.

      The model appears to show a few unreasonable behaviors and does not agree with experiments in several cases, which could point to missing mechanisms in the model. Here are some examples. The model shows a surprising decrease in the T cell-target cell synapse formation when the affinity of the BiTEs to CD3 was increased; the opposite should have been more intuitive. The authors suggest degradation of CD3 could be a reason for this behavior. However, this probably could be easily tested by removing CD3 degradation in the model. Another example is that the increase in the % of engaged effector cells in the model with increasing CD3 expressions does not agree well with experiments (Fig. 3d); however, a similar fold increase in the % of engaged effector cells in the model agrees better with experiments for increasing CD19 expressions (Fig. 3e). It is unclear how this can be explained given that CD3 and CD19 appear to be present in similar copy numbers per cell (~10^4 molecules/cell), and both receptors bind the BiTE with high affinities (e.g., koff < 10^-4 s^-1).

      Thank you for pointing this out. The bidirectional effect of CD3 affinity on IS formation is counterintuitive. In a hypothetical situation when there is no CD3 downregulation, the bidirectional effect disappears (as shown below), consistent with our view that CD3 downregulation accounts for the counterintuitive behavior. We have included the simulation to support our point. From a conceptual standpoint, the inclusion of CD3 degradation means the way to maximize synapse formation is for the BiTE to first bind tumor antigen, after which the tumor-BiTE complex “recruits” a T cell through the CD3 arm.

      We agree that the model did not adequately capture the effect of CD3 expression at the highest BiTE concentration, 100 ng/ml, while the effects at other BiTE concentrations were well captured (as shown below, left). The model predicted a much more moderate effect of CD3 expression on IS formation at the highest concentration. This is partly because the model assumed rapid CD3 downregulation upon antibody engagement. We did a similar simulation as above, with moderate CD3 downregulation (as shown below, right). This increases the effect of CD3 expression at the highest BiTE concentration, consistent with experiments. Interestingly, a rapid CD3 downregulation rate, as we concluded, is required to capture data profiles at all other conditions. Considering that a BiTE concentration of 100 ng/ml is much higher than the therapeutically relevant level in circulation (< 2 ng/ml), we did not investigate the mechanism underlying this inconsistent model prediction, but we acknowledged the fact that the model under-predicted IS formation in Figure 3d. Notably, this discrepancy may rarely appear in our clinical predictions, as CD3 expression is low and the blood BiTE concentration is very low (< 2 ng/ml).

      Revision: We have adjusted the text to increase clarity on these points. In addition, we added: “The base model underpredicted the effect of CD3 expression on IS formation at 100 ng/ml BiTE concentration, which is partially because of the rapid CD3 downregulation upon BiTE engagement and assay variation across experimental conditions.”

      The model does not include signaling and activation of T cells as they form the immunological synapse (IS) with target cells. The formation of the IS leads to aggregation of different receptors, adhesion molecules, and kinases which modulate signaling and activation. Thus, it is likely the variations of the copy numbers of CD3, and the CD19-BiTE-CD3 will lead to variations in the cytotoxic responses and presumably to CD3 degradation as well. Perhaps some of these missing processes are responsible for the disagreements between the model and the data shown in Fig. 3. In addition, the in vivo model does not contain any development of the T cells as they are stimulated by the BiTEs. The differences in development of T cells, such as generation of dysfunctional/exhausted T cells could lead to the differences in responses to BiTEs in patients. In particular, the in vivo model does not agree with the kinetics of B cells after day 29 in non-responders (Fig. 6d); could the kinetics of T cell development play a role in this?

      We agree that intracellular signaling is critical to T cell activation and cytotoxic effects. IS formation, T cell activation, and cytotoxicity are a cascade of events with highly coordinated molecular and cellular interactions. Compared to the events of T cell activation and cytotoxicity, IS formation occurs at a relatively earlier time. As shown in our study, IS formation can occur at 2-5 min, while the other events often need hours to be observed. We found that IS formation is primarily driven by two intercellular processes: cell-cell encounter and cell-cell adhesion. The intracellular signaling would be initiated in the process of cell-cell adhesion or at the late stage of IS formation. We think these intracellular events are relevant but may not be the reason why our model did not adequately capture the profiles in Figure 3d at the highest BiTE concentrations. Therefore, we did not include intracellular signaling in the models. Another reason was that we simulated our models at an agent level to mimic the process of tumor evolution, which is computationally demanding. Intracellular events for each cell may make it more challenging computationally.

      T cell activation and exhaustion throughout the BiTE treatment are very complicated, time-variant, and affected by multiple factors such as T cell status, tumor burden, BiTE concentration, immune checkpoints, and tumor environment. T cell proliferation and death rates are challenging to estimate, as the quantitative relationship with those factors is unknown. Therefore, T cell abundance (expansion) was considered as an independent variable in our model. T cell counts are measured in BiTE clinical trials. We included these data in our model to reflect the expanded T cell population. Patients with high T cell expansion are often those with better clinical response. Notably, the T cell decline due to rapid redistribution after administration was excluded from the model. T cell abundance was included in the simulations in Figure 6 but not in the proof-of-concept simulations in Figure 7.

      In Figure 6d, the kinetics of T cell abundance was included in the simulations for responders and non-responders in the MT103-211 study. Thus, the kinetics of T cell development cannot be used to explain the disagreement between model prediction and observation after day 29 in non-responders. The observed data are actually median values of B-cell kinetics in non-responders (N = 27) with very large inter-subject variation (baseline from 10-10000/μL), which makes them very challenging for the model to capture perfectly. Many non-responders with severe progression dropped out of the treatment at the end of cycle 1, which resulted in a “more potent” efficacy in the 2nd cycle. This might be the main reason for the disagreement.

      Variation in cytotoxic response was not included in our models. Tumor cells were assumed to be eradicated after engagement with effector cells; no killing rate or killing probability was implemented. This assumption reduced the model complexity and aligned well with our in vitro and clinical data. Cytotoxic response in vivo is affected by multiple factors such as CD3 copy number, cytokine/chemokine release, tumor microenvironment, and T cell activation/exhaustion. For example, the cytotoxic response and killing rate mediated by the 1:1 synapse (ET) and other variants (ETE, TET, ETEE, etc.) are expected to differ as well. Our model did not differentiate the killing rates of these synapse variants, but it has quantified them, providing a framework for us to address these questions in the future. We agree that differentiating the cytotoxic responses under different scenarios may improve model prediction and that more exploration is needed in the future.

      Revision: We added a discussion of the limitations which we believe is informative to future studies.

      “Our models did not include intracellular signaling processes, which are critical for T cell activation and cytotoxicity. However, our data suggest that encounter and adhesion are more relevant to initial IS formation. To make more clinically relevant predictions, the models should consider these intracellular signaling events that drive T cell activation and cytotoxic effects. Of note, we did consider the T cell expansion dynamics in organs as an independent variable during treatment for the simulations in Figure 6. T cell expansion in our model is case-specific and time-varying.”

    1. Author Response

      Reviewer #1 (Public Review):

      Following previous publications showing that NR2F2 controls atrial identity in the mouse and human iPS cells, the authors address in the fish the role of the transcription factor Nr2f1a, which is specific to the atrial chamber. This had been initiated in a previous publication (Duong et al, 2018) and is extended in this manuscript. In mutant fish, the atrial chamber is smaller and mispatterned. Markers of the atrioventricular canal and of the pacemaker are expanded. Transcriptomic analyses and electrophysiological measures further support this observation. A putative enhancer of nkx2.5 is identified by ATAC-seq and shown to be repressed in nr2f1a mutants, suggesting that Nkx2.5, a known repressor of pacemaker identity, may be a mediator of Nr2f1a. Overexpression of nkx2.5 delays the appearance of pacemaker cells, and is proposed to partially rescue the absence of nr2f1a.

      Overall, this work provides novel insight into the mechanism of atrial chamber patterning in the fish and discusses the conservation of the role of nr2f1a. However, the claim that atrial cells switch their identity into ventricular and pacemaker cells is currently not demonstrated. Alternative hypotheses of mispatterning, cell number changes by proliferation, survival, or ingression are not ruled out by the data presented. The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported. The definition of "atrial cells (AC)" varies between figures.

      Major comments:

      1) The definition of "AC" varies from figure to figure: amhc+ in Fig 1A, amhc+vmhc- in Fig.1S1A, amhc+fgf13a- in Fig. 2 and 5, morphological area in Fig. 3. Please clarify how the atrial chamber is delineated in mutants in Fig. 3 since the avc constriction is not obvious.

      a. As stated in the response to Essential Revisions comment 1.B, we have tried to clarify the definitions of the cardiomyocyte populations in the revised text by indicating the specific markers used in the text and the figures. We then provide our interpretation of what this means regarding the different cardiomyocyte populations.

      b. Because the electrophysiological analysis cannot be performed with markers or in transgenic zebrafish embryos expressing GFP, we chose areas for analysis closer to the middle of the morphological atrium in the nr2f1a mutant and WT sibling control embryo hearts. Based on our immunohistochemistry analysis, these areas would be consistent with Amhc+ expression and with the fgf13a:EGFP+ transgene and Isl1 markers. This strategy was schematized in Figure 3A and is now explicitly stated on lines 266 and 267 of the revised manuscript.

      2) The claim of a switch in cell identity or transdifferentiation is not demonstrated. This would require cell tracking or single-cell transcriptomics. I don't see how "AVC (..) [is] resolving to ventricular identity", since amhc seems to be maintained throughout the atrial chamber at all stages. The claim that "the number of vmhc+ only cardiomyocytes progressively increased" is not supported by Fig1S1. The expansion of pacemaker cells may result from cell ingression at the arterial pole. This hypothesis is in keeping with the expression of nr2f1a outside the heart tube in putative atrial progenitors (Duong, 2018). The phenotype upon nkx2.5 overexpression may also be interpreted along this line: ingression of pacemaker cells is delayed. The claim that "PC identity progressively expands throughout nr2f1a mutant atria" is not supported by the quantifications of a mean of 12 fgf13a+amhc+ cells at 96hpf (Fig. 2H), which is as many as fgf13a-amhc+ cells (Fig. 2G) and a quarter of the total amhc+ cells in Fig. 1J. The schema in Fig 6 does not reflect quantifications at 96hpf, which indicate the persistence of amhc+vmhc+ cells, amhc+ only, or amhc+fgf13a- in Fig 1S1 and 2G.

      "We did not observe effects on cell death or proliferation in the hearts of nr2f1a mutants": please provide the data, since proliferation was shown to be affected in mouse mutants (Wu, 2013).

      a. As indicated above in our response to Essential Revisions comment 1.D, our quantification of cardiomyocytes indicates there are progressively fewer Amhc+/Vmhc+ cardiomyocytes in the nr2f1a mutant hearts (Figure 1J-L). The total number of Vmhc+ cardiomyocytes (Amhc+/Vmhc+ and Amhc-/Vmhc+) is increased in the nr2f1a mutant hearts relative to the WT sibling hearts. However, the number of Vmhc+-only (Amhc-/Vmhc+) cardiomyocytes, which reflect the ventricles, does not increase significantly in the nr2f1a mutants and is not statistically different from that of their WT siblings at each of the stages, despite trending that way (Figure 1 – figure supplement 2C). The total number of cardiomyocytes in the nr2f1a mutant hearts also does not increase during these stages (Figure 1L). Along with the lack of cardiomyocyte death or proliferation (Figure 1 – figure supplements 3 and 4), this suggests that these hearts have more total Vmhc+ cardiomyocytes and that the addition of Vmhc+-only cardiomyocytes comes primarily from cardiomyocytes in the Vmhc+/Amhc+ atrioventricular canal progressively losing Amhc expression. As indicated in the response to Essential Revisions comment 1.D, we have provided the individual image channels in a revised Figure 1 – figure supplement 1 and proportions of Vmhc+ cardiomyocytes in Figure 1 – figure supplement 2D to help clarify this issue.

      b. The question of transdifferentiation versus ingression of newly differentiating cardiomyocytes as the explanation for the expansion of pacemaker markers was addressed in the response to Essential Revisions comment 2. Please see that response for how we addressed this concern.

      3) The claim that "Nr2f1a maintains atrial nkx2.5 expression" or of a "progressive loss of Nkx2.5 within the ACs" needs to be further supported by quantification of the number of nkx2.5 positive cells in nr2f1a mutants. It seems that some cells in Fig. 4 co-express nkx2.5 and pacemaker markers in the mutant, which questions the repressive role of Nkx2.5. Following the observation of an nkx2.5 enhancer active next to pacemaker cells in control heart but absent in nr2f1a mutants, shouldn't we expect a gap of nkx2.5 expression next to pacemaker cells in mutants? It is unclear why pacemaker cells express nr2f1a (Fig. 6S1) but not nkx2.5. This needs clarification.

      a. The repressive role of Nkx2.5 with respect to pacemaker identity has been well documented in zebrafish and mice (Colombo et al., 2018). Nkx2.5 and Isl1 expression at the venous pole of zebrafish hearts are predominantly mutually exclusive, although there are a few cardiomyocytes at their borders that express both Nkx2.5 and pacemaker markers. We recognize that there are still some Nkx2.5-expressing cardiomyocytes that overlap with the pacemaker marker cardiomyocytes in the nr2f1a mutant hearts, as shown in Figure 4F. However, the majority of these cardiomyocytes have lower expression than the adjacent cardiomyocytes that form a border and do not have overlapping expression. Furthermore, as shown in Figure 4D-F and Figure 4 – figure supplement 2, the overall effect appears to be a regression of Nkx2.5+ expression in cardiomyocytes and a corresponding expansion of pacemaker markers from the venous pole from 48 through 96 hpf in the nr2f1a mutant hearts, consistent with the established role of Nkx2.5 in repressing pacemaker identity. In the revised manuscript, we have provided each of the individual channels for the images in Figure 4 to better allow visualization of the different cardiomyocyte markers and a new supplemental figure showing the predominantly mutually exclusive expression of Nkx2.5 and Isl1 at the venous pole of zebrafish embryo hearts (Figure 4 – figure supplement 1).

      b. The expression of Nkx2.5 within the heart, like any gene, is likely controlled by multiple different regulatory elements. It is not clear to us why Reviewer #1 feels one would expect to see a gap in expression between Nkx2.5+ and pacemaker cardiomyocytes in the nr2f1a mutant hearts, unless Nkx2.5 was not required to repress pacemaker identity or there was a significant delay between loss of Nkx2.5 and gain of pacemaker markers. As indicated in the response to Essential Revisions comment 3.C, in the revised manuscript, we show experiments in which we have deleted the putative nkx2.5 enhancer element and found there is a loss of Nkx2.5+ and gain of fgf13a:EGFP+ cardiomyocytes in the atrium, as one might expect if the enhancer promotes or maintains Nkx2.5 expression in atrial cardiomyocytes that border the pacemaker cardiomyocytes. In the revised manuscript, this experiment is described in the Results (lines 348-364) and included in a revised Figure 6 and new Figure 6 – figure supplement 2.

      c. Please see our response to Essential Revision comment 3.A regarding the issue of Nr2f1a expression in pacemaker cardiomyocytes.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Warren et al. presents evidence suggesting that aberrant Yap signaling plays a role in epithelial progenitor cell dysregulation in lung fibrosis. This work builds on a body of work in the literature showing that Hippo signaling is aberrantly regulated in idiopathic pulmonary fibrosis. They use a combination of single nuclear and spatial transcriptomics, together with in vivo conditional genetic perturbations of Hippo signaling in mice, to investigate roles for Yap/Taz signaling in alveolar epithelial homeostasis and remodeling associated with exposure to a fibrosing agent, bleomycin. They show that Taz and Tead1/4 are most abundantly expressed by alveolar type 1 (AT1) cells, but Nf2 immunoreactivity (upstream activator of Hippo) is observed predominantly within airway and AT2 cells. They find that bleomycin exposure was associated with reduced p-Mst in regenerating alveolar epithelium, that inactivation of Yap/Taz arrested AT2>AT1 differentiation, and that inactivation of either Nf2 or Mst1/2 promoted AT1 differentiation after bleomycin exposure and reduced matrix deposition/fibrosis. They go on to show that compromised alveolar regeneration resulting from inactivation of Yap/Taz results in enhanced bronchiolization of injured alveoli. Experiments are well designed and include quantitative endpoints where appropriate, data are of high quality, and results are generally supportive of conclusions. These studies provide valuable new data relating to roles for the Hippo pathway in regulation of alveolar homeostasis and epithelial regeneration/remodeling in injury/repair and fibrosis.

      We thank the reviewer for their enthusiastic and constructive comments.

      Reviewer #2 (Public Review):

      The authors explored non-redundant, and potentially contrasting, roles of the Hippo effector transcription factors, YAP and TAZ, in the epithelial regenerative response to non-infectious lung injury. The strength of the work is the use of genetic mouse models that explored inducible loss of function of YAP and/or TAZ in an alveolar epithelial type 2 (AT2) specific manner. The main weakness of the work is that gene(s) inactivation was performed prior to lung injury and, therefore, does not take into account the contextual and dynamic nature of YAP/TAZ signaling; for example, work by other groups has shown that YAP/TAZ is activated early following injury followed by a decrease in activity, thus balancing proliferation and differentiation of AT2 cells (for review, see PMID: 34671628).

      We thank the reviewer for their enthusiastic and constructive comments.

      We agree that knocking out genes prior to injury might not take into account the contextual and dynamic nature of YAP/TAZ signaling. However, the Hippo pathway allows cells to sense changes in their environment. We have published that in the airway epithelium the Hippo pathway becomes inactivated upon naphthalene injury in surviving airway epithelial cells sensing the loss of their neighbors, to induce Wnt7b expression which then induces Fgf10 expression in airway smooth muscle cells to drive airway epithelial regeneration. Normally when regeneration is complete and cell density is restored the Hippo pathway reactivates and the repair cascade is inactivated. Knocking out Mst1/2 in airway epithelium chronically activates this cascade and leads to overproliferation of the airway epithelium. Interestingly, upon inactivation of Mst1/2 in the airway epithelium some airway epithelial cells also turn into AT1 cells.

      However, AT1 cells do not proliferate. As such we believe that inactivation of Mst1/2 or Nf2 in AT2 cells will not result in overproliferation but mainly promote AT1 cell differentiation. That being said there are other pathways and molecules that affect Yap/Taz nuclear localization. So inactivation of Mst1/2 or Nf2 in AT2 cells most likely primes/activates AT2 cells to regenerate AT1 cells but this decision is likely not binary.

      Reviewer #3 (Public Review):

      The manuscript entitled "Hippo signaling impairs alveolar epithelial regeneration in pulmonary fibrosis" is a rigorous and timely report detailing the significance of Hippo signaling, Taz and Yap in AT2/AT1 differentiation and the subsequent impact on the progression of lung fibrosis versus repair/ regeneration. The authors experimental design and results support their conclusions. The identification of the distinct effects of Taz and Yap in these processes highlight the pathway and specific molecules as potential therapeutic targets.

      The major strengths of these studies lie in the rigor of the elegant transgenic developmental/adult injury-repair mouse models combined with spatial transcriptomics and analyses. The weaknesses include a lack of detail presented in the methods, some legends and discussion.

      We thank the reviewer for their enthusiastic and constructive comments and have addressed the issues raised.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper showing that during amino acid starvation of Neurospora, the general amino acid control factors CPC-1 and CPC-3 are crucial to maintaining circadian rhythm at the levels of rhythmic growth and transcription of the FRQ gene. They show that deleting both genes leads to reduced and arrhythmic cell growth and FRQ transcription that can be accounted for by severely reduced occupancy of the FRQ promoter by the key transcription factor WCC. This defect in turn appears to result from diminished H3 acetylation of the FRQ promoter that was observed at least in the cpc-1 mutant, which is mediated by Gcn5. Thus, they show that Gcn5 occupancy at FRQ is rhythmic and impaired by cpc-1 knock-out, that CPC-1 occupies the FRQ promoter, and provide coIP evidence that Cpc-1 interacts with Gcn5 and Ada2 and, hence, could act directly to recruit these cofactors to the FRQ promoter. Importantly, they show that knock out of GCN5 eliminates rhythmic cell growth and FRQ expression (although surprisingly not FRQ mRNA abundance), as well as reducing H3ac levels and WCC binding at FRQ. They further show that TSA treatment can reverse the effects of histidine starvation on the circadian period in WT cells, and can partially restore rhythmic growth to histidine-starved cpc-3 cells, and that elimination of HDAC Hda1 increases H3ac at FRQ in WT cells. They provide some evidence that transcriptional activation of certain aa biosynthetic genes by CPC-1 is also rhythmic, although the evidence for this is not strong and it's unclear whether CPC-1 occupancy or its activation function would be periodic. They also did not address whether CPC-1 occupancy at FRQ is rhythmic.

      This work is important in providing convincing evidence that CPC-3-mediated induction of transcription factor CPC-1 in starved Neurospora cells leads to CPC-1-mediated recruitment of Gcn5 and acetylation of the FRQ promoter, which counteracts the function of histone deacetylase HDA1 to maintain high occupancy of the transcription factor WCC and attendant circadian rhythm of FRQ transcription. The work does not identify new regulatory circuits: rhythmic transcription of FRQ, the role of Gcn5, Hda1, and promoter histone acetylation in supporting transcriptional activation, and the general amino acid control response to amino acid starvation are all well-established mechanisms. It is nonetheless significant in showing how these pathways and mechanisms are integrated to maintain circadian rhythm in the face of amino acid limitation.

      There is an abundance of convincing experimental evidence provided to support the key claims just summarized above. However, there are a few instances in which additional experiments might be required to resolve a discrepancy in the data or provide stronger evidence to support a claim.

      Thanks for the comments. We have revised the manuscript as suggested.

      Reviewer #2 (Public Review):

      This study by Liu et al. investigates the mechanism that enables the Neurospora circadian clock to maintain robust molecular and physiological rhythms under conditions of nutrient stress. The authors showed that the nutrient-sensing GCN2 signaling pathway is required to maintain robust circadian clock function and output rhythms under amino acid starvation in the filamentous fungus Neurospora. Specifically, they observed that under amino acid starvation conditions, knocking out GCN2 pathway components GCN4 (CPC-1) and GCN2 (CPC-3) severely disrupts rhythmic transcription of core clock gene frequency (frq) and clock-regulated conidiation rhythm. They provided data to indicate that the observed disruptions are due to reduced binding of the White Collar (WC) complex to the frq promoter stemming from lower histone H3 acetylation levels. This prompted the authors to propose a model in which GCN2 (CPC-3) and GCN4 (CPC-1) are activated upon sensing amino acid starvation, recruit GCN-5 containing SAGA acetyltransferase complex to maintain robust histone acetylation rhythm at the frq promoter. They then performed a battery of assays to show that both GCN-5 and ADA-2 are necessary for maintaining robust H3ac, frq mRNA, and conidiation rhythms under normal conditions. To support that low H3ac level at the frq promoter is the cause for impaired WC binding and frq transcription, they demonstrated they can partially rescue the observed rhythm defects of the knockout mutants under amino acid starvation using an HDAC inhibitor. Finally, the authors used RNA-seq to identify genes and pathways that are differentially activated by GCN4 (CPC-1) under amino acid starvation conditions. Many of these genes are involved in amino acid metabolism and they showed that 3 of them exhibit rhythmic expression in WT but low and non-rhythmic expression in the CPC-1 KO strain.

      Strength: The 24-hour period length of the circadian clock is known to be stable over a range of environmental and metabolic conditions because of circadian compensation mechanisms. Whereas temperature compensation (maintenance of circadian period length over a physiological range of temperature) has been studied extensively in multiple model organisms, the phenomenon of nutritional compensation and its underlying mechanisms are poorly understood. This study provides new insights into this important yet understudied area of research in chronobiology. In addition to advancing our understanding of fundamental mechanisms governing clock compensation mechanisms, this study also adds to our understanding of metabolic regulation of rhythmic biology and the relationship between nutrition and healthy biological rhythms. Given that the GCN2 nutrient-sensing pathway is broadly conserved beyond Neurospora, findings from this study will likely be relevant to other eukaryotic systems.

      The authors provided strong evidence supporting their claims that the GCN2 signaling pathway is important for maintaining the robustness of the Neurospora clock under conditions of amino acid starvation. The authors performed parallel experiments in normal (no 3-AT) vs amino acid-starved conditions (+3-AT). Their observations of relatively minor disruptions of molecular and conidiation rhythms in cpc-3 and cpc-1 KO strains in normal nutrient conditions compared to starvation conditions support their model that sensing of amino acid starvation by GCN2 pathway-induced changes at the chromatin and transcriptional level that are necessary to maintain a robust frq oscillator. Without the comparison between normal vs amino acid starved conditions, this part of their model will not be as strong.

      Previously Karki et al. (2020) showed that rhythmic activation of GCN2 kinase is regulated by the clock, resulting in clock-controlled rhythmic translation initiation. This study uncovers an additional mechanism through which the GCN2 pathway modulates circadian rhythms by regulating histone acetylation of rhythmic genes. RNA-seq as described in Figure 7 provides some potential targets.

      Thanks for the comments and suggestions. We have revised the manuscript as suggested.

      Weakness:

      (1) The authors propose a model (Figure 8) in which the GCN2 pathway is activated by amino acid starvation and recruits the SAGA complex to promote histone acetylation level at the frq promoter. There is however no data in this study showing that the GCN2 pathway is activated in amino acid-starved conditions, only that it is required to maintain robust frq and conidiation rhythms. The authors should clarify how they are defining "activation of the GCN2 pathway" in this study. For example, is it recruitment of GCN-5 and SAGA complex to frq promoter?

      Thanks for the question. CPC-3, the GCN2 homolog in Neurospora, is the only eIF2α kinase responsible for eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). As shown in the revised Figure 1-figure supplement 1A, the eIF2α phosphorylation and CPC-1 were induced by 3-AT treatment in the WT but not in the cpc-3KO strain. These results demonstrate that the GCN2 pathway is activated by amino acid starvation, and as a result, the CPC-1 expression is activated to recruit the SAGA complex to the frq promoter.

      (2) The experiments to examine the involvement of GCN-5 and ADA-2 were performed in normal conditions (no amino acid starvation). Unlike cpc-1 and cpc-3 KO strains, gcn-5 and ada-2 KO strains showed severely disrupted frq rhythms in normal nutrient conditions, suggesting they are normally required for robust circadian rhythms. If GCN-5 and the SAGA complex are normally involved in regulating H3ac rhythms in the frq loci, how does the GCN2 pathway modulate the activity of GCN-5 and SAGA complex in conditions of amino acid starvation? Are the interactions between GCN2/4 with GCN-5 and SAGA complex different in normal vs amino acid starved conditions? The authors should clarify their model.

      As mentioned above, our data suggested that GCN-5 and ADA-2 are required for robust circadian rhythms under normal conditions. As suggested, we did detect dampened rhythmic expression of frq in the gcn-5KO and ada-2KO strains under amino acid starvation (Figure 5D and 5E and Figure 5–figure supplement 1E and 1F). We also performed Co-IP to compare the difference of interactions between CPC-1 with ADA-2 and GCN5 with ADA-2 under normal and amino acid starved conditions. The results showed that although the Myc.GCN-5, MYC.CPC-1 or Flag.ADA-2 protein level was repressed by 3 mM 3-AT treatment (likely due to global translational inhibition by induced eIF2α phosphorylation) (Karki S et al. 2020, PMID: 32355000), the interactions between CPC-1 with ADA-2 and GCN-5 with ADA-2 were almost the same under normal and amino acid starved conditions (IP was normalized with Input) (Figure 4B and 4C). These results indicated that amino acid starved conditions had little impact on the protein interactions between CPC-1 with GCN-5 and SAGA complex.

      In our model, we proposed that amino acid starvation resulted in compact chromatin structure (due to decreased H3ac) in the frq promoter in the WT strain (Figure 3B), likely due to activation of histone deacetylases or inhibition of histone acetyltransferases. Amino acid starvation activates GCN2 pathway and induces CPC-1 expression. The induced CPC-1 can recruit GCN5-containing SAGA complex to the frq promoter to loosen the chromatin structure, promoting frq rhythmic transcription under starvation conditions. However, in the cpc-3KO mutants, CPC-1 could not effectively recruit GCN5 containing SAGA complex to frq promoter, resulting in arrhythmic frq transcription. We have now clarified our model in the revised discussion.

      (3) Given that the GCN2 pathway is important for nutrient sensing, the authors should not disregard the alternative hypothesis that the GCN2 pathway may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions.

      Thanks for the suggestion. We have now discussed the alternative hypothesis in the revised manuscript: “Because GCN2 signaling pathway is important for nutrient sensing, it may be important for nutrient compensation and plays a role in maintaining the robustness of rhythms in a range of nutrient conditions”.

      (4) The authors should use circadian statistics to compute the phase and amplitude of the mRNA, DNA binding of the WC complex, and H3Ac rhythms. This will allow them to compare between rhythms and provide statistical significance values, rather than just providing qualitative descriptions. This will be valuable when comparing rhythms between strains and between nutrient conditions.

      As suggested, we used CircaCompare to analyze our data.
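
      For readers less familiar with circadian statistics, the kind of phase and amplitude estimate requested in point (4) can be illustrated with a single-component cosinor fit. This is only a minimal sketch and is not CircaCompare itself (which is an R package that additionally tests differences between groups); the function name `cosinor_fit`, the fixed 24-hour period, and the synthetic time series are assumptions made here for illustration.

```python
import numpy as np

def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y ~ mesor + amplitude * cos(2*pi*t/period - phase).

    Hypothetical helper for illustration; the period is assumed known.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    # Linearised model: y = mesor + b*cos(omega*t) + c*sin(omega*t),
    # with b = amplitude*cos(phase) and c = amplitude*sin(phase).
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    (mesor, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = float(np.hypot(b, c))
    phase = float(np.arctan2(c, b)) % (2.0 * np.pi)  # radians
    return {
        "mesor": float(mesor),
        "amplitude": amplitude,
        "peak_time_h": phase / omega,  # time of the fitted peak, in hours
    }

if __name__ == "__main__":
    # Synthetic, noise-free example sampled every 4 h over two days.
    t = np.arange(0, 48, 4)
    y = 10 + 3 * np.cos(2 * np.pi * (t - 6) / 24)
    print(cosinor_fit(t, y))
```

      Fitting each strain or condition separately in this way yields amplitudes and peak times that can be compared directly; a formal test of whether they differ needs a dedicated method such as the one the authors used.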

      Reviewer #3 (Public Review):

      This is an important paper anchored by the observation that cultures of Neurospora undergoing amino acid starvation lose circadian rhythmicity if orthologs in the classic GCN2/CPC-3 cross-pathway control system are absent. Data convincingly show that Neurospora orthologs of Saccharomyces GCN2 and GCN4 (CPC-3 and CPC-1 respectively) are needed to promote histone acetylation at the core clock gene frequency to facilitate rhythmicity. While the binding of CPC-1 and thereby GCN-5 are plainly rhythmic, the explanation of exactly where rhythmicity enters the pathway is incomplete.

      Figure 1 shows that inhibition of the HIS-3 activity affected by 3-AT, which should trigger cross-pathway control, is correlated with a graded reduction in the amplitude of the rhythm, and eventually to arrhythmicity at 3 mM 3-AT. While normalized data are shown in Figure 1B, raw data should also be provided in the Supplement as sometimes normalization hides aspects of the data. Ideally, this would be on the same scale in wt and in mutant strains.

      We revised as suggested and added the raw data. The results are now shown in Figure 1–figure supplement 2A and 2B and Figure 5–figure supplement 1B and 1C.

      Figure 2. The logical conclusion from Fig 1 is that circadian frq expression driven by the WCC has been impacted by amino acid starvation in the mutants. If so, either WC-1/WC-2 levels might be low, or else they might not be able to bind to DNA. When this was assessed, ChIP assays showed a loss of DNA binding. Although documented, an interesting result is that WCC protein amounts are sharply increased, especially for WC-1. The authors could comment on possible causes for this.

      Line 176, "hypophosphorylation of WC-1 and WC-2 (which is normally associated with WC activation . . . )". While the authors are correct that this is often the case it is not always the case and this introduces a potentially interesting caveat. That is, the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription. This was first noticed by Zhou et al., (2018 PLOS Genetics) who reported that even though WCC is always hyperphosphorylated in ∆csp-6, the core clock maintains a normal circadian period with only minor amplitude reduction. This should be noted, cited, and discussed.

      Thanks for the suggestion. We revised the manuscript as suggested, “It should be noted that the overall phosphorylation status of WCC does not always reflect its activity in driving frq transcription, possibly due to the unknown function of multiple key phosphosites on WCC (Wang et al., 2019; X. Zhou et al., 2018)”.

      Figure 2 and Figure 2 Suppl. report different gel conditions and show that the sharply increased WC1/WC-2 levels seen in Fig 2 resulting from 3-AT treatment of the cpc pathway mutants are due to the accumulation of hypophosphorylated WC-1/2. The conclusion would be stronger if the gels in the Supplement showed the same degree of difference between wt and mutants as seen in Fig 2. In any case, these hypophosphorylated WC should be active and able to bind DNA but plainly are not based on Fig 2.

      Thanks for the comments. It’s correct that WC-1/WC-2 were hypo-phosphorylated and their protein levels were increased (Figure 2 and Figure 2-figure supplement 1). However, the reduced binding of WC-1/WC-2 at the frq promoter explains the reduced frq transcription in the cpc-1KO or cpc-3KO mutants under amino acid starvation.

      Figure 3 correlates the unexpected loss of DNA binding by hypophosphorylated WCC with reduced histone H3 acetylation at frq. The 3 mM 3-AT reported to result in arrhythmicity in cpc mutants in Figures 1 and 2 results in a small (~20%?) and not statistically significant reduction in H3 acetylation in wt, compatible with the sustained rhythms seen in wt in Figure 1, but in a substantial (~5 fold) loss of binding in the ∆cpc-1 background; so CPC-1 is needed for H3 acetylation at frq to sustain the rhythm during amino acid starvation. The simplest explanation here then is that the hypophosphorylated WCC cannot bind to DNA because the chromatin is closed due to decreased AcH3.

      Thanks for the comments.

      Figure 4. Title:" Figure 4. CPC-1 recruits GCN-5 to activate frq transcription in response to amino acid starvation"; the conditions of amino acid starvation should be mentioned here for the reader's benefit. (In the unlikely case that there was no amino acid starvation here then many things about the manuscript need to be reconsidered.)

      Based on the model from yeast where amino acid starvation activates GCN2 (aka CPC-3 in Neurospora) kinase which activates the transcriptional activator GCN4 (aka CPC-1) which recruits the SAGA complex containing the histone acetylase GCN5 to regulated promoters, CPC-1 was tagged and shown by ChIP to bind rhythmically at frq. Co-IP experiments establish the interaction of components of the SAGA complex in Neurospora and Neurospora GCN-5 indeed is bound to frq, likely recruited by CPC-1. This part all follows the Saccharomyces model with the interesting twist that the binding CPC-1 is weakly rhythmic and GCN-5 strongly rhythmic in a CPC-1-dependent manner. Based on the figure legend title, these cultures should always be starved for amino acids (although as noted this should be made explicit in the figure legend). In any case, given this, from where does the rhythmicity in GCN-5-binding arise? This question is developed more below.

      Line 224, "low in the cpc-1KO strain, suggesting that CPC-1 rhythmically recruit GCN-5". Because ChIP was done only for a half circadian cycle (DD10-22), it is hard to conclude "rhythmically". The statement should be modified.

      To address the concern, we performed the ChIP assay using the CPC-1 antibody instead of Myc antibody (revised Figure 4A). Analysis of the ChIP results with CircaCompare showed that CPC-1 binding at the frq promoter was rhythmic without 3-AT (Figure 4A) or with 3 mM 3-AT treatment (Figure 4-figure supplement 1A). Due to the ADA-2-GCN5 and CPC-1-ADA-2 interactions with/without 3-AT treatment (Revised Figure 4B-C), CPC-1 should be able to recruit GCN-5-containing SAGA complex to activate frq transcription in response to amino acid starvation. We have now clarified this model in the revised manuscript. Please also see response to Reviewer 2/point 5.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted the rhythmic frq transcription through recruiting GCN5 containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

      As suggested by the reviewer, we modified the statement "suggesting that CPC-1 recruits GCN-5-containing SAGA complex to the frq promoter".

      Figure 5 shows that rhythmicity in general and of frq/FRQ specifically requires GCN-5 even under conditions of normal amino acid sufficiency, and that normal levels of H3 acetylation and its rhythm at frq require GCN-5. Not surprisingly, high H3 acetylation at frq correlated with high WC-2 DNA binding, and ADA-2 is required for SAGA functions.

      As earlier, raw bioluminescence data corresponding to panel B should be provided in the figure or Supplement.

      Also, if CPC-3 and CPC-1 regulate frq transcription through GCN-5, why is the frq level extremely low in the cpc-3KO or cpc-1KO(Fig.1D) but remains normal in gcn-5KO (Fig. 5D)?

      Raw bioluminescence data are listed in Figure 5–figure supplement 1B and 1C. For frq transcription in the WT and gcn-5KO mutant, please see response to Essential Revisions point 4.

      Figure 6 documents the counter effects of TSA which inhibits histone deacetylation and shortens the period versus 3-AT which decreases (via CPC-3 to CPC-1 to GCN-5) histone acetylation and causes period lengthening or arrhythmicity. HDA-1 is necessary for histone deacetylation at frq.

      Thanks for the comments.

      Figure 7 documents extensive changes in gene expression associated with 3-AT-induced amino acid starvation and the CPC-3 to CPC-1 pathway. How do these results compare with other previously studied systems, particularly Saccharomyces, where similar experiments have been done? Are the same genes regulated to the same extent or are there some interesting differences?

      Thanks for the suggestion. We revised our manuscript by comparing the difference of these genes in Saccharomyces. GCN4/CPC-1 targets are similar. “Similar to Saccharomyces cerevisiae (Natarajan et al., 2001), genes in amino acid biosynthetic pathways, vitamin biosynthetic enzymes, peroxisomal components, and mitochondrial carrier proteins were also identified as CPC-1 targets”.

      Figure 8 provides a model consistent with the role of the CPC-3/GCN2 pathway in regulating genes in response to amino acid starvation. It seems this could be any gene responding to amino acid starvation.

      Not accounted for in the model is the data from Fig 4 which show the rhythmic binding of CPC-1 and stronger rhythmic binding of GCN-5 to frq, both under amino acid starvation. In the presence of 3-AT, amino acid starvation is constant, which should mean that CPC-3 and CPC-1 would always be "on". Why doesn't CPC-1 recruit GCN5 at the same level at all times leading to constant high H3 acetylation rather than rhythmic H3 acetylation as seen in Figure 3? Perhaps, unlike the statement in lines 345-34, it is WCC that regulates rhythmic GCN-5 binding and facilitates rhythmic histone acetylation at frq. Or perhaps the clock introduces rhythmicity upstream from GCN5. Without an answer to the question of where rhythmicity comes into the pathway, the story is only about how the CPC-3/GCN2 pathway regulates genes in response to amino acid starvation; without explaining the rhythmicity the story seems incomplete.

      It was previously reported that the CPC-3/CPC-1 signaling pathway was rhythmically controlled by circadian clock, as indicated by CPC-3-mediated rhythmic eIF2α phosphorylation at serine 51 (Karki S et al. 2020, PMID: 32355000). Our data showed rhythmic CPC-1 and GCN-5 binding at the frq promoter in the WT strain and decreased GCN-5 binding in the cpc-1KO mutant (Figure 4A and 4D). These results suggested that the circadian clock controlled the CPC-3/CPC-1 signaling pathway rhythmically, which in turn promoted the rhythmic frq transcription through recruiting GCN5 containing SAGA complex under amino acid starvation. We clarified the model and description in the discussion.

    1. Author Response

      Reviewer 2 (Public review):

      A quasi-experimental before and after design as the methodological intention should be stated in the article. Although there are equally powerful alternatives with arguably less-stringent requirements that are appropriate and well-tested for natural experiments such as that intervened by the COVID-19 pandemic given the simulation methods, as of now obtaining the actual stage distribution of cancer and the cancer-specific mortality rates before and after the pandemic is possible for making scientifically valid conclusions based on observed data to support the simulation study.

      We agree with the reviewer that a modelled before-and-after analysis would have been informative. However, the required mortality and cancer stage distribution data to inform this analysis is not yet available for Australia. In future, our modelled predictions can be compared to emergent observed national stage and mortality data. The current paper presents estimates that were modelled during rapid-response modelling commissioned by the Australian Government early in the pandemic. Findings therefore demonstrate what could be done with the information available at that time. We have amended, as shown in bold below, the end of the introduction as follows:

      “We demonstrate what could be estimated by a rapid response evaluation based on information available early in the pandemic, and discuss how these estimates relate to subsequent observed disruptions to screening. In future, our modelled predictions can be compared to emergent observed national stage and mortality data.”

      The screening disruption is the only concerned parameter in modelling the change of cancer progression in this study. But delayed diagnosis after screening as another concern could be possibly affected by the pandemic. This should be taken into consideration in the simulation. The authors also claimed the cancer treatment could also be affected by the pandemic, the evaluation on mortality is therefore not feasible. However, the impacts of COVID-19 pandemic on the delayed treatment and cancer treatment are important issues which should be covered by simulation study.

      We clearly state that this is a limitation of the current study. We have added the following sentence to the discussion, lines 377-379.

      ‘These effects will be incorporated in future modelled evaluations, after careful calibration and validation to observed data, with a view to extending the modelled outcomes to mortality estimates.’

      For the simulations, confidence intervals for the outcomes should be provided so that the reliability of the estimates can be judged.

      The manuscript aims to present indicative estimates for a range of scenarios, with numerous simplifying assumptions as indicated. In this context, generating meaningful uncertainty intervals is not feasible or appropriate.

    1. Author Response

      Reviewer #1 (Public Review):

      There has been a lot of work showing that multi-peaked tuning curves contain more information than single peaked ones. If that's the case, why are single-peaked tuning curves ubiquitous in early sensory areas? The answer, as shown clearly in this paper, is that multi-peaked tuning curves are more likely to produce catastrophic errors.

      This is an extremely important point, and one that should definitely be communicated to the broader community. And this paper does an OK job doing that. However, it suffers from two (relatively easily fixable) problems:

      I) Unless one is an expert, it's very hard to extract why multi-peaked tuning curves lead to catastrophic errors.

      II) It's difficult to figure out under what circumstances multi-peaked tuning curves are bad. This is important, because there are a lot of neurons in the sensory cortex, and one would like to know whether multi-peaked tuning curves are really a bad idea there.

      And here are the fixes:

      I) Fig. 1c is a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). However Fig. 1c shows the average log likelihood, which makes it hard to understand what goes wrong. It would really help if Fig. 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multi-peaked tuning curves but not for single peaked ones. You could also indicate why, when multi-peaked tuning curves do give the right answer, the error tends to be small.

      We thank the reviewer for this suggestion. We have now split the first figure into two.

      In the new Figure 1, we provide an intuitive explanation of local vs catastrophic errors and single-peaked / periodic tuning curves. We have also added smaller panels to illustrate how the distribution of errors changes with decoding time (using a simulated single-peaked population).

      The new Figure 2 shows sampled likelihoods for 3 different populations. We hope this provides some intuitive understanding of the phase shifts. Unfortunately, it proved difficult not to normalize the “height” of each module’s likelihood as they can differ by several orders of magnitude across the modules. However, due to the multiplication, the peak likelihood values can (approximately) be disregarded in the ML-decoding. Lastly, we have also added more simulation points (scale factors) compared to what we had in the earlier version of the figure (see panels d-e).
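
      The idea of sampled likelihoods and maximum-likelihood (ML) decoding discussed here can be sketched in a few lines. The following is a hedged illustration only, not the model or parameters used in the manuscript: it assumes independent Poisson neurons with von Mises-shaped tuning on the stimulus interval [0, 1), and the function names (`population_rates`, `log_likelihood`, `ml_decode`) and all parameter values (`amp`, `base`, `kappa`, the module periods) are hypothetical.

```python
import numpy as np

def population_rates(s, periods, n_per_module=20, amp=20.0, base=1.0, kappa=4.0):
    """Firing rates in Hz, shape (len(s), len(periods) * n_per_module).

    Each module tiles one spatial period `lam` with evenly spaced preferred
    phases; a period of 1 gives single-peaked tuning on [0, 1), a shorter
    period gives multi-peaked (periodic) tuning. All values are assumptions.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    rates = []
    for lam in periods:
        centers = np.arange(n_per_module) * lam / n_per_module
        phase = 2.0 * np.pi * (s[:, None] - centers[None, :]) / lam
        rates.append(base + amp * np.exp(kappa * (np.cos(phase) - 1.0)))
    return np.concatenate(rates, axis=1)

def log_likelihood(counts, grid_rates, T):
    """Poisson log-likelihood on the stimulus grid, up to a constant in s."""
    return np.log(grid_rates * T) @ counts - T * grid_rates.sum(axis=1)

def ml_decode(s_true, periods, T, rng, grid=None):
    """Sample spike counts for s_true and return the grid point maximising the likelihood."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001, endpoint=False)
    grid_rates = population_rates(grid, periods)
    counts = rng.poisson(population_rates(s_true, periods)[0] * T)
    return grid[np.argmax(log_likelihood(counts, grid_rates, T))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 1.0 / np.sqrt(2.0)  # example scale factor between modules
    populations = {"single-peaked": (1.0, 1.0, 1.0), "periodic": (1.0, c, c**2)}
    for name, periods in populations.items():
        errs = np.array([abs(ml_decode(0.3, periods, T=0.02, rng=rng) - 0.3)
                         for _ in range(200)])
        print(name, "median error:", np.median(errs),
              "fraction of errors > 0.1:", np.mean(errs > 0.1))
```

      Varying `T` and the periods in such a sketch gives a rough feel for how local and large (catastrophic) errors trade off with decoding time, although any quantitative conclusion should rest on the analysis in the manuscript itself.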

      II) What the reader really wants to know is: would sensory processing in real brains be more efficient if multi-peaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with single peaked tuning curves and a good code with multi-peaked tuning curves. My guess is that a good code would have lambda_1=1 and c around 0.5 (you could use the module ratio the grid cell people came up with -- I think 1/sqrt(2) -- although I doubt if it matters much). My guess is that it's the total number of spikes, rather than the number of neurons, that matters. Some metric of performance (see point 1 below) versus the contrast of the stimulus and the number of spikes would be invaluable.

      We thank the reviewer for this comment and the suggestions. We agree that, ideally, such an expression would be useful. However, as you note, it is a very challenging task due to the large parameter space (number of neurons, peak amplitude, spontaneous firing rate, width of tuning, stimulus dimensionality, etc.) and is beyond the scope of the present study. We have instead included a new figure (see Figure 7 in the manuscript) detailing the minimal decoding times for various choices of parameter values. We believe this gives an indication of how minimal decoding time scales with various parameters.

    1. Author Response:

      Reviewer #1 (Public Review):

      […] This novel system could serve as a powerful tool for loss-of-function experiments that are often used to validate a drug target. Not only can this tool be applied in exogenous systems (like EGFRdel19 and KRASG12R in this paper), but the authors also successfully demonstrated that ARTi can be used in endogenous systems by CRISPR knock-in of the ARTi target sites into the 3'UTR of the gene of interest (like STAG2 in this paper).

      We thank the referee for highlighting the novelty and potential of the ARTi system.

      ARTi enables specific, efficient, and inducible suppression of these genes of interest, and can potentially improve therapeutic target validations. However, the system cannot be easily generalized as there are some limitations in this system:

      • The authors claimed in the introduction that CRISPR/Cas9-based methods are associated with off-target effects; however, the authors' system requires the use of CRISPR/Cas9 to knock out a given endogenous gene or to knock in ARTi target sites to the 3' UTR of the gene of interest. Though the authors used a transient CRISPR/Cas9 system to minimize the potential off-target effects, the advantages of ARTi over CRISPR are likely less than claimed.

      We thank the reviewer for raising these very valid concerns about potential off-target effects related to the CRISPR/Cas9-based gene knockout or engineering of endogenous ARTi target sites. In contrast to conventional RNAi- and CRISPR-based approaches, such off-target effects can be investigated prior to loss-of-function experiments through comparison between parental and engineered cells, which in the absence of CRISPR-induced off-target events are expected to be identical. Subsequent ARTi experiments provide full control over RNAi-induced off-target activities through comparison of target-site engineered and parental cells. However, we agree that undetected CRISPR/Cas9-induced off-target events cannot be ruled out in a definitive way, which we will point out in our revised manuscript.

      • Instead of generating gene-specific loss-of-function triggers for every new candidate gene, the authors identified a universal and potent ARTi to ensure standardized and controllable knockdown efficiency. It seems this would save time and effort in validating loss-of-function siRNAs/sgRNAs for each gene. However, users will still have to design and validate the best sgRNA to knock out endogenous genes or to knock in ARTi target sites by CRISPR/Cas9. The latter is by no means trivial. Users will need to design and clone an expression construct for their cDNA replacement construct of interest, which will still be challenging for big proteins.

      We fully agree that the required design of gene-specific sgRNAs and subsequent CRISPR-engineering steps are by no means trivial. However, we believe that decisive advantages of the method, in particular the robustness of LOF perturbations and additional means for controlling off-target activities, can make ARTi an investment that pays off. In our experience, much time can be lost in the search for effective LOF reagents, and even when these are found, questions about off-target activity remain. While ARTi overcomes many of these challenges by providing a standardized experimental workflow, we do not propose to replace all other LOF approaches by this method. Instead, we would position ARTi as a unique orthogonal approach for the stringent validation and in-depth characterization of candidate target genes, as we will highlight in our revised discussion.

      • The approach of knocking-out an endogenous gene followed by replacement of a regulatable gene can also be achieved using regulated degrons, and by tet-regulated promoters included in the gene replacement cassette. The authors should include a discussion of the merits of these approaches compared with ARTi.

      We thank the reviewer for pointing out these alternative LOF methods. We had already included a brief discussion of chemical-genetic LOF methods based on degron tags. While we certainly share the current excitement about degron technologies, they inevitably require changes to the coding sequence of target proteins, which can alter their regulation and function in ways that are hard to control for. In our revised discussion, we will add a brief comparison to conventional tet-regulatable expression systems, which unlike ARTi require the use of ectopic tet-responsive promoters. Overall, we would position ARTi as an orthogonal tool that enables inducible and reversible LOF perturbations without changing the coding sequence and the endogenous transcriptional control of candidate target genes.

      Reviewer #2 (Public Review):

      […] The system is very innovative, likely easy to be established and used by the scientific community and thus very meaningful.

      We thank the reviewer for their enthusiasm about ARTi.

  2. Feb 2023
    1. Author Response

      Reviewer #1 (Public Review):

      Starrett, Gabriel et al. investigated 43 bladder cancers (primary tumors), 5 metastases and 14 normal tissues from 43 solid organ transplant recipients of 5 Transplant Cancer Match Study participating registries (US) for the presence of viral genetic signatures, their host genome integration and possible contribution in carcinogenesis. They isolated DNA and RNA from FFPE tissues to perform state of the art whole genome and transcriptome sequencing. They find that 20 of the primary tumors, 3 of the metastases and 7 of the normal tissues harbor viral signatures with BKPyV and JCPyV being the most prevalent viruses detected. The bulk of the experiments focuses on the 9 BKPyV-positive primary tumors. They report that several of the BKPyV-positive tumors show host genome integration of BKPyV with associated focal amplifications of adjacent host chromosome regions, with chromosome 1 being the most prevalent. Furthermore, BKPyV-positive tumors show a distinct transcriptomic signature with gene expression changes related to DNA damage responses, cell cycle progression, angiogenesis, chromatin organization, mitotic spindle assembly, chromosome condensation/separation and neuronal differentiation. The authors only touch on the features of other virus-positive tumors, e.g. those with JCPyV and HPV signals, without offering further detail or thought. The overall mutation signature analysis reveals no clear correlation between presence of viral sequences and tumor mutation burden suggesting that many different, virus-unrelated, factors possibly contribute to bladder cancer genesis and progression. Most striking are cases potentially linked to aristolochic acid, APOBEC3 and SBS5. Thus, while the approach is state-of-the-art, the causality of viral signatures and oncogenesis and vice versa remains unsolved.

      Strengths:

      1) The study assesses 43 primary tumors, 5 metastases and 14 normal tissues from 43 solid organ transplants of different kinds (24x kidney, 4x liver, 14x heart and/or lung, 1x pancreas) rather than just focusing on a particular organ.

      2) The study makes use of whole genome sequencing and transcriptomics and the assayed material is extracted from FFPE tissue, which shows a high level of practical, technical and computational skills and expertise.

      Weaknesses:

      1) There have been multiple inconsistencies in sample number and figure references throughout the publication. Is it 19 or 20 cases that have viral sequences detected? A comprehensive checker board table showing all cases, the available tissue samples and respective analyses would be in order.

      We would like to thank the reviewer for their detailed assessment of the manuscript. A checkerboard table of all samples, tissues, and analyses has been added as supplemental table 1 (Supplementary file 1a).

      2) The overall low coverage of the whole genome sequencing, which the authors mention, and the relatively big variation in coverage in both datasets (WGS, transcriptomics) are major limitations of the study. Possibly, this was done to increase specificity, but sorting out and discarding reads may also be problematic. Please comment.

      Besides performing quality and adapter trimming as described in the methods, we did not discard any reads. Experimental design and analysis were conducted to be as inclusive as possible considering the rarity of these specimens.

      Reviewer #2 (Public Review):

      Starrett et al performed whole genome and transcriptome sequencing of bladder cancers from 43 organ transplant recipients. They found that most of these tumors contained DNA from one of four viruses (BKPyV, JCPyV, HPV, and TTV). Viral genomes are most often integrated into the genomes of these tumor cells and the authors provide evidence that the integration utilized the POL theta-mediated end joining pathway. In most cases, viral RNA was detected in tumors with viral DNA. This suggests that the viruses are actively altering the cellular environment. Frequently, this resulted in similarities for overall gene expression patterns in the tumors that were grouped by the type of virus present in the tumor. Moreover, the changes in expression linked with viral gene expression were found in genes relevant to tumorigenesis. Immunohistochemical detection of viral proteins in these tumors also demonstrated active viral gene expression. However, the presence of viral proteins was heterogenous within the tumor, with between 1 and 100% of the tumor staining positive for BKPyV large T antigen. An analysis of mutational signatures in these tumors indicate that the viruses are also shaping the tumor genome by inducing mutations. Evidence that specific viruses are contributing to tumorigenesis in organ transplant patients has fundamental implications for preventing tumorigenesis in these patients.

      The conclusions of this paper are generally well supported by the data provided. Indeed, there is little doubt that viral infections are more likely in these tumors. However, there are aspects of the paper that could be improved and/or clarified. Most importantly, despite the strong evidence that the viruses are altering the tumor cell environment, it is unclear if these changes are necessary for tumorigenesis or, less excitingly, the result of an even more immune suppressive environment within the tumor. The heterogeneity of the LT expression suggests that the presence of the viral DNA and RNA may not be enough to assess whether it is actively contributing to the tumor. Is an increased frequency of viral protein staining linked with any evidence of an active contribution to tumorigenesis (e.g., fewer tumor-suppressor/oncogene mutations)? This might be easiest to assess with the tumors that have oncogenic HPV DNA. If those tumors lacked p53 and RB mutations, it would support a causative role for the virus.

      We thank the reviewer for their thoughtful review. Indeed, in Figure 6 we show that no BKPyV-positive or HPV-positive tumor harbored mutations in RB1. Additionally, only one BKPyV-positive tumor and none of the HPV-positive tumors had a mutation in TP53. We have added further emphasis to this point on page 14, “None of the HPV-positive tumors with WGS harbored mutations in TP53 or RB1. Similarly, none of the polyomavirus-positive tumors harbored mutations in RB1 and only TBC08 had a frameshift mutation in TP53.”

    1. Author Response

      Reviewer #1 (Public Review):

      Buglak et al. describe a role for the nuclear envelope protein Sun1 in endothelial mechanotransduction and vascular development. The study provides a full mechanistic investigation of how Sun1 is achieving its function, which supports the concept that nuclear anchoring is important for proper mechanosensing and junctional organization. The experiments have been well designed and were quantified based on independent experiments. The experiments are convincing and of high quality and include Sun1 depletion in endothelial cell cultures, zebrafish, and in endothelial-specific inducible knockouts in mice.

      We thank the reviewer for their enthusiastic comments and for noting our use of multiple model systems.

      Reviewer #2 (Public Review):

      Endothelial cells mediate the growth of the vascular system but they also need to prevent vascular leakage, which involves interactions with neighboring endothelial cells (ECs) through junctional protein complexes. Buglak et al. report that the EC nucleus controls the function of cell-cell junctions through the nuclear envelope-associated proteins SUN1 and Nesprin-1. They argue that SUN1 controls microtubule dynamics and junctional stability through the RhoA activator GEF-H1.

      In my view, this study is interesting and addresses an important but very little-studied question, namely the link between the EC nucleus and cell junctions in the periphery. The study has also made use of different model systems, i.e. genetically modified mice, zebrafish, and cultured endothelial cells, which confirms certain findings and utilizes the specific advantages of each model system. A weakness is that some important controls are missing. In addition, the evidence for the proposed molecular mechanism should be strengthened.

      We thank the reviewer for their interest in our work and for highlighting the relative lack of information regarding connections between the EC nucleus and cell periphery, and for noting our use of multiple model systems. We thank the reviewer for suggesting additional controls and mechanistic support, and we have made the revisions described below.

      Specific comments:

      1) Data showing the efficiency of Sun1 inactivation in the murine endothelial cells is lacking. It would be best to see what is happening on the protein level, but it would already help a great deal if the authors could show a reduction of the transcript in sorted ECs. The excision of a DNA fragment shown in the lung (Fig. 1-suppl. 1C) is not quantitative at all. In addition, the gel has been run way too short so it is impossible to even estimate the size of the DNA fragment.

      We agree that the DNA excision is not sufficient to demonstrate excision efficiency. We attempted examination of SUN1 protein levels in mutant retinas via immunofluorescence, but to date we have not found a SUN1 antibody that works in mouse retinal explants. We argue that mouse EC isolation protocols enrich but don’t give 100% purity, so that RNA analysis of lung tissue also has caveats. Finally, we contend that our demonstration of a consistent vascular phenotype in Sun1iECKO mutant retinas argues that excision has occurred. To test the efficiency of our excision protocol, we bred Cdh5CreERT2 mice with the ROSAmT/mG excision reporter (cells express tdTomato absent Cre activity and express GFP upon Cre-mediated excision; Muzumdar et al., 2007). Utilizing the same excision protocol as used for the Sun1iECKO mice, we see a significantly high level of excision in retinal vessels only in the presence of Cdh5CreERT2 (Reviewer Figure 1).

      Reviewer Figure 1: Cdh5CreERT2 efficiently excises in endothelial cells of the mouse postnatal retina. (A) Representative images of P7 mouse retinas with the indicated genotypes, stained for ERG (white, nucleus). tdTomato (magenta) is expressed in cells that have not undergone Cre-mediated excision, while GFP (green) is expressed in excised cells. Scale bar, 100μm. (B) Quantification of tdTomato fluorescence relative to GFP fluorescence as shown in A. tdTomato and GFP fluorescence of endothelial cells was measured by creating a mask of the ERG channel. n=3 mice per genotype. ***, p<0.001 by Student’s two-tailed unpaired t-test.
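      The mask-based quantification described in panel B can be sketched as follows. This is only an illustration under stated assumptions, not the analysis pipeline used for the figure: images are represented as nested lists of pixel intensities, the ERG threshold is arbitrary, and the function names are hypothetical.

```python
def masked_mean(channel, mask):
    """Mean intensity of `channel` over pixels where `mask` is True."""
    vals = [v for row_v, row_m in zip(channel, mask)
            for v, m in zip(row_v, row_m) if m]
    return sum(vals) / len(vals)

def tomato_to_gfp_ratio(tdtomato, gfp, erg, erg_threshold):
    """Ratio of mean tdTomato to mean GFP signal inside an ERG-derived nuclear mask."""
    # Assumed masking rule: a pixel belongs to an endothelial nucleus if ERG exceeds the threshold.
    mask = [[px > erg_threshold for px in row] for row in erg]
    return masked_mean(tdtomato, mask) / masked_mean(gfp, mask)

# Toy 2x2 "images": two ERG-positive pixels, mostly excised (GFP high, tdTomato low).
erg = [[0, 200], [180, 10]]
tdtomato = [[50, 10], [30, 90]]
gfp = [[5, 100], [60, 2]]
ratio = tomato_to_gfp_ratio(tdtomato, gfp, erg, erg_threshold=100)
```

      A low ratio in this toy example corresponds to efficient excision within ERG-positive nuclei; signal outside the mask does not contribute.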

      2) The authors show an increase in vessel density in the periphery of the growing Sun1 mutant retinal vasculature. It would be important to add staining with a marker labelling EC nuclei (e.g. Erg) because higher vessel density might reflect changes in cell size/shape or number, which has also implications for the appearance of cell-cell junctions. More ECs crowded within a small area are likely to have more complicated junctions. Furthermore, it would be useful and straightforward to assess EC proliferation, which is mentioned later in the experiments with cultured ECs but has not been addressed in the in vivo part.

      We concur that ERG staining is important to show any changes in nuclear shape or cell density in the post-natal retina. We now include this data in Figure 1-figure supplement 1F-G. We do not see obvious changes in nuclear shape or number, though we do observe some crowding in Sun1iECKO retinas, consistent with increased density. However, when normalized to total vessel area, we do not observe a significant difference in the nuclear signal density in Sun1iECKO mutant retinas relative to controls.

      3) It appears that the loss of Sun1/sun1b in mice and zebrafish is compatible with major aspects of vascular growth and leads to changes in filopodia dynamics and vascular permeability (during development) without severe and lasting disruption of the EC network. It would be helpful to know whether the loss-of-function mutants can ultimately form a normal vascular network in the retina and trunk, respectively. It might be sufficient to mention this in the text.

      We thank the reviewer for pointing this out. It is true that developmental defects in the vasculature resulting from various genetic mutations are often resolved over time. We’ve made text changes to discuss viability of Sun1 global KO mice and lack of perduring effects in sun1 morphant fish, perhaps resulting from compensation by SUN2, which is partially functionally redundant with SUN1 in vivo (Lei et al., 2009; Zhang, et al., 2009) (p. 20).

      4) The only readout after the rescue of the SUN1 knockdown by GEF-H1 depletion is the appearance of VE-cadherin+ junctions (Fig. 6G and H). This is insufficient evidence for a relatively strong conclusion. The authors should at least look at microtubules. They might also want to consider the activation status of RhoA as a good biochemical readout. It is argued that RhoA activity goes up (see Fig. 7C) but there is no data supporting this conclusion. It is also not clear whether "diffuse" GEF-H1 localization translates into increased Rho A activity, as is suggested by the Rho kinase inhibition experiment. GEF-H1 levels in the Western blot in (Fig. 6- supplement 2C) have not been quantitated.

      We agree that analysis of RhoA activity and additional analysis of rescued junctions strengthens our conclusions, so we performed these experiments. New data (Figure 6I-J) shows that co-depletion of SUN1 and GEF-H1 rescues junction integrity as measured by biotin-matrix labeling. Interestingly, co-depletion of SUN1 and GEF-H1 does not rescue reduced microtubule density at the periphery (Figure 6-figure supplement 3B-C), placing GEF-H1 downstream of aberrant microtubule dynamics in SUN1 depleted cells. This is consistent with our model (Figure 8) describing how loss of SUN1 leads to increased microtubule depolymerization, resulting in release and activation of GEF-H1 that goes on to affect actomyosin contractility and junction integrity. In addition, we include images of the junctions in GEF-H1 single KD (Figure 6-figure supplement 3B-C) and quantify the western blot in Figure 6-figure supplement 3A.

      We performed RhoA activity assays and new data shows that SUN1 depletion results in increased RhoA activation, while co-depletion of SUN1 and GEF-H1 ameliorates this increase (Figure 6-figure supplement 2D). This is consistent with our model in which loss of SUN1 leads to increased RhoA activity via release of GEF-H1 from microtubules. In addition, we now cite a recent study describing that GEF-H1 is activated when unbound to microtubules, with this activation resulting in increased RhoA activity (Azoitei et al., 2019).

      5) The criticism raised for the GEF-H1 rescue also applies to the co-depletion of SUN1 and Nesprin-1. This mechanistic aspect is currently somewhat weak and should be strengthened. Again, Rho A activity might be a useful and quantitative biochemical readout.

      We respectfully point out that we showed that co-depletion of nesprin-1 and SUN1 rescues SUN1 knockdown effects via several readouts, including rescue of junction morphology, biotin labeling, microtubule localization at the periphery, and GEF-H1/microtubule localization. We’ve moved this data to the main figure (Figure 7B-C, E-F) to better highlight these mechanistic findings. These results are consistent with our model that nesprin-1 effects are upstream of GEF-H1 localization. We also added results showing that nesprin-1 knockdown alone does not affect junction integrity, microtubule density, or GEF-H1/microtubule localization (Figure 7-figure supplement 1B-G).

      Reviewer #3 (Public Review):

      Here, Buglak and coauthors describe the effect of Sun1 deficiency on endothelial junctions. Sun1 is a component of the LINC complex, connecting the inner nuclear membrane with the cytoskeleton. The authors show that in the absence of Sun1, the morphology of the endothelial adherens junction protein VE-cadherin is altered, indicative of increased internalization of VE-cadherin. The change in VE-cadherin dynamics correlates with decreased angiogenic sprouting as shown using in vivo and in vitro models. The study would benefit from a stricter presentation of the data and needs additional controls in certain analyses.

      We thank the reviewer for their insightful comments, and in response we have performed the revisions described below.

      1) The authors implicate the changes in VE-cadherin morphology to be of consequence for "barrier function" and mention barrier function frequently throughout the text, for example in the heading on page 12: "SUN1 stabilizes endothelial cell-cell junctions and regulates barrier function". The concept of "barrier" implies the ability of endothelial cells to restrict the passage of molecules and cells across the vessel wall. This is tested only marginally (Suppl Fig 1F) and these data are not quantified. Increased leakage of 10kDa dextran in a P6-7 Sun1-deficient retina as shown here probably reflects the increased immaturity of the Sun1-deficient retinal vasculature. From these data, the authors cannot state that Sun1 regulates the barrier or barrier function (unclear what exactly the authors refer to when they make a distinction between the barrier as such on the one hand and barrier function on the other). The authors can, if they do more experiments, state that loss of Sun1 leads to increased leakage in the early postnatal stages in the retina. However, if they wish to characterize the vascular barrier, there is a wide range of other tissue that should be tested, in the presence and absence of disease. Moreover, a regulatory role for Sun1 would imply that Sun1 normally, possibly through changes in its expression levels, would modulate the barrier properties to allow more or less leakage in different circumstances. However, no such data are shown. The authors would need to go through their paper and remove statements regarding the regulation of the barrier and barrier function since these are conclusions that lack foundation.

      We thank the reviewer for pointing out that the language used regarding the function and integrity of the junctions is confusing, although we suggest that the endothelial cell properties measured by our assays are typically equated with “barrier function” in the literature. However, we have edited our language to precisely describe our results as suggested by the reviewer.

      2) In Fig 6g, the authors show that "depletion of GEF-H1 in endothelial cells that were also depleted for SUN1 rescued the destabilized cell-cell junctions observed with SUN1 KD alone". However, it is quite clear that Sun1 depletion also affects cell shape and cell alignment and this is not rescued by GEF-H1 depletion (Fig 6g). This should be described and commented on. Moreover please show the effects of GEF-H1 alone.

      We thank the reviewer for pointing out the effects on cell shape. SUN1 depletion typically leads to shape changes consistent with elevated contractility, but this is considered to be downstream of the effects quantified here. We updated the panel in Figure 6G to a more representative image showing cell shape rescue by co-depletion of SUN1 and GEF-H1. We present new data panels showing that GEF-H1 depletion alone does not affect junction integrity (Figure 6I-J). We also present new data showing that co-depletion of GEF-H1 and SUN1 does not rescue microtubule density at the periphery (Figure 6-figure supplement 3B-C), consistent with our model that GEF-H1 activation is downstream of microtubule perturbations induced by SUN1 loss.

      3) In Fig. 6a, the authors show rescue of junction morphology in Sun1-depleted cells by deletion of Nesprin1. The effect of Nesprin1 KD alone is missing.

      We thank the reviewer for this comment, and we now include new panels (Figure 7-figure supplement 1B-G) demonstrating that Nesprin-1 depletion does not affect biotin-matrix labeling, peripheral microtubule density, or GEF-H1/microtubule localization absent co-depletion with SUN1. These findings are consistent with our model that Nesprin-1 loss does not affect cell junctions on its own because it is held in a non-functional complex with SUN1 that is not available in the absence of SUN1.

      References

      Azoitei, M. L., Noh, J., Marston, D. J., Roudot, P., Marshall, C. B., Daugird, T. A., Lisanza, S. L., Sandí, M., Ikura, M., Sondek, J., Rottapel, R., Hahn, K. M., & Danuser, G. (2019). Spatiotemporal dynamics of GEF-H1 activation controlled by microtubule- and Src-mediated pathways. Journal of Cell Biology, 218(9), 3077-3097. https://doi.org/10.1083/jcb.201812073

      Denis, K. B., Cabe, J. I., Danielsson, B. E., Tieu, K. V, Mayer, C. R., & Conway, D. E. (2021). The LINC complex is required for endothelial cell adhesion and adaptation to shear stress and cyclic stretch. Molecular Biology of the Cell, mbcE20110698. https://doi.org/10.1091/mbc.E20-11-0698

      King, S. J., Nowak, K., Suryavanshi, N., Holt, I., Shanahan, C. M., & Ridley, A. J. (2014). Nesprin-1 and nesprin-2 regulate endothelial cell shape and migration. Cytoskeleton (Hoboken, N.J.), 71(7), 423–434. https://doi.org/10.1002/cm.21182

      Lei, K., Zhang, X., Ding, X., Guo, X., Chen, M., Zhu, B., Xu, T., Zhuang, Y., Xu, R., & Han, M. (2009). SUN1 and SUN2 play critical but partially redundant roles in anchoring nuclei in skeletal muscle cells in mice. PNAS, 106(25), 10207–10212.

      Muzumdar, M. D., Tasic, B., Miyamichi, K., Li, L., & Luo, L. (2007). A global double-fluorescent Cre reporter mouse. Genesis, 45(9), 593-605. https://doi.org/10.1002/dvg.20335

      Ueda, N., Maekawa, M., Matsui, T. S., Deguchi, S., Takata, T., Katahira, J., Higashiyama, S., & Hieda, M. (2022). Inner Nuclear Membrane Protein, SUN1, is Required for Cytoskeletal Force Generation and Focal Adhesion Maturation. Frontiers in Cell and Developmental Biology, 10, 885859. https://doi.org/10.3389/fcell.2022.885859

      Zhang, X., Lei, K., Yuan, X., Wu, X., Zhuang, Y., Xu, T., Xu, R., & Han, M. (2009). SUN1/2 and Syne/Nesprin-1/2 complexes connect centrosome to the nucleus during neurogenesis and neuronal migration in mice. Neuron, 64(2), 173–187. https://doi.org/10.1016/j.neuron.2009.08.018.

    1. Author Response

      Reviewer #1 (Public Review):

      In mammals, a small subset of genes undergoes canonical genomic imprinting, with highly biased expression as a function of the parent-of-origin allele. Recent studies, using polymorphic mouse embryos and tissues, have reevaluated the number of allele-specific expressed genes (ASE) to 3 times more than previously thought, however with most of these novel genes showing a very low ASE (50%-60% bias toward one parental allele). Here, the authors undertake a comparison of 4 datasets and a complete bioinformatic reanalysis of 3 recent allele-specific RNAseq datasets to study potential novel imprinted genes, using the recently released ISoLDE pipeline. Very few genes have been confirmed with true ASE in the different studies and/or validated by pyrosequencing analysis. However, the authors show that most of the newly discovered ASE genes lie in close proximity to already known imprinted loci and could be co-regulated by these imprinted clusters. This is important to understand how and to what extent imprinting control regions control gene expression.

      This manuscript highlights the number of potentially falsely discovered imprinted genes in previous datasets, which could result from a lack of replicates, a weak allelic ratio, or low gene expression and a lack of read depth. But the lack of overlap in the ASE-called genes (with the exception of the known imprinted genes) between the different datasets is worrying and important to discuss, as the authors did. I would have appreciated more details on the differences between the datasets that could explain the lack of reproducibility: library preparation protocol, sequencer technology, SNP calling, number of reads per SNP, bioinformatics pipeline.

      We agree and a comparison of all the studies is included in the methods section. In particular, we have now included more information on SNP calling and sequencer technology.

      Studying allele-specific expression of lowly expressed genes is difficult with technologies based on PCR amplification (library preparation, pyrosequencing) and could result in biased expression due only to the random amplification of a small pool of molecules. Could the authors compare the level of expression of their different classes of genes? Could the more robust ASE genes in their study be the more highly expressed ones? Several genes were identified in only one or two of the previous studies; were they expressed in the other studies when not defined as ASE? This would also allow defining a threshold of expression to study allelic bias in the future. To conclude, this study is an important resource for the epigenetics field and helps to better understand genomic imprinting.

      We thank you for this suggestion. We have now taken all RNAseq data that we had run through the ISoLDE pipeline and extracted the transcripts per million (TPM) expression levels for each of the genes called in the original studies. We find no overrepresentation of lowly expressed genes in the novel biased genes compared with known imprinted genes. We also looked specifically at the expression levels of the genes tested by pyrosequencing in these datasets and saw no relationship between validation and expression levels. Expression levels are consistent between studies, especially in the same tissue, indicating the lack of reproducibility between studies is not due to differing expression. These observations have been added to the manuscript.
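      For readers unfamiliar with the TPM measure mentioned above, the standard definition can be sketched in a few lines. The response does not state which quantification tool produced the TPM values, so this is only an illustration of the formula; the gene names, counts, and lengths are made up.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize read counts, then scale so values sum to 1e6."""
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

# Hypothetical read counts and transcript lengths (in base pairs).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
tpm_values = tpm(counts, lengths_bp)
```

      Because TPM values sum to one million within each sample, they give a comparable scale for asking whether a gene called as biased in one study was expressed at a similar level in another.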

      Reviewer #2 (Public Review):

      This work aims to understand genomic imprinting in the mouse and provide further insight to challenges and patterns identified in previous studies.

      Firstly, genomic imprinting studies have been surrounded by controversy, especially ~10 years ago when the explosion of sequencing data but immature methods to analyze it led to highly exaggerated claims of widespread imprinting. While the methods have improved, clear standards are not set and results still have some inconsistencies between studies. The authors first do a meta-analysis of previous studies, comparing their results and doing a useful reanalysis of the data. This provides some valuable insights into the reasons for inconsistencies and guides towards better study designs. While this work does not exactly set a common standard for the field, or provide a full authoritative catalog of imprinted loci in mouse tissues, it provides a step in that direction. I find these analyses relatively simple and straightforward, but they seem solid.

      Previous studies have described a relatively common pattern of subtle expression bias towards one parental allele, rather than the classical imprinting pattern of fully monoallelic expression. This work digs deeper into this phenomenon, using first the meta-analysis data and then also targeted pyrosequencing analysis of selected loci. The analysis is generally well done, although I did not understand why gDNA amplification bias was not systematically corrected in all cases but only if it was above a given (low) threshold. I doubt this would affect the results much though. To some extent the results confirm previously observed patterns (bimodal distribution of either subtle or full bias, and effect of distance from the core of the imprinted locus). The novel insights mostly concern individual loci, with discovery and validation of some novel genes, typically with a subtle or context-specific parental bias.

      The study also provides some insights into mechanisms, especially by analysis of existing mouse models with a deletion of the ICR of specific loci. The change in the parental bias pattern was then used to infer potential methylation and chromatin-related mechanisms in these imprinted loci, including how the subtle bias further away is achieved. There are interesting novel findings here, as well as hypotheses for further research. However, this is an area where the conclusions rely quite heavily on published research especially as this study doesn't include single-cell resolution, and it's not entirely clear how much of e.g. the Figure 7 mechanisms part is based on discoveries of this study.

      We agree that Figure 7 does not illustrate models based exclusively on data generated in this study; instead, it presents hypotheses to be tested in the coming years.

      Imprinting is a fascinating phenomenon that can be informative of mechanisms of genome regulation and parental effects in general. It is a bit of a niche area though, and the target audience of this study is likely going to be limited to specialists doing research on this specific topic. As the authors point out, the functional importance of the findings is unknown.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, the authors studied the erythropoiesis and hematopoietic stem/progenitor cell (HSPC) phenotypes in a ribosome gene Rps12 mutant mouse model. They found that RpS12 is required for both steady and stress hematopoiesis. Mechanistically, RpS12+/- HSCs/MPPs exhibited increased cycling, loss of quiescence, protein translation rate, and apoptosis rates, which may be attributed to ERK and Akt/mTOR hyperactivation. Overall, this is a new mouse model that sheds light into our understanding of Rps gene function in murine hematopoiesis. The phenotypic and functional analysis of the mice are largely properly controlled, robust, and analyzed.

      A major weakness of this work is its descriptive nature, without a clear mechanism that explains the phenotypes observed in RpS12+/- mice. It is possible that the counterintuitive activation of the ERK/mTOR pathway and the increased protein synthesis rate are a compensatory negative feedback. The direct mechanism of Rps12 loss could be studied by the acute loss of Rps12, which is doable using their floxed mice. At the minimum, this can be done in mammalian hematopoietic cell lines.

      We thank the reviewer for pointing this out. We have addressed this question by developing a new inducible conditional knockout Rps12 mouse model (see response below to major point 1).

      Below are some specific concerns that need to be addressed.

      1) Line 226. The authors conclude that "Together, these results suggest that RpS12 plays an essential role in HSC function, including self-renewal and differentiation." The reviewer has three concerns regarding this conclusion and the corresponding Figure 3. 1) The data show that RpS12+/- mice have decreased numbers of both total BM cells and multiple subpopulations of HSPCs. The frequency of HSPC subpopulations should also be shown to clarify whether the decreased HSPC numbers arise from decreased total BM cellularity or from a proportional decrease in frequency. 2) This figure characterizes phenotypic HSPCs in BM by flow and lineage cells in PB by CBC. HSC function and differentiation are not really examined in this figure, except for the colony assay in Figure 3K. The BMT data in Figure 4 actually address HSC function and differentiation. So the conclusion here should be rephrased. 3) Since all LT-, ST-HSCs, as well as all MPPs are decreased in number, how can the authors conclude that Rps12 is important for HSC differentiation? No experiments presented here were specifically designed to address HSC differentiation.

      We thank the reviewer for this excellent point. We think that the main defect is in HSC and progenitor maintenance, rather than in HSC differentiation. This is consistent with the decrease in multiple HSC and progenitor populations, as observed both by calculating absolute numbers and by frequency of the parent population (see new Supplementary Figures S2C-S2C). We have removed any references to altered differentiation from the text.

      We added data on population frequency to Supplementary Figure 2 and to the corresponding text (see lines 221-235).

      2) Figure 3A and 5E. The flow cytometry gating of HSC/MPP is not well performed or presented, especially HSC plot. Populations are not well separated by phenotypic markers. This concerns the validity of the quantification data.

      We chose a more representative HSC plot and included it in Figure 3A.

      3) It is very difficult to read bone marrow cytospin images in Fig 6F without annotation of cell types shown in the figure. It appears that WT and +/- looked remarkably different in terms of cell size and cell types. This mouse may have other profound phenotypes that need detailed examination, such as lineage cells in the BM and spleen, and colony assays for different types of progenitors, etc.

      The purpose of the bone marrow cytospin images in Figure 6F was to show the high number of apoptotic cells in the bone marrow of Rps12 KO/+ mice compared with controls. The differences in apoptosis in the LSK and myeloid progenitor populations are quantified in the flow cytometry data shown in Figure 6G-H. A detailed quantitative analysis of different bone marrow cell populations and their relative frequencies is also shown in Figures 2 and 3. In Rps12 KO/+ bone marrow, we observed a significant decrease in multiple stem cell and progenitor populations.

      4) For all the intracellular phospho-flow shown in Fig 7, both a negative control of a fluorescent 2nd antibody only and a positive stimulus should be included. It is very concerning that the histograms show no significant changes in pAKT and pERK signaling (MFI) after SCF stimulation in WT LSKs. There are no distinct peaks that indicate non-phospho-proteins and phosphoproteins. This casts doubt on the validity of the results. It is possible, though, that Rps12+/- cells have a very high basal level of activation of the pAKT/mTOR and pERK pathways. This again may point to a negative feedback mechanism of Rps12 haploinsufficiency.

      It is true that we did not observe an increase in pAKT, p4EBP1, or pERK in control cells in every case. This is often an issue with these specific phospho-flow cytometry antibodies, as they are not very sensitive, and the response to SCF is very time-dependent. We did observe an increase in pS6 with SCF in both LSK cells and progenitors (Figure 7B, E). However, the main point of this experiment was to assess the basal level of signaling in Rps12 KO/+ vs control cells. We did not observe hypersensitivity of RpS12 cells to SCF, but we did observe significant increases in pAKT, pS6, p4EBP1, and pERK in Rps12 KO/+ LSK cells.

      To address the concern about the validity of staining, please see the requested flow histograms for unstained vs individual Phospho-antibodies (Ab): p4EBP1, pERK, pS6 and pAKT (Figure R1 for reviewers) below. Additionally, since staining with the surface antibodies can potentially change the peak, we are including an additional control of the cell surface antibodies vs the full sample with surface antibodies and Phospho-Ab: p4EBP1, pERK, pS6 and pAKT. We can include this figure in the Supplementary Data if requested.

      5) The authors performed in vitro OP-Puro assay to assess the global protein translation in different HSPC subpopulations. 1) Can the authors provide more information about the incubation media, any cytokine or serum included? The incubation media with supplements may boost the overall translation status, although cells from WT and RpS12+/- are cultured side by side. Based on this, in vivo OP-Puro assay should be performed in both genotypes. 2) Polysome profiling assay should be performed in primary HSPCs, or at least in hematopoietic cell lines. It is plausible that RpS12 haploinsufficiency may affect the content of translational polysome fractions.

      We are including these details in the methods section: for the in vitro OP-Puro assay (lines 555-565), cells were resuspended in DMEM (Corning 10-013-CV) media supplemented with 50 µM β-mercaptoethanol (Sigma) and 20 µM OPP (Thermo Scientific C10456). Cells were incubated for 45 minutes at 37°C and then washed with Ca2+ and Mg2+ free PBS. No additional cytokines were added.

      We did not perform polysome profiles. Polysome profiling of mutant stem and progenitor cells would be very challenging, as their numbers are much reduced. We now deem this of reduced interest, given the conclusion of the revised manuscript that RpS12 haploinsufficiency reduces overall translation. Also, because in the RpS12-floxed/+;SCL-CRE-ERT mouse model with acute deletion of RpS12 we observed the expected decrease in translation in HSCs using the same ex vivo OPP protocol, we did not follow up with in vivo OPP treatment.

    1. Author Response:

      Reviewer #1 (Public Review):

      1) All feeding data presented in the manuscript are from the interactions of individual flies with a source of liquid food, where interaction is defined as 'physical contact of a specific duration.' It would be helpful to approach the measurement of feeding from multiple angles to form the notion of hedonic feeding since the debate around hedonic feeding in Drosophila has been ongoing for some time and remains controversial. One possibility would be to measure food intake volumetrically in addition to food interaction patterns and durations (e.g. via the modified CAFE assay used by Ja).

      We acknowledge that our FLIC assays address only one dimension of feeding behavior: physical interaction with liquid food. However, there is clear evidence that interactions are strongly predictive of consumption, and it would be technically difficult to measure feeding durations at millisecond resolution using a CAFE assay. Nevertheless, we appreciate the spirit of this comment and agree that expanding our inference to other measures of feeding, as well as feeding environments, is an important next step. To this end, we will include measures of feeding on more traditional solid food, using the ConEx assay; we find that flies in the hedonic environment consume twice as much sucrose volume as flies in the control environment. These will be added as supplemental data (Figure 1 – Figure Supplement 1A), and the text will be updated to reflect our findings.

      2) Some of the statistical analyses were presented in a way that may make understanding the data unnecessarily difficult for readers. Examples include:

      a) In Table I the authors present food interaction classifications based on direct observation. These are helpful. However, the classification system is updated or incompletely used as the manuscript progresses, most importantly changing from four categories with seven total subcategories to three categories and no subcategories. In subsequent data analyses, only one or two of these categories are assessed. It would be helpful, especially when moving from direct observation to automated categorization, to quantify the exact correspondences between all of the prior and new classifications, as well as elaborate on the types of data that are being excluded.

      We appreciate the feedback on our usage of the behavioral classification system and will make several adjustments to improve it. We will rename some of the behaviors to make them more intuitive (see Reviewer #2, comment #1), and update the main text and Table 1 to reflect these changes. We will update the text and figures to be more transparent about when we group subcategories into main categories for quantification and when we quantify all subcategories separately. Because these videos required manual scoring by an experimenter, after our initial characterizations we opted to score only main categories (which contain subcategories). We agree that it would be useful to quantify correspondence between subcategories and the automated FLIC signal. However, we believe this task is better suited for more advanced and automated video tracking software, and, incidentally, more sophisticated analysis of FLIC data, which has a very high-dimensional character that has yet to be properly exploited. At the moment, therefore, we are not confident in the ability to understand the data at the desired resolution.

      b) The authors switch between a variety of biological and physiological conditions with varying assays, which makes the train of reasoning nearly impossible to follow. For example, the authors introduce us to circadian aspects of feeding behavior to introduce the concept of 'meal' and 'non-meal' periods of the day. It is then not clear in which of the subsequent experiments this paradigm is used to measure food interactions. Is it the majority of the subsequent figure panels? However, the authors also use starved flies for some assays, which would be incompatible with circadian-locked meals. The somewhat random and incompletely reported use of males and females, which the authors show behave differently, also makes the results more difficult to parse. Finally, the authors are comparing within-fly for the 'control environment' and between flies for their 'hedonic environment' (Figure 3A and subsequent panels), which I believe is not a good thing to do.

      We apologize for our difficulties conveying our inference, which was also noted by Reviewer #2. We will work hard to improve this component in the revision. With respect to the confusion about circadian feeding, we introduced circadian meal-times to complement starvation as a second (perhaps more natural) way to measure behaviors associated with hunger. Importantly, we do not use circadian meal-times beyond Figure 1; all subsequent FLIC experiments were conducted during non-meal times of day for 6 hours, which avoids confounding our data with circadian-locked meals even when we use starved flies. We will clarify this point in the revision.

      The reviewer also points out that we make both within-fly and between-fly comparisons, which is a point that we note. Perhaps some concern arises, again, from the challenges that we faced in properly delineating our inferences about different types of feeding measures (and motivations). Inference about homeostatic feeding was made using within-fly measures, comparing events on sucrose vs. those on yeast. Inference about hedonic feeding was made using between-fly measures (average durations of different flies on 2% vs. 20% sucrose). Treatment comparisons to control always used measures of the same type, such that inference was not made using between-fly measures for treatment and within-fly for control (i.e., all of our figure panels were either within-fly or between-fly). We will clarify this in the revision.

      Importantly, our approach to all experiments avoided confounding by using a randomized design at multiple levels (e.g., randomizing control and hedonic environments to FLIC DFMs, alternating food choice sidedness in the DFMs), by ensuring that flies in both environments are sibling flies that came from the same vial environment before being tested, and by performing each experiment multiple times.

      c) Statistical analyses are not always used consistently. For example, in Figures 3B and C, post hoc test results are shown for sucrose vs. yeast interactions, but no such statistics are given for 3E and 3F, preventing readers from assessing if the assay design is measuring what the authors tell us it is measuring.

      We report p-values for two-way ANOVA interaction terms for all appropriate experiments. If (and only if) the interaction term is significant, we conduct post-hoc tests for more detailed statistical analysis and report the p-values. The reviewer points out that we do not perform post-hoc tests in figures 3E and 3F. These figures had a non-significant interaction term, and thus, we did not feel a post-hoc test was warranted.

      Reviewer #2 (Public Review):

      1) The dissection of feeding into distinct behavioral elements and its correlation with electrical FLIC signals that allow interpreting feeding types is a fundamental new method to dissect feeding in flies. However, the categories of micro-behaviors in Table 1 are not intuitive.

      We agree and will update the Table, figures, and main text. Please see also our response to Reviewer #1, comment #1.

      2) The details for the behavioral data analysis are not clear and should be made more obvious. For example, how many males and females were used in each experiment? Were any of the females mated or were they all virgins? If all virgins, why not use mated females? Mating status may have an effect on the feeding drive. If mated and virgin females were used, are there any differences between them? Similarly, for diurnal feeding experiments, it is not immediately clear from the graphs how many animals were used and how the frequencies were obtained (Fig. 1F, presumably averages for each category per fly but that is inconsistent with the legend in the supplement for this figure). Why does the transition heat map not include all micro-behaviors (Fig. 1E, no LQ data which are significant in diurnal feeding)?

      We will clarify the number of flies and events for each behavioral experiment in Figure 1, and we will update the figure legend appropriately. We note that these behavioral datasets are non-overlapping, and each time we mention the number of events scored in the text, that number includes only “new” videos. Female and male flies for all experiments were mated, and we will clarify this in the main text and methods.

      For the diurnal experiment in Figure 1F, we scored over 700 events from new (non-overlapping) video compilations and updated the number of flies and event number in the figure legend. The diurnal data we present in the supplement for this figure is a separate experiment conducted on 38 flies, intended only to demonstrate the circadian nature of fly feeding.

      For the transition heat map, analysis of this sort seems to require a large amount of data to have sufficient power to return a transition matrix. LQ events are relatively low in frequency, so we opted to combine them with L events for this analysis. We have updated the figure and figure legend to reflect this.

      3) The CaMPARI images do not look great, particularly in the pan-neuronal condition (Fig. 5A). It would be useful to include the movie of the stack. Did any other brain regions show activity differences, such as SEZ or PI? These regions are known to be involved in feeding so it seems surprising they show no effect.

      We find that CaMPARI imaging is subject to high levels of noise and background, especially when using a broad driver, as the reviewer has pointed out. This is why we opted to follow up our pan-neuronal CaMPARI experiment using a more specific mushroom body driver and to test our correlational findings of increased MB activity in hedonic environments with genetic approaches in the remainder of Figure 5. We will include movies of the confocal stacks for both CaMPARI experiments, as requested.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the accrual of RSV mutations in a severely immunocompromised child with persistent infection and demonstrates that ribavirin increases the observed mutation rate with base pair changes (C to U and G to A) compatible with its known mechanism. The paper utilizes a mathematical model to explain the counterintuitive finding that viral load does not decrease despite loss of viral fitness and clinical improvement. Positive selection is observed but does not keep pace with deleterious mutations induced by ribavirin. Overall, though the data is restricted and limited to a single person, the analysis is rigorous and supports the paper's interesting conclusions.

      The paper is fascinating, but its generalizability is somewhat limited by the single study participant. Nevertheless, comparisons of therapy-induced deleterious mutations versus adaptive mutations over time is potentially important for multiple viruses.

      We thank the reviewer for their comments. Although we acknowledge that this is only a single case of infection, we believe that it is an interesting case, and we are keen to share our findings with the broader scientific community.

      Reviewer #2 (Public Review):

      In this work, Illingworth et al. investigate the effectiveness of ribavirin and favipiravir on the treatment of a paediatric patient with chronic RSV. These drugs cause mutations and the authors tested whether they could observe this effect through deep sequencing viruses from nasal aspirates over the course of treatment. They found an increase in mutations caused by ribavirin but favipiravir appeared to have no additional mutagenic effect. Despite the lack of change in viral load, the authors suggest that the ribavirin reduced viral fitness and did not lead to adaptive escape mutations. The authors modelled how generation time and fitness interacted with mutational load. They also estimated fitness for different haplotypes generated from the mutational data.

      Strengths of the paper:

      Using mutagenic drugs to treat viruses is generally accepted but results have been mixed with severe viral infections and specific evidence of the precise effects of the drugs is often lacking. This paper is especially valuable for demonstrating that despite in vitro evidence that favipiravir had some effect against RSV, there was no evidence for favipiravir having an effect in a patient. This differs from the authors' previous work showing a clear clinical benefit to favipiravir in treating influenza. This paper also appears to be the first to sequence RSV from a patient having been exposed to ribavirin, which is important for demonstrating that the drug is having a measurable effect.

      Weaknesses in the paper:

      I think there is a conceptual problem with the paper. Ribavirin is supposed to increase the mutational rate of the virus which would increase the mutational load. Mutational load has been calculated by summing up the frequencies of minor alleles. However, if a particular mutation rises in frequency, it does not mean that ribavirin has caused additional mutations at the same site but rather viruses containing the mutation have risen in frequency. If a subpopulation containing mutations rises through drift or selection to a relatively high percentage that will bias the mutational load. The authors provide ~75 mutations which were at significant percentages across multiple different timepoints. It seems that these mutations contribute significantly to the mutational load but changes in mutation percentages between samples do not reflect changes in mutational events but changes in viral haplotypes/subpopulations. In a previous study Lumby et al. 2020, the authors removed mutations at >5% from their analysis but there is no indication that they performed this step similarly here. Summing many small changes will give an indication of background mutational rate (though counting only a single mutation at each locus is perhaps the only method to remove the effect of viral clonal expansion).

      The mutational load is defined as the mean number of mutations per virus with respect to the consensus, equal to the sum of minor allele frequencies across the genome. We filter variant frequencies prior to calculating mutational load to compensate for noise arising from genome sequencing.
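
      As a concrete illustration of this definition, the following is a minimal sketch and not the code used in the study. The function name `mutational_load`, the input layout, and the `min_freq` noise cutoff are all assumptions introduced here; the actual filtering applied to variant frequencies is described in the manuscript, not in this response.

```python
def mutational_load(site_frequencies, min_freq=0.01):
    """Sum of minor allele frequencies across the genome.

    site_frequencies -- one dict per genome position mapping allele -> frequency
    min_freq         -- hypothetical noise cutoff: minor alleles observed below
                        this frequency are ignored (an assumption of this sketch)
    """
    load = 0.0
    for freqs in site_frequencies:
        if not freqs:
            continue
        # The consensus allele is the most frequent one at this position.
        consensus = max(freqs, key=freqs.get)
        for allele, freq in freqs.items():
            if allele != consensus and freq >= min_freq:
                load += freq
    return load


# Toy data for three positions; the 0.005 variant falls below the cutoff.
toy_sites = [
    {"A": 0.9, "G": 0.1},
    {"C": 0.995, "U": 0.005},
    {"G": 0.7, "A": 0.2, "U": 0.1},
]
toy_load = mutational_load(toy_sites)
```

      In this toy case the retained minor allele frequencies are 0.1, 0.2 and 0.1, so the estimated mean number of mutations per virus relative to the consensus is about 0.4.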

      We use a deterministic model of mutation-selection balance to describe the overall dynamics of mutational load, but are conscious that the dynamics of individual variants are complex. Genetic drift could contribute to these dynamics, as might hidden structure in the viral population, with stochastic observations of viruses from distinct subpopulations. As we make clear, our key assumption regarding mutational load is that all variants from the consensus are at least mildly deleterious; under this assumption calculating the sum of allele frequencies is an appropriate measurement of mutational load. Our model accounts for the possible presence of variants under stronger and weaker selection being observed at lower and higher frequencies respectively.

      We note that, in a case where distinct variants occurred in subpopulations, these variants would be observed in a mixture at lower frequencies than they existed in the subpopulations. This would lead to the observation of more variants overall, with each variant being at a reduced frequency. While stochastic effects would alter the frequencies of mutations in individual samples, if mutational load acted equally on each subpopulation, the total mutational load would be preserved across samples. The existence of subpopulations would not of itself invalidate the calculation of mutational load as we have performed it.

      Our previous study Lumby et al, 2020 considered a case where favipiravir was given for a short period of time in a case of influenza B infection. In that case we did not make an assessment of the total mutational load in a population, although we did remove mutations at >5% when considering the spectrum of mutations i.e. the proportion of mutations of each type C to T, G to A, etc. We are still working on different approaches to measuring mutational load, but we are not convinced that removing high frequency mutations is always a good idea when evaluating the total mutational load. Cutting out higher frequencies is potentially a useful means to study mutational spectra under viral mutagenesis, but in a measurement of mutational load it could exclude deleterious mutations.

      While ribavirin appears to have shown an effect, many questions remain. Why does the mutational load only increase for 3 points before plateauing? The authors would likely argue that this is the new saturation point for mutation load but they don't test it. Sequencing points from after the cessation of treatment would be expected to show lower mutational load but this data was not collected. Furthermore, questions remain over the methodology. It is thought that Ribavirin should only increase transitions and a transition/transversion ratio for the different samples would have been helpful. The absolute numbers of many mutation classes appear to have increased including transversions e.g AU. There isn't a good reason why nucleoside analogues should have caused this effect and perhaps it is an artefact.

      Ribavirin has been shown to increase C to T and G to A mutations; these are both transitions, but T to C and A to G mutations are also transitions; the proportion of these was found to decrease under treatment. We have included a new figure showing Ts/Tv ratios but we do not find a significant pattern of change in these statistics over time.
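
      The distinction drawn here between mutation classes can be made explicit with a short sketch. This is an illustration only, not the analysis behind the new figure; the function names and the toy list of mutations are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    # Treat T and U as the same base so DNA- and RNA-style labels both work.
    ref, alt = ref.replace("T", "U"), alt.replace("T", "U")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    return ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)


def ts_tv_ratio(mutations):
    """Ratio of transitions to transversions for (ref, alt) pairs.

    Returns None when no transversions are present, to avoid dividing by zero.
    """
    ts = sum(1 for ref, alt in mutations if is_transition(ref, alt))
    tv = len(mutations) - ts
    return ts / tv if tv else None


# Toy example: C to U and G to A are transitions, A to U is a transversion.
toy_ratio = ts_tv_ratio([("C", "U"), ("G", "A"), ("A", "U")])
```

      Note that a shift from T to C and A to G towards C to T and G to A changes the mutational spectrum while leaving the overall Ts/Tv ratio untouched, which is consistent with the absence of a significant trend in that statistic.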

      The plateauing of the observed mutational load is consistent with the theory of mutation-selection balance. Following a change in the mutation rate we would expect a shift to a new equilibrium U/s.
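
      The expected shift can be sketched with a textbook deterministic recurrence. This is an illustration under stated assumptions, not the model fitted in the manuscript: it assumes an effectively infinite population, a Poisson-distributed number of mutations per genome, and an equal multiplicative fitness cost s for every mutation, in which case the mean load obeys L(t+1) = (1 - s) L(t) + U. The function name and parameter values are hypothetical.

```python
def mean_load_trajectory(u_before, u_after, s, generations):
    """Mean number of deleterious mutations per genome over time.

    Starts at the old equilibrium u_before / s, then applies the new
    genomic mutation rate u_after for the given number of generations.
    """
    load = u_before / s  # equilibrium under the original mutation rate
    trajectory = [load]
    for _ in range(generations):
        # Selection scales the mean load by (1 - s); mutation then adds u_after.
        load = (1.0 - s) * load + u_after
        trajectory.append(load)
    return trajectory


# Hypothetical values: the mutation rate triples when a mutagen is applied.
toy_trajectory = mean_load_trajectory(u_before=0.1, u_after=0.3, s=0.1,
                                      generations=300)
```

      With these toy values the load rises from 1.0 and levels off near 0.3 / 0.1 = 3.0, so an initial increase followed by a plateau is the behaviour this simple model predicts after a step change in the mutation rate.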

      Sequencing was conducted as part of an investigation that was secondary to treatment of the patient: All of the samples that were collected were sequenced. We agree that upon the cessation of mutagenic drugs we would expect to see a fall in mutational load.

      I don't think that the authors can reasonably determine how many haplotypes there are in the population from short read sequencing data. I think that the sequencing data very clearly shows subpopulations due to the large changes in mutation frequencies between different time points. The authors say that their analysis assumes a well-mixed population which is clearly not the case. Therefore, determining fitness of different haplotypes or mutations is likely not accurate.

      Although we have short read sequencing data, some of the reads we have span more than one locus, providing some information about linkage between variants. As noted in the Methods section our inference approach provides a minimal reconstruction of haplotypes: Our reconstruction details the smallest set of distinct haplotypes necessary to explain the data.

      Where we use a haplotype-based model to reconstruct the within-host evolution of the population, we neglect the potential presence of subpopulations by assuming a well-mixed population, then fully discuss the implications of this assumption for our result.

      Our basic question is whether within-host adaptation leads to a gain in viral fitness in excess of the loss of fitness imposed by an increase in mutational load. In this comparison we make a conservative (i.e. low) estimate for the extent of the loss of fitness through mutational load.

      When we look at within-host evolution our assumption of a well-mixed population attributes all of the systematic change in the viral population to the effects of selection. If some of this change arises through stochastic differences in emissions from a structured population, the influence of selection would be less than our inference. Thus, our estimate of the gain in fitness through within-host adaptation is a high estimate. As our high estimate of within-host fitness gain is less than a low estimate of the fitness lost through mutational load, our result is robust to our assumption.

      The authors construct a model to estimate viral fitness and suggest that viral fitness decreased with the drug. This is somewhat problematic to me as viral load has not changed so it would be reasonable to say that viral fitness was likely unaffected by the drug. The authors define fitness in terms of the number of mutations that each virus likely has and assume that these mutations are deleterious. The authors then use this to claim that mutagenic drugs reduce fitness. This seems very circular to me. If the drugs reduce fitness, it should be observed as a property of the virus population. As the only measure was viral load, which didn't change, it is difficult to claim ribavirin reduced viral fitness. There are other reasons why there could be an increase in the number of mutations, e.g. sequencing more subpopulations, which would have nothing to do with fitness.

      We have discussed our assumption that variants in the viral population are deleterious; this lies behind the use of a model of mutation-selection balance. Under this assumption, the accumulation of a greater number of mutations following ribavirin treatment is indicative of a loss of viral fitness, although we cannot precisely quantify the magnitude of this loss. The link between an increased mutation rate and lower viral fitness is intrinsic to the concept of mutagenic drugs; our data show an increase in mutational load coincident with the therapeutic use of ribavirin.

      A change in viral fitness does not necessarily lead to a substantial and clearly observable drop in viral load; we say more about this in the response to comments below.

      At various points, the paper assumes that there is no selection taking place but immunoglobulin was being applied weekly and palivizumab monthly. The timing of when these drugs were given should be included. How did the palivizumab affect selection? The K272E mutation seems to go up and down but it is not clear if this was in response to drug infusion timing or if this mutation was present in a subpopulation.

      Our approach assumes that selection could act at two distinct levels. Firstly, we assume that the observed increase in mutational load correlates with a reduction in viral fitness; the link between viral fitness and mutational load is intrinsic to the equation of Haldane. Secondly, we use a haplotype-based model to infer how selection is acting on the level of higher-frequency mutations; under the assumption of a well-mixed model we identify a signal of within-host adaptation.
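
      As an illustration of the first level, the textbook mutation-selection balance result often attributed to Haldane can be sketched numerically. The sketch below assumes multiplicative fitness effects and a Poisson-distributed number of deleterious mutations per genome. The function name, the genomic mutation rate `U`, and the selection coefficient `s` are hypothetical and chosen only for illustration; they are not the parameters or the exact model used in the study.

```python
import math

def equilibrium_load(U, s):
    """Textbook mutation-selection balance for a haploid, multiplicative model.

    U: deleterious mutations per genome per generation (hypothetical value)
    s: fitness cost per mutation (hypothetical value)
    """
    mean_mutations = U / s        # mean number of deleterious mutations carried
    mean_fitness = math.exp(-U)   # mean fitness relative to a mutation-free genome
    return mean_mutations, mean_fitness

# Illustrative comparison: a mutagen that doubles the mutation rate.
baseline = equilibrium_load(U=0.1, s=0.05)
treated = equilibrium_load(U=0.2, s=0.05)
```

      Under these assumptions, a higher mutation rate raises the equilibrium number of mutations per genome and lowers mean fitness, which is the qualitative link described above. The size of the change depends on values that are not estimated here.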

      We have added details of the timing of palivizumab treatment to Figure 1. Immunoglobulin was given throughout; details of treatment have been given in Supporting Data. As we have now clarified in the Methods, our identification of potentially selected alleles was a two-stage process, with the first stage assessing the level of noise in the data. Our model of noise envisages nonuniformity arising from multiple sources, including a situation whereby the viral population was divided into subpopulations, and in which reads comprised stochastic samples from these subpopulations. Given our model for noise, the observation of the K272E mutation at generally higher frequencies in earlier samples and generally lower frequencies in later samples was sufficient to call this as a potentially selected variant. We did not explore more complex models of drug-dependent selection.

      I think the main impact of the paper will be that favipiravir will not be used in the future to treat RSV. Given that the EC50 of favipiravir against RSV is ~100x that of influenza, favipiravir was unlikely to reach a therapeutic level in the patient. Nucleoside analogues have a mixed record at treating serious viral infections. Hopefully, this work will spur on future studies to precisely measure the effect that ribavirin has on RSV.

      Favipiravir was used in this patient following its successful experimental use against a case of influenza B infection (Lumby et al., 2020). We would be happy if our work inspires future research in this area.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript explores how biliary epithelial cells respond to excess dietary lipids, an important area of research given the increasing prevalence of NAFLD. The authors utilize in vivo models complemented with cultured organoid systems. Interestingly, E2F transcription factors appear important for BEC glycolytic activation and proliferation.

      We thank this reviewer for his/her comments and for finding the E2F-mediated mechanism of interest.

      Much of the work utilizes the BEC-organoid model, which is complicated by the fact that liver cell organoid models often fail to maintain exclusive cell identity in culture. The method used by the authors (Broutier et al., 2016) can lead to organoids with a mixture of ductal and hepatocyte markers. It would be helpful for the authors to further demonstrate the cholangiocyte identity of the organoid cells.

      We understand the concern of this reviewer. Indeed, this method can give rise to biliary cells or more hepatocyte-like cells. However, this choice depends on the culture media used. Our experiments used BEC-organoids in an undifferentiated state with a biliary expression profile. Please see point 1 above for a detailed answer.

      The authors suggest that BECs form lipid droplets in vivo by detecting BODIPY immunofluorescence of liver cryosections. While confocal microscopy would ensure that the BODIPY fluorescence signal is within the same plane as the cell of interest, the authors use a virtual slide microscope that cannot exclude fluorescence from a different focal plane. The conclusion that BECs accumulate lipids does not seem to be fully supported by this analysis.

      We fully agree with this criticism. To address this concern, we decided to use FACS analysis, a quantitative and independent method, to further confirm our initial findings. To this end, we stained sorted EPCAM+ BECs isolated from livers of CD- or HFD-fed mice with BODIPY, quantified the number of BODIPY+/EPCAM+ BECs in each experimental condition, and confirmed that these cells accumulate more lipids after HFD feeding (New Figure 1I, page 5, lines 112-115, and see also the rebuttal reply to point 4).

      Several mouse experiments rely heavily on rare BEC proliferation events with the median proliferation event per bile duct being 0-1 cell. While the proliferative effect appears consistent across experiments, a more quantitative approach, such as performing Epcam+ BEC FACS and flow cytometry-based cell cycle analyses, would be helpful.

      Following this suggestion, we quantified proliferative EdU+ BEC cells by FACS in a new cohort of C57BL/6J mice fed CD or HFD. These data, now included in the revised manuscript (New Figure 2G, page 7, lines 143-147), strongly confirm that immunofluorescence quantification mirrors the FACS quantification and reinforce the initial finding that EPCAM+ BECs proliferate more in the livers of HFD-fed mice. Please see point 6 above for a detailed answer.

      Finally, it is not yet clear how relevant the findings in this study are to ductular reaction, which is a non-specific histopathologic indicator of liver injury in the context of severe liver disease. In NAFLD, the ductular reaction is uncommon in benign steatosis, and if seen at all, occurs in the setting of substantial liver inflammation and fibrosis (Gadd et al., Hepatology 2014). The authors use a dietary model containing 60 kcal% fat, which causes adipose lipid accumulation as well as subsequent liver lipid accumulation. This diet does not cause overt inflammation or fibrosis that would represent experimental NASH, which typically requires the addition of cholesterol in dietary lipid NASH models (Farrell et al., Hepatology, 2019). While the E2F-driven proliferation may be important for physiologic bile duct function in the setting of obesity, the claim that E2Fs mediate DR initiation would require an additional pathophysiologic model or human data to demonstrate relevance. The authors could clarify this point in their discussion.

      We agree with this reviewer that 15 weeks of HFD feeding in C57BL/6J mice is insufficient to trigger a ductular reaction. For this reason, we used the term “BEC activation” in our manuscript, which refers to the first mandatory step for the ductular reaction to initiate. We apologize if our initial manuscript did not sufficiently emphasize this point. However, as suggested by the reviewer, we investigated the ductular reaction in our model. In order to further characterize the livers after 15 weeks of CD or HFD feeding, we stained the bile ducts for pancytokeratin (PANCK) and osteopontin (OPN) and asked a pathologist (Dr. Christine Gopfert at EPFL) to evaluate these sections with a particular focus on the bile ducts. She concluded that the livers of HFD-fed mice showed steatosis and inflammation but no apparent fibrosis (New Figure 1 – figure supplement 1E). The shape of bile ducts was similar in the livers of CD- and HFD-fed mice (New Figure 1 – figure supplement 1I), concomitant with the absence of portal fibrosis and inflammation. In addition, we checked the expression levels of several established markers of ductular reaction in our RNA sequencing data and observed that, of all these genes, only Ncam1 was significantly upregulated with HFD feeding in EPCAM+-BEC cells (New Figure 2 – figure supplements 1D and 1E, Page 6, lines 127-131). Overall, these data support our conclusion that HFD triggers BEC activation without signs of an established ductular reaction and might suggest Ncam1 as a marker for this initial BEC activation process. Please see point 3 above for a detailed answer.

      Reviewer #2 (Public Review):

      The manuscript by Yildiz et al investigates the early response of BECs to high fatty acid treatment. To achieve this, they employ organoids derived from primary isolated BECs and treat them with a FA mix followed by viability studies and analysis of selected lipid metabolism genes, which are upregulated indicating an adjustment to lipid overload. Both organoids with lipid overload and BECs in mice exposed to a HFD show increased BEC proliferation, indicating BEC activation as seen in DR. Applying bulk RNA-sequencing analysis to sorted BECs from HFD mice identified four E2F transcription factors and target genes as upregulated. Functional analysis of knock-out mice showed a clear requirement for E2F1 in mediating HFD induced BEC proliferation. Given the known function of E2Fs the authors performed cell respiration and transcriptome analysis of organoids challenged with FA treatment and found a shift of BECs towards a glycolytic metabolism. The study is overall well-constructed, including appropriate analysis. Likewise, the manuscript is written clearly and supported by high-quality figures.

      We appreciate that this reviewer finds our study well-constructed, clear, and with high-quality figures.

      My major point is the lack of classification of the progression of DR, since the authors investigate the early stages of DR associated with lipid overload reminiscent of stages preceding late NAFLD fibrosis. How are early stages distinguished from later stages in this study? Molecularly and/or morphologically? While the presented data are very suggestive, a more substantial description would support the findings and resulting claims.

      We thank the reviewer for the suggestion. We would like to emphasize that instead of ductular reaction, we used the term “BEC activation” in our revised manuscript, referring to the first mandatory step for initiating the ductular reaction. Both reviewers criticized the poor characterization of the ductular reaction process in the first version of our study; we put substantial effort into further clarifying this point. Our response to this point can be read in our reply to the last comment of reviewer 1 and point 3 of the rebuttal.

    1. Author Response

      Reviewer #1 (Public Review):

      It is now widely accepted that the age of the brain can differ from the person's chronological age and neuroimaging methods are ideally suited to analyze the brain age and associated biomarkers. Preclinical studies of rodent models with appropriate neuroimaging do attest that lifestyle-related prevention approaches may help to slow down brain aging and the potential of BrainAGE as a predictor of age-related health outcomes. However, there is a paucity of data on this in humans. It is in this context the present manuscript receives its due attention.

      Comments:

      1) Lifestyle intervention benefits need to be analyzed using robust biomarkers which should be profiled non-invasively in a clinical setting. There is increasing evidence of the role of telomere length in brain aging. Gampawar et al (2020) have proposed a hypothesis on the effect of telomeres on brain structure and function over the life span and named it as the "Telomere Brain Axis". In this context, if the authors could measure telomere length before and after lifestyle intervention, this will give a strong biomarker utility and value addition for the lifestyle modification benefits.

      2) Authors should also consider measuring BDNF levels before and after lifestyle intervention.

      Response to comments 1+2: we agree that associating both telomere length and BDNF level with brain age would be interesting and relevant. However, we did not measure these two variables. We would certainly consider adding these in future work. Regarding telomere length, we now include a short discussion of brain age in relation to other bodily ages, such as telomere length (Discussion section):

      “Studying changes in functional brain aging is part of a broader field that examines changes in various biological ages, such as telomere length1, DNA methylation2, and arterial stiffness3. Evaluating changes in these bodily systems over time allows us to capture health and lifestyle-related factors that affect overall aging and may guide the development of targeted interventions to reduce age-related decline. For example, in the CENTRAL cohort, we recently reported that reducing body weight and intrahepatic fat following a lifestyle intervention was related to methylation age attenuation4. In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future. We also suggest that examining the dynamics of multiple bodily ages and their interactions would enhance our understanding of the complex aging process8,9.”

      And

      “These findings complement the growing interest in bodily aging indicated, for example, by DNA methylation4 as health biomarkers and interventions that may affect them.”

      Reviewer #2 (Public Review):

      In this study, Levakov et al. investigated brain age based on resting-state functional connectivity (RSFC) in a group of obese participants following an 18-month lifestyle intervention. The study benefits from various sophisticated measurements of overall health, including body MRI and blood biomarkers. Although the data is leveraged from a solid randomized control set-up, the lack of control groups in the current study means that the results cannot be attributed to the lifestyle intervention with certainty. However, the study does show a relationship between general weight loss and RSFC-based brain age estimations over the course of the intervention. While this may represent an important contribution to the literature, the RSFC-based brain age prediction shows low model performance, making it difficult to interpret the validity of the derived estimates and the scale of change. The study would benefit from more rigorous analyses and a more critical discussion of findings. If incorporated, the study contributes to the growing field of literature indicating that weight-reduction in obese subjects may attenuate the detrimental effect of obesity on the brain.

      The following points may be addressed to improve the study:

      Brain age / model performance:

      1) Figure 2: In the test set, the correlation between true and predicted age is 0.244. The fitted slope looks like it would be approximately 0.11 (55-50)/(80-35); change in y divided by change in x. This means that for a chronological age change of 12 months, the brain age changes by 0.11*12 = 1.3 months. I.e., due to the relatively poor model performance, an 80-year-old participant in the plot (fig 2) has a predicted age of ~55. Hence, although the age prediction step can generate a summary score for all the RSFC data, it can be difficult to interpret the meaning of these brain age estimates and the 'expected change' since the scale is in years.

      2) In Figure 2 it could also help to add the x = y line to get a better overview of the prediction variance. The estimates are likely clustered around the mean/median age of the training dataset, and age is overestimated in younger subjects and underestimated in older subjects (usually referred to as "age bias"). It is important to inspect the data points here to understand what the estimates represent, i.e., is variation in RSFC potentially lost by wrapping the data in this summary measure, since the age prediction is not particularly accurate, and should age bias in the predictions be accounted for by adjusting the test data for the bias observed in the training data?
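
      The back-of-the-envelope arithmetic in comment 1 can be reproduced as a short sketch. The endpoint values are the approximate readings quoted in the comment (predicted age rising from about 50 to about 55 across chronological ages 35 to 80); they are visual estimates from Figure 2, not fitted coefficients, and the variable names are illustrative.

```python
# Approximate readings from the fitted line in Figure 2, as quoted in comment 1.
pred_low, pred_high = 50.0, 55.0   # predicted brain age (years)
age_low, age_high = 35.0, 80.0     # chronological age (years)

# Slope of predicted age on chronological age: change in y over change in x.
slope = (pred_high - pred_low) / (age_high - age_low)

# Expected change in predicted brain age (months) per 12 months of real time.
expected_change_months = slope * 12
```

      With a slope this far below 1, a year of chronological aging moves the predicted age by only a little over one month, which is why changes expressed in "brain age years" are hard to interpret on the original scale.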

      Response to comment 1+2: we agree with the reviewer that due to the relatively moderate correlation between the predicted and observed age, a large change in the observed age corresponds to a small change in the predicted age. We now state this limitation in Results section 2.1:

      “Despite being significant and reproducible, we note that the correlations between the observed and predicted age were relatively moderate.”

      And discuss this point in the Discussion section:

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among dataset6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Moreover, we now add the x=y line to Fig. 2, so that readers can better assess the prediction variance, as suggested by the reviewer.

      We prefer not to use different scales (year/month) on the x and y axes, so as not to mislead readers, but the lists of observed and predicted ages are available as SI files with a precision of two decimal places (~3 days).

      We note that despite the moderate prediction accuracy, we replicated these results in three separate cohorts.

      Regarding the effect of “age bias” (also known as “regression attenuation” or “regression dilution” 10), we are aware of this phenomenon and agree that it must be accounted for. In fact, the “age bias” is one of the reasons we chose to use the difference between the expected and observed ages as the primary outcome of the study, as this measure already takes this bias into account. To demonstrate this effect we now compute brain age attenuation in two ways: 1. As described and used in the current study (Methods 4.9); and 2. By regressing out the effect of age on the predicted brain age at both times separately, then subtracting the adjusted predicted age at T18 from the adjusted predicted age at T0. The second method is the standard method to account for age bias as described in a previous work 11. Below is a scatter plot of both measures across all participants:

      The x-axis represents the first method, used in the current study, and the y-axis represents the second method, described in Smith et al., (2019). Across all subjects, we found a nearly perfect 1:1 correspondence between the two methods (r=.998, p<0.001; MAE=0.45), as the two are mathematically identical. The small gap between the two is because the brain age attenuation model also takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).

      We now note this in Methods section 4.9:

      “We note that the result of computing the difference between the bias-corrected brain age gap at both times was nearly identical to the brain age attenuation measure (r=.99, p<0.001; MAE=0.45). The difference between the two is because the brain age attenuation model takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).”
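
      A minimal sketch of the second (regress-out) approach, run on simulated data, is given below. It assumes a simple linear age bias and a stable person-specific offset. The intercept, noise levels, and variable names are hypothetical; the sample size, the slope of about 0.11, and the scan-interval statistics are borrowed from figures quoted in this response only to make the simulation plausible. It is not the study's code and does not implement the brain age attenuation model of Methods 4.9.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 102                                              # sample size quoted in the reviews
age_t0 = rng.uniform(35, 80, n)                      # simulated chronological age at T0
elapsed_years = rng.normal(21.36, 1.68, n) / 12.0    # time between scans, in years
age_t18 = age_t0 + elapsed_years

# Hypothetical brain-age predictions with strong regression dilution (slope << 1).
offset = rng.normal(0, 4, n)                         # stable person-specific deviation
pred_t0 = 45 + 0.11 * age_t0 + offset + rng.normal(0, 2, n)
pred_t18 = 45 + 0.11 * age_t18 + offset + rng.normal(0, 2, n)

def regress_out_age(predicted, age):
    """Return residuals of predicted age after a linear fit on chronological age."""
    slope, intercept = np.polyfit(age, predicted, 1)
    return predicted - (slope * age + intercept)

raw_gap_t0 = pred_t0 - age_t0                        # uncorrected gap, biased by age
adj_t0 = regress_out_age(pred_t0, age_t0)            # bias-corrected gap at T0
adj_t18 = regress_out_age(pred_t18, age_t18)         # bias-corrected gap at T18
adjusted_change = adj_t0 - adj_t18                   # T18 subtracted from T0, as described
```

      In this simulation the uncorrected gap is strongly negatively correlated with chronological age, whereas the residualized gap is uncorrelated with age by construction, which is the property the correction is meant to provide.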

      3) In Figure 3, some of the changes observed between time points are very large. For example, one subject with a chronological age of 62 shows a ten-year increase in brain age over 18 months. This change is twice as large as the full range of age variation in the brain age estimates (average brain age increases from 50 to 55 across the full chronological age span). This makes it difficult to interpret RSFC change in units of brain age. E.g., is it reasonable that a person's brain ages by ten years, either up or down, in 18 months? The colour scale goes from -12 years to 14 years, so some of the observed changes are 14 / 1.5 = 9 times larger than the actual time from baseline to follow-up.

      We agree that our model precision was relatively low, especially compared to the period of the intervention, as also stated by reviewer #1. We now discuss this issue in light of the studies pointed out by the reviewer (Discussion section):

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Again, we note that despite the moderate prediction accuracy, we replicated these results in three separate cohorts and found that both the correlation and the MAE between the predicted and observed age were significant in all of them.

      RSFC for age prediction:

      1) Several studies show better age prediction accuracy with structural MRI features compared to RSFC. If the focus of the study is to use an accurate estimate of brain ageing rather than specifically looking at changes in RSFC, adding structural MRI data could be helpful.

      We focused on brain structural changes in a previous work, and the focus of the current work was assessing age-related functional connectivity alterations. We now added a few sentences in the Introduction section that would hopefully better motivate our choice:

      “We previously found that weight loss, glycemic control, lowering of blood pressure, and increment in polyphenol-rich food were associated with an attenuation in brain atrophy 12. Obesity is also manifested in age-related changes in the brain’s functional organization as assessed with resting-state functional connectivity (RSFC). These changes are dynamic13 and can be observed in short time scales14 and are thus of relevance when studying lifestyle intervention.”

      2) If changes in RSFC are the main focus, using brain age adds a complicated layer that is not necessarily helpful. It could be easier to simply assess RSFC change from baseline to follow up, and correlate potential changes with changes in e.g., BMI.

      We are specifically interested in age-related changes, as we described a priori in the registration of the study: https://clinicaltrials.gov/ct2/show/NCT03020186

      Moreover, age-related changes in RSFC are complex, multivariate and dependent upon the choice of theoretical network measures. We think that a data-driven brain age prediction approach might better capture these multifaceted changes and their relation to aging. We now state this in the Introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex, multivariate changes and their relation to aging.”

      The lack of control groups

      1) If no control group data is available, it is important to clarify this in the manuscript, and evaluate which conclusions can and cannot be drawn based on the data and study design.

      We agree that this point should be made clearer, and we now state this in the limitations section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following the comments of reviewers #2 and #3, we refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now made clear in the title, abstract, and the main text.

      Reviewer #3 (Public Review):

      The authors report on an interesting study that addresses the effects of a physical and dietary intervention on accelerated/decelerated brain ageing in obese individuals. More specifically, the authors examined potential associations between reductions in Body-Mass-Index (BMI) and a decrease in relative brain-predicted age after an 18-months period in N = 102 individuals. Brain age models were based on resting-state functional connectivity data. In addition to change in BMI, the authors also tested for associations between change in relative brain age and change in waist circumference, six liver markers, three glycemic markers, four lipid markers, and four MRI fat deposition measures. Moreover, change in self-reported consumption of food, stratified by categories such as 'processed food' and 'sweets and beverages', was tested for an association with change in relative brain age. Their analysis revealed no evidence for a general reduction in relative brain age in the tested sample. However, changes in BMI, as well as changes in several liver, glycemic, lipid, and fat-deposition markers showed significant covariation with changes in relative brain age. Three markers remained significant after additionally controlling for BMI, indicating an incremental contribution of these markers to change in relative brain age. Further associations were found for variables of subjective food consumption. The authors conclude that lifestyle interventions may have beneficial effects on brain aging.

      Overall, the writing is concise and straightforward, and the language and style are appropriate. A strength of the study is the longitudinal design that allows for addressing individual accelerations or decelerations in brain aging. Research on biological aging parameters has often been limited to cross-sectional analyses so inferences about intra-individual variation have frequently been drawn from inter-individual variation. The presented study allows, in fact, investigating within-person differences. Moreover, I very much appreciate that the authors seek to publish their code and materials online, although the respective GitHub project page did not appear to be set to 'public' at the time (error 404). Another strength of the study is that brain age models have been trained and validated in external samples. One further strength of this study is that it is based on a registered trial, which allows for the evaluation of the aims and motivation of the investigators and provides further insights into the primary and secondary outcomes measures (see the clinical trial identification code).

      One weakness of the study is that no comparison between the active control group and the two experimental groups has been carried out, which would have enabled causal inferences on the potential effects of different types of interventions on changes in relative brain age. In this regard, it should also be noted that all groups underwent a lifestyle intervention. Hence, from an experimenter's perspective, it is problematic to conclude that lifestyle interventions may modulate brain age, given the lack of a control group without lifestyle intervention. This issue is fueled by the study title, which suggests a strong focus on the effects of lifestyle intervention. Technically, however, this study rather constitutes an investigation of the effects of successful weight loss/body fat reduction on brain age among participants who have taken part in a lifestyle intervention. In keeping with this, the provided information on the main effect of time on brain age is scarce, essentially limited to a sign test comparing the proportions of participants with an increase vs. decrease in relative brain age. Interestingly, this analysis did not suggest that the proportion of participants who benefit from the intervention (regarding brain age) significantly exceeds the number of participants who do not benefit. So strictly speaking, the data rather indicates that it's not the lifestyle intervention per se that contributes to changes in brain age, but successful weight loss/body fat reduction. In sum, I feel that the authors' claims on the effects of the intervention cannot be underscored very well given the lack of a control group without lifestyle intervention.
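
      For readers unfamiliar with the sign test mentioned above, a minimal sketch of an exact two-sided version follows. The function name and the example counts are hypothetical and are not the study's data; ties are assumed to have been dropped before counting.

```python
from math import comb

def sign_test_p(n_decrease, n_increase):
    """Exact two-sided sign test p-value under H0: increase and decrease equally likely."""
    n = n_decrease + n_increase
    k = max(n_decrease, n_increase)
    # Probability of a split at least as lopsided as the observed one, in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# Hypothetical split of 102 participants, for illustration only.
p_value = sign_test_p(n_decrease=55, n_increase=47)
```

      A non-significant result from such a test says only that decreases do not clearly outnumber increases; it carries no information about the size of individual changes.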

      We agree that this point, also raised by reviewer #2, should be made clear, and we now state this in the limitations section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following reviewers #2 and #3, we refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now explicitly mentioned in the title, abstract, and within the text:

      Title: “The effect of weight loss following 18 months of lifestyle intervention on brain age assessed with resting-state functional connectivity”

      Abstract: “…, we tested the effect of weight loss following 18 months of lifestyle intervention on predicted brain age, based on MRI-assessed resting-state functional connectivity (RSFC).”

      Another major weakness is that no rationale is provided for why the authors use functional connectivity data instead of structural scans for their age estimation models. This gets even more evident in view of the relatively low prediction accuracies achieved in both the validation and test sets. My notion of the literature is that the vast majority of studies in this field implicate brain age models that were trained on structural MRI data, and these models have achieved way higher prediction accuracies. Along with the missing rationale, I feel that the low model performances require some more elaboration in the discussion section. To be clear, low prediction accuracies may be seen as a study result and, as such, they should not be considered as a quality criterion of the study. Nevertheless, the choice of functional MRI data and the relevance of the achieved model performances for subsequent association analysis needs to be addressed more thoroughly.

      We agree that age estimation from structural compared to functional imaging yields a higher prediction accuracy. In a previous publication using the same dataset12, we demonstrated that weight loss was associated with an attenuation in brain atrophy, as we describe in the introduction:

      “We previously found that weight loss, glycemic control and lowering of blood pressure, as well as increment in polyphenol-rich food, were associated with an attenuation in brain atrophy 12.”

      Here we were specifically interested in age-related functional alterations that are associated with successful weight reduction. Compared to structural brain changes, the effect of aging on functional connectivity is more complex and multifaceted. Hence, we decided to utilize a data-driven or prediction-driven approach for assessing age-related changes in functional connectivity by predicting participants’ functional brain age. We now describe this rationale in the introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex changes and their relation to aging.”

      We address the point regarding the low model performance in response to reviewer #2, comment #2.

    1. Author Response

      Reviewer #1 (Public Review):

      IRF8 is a key transcription factor in the differentiation of hematopoietic cell lineages including dendritic cell (DC) and monocyte/macrophage lineages. The promoter and enhancer regions of Irf8 have been a focus of intense research in recent times. In the submitted study, Xu H. et al. have reported for the first time a lncRNA transcribed specifically in the pDC subtype from +32Kb, which is also the region of the enhancer for Irf8 specifically in the cDC1 subtype. The authors have employed modern-day tools for an in-depth understanding of the role of lncIrf8, its promoter region, and crosstalk with the Irf8 promoter to identify that it is not the lncIRF8 itself but its promoter region that is crucial for pDC and cDC1 differentiation, conferring feedback inhibition of Irf8 transcription. In the attempt to decipher the crosstalk between the promoter regions of IRF8 and lncIRF8 by employing various in vitro artificial systems, the study falls short of identifying the real significance of the lncIRF8, which is specifically expressed in the pDC subtype.

      We appreciate the public review made by the reviewer. We agree with the reviewer that most of the experiments on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element were carried out in vitro. But we would also like to point out our in vivo work: (i) transplantation of lncIRF8 promoter KO cells into mice demonstrates that pDC and cDC1 development were compromised (Figure 3); (ii) lncIRF8 is expressed in in vivo BM and spleen pDC (new Figure 1-figure supplement 3). We would also like to emphasize that (i) in vivo studies on the identification of the negative feedback regulation of IRF8 via the lncIRF8 promoter element and (ii) mechanistic studies with CRISPR activation and CRISPR interference would have been difficult to perform in vivo with the tools currently available in mice.

      According to our current understanding, lncIRF8 acts as an indicator of +32 kb enhancer activity, and we agree with the reviewer that further potential functions of lncIRF8 still need to be explored. We added a sentence on page 13, lines 427 and 428 on potential additional functions of lncIRF8:

      "However, lncIRF8 might have additional functions in DC biology, which are not revealed in the current study and remain to be identified."

      Reviewer #2 (Public Review):

      The manuscript of Xu and colleagues examines in detail the regulation of the important transcription factor IRF8 in dendritic cell (DC) subsets. They identify a long noncoding RNA that arises from the +32kb enhancer of IRF8 specifically in plasmacytoid DCs (pDCs) and show clearly that this lncIRF8 marks the activity of a region of this enhancer but the RNA itself does not appear to have any function. Deletion of the promoter of the lncIRF8 ablated cDC1 and pDC differentiation using an in vitro cell differentiation model. The authors propose an innovative model that the lncIRF8 promoter sequences act to limit IRF8 expression in cDC1, but are inactive in pDCs, resulting in their characteristically very high IRF8 expression.

      This is a conceptually interesting study that makes excellent use of an extensive set of genomic data for the DC subsets. There has been a lot of recent research investigating the regulation of the IRF8 gene in hematopoiesis and this study provides an important new aspect to the work. The use of an in vitro model of DC differentiation is a powerful practical approach to investigating IRF8 regulation, as is the innovative use of CRISPR technology. Perhaps the biggest limitation of this study is that the authors have not confirmed the in-cell system data by creating a mouse strain lacking the lncIRF8 element. Such approaches by others, most notably the Murphy lab, have been instrumental in pushing this field forward. Nevertheless, Xu et al. significantly add to our current knowledge of the regulation of IRF8, a critical step in forming the dendritic cell network.

      We appreciate the public review made by the reviewer and the positive assessment of our work. We agree with the reviewer that extending our in-cell system data to lncIRF8 promoter KO mice will further strengthen our data, and this will be the subject of our future work.

    1. Author Response

      Reviewer #1 (Public Review):

      Using health insurance claims data (from 8M subjects), a retrospective propensity score matched cohort study was performed (450K in both groups) to quantify associations between bisphosphonate (BP) use and COVID-19 related outcomes (COVID-19 diagnosis, testing and COVID-19 hospitalization). The observation periods were 1-1-2019 till 2-29-2020 for BP use and from 3-1-2020 to 6-30-2020 for the COVID endpoints. In primary and sensitivity analyses BP use was consistently associated with lower odds for COVID-19, testing and COVID-19 hospitalization.

      The major strength of this study is the size of the study population, allowing a propensity-based matched-cohort study with 450K in both groups, with a sizeable number of COVID-19 related endpoints. Health insurance claims data were used with the intrinsic risk of some misclassification for exposure. In addition there probably is misclassification of endpoints as testing for COVID-19 was limited during the study period. Furthermore, the retrospective nature of the study includes the risk of residual confounding, which has been addressed - to some extent - by sensitivity analyses.

      In all analyses there is a consistent finding that BP exposure is associated with reduced odds for COVID-19 related outcomes. The effect size is large, with high precision.

      The authors extensively discuss the (many) potential limitations inherent to the study design and conclude that these findings warrant confirmation, preferably in intervention studies. If confirmed BP use could be a powerful adjunct in the prevention of infection and hospitalization due to COVID-19.

      We thank the reviewer for this overall very positive feedback. We appreciate the reviewer's comments regarding the potential risks associated with misclassification of exposure and other potential limitations, which we have sought to address in a number of sensitivity analyses and are also addressing in the discussion of our paper. In addition, as noted by the reviewer, the observed effect size of BP use on COVID-19 related outcomes is large, with high precision, which we feel is a strong argument to explore this class of drugs in further prospective studies.

      Reviewer #2 (Public Review):

      The authors performed a retrospective cohort study using claims data to assess the causal relationship between bisphosphonate (BP) use and COVID-19 outcomes. They used propensity score matching to adjust for measured confounders. This is an interesting study and the authors performed several sensitivity analyses to assess the robustness of their findings. The authors are properly cautious in the interpretation of their results and justly call for randomized controlled trials to confirm a causal relationship. However, there are some methodological limitations that are not properly addressed yet.

      Strengths of the paper include:

      (A) Availability of a large dataset.

      (B) Using propensity score matching to adjust for confounding.

      (C) Sensitivity analyses to challenge key assumptions (although not all of them add value in my opinion, see specific comments)

      (D) Cautious interpretation of results, the authors are aware of the limitations of the study design.

      Limitations of the paper are:

      (A) This is an observational study using register data. Therefore, the study is prone to residual confounding and information bias. The authors are well aware of that.

      (B) The authors adjusted for the Charlson comorbidity index whereas they had individual comorbidity data available and a dataset large enough to adjust for each comorbidity separately.

      (C) The primary analysis violates the positivity assumption (a substantial part of the population had no indication for bisphosphonates; see specific comments). I feel that one of the sensitivity analyses 1 or 2 would be more suited for a primary analysis.

      (D) Some of the other sensitivity analyses have underlying assumptions that are not discussed and do not necessarily hold (see specific comments).

      In its current form the limitations hinder a good interpretation of the results and, therefore, in my opinion do not support the conclusion of the paper.

      The finding of a substantial risk reduction of (severe) COVID-19 in bisphosphonate users compared to non-users in this observational study may be of interest to other researchers considering setting up randomized controlled trials to evaluate repurposed drugs for prevention of (severe) COVID-19.

      We thank the reviewer for the insightful comments and questions related to our manuscript. Our response to the concerns regarding limitations of our study is as follows:

      (A) We agree that there is likely residual confounding and information bias due to use of US health insurance claims datasets which do not include information on certain potentially relevant variables. Nonetheless, given the large effect size and precision of our analysis, we feel that our findings support our main conclusion that additional prospective trials appear warranted to further explore whether BPs might confer a measure of protection against severe respiratory infections, including COVID-19. We have added a sentence on the second page of our Discussion (line 859-860) to emphasize this point: "Specifically, there is the potential that key patient characteristics impacting outcomes could not be derived from claims data."

      (B) The progression of this study mirrors the real-world performance of the analysis where we initially used the CCI in matching to control for comorbidity burden on a broader scale. This was our a priori approach. After observing large effect sizes, we performed more stringent matching for sensitivity analyses 1 and 2. Irrespective of the matching strategy chosen, effect sizes remained similar for all outcome parameters. Therefore, we elected to include both the primary analysis and the sensitivity analyses with more stringent matching in order to more transparently show what was done in entirety during our analyses, as we feel it displays all of the efforts taken to identify sources of unmeasured confounding which could have impacted our results.

      (C) We agree that the positivity assumption is a key factor to consider when building comparable treatment cohorts. We also agree that it is important to separately perform the analysis for all patients with an indication for use of BPs and for users of other anti-osteoporosis medications, as we have done in our analysis of the Osteo-Dx-Rx cohort and Bone-Rx cohort, respectively. However, we did not have sufficient data, a priori, to determine whether BP users would be more similar in their risk of COVID-19 outcomes to non-users or to other users of anti-resorptive medications. In addition, we believe that this specific limitation does not negate our findings in the primary analysis for the following reasons: (1) ‘Type of Outcome’: the outcomes in this study are related to infectious disease and are not direct clinical outcomes of any known treatment benefits of BPs. The clinical benefits being assessed - impact of BP use on COVID-19-related outcomes - were essentially unknown at the time of the study data; this fact mitigates the impact of any violation of the positivity assumption; and (2) ‘Clinical Population’: after propensity score matching, both the BP user and the BP non-user group in the primary analysis mainly consisted of older females (90.1% female, 97.2% age>50), which is the main population with clinical indications for BP use. According to NCHS Data Brief No. 93 (April 2012) released by the CDC, ~75% and 95% of US women between 60-69 and 70-79 suffer from either low bone mass or osteoporosis, respectively, and essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Women aged 60 and older make up ~75% of our study population (Table 1). Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. This justifies the assumption that BP therapy was indicated for most non-BP users in the matched primary cohort. Arguably, for these patients the positivity assumption was not violated.

      (D) We will discuss in detail below the specific issues raised by the reviewer regarding our sensitivity analyses. In general we acknowledge that individual analytical and/or matching approaches may each have their own limitations, but the analyses performed herein were done to test in a systematic fashion the different critical threats to the validity of our initial results in the primary cohort analysis, which were based on a priori-defined methods and yielded a large and robust effect size. Thus, the individual sensitivity analyses should be considered in the greater context of the entire project.

      Specific comments (in order of manuscript):

      Methods:

      Line 158: it is unclear how the authors dealt with patients who died during the follow-up period. The wording suggests they were excluded which would be inappropriate.

      When this study was executed, we were unable to link the patient-level US insurance claims data with patient-level mortality data due to HIPAA concerns. Therefore, line 158 (now 177) defines continuous insurance coverage during the observation period as a verifiable eligibility criterion we used for patient inclusion. It was necessary to disqualify individuals who discontinued insurance coverage for a variety of reasons, e.g. due to loss or change of coverage, relocation etc., but our approach also eliminated patients who died. Appendix 3 (line 2449ff) describes methods we employed post hoc to assess how censoring due to death could have impacted our analyses. We discuss our conclusions from this post hoc analysis in the main text (lines 1053-1058) as follows: "An additional limitation is potential censoring of patients who died during the observation period, resulting in truncated insurance eligibility and exclusion based on the continuous insurance eligibility requirement. However, modelling the impact of censoring by using death rates observed in BP users and non-users in the first six months of 2020 and attributing all deaths as COVID-19-related did not significantly alter the decreased odds of COVID-19 diagnosis in BP users (see Appendix 3)."

      Why did the authors use CCI for propensity matching rather than the individual comorbid conditions? I presume using separate variables will improve the comparability of the cohorts. The authors discuss imbalances in comorbidities as a limitation but should rather have avoided this.

      CCI was the a priori approach defined at the study outset and was chosen due to the widespread use and understanding of this score. The general CCI score was originally planned for matching in order to have the largest possible study population since we did not know how many patients would meet all criteria as well as have an event of interest. After realizing we had adequate sample size to power matching using stricter criteria, we proceeded to perform subsequent sensitivity analyses on more stringently matched cohorts (sensitivity analysis 2).
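      The matching step discussed in this exchange can be illustrated with a minimal sketch of greedy 1:1 nearest-neighbour matching on a propensity score. This is an illustration under stated assumptions, not the procedure used in the study: the response does not specify the matching algorithm, and the function name `greedy_match`, the caliper of 0.05, and all scores below are hypothetical.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated, controls: dicts mapping a patient id to a propensity score.
    Returns (treated_id, control_id) pairs; each control is used at most once,
    and a pair is kept only if the score difference is within the caliper.
    """
    available = dict(controls)
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical propensity scores, for illustration only.
bp_users = {"u1": 0.12, "u2": 0.31, "u3": 0.48}
non_users = {"n1": 0.10, "n2": 0.33, "n3": 0.90, "n4": 0.30}
matched = greedy_match(bp_users, non_users)
```

      In this toy input, the treated patient with score 0.48 stays unmatched because no control lies within the caliper. Stricter matching criteria shrink the matched cohort in the same way, which is the sample-size trade-off described above.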

      Line 301-10: it seems unnecessary to me to adjust for the given covariates while these were already used for propensity score matching (except comorbidities, but see previous comment). The manuscript doesn't give a rationale for why the authors chose this 'double correction'.

      The following language was added to the methods section (lines 325-327): “Demographic characteristics used in the matching procedure were also included in the final outcome regressions to control for the impact of those characteristics on outcomes modelled.”

      The following language was added to the Discussion section regarding the potential limitations of our study (lines 1078-1085): “Another limitation in the current study is related to a potential ‘double correction’ of patient characteristics that were included in both the propensity score matching procedure as well as the outcome regression modelling, which could lead to overfitting of the regression models and an overestimation of the measured treatment effect. Covariates were included in the regression models since these characteristics could have differential impacts on the outcomes themselves, and our results show that the adjusted ORs were in fact larger (showing a decreased effect size) when compared to the unadjusted ORs, which show the difference in effect sizes of the matched populations alone.”

      In causal research a very important assumption is the 'positivity assumption', which means that none of the individuals has a probability of zero or one to be exposed. Including everyone would therefore not be appropriate. My suggestion is to include either all patients with an indication (based on diagnosis) or all that use an anti-osteoporosis (AOP) drug (or one as the primary and the other as the sensitivity analysis) instead of using these cohorts as sensitivity analyses. The choice should in my opinion be based on two aspects: whether it is likely that other AOP drugs have an effect on the COVID-19 outcomes and whether BP users are deemed to be more similar (in their risk of COVID-19 outcomes) to non-users or to other AOP drug users. Or alternatively, the authors might have discussed the positivity assumption and argue why this is not applicable to their primary analysis.

      The following text has been added to the Discussion section addressing potential limitations of our study (lines 987-1009): "Another potential limitation of this study relates to the positivity assumption, which when building comparable treatment cohorts is violated when the comparator population does not have an indication for the exposure being modelled 56. This limitation is present in the primary cohort comparisons between BP users and BP non-users, as well as in the sensitivity analyses involving other preventive medications. This limitation, however, is mitigated by the fact that the outcomes in this study are related to infectious disease and are not direct clinical outcomes of known treatment benefits of BPs. The fact that the clinical benefits being assessed – the impact of BPs on COVID-related outcomes – were essentially unknown clinically at the time of the study data minimizes the impact of violation of the positivity assumption. Furthermore, our sensitivity analyses involving the “Bone-Rx” and “Osteo-Dx-Rx” cohorts did not suffer this potential violation, and the results from those analyses support those from the primary analysis cohort comparisons. Moreover, we note that the propensity score matched BP users and BP non-users in the primary analysis cohort mainly consisted of older females. According to the CDC, ~75% and 95% of US women between 60-69 and 70-79 suffer from either low bone mass or osteoporosis, respectively (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed. Women aged 60 and older represent ~75% of our study population (Table 1). Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. Thus, BP therapy would have been indicated for most non-BP users in the matched primary cohort, and arguably, for these patients the positivity assumption was not violated."

      Sensitivity Analysis 3: Association of BP-use with Exploratory Negative Control Outcomes: what is the implicit assumption in this analysis? I think the assumption here is that any residual confounding would be of the same magnitude for these outcomes. But that depends on the strength of the association between the confounder and the outcome, which need not be the same. Here, risk avoiding behavior (social distancing) is the most obvious unmeasured confounder, which may not have a strong effect on other health outcomes. Also it is unclear to me why acute cholecystitis and acute pancreatitis-related inpatient/emergency-room visits were selected as negative controls. Do the authors have convincing evidence that BPs have no effect on these outcomes? Yet, if the authors believe that this is indeed a valid approach to measure residual confounding, I think the authors might have taken a step further and presented ORs for BP → COVID-19 outcomes that are corrected for the unmeasured confounding (e.g. if OR BP → COVID-19 is ~ 0.2 and OR BP → acute cholecystitis is ~ 0.5, then the 'corrected' OR of BP → COVID-19 would be ~ 0.4).
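      The correction described in this comment amounts to dividing the observed odds ratio by the odds ratio seen for the negative control outcome. The sketch below reproduces that arithmetic. It assumes that residual confounding acts multiplicatively and with equal strength on both outcomes, which is the very assumption the comment questions; the function name `calibrate_or` is hypothetical.

```python
import math


def calibrate_or(or_outcome, or_negative_control):
    """Divide out the bias implied by a negative control outcome.

    Assumes the negative control has no true association with the exposure,
    so its observed OR reflects bias alone, and that the same multiplicative
    bias applies to the outcome of interest.
    """
    if or_outcome <= 0 or or_negative_control <= 0:
        raise ValueError("odds ratios must be positive")
    # Subtracting on the log scale is equivalent to dividing the ORs.
    return math.exp(math.log(or_outcome) - math.log(or_negative_control))


# Values taken from the reviewer's illustrative example.
corrected = calibrate_or(0.2, 0.5)
```

      With the example values, the corrected odds ratio is approximately 0.4, matching the figure given in the comment.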

      We appreciate the reviewer’s thoughtful comments regarding the differential strength of the association between unmeasured confounders and outcome. We had initially selected acute cholecystitis and pancreatitis-related inpatient and emergency room visits as negative controls because we deemed them to be emergent clinical scenarios that should not be impacted by risk avoiding behavior. However, upon further search, we identified several publications that suggest a potential impact of osteoporosis and/or BPs on gallbladder diseases (https://doi.org/10.1186/s12876-014-0192-z; http://dx.doi.org/10.1136/annrheumdis-2017-eular.3900), thus calling the validity of our strategy into question. We therefore agree that the designation of negative control outcomes is problematic and adds relatively little to the overall story. Therefore, we have removed these analyses from the revised manuscript.

      Sensitivity Analysis 4: Association of BP-use with Exploratory Positive Control Outcomes: this doesn't help me be convinced of the lack of bias. If previous researchers suffered from residual confounding, the same type of mechanisms apply here. (It might still be valuable to replicate the previous findings, but not as a sensitivity analysis of the current study).

      We agree that the same residual confounding in previous research papers could be present in our study. Nonetheless, it was important to assess whether our analysis would be potentially subject to additional (or different) confounding due to the nature of insurance claims data as compared to the previous electronic record-based studies. Therefore, it was relevant to see if previous findings of an association between BP use and upper respiratory infections are observable in our cohort.

      The second goal of sensitivity analysis #4 (now #3) was to see whether associations could be found on different sets of respiratory infection-based conditions, both during the time of the pandemic/study period as well as during the pre-pandemic time, i.e. before medical care in the US was significantly impacted by the pandemic. In light of these considerations, we feel that sensitivity analysis 4 adds value by showing consistency in our core findings.

      Sensitivity Analysis 5: Association of Other Preventive Drugs with COVID-19-Related Outcomes: Same here as for sensitivity analysis 3: the assumption that the association of unmeasured confounders with other drugs is equally strong as for BPs. Authors should explicitly state the assumptions of the sensitivity analyses and argue why they are reasonable.

      The following sentence was added to the Discussion section (lines 1019-1020): "These analyses were based on the assumption that the association of unmeasured confounders with other drugs is comparable in magnitude and quality as for BPs."

      Results: The data are clearly presented. The C-statistic / ROC-AUC of the propensity model is missing.

      Unfortunately, a significant amount of time has passed since execution of our original analysis of the Komodo dataset by our co-authors at Cerner Enviza. To date, our ability to perform follow-up studies with the Komodo dataset (which is exclusively housed on Komodo's secure servers) has become limited because business arrangements between these companies have been terminated, and the pertinent statistical software is no longer active. This issue prevents us from obtaining the original C-statistic and ROC-AUC information; however, we were able to extract the actual propensity scores themselves for the base cohort matching (BP-users versus non-users). The table below illustrates that the distribution of propensity scores for the base cohort match ranged from <0.01 to a max of 0.49, with 81.4% of patients having a propensity score of 10-49%, and 52.9% of patients having a propensity score of 20-49%. This distribution is unlikely to reflect patients who had a propensity score of either all 0 or all 1.
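      For reference, the C-statistic requested by the reviewer is the probability that a randomly chosen exposed patient has a higher propensity score than a randomly chosen unexposed patient, with ties counted as one half. A minimal sketch follows, assuming that per-patient scores and exposure labels are both available. The function name `c_statistic` and the scores are hypothetical and are not taken from the study data.

```python
def c_statistic(scores_exposed, scores_unexposed):
    """C-statistic (ROC-AUC) of a propensity model by pairwise comparison.

    Counts the share of exposed/unexposed pairs in which the exposed patient
    has the higher score; ties contribute one half.
    """
    if not scores_exposed or not scores_unexposed:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for e in scores_exposed:
        for u in scores_unexposed:
            if e > u:
                concordant += 1.0
            elif e == u:
                concordant += 0.5
    return concordant / (len(scores_exposed) * len(scores_unexposed))


# Hypothetical propensity scores, for illustration only.
exposed = [0.45, 0.30, 0.22]
unexposed = [0.10, 0.30, 0.05, 0.20]
auc = c_statistic(exposed, unexposed)
```

      The pairwise loop is quadratic in cohort size, so for cohorts of several hundred thousand patients a rank-based formulation would be the practical choice.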

      Discussion:

      When discussing other studies the authors reduce these results to 'did' or 'did not find an association'. Although commonly practiced, it doesn't justify the statistical uncertainty of both positive and negative findings. Instead I encourage the authors to include effect estimates and confidence intervals. This is particularly relevant for studies that are inconclusive (i.e. lower bound of confidence interval not excluding a clinically relevant reduction while upper bound not excluding a NULL-effect).

      We appreciate the reviewer’s suggestion and have added this information on p.21/22 in the Discussion.

      Line 1145 "These retrospective findings strongly suggest that BPs should be considered for prophylactic and/or therapeutic use in individuals at risk of SARS-CoV-2 infection." I agree for prophylactic use but do not see how the study results suggest anything for therapeutic use.

      We have removed “and/or therapeutic use” from this sentence (line 1088-1090).

      The authors should discuss the acceptability of using BPs as preventive treatment (long-term use in persons without osteoporosis or other indication for BPs). This is not my expertise but I reckon there will be little experience with long-term inhibiting osteoblasts in people with healthy bones. The authors should also discuss what prospective study design would be suitable and what sample size would be needed to demonstrate a reasonable reduction. (Say 50% accounting for some residual confounding being present in the current study.)

      Although BPs are also used in pediatric populations and in patients without osteoporosis (for example, patients with malignancy), we do recognize the lack of long-term safety data in use of BPs as preventative treatments. We tried to partially address this concern in our sub-stratified analysis of COVID-19 related outcomes and time of exposure to BP. Reassuringly, we observed that patients newly prescribed alendronic acid in February 2020 also had decreased odds of COVID-19 related outcomes (Figure 3B), suggesting that the duration of BP treatment may not need to be long-term. This was further discussed in the last paragraph of our Discussion where we state that "BP use at the time of infection may not be necessary for protection against COVID-19. Rather, our results suggest that prophylactic BP therapy may be sufficient to achieve a potentially rapid and sustained immune modulation resulting in profound mitigation of the incidence and/or severity of infections by SARS-CoV-2."

      We agree that a future prospective study on the effect of BPs on COVID-19 related outcomes will require careful consideration of the study design, sample size, statistical power etc. However, we feel that a detailed discussion of these considerations is beyond the scope of the present study.

      The authors should discuss the fact that confounders were based on registry data which is prone to misclassification. This can result in residual confounding.

      Some potential sources of misclassification have been discussed on line 932-948. In addition, the following language was added (line 970-985): "Additionally, limitations may be present due to misclassification bias of study outcomes due to the specific procedure/diagnostic codes used as well as the potential for residual confounding occurring for patient characteristics related to study outcomes that are unable to be operationalized in claims data, which would impact all cohort comparisons. For SARS-CoV-2 testing, procedure codes were limited to those testing for active infection, and therefore observations could be missed if they were captured via antibody testing (CPT 86318, 86328). These codes were excluded a priori due to the focus on the symptomatic COVID-19 population. Furthermore, for the COVID-19 diagnosis and hospitalization outcomes, all events were identified using the ICD-10 code for lab-confirmed COVID-19 (U07.1), and therefore events with an associated diagnosis code for suspected COVID-19 (U07.2) were not included. This was done to have a more stringent algorithm when identifying COVID-19-related events, and any impact of events identified using U07.2 is considered minimal, as previous studies of the early COVID-19 outbreak have found that U07.1 alone has a positive predictive value of 94%55, and for this study U07.1 captured 99.2%, 99.0%, and 97.5% of all COVID-19 patient-diagnoses for the primary, “Bone-Rx”, and “Osteo-Dx-Rx” cohorts, respectively."
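      The code-based outcome definitions quoted above can be sketched as simple set lookups. This is an illustration only: the claim records are hypothetical, the helper names are invented for the example, and the reading of "captured" as the share of patients with any COVID-19 code who carry U07.1 is an assumption. The procedure codes for active-infection tests are not listed in the response and are therefore left out.

```python
CONFIRMED_COVID_DX = {"U07.1"}          # lab-confirmed COVID-19, included
SUSPECTED_COVID_DX = {"U07.2"}          # suspected COVID-19, not included
ANTIBODY_TEST_CPT = {"86318", "86328"}  # antibody tests, excluded a priori


def has_covid_diagnosis(dx_codes):
    """True if any diagnosis code is the lab-confirmed COVID-19 code."""
    return any(code in CONFIRMED_COVID_DX for code in dx_codes)


def is_excluded_antibody_test(cpt_code):
    """True for the antibody-test procedure codes excluded from the testing outcome."""
    return cpt_code in ANTIBODY_TEST_CPT


def capture_fraction(patients_dx):
    """Share of patients with any COVID-19 code (U07.1 or U07.2) who carry U07.1."""
    any_covid_codes = CONFIRMED_COVID_DX | SUSPECTED_COVID_DX
    any_covid = [dx for dx in patients_dx if set(dx) & any_covid_codes]
    if not any_covid:
        raise ValueError("no patient carries a COVID-19 diagnosis code")
    confirmed = [dx for dx in any_covid if has_covid_diagnosis(dx)]
    return len(confirmed) / len(any_covid)


# Hypothetical per-patient diagnosis code lists, for illustration only.
patients = [["U07.1", "J12.89"], ["U07.2"], ["U07.1"], ["I10"], ["U07.1", "U07.2"]]
fraction = capture_fraction(patients)
```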

    1. Author Response:

      We thank the reviewers and editor for their feedback, which we will carefully consider as we revise the manuscript. We aim to provide more detail on how this technique could be used with other probes, ideally showing experimental data to support this use. We will add further detail of the histology from our ex vivo ovine and porcine and in vivo porcine testing. We will also provide a more thorough comparison of our technique to other recently developed lesioning techniques. In order to provide more complete evidence that our technique perturbs local neuron populations, we will refine the action potential analysis presented before and after lesions in non-human primates. In addition to providing further clarity of the method, we will include more non-human primate data where possible.

    1. Author Response:

      We are very glad that the reviewers found our paper of broad interest to the community of population, evolutionary, and ecological genetics. We thank them for their positive feedback and insightful comments and suggestions. We are preparing a revision of the preprint that will address these points. 

      One issue raised by the reviewers was that it is important to acknowledge possible limitations of the demographic model used in simulation in capturing different aspects of genomic variation. In particular, different demographic models inferred for the same species using different methods or sets of samples may have different strengths and weaknesses, and this should be considered when selecting a demographic model for simulation. This is an important point that we intend to discuss in the revised version of our manuscript. We also plan to expand the documentation of the stdpopsim catalog to include more information about the type of data used to fit every demographic model. Below we provide an outline of our thoughts on the topic.

      First of all, it is important to acknowledge that demographic models inferred from genomic data cannot fully capture all aspects of the true demographic changes in the history of a species. As a result, these models do a good job in capturing some aspects of genetic variation, but not all of them. This is primarily determined by two factors: the method used for demographic inference, and the samples whose genomes were used in inference. Regardless of the method applied, the inferred demographic model can only reflect the genealogical ancestry of the sampled individuals, and this will typically make up a small portion of the complete genealogical ancestry of the species (albeit the genealogy of any set of sampled individuals includes many ancestors). Thus, demographic models inferred from larger sets of samples from diverse ancestry backgrounds may provide a more comprehensive depiction of genetic variation within a species, as long as a sufficiently realistic demographic model can be fit. That said, the choice of samples used for inference will mostly influence recent changes in genetic variation. This is because the genealogy of even a single individual consists of numerous ancestors in each generation in the deep past (which is the premise behind PSMC-style inference methods).

      The computational method used for inference also affects the way genetic variation is reflected by the demographic model, because different methods derive their inference from different features of genomic variation. Some methods make use of the site frequency spectrum at unlinked single sites (e.g., dadi, Stairway plot), while other methods use haplotype structure (e.g., PSMC, MSMC, IBDNe). This, in turn, may influence the accuracy of different features in the inferred demography. For example, very recent demographic changes, such as recent admixture or bottlenecks, are difficult to infer from the site frequency spectrum, but are more easily inferred by examining shared long haplotypes (as demonstrated by the demographic model inferred for Bos taurus by MacLeod et al. (2013)). There have been several studies that compare different approaches to demographic inference (e.g., Beichman et al. (2017); Harris and Nielsen (2013)), but unfortunately, there is currently no succinct handbook that describes the relative strengths and weaknesses of different methods. Indeed, we hope that the standardized simulations provided by stdpopsim will facilitate systematic comparisons between methods, which will, in turn, provide valuable insights for researchers when selecting demographic models for simulation.
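
      To make the contrast between the two classes of methods concrete, the sketch below computes an unfolded site frequency spectrum from a small matrix of haplotypes. It is an illustration only and is not taken from stdpopsim or from any of the inference tools named above; the function name and the toy data are hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a list of equal-length 0/1 haplotypes.

    Entry k of the result counts the sites at which exactly k of the
    n sampled haplotypes carry the derived allele (coded 1).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    num_sites = len(haplotypes[0])
    if any(len(h) != num_sites for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    sfs = [0] * (n + 1)
    # zip(*haplotypes) iterates over sites (columns) of the matrix.
    for site in zip(*haplotypes):
        sfs[sum(site)] += 1
    return sfs


# Hypothetical toy sample: 4 haplotypes (rows) by 5 sites (columns).
toy = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
]
print(site_frequency_spectrum(toy))
```

      Because each site is counted independently, shuffling the alleles within any column leaves the spectrum unchanged. The summary therefore carries no information about which haplotypes share long identical stretches, which is the signal that haplotype-based methods exploit for recent events.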

      It is important to note that inclusion of a demographic model in the stdpopsim catalog does not involve any judgment as to which aspects of genetic variation it captures. Any model that is a faithful implementation of a published model inferred from genomic data can be added to the stdpopsim catalog. Thus, potential users of stdpopsim should use the implemented models with the appropriate caution, keeping in mind the limitations discussed above. Scientists contributing a new model to the catalog are required to write a brief summary, which is added to the documentation page of the catalog: https://popsim-consortium.github.io/stdpopsim-docs/latest/catalog.html. This summary includes a graphical description of the model (such as the one shown for Anopheles gambiae in Fig. 2B of the paper), as well as a description of the data and method used for inference. We will mention this in the revised manuscript to help users of stdpopsim navigate through this resource.

    1. Author Response:

      First of all, we would like to thank the reviewers for their work. We appreciate the constructive review comments and useful suggestions to further improve our article.

      The main criticism of our manuscript, from both reviewers, is that the cryo-EM structures are of low resolution and that the fit of the crystallographic structures of the PAD and the stalk domain into these low-resolution structures is questionable. We would like to point out that the cryo-EM data, and the conclusions drawn from them, are not essential for the main conclusions of the article. All mutants that we made in this study were designed based on the structural data obtained from the high-resolution X-ray structures, with no input from the low-resolution cryo-EM docked models. We chose to include the cryo-EM data since they allowed us to speculate about the interaction between the PAD and the stalk domain of PrgB, domains whose structures we determined separately by X-ray crystallography. We agree with the reviewers that further experiments are needed to verify this potential interaction. Therefore, we will perform additional biochemical assays to investigate the proposed interaction. We will also try to optimize the cryo-EM data to allow for a more reliable fit of our high-resolution crystallographic structures. Once that is done, we will submit a revised version of the manuscript.

      On behalf of all authors,

      Ronnie Berntsson

    1. Author Response:

      We’d like to thank the three reviewers for reviewing our work in depth and providing insightful comments and suggestions.

      Reviewer 1

      1. The in vivo efficacy of MS023 does not seem to be very great. The mice treated with MS023 display a very small reduction in ADMA levels and a small increase in SDMA levels (Fig S6A).

      REPLY: We have quantified proteins with ADMA and SDMA by Western blotting tail clippings from mice treated with vehicle (n=6) and MS023 (n=6). These were normalized for equal loading to β-actin levels. The average ADMA relative expression was 0.92 for vehicle treated mice and 0.86 for MS023 treated mice (p < 0.044). The average SDMA relative expression was 0.89 for vehicle treated mice and 0.98 for MS023 treated mice (p < 0.000019). These whole-body measurements show that MS023 decreases the level of proteins with ADMA and increases the level of proteins with SDMA, as observed before with inhibition of PRMT1 (Dhar et al, 2013).

      Reviewer 2

      1. Two weaknesses are noted which lie in overstatements of the findings. There are six type I PRMTs (PRMT1, 2, 3, 6, 8, and CARM1), all of which are inhibited by MS023. While the authors demonstrate that their observations are not due to the inhibition of CARM1, they do not demonstrate that it is due to the inhibition of PRMT1, as they suggest. 

      REPLY: MS023 has been shown to have in vitro activity against several type I enzymes (Eram et al, 2016), and the same goes for GSK3368715 (Fedoriw et al, 2019). The in vitro IC50 values of MS023 are 30 nM for PRMT1, 119 nM for PRMT3, 83 nM for CARM1, 4 nM for PRMT6, and 5 nM for PRMT8 (Eram et al., 2016). It was documented early that PRMT1 is the major cellular type I enzyme (Pawlak et al, 2000), and this is why loss of PRMT1 or of PRMT5, the major type II enzyme, is embryonic lethal in mice (Guccione & Richard, 2019). In vivo data obtained with MS023 are paralleled by data obtained with siPRMT1 (Gao et al, 2019; Plotnikov et al, 2020; Wu et al, 2022; Zhu et al, 2019). Thus, in vivo, MS023 targets the main type I PRMT, PRMT1. Our claim that MS023 targets PRMT1 in MuSCs is further supported by our previous observation that deleting PRMT1 stimulates MuSC proliferation. Since this effect was irreversible (Blanc et al, 2016), we pursued studies with the reversible MS023, the only compound to have significant activity towards PRMT1 in vivo. For these reasons, we are convinced that the effect of MS023 is mainly mediated by inhibiting PRMT1 in the MuSC.

      To be thorough, we should test all other type I PRMT inhibitors as they become available. CARM1 was shown to be a player in MuSCs (Kawabe et al, 2012), but we excluded it using the CARM1 inhibitor TP-064 (Nakayama et al, 2018). The PRMT6-deficient mice that we generated are perfectly viable without overt phenotypes, suggesting that PRMT6 is not involved (Neault et al, 2012), and PRMT8 is brain specific (Taneda et al, 2007).

      2. Furthermore, this study suggests that the switch and elevated cellular metabolism in muscle stem cells due to MS023 enhanced self-renewal and engraftment capabilities but does not demonstrate this fact directly as stated. 

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      Reviewer 3

      1. However, the proposed underlying mechanism, which is claimed to rely on the expansion of MuSC and 'reprograming' of MuSCs towards a "unique and previously uncharacterized identity" is not sufficiently supported. The extent of the description of scRNA-seq data is inappropriate. Some conclusions from the scRNA-seq data appear to be overinterpreted or are rather trivial.

      REPLY: We presented the top marker genes for each subpopulation that was identified in our scRNAseq to aid the reader in establishing a broad view of whether a given subpopulation was quiescent-like, proliferating, or differentiating. M1-M5 clusters were all enriched for cell cycle markers (Mki67, Cdk1, etc.), indicating a proliferative identity. The unique finding in our data is that treatment with MS023 resulted in a shift in identity as compared to the DMSO-treated proliferating MuSCs (M1, M2 and M4), creating transcriptionally distinct M3 and M5 clusters. M3 and M5 had elevated markers for metabolism (e.g., Eno1, Atp5k) and early activation (e.g., Fos, Jun), while the untreated MuSCs in clusters M1, M2 and M4 did not. Furthermore, M3 and M5 had higher baseline levels of Pax7 expression when compared to untreated cells. Together, these findings describe a transitional subpopulation of MuSCs unique to MS023 treatment which not only harbours the stem-like/early activation markers Pax7, Fos and Jun, but also shows elevated proliferative markers related to cell cycle and energy metabolism. This particular combination of characteristics is unique to the MS023-treated MuSCs, thus identifying a novel subtype of MuSC identity. In accordance with our scRNAseq data, we validated experimentally that MS023-treated cells have higher energy metabolism and increased self-renewal potential, thereby confirming that the unique transcriptomic signature of these cells also leads to a different cell fate decision.

      2. It remains completely unclear whether the MS023-stimulated increase of metabolic pathway activity (OXPHOS, glycolysis) plays any role for preserving stem cell properties of MuSC during expansion and improves engraftment. Additional functional and mechanistic studies are required to explore the underlying molecular processes.

      REPLY: Agreed. The link between cellular metabolism and MS023 enhanced self-renewal and engraftment capabilities is correlative and we will edit the revised text to reflect this.

      3. Furthermore, it remains completely unclear whether the acclaimed increase in grip and tetanic strength of mdx mice after MS023 treatment relies on enhanced expansion of MuSC mediated by PRMT1 inhibition. 

      REPLY: Agreed. We cannot determine whether the effect is mediated by an expansion of the MuSC pool or by an effect on other cell types, such as a direct impact on the myofibers. The goal of this figure was to provide a therapeutic perspective for the use of type I PRMT inhibitors for the treatment of DMD. Muscle wasting/weakness in DMD is a complex and multifactorial process (e.g., myofiber fragility, MuSC defects, chronic inflammation, fibrofatty accumulation). If MS023 can target multiple aspects of the physiopathology of the disease, this would increase its therapeutic applicability. Further studies will be needed to determine the exact mechanism by which MS023 mediates its beneficial effect. The manuscript will be modified to reflect this.

      References

      • Blanc RS, Vogel G, Li X, Yu Z, Li S, Richard S (2016) Arginine methylation by PRMT1 regulates muscle stem cell fate. Mol Cell Biol 37: e00457-00416

      • Dhar S, Vemulapalli V, Patananan AN, Huang GL, Di Lorenzo A, Richard S, Comb MJ, Guo A, Clarke SG, Bedford MT (2013) Loss of the major Type I arginine methyltransferase PRMT1 causes substrate scavenging by other PRMTs. Scientific reports 3: 1311

      • Eram MS, Shen Y, Szewczyk M, Wu H, Senisterra G, Li F, Butler KV, Kaniskan HU, Speed BA, Dela Sena C et al (2016) A Potent, Selective, and Cell-Active Inhibitor of Human Type I Protein Arginine Methyltransferases. ACS Chem Biol 11: 772-781

      • Fedoriw A, Rajapurkar SR, Brien SO, Gerhart SV, Lorna H, Pappalardi B, Shah N, Laraio J, Liu Y, Butticello M et al (2019) Anti-tumor activity of the first-in-class type I PRMT inhibitor, GSK3368715, synergizes with PRMT5 inhibition through MTAP loss. Cancer cell XX: XX

      • Gao G, Zhang L, Villarreal OD, He W, Su D, Bedford E, Moh P, Shen J, Shi X, Bedford MT et al (2019) PRMT1 loss sensitizes cells to PRMT5 inhibition. Nucleic acids research 47: 5038-5048

      • Guccione E, Richard S (2019) The regulation, functions and clinical relevance of arginine methylation. Nat Rev Mol Cell Biol 20: 642-657

      • Kawabe Y, Wang YX, McKinnell IW, Bedford MT, Rudnicki MA (2012) Carm1 regulates Pax7 transcriptional activity through MLL1/2 recruitment during asymmetric satellite stem cell divisions. Cell Stem Cell 11: 333-345

      • Nakayama K, Szewczyk MM, Dela Sena C, Wu H, Dong A et al (2018) TP-064, a potent and selective small molecule inhibitor of PRMT4 for multiple myeloma. Oncotarget 9: 18480-18493

      • Neault M, Mallette FA, Vogel G, Michaud-Levesque J, Richard S (2012) Ablation of PRMT6 reveals a role as a negative transcriptional regulator of the p53 tumor suppressor. Nucleic acids research 40: 9513-9521

      • Pawlak MR, Scherer CA, Chen J, Roshon MJ, Ruley HE (2000) Arginine N-Methyltransferase 1 Is Required for Early Postimplantation Mouse Development, but Cells Deficient in the Enzyme Are Viable. Mol Cell Biol 20: 4859-4869

      • Plotnikov A, Kozer N, Cohen G, Carvalho S, Duberstein S, Almog O, Solmesky LJ, Shurrush KA, Babaev I, Benjamin S et al (2020) PRMT1 inhibition induces differentiation of colon cancer cells. Scientific reports 10: 20030

      • Taneda T, Miyata S, Kousaka A, Inoue K, Koyama Y, Mori Y, Tohyama M (2007) Specific regional distribution of protein arginine methyltransferase 8 (PRMT8) in the mouse brain. Brain Res 1155: 1-9

      • Wu Q, Nie DY, Ba-Alawi W, Ji Y, Zhang Z, Cruickshank J, Haight J, Ciamponi FE, Chen J, Duan S et al (2022) PRMT inhibition induces a viral mimicry response in triple-negative breast cancer. Nature chemical biology 18: 821-830

      • Zhu Y, He X, Lin YC, Dong H, Zhang L, Chen X, Wang Z, Shen Y, Li M, Wang H et al (2019) Targeting PRMT1-mediated FLT3 methylation disrupts maintenance of MLL-rearranged acute lymphoblastic leukemia. Blood 134: 1257-1268

    1. Author Response

      Reviewer #2 (Public Review):

      1) The main limitation of this study is that the results are primarily descriptive in nature, and thus, do not provide mechanistic insight into how Ryr1 disease mutations lead to the muscle-specific changes observed in the EDL, soleus and EOM proteomes.

      An intrinsic feature of high-throughput proteomic analysis is the generation of lists of differentially expressed proteins (DEPs) in different muscles from WT and mutated mice. Although defining mechanistic insights related to changes in dozens of proteins is very interesting, it is a difficult task and goes beyond the goal of the high-throughput proteomic analysis presented here. Nevertheless, the analysis of DEPs may indeed provide arguments to speculate on the pathogenesis of the phenotype linked to recessive RyR1 mutations. In the unrevised manuscript, we pointed out that the fiber type I predominance observed in congenital myopathies linked to recessive Ryr1 mutations is consistent with the high expression level of heat shock proteins in slow twitch muscles. However, as suggested by Reviewer 3, we have removed "vague statements" concerning major insights into pathophysiological mechanisms from the text of the revised manuscript, since we are aware that the mechanistic information, if any, that we can extract from the data set cannot exceed the intrinsic limitations of high-throughput proteomic technology.

      b) Results comparing fast twitch (EDL) and slow twitch (soleus) muscles from WT mice confirmed several known differences between the two muscle types. Similar analyses between EOM/EDL and EOM/soleus muscles from WT mice were not conducted.

      We agree with the point raised by the Reviewer. In the revised manuscript we have changed Figure 2. The new Figure 2 shows the analysis of differentially expressed proteins in EDL, soleus and EOMs from WT mice. We have also added 2 new Tables (new Supplementary Tables 2 and 3) and have inserted our findings in the revised Results section (page 7, lines 157-176; pages 8 and 9).

      c) While a reactome pathway analysis for proteins changes observed in EDL is shown in Supplemental Figure 1, the authors do not fully discuss the nature of the proteins and corresponding pathways impacted in the other two muscle groups analyzed.

      We have now included in the revised manuscript a new Figure 2 which includes the Reactome pathway analysis comparing EDL with soleus, EDL with EOM and soleus with EOM (panels C, F and I, respectively). We have also inserted into the revised manuscript a brief description of the pathways showing the greatest changes in protein content (page 7, lines 156-175; pages 8 and 9). We agree that the data showing changes in protein content between the 3 muscle groups of the WT mice are important, also because they validate the results of the proteomic approach. Indeed, the present results confirm that many proteins, including MyHCIIb, calsequestrin 1, SERCA1 and parvalbumin, are more abundantly expressed in fast twitch EDL muscles compared to soleus. Similarly, our results confirm that EOMs are enriched in MyHC-EO as well as cardiac isoforms of ECC proteins. This point has been clarified in the revised version of the manuscript (page 8, lines 198-213; page 9, lines 214-228). Nevertheless, we would like to point out that the main focus of our study is to compare the changes in protein content induced by the presence of recessive RyR1 mutations.

      Reviewer #3 (Public Review):

      a) it would be useful to determine whether changes in protein levels correlated with changes in mRNA levels …….

      We performed qPCR analysis of Stac3 and Cacna1s in EDL, soleus and EOM from WT mice (see Figure 1 below). The expression of transcripts encoding Cacna1s and Stac3 is approximately 9-fold higher in EDL compared to soleus. The fold change of Stac3 and Cacna1s transcripts in EDL muscles is higher than the differences we observed by mass spectrometry at the protein level between EDL and soleus. Indeed, we found that the content of the Stac3 protein in EDL is 3-fold higher compared to that in soleus. Although there is no apparent linear correlation between mRNA and protein levels, we believe that a few plausible conclusions can be drawn, namely: (i) the expression level of both transcripts and proteins is higher in EDL compared to EOM and soleus muscles, respectively, (ii) the expression level of transcripts encoding Stac3 correlates with that of transcripts encoding Cacna1s and confirms the proteomic data. In addition, the level of Stac3 transcript does not change between WT and dHT, confirming our proteomic data which show that Stac3 protein content in muscles from dHT is similar to that found in WT littermates. Altogether these results support the concept that the differences in Stac3 content between EDL and soleus occur at both the protein and transcript levels, namely high Stac3 mRNA levels correlate with higher protein content (EDL) and low mRNA levels correlate with low Stac3 protein content in soleus muscles (see Figure 1 below).

      Figure 1: qPCR of Cacna1s and Stac3 in muscles from WT mice. The expression levels of the transcripts encoding Cacna1s and Stac3 are the highest in EDL muscles and the lowest in soleus muscles (top panels). There are no significant changes in their relative expression levels in dHT vs WT. Each symbol represents the value from a single mouse. * p=0.028, Mann Whitney test. qPCR was performed as described in Elbaz et al., 2019 (Hum Mol Genet 28, 2987-2999).
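
      For readers unfamiliar with how qPCR fold changes of this kind are commonly expressed, the sketch below uses the widely used 2^-ΔΔCt convention. This is an assumption made for illustration: the response does not state which quantification method was applied (it refers to Elbaz et al., 2019), and the function name and Ct values are hypothetical, not measured data.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the 2^-ddCt convention.

    Assumes roughly 100% amplification efficiency for both the target
    and the reference (housekeeping) transcript.
    """
    delta_sample = ct_target - ct_reference              # sample muscle
    delta_calibrator = ct_target_cal - ct_reference_cal  # calibrator muscle
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in
# the sample than in the calibrator, with the reference gene unchanged.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)
```

      Under this convention each cycle of difference corresponds to a factor of two, so a transcript ratio of about 9-fold reflects a little more than three cycles. The transcript ratio and the 3-fold protein ratio are measured on different scales by different techniques, which is one reason a strictly linear relationship between them is not expected.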

      ….and whether or not the protein present was functional, and whether Stac3 was in fact stoichiometrically depleted in relation to Cacna1s.

      We thought about this point but think that there are no plausible arguments to believe that Stac3 is not functional, one simple reason being that our WT mice do not have a phenotype which would be associated with the absence of Stac3 (Reinholt et al., PLoS One 8, e62760 2013, Nelson et al. Proc. Natl. Acad. Sci. USA 110:11881 2013).

      b) In the abstract, the authors stated that skeletal muscle is responsible for voluntary movement. It is also responsible for non-voluntary. The abstract needs to be refocused on the mutation and on what we learn from this study. Please avoid vague statements like "we provide important insights to the pathophysiological mechanisms..." mainly when the study is descriptive and not mechanistic.

      The abstract of the revised manuscript has been rewritten. In particular, we removed statements referring to important “pathophysiological mechanistic insight”.

      c) The author should bring up the mutation name, location and phenotype early in the introduction.

      In the revised manuscript we provide the information requested by the Reviewer (page 2 lines 36-38 and page 4, lines 98-102).

      d) This reviewer also suggests that the authors refocus the introduction on the mutation location in the 3D RyR1 structure (available cryo-EM structure), if there is any nearby ligand binding site, protomers junction or any other known interacting protein partners. This will help the reader to understand how this mutation could be important for the channel's function

      The residue Ala4329 is present inside the TMx (Auxiliary transmembrane helices) domain which spans from residue 4322 to 4370 and interposes structurally (des Georges A et al. 2016 Cell 167,145-57; Chen W, et al. 2020 EMBO Rep. 21, e49891). Although the structural resolution of the region has been improved (des Georges et al, 2016), parts of the domain still remain with no defined atomic coordinates, especially the region encompassing a.a. E4253 – F4540. Because of such undefined atomic coordinates of the region E4253-F4540, we are not able to determine the real orientation and the disposition of the amino acids in this region, including the A4329 residue. As reference, structure PDB: 5TAL of des Georges et al, 2016 was analyzed with UCSF Chimera (production version 1.16) (Pettersen et al. J. Comput. Chem. 25: 1605-1612. doi: 10.1002/jcc.20084).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes a relatively novel approach to discovering combinations of herbal medications that may help modulate immune responses, and in turn help treat diseases such as cancer. The authors use breast plasma cell mastitis as a disease in which they present results from a non-blinded clinical trial with modest results. The main shortcomings are a lack of rigor around standardizing the control group given steroids versus the treatment group given the combinations of herbal medications. There needs to be a detailed statistical analysis of the comparison in tumor size, stage, invasiveness, etc. as well as consideration of confounding disease states (autoimmune disease, prior cancers, diabetes, etc.). While the results are interesting in that the use of herbal medications is often overlooked in Western medicine, the manuscript needs great detail in the clinical comparison in order to provide convincing evidence for an effect.

      Many thanks for your very kind words about our work. We are excited to hear that you think our manuscript is relatively novel with considerable translational impact to the field of herbal medications. We are grateful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

      Reviewer #2 (Public Review):

      The work is rather interesting and novel because for the first time, the authors employed knowledge graph, a cutting-edge technique in the domain of artificial intelligence, to identify a novel herbal drug combination for the treatment of PCM. The results of the clinical trial study clearly demonstrated that the drug combination is effective to ameliorate the symptoms of PCM patients and improve the general health status of the patients. Overall, the strategy of this manuscript may provide a paradigm for the design of drug combination towards many other human disorders.

      We are truly grateful for your very kind words about our work. It is very encouraging to know that you think our work is novel and of significance for the field. We sincerely appreciate the valuable time and kind efforts that you have spent on the thorough review of our manuscript.

      Reviewer #3 (Public Review):

      The major merit of the manuscript is that the authors introduced the concept of knowledge graph into the domain of herbal drugs or TCM. Namely, the authors designed a knowledge graph towards systematic immunity or immunotherapy based on massive data mining techniques. The authors successfully identified an herbal drug combination for PCM with the help of a scoring system. Moreover, the authors conducted a clinical trial study and the clinical data showed that the herbal drug combination holds great promise as an effective treatment for PCM. The weakness of the manuscript is that some details for the herbal drug combination and the clinical trial study are missing.

      Many thanks for your very kind words about our work. We are excited to hear that you think our work is relatively novel and holds great promise as an effective remedy for PCM. We are truly thankful for your valuable time and efforts you have spent to provide your very insightful comments, which are of great help for our revision.

    1. Author Response

      Reviewer #1 (Public Review):

      After giving a very accessible introduction to cellular processes during brain development, the authors present the computational model used in this study. It combines the kinematics of cell proliferation with the mechanics of brain tissue growth and is essentially equal to their model presented in Zarzor et al (2021), but extended for the outer subventricular zone (OSVZ), see for example Fig. 2 in the present manuscript and in Zarzor et al (2021). This zone, which is specific to humans, provides a second zone of cell proliferation. The division rate in the OSVZ is smaller and at most equal to that in the ventricular zone.

      The authors present two main findings: The distance between sulci in the cortex is decreased whereas the cell density in the ventricular zone is increased in presence of the OSVZ. Furthermore, the "folding evolution", which is the ratio between the outer perimeter at time t and the initial perimeter, increases in presence of the OSVZ. The strongest effect is seen when division rates in both proliferating zones are equal. The authors compare the cases of varying and constant cortical stiffness, which they had also done in Zarzor et al (2021). Finally, they consider the feedback of cortical folding on OSVZ thickness.
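
      The "folding evolution" measure described above can be sketched as a ratio of polyline lengths. This is a minimal illustration, assuming the outer surface of a 2D section is available as an ordered list of (x, y) points; the function names, the `closed` option and the toy outlines are hypothetical and are not taken from the authors' code.

```python
import math


def polyline_length(points, closed=False):
    """Sum of straight-segment lengths through (x, y) points, in order."""
    pairs = list(zip(points, points[1:]))
    if closed and len(points) > 1:
        pairs.append((points[-1], points[0]))  # close the outline
    return sum(math.dist(a, b) for a, b in pairs)


def folding_evolution(outline_t, outline_0, closed=False):
    """Ratio of the outer perimeter at time t to the initial perimeter."""
    initial = polyline_length(outline_0, closed)
    if initial == 0:
        raise ValueError("initial outline has zero length")
    return polyline_length(outline_t, closed) / initial


# Hypothetical toy outlines: a flat surface and the same span with one fold.
flat = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
folded = [(0.0, 0.0), (4.0, 3.0), (8.0, 0.0)]
print(folding_evolution(folded, flat))
```

      Note that such a ratio increases with uniform growth as well as with folding, so comparisons between conditions are most informative when the overall growth is similar.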

      The computational model provides a sound description of how cell proliferation and migration combined with tissue mechanics yield cortical folding patterns. However, only a few parameter values are varied in a limited range. Also, it remains unclear to me, how important the specific functional dependencies of, for example, the cell division rate on the radial coordinate are. This point seems of particular importance because the effect of the presence of the OSVZ on the folding patterns seems rather minute, see Fig. 5. The authors do not propose experiments that could be used to test their description and results. Finally, the analysis is restricted to 2 dimensions.

      Thank you very much for the valuable suggestions. We agree that we are only able to show limited parameter studies in the manuscript. Therefore, we have now implemented a user interface that can be downloaded from GitHub (https://github.com/SaeedZarzor/BFSimulator) and will allow interested readers to directly change the parameter values and run the simulations.

      To better emphasize the effect of the presence of the OSVZ on the folding patterns, we have edited the corresponding section and figure in the revised manuscript to include a quantification of the distance between sulci:

      “In general, the distance between neighboring sulci decreases with increasing Gosvz, as marked in Figure 7. For the displayed cases, the distance decreases from d = 8.796 mm for Gosvz = 0 to d = 8.67 mm for Gosvz = 10 and finally d = 8.2 mm for Gosvz = 20. Interestingly, the cortical thickness and effective stiffness ratio at the first instability point (denoted by w in Figure 5) are the same for all these cases. Therefore, we attribute the observed differences to the faster increase in the cell density and thus cortical growth, cortical stiffness and the effective stiffness after the instability has been initiated.”
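
      As a quick check of the magnitudes quoted above, the relative decrease in sulcal distance can be computed directly from the three reported values. Only the distances are taken from the quoted text; the variable names are hypothetical.

```python
# Sulcal distances in mm, keyed by the OSVZ division rate Gosvz,
# as reported in the quoted passage.
distances_mm = {0: 8.796, 10: 8.67, 20: 8.2}

baseline = distances_mm[0]
relative_decrease = {
    g: (baseline - d) / baseline for g, d in distances_mm.items()
}

for g, frac in relative_decrease.items():
    print(f"Gosvz = {g}: d = {distances_mm[g]} mm, decrease = {frac:.1%}")
```

      Expressed this way, the change is on the order of one to a few percent for Gosvz = 10 and somewhat larger for Gosvz = 20, which gives a sense of scale for the reviewer's remark about the size of the effect.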

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone Gvz = 600, the folding complexity increases with increasing initial division rate in the OSVZ Gosvz.”

      Reviewer #2 (Public Review):

      Weaknesses

      • To account for the complexity of biological phenomena, the model relies on a large number of ad hoc choices whose consequences are difficult to predict.

      We fully agree that there are quite a number of model assumptions that we have to make. Still, we have achieved good agreement with the data from fetal brain sections, which in our opinion justifies the assumptions made.

      To better explain the choice of parameters, we have now included the following paragraph in the manuscript: “The mechanical and diffusion parameters are adapted from the literature (Budday et al., 2020; de Rooij and Kuhl, 2018), while the geometry parameters are estimated based on histologically stained human brain sections and magnetic resonance images. For instance, to determine the MST factor, we measured the relative distance between the ISVZ and OSVZ in histologically stained images. The final value adopted is the result of dividing the measured distance by the expected time. When determining the growth problem parameters, numerical stability and algorithm convergence were major criteria.”

      • The physical model description is highly technical and out of reach for a non-specialist.

      Thank you for making this point! We have now adapted the model description to better emphasize the main features of the model and the feedback mechanisms between the mechanical growth problem and the cell density problem:

      “...is the Cauchy stress tensor formulated in terms of the elastic deformation tensor, as only the elastic deformation induces stresses. The Cauchy stress describes the three dimensional stress state in the spatial (grown and deformed) configuration and is computed by deriving the strain energy function…”

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex (as the cortical stiffness changes while the subcortical stiffness remains constant) and thus also the emerging cortical folding pattern (Budday et al., 2014; Zarzor et al., 2021).”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that decreases with increasing maximum stretch s in the domain”

      • The description of neurogenesis shows three zones of cell proliferation, each inhabited by a specific cell type. Despite its realism, the proposed model does not take into account the ISVZ where the intermediate progenitors operate.

      Indeed, in our model we have focused on the two original sources of cells, namely radial glial cells and ORGCs. As far as is currently known, intermediate progenitor cells are produced from these two cell types, so they are indirectly included in the model through the resulting cell density.

      • The experiment of comparing several regimes derived from the relative importance of proliferation in the VZ and OSVZ is not very clear. It leads to the observation of the evolution of cell density maxima over time, which seems insufficient to conclude the importance of the OSVZ for folding. One wonders whether the key parameter that leads to folding is the rate of OSVZ proliferation or simply the total quantity of neurons generated by the two or even the three zones.

      Thank you for this remark. We fully agree with the Reviewer that a key factor is the total quantity of neurons generated. However, the major question we intend to address here is where these neurons originate from and how the different proliferating zones interact. In other words, we do not question the existence of the OSVZ, but we are trying to build a computational model that can mimic all relevant cellular processes during brain development - to then study their individual effect on cortical folding. Therefore, we do not argue that the OSVZ is necessary for folding, but that it plays a crucial role in the speed of generating these folds and their complexity in the Conclusion section:

      “Our results show that the existence of the OSVZ particularly triggers the emergence of secondary mechanical instabilities leading to more complex folding patterns. Furthermore, the proliferation of outer radial glial cells (ORGCs) reduces the time required to induce the mechanical instability and thus cortical folding.”

      • The experiment on the heterogeneity of proliferation in the OSVZ is a bit frustrating. I would like to see a set-up corresponding to the mosaics found in ferrets and closely associated with folding patterns.

      This is a valuable point, thank you! We have now added new results showing a more distinct regional variation of the OSVZ and have adapted our conclusions regarding this point:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”
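
      To make the two heterogeneous cases described above more concrete, the following is a minimal sketch of how such division-rate profiles could be generated along the circumferential direction. It is not taken from the authors' code: the function names `gradual_profile` and `random_profile`, the linear shape of the decrease, and the uniform random sampling are all assumptions made for illustration. Only the peak value G_osvz = 20 comes from the text.

```python
import random


def gradual_profile(n_points, g_max):
    """Division rate decreasing linearly from g_max to 0 along the
    circumferential direction (the linear shape is an assumption)."""
    if n_points < 2:
        raise ValueError("need at least two sample points")
    return [g_max * (1.0 - i / (n_points - 1)) for i in range(n_points)]


def random_profile(n_points, g_max, seed=0):
    """Division rate drawn independently at each circumferential position,
    uniformly in [0, g_max) (the distribution is an assumption)."""
    rng = random.Random(seed)  # fixed seed keeps the pattern reproducible
    return [g_max * rng.random() for _ in range(n_points)]


# Hypothetical discretisation: 9 sample points along the circumference,
# peak OSVZ division rate G_osvz = 20 as quoted in the text.
gradual = gradual_profile(9, 20.0)
scattered = random_profile(9, 20.0, seed=1)
```

      In a sketch like this, the first folds would be expected near the positions with the largest values of either profile, which is the qualitative behaviour the response reports.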

      “Finally, our simulations reveal that inhomogeneous cell proliferation patterns in the OSVZ can control the location of first gyri and sulci but do not necessarily affect the distance between sulci and the overall complexity of the emerging folding pattern.”

      Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with the user interface shown below is now updated in the Data availability section.

      • It would be interesting to elaborate a little on the possibility of extending the model in 3D, which seems imperative to evaluate the nature of the folding pattern generated. Comparing them to reality is an essential step in gauging the credibility of the model. For instance, it would be interesting to test to which extent the model can father the type of variability observed in the general population (Mangin et al.). It will also be particularly interesting to work on the inverse model between the real folding patterns and the heterogeneous proliferation maps that can generate them.

      We fully agree with the Reviewer. Unfortunately, to the best of the authors’ knowledge, there is currently no data set providing both the 3D evolution of the folding pattern and the corresponding distribution of the cell density. Therefore, the validation of 3D results is difficult. Promisingly, our model achieved good agreement with data from histologically stained fetal brain sections regarding the local gyrification index, final cortical thickness, and cell density distribution, as presented in Zarzor et al. (2021). We have indeed initiated the collection of additional data, ideally for the 3D validation. However, this will take some time and is out of the scope of the current work. It is also a great suggestion to compare our 3D simulation results with the variability found in the general population. Indeed, we plan to do such work in the future but consider this out of the scope of the current work, which focuses more on the OSVZ.

      To still show that our model can be extended to 3D, we have now included the following results: “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Reviewer #3 (Public Review):

      Zarzor et al. developed a new multifield computational model, which couples cell proliferation and migration at the cellular level with biological growth at the organ level, to study the effect of OSVZ on cortical folding. Their approach complements the classical experimental approach in answering open questions in brain development. Their simulation results found the existence of OSVZ triggers the emergence of secondary mechanical instabilities that leads to more complex folding patterns. Also, they found that mechanical forces not only fold the cortex but also deepen subcortical zones as a result of cortical folding. Their physics-based computational modeling approach offered a novel way to predictively assess the links between cellular mechanisms and cortical folding during early human brain development, further shedding light on identifying the potential controlling parameters for reverse brain study.

      Strengths:

      The newly developed physics-based computational model has several advantages compared to previous existing computational brain models. First, it breaks the traditional double-layer computational brain model, gray matter layer and white matter layer, by introducing the outer subventricular zone. Second, it develops multiscale computational modeling by bringing the cellular level features, cell diffusion, and migration, into the macroscale biological growth model. Third, it could provide a cause-effect analysis of cortical folding and axonal fiber development. Finally, their approach could complement, but not substitute, sophisticated experimental approaches to answer some open questions in brain science.

      Weaknesses:

      The cellular diffusion and migration seem determined and controlled by a single variable, cell density, which is one-way coupled with the deformation gradient of the brain model. However, cell migration and diffusion should be potentially coupled with stress and vice versa. Also, the current computational model can be improved by extending it to a 3D model. Finally, they can further improve the study of regional proliferation variation by introducing fully-randomized heterogenous cell density and growth in their model.

      Thank you. We apologize for the lack of clarity in the original submission. There are indeed more coupling mechanisms, which we have now better emphasized when introducing the model:

      “Through Equation 6, the cell density problem controls the effective stiffness ratio between cortex and subcortex and thus also the emerging cortical folding pattern Budday et al. 2014; Zarzor et al. 2021.”

      “Through Equation 8, the amount of growth is directly related to the cell density - the higher the cell density, the more growth.”

      “The vector n represents the normalized orientation of radial glial cell fibers in the spatial configuration and controls the migration direction of neurons. As the brain grows and folds, the fiber direction changes. Through this feedback mechanism, the mechanical growth problem affects how neurons migrate and the cell density evolves locally.”

      “By applying Equation 16 for the VZ, we ensure that the division rate decreases from its initial value G_vz to a smaller value as the maximum stretch value s in the domain increases, i.e., with increasing gestational age. This constitutes an additional feedback mechanism between the mechanical growth problem and the cell density problem: As the maximum stretch and thus the deformation increases due to constrained cortical growth, the division rate in the VZ decreases, resulting in less newborn cells” and “G^s_osvz is the division rate in the OSVZ that again decreases with increasing maximum stretch s in the domain”

      In addition, we have added a new figure to show that the observed trends also hold true for 3D simulations:

      “Figure 8 demonstrates that the observed trends also hold true when extending the model to 3D. For the case of varying stiffness with a stiffness ratio of 3, a growth ratio of 3, and an initial division rate in the ventricular zone G_vz = 600, the folding complexity increases with increasing initial division rate in the OSVZ G_osvz.”

      Finally, we have added new results showing a more distinct regional variation of the OSVZ. Furthermore, in our code, we have added a user interface with multiple options for different OSVZ regional variations. The link to the code with user interface is available in the paper:

      “Also in the ferret brain, where a region close in structure to the primate's OSVZ was found, this region shows a unique mosaic-like structure Fietz et al. (2010b); Reillo and Borrell (2012). In this section, we aim to assess the effect of regional proliferation variations in the OSVZ on the emerging cortical folding pattern. We discuss two different heterogeneous patterns here, but have included more variations online through our user interface on GitHub, as described in the Data availability section. In the first case, the OSVZ division rate gradually decreases along the circumferential direction. In the second case, the division rate varies in a more random pattern. Figures 13 and 14 show how cortical folds develop in both cases for the varying cortical stiffness case, a division rate in the VZ of G_vz = 120 and an initial division rate in the OSVZ of G_osvz = 20. As expected, the evolving folding patterns slightly differ. In both cases, the first folds appear, where the cell proliferation rate is highest. Expectedly, those regions also show a higher cell density in the cortex than regions nearby. However, both cases lead to final patterns with similar distances between sulci and folding complexity (one period doubling pattern). In addition, gyri and sulci are distributed equally -- regardless of the division rate. Therefore, we may conclude that inhomogeneous cell proliferation in the OSVZ controls the location of first gyri and sulci but does not necessarily affect the distance between sulci (also referred to as folding wavelength) and the overall complexity of the emerging folding pattern. This agrees well with our previous finding that the characteristic wavelength of folding remains relatively stable for inhomogeneous cortical growth patterns Budday and Steinmann (2018). 
The simulation results are also consistent with the previously found remarkable surface expansion above the regions with higher proliferation in the OSVZ Llinares-Benadero and Borrell (2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors push a fresh perspective with a sufficiently sophisticated and novel methodology. I have some remaining reservations that concern the actual make-up of the data basis and consistency of results between the two (N=16) samples, the statistical analysis, as well as the “travelling” part.

      I previously commented on the fact that findings from both datasets were difficult to discern and more effort should be made to highlight these. Also, a major conclusion “the directionality effect [effect of attention on forward waves] only occurs for visual stimulation” only rested on a qualitative comparison between studies. The authors have improved on this here, e.g., by toning down this conclusion. One thing that is still missing is a graphical representation of the data from Foster et al. (the second dataset analysed here) that would support the statistical results and allow the reader a visual comparison between the sets of findings.

      We are glad that the reviewer recognizes the improvement in the presentation of the conclusions. Following the suggestions, we have modified figure 2, not only by including a third dataset (see point below), but also in a way that allows a direct comparison between the three datasets. Specifically, the results from the three datasets are now shown in three columns next to each other. The first row shows the FW and BW waves in contra- and ipsilateral lines of electrodes for each dataset: our dataset and the one from Feldmann-Wüstefeld and colleagues (the first and the second column in the figure, both with visual stimulation) show a clear interaction between direction and laterality, as confirmed by the statistical analysis. The dataset from Foster and colleagues (the third column, no visual stimulation) shows a laterality effect only in the backward waves but not in the forward ones, in line with the hypothesis that FW waves are modulated only in the presence of visual stimulation. The second row shows a schematic representation of the task, and the third row illustrates the lines of electrodes used in each dataset. We hope the reviewer will be satisfied with the current data presentation.

      Also, for any naive reader, the concept of travelling waves may be hard to grasp in the way data are currently presented - only based on the results of the 2D-FFT. Can forward and backward-travelling waves be illustrated in a representative example to make this more intuitive?

      We thank the reviewer for the suggestion. We included in figure 1 an additional panel E that represents a schematic example of forward and backward waves in the temporal domain (i.e., in the EEG data). We hope this example will provide a better understanding of the data and the traveling wave concept.
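
      For readers who want a hands-on intuition for how a 2D-FFT separates the two propagation directions, here is a small, purely illustrative sketch. It is not the analysis pipeline used in the manuscript. The helper names `make_wave` and `dft_power`, the array sizes, and the convention that electrode index 0 is the most occipital site (so that propagation toward higher indices counts as "forward") are all assumptions. The sketch builds a synthetic wave on a line of electrodes and evaluates the 2D discrete Fourier transform at one temporal frequency and the two opposite spatial frequencies.

```python
import cmath
import math


def make_wave(n_elec, n_time, cycles, direction):
    """Synthetic electrode-by-time map of a plane wave.

    direction=+1: phase moves toward higher electrode indices
    (assumed here to mean occipital -> frontal, i.e. "forward").
    direction=-1: phase moves toward lower electrode indices.
    """
    return [
        [math.cos(2 * math.pi * (cycles * t / n_time - direction * x / n_elec))
         for t in range(n_time)]
        for x in range(n_elec)
    ]


def dft_power(signal, kx, kt):
    """Power of the 2D DFT of `signal` at spatial index kx and temporal index kt."""
    n_elec, n_time = len(signal), len(signal[0])
    coeff = 0j
    for x, row in enumerate(signal):
        for t, value in enumerate(row):
            coeff += value * cmath.exp(-2j * math.pi * (kx * x / n_elec + kt * t / n_time))
    return abs(coeff) ** 2


# 7 electrodes, 50 time samples, 5 temporal cycles in the window (all hypothetical).
forward = make_wave(7, 50, 5, +1)
backward = make_wave(7, 50, 5, -1)

# With this DFT sign convention, a wave moving toward higher indices puts its
# power at kx = -1, and a wave moving toward lower indices at kx = +1.
fw_split = (dft_power(forward, -1, 5), dft_power(forward, +1, 5))
bw_split = (dft_power(backward, -1, 5), dft_power(backward, +1, 5))
```

      The point of the sketch is only that the two directions land in opposite quadrants of the spectrum at the same temporal frequency; which quadrant is labelled forward depends on the electrode ordering and the FFT sign convention.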

      Finally, the way Bayes Factors from the Bayesian ANOVA are presented, especially with those close to the ‘meaningful boundaries’ ⅓ and 3, as defined in the ‘Statistical analysis’ section, requires some unification/revision. For example, here: “We found a positive correlation between contra- and ipsi- lateral backward waves, and occipital (all Pearson’s r~=0.4, all BFs 10 ~=3) and -to a smaller extent- frontal areas (all Pearson’s r~=0.3, all BFs 10 ~=2).”, where the second part should strictly be labelled as inconclusive evidence. In the same vein, there is occasional mention of “negative effects”, where it should say that evidence favours the absence of an effect.

      We agree with the reviewer and apologize for the inaccuracies in reporting the statistical analysis. We corrected as suggested (see below), replacing ‘negative effects’ with ‘evidence favors the absence of an effect’.

      From the updated manuscript:

      "We found moderate evidence of a positive correlation between contra- and ipsi- lateral backward waves, and occipital (all Pearson’s r~=0.4, all BFs10~=3) but inconclusive evidence in the frontal areas (all Pearson’s r~=0.3, all BFs10~=2)."

      From the revised ‘Results’ section, now it reads:

      […] whereas all other factors and their interactions revealed evidence in favor of the absence of an effect (BFs10<0.3).

      […] but not in the forward waves (BF10=0.231, error<0.01%, supporting evidence in favor of the absence of an effect).

      Reviewer #2 (Public Review):

      The present manuscript takes a new perspective and investigates the functional relevance of traveling alpha waves’ direction for visual spatial attention. While the modulation of alpha oscillatory power - and especially the lateralization of alpha power - has been associated with spatial attention in the literature, the present investigation offers a new perspective that helps understand and differentiate the functional roles of alpha oscillations in the ipsi- versus contralateral hemisphere for spatial attention.

      The present study uses a straightforward approach and provides an analysis of two EEG datasets, which are convergingly in line with the authors’ claim that two patterns of travelling alpha waves need to be differentiated in visual spatial attention. First, backward waves in the ipsilateral hemisphere, and second, forward waves in the contralateral hemisphere, which are only observed during visual stimulation. Importantly, the authors test the relation of these patterns of traveling waves to the overall power of alpha oscillations and to the hemispheric lateralization of alpha power. Furthermore, to test the functional significance, the authors demonstrate that the pattern of forward and backward waves around stimulus onset differentiates between hits and misses in task performance.

      Although the results are in line with the conclusions drawn, some questions remain. The authors investigate the relationship between traveling alpha waves and the hemispheric lateralization of alpha power, which is a well-established neural signature of spatial attention. Surprisingly, the lateralization of alpha power shown in Figure 3B appears relatively weak in the present dataset (by visual inspection), which raises the question of whether the investigation of a relation between lateralized alpha power and alpha traveling waves is warranted in the first place.

      We agree with the reviewer that the effect seems reduced compared to other studies, although the topography of alpha-band lateralization in our data is in line with the literature. In order to quantify the effect, we performed an analysis similar to (Thut et al., 2006), defining a laterality index as:

      We computed this index for occipital electrodes and their average (in red in figure R1). The results reveal that for most electrodes, including their average, the laterality index is significantly larger than 0, confirming the presence of alpha-band lateralization. However, we also note that the amplitude of the effect (~0.04) is reduced compared to the study by Thut and colleagues, which was between 0.05 and 0.10.

      Figure R1 – Laterality index for occipital electrodes, quantifying alpha-band lateralization during attention allocation. All electrodes go in the expected direction, revealing an increase of alpha-band power in the ipsilateral occipital hemisphere.
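
      The exact expression for the laterality index is not reproduced in this response. As a rough sketch only, the snippet below assumes one common normalized-difference form, (ipsilateral − contralateral) / (ipsilateral + contralateral), so that positive values indicate more alpha power ipsilateral to the attended side. The formula, the function name `laterality_index`, and the example power values are assumptions and may differ from what the authors computed.

```python
def laterality_index(ipsi_power, contra_power):
    """Normalized difference of alpha power between hemispheres.

    Assumed form: (ipsi - contra) / (ipsi + contra). Positive values mean
    more alpha power ipsilateral to the attended location.
    """
    total = ipsi_power + contra_power
    if total <= 0:
        raise ValueError("alpha power values must sum to a positive number")
    return (ipsi_power - contra_power) / total


# Hypothetical power values chosen to give an index of about 0.04,
# the order of magnitude mentioned in the response.
example_index = laterality_index(1.04, 0.96)
```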

      Furthermore, the authors employ between-subject correlations (with N = 16) to test the relationship between alpha traveling waves and (lateralized) alpha power. However, as inter- individual differences in patterns of travelling waves are not the main focus here, within- subject analyses of the same relations would be able to test the authors’ hypotheses much more directly.

      As suggested, we included the recommended within-subject analysis in the revised manuscript by computing a trial-by-trial correlation between alpha power and traveling waves for each participant. First, we obtained a correlation coefficient and a p-value for each subject. Then, we tested whether the correlation coefficients had an overall positive or negative distribution (i.e., according to our previous results, we expected a positive correlation between backward waves and alpha power). Additionally, we combined the p-values to test for overall significance (using the Fisher method, see Methods section below). Our results corroborate the between-subject correlation, supporting the conclusion that alpha-band power correlates mostly with backward waves (especially contralateral to the attended location). The other correlations (i.e., forward waves and alpha power) were statistically inconclusive. We included these new results in the revised manuscript, as shown below.

      From the Results section:

      “To further investigate the relation between alpha-band travelling waves and alpha power, we performed the same analysis focusing on the correlation within each participant. In particular, we correlated trial-by-trial forward and backward waves with alpha-band power for each subject, obtaining correlation coefficients ‘r’ and their respective p-values. As in the previous analysis, we correlated forward and backward waves with frontal and occipital electrodes in both contra- and ipsilateral hemispheres. We applied the Fisher method (Fisher, 1992, see Methods for details) to combine all subjects' p-values in every condition. Overall, we found a significant effect of all combined p-values (p<0.0001), except in the lateralization condition (contra- minus ipsilateral hemisphere), similar to our previous analysis. Additionally, we tested for a consistent positive or negative distribution of the correlation coefficients. As shown in figure 3C, the results support a significant correlation between backward waves and alpha power in the hemisphere contralateral to the attended location (BF10=10.7 and BF10=7.4 for occipital and frontal regions, respectively; all other BF10 were between 1 and 2, providing inconclusive evidence). Interestingly, this analysis also revealed a small but consistent effect in the correlation between lateralization effects, as we reported a consistently positive correlation in the contra- minus ipsilateral difference between forward waves and alpha power (BF10~5 for both frontal and occipital electrodes). However, it is important to note that the combined p-values obtained using the Fisher method did not reach the significance threshold in the lateralization condition, reducing the relevance of this specific result.”

      From the Methods section:

      “Additionally, we computed trial-by-trial correlations between waves and alpha power for all participants. First, we tested the correlation coefficient against zero in all conditions. Then, we obtained a combined p-value per condition using the log/lin regress Fisher method (Fisher, 1992), as shown in (Zoefel et al., 2019). Specifically, we computed the T value of a chi-square distribution with 2*N degrees of freedom from the p_i values of the N participants as:

      $$T = -2 \sum_{i=1}^{N} \ln(p_i)$$”
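
      The Fisher combination described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `fisher_combined_p` and the example p-values are hypothetical. The statistic is the standard Fisher sum of log p-values, and the combined p-value uses the closed-form upper tail of a chi-square distribution with an even number (2N) of degrees of freedom, so no external library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Combine per-participant p-values with Fisher's method.

    Returns (T, combined_p), where T = -2 * sum(ln p_i) is compared against
    a chi-square distribution with 2 * N degrees of freedom.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    t_stat = -2.0 * sum(math.log(p) for p in p_values)

    # Upper tail of chi-square with 2n degrees of freedom:
    # exp(-T/2) * sum_{k=0}^{n-1} (T/2)^k / k!
    half = t_stat / 2.0
    term = 1.0
    series = 1.0
    for k in range(1, n):
        term *= half / k
        series += term
    return t_stat, math.exp(-half) * series


# Hypothetical per-participant p-values for one condition.
t_value, combined = fisher_combined_p([0.04, 0.20, 0.11, 0.03])
```

      A useful sanity check is that a single p-value is returned unchanged, since for N = 1 the chi-square tail with 2 degrees of freedom is simply exp(-T/2).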

      It needs to be appreciated that the authors analyze two datasets in the present study. However, the question remains whether the absence of the forward waves effect in paradigms without visual stimulation is a general one and would replicate in other datasets. Moreover, the manuscript would benefit from a discussion of the potential implications of traveling waves for functional connectivity between posterior and anterior regions.

      We have now included a third dataset in the paper. In this dataset, from (Feldmann-Wüstefeld & Vogel, 2019), participants performed a visual working memory task by attending either the left or the right side of the screen where a stimulus was displayed. We analyzed the amount of waves during stimulus presentation, and we found the same results as in our own dataset: very strong evidence in favor of an interaction between LATERALITY (contra- and ipsilateral) and DIRECTION (FW and BW). We now included the results in figure 2 (see point above) and in the results section of the manuscript. Unfortunately, we couldn't find any other publicly available EEG dataset in which participants attend to either side of the screen without ongoing visual stimulation.

      In addition, we re-analyzed our main findings (i.e. the interaction between LATERALITY and DIRECTION) in all three datasets using a classic ANOVA to report the effect size as 𝜂2 (see point above). Unlike the Bayesian ANOVA (which -in JASP- is based on linear mixed models), the classic one does not model the slope of the random effects. Yet, we observed that the LATERALITY x DIRECTION interaction in the Foster dataset proved very significant, with a large effect size (F(1,16)=9.81, p=0.003, 𝜂2=0.13). Supposedly, modeling the slope of the random effects in the Bayesian ANOVA lowered its statistical sensitivity. For the sake of completeness, we reported both results in the manuscript.

      Concerning the potential implications of traveling waves on functional connectivity, we consider the interpretation based on the Predictive Coding scheme in the penultimate paragraph of the Discussion (reported below for the reviewer’s convenience). In this framework, top-down connections have inhibitory functions, suppressing the predicted activity in lower regions. These interpretations align with our findings, relating the inhibitory role of backward travelling waves to visual attention. Similarly, in the same paragraph, we refer to the work of Spratling, which extensively investigates the relationship between selective attention and Predictive Coding.

      From the Results section:

      "To confirm our previous results, we replicated the same traveling waves analysis on two publicly available EEG datasets in which participants performed similar attentional tasks (experiment 1 of Foster et al., 2017 and experiment 1 of Feldmann-Wüstefeld and Vogel, 2019). In the first experiment from the Feldmann-Wüstefeld and Vogel dataset, participants were instructed to perform a visual working memory task in which, while keeping a central fixation, they had to memorize a set of items while ignoring a group of distracting stimuli. We focused our analysis on those trials in which the visual items to remember were placed either to the right or the left side of the screen, while the distractors were either in the upper or lower part of the screen (we pooled the trials with either 2 or 4 distractors, as this factor was irrelevant for the purposes of our analysis). The stimuli were shown for 200ms, and we computed the amount of forward and backward waves in the 500ms following stimulus onset. As shown in figure 2 (central column), the analysis confirmed our previous results, demonstrating a strong interaction between the factors DIRECTION and LATERALITY (BF10=667, error~2%; independently, the factors DIRECTION and LATERALITY had BF10=0.2 and BF10=0.4, respectively). These results confirmed that, in the presence of visual stimulation, spatial attention modulates both forward and backward waves. Next, we analyzed another publicly available dataset from Foster et al., 2017. [...]"

      "Remarkably, as shown in figure 2 (right panel), our analysis demonstrated an effect of the lateralization (LATERALITY: BF10=3.571, error~1%), revealing more waves contralateral to the attended location, but inconclusive results regarding the interaction between DIRECTION and LATERALITY (BF10=2.056, error~1%). However, using a classical ANOVA (i.e., without modeling the slope of the random terms), the interaction between DIRECTION and LATERALITY proved significant (F(1,16)=9.81, p=0.003, 𝜂2=0.13)."

      From the Methods section:

      "We included two additional datasets in this study. In both studies, participants performed a visual attention task while keeping their fixation in the center of the screen. Regarding the Feldmann-Wüstefeld and Vogel, 2019 study, participants were asked to memorize the colors of two stimuli while ignoring a set of distractor stimuli. We analyzed only those trials in which the visual stimuli were presented to the left or right side of the screen, while the distractors were placed above or below the fixation cross. After 500ms of the fixation cross, two colored 'target' stimuli were presented for 200ms. Participants were asked to memorize these stimuli, and a new 'probe' stimulus was shown after an additional second. Participants reported whether the probe matched the target stimuli or not. We analyzed the traveling waves in the 500ms following the target stimulus onset. In the second dataset, from Foster et al., 2017, participants performed a spatial attention task. First, the fixation cross cued participants to covertly attend one of eight possible spatial positions uniformly distributed around the center of the screen. After one second, a digit was displayed either in the cued location or in any other one. The remaining locations were filled with letters. Participants were instructed to report the only displayed digit. We analyzed the waves in the second before stimulus onset when participants attended to the locations cued to the left or right side of the screen (we discarded trials in which participants attended locations above or below the fixation cross). For additional details about both experimental procedures, we refer the reader to Foster et al., 2017 and Feldmann-Wüstefeld and Vogel, 2019."

      From the discussion:

      "Our previous work proposed an alternative cause for the generation of cortical waves (Alamia and VanRullen, 2019). We demonstrated that a simple multi-level hierarchical model based on Predictive Coding (PC) principles and implementing biologically plausible constraints (temporal delays between brain areas and neural time constants) gives rise to oscillatory traveling waves propagating both forward and backward. This model is also consistent with the 2-dipoles hypothesis (Zhigalov and Jensen, 2022), considering the interaction between the parietal and occipital areas (i.e., a model of 2 hierarchical levels). However, dipoles in parietal regions are unlikely to explain the observed pattern of top-down waves, suggesting that more frontal areas may be involved in generating the feedback. This hypothesis is in line with the PC framework, in which top-down connections have an inhibitory function, suppressing the activity predicted by higher-level regions (Huang and Rao, 2011). Interestingly, Spratling proposed a simple reformulation of the terms in the PC equations that could describe it as a model of biased competition in visual attention, thus corroborating the interpretation of our finding within the PC framework (Spratling, 2008, 2012)."

    1. Author Response

      Reviewer #1 (Public Review):

      The authors developed a new concept: Skeletal age, which is chronological age + years lost due to suffering a low-energy fracture. There seem to be conceptual problems with this concept: It is not known if the years lost are lost due to the fracture or co-morbidities.

      The Reviewer raises an important point, and we are happy to discuss it as follows. While it is not possible to show a causal relationship between a fragility fracture and excess mortality, it has been shown repeatedly that a fracture is associated with an increased risk of premature mortality after accounting for comorbidities and frailty. Indeed, we and others have found that comorbidities contribute little to the increased risk10,11. Moreover, in a previous study using the ‘relative survival analysis’ technique12, we have shown that hip and proximal fractures were associated with reduced life expectancy after accounting for time-related changes in background mortality in the population, suggesting that hip and proximal fractures are an independent clinical risk factor for mortality.

      In this study, we used a multivariable Cox’s proportional hazards model to adjust for confounding effects of age and severity of comorbidities, and our result clearly indicated that a fracture is associated with years of life lost. Moreover, comorbidities were considered a factor in an individual's risk profile for estimating skeletal age. As a result, skeletal age reflects the common real-world scenario that the combination of comorbidities and proximal or lower leg fractures compounded post-fracture excess mortality, much greater than each alone13.

      Technically, there are two steps to individualise skeletal age for each individual with a specific risk profile. First, we used the statistical approach recommended for the individualisation of survival time prediction using statistical models14 to individualise specific mortality risk for each participant with a specific risk profile. Specifically, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific risk profile of a specific fracture site and the severity of comorbidity. His/her individualised fracture-mortality association was then computed as the difference between his/her prognostic index and the mean prognostic index of “typical” people in the general population. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the individualised association into life expectancy loss as a result of a fracture15.

      We have modified part of the description of the methodology as follows:

      “For the second aim, we determined skeletal age for each individual based on the individual’s specific risk profile. First, we calculated the prognostic risk index as a single-number summary of the combined effects of his/her specific fracture site and the severity of comorbidity51. The prognostic index is a linear combination of the risk factors with weights derived from the regression coefficients. The individualised fracture-mortality association for an individual with a specific risk profile is then the difference between the individual's prognostic index and the mean prognostic index of 'typical' people in the general population51. In the second step, we used the Gompertz law of mortality and the Danish national lifetable data to transform the excess mortality into life expectancy loss as a result of a fracture49.”
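      The two steps described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it uses the 'effective age' shortcut, in which a log-hazard difference under a Gompertz hazard a*exp(b*age) is equivalent to ageing by that difference divided by b, rather than the lifetable-based life-expectancy calculation described in the response. The coefficient values, the risk-factor names and the Gompertz slope of 0.1 per year are all hypothetical.

```python
import math


def prognostic_index(coefficients, profile):
    """Linear combination of risk factors weighted by regression coefficients."""
    return sum(coefficients[name] * profile[name] for name in coefficients)


def effective_age_shift(delta_log_hazard, gompertz_slope):
    """Years of ageing equivalent to a log-hazard difference.

    Assumes a Gompertz hazard a * exp(gompertz_slope * age), so multiplying
    the hazard by exp(delta) equals ageing by delta / gompertz_slope years.
    """
    return delta_log_hazard / gompertz_slope


# Hypothetical log hazard ratios (NOT estimates from the study).
coefficients = {"hip_fracture": math.log(2.0), "comorbidity_score": 0.25}

patient = {"hip_fracture": 1, "comorbidity_score": 1}
typical = {"hip_fracture": 0, "comorbidity_score": 1}  # 'typical' reference person

# Step 1: individualised association = difference in prognostic index.
delta = prognostic_index(coefficients, patient) - prognostic_index(coefficients, typical)

# Step 2: convert the association into an age shift (hypothetical slope 0.1/year).
skeletal = 60 + effective_age_shift(delta, gompertz_slope=0.1)
print(f"Illustrative skeletal age: {skeletal:.1f} years")
```

      With these assumed inputs the shift is ln(2)/0.1, about 6.9 years. The figure reported by the authors depends on their fitted coefficients and on the Danish lifetable data.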

      In addition, with the possible exception of zoledronate after hip fracture, we have no evidence that this increased risk of mortality can be changed with interventions.

      We agree that there is a lack of strong evidence from randomised controlled trials supporting the benefit of anti-resorptive therapy on post-fracture survival. As mentioned above, the mention of zoledronic acid was simply for illustrating the use of skeletal age to convey a treatment benefit. We have decided to remove the section related to the benefit of pharmacological treatment on post-fracture mortality.

      Furthermore, it is not clear why the authors think that patients and doctors will better understand the implications of older "skeletal age", on future fracture risk and the need for prevention, for example, the 10-year risk of MOF? Knowing that my bones are older than me, could make a patient feel even more fragile and afraid of being physically active. The treatment will reduce the risk of future fractures, but this study provides no information about the effect on mortality of preventing the subsequent fracture or the risk of mortality associated with recurrent fractures.

      The risk of fracture is typically conveyed to patients and the public in terms of absolute risk metric (e.g., probability) or relative risk metrics (e.g., risk ratio). However, patients and doctors often struggle to comprehend probabilistic statements such as 'Your risk of death over the next 10 years is 5% if you have suffered from a bone fracture'. The underappreciation of post-fracture mortality's gravity has caused patients to be hesitant towards treatment and prevention, contributing to the current crisis of osteoporosis treatment.

      We consider that skeletal age will make doctor-patient risk communication more intuitive and probably more effective. For example, for the same 2-fold increased mortality risk of hip fracture, telling a 60-year-old man with a hip fracture that his skeletal age would be 66 years, equivalent to a 6-year loss of life, is much more intuitive. The patient might thus be more likely to accept the recommended pharmacological treatment, ultimately improving health outcomes. However, we do not yet have RCT evidence for the effectiveness of skeletal age, and this will be one of our future research foci. We would like to point out that there is RCT evidence that effective age (such as 'Heart Age', 'Lung Age') could improve the uptake of preventive actions. For example, informing patients about their heart age, as shown by Lopez-Gonzalez et al16, was found to improve their cardiovascular risk more than informing them of the Framingham probabilistic risk score.

      Introduction:

      The statement that treatment reduces the risk of dying, needs modification as the majority of clinical trials have not demonstrated reduced mortality with treatment.

      We have modified the statement as follows: “In randomised controlled trials, treating high-risk individuals with bisphosphonates or denosumab reduces the risk of fracture4, though whether the reduction translates into reduced mortality risk remains contentious5, 6.”

      It is not clear how the skeletal age captures the risk of a future fracture. The other difference between the idea of "skeletal age" and for example "heart age" is that there are treatments available for heart disease that reduce the risk of mortality, as mentioned above this has not been shown consistently in clinical trials in osteoporosis.

      We take the Reviewer's point, but we would like to point out that there are at least two RCTs on zoledronic acid showing that treating patients with a fragility fracture reduces their risk of mortality17,18.

      Because the risk profile that is associated with post-fracture mortality is also associated with the risk of fracture, skeletal age can be seen as a measure of the decline of the skeleton due to a fracture or exposure to risk factors that raise the risk of fracture. Thus, a 60-year-old with a skeletal age of 66 is in the same risk category as a 66-year-old with 'favourable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.

      Neither the “Skeletal Age” nor the “Heart Age”16,19,20 has the treatment intervention incorporated into its calculator. We have added details to explain how the assessment of skeletal age would provide the conceptual risk of both fracture and post-fracture mortality as follows:

      “Unlike the current fracture risk assessment tools17 which estimate the probability of fracture over a period of time using probability-based metrics, such as relative risk and absolute risk, skeletal age quantifies the consequence of a fracture using a natural frequency metric. A natural frequency metric has been consistently shown to be easier and more friendly to doctors and patients than the probability-based metrics9 11 30. It is not straightforward to appreciate the importance of the two-fold increased risk of death (i.e., relative risk = 2.0) without knowing the background risk (i.e., 2 folds of 1% would remarkably differ from 2 folds of 10%). By contrast, for the same 2-fold mortality risk of hip fracture, telling a 60-year man with a hip fracture that his skeletal age would be 66 years old, equivalent to a 6-year loss of life, is more intuitive. The skeletal age can also be interpreted as the individual being in the same risk category as a 66-year-old with 'favorable risk factors' or at least the ones that are potentially modifiable. Hence, an older skeletal age means a greater risk of fracture.”.
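      The point about background risk in the passage above can be made concrete with a small calculation. The sketch below simply multiplies a baseline risk by a relative risk; the two baseline values (1% and 10%) are the ones used in the text, and the function name is chosen here for illustration.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk implied by a relative risk applied to a baseline risk."""
    return baseline_risk * relative_risk


# The same relative risk of 2.0 applied to two different background risks.
for baseline in (0.01, 0.10):
    risk = absolute_risk(baseline, 2.0)
    excess = risk - baseline
    print(f"baseline {baseline:.0%}: absolute risk {risk:.0%}, excess {excess:.0%}")
```

      The same two-fold relative risk corresponds to one extra death per 100 people at a 1% baseline but ten extra deaths per 100 at a 10% baseline, which is why the relative risk alone is hard to interpret.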

      Discussion:

      The prevalent comorbidities; cardiovascular diseases, cancer, and diabetes, suggest that fracture patients die from their comorbidities and not their fractures.

      Please refer to the above response for more detail. Briefly, the multivariable Cox’s proportional hazards regression adjusted for the confounding effect of age and the severity of comorbidities, indicating the association between fracture and mortality was independent of aging and comorbidity severity. On the other hand, skeletal age is a measure of excess mortality related to either fracture or co-morbidities or both.

      The discussion should be more balanced as there is a number of clinical trials demonstrating reductions in vertebral and non-vertebral fractures without effect on mortality. There may be specific effects of zoledronate on mortality, but that has not been shown for the vast majority of treatments.

      Please refer to the above response for more detail. Specifically, as the study primarily aimed at introducing skeletal age as a new metric for risk communication, we have decided to omit the paragraph discussing the potential benefit of zoledronic acid on post-fracture mortality risk in order to maintain the clarity and focus of the study.

      It is not correct that FRAX does not take mortality into account? It does not tell you specifically how high the risk of dying and how high the risk of a fracture is but integrates the two. "Skeletal age" does not provide either information, it just tells you that your skeleton is older than your chronological age - most patients and doctors will not associate that with an increased risk of dying - only of frailty.

      Although it is commonly believed that FRAX accounts for the competing risk of death, it does not provide the risk of post-fracture mortality. Indeed, none of the current fracture risk assessment tools was designed to provide post-fracture mortality risk5. Skeletal age fills the gap by providing the excess mortality following a fracture for an individual with a specific risk profile.

      The statement that zoledronate reduces the "skeletal age" by 3 years, has not been demonstrated and it is not clear how this can be demonstrated by the analysis reported here. As the reduced mortality has only been shown for the Horizon RFT, this cannot be inferred for other treatments and other fracture types. The information provided by the "skeletal age" is only that the fracture you already had took x years of your remaining lifetime. With the exception of perhaps zoledronate after hip fracture, we have no indication from clinical trials that the treatment of osteoporosis will change this.

      The current study was not designed to examine the effectiveness of an intervention. The statement related to the survival benefit of zoledronate is used to illustrate how skeletal age is used to convey the treatment benefit in real-world doctor-patient risk communication. Given the hazard ratio of 0.72 for zoledronate-mortality association17, a patient might find the statement “Zoledronic acid treatment helps a patient with a hip fracture gain (back) 3 years of life” much easier to understand and probably more persuasive than the traditional statement of “Zoledronic acid treatment reduced the risk of death by 28%”.
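      The two ways of phrasing the same treatment effect can be related with a short calculation. The sketch below converts the hazard ratio of 0.72 quoted above into a percentage risk reduction and, under an assumed Gompertz hazard, into an equivalent number of years. The Gompertz slope of 0.1 per year is a hypothetical illustrative value, and the age-shift shortcut is a simplification rather than the authors' lifetable-based method.

```python
import math


def percent_risk_reduction(hazard_ratio):
    """Relative reduction in hazard, in percent, implied by a hazard ratio."""
    return (1.0 - hazard_ratio) * 100.0


def years_regained(hazard_ratio, gompertz_slope):
    """Age shift equivalent to a hazard ratio under a Gompertz hazard.

    Assumes hazard = a * exp(gompertz_slope * age); a hazard ratio below 1
    then corresponds to being -ln(HR) / gompertz_slope years 'younger'.
    """
    return -math.log(hazard_ratio) / gompertz_slope


hr = 0.72  # hazard ratio quoted in the response
print(f"Risk reduction: {percent_risk_reduction(hr):.0f}%")
print(f"Equivalent years (assumed slope 0.1/year): {years_regained(hr, 0.1):.1f}")
```

      Under this assumed slope the result is on the order of 3 years, in line with the illustrative statement in the response, although the exact value depends on the mortality model and lifetable used.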

      Reviewer #2 (Public Review):

      The paper of Tran et al. introduces the concept of 'skeletal age' as a means of conveying the combined risk of fracture and fracture-associated mortality for an individual. Skeletal age is defined as the sum of chronological age and the number of years of life lost associated with a fracture. Using the very comprehensive Danish national registry and employing Cox's proportional hazards model they estimated the hazard of mortality associated with a fracture. Skeletal age was estimated for each age and fracture site stratified by gender. The authors propose to replace the fracture probability with skeletal age for individualized fracture risk assessment.

      Strengths of the study lie in the novelty of the concept of 'skeletal age' as an informative metric to internalize the combined risks of fracture and mortality, the very large and well-described Danish National Hospital Discharge Registry, the sophisticated statistical analysis and the clear messages presented in the manuscript. The limitations of the study are acknowledged by the authors.

      We appreciate your positive remark that captures the essence of our work.

      References:

      1. Lujic S, Simpson JM, Zwar N, Hosseinzadeh H, Jorm L. Multimorbidity in Australia: Comparing estimates derived using administrative data sources and survey data. PloS one 2017; 12(8): e0183817.
      2. Andersen TF, Madsen M, Jorgensen J, Mellemkjoer L, Olsen JH. The Danish National Hospital Register. A valuable source of data for modern health sciences. Dan Med Bull 1999; 46(3): 263-8.
      3. Vestergaard P, Mosekilde L. Fracture risk in patients with celiac Disease, Crohn's disease, and ulcerative colitis: a nationwide follow-up study of 16,416 patients in Denmark. Am J Epidemiol 2002; 156(1): 1-10.
      4. Hundrup YA, Hoidrup S, Obel EB, Rasmussen NK. The validity of self-reported fractures among Danish female nurses: comparison with fractures registered in the Danish National Hospital Register. Scand J Public Health 2004; 32(2): 136-43.
      5. Beaudoin C, Moore L, Gagne M, et al. Performance of predictive tools to identify individuals at risk of non-traumatic fracture: a systematic review, meta-analysis, and meta-regression. Osteoporos Int 2019; 30(4): 721-40.
      6. Spiegelhalter D. How old are you, really? Communicating chronic risk through 'effective age' of your body and organs. BMC Med Inform Decis Mak 2016; 16: 104.
      7. Vestergaard P, Rejnmark L, Mosekilde L. Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark. Osteoporos Int 2005; 16(2): 134-41.
      8. Roerholt C, Eiken P, Abrahamsen B. Initiation of anti-osteoporotic therapy in patients with recent fractures: a nationwide analysis of prescription rates and persistence. Osteoporos Int 2009; 20(2): 299-307.
      9. Cummings SR, Lui LY, Eastell R, Allen IE. Association Between Drug Treatments for Patients With Osteoporosis and Overall Mortality Rates: A Meta-analysis. JAMA Int Med 2019; 179(11): 1491-500.
      10. Chen W, Simpson JM, March LM, et al. Comorbidities Only Account for a Small Proportion of Excess Mortality After Fracture: A Record Linkage Study of Individual Fracture Types. J Bone Miner Res 2018; 33(5): 795-802.
      11. Vestergaard P, Rejnmark L, Mosekilde L. Increased mortality in patients with a hip fracture-effect of pre-morbid conditions and post-fracture complications. Osteoporos Int 2007; 18(12): 1583-93.
      12. Tran T, Bliuc D, Hansen L, et al. Persistence of Excess Mortality Following Individual Nonhip Fractures: A Relative Survival Analysis. J Clin Endocrinol Metab 2018; 103(9): 3205-14.
      13. Tran T, Bliuc D, Ho-Le T, et al. Association of Multimorbidity and Excess Mortality After Fractures Among Danish Adults. JAMA Netw Open 2022; 5(10): e2235856.
      14. Henderson R, Keiding N. Individual survival time prediction using statistical models. J Med Ethics 2005; 31(12): 703-6.
      15. Kulinskaya E, Gitsels LA, Bakbergenuly I, Wright N. Calculation of changes in life expectancy based on proportional hazards model of an intervention. Insur Math Econ 2020; 93: 27-35.
      16. Lopez-Gonzalez AA, Aguilo A, Frontera M, et al. Effectiveness of the Heart Age tool for improving modifiable cardiovascular risk factors in a Southern European population: a randomized trial. Eur J Prev Cardiol 2015; 22(3): 389-96.
      17. Lyles KW, Colon-Emeric CS, Magaziner JS, et al. Zoledronic acid and clinical fractures and mortality after hip fracture. N Engl J Med 2007; 357(18): 1799-809.
      18. Reid IR, Horne AM, Mihov B, et al. Fracture Prevention with Zoledronate in Older Women with Osteopenia. N Engl J Med 2018; 379(25): 2407-16.
      19. Bonner C, Batcup C, Cornell S, et al. Interventions Using Heart Age for Cardiovascular Disease Risk Communication: Systematic Review of Psychological, Behavioral, and Clinical Effects. JMIR Cardio 2021; 5(2): e31056.
      20. Svendsen K, Jacobs DR, Morch-Reiersen LT, et al. Evaluating the use of the heart age tool in community pharmacies: a 4-week cluster-randomized controlled trial. Eur J Public Health 2020; 30(6): 1139-45.
      21. Suissa S. Immortal time bias in pharmaco-epidemiology. Am J Epidemiol 2008; 167(4): 492-9.
    1. Author Response

      Reviewer #1 (Public Review):

      The authors use a newly developed object-space memory task comprising of a "Stable" version and "Overlapping" version where two objects are presented in two locations per trial in a square open field. Each version consists of 5 training trials of 5-min presentations of an object-space configuration, with both object locations staying constant across training trials in the Stable condition, and only one object location staying fixed in the Overlapping condition. Memory is tested in a test trial 24 hours later where the opposite configuration is presented - overlapping configuration presented for the Stable condition and stable configuration presented for the Overlapping condition - with the thesis that memory in this test trial for the Overlapping condition will depend on the accumulated memory of spatial patterns over the training trials, whereas memory for the test trial in the Stable condition can be due to episodic memory of last trial or accumulated memory. Memory is quantified using a Discrimination Index (DI), comparing the amount of time animals spend exploring the two object locations.

      Here, animals in other groups are also presented with an interference trial equivalent to the test trial, to test if the memory of the Overlapping condition can be disrupted. The behavioral data show that for RGS14 over-expressing animals, memory in the Overlapping condition is diminished compared to controls with no interference or controls where over-expression is inhibited, whereas memory in the Stable condition is enhanced. This is interpreted as interference in semantic-like memory formation, whereas one-shot episodic memory is improved. The authors speculate that increased cortical plasticity should lead to increased and larger delta waves according to the sleep homeostasis hypothesis, and observe that instead increased cortical plasticity leads to less non-REM sleep and smaller delta waves, with more prefrontal neurons with slower firing rates (presumably more plastic neurons). They further report increased hippocampal-cortical theta coherence during task and REM sleep, increased NonREM oscillatory coupling, and changes in hippocampal ripples in RGS14 over-expressing animals.

      While these results are interesting, there are several issues that need to be addressed, and the link between physiology and behavioral results is unclear.

      1) The behavioral results rely on the interpretation that the Overlapping condition corresponds to semantic-like memory and the Stable condition corresponds to episodic-like memory. While the dissociation in memory performance due to interference seen in these two conditions is intriguing, the Stable condition can correspond not just to the memory of the previous trial but also accumulated memory of a stable spatial pattern over the 5 testing trials, similar to accumulated memory of a changing spatial pattern in the Overlapping pattern.

      Yes! We completely agree on this. We do not claim that the Stable condition corresponds to episodic-like memory; instead, we refer to it as simple memory, since it can be solved either way (one-trial memory or cumulative memory). We have now expanded on this in the discussion to make it clearer.

      Here, it is puzzling that in the behavioral control with no interference (Figure 1D), memory in the Stable and Overlapping condition is unchanged in the test trial, with the DI statistically at 0 in the test trial. In the original description of the Object Space task by the authors in the referenced paper, the measure of memory was a Discrimination Index significantly higher than 0 in both the Stable and Overlapping conditions. This discrepancy needs to be reconciled. Is the DI for the interference trial shown in Fig. S1 significantly different than 0? No statistics or description is provided in the figure legend here.

      As mentioned above, we apologize that we oversimplified the description. The 24h interference trial corresponds to the original test trial. We added a clarifying figure for comparison in S1 (a bar graph in addition to the violin plot) together with statistics. Performance was above chance for all groups and conditions, replicating our previous results.

      2) The physiology experiments compare Home cage (HC) conditions to the Object Space task (OS) throughout the manuscript. While some differences are seen in the control and RGS14 over-expressing animals, there is no comparison of the Stable vs. Overlapping condition in the physiology experiments. This precludes making explicit links between physiological observations and behavioral effects.

      As also mentioned above, we have now added analysis exploring the detailed OS conditions. We would like to thank the reviewers for giving us the opportunity of doing so.

      3) The authors speculate that learning will result in larger and more delta waves as per the synaptic homeostasis hypothesis. It should be noted here that an alternative hypothesis is that there should also be a selective increase in synaptic plasticity for learning and consolidation. The authors do observe that control animals show more frequent and higher-amplitude delta waves, but rather than enhancing this process, RGS14 animals with increased plasticity show the opposite effect. How can this be reconciled and linked with the behavioral data in the Stable and Overlapping condition?

      In the context of the Object Space Task, we would expect all behavioural conditions (Stable and Overlapping) to induce synaptic changes, since learning also occurs in the Stable condition (see also performance on the 24h trial). Thus, we would expect homeostatic responses in particular, such as an increase in delta amplitude, for all experiences, regardless of whether subtle statistical rules are presented. In contrast, detailed processing that extracts underlying regularities is proposed by the Sleep for Active Systems Consolidation Hypothesis to occur during hippocampal-cortical interactions in the form of delta/ripple/spindle interactions (with different theories emphasising different types of interactions). As mentioned above, we have now added a more specific analysis in this regard, in which we show that the two OS conditions that involve moving objects (where statistical regularities can thus potentially be extracted) show a higher percentage of ripples occurring after large slow oscillations than home cage or the simple learning condition Stable. In contrast, RGS14 animals already show higher participation in both control conditions, indicating that in these animals the brain treats all experiences as significant learning conditions, which explains the behavioural effect (increased interference due to better memory for the interference). Further, we expanded the discussion of how in RGS14 animals we sometimes see an enhancement of learning effects but sometimes a more complex interaction with the effects expected from physiological learning.

      Similarly, there is an increase in slower-firing neurons in RGS14 over-expressing animals. Slower-firing neurons have been proposed to be more plastic in the hippocampus based on their participation in learned hippocampal sequences, but appropriate references or data are needed to support the assertion that slower-firing neurons in the prefrontal cortex are more plastic.

      As described above, we have expanded the discussion to include other citations that also consider the cortex. We can show that our changes would be expected if one makes the cortex as plastic as the hippocampus.

      4) It is noted that changing cortical plasticity influences hippocampal-cortical coupling and hippocampal ripples, suggesting a cortical influence on hippocampal physiological patterns. It has been previously shown that disrupting prefrontal cortical activity does alter hippocampal ripples and hippocampal theta sequences (Schmidt et al., 2019; Schmidt and Redish, 2021). The current results should be discussed in this context.

      We would like to thank the reviewer for these suggestions, they are now incorporated in the manuscript.

      Reviewer #2 (Public Review):

      In this paper, the authors provide evidence to support the longstanding proposition that a dual-learning system/systems-level consolidation (hippocampus attains memories at a fast pace which are eventually transmitted to the slow-learning neocortex) allows rapid acquisition of new memories while protecting pre-existing memories. The authors leverage many techniques (behavior, pharmacology, electrophysiology, modelling) and report a host of behavioral and electrophysiological changes on induction of increased medial prefrontal cortex (mPFC) plasticity which are interesting and will be of significant interest to the broad readership.

      The experimental design and analyses are convincing (barring some instances which are discussed below). The following recommendations will bolster the strength/quality of the manuscript:

      1) Certain concerns regarding the interpretation and analysis of the behavioral data remain. The authors need to clarify if increased mPFC plasticity leads to only an increase in one-shot memory or 'also' interference of previous information. It seems that the behavioral results could also be explained by the more parsimonious explanation that one-shot memory is improved. Do the current controls tease apart these two scenarios?

      We agree that we cannot disentangle whether one memory is just stronger than the other or whether it is an overwriting effect. We have now added this to the discussion. Of note, we do not think it would actually be possible to distinguish these two effects behaviourally in rodents, or at least we cannot think of a fitting study design that would enable the contrast.

      Additionally, the authors need to clarify why the 'no trial' and 'anisomycin' controls for the stable task perform at chance levels on exposure to a new object-place association on test day (Fig 1D).

      Violin plots are sometimes hard to read. Here we show simple bar plots, in which one can see that the animals are not at chance at the 72h test in the control conditions.

      Finally, further description of how the discrimination index (exploration time of novel-exploration time of familiar/sum of both) is recommended i.e., in the stable condition, which 'object' is chosen as 'novel' (as both are in the same locations) for computing the index (Fig 1). Do negative DI values imply a neophobia to novel objects (and thus are a form of memory; this is also crucial because the modelling results (Fig 1E) use both neophilia and neophobia while negative discrimination indexes are considered similar to 0 for interpreting the behavioral results, as stated on page 3, lines 84-86?

      We have now added this to the methods (for Overlapping it is moved location – stable location; for Stable it is location-to-be-moved-at-test – stable location; for Random the assignment of moved and stable is random; in each case the difference is divided by total time). We agree that neophilia/neophobia (especially changes in the distribution) can be an issue and have discussed it in detail in Schut et al NLM 2020, where we see differences in absolute beta values (thus controlling for philia/phobia differences). We also discuss there in more detail why it is difficult to control for this in the DI. In short, one could use absolute values, but then it is difficult to determine what a group chance level would look like. However, luckily there is no issue here, since we did not observe differences in neophilic or neophobic tendencies while running the experiments. Critically, the interference trial (which can also function as a simple test trial) confirms that, as a group, animals show positive DI and neophilia.
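      The discrimination index described above can be written as a short function. This is a minimal sketch under the definition given in the response (exploration time at the moved location minus exploration time at the stable location, divided by total time); the function and argument names are chosen here for illustration, and which location counts as 'moved' in each condition follows the assignment described in the response.

```python
def discrimination_index(time_moved, time_stable):
    """Discrimination index: (moved - stable) / (moved + stable).

    Positive values indicate a preference for the moved location
    (neophilia), negative values a preference for the stable location,
    and 0 indicates no preference (chance level).
    """
    total = time_moved + time_stable
    if total == 0:
        raise ValueError("no exploration recorded; the index is undefined")
    return (time_moved - time_stable) / total


# Hypothetical exploration times in seconds.
print(discrimination_index(30.0, 10.0))  # preference for the moved location
print(discrimination_index(10.0, 10.0))  # no preference
```

      Because the index is signed, averaging it across animals mixes neophilic and neophobic responses, which is the difficulty with group-level chance that the response points out.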

      2) The authors report lower firing rates in RGS14414 animals during the task in Fig 2F. It is indeed remarkable how large the reported differences are. The authors need to rule out any differences in the behavioral state of the animals in the two groups during the task, i.e., rest vs. active exploration/movement dynamics. Are only epochs during the task while the animals interact with the objects used for computing the firing rates (same epochs as Fig 1)? If not, doing so will provide a useful comparison with Fig 1. Additionally, although the authors make the case for slow firing rate neurons being important for plasticity (based on Grosmark and Buzsaki, 2016), it is crucial to note that the firing rate dynamic (slow vs. fast) in that study for the hippocampus is defined based on the whole recorded session (predominated by sleep), indeed the firing rates of the two groups (slow vs. fast/plastic vs. rigid) during the task/maze-running do not differ in that study. Therefore, the results here seem incongruent with the Grosmark and Buzsaki paper. Since this finding is central to the main claim of the authors, it either warrants further investigation or a re-interpretation of their results.

      As mentioned in the main points, we have now added the firing rate analysis (including new group splits) for wake in the sleep box, NREM and REM separately. The same results are obtained each time. We do not yet have the tracking and video synchronization set-up, and therefore cannot split the task by specific behaviours.

      However, we now also cite Buzsaki’s original log-normal brain review, where he first proposed the idea. There he also shows the same effects as we do, in that the general firing rate distribution is the same for task and different sleep stages, just shifted overall. The analysis from Grosmark included a more stringent subselection of neurons, in order to also argue that incorporation into run/replay sequences could not have been biased by firing rate per se (instead of plasticity). However, the original proposition from Buzsaki does fit our results. He further presents hippocampus vs cortex firing rates, which also confirm the idea (the hippocampus is more plastic and has slower firing rates). We included this figure above in the general comments. Further, we have now expanded the discussion on this point.

      3) A concern remains as to how many of the electrophysiological changes they observe (firing rate differences, LFP differences including coupling, sleep state differences, Figs. 2-4) support their main hypothesis or are a by-product of injection of RGS14414 (for instance, one might argue that an increased 'capability' to learn new information/more plasticity might lead to more NREM sleep for consolidation, etc.). The authors need to carefully interpret all their data in light of their main hypothesis, which will substantially improve the quality/strength of the manuscript.

      We have now expanded and restructured the discussion, and state that we cannot disentangle whether the cellular changes, the sleep oscillation changes, or an interaction of both is the cause of the result. Furthermore, we added that we cannot distinguish whether the interference memory is stronger or actually overwrites the original training memory.

      Reviewer #3 (Public Review):

      The authors set out to test the idea that memories involve a fast process (for the acquisition of new information) and a slow process (where these memories are progressively transferred/integrated into more-long term storage). The former process involves the hippocampus and the latter the cerebral cortex. This 'dual-learning' system theoretically allows for new learning without causing interference in the consolidation of older memories. They test this idea by artificially increasing plasticity in the pre-limbic cortex and measuring changes in different learning/memory tasks. They also examined electrophysiological changes in sleep, as sleep is linked to memory formation and synaptic plasticity.

      The strengths of the study include a) meticulous analyses of a variety of electrophysiological measurements b) a combination of neurobiological and computational tools c) a largely comprehensive analysis of sleep-based changes. Some weaknesses include questions about the technique for increasing cortical plasticity (is this physiological?) and the absence of some additional experiments that would strengthen the conclusions. However, overall, the findings appear to support the general idea under examination.

      This study is likely to be very impactful as it provides some really new information about these important neural processes, as well as data that challenges popular ideas about sleep and synaptic plasticity.

      We would like to thank the reviewer for these positive comments. Answers to the weaknesses are presented below in the recommendations for the authors.

    1. Author Response

      Reviewer #1 (Public Review):

      I noticed 2 weaknesses, the first related to the killing assays: considering that WT IgG less efficiently promotes complement-mediated phagocytosis of bacteria, one would assume that the ingested bacteria (to be killed) would be lower in neutrophils exposed to this IgG, to begin with - which is not accounted for in the analyses shown.

      We have now included a better explanation of our opsonophagocytic killing assay.

      A second weakness in my mind pertains to the in vivo experiment: the model used obviously requires a very high number of bacteria (the inoculum), somehow indicating that this specific bacterial strain does not lead to progressive infection (i.e. with replicating bacteria) but mice experience a severe acute inflammatory response followed by the rapid elimination of bacteria. This explains the high mortality - and indicates that mice succumb to acute inflammation, rather than the progressive replication of bacteria. To conclusively prove the therapeutic value of those modified antibodies, a clinically more relevant S. pneumoniae model would be helpful.

      The inoculum used in our mouse model was based on a dose finding study. Although the initial starting dose was 5 × 10^6 bacteria (based on previously published mouse infection models with S. pneumoniae serotype 6A), we needed a higher dose (1 × 10^8 bacteria) to reach 80-100% mortality. While we agree that the final dose was relatively high, this does not mean that capsule type 6 is not a clinically relevant strain. It is well known that clinically relevant serotypes in humans are not always invasive in mice (doi: 10.1128/iai.60.1.111-116.1992). This is the exact reason why we chose to perform in vivo experiments with serotype 6A, which is known to be more invasive in mice (while serotype 6B is more virulent in humans). Of course, while our in vivo data provide an important proof-of-concept for the capacity of hexamer-enhancing mutations to improve protection by anti-capsular antibodies, future studies are needed to verify the potential use of mAbs against other serotypes.

      A third aspect, which should be addressed in the discussion, unless tested and not shown, is how anti-pneumococcal IgM antibodies compare to hexamerized IgGs. Is there any advantage, or do they perform similarly with regards to complement activation?

      We have now generated and tested IgM against CPS6 (Figure 2g). Although anti-CPS6 IgM can induce complement-dependent phagocytosis to some extent, IgM was less potent than IgG variants with hexamer-enhancing mutations. This suggests that complement activation via pre-assembled IgM oligomers was less effective than via IgG hexamers that are formed after target binding.

      These new data are now included in the revised manuscript as figure 2g and supplemental figure 9, and are discussed in the results section (lines 172-179).

      Reviewer #2 (Public Review):

      The results are intriguing, and one consideration is whether enhancing complement activation is beneficial or harmful for a therapeutic antibody. Based on these results is there the possibility of a natural selection against strong levels of complement activation?

      We appreciate the positive feedback on our work. Indeed, it is believed that there is a natural selection against these mutations to avoid uncontrolled complement activation by naturally occurring IgGs in solution. It is important to realize that formation of IgG hexamers is a surface-dependent process. When IgG molecules bind to surface-bound antigens (via Fab), they can organize into higher-ordered hexamers via Fc-Fc interactions. The specific point mutations used in this paper increase hexamer formation after antigen binding on the cell surface. However, at high concentrations of IgG (such as those occurring in our blood, >10 mg/ml), IgG hexamers might be formed independent of target binding (van Kampen et al Journal of Pharmaceutical Sciences Volume 111, Issue 6, June 2022, Pages 1587-1598). If naturally occurring IgGs had hexamer-enhancing mutations, IgG hexamers could be formed in solution, resulting in massive complement activation and depletion of the complement system.

      The study clearly shows that the introduction of the hexamerisation mutations affects the ability of the antibodies to bind and activate complement. The studies in Fig 2 examining the role of Fc are particularly elegant. One issue is that it is surprising that the WT IgG1 and IgG3 monoclonals have a minimal capacity to fix and activate complement, despite IgG1/3 to other antigens being efficient isotypes at fixing complement. In the absence of data showing whether IgG1/3 from normal human sera against capsule fixes complement then it is difficult to contextualise these results or to assess if other changes, such as in glycosylation, contribute to the results presented. Related to this, there is reasonable evidence that antibodies induced to capsules can be protective yet the data in Fig 5 suggests that without the mutations then the monoclonals are not effective at all for 6B and only effective at the highest concentration for 19A.

      As mentioned in Essential revision 3, our data with S. aureus antibodies demonstrate that this is not a consequence of how these mAbs are produced or of differences in their Fc glycosylation profile. We agree that there is reasonable evidence that antibodies induced to capsules can be protective. However, not all vaccine serotypes are able to induce strong immune protection. Serotype 6B, for instance, which is covered by current vaccines, is found to be poorly immunogenic (manuscript lines 101-103). In further studies, it would be very interesting to find out what underlies this difference between mAbs and, specifically in our case, between anti-CPS antibodies.

      The adoptive transfer experiments demonstrate that the antibodies can moderate bacteraemia. The mechanism of this is not explored and the importance of hexamerisation and complement activation not demonstrated, especially as it is not clear if human antibodies and mouse complement are a productive combination in this context.

      We have now included additional phagocytosis assays with mouse sera (supplemental figure 15) that demonstrate that human antibodies and mouse complement are a productive combination.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Silva et al. "Evaluation of the highly conserved S2 hairpin hinge as a pan-coronavirus target" seeks to evaluate a new epitope target on the S2 domain of SARS-CoV2 Spike protein and evaluate its potential as a pan-coronavirus target. This is an impressive combination of extensive structural, HDXMS-based dynamics and antibody engineering approaches. What is missing is a detailed correlation of HDXMS with Spike dynamics. The authors have not examined the allosteric effects of 3A3 binding to the Spike trimer, specifically cooperativity in antibody binding. Does binding of one Fab positively or negatively impact the subsequent binding of antibody? In this regard, readers would benefit from HDXMS spectral envelopes in figures, at least for the epitope locus peptides. Further, what is the effect of the intrinsic ensemble behavior of the Spike protein on 3A3 interactions? In a broader sense antibody binding is assisted by intrinsic trimer ensemble behavior, as observed by the lowered binding to the omicron variant- but are there induced binding effects? It would help to better integrate HDXMS with cryo-EM and antibody engineering. It is a novel, less explored epitope target on the S2 domain. Overall, a more definitive mechanistic conclusion for how targeting the S2 hinge can advance future pan-coronavirus strategies is missing.

      1) Given that the authors have demonstrated ensemble switching behavior from 4 ℃ to 37 ℃ (Costello et al. (2021)) why is this not factored in how the HDXMS is carried out? The samples were stored, frozen at -80 ℃, thawed, and equilibrated for 20 min at 20 ℃ with or without antibody present and analyzed by HDXMS. However, the reported t1/2 for trimer tightening at 37 ℃ is t1/2 = 2.5 h (Supplementary Fig. 7). The samples should ideally be analyzed under standardized conditions with the stable conformer. Sample heterogeneity from HDXMS is likely due to any of the following contributing factors:

      i) Intrinsic ensemble heterogeneity (Costello et al. (2021)), Kinetics of RBD- up and down conformational switching

      ii) Cooperativity of Fab binding.

      iii) Partial occupancy of trimer epitopes with bivalent IgG.

      iv) Combination of cooperativity effects and partial binding effects

      I would predict for any of the above reasons, it is intriguing why are there no bimodal kinetics of deuterium exchange reported. Partial occupancy should be evident from HDXMS paratope analysis.

      2) Pan-coronavirus neutralization potential is clearly evident. It is intriguing that the antibodies were isolated after immunization with an authentic MERS S2 domain but showed better selectivity to full-length 6P-engineered Spike. How is cooperativity built into antibody binding, given that the epitope site is occluded to various extents by the S1 domain and access is contingent upon RBD up-down kinetics?

      3) I am surprised that there is no allostery described for 3A3 (Supplementary figures 5, 6).

      The HDX-MS experiments presented in this work were carried out by the D’Arcy lab and published in a preprint on bioRxiv (originally posted on February 1, 2021) prior to publication of Costello et al. (first posted to bioRxiv July 11, 2021, epub March 2, 2022). Indeed, our bioRxiv posting inspired the Marqusee lab to request 3A3 for inclusion in their work focused on the conformational heterogeneity of the spike protein. Without prior knowledge of the conformational heterogeneity, we carried out these epitope mapping experiments at 25 °C, which allowed us to successfully map the epitope without determining which conformation the antibody prefers.

      The data presented in Costello et al. further confirm the location of 3A3’s epitope presented here and provide additional information about its preference for different conformational states within the spike protein. We have included an additional comment in the methods section (lines 660-661) stating, “The location of the 3A3 epitope was confirmed in a separate experiment carried out over the temperature range of 4 to 37 °C (Costello et al. 2022).”

      This is a clear example of the value of pre-prints to stimulate timely scientific collaboration. While Costello et al. used 3A3 as a tool to probe spike dynamics, here we highlight the original work that identified the epitope.

      Spectral envelopes have been provided (Supplementary Fig. 4b and Supplementary Table 3).

      The HDX-MS data provides limited insight into possible cooperative or allosteric binding of the 3A3 antibody because of other sources of heterogeneity such as spike dynamics and partial occupancy of the spike epitopes. However, no difference in occupancy was detected when HDX-MS with 3A3 Fab was compared to the same experiment with bivalent 3A3 IgG. It should be noted that in this HDX system, the antibody is not bound so tightly that the spectra are bimodal, showing the exchange of bound and unbound populations separately. Though HDX-MS experiments were performed in slight Fab or IgG excess of 1:1 Fab:spike monomer stoichiometry, the absolute stoichiometry in the context of the spike trimer is unclear.

      Reviewer #2 (Public Review):

      The authors report a conserved spike S2 hinge epitopes and two conformationally selective antibodies that help elucidate spike behavior. This work defines a third class of S2 antibody and provides insights into the potency and limitations of targeting this S2 epitope for future pan-coronavirus strategies.

      Thank you for your review of this manuscript.

      Reviewer #3 (Public Review):

      The study by Silva et al details the discovery and evaluation of a third class of broadly cross-reactive anti-Spike antibody that binds a conserved hinge region in the S2 domain. After immunizing mice with a stabilized S2 protein from MERS and generating scFv phage libraries, the authors were able to identify antibody 3A3, which showed broad cross-reactivity with SARS2 (including Omicron BA.1), SARS1, MERS, and HKU1 spike proteins. Using a combination of a low-resolution cryo-EM structure and HDX mass spectrometry, the authors were able to map amino acids in the antibody paratope and spike epitope, the latter of which is the hinge region of the Spike S2 domain (residues 980-1005) that plays a critical role in pre- to -post-fusion conformational changes. Through well-executed and comprehensive mutagenesis, binding, and functional assays, the authors further validated critical residues that lead to antibody escape, which centered around the 2P residues and diminished viral entry. While 3A3 and an affinity-enhanced engineered version, RAY53, did not show potent in vitro neutralization against the authentic virus, the antibody was shown to recruit Fc effector functions for viral clearance, in vitro.

      Overall, the conclusions of this paper are well supported by the data, but the usefulness of such antibodies is likely limited. The work can be strengthened by extending the analysis of 3A3-like antibodies in the context of human immune responses and in vivo effectiveness.

      1) Isolation of 3A3 was achieved after the generation of scFv-phage libraries following immunization with a MERS S2-domain immunogen in a mouse model. The fact that 3A3 binds well to 2P-stabilized sequences and binding/neutralization is diminished upon reversion of 2P mutations back to the native spike sequence (Figures 3a, 4c, and 5b), suggest that such antibodies would likely not arise from natural infection. This contrasts the isolation of fusion peptide and stem helix-directed antibodies, which were isolated from both immunized animals and convalescent individuals. To make their results more solid regarding the use of such antibodies in future vaccine strategies, the authors should provide evidence that 3A3-like antibodies can be identified in human donors. For example, they could enrich donor-derived S2-specific antibodies that bind both MERS and SARS2 S2 domains and evaluate the fraction of antibodies that recognize the hinge-epitope using competition binding assays (either ELISA or BLI), which have commonly been used to map epitope-specific sera responses. This could also be achieved with nsEMPEM of polyclonal IgGs bound to S2 proteins.

      2) The authors speculate in the discussion that strategies to enhance access to the hinge epitope, which may include ACE2-mimicking antibodies, could promote enhanced viral clearance. In addition to ACE2-mimicking antibodies, several antibodies have been described that bind the RBD and promote S1 shedding (see for instance mAb S2A4 - Piccoli et al, 2020, Cell). Several 2nd generation vaccine platforms utilize RBD-only immunogens that are likely to induce high titers of ACE2-mimicking and cross-reactive S1-shedding antibodies. Thus, adding in vitro neutralization and ADCC experiments to assess synergy between 3A3/RAY53 and such antibodies would bolster this speculative claim and be of interest to many in the field developing strategies for pan-coronavirus therapies.

      3) The authors provide in vitro evidence in Figure 5c,d for Fc-mediated viral clearance. While in vivo data to show effectiveness in animal models is ideal, additional in vitro data that utilize engineered constructs that modulate effector function (e.g., DLE (+) or LALA (-)) would boost the authors' claims regarding Fc-mediated viral clearance mechanisms by 3A3/RAY53.

      1) Though we do not plan to isolate 3A3-like antibodies from human donors, there is evidence from the analysis of polyclonal responses in Claireaux et al 2022 that these antibodies are elicited in infected humans. We also know of several studies on naturally occurring S2 hinge targeting antibodies from colleagues that are in preparation. Understanding the therapeutic role of this antibody class is relevant to the study of broadly-reactive S2 antibodies, even if that role is limited.

      2) We agree that synergy between S2 hinge epitope binding antibodies and ACE2 mimicking antibodies will be very interesting to investigate. We hope to pursue this in future work.

      3) We agree these are excellent controls to include, in addition to isotype controls already shown. In accordance with the eLife COVID research policy, we minimized our claims around Fc-effector functions elicited by RAY53 and stated that further experiments to confirm our preliminary findings are needed.

      The existing description of the effector function experiments states in lines 392-392 “These results indicate that RAY53 binding is compatible with ADCP and ADCC,” which is already a very limited claim.

      We also added in line 450 that S2 core-binding antibodies “require further validation” of their ability to recruit effector functions.

      We appreciate the importance of controls providing effector function modulation and will include the LALAPG mutations as a standard component of our future ADCC evaluation. However, given our focus on the relevance of the epitope and consistency of the Fc regions across the antibodies, we felt that the isotype and positive control antibodies (target binding controls) were the most relevant controls to include in this study.

    1. Author Response

      eLife assessment

      Germline inactivation of NPHP2, which encodes a protein that localizes to the transition zone at the base of the primary cilium, results in infantile kidney cysts and fibrosis. In this study, the authors provide solid evidence that increased cell proliferation and fibrosis precede cyst formation in Nphp-2 mouse models, that mutant renal epithelial cells are responsible for the phenotype, and that genetic inhibition of ciliogenesis in this model reduces disease severity. They also show that valproic acid, a drug that affects a number of cellular targets and is used to treat other human conditions, slows disease progression. One limitation of the study is that it provides limited insights into the mechanisms responsible for any of its interesting observations.

      To our knowledge, our study is the first to pinpoint defective epithelial cells as the main driver of both epithelial cysts and interstitial fibrosis in an NPHP model. The discovery that abnormal signaling from epithelial cells triggered a profibrotic response in the absence of cyst formation is also novel. Our Ift88 Nphp2 double mutant results, combined with the tissue-specific function of NPHP2, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells and that abnormal signaling from epithelial cells triggers interstitial fibrosis. Moreover, we identified the HDAC inhibitor VPA as a potential candidate drug for treating NPHP. Although the precise molecular function of NPHP2 remains undefined, our results suggest that epithelial-specific function and epithelial-stromal crosstalk underlie NPHP-like phenotypes in Nphp2 mutant kidneys. Furthermore, although whether NPHP2 interacts with polycystin-mediated signaling remains an outstanding question, our results ruled out the involvement of NPHP2 in ciliary localization of PC2.

      Reviewer #1 (Public Review):

      Nephronophthisis (Nphp) is a multigenic, recessive disorder of the kidney presenting in childhood that is characterized by cysts predominantly at the cortico-medullary junction and progressive fibrosis. An infantile form of the disease presents earlier with more diffuse cystic change. The condition is considered a ciliopathy because most of the genes linked to the condition encode proteins involved in ciliary biogenesis or function. Germline mutations in NPHP2 are associated with a particularly severe, infantile form of the disease. Given that interstitial fibrosis is a more prominent feature of Nphp compared to many other forms of polycystic kidney disease, the authors sought to determine the mutant cell types responsible for the phenotype.

      In the current study, the authors generated and characterized mouse lines with Nphp2 selectively inactivated in either renal epithelial cell or stromal cell lineages and found that inactivation in renal epithelial cells was both necessary and sufficient to cause disease. They further showed that markers of interstitial fibrosis and proliferation increase in mutants prior to the onset of histologically evident cystic disease, suggesting that aberrant epithelial-stromal cell signaling is an early and primary feature of the condition (Figures 1-4). The study design was straightforward and appropriate to address the question, and the results support their conclusions.

      They next tested whether the cilia-dependent cyst-activating pathway (CDCA) that is "unmasked" by loss of other PKD-related genes is similarly active in Nphp2 mutants by generating Nphp2/Ift88 double mutants. Their studies found that the severity of cystic disease and markers of proliferation and fibrosis was attenuated in double-mutants (Fig 5, 6). These studies were also appropriate for testing the hypothesis and the results were similarly consistent with their interpretation.

      In the last set of studies, they tested whether valproic acid (VPA), a drug that has multiple modes of action including acting as a broad inhibitor of HDACs and previously used by the investigators in other forms of polycystic kidney disease, would have similar effects in Nphp2 mutants. The authors tested daily injection from days P10 through P28 in both control and Nphp2 mutant mice with VPA or an appropriate vehicle control and found that VPA was beneficial (Fig 7). The study design was acceptable and the results generally support their conclusions. The one perplexing result is shown in Fig 7B. The Nphp2 mutants, regardless of treatment status, have body weights (BW) that are significantly lower than the controls, with treated mutants even trending lower than their untreated mutant counterparts. This is unexplained and should be addressed. In the mutants with more widespread epithelial cell knock-out of Nphp2 (Ksp-Cre, Fig 1), total body weight decreased as mice became more severely cystic with renal impairment. In the milder form of disease produced with the Pkhd1- Cre (Fig 7), total body weight is inexplicably approx. 2g lower on average despite having much more modestly elevated KBWs and BUNs. Moreover, one might have expected that mutants treated with VPA would have had BWs intermediate between untreated mutants and controls since the severity of the disease was moderately attenuated. These differences raise the question as to whether body weight differences are due to factors independent of disease status, the most likely of which would be that the controls were not littermates. This prompted a careful review of the text for descriptions of the control mice. Throughout the study, the investigators describe selecting animals from the same "cohort", but this term is imprecise.

      There is little information provided about background strains, whether any of the lines were congenic, or whether any of the studies were done using littermate controls. This must be addressed. It would help if the investigators identified the litter status in their plots. This would clearly show relationships between animals and the number of litters that had animals with these properties. If littermates were not used for each study, the authors must explain both why they didn't do so and how they then selected which animals to use. This information is especially important for interpreting the results of their genetic interaction (fig 5) and drug treatment studies (fig 7).

      We thank the reviewer for the multiple positive comments.

      To address the issue of body weight, we examined the time course of body weight change more carefully and added Figure 7-figure supplement 1 to present the results. Although Nphp2flox/flox;Pkhd1-Cre mice displayed reduced body weight at P28 in comparison to controls, this reduction was more moderate than that of Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1A). Notably, the trend of body weight difference started at around P21 in both Nphp2flox/flox;Pkhd1-Cre and Nphp2flox/flox;Ksp-Cre mice, coinciding with weaning (Figure 7-figure supplement 1B). It is possible that mutants with compromised kidney function were less able to thrive and gain weight around this transition time. In terms of VPA treatment, body weight trended down in both wild type and mutant mice subjected to the treatment, although the difference did not reach statistical significance (Fig. 7B). We cannot rule out the possibility that side effects of VPA contributed to weight loss in treated mice. In addition, VPA may affect body weight increase through HDAC: the HDAC inhibitor Trichostatin A was shown to inhibit adipogenesis (PMID: 34232916) and 4-hexylresorcinol, another HDAC inhibitor, reduced body weight in treated rats (PMID: 34445640). To include the additional data and references, we added the following in the Results section:

      "We analyzed body weight change of Nphp2flox/flox;Pkhd1-Cre mice in more detail and compared it to Nphp2flox/flox;Ksp-Cre mice. At P28, the reduction of body weight in Nphp2flox/flox;Pkhd1-Cre mice in comparison to control mice was more moderate than that in Nphp2flox/flox;Ksp-Cre mice (Figure 7-figure supplement 1)."

      " However, the reduced body weight phenotype in mutant mice was not suppressed by VPA treatment (Fig. 7B). We cannot rule out the possibility that the side effects of VPA contributed to weight loss in treated mice. In addition, VPA may reduce body weight through inhibiting HDAC during the growth period: the HDACI Trichostatin A was shown to inhibit adipogenesis (51)."

      Regarding genetic background, all mice analyzed in figures 5 and 7 are in the same genetic background (C57BL/6J). We added a more detailed description of genetic background in the Materials and Methods section. Littermate status is now also indicated in figure legends.

      In Figure 5, multiple genotypes (e.g., Nphp2flox/flox;Ksp-Cre, Nphp2flox/flox;Ift88flox/flox;Ksp-Cre and Ift88flox/flox;Ksp-Cre) were analyzed. Because of the limited number of animals per litter and low yield of desired genotypes, non-littermates had to be included in some cases. Littermate status is now highlighted by colors in the data tables of Figure 5 source data.

      In Figure 7, because of the limited number of animals per litter and the need to subject each genotype to VPA and vehicle treatment, non-littermates had to be included in some cases. Littermate status is now indicated by highlight colors in the data tables of Figure 7 source data.

      Several other considerations. The authors state that the effects of VPA are mediated through the drug's inhibition of HDACs and suggest that future studies could be directed at refining the specific HDAC. While this is certainly possible, the authors should acknowledge that VPAs have been reported to have numerous pharmacologic effects and targets and which of these is mediating the effects in their model is unknown (text). They would need mechanistic studies to show this, though it doesn't discount their possible efficacy as a therapy for PKD.

      We agree that it is an important point to clarify and added in Discussion: "It is also worth noting that VPA could affect targets other than HDACs and testing newly approved HDACIs will provide useful insight."

      The authors also state in their abstract that their double knock-out studies "support a significant role of cilia in Nphp2 function in vivo." It is not clear to me how their studies show this nor how they can exclude that ciliary activity is operating in an Nphp2-independent, parallel fashion that modulates some common downstream pathways.

      We agree with the reviewer that our results do not exclude the possibility that NPHP2 and ciliary activity feed into a common downstream pathway, i.e., a cilia-dependent cyst-activating pathway could operate outside of cilia. We changed the sentence in the abstract to "supporting a significant interaction of cilia and Nphp2 function in vivo." In addition, we added "Although cilia-dependent, the downstream pathway could potentially operate outside of cilia and receive parallel signals from both ciliary activity and Nphp2." to the Discussion to reflect the results and model more precisely.

      Reviewer #2 (Public Review):

      The manuscript by Li et al demonstrates the role of Nphp2/Invs in renal epithelia in preventing NPHP-like phenotypes, such as epithelial/stromal proliferation and stromal fibrosis, in mice. Previously, mutants of the Nphp2 allele in mice, generated by insertional mutagenesis, showed severe cystic kidney disease and fibrosis in neonates.

      The authors nicely show that the NPHP-like phenotypes in mutant kidneys arise from abnormal signaling specifically within and from renal epithelial cells. Furthermore, the fibrotic response and abnormal increase of cell proliferation precede cyst formation and could be initiated independently of cyst formation. The authors also show that the removal of cilia reduces the severity of Nphp2 phenotypes. The authors suggest that similar to polycystins, NPHP2 inhibits a cilia-dependent cyst and fibrosis-activating pathway. Finally, the histone deacetylase (HDAC) inhibitor valproic acid (VPA) reduces these phenotypes and preserves kidney function in Nphp2 mutant mice, supporting HDAC inhibitors as potential candidate drugs for treating NPHP.

      Overall, understanding the mechanisms driving NPHP phenotypes is important and drugging relevant pathways in treating this disease is an important unmet need in patients. The authors have provided insights into both these aspects in this study. The manuscript is nicely written, and the assays shown are rigorous and insightful.

      We thank the reviewer for the positive comments.

      Reviewer #3 (Public Review):

      In this manuscript, Li et. al, investigate whether epithelial or stromal Nphp2 loss, a gene causative of nephronophthisis (NPHP), drives polycystic kidney disease (PKD) and kidney fibrosis in a novel floxed model of Nphp2. The authors found that only epithelial and not stromal Nphp2 loss results in NPHP-like phenotypes in their mouse model. In addition, the authors show that concurrent cilia, via Ift88 loss, and Nphp2 loss within the kidney epithelium as well as HDAC inhibition results in less severe PKD/kidney fibrosis, as has been shown in mouse models of other non-syndromic forms of PKD, such as autosomal dominant PKD caused by mutations to Pkd1 or Pkd2.

      The authors aimed to understand (1) whether the published NPHP phenotype (kidney cysts and fibrosis), known from the global Nphp2 knockout mouse, is driven by the function of NPHP2 in the kidney epithelium or stromal cells; (2) if kidney fibrosis in NPHP is linked to kidney damage caused by cysts, or independent and preceding of the PKD phenotype; (3) whether cilia are required, causative, or prohibitive of NPHP cystogenesis; and (4) if a broad spectrum HDAC inhibitor is a potential therapeutic approach for NPHP.

      With the provided results, the authors established that epithelial Nphp2 loss is likely a predominant driver of PKD in their model; however, they cannot exclude that stromal NPHP2 plays a role in cyst growth post-initiation because the authors failed to directly compare their cell type-specific models to a global cre knockout (e.g. Cagg-cre).

      We agree with the reviewer that we cannot rule out the possibility that stromal NPHP2 plays a role post cyst initiation and added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      A direct comparison between epithelial-specific and global knockout models is an attractive idea, but technically challenging. For an interpretable comparison, it is essential that the stage and knockout efficiency in epithelial cells are equivalent between the two models. However, Ksp-Cre is expressed specifically in the distal nephron, sparing epithelial cells in other segments, while epithelial cells in all segments would be affected by Cagg-Cre. In addition, global knockout of Nphp2 leads to perinatal lethality. Inducible Cagg-Cre could potentially be used to bypass earlier functional requirements, but matching stage and knockout efficiency in renal epithelial cells between Ksp-Cre and inducible Cagg-Cre mediated knockout remains challenging. These factors make a direct comparison problematic. Finally, our results revealed the role of defective epithelial cells in triggering the phenotypes but did not rule out a role of interstitial cells once abnormal signaling is initiated in epithelial cells. To clarify this point, we added "However, our result does not rule out functional significance of interstitial cells once a pro-cystic and fibrotic response is triggered in mutant epithelial cells." to the Discussion section.

      In addition, it is possible that cyst initiation/growth upon stromal Nphp2 loss occurs substantially slower compared to epithelial Nphp2 loss. However, it seems the authors did not look at kidney phenotypes beyond 28 days of age. Publications from the ADPKD field suggest that stromal Pkd1 loss initiates cystogenesis much slower than epithelial Pkd1 loss.

      We have expanded our analysis to 8-week-old mice. We now show that Nphp2flox/flox;Foxd1-Cre mice show normal kidney weight, kidney/body weight ratio, kidney function and histology at P56, supporting our original conclusion that deletion of Nphp2 in interstitial cells fails to trigger obvious renal phenotypes up to the young adult stage. These results are presented in Figure 4-figure supplement 1 and the Results section.

      Further, while the authors suggest that kidney fibrosis precedes cyst development, the results supporting this conclusion are limited to one time point, analyzing IF staining of a single marker that can be compared between non-cystic and cystic time points. These analyses need to be extended to make any firm conclusions.

      At the precystic kidney stage (P7), we analyzed SMA and vimentin levels via immunostaining. Their mRNA levels were additionally quantified via RT-qPCR. We have now analyzed vimentin levels at multiple timepoints (P9, P14 and P21) and the results were added to Figure 2. Combined, these data support the initiation of a fibrotic response prior to cyst formation.

      The most interesting finding of the manuscript, and likely most impactful to the field, is that loss of cilia within the setting of epithelial Nphp2 loss reduces PKD severity. This finding parallels published findings for Pkd1 and Pkd2, which are suggested to function in a cilia-dependent cyst-activation mechanism. Unfortunately, the studies shown here do not add to the mechanistic insight beyond showing the descriptive finding. Most importantly, it remains unclear whether NPHP2 functions in the same pathway as polycystin-1 or -2 (the Pkd1, Pkd2 gene products) or in a separate independent pathway.

      Our Ift88 Nphp2 double mutant results, combined with the tissue-specific function of NPHP2, which to our knowledge is completely novel in an NPHP model, suggest that NPHP2 functions as a negative regulator of a profibrotic and pro-cystic pathway that interacts with cilia-mediated signaling in epithelial cells, and that abnormal signaling from epithelial cells triggers interstitial fibrosis. We agree with the reviewer that whether NPHP2 functions in the same pathway as polycystins is an interesting question. However, we feel it is outside the scope of this manuscript and will pursue this research direction in our future studies.

      With respect to the HDAC preclinical studies, the authors show supporting data that a broad-spectrum HDAC inhibitor may be suitable for slowing cyst growth in their model of NPHP. Overall, these studies are not novel to the field, as HDAC inhibition has been shown to slow PKD progression in various models of PKD, albeit not in NPHP specifically. Further, the studies seem like an add-on, which does not directly link to the prior cell type-specific studies of NPHP2, and no mechanisms linking the two concepts are provided.

      Although we and others have shown that HDACIs slow cyst progression in other PKD models, this study is the first to show their impact on an NPHP model. Given the current lack of treatment for NPHP, we feel it is important to communicate the results to the research community even though the molecular mechanism remains to be defined.

    1. Author Response

      Reviewer #1 (Public Review):

      The article "Identification of a weight loss-associated causal eQTL in MTIF3 and the effects of MTIF3 deficiency on human adipocyte function" explored the functional roles of MTIF3 during adipocyte differentiation. In persons living with obesity, genetic variation at the MTIF3 locus associates with body mass index and responses to weight loss interventions. MTIF3 regulates mitochondrial protein expression and gene knockouts cause cardiomyopathy in mice. This paper provides insight into the impacts of MTIF3 knockout on adipocyte differentiation and the expression effects of the eQTL on MTIF3 levels. The authors implement a CRISPR/Cas9 gene editing approach coupled with an in vitro platform to detect influences of MTIF3 on adipocyte glucose metabolism and gene expression. This method may serve as a platform to explore knockouts in human cell lines, so it may allow the discovery of new gene x environment influences on in vitro outcomes related to differentiation, growth, and metabolism.

      The conclusions of this paper are mostly well supported by data, but some experimental conditions and data analysis needs to be clarified and extended.

      1) The authors use CRISPR/Cas9 to generate the rs1885988 variant in the human white adipocyte cell line and performed a comprehensive validation analysis of gene editing (Figure 1). qPCR analysis showed reduced MTIF3 expression during human adipocyte differentiation (Figure 1E, F). To expand the importance of the rs1885988 variant, the authors should have provided target gene measurements to verify the canonical differentiation profile (e.g., FABP4, ADIPOQ) and help readers understand the overall impact of gene editing at the MTIF3 locus.

      Thank you for your suggestions. As you requested, we have quantified several adipocyte differentiation markers in the allele-edited cells after 12 days of adipogenic differentiation. The data (Figure 1-figure supplement 1) show no significant difference between cells with the different genotypes. We have added more information about this in lines 100-101, and also in another context in lines 105-116.

      Notably, the intra-group variation of the marker gene expression is large (Figure 1-figure supplement 1), which makes it difficult to clearly state how much the allele editing, as opposed to random variation resulting from single cell cloning, contributes to the differentiation outcome. However, if we also consider MTIF3 knockout cells (that do not need to be single-cell cloned), their differentiation marker expression also appears unaffected (Figure 3-figure supplement 1). Taken together then, it is unlikely the allele editing with the consequent effect on MTIF3 expression affects adipogenic differentiation in our experiments. We mention the absence of effect of MTIF3 knockout on differentiation in the paragraph starting on line 137.

      2) The direct mechanistic influences of MTIF3 on adipocyte function remain unclear. MTIF3 regulates the translation initiation of mitochondrial protein synthesis. Western blots of OXPHOS proteins do not per se underscore supercomplex formation, which is also a process mediated by MTIF3. Blue native gel electrophoresis may prove a better method to establish the effects of MTIF3 loss-of-function on supercomplex formation.

      As suggested, we have run blue native gel electrophoresis to detect the formation of OXPHOS respiration complexes. In the revised manuscript (lines 158-168 and Figure 4 E,F), we show that MTIF3 knockout indeed interferes with complex formation, with lower abundance of complexes V/III2+IV1, III2/IV2 and IV1. Additionally, although the blot signal for complex I+III2+IVn is diffuse, it appears higher in scrambled control cells than in MTIF3 knockout cells. Interestingly, complex II content is slightly higher in MTIF3 knockouts, which may result from a compensatory regulation mechanism, as none of the subunits of complex II is encoded by mitochondrial DNA. We also found several faster-migrating bands (“undefined bands” in the figure) in the MTIF3 knockout samples, although it is hard to determine whether those are single-chain proteins, or degradation or mistranslation products. Overall, though, the native gel blots show impaired OXPHOS complex assembly in MTIF3 knockout samples.

      In addition, we performed western blots for other mitochondrial proteins, including COX II (subunit of OXPHOS complex IV), ND2 (subunit of OXPHOS complex I), ATP8 (subunit of OXPHOS complex V), and CYTB (subunit of OXPHOS complex III). The data (Figure 4 A,B) show decreased ND2 and COX II, a trend toward decreased CYTB, and unaffected ATP8 content in MTIF3 knockout adipocytes.

      The methods (paragraph starting at line 479), results (paragraph starting at line 145), and discussion (lines: 261-263, 274-277) were incorporated in the revised manuscript.

      3) Based on the findings, the authors argue that MTIF3 knockout alters the function of adipocytes. However, many of the experiments show fairly small effect sizes (Figure 5A, Figure 6A). How does the MTIF3 knockout explicitly perform functions related to body weight regulation? Gene editing in vivo would have helped to substantiate the authors' conclusions.

      In the paper we are looking at the consequences of MTIF3 deficiency in one cell type, over a short time, in vitro. The outcome of body weight regulation, e.g. during weight loss, would result from long-term effects of MTIF3-altered metabolism in more than one tissue. We envisage that small changes in energy metabolism not only in fat, but also in e.g. muscle, would make a substantial difference over time in vivo (something we cannot capture in in vitro models). We have added this discussion to lines 294-311.

      As for in vivo genomic editing, the alleles of interest are specific to the human genome. Ideally, a genotype-based recall study in humans would be appropriate, but due to time and resource limitations, we are not able to conduct such a study at the moment (although we certainly hope to perform such a study in the future). As for modeling MTIF3 deficiency in mice, the MTIF3 knockout mice are not viable [1], although other options (e.g. overexpression, tissue-specific knockouts) are certainly possible and tempting to investigate. This, however, would require considerable additional work which we could only perform in a future project.

      4) In several instances, the authors refer to 'feeding' cells with glucose (line 206, line 171). Feeding experiments often imply complex nutrient interventions in animal models and people, which cannot be easily recapitulated in cell culture. The in vitro experiments simply alter levels of glucose and more precise language would state the specific challenges accurately.

      In the revised manuscript, we have replaced “feeding” with the exact glucose concentration, or with “glucose concentration” where appropriate (paragraph starting at line 215, and lines 577-578, 597, 873-879).

      Reviewer #2 (Public Review):

      Huang Mi, et al. investigated the role of MTIF3, the mitochondrial translation initiation factor 3, in the function of adipocytes. They first examined the expression of the obesity-related MTIF3 variants based on the GTEx database and found that two variants lead to an increase in MTIF3 expression. They then knocked out MTIF3 in differentiated hWAs adipocytes and characterized the mitochondrial function. They found that loss of MTIF3 decreases mitochondrial respiration and fatty acid oxidation. They further treated cells with low glucose medium to mimic weight loss intervention and found that MTIF3 knockout adipocytes lose fewer triglycerides than control adipocytes. This paper provides new information about MTIF3 in adipocytes and the potential functional role of MTIF3 in mitochondrial function.

      1) The authors provided sufficient data to show those two genetic variants increase MTIF3 expression. Their CRISPR/Cas9 knockin cell line is also convincing. But they didn't show if the genetic variants affect adipogenesis. Adipogenesis is an important process for weight gain and fat deposition. In lines 103-107, the authors mentioned that the "allele-edited cells have some problem in differentiated state, e.g. triglyceride or mitochondrial content", so they used an inducible Cas9 system. However, the issue of differentiated allele-edited cells may be the functional effect of MTIF3 genetic variants, such as interrupting adipogenesis, decreasing triglyceride, or affecting mitochondrial number. The authors should provide that information.

      Thank you for all your suggestions. We think we were not clear regarding this issue. We did not mean that the allele-edited cells have a problem in the differentiated state, which could indeed be (as you point out) due to the functional effect of MTIF3 genetic variants. The problem relates to the process of single-cell cloning itself, which inherently introduces random variation. As a consequence, the data on adipogenic differentiation in allele-edited cells have relatively high intra-group variation. We have added more clarifying text in lines 104-116.

      To provide the data on this, per your request, in the revised manuscript we include the results for the rs67785913-edited cells in Figure 1-figure supplement 1. As shown, we observed no differences in the expression of adipogenic markers (ADIPOQ, PPARG, CEBPA, SREBF1 and FABP4) or in mitochondrial content between the two rs67785913 genotypes. Since the intra-group variation is often high, it is hard to conclude how much the rs67785913 eQTL affects the quantified variables. Much of the variation could instead be ascribed to the effects of single cell cloning.

      The cloning per se introduces random variation, but is required to obtain homozygous allele-edited cells. Because of this dilemma, and to clarify how much MTIF3 expression can actually influence adipogenic differentiation, we have, during the revision, also used the hWAs-iCas9 cells to generate MTIF3 knockouts at the preadipocyte stage and then tested their differentiation capacity. As we show in Figure 3-figure supplement 1, we found no apparent differences in adipogenic marker gene expression between scrambled control and MTIF3 knockout cells (we mention that in lines 137-144). Taken together, our results may indicate that the rs67785913 genotype, through affecting MTIF3 expression, is unlikely to regulate adipogenic differentiation.

      2) In Figure 4, the author mentioned that MTIF3 knockout does not affect the expression of adipogenic differentiation markers. They need to provide more evidence to prove their point. Oil-red O staining is a clearer way to quantify adipocyte differentiation in cell culture. In addition, in Fig. 4B western blot, the author should include MTIF3 as a control to show the knockout efficiency. It is not clear the meaning of plus and minus in that panel. The author should also compare the total triglyceride levels in MTIF3 knockout cells and control cells.

      We have now included Oil-red O staining results and total triglyceride levels (Figure 3 F,G), which show no apparent differences between scrambled control and MTIF3 knockout cells (method: lines 427-431; results: lines 137-144). We also added the MTIF3 blots to figure 4A as a control, showing high and consistent MTIF3 knockout efficiency in independent experiments. In the original manuscript, the plus and minus referred to control and knockout, respectively. To clarify that, we have changed the expression to SC and KO in the revised manuscript.

      With regards to Oil-red O vs. quantification of adipogenic markers, we actually prefer the latter method, as it gives more accurate and less variable results than Oil-red O (at least in the cell line we use). We have, however, performed Oil-red O as well to address your question.

      3) MTIF3 is a translation initiation factor in mitochondria and is involved in the protein synthesis of mitochondrial DNA-encoding genes. The authors should check protein levels rather than the mRNA levels of mitochondrial DNA-encoding genes (Fig. 6E). It's interesting to see the increase of mRNA levels of ND1 and ND2, which might be feedback of lower translation. Since ND1 and ND2 are in OXPHOS complex I, the expression levels of complex I in MTIF3 KO cells would be worth checking. Additionally, the author should also check the mitochondria copy number.

      As suggested, we have measured several mitochondrially encoded proteins that are subunits of the mitochondrial OXPHOS complexes. As shown in Figure 4A, ND2 (subunit of OXPHOS complex I) and COX II (subunit of OXPHOS complex IV) expression were significantly reduced, CYTB (subunit of OXPHOS complex III) expression tended to decrease, and ATP8 (subunit of OXPHOS complex V) expression was not affected in the MTIF3 knockout adipocytes. We also assessed the formation of the OXPHOS respiration complexes in extracted mitochondrial proteins and found that MTIF3 perturbation affects mitochondrial complex assembly. The detailed methods (lines 479-490), results (lines 145-169) and discussion (lines 260-262, 274-277) were incorporated in the revised manuscript.

      We have also added the mitochondrial copy number data (Figure 3A), showing that MTIF3 knockout cells have lower mitochondrial content (methods: lines 491-500; results: lines 156-157).

      4) The finding that MTIF3 knockout adipocytes retain more triglycerides under glucose restriction is interesting. It may link to the previous result of lower fatty acid oxidation in MTIF3 knockout adipocytes. However, the authors then showed there is no difference in lipolysis. The authors should discuss those results in the manuscript. The authors could also check lipolysis in glucose restriction conditions. It's also necessary to include the triglyceride levels of KO cell lines in full medium.

      We have now examined glycerol release under glucose restriction, and found no differences between control and MTIF3 knockouts (Figure 6-figure supplement 1). Interestingly, in 1 mM glucose, both genotypes released less glycerol than at 25 mM glucose, and this has been observed before in the SGBS cell line [2]. According to your suggestion, we have added the total triglyceride content at 25 mM glucose (Figure 6C), which also was not different between control and MTIF3 knockout cells. We speculate that the higher retention of triglycerides in the knockouts could be due to higher re-esterification of lipolytically released fatty acids, since, as we observed, fatty acid oxidation is impaired in the knockouts. In the revised manuscript, we added this to the discussion (lines 289-293).

      References

      1. Rudler, D.L., et al., Fidelity of translation initiation is required for coordinated respiratory complex assembly. Sci Adv, 2019. 5(12): p. eaay2118.
      2. Renes, J., et al., Calorie restriction-induced changes in the secretome of human adipocytes, comparison with resveratrol-induced secretome effects. Biochim Biophys Acta, 2014. 1844(9): p. 1511-22.
    1. Author Response

      Reviewer #2 (Public Review):

      The idea that decidualization is related to or evolved from wound healing, including fibroblast activation, is old, going back all the way to Creighton 1878 who pointed to the similarity between granulation tissue and decidual tissue, and is supported by the fact that embryo implantation is a compensated form of the endometrial lesion. Nevertheless, the mechanistic connection between FB activation and decidualization is an important fact necessary for understanding decidualization, a fact that is reflected in previous work, for instance, Kim et al., 1999 (Hum Reprod 14 Suppl 2), their reference 20, and Oliver et al., 1999 (Hum Reprod 14), their reference 56 a.o.m. More specifically, a recent single-cell study of in vitro decidualization has shown that a myofibroblast-like cell state is a transient state in the process of decidualization, i.e. decidual cells themselves are not so much activated fibroblasts, but rather decidual cells differentiate after endometrial stromal fibroblasts undergo a FB activation-like process, and the decidual re-programming happens from these activated FB-like states (Stadtmauer et al., 2021, Biol. of Reprod. 1-18).

      Yes, the paper by Stadtmauer DJ and Wagner GP (2022) is now cited in the revised version.

      The above assessment of how the current study fits into the conceptual landscape of mammalian reproductive biology does not diminish the importance of the paper under consideration. The study contributes a large amount of observational and experimental facts to the understanding of how FB activation and decidualization are related. The authors suggest, in particular, that blastocyst-derived TNF activates the cLPA- producing Arachidonic acid (AA), activating PGI2 and PPARd signaling pathway (more about this later).

      Other major comments:

      The authors suggest that luminal epithelial cells signal through the release of arachidonic acid (AA) in response to TNF. That is interesting and supported by in vitro experiments inducing decidualization and FB activation by AA. What makes this conclusion a little problematic is that it is known that luminal epithelial cells also express COX2/PTGS2 and thus the synthesis of prostaglandins is already starting in the LE and thus LE can also signal to the stroma via PGE2, PGI2 as well as PGL2 rather than AA directly. The in vitro experiments cannot exclude the possibility that the ESF is producing some prostaglandin and then having an autocrine effect.

      Yes, we agree with you. It is possible that PGI2 and PGE2 from luminal epithelial cells may also induce fibroblast activation. Based on in situ hybridization data, COX-2, mPGES, PGIS and PPARδ are mainly expressed in subluminal stromal cells at the mouse implantation site on day 5 of pregnancy (Lim et al, 2000; Ni et al, 2002; Wang et al, 2004). Therefore, PGI2 from stromal cells should be dominant compared to that from luminal epithelial cells. In the future, we will examine the effects of AA on COX-2, mPGES and PGIS in luminal epithelial cells.

      Lim H, Dey SK. PPAR delta functions as a prostacyclin receptor in blastocyst implantation. Trends Endocrinol Metab. 2000 May-Jun;11(4):137-42.

      Ni H, Sun T, Ding NZ, Ma XH, Yang ZM. Differential expression of microsomal prostaglandin e synthase at implantation sites and in decidual cells of mouse uterus. Biol Reprod. 2002 Jul;67(1):351-8.

      Wang H, Ma WG, Tejada L, Zhang H, Morrow JD, Das SK, Dey SK. Rescue of female infertility from the loss of cyclooxygenase-2 by compensatory up-regulation of cyclooxygenase-1 is a function of genetic makeup. J Biol Chem. 2004 Mar 12;279(11):10649-58.

      344: here the authors report that PGE2 has no effect on FB activation marker expression, but the problem with that is, that (at least in human ESF), progesterone is causing a change in the expression of the PGE2 receptors from EP4 to EP2, and it is only the EP2 receptor that activates cAMP/PKA pathway.

      Yes, we agree with you. PGES is highly expressed in stromal cells at the implantation site. Previous studies also show that PGE2 is important during decidualization. In our study, PGES showed no significant changes after stromal cells were treated with AA. PGE2 also had no significant effects on fibroblast activation. Therefore, we focused on the PGI2-PPAR pathway. It is possible that PGE2 regulates decidualization through a route other than fibroblast activation.

      The fact that the authors show an effect of PGI2 is interesting because PGI2 receptors are among the strongest expressed PTG receptors in mammalian ESF. Prostacyclin receptor is a GPCR rather than a nuclear receptor. So the question is really why the authors have not pursued the role of prostacyclin receptor and instead have focused on PPARd?

      Yes, we agree with you. When mouse stromal cells were treated with AA, there was no significant change in the protein level of the prostacyclin receptor (Figures 4E, 4F). When mouse stromal cells were treated with the prostacyclin receptor agonist SELEXIPAG, the markers of fibroblast activation showed smaller changes compared with treatments with PPARδ (Figure 3D). Therefore, we focused on PPARδ. Although the prostacyclin receptor is less responsive than PPARδ in inducing fibroblast activation, it should still contribute to fibroblast activation. In the future, we will pursue the effect of the prostacyclin receptor on fibroblast activation. Thank you very much for your suggestion.

      Reviewer #3 (Public Review):

      This manuscript postulates that uterine stroma cells undergo a stage of activation between the resting state and the differentiated decidual state in order to support embryo implantation. Using in vivo mouse and in vitro mouse and human stroma cells they demonstrate that during decidualization the stroma cells express the marker genes for activated stroma. They then trace an axis from the embryo-producing TNF to prostaglandin production and activin A that is required for this process. They propose data to show that activation of the stroma is altered in infertility due to fetal trisomy 16.

      The strengths of this manuscript are:

      1) This is a comprehensive study using both in vivo and in vitro studies and in both mouse and human stroma cells.

      2) The experiments use a combination of ligands, agonists, and inhibitors to map the signaling axis regulating stroma activation.

      3) The data shown support the conclusions in this manuscript.

      The weaknesses of this manuscript are:

      1) The conclusion that Activin A is the regulator of stroma activation as mentioned by this manuscript is correlative. What is needed is a knockdown of Activin A and then assess stroma activation to prove Activin A is the major regulator and not one of many TGFb family members.

      Yes, the data from Activin A knockdown have now been provided.

      2) The use of uterine epithelial cells is problematic. The in vitro co-culture approach is not a state-of-the-art co-culture. Removal of epithelial cells from the uterus results in loss of the epithelial phenotype. If the manuscript used an epithelial organoid stroma cell coculture approach it may better reflect the role of the epithelial cells in this process. Otherwise, it is not clear that the epithelial cells are actual participants in the signaling axis. The treatments could be directly on the stroma cells.

      Yes, we agree with you. Following your suggestion, we established a culture system for epithelial organoids. When the epithelial organoids were treated with TNF, the response was similar to that of in vitro cultured mouse epithelial cells.

      3) Ishikawa cells are endometrial cancer cells. They do not really reflect uterine epithelium and it is not clear that any epithelial cell could be substituted for these cells.

      Thank you very much for your comments. It is true that Ishikawa cells differ from in vivo endometrial epithelial cells. However, several studies showed that the Ishikawa cell line possesses apical adhesiveness to JAR trophoblast cells and expresses many of the same enzymes and structural proteins found in normal human endometrium (Castelbaum AJ et al, 1997). Because both estrogen and progesterone receptors are expressed in Ishikawa cells, these cells show a good response to both estrogen and progesterone (Castelbaum AJ et al, 1997). Therefore, Ishikawa cells are used as a model for receptive endometrial epithelial cells (Hannan NJ et al, 2010).

      Castelbaum AJ, Ying L, Somkuti SG, Sun J, Ilesanmi AO, Lessey BA. Characterization of integrin expression in a well differentiated endometrial adenocarcinoma cell line (Ishikawa). J Clin Endocrinol Metab 1997; 82:136-142.

      Hannan NJ, Paiva P, Dimitriadis E, Salamonsen LA. Models for study of human embryo implantation: choice of cell lines? Biol Reprod. 2010; 82:235-245.

      Lessey BA, Ilesanmi AO, Castelbaum AJ, Yuan L, Somkuti SG, Chwalisz K, Satyaswaroop PG. Characterization of the functional progesterone receptor in an endometrial adenocarcinoma cell line (Ishikawa): progesterone-induced expression of the alpha1 integrin. J Steroid Biochem Mol Biol. 1996; 59:31-39.

      4) The activation of stroma cells in the fetal trisomy 16 experiments at the end is very superficial. Data should show that these cells decidualize with decidual markers. This appears to be an experiment to show the translational value of the signaling axis. This experiment, again, is not well developed, does not add much to the manuscript, and should be omitted.

      Yes, we agree with you. The description on human trisomy 16 was deleted.

      In summary, the concept of stroma cell activation as part of decidualization is nicely developed and will add to the field. Normally investigators consider decidualization a mesenchymal to epithelial transition while some consider it stromal activation. This manuscript demonstrates that stroma cell activation is a critical part of the process of decidualization.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors screen large libraries of small proteins to identify three proteins of <50 aa that rescue the growth of an auxotrophic serB deletion Escherichia coli strain. They convincingly show that the growth rescue is due to the small proteins increasing expression of the his operon by reducing transcriptional attenuation. The authors argue that the small proteins function by directly binding the his RNA 5' UTR to alter RNA secondary structure.

      The conclusion that the three small proteins reduce his operon attenuation is well supported by the data. A previous study suggested this mechanism for a somewhat larger, randomly selected protein, but the current study extends this prior work by firmly establishing that the proteins modulate attenuation. The suggestion that the small proteins function by directly binding the his RNA is less well supported by the data. The RNase T1 mapping data are not straightforward to interpret, and there is no assessment of protein-RNA interactions in vivo.

      Major comments:

      1) The RNase T1 probing data are not straightforward to interpret, and hence are insufficient to conclude that Hdp1 binding to the his 5' UTR is the mechanism by which it reduces attenuation. Specifically, G96 has reduced cleavage in the presence of Hdp1, inconsistent with the antiterminator conformation. The authors argue that G96 could be within the site of Hdp1 binding. This is certainly possible but would require additional experimental evidence to draw a confident conclusion. Also, the increased cleavage of bases around the start codon and Shine-Dalgarno sequence is inconsistent with a shift from the terminator to the antiterminator conformation. One confounding issue here is the lack of replicates and the lack of quantification. Additional probes could be tested, which would provide complementary structural information.

      We agree that the RNase T1 probing data alone does not provide sufficient resolution to fully assess changes in terminator/anti-terminator conformations. Therefore, we have clarified our interpretation of the data, addressed its limitations, and have softened the conclusions that can be drawn from it in the text (lines 419-431). We have also included two additional T1 probing experimental replicates in Supplementary Fig. S11 which are in agreement with the cleavage patterns presented in the main text Figure 3D. Based on the revised conclusions and the consistency of the cleavage patterns between the experimental replicates, we do not think that quantification of the probing data would provide any additional information.

      2) There are no experiments to test whether Hdp1 binds the his RNA in vivo. The in vitro data show that Hdp1 can bind the his RNA, but they do not show that this occurs in vivo, or that this is the mechanism by which Hdp1 regulates the expression of the his operon.

      As addressed in the Essential Revisions section, we have now performed and included data from co-immunoprecipitation assays, in which we were able to successfully detect and demonstrate enrichment of his operator-regulated RNA transcripts in HA-tagged Hdp1 pull-down samples. We were also able to demonstrate less enrichment (i.e. reduced interaction/specificity) for thr operator-regulated RNA transcripts in the Hdp1 pull-downs as well as lower enrichment for all his operator-regulated target RNA transcripts in pull-downs performed with the HA-tagged Hdp1 L27Q mutant. These data are presented in Fig. 3A and discussed in lines 313-337.

      Reviewer #2 (Public Review):

      In this work, Babina et al. address a central question in molecular evolution that is only partially answered: how does cellular novelty emerge in evolution? The authors focus here on small proteins, whose importance to various cellular functions has become more appreciated recently. Babina et al. ask if functional small proteins can emerge from random sequences, a question that is mostly unresolved with only a small number of examples in the published literature for such functions. In this study, the authors demonstrate that proteins selected from random, synthetic libraries can rescue auxotrophy in E. coli. Namely, the authors find three small, random proteins (<50 amino acids) that allow E. coli cells with a ΔserB genetic background to grow in a medium without the amino-acid serine. They then show that this rescue is based on the up-regulation of HisB, an enzyme that can compensate for the serB deletion. Finally, using different molecular biology techniques, the authors propose a model in which up-regulation of HisB is achieved by physical interactions between the random proteins and the his operator that regulates the transcription of the his operon in E. coli.

      Notably, as the authors themselves point out, a previous study has already shown that semi-random proteins can result in up-regulation of HisB levels to rescue ΔserB cells. Thus, most of the novelty comes from the attempt to figure out the molecular mechanism of the three random proteins. The idea that a random protein binds the 5' of an mRNA which results in up-regulated expression levels is interesting and can benefit the field. However, some clarification on existing data and additional control experiments are needed to support the authors' claims:

      1) Growth data are not presented in the current form of the manuscript, which makes it impossible to evaluate many of its claims. Especially, the extent of rescue and fitness gain achieved by these random proteins compared to cells harboring the serB gene.

      We thank the reviewer for pointing out this discrepancy. We have now added all relevant growth data under non-permissive conditions (Figure 1G, Supplementary Figures S2, S3, S5) and have also included data on the fitness effects exerted by Hdp expression in cells harboring serB under permissive conditions (LB medium), to allow for comparison with the empty plasmid control strain (Supplementary Figure S1).

      2) The authors have screened their library on other auxotrophic strains, however, they could only find random proteins that rescue growth in the ΔserB background. Currently, they do not address this point, but it might be relevant to the molecular mechanism of those random proteins.

      The reviewer raises an interesting point. We have added a paragraph to our Discussion addressing why we believe that the serB-model with a complementary enzyme is an ideal target for the selection of de novo genes (lines 536-543).

      3) Central to the authors' claims is the up-regulation of HisB, however, they mostly work with an alternative LacZ system to assess the effects of the random proteins on expression. The paper will benefit from some more work measuring actual HisB levels as expressed by the various constructs used along the paper. The authors did provide an important proteomic analysis to show that HisB (along with other proteins in the his operon) is up-regulated as a result of the expression of one of the random proteins. However, it is unclear if the reported ~3-fold increase in HisB levels is enough to allow the growth of ΔserB cells in a medium without serine.

      We thank the reviewer for raising this concern and allowing the opportunity to clarify. It is well established that upregulation of HisB can rescue growth of a SerB-deficient strain on minimal medium (for examples, see Patrick, et al. PMID: 17884825, Digianantonio and Hecht PMID: 26884172). We have now performed additional proteomics analyses that show a specific upregulation of the his operon upon expression of Hdp1 and Hdp3. We have also added a control experiment overexpressing HisB from our expression vector, showing that it restores growth of the auxotrophic ΔserB mutant. It is also clear that histidine starvation itself does not de-repress HisB sufficiently to allow growth of a ΔserB mutant, as this strain does not grow on minimal medium lacking histidine (such as M9 minimal medium that was used for the functional selection in our study). In addition to upregulation of HisB, we show that the rescue is dependent on the presence of HisB and provide additional experiments showing specific interactions in vitro and in vivo of Hdp1 with the his operator RNA. Our results clearly show that rescue depends on HisB and that Hdp expression upregulates HisB, and we do believe our central claim is substantiated beyond reasonable doubt. The reviewer’s main concern, that it is unclear if expression levels of HisB are high enough to allow growth, is, in our opinion, resolved by the observation that Hdp-dependent upregulation of HisB does restore growth.

      We respectfully disagree with the reviewer’s suggestion that an exact determination of the level of upregulation is relevant and needed, as outlined above. In addition, we would like to point out that it is not possible to measure HisB upregulation compared to an empty plasmid control strain under non-permissive conditions. Comparing HisB levels in a ΔserB strain expressing Hdp vs. the empty plasmid control in minimal medium is not possible, since the empty plasmid control strain is not able to grow, and the corresponding baseline of HisB expression cannot be determined in a non-growing strain. To circumvent this, we determined HisB levels in rich medium, which does not necessarily reflect the exact amount of upregulation occurring under non-permissive conditions, but still allows us to detect a physiological activity. Alternative experimental setups, such as comparing HisB levels in a strain carrying serB in minimal medium, also suffer severe shortcomings, as they no longer reflect the cellular physiology of the auxotroph under non-permissive conditions, where growth is dependent on HisB upregulation.

      4) It is unclear how noisy and statistically significant some of the critical experiments in the manuscript are, especially the EMSA and T1-digestion experiments. The authors should try to find a different operator with a similar RNA structure and attenuation function, but a different nucleotide sequence, to the his operator, and show that this control operator is unaffected by the random proteins. Demonstrating the lack of phenotypes using the LacZ system, EMSA experiments, and T1-digestion patterns will much support the authors' claims.

      We thank the reviewer for suggesting this important control and agree that its inclusion significantly strengthens our claims. We used the threonine operon (thr) operator, which is regulated by terminator/anti-terminator formation similar to that of the his operon with the his operator. We show that Hdp1 does not cause de-repression of this operator using a lacZ reporter construct. Strongly supporting this is the fact that our whole proteome analysis showed specific upregulation of the his operon. Any other off-target de-repression would be detected in this assay. Furthermore, we now include the thr operator RNA as a control in the EMSAs, which demonstrates reduced binding with Hdp1 in comparison to the his operator RNA. We also added an in vivo pull-down experiment using tagged Hdp1, showing marked enrichment of his operator-regulated RNA transcripts, whereas the observed enrichment of the control thr RNA transcripts is substantially less.

    1. Author Response

      Reviewer #1 (Public Review):

      Thakkar et al describe the immune effects of 3rd and 4th doses of COVID-19 monovalent vaccines in a diverse cohort of immunocompromised cancer patients. They describe augmentation of anti-Spike antibodies after dose 3, especially seroconversion in 57% of patients, followed by a durable response over six months. The fourth dose was associated with increased anti-Spike antibodies in 67% of patients. T-cell responses were seen in 74% and 94% of patients after the third and fourth doses respectively. Strikingly, neutralization of Omicron was absent in all patients after the third dose but increased to 33% after the fourth dose.

      Strengths:

      Diverse cohort (34% Caucasian, 31% AA, 25% Hispanic 8% Asian) including 106 cancer patients after dose 3, of which 47 patients were longitudinally assessed for six months, as well as eighteen patients assessed after the fourth dose. Seronegative as well as seropositive patients benefit from a third dose of vaccination. Assessment of cellular (T cell) immune responses and viral neutralization against wild-type as well as Omicron variant is commendable.

      Weaknesses:

      The efficacy of the bivalent vaccine (Omicron specific) is not studied here, since the fourth dose of vaccine was a monovalent vaccine. This should be clarified in the discussion.

      We have added text in the discussion section regarding this comment, lines 470-472

      “The bivalent COVID-19 vaccine was introduced after the enrollment for our study was closed however it is reassuring to see that the bivalent vaccine has better neutralization activity against Omicron sub-variants”

      The authors describe an increase in anti-S titers after monoclonal antibodies. Were any of the patients receiving IVIG, and what was the effect, if any on Anti-S antibodies? Characteristics of breakthrough infections, particularly if they had prolonged duration, would be important to include.

      We have added text in the results section for IVIG (lines 382-383) and characteristics of breakthrough infections (lines 341-344)

      “No patients were on intravenous immunoglobulin (IVIG) at the time of study participation” “Of the 4 breakthrough infections, 1 patient had no symptoms, and 3 had mild symptoms”

      Reviewer #2 (Public Review):

      In this manuscript, Thakkar and colleagues evaluate the immunogenicity of 3rd and 4th doses of SARS-CoV2 vaccinations in patients with cancer. The authors find that additional vaccine doses are able to seroconvert a subset of patients and that antibody levels correlate with T-cell responses and viral neutralization.

      The main strengths of this manuscript are:

      1) The authors systemically performed a broad array of immunological assessments, including assessments of antibody levels, T cell activity, and neutralization assays, in a large cohort of patients with cancer receiving 3rd and 4th doses of COVID vaccines.

      2) The authors recruited an ethnically diverse cohort of patients with diverse cancer types, though enrolled participants were enriched for hematological malignancies.

      3) Prior to FDA/CDC guidance supporting a 4th vaccine dose, the authors recruited participants with no or inadequate responses into a prospective clinical trial of a 4th dose, the results of which are outlined here.

      4) The authors' findings that patients with hematologic malignancies and those receiving anti-CD20/BTK inhibitors have lower immunological responses to SARS-CoV-2 vaccines are consistent with multiple prior studies, including prior studies from these authors.

      5) The authors also find that 3rd and 4th COVID vaccine doses are able to seroconvert a subset of patients with no or "inadequate" responses, though it's unclear whether seroconversion is enough for true protection from SARS-CoV-2 infection.

      The main weaknesses of the manuscript include:

      1) The study cohorts disproportionately enrolled patients with hematological malignancies who have been previously shown to mount lower immunological responses to COVID-19 vaccines; thus, the findings may not be representative of a typical oncology patient population.

      We have clarified this in the discussion (lines 465-466)

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      2) The subgroup analyses were relatively small.

      The discussion text in lines 464-465 is in concordance with this observation

      “However, caution should be exercised in generalizing these results to the broader immunosuppressed population given the small sample size of our cohort and the disproportionately high representation of hematologic malignancy patients”

      3) The nomenclature used in the manuscript was confusing when it came to "baseline" assessments and boosters versus additional doses of vaccines.

      We have clarified the nomenclature throughout the manuscript

      4) Ultimately, the major limitation of this manuscript is that antibody levels/T-cell responses/neutralization are surrogates for immune protection against SARS-CoV-2, but it's unclear what defines the ideal cutoffs for protection. Simply seroconverting may still be insufficient. The authors don't provide data showing antibody levels as relates to breakthrough infection, likely because they are underpowered for this analysis.

      We have added text to expand on this further lines 475-482

      “Further efforts are also needed to better determine cut-off values at which anti-S antibody levels provide protection from symptomatic COVID-19. At the present time, this data exists only for neutralizing antibody titers[36, 44] and the commercially available anti-S antibody assays are quite heterogenous with efforts being made to improve equivalency in titer reporting[45]. Our study while providing a correlation between anti-S antibody titer and neutralizing antibody titer supports that the higher the titer, the better neutralization is expected and by extrapolation, less likelihood of symptomatic infection however this needs to be confirmed in larger, systematic studies”.

    1. Author Response

      Reviewer #3 (Public Review):

      Zhang, Q. et al. developed a two-photon fluorescence microscope (2PFM) by incorporating direct wavefront sensing adaptive optics (AO), which is optimized for mouse in vivo retinal imaging. By using the same 2PFM with the option of using or not using the incorporated AO system, this team compared the in vivo retinal images and convincingly demonstrated that AO correction acquired brighter and higher resolution images of retinal ganglion cells (RGCs) and their axons in both densely and sparsely labeled transgenic mouse lines, normal and defective capillary vasculatures, and RGC spontaneous activities detected by a genetic Ca2+ sensor. Interestingly and importantly, this team found that a global correction by removing the common aberration from the entire FOV enhances imaging signals throughout the entire large FOV, indicating a preferable AO imaging strategy for large FOVs. The potential applications of the in vivo retinal imaging techniques and strategies developed by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal vasculatures and neurons during disease progression and before and after treatments. It would be beneficial to the manuscript and the readers if the authors can elaborate on optic design a little bit more. For example, does the incorporation of AO adversely affect the 2PFM optic design? If the 2PFM can be further optimized by uncompromised optic design without incorporating AO, will the quality of in vivo images be comparable to that of the AO-2PFM or not?

      We thank the reviewer for these thoughtful questions.

      Whether the incorporation of AO adversely affects 2PFM optical design may be a matter of perspective. As we demonstrated in the retina and elsewhere, AO substantially improves the achievable spatial resolution. Its incorporation does not reduce the temporal resolution of the system, as the ocular aberrations are temporally stable in the anesthetized mouse due to the lack of eye movement and do not require repeated aberration measurements throughout the imaging session. Signal enhancement by AO can increase the frame rate by reducing pixel dwell time required to achieve desired signal-to-noise ratio (SNR). The deformable mirror used for wavefront correction has high reflectivity, thus does not reduce the power throughput of the 2PFM. Using similar lenses for conjugation of the AO path to those employed by the 2PFM itself, we also maintain the scanning field of view size.

      However, the incorporation of AO, including the direct wavefront sensing module (the “L10-L11-SH-sensor” path in Fig. 1A) and the deformable mirror (together with a pair of lenses for optical conjugation), does increase the complexity of the imaging system. Maintaining the optimal performance of AO also requires advanced optical knowledge that may not be possessed by most biological users.

      For this reason, we carefully designed the 2PFM path for optimal imaging performance without AO, characterized its performance (“AO two-photon fluorescence microscope (AO-2PFM)” and “System correction” sections of Materials and Methods, Fig. S1), and optimized sample preparation including designing our own contact lens (“In vivo imaging” section of Materials and Methods, Fig. S2). Our efforts, which we believe to have led to the best possible performance of a 2PFM sans AO, allowed us to resolve retinal capillaries and cell bodies (in 2D) in vivo. Therefore, our 2PFM (sans AO) design and sample preparation procedure should benefit users who do not plan to implement AO.

      Hypothetically, if the ocular aberrations of all mouse eyes were similar, it would be possible to add a static corrective element to a conventional 2PFM to improve image resolution (in the same spirit as the non-prescription reading glasses for far-sighted human eyes). However, as shown in Fig. S6 (“Zernike decompositions and corrective wavefronts for all experiments”), ocular aberrations are variable. These variabilities may arise from alignment differences (e.g., different angles between the optical axis of the ocular optics and the optical axis of the 2PFM), which can be minimized by establishing a procedure to reproducibly position the eyes of different mice in similar ways. In this case, a static corrective element may be designed for substantial aberration reduction. However, the variations also arise from optical differences in the ages [1] or strains [2] of the mice. To have a 2PFM that always performs at the diffraction limit, an adaptive element as employed by AO is necessary to maintain optimal performance regardless of the specifics of the sample.

      References

      1. C. Cheng, J. Parreno, R. B. Nowak, S. K. Biswas, K. Wang, M. Hoshino, K. Uesugi, N. Yagi, J. A. Moncaster, W.-K. Lo, B. Pierscionek, and V. M. Fowler, "Age-related changes in eye lens biomechanics, morphology, refractive index and transparency," Aging (Albany. NY). 11(24), 12497–12531 (2019).
      2. C. Tan, H. na Park, J. Light, K. Lacy, and M. Pardue, "Strain differences in mouse lens refractive indices when measured with OCT," Invest. Ophthalmol. Vis. Sci. 54(15), 1917 (2013).
    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates the question of how polylysogeny impacts competition with a sensitive non-lysogen, and how this is shaped by phage resistance. This is an important and timely question, as lysogeny can be a strategy to invade new niches, and prophages are important vehicles for the acquisition of a range of virulence factors by pathogens including Klebsiella. The authors use a polylysogenic Klebsiella clone in competition with a non-lysogen that is sensitive to at least some of the prophages produced by the polylysogen. They compete these strains over a 30-day period and measure host population dynamics and evolution of phage resistance and lysogenic conversion in the (initially) sensitive competitor. Overall, the experiment shows that lysogen formation is relatively rare and short-lived. Instead, phage resistance through complete loss of the capsule is the primary mechanism evolving, but other resistant capsule mutants, with more subtle mutations affecting capsule expression, emerge as well. The authors have collected a very impressive amount of data and made some very interesting observations.

      My main problem with this paper is that the manuscript lacks a clear narrative, making it very hard to extract the key message this paper wants to convey. Related to this, (some of) the conclusions that the authors make do not appear to be well supported by the data. For example, the authors conclude that selection favours more subtle capsule mutations because they are less costly than capsule-loss mutants (lines 497-500). However, there are no data to support this conclusion, as fitness costs of the various resistance phenotypes analysed were not measured. Apart from the genotypes, the data that are presented in this manuscript show that these subtle mutants have more subtle decreases in capsule production compared to the mutants that show a complete loss of capsule. But this does not tell us their relative cost. It also doesn’t tell us how the emergence of these different mutants relates to phage pressure, because whilst bacterial population dynamics data are monitored meticulously, phage dynamics data are missing (I have not found them in the supplemental information either). This makes it impossible to directly relate the emergence of the various resistance mechanisms to phage infection pressure during the coevolution experiment, even though this appears to be a hypothesis the authors wish to test.

      Overall I think the overarching question of the manuscript is important and the model system is a very relevant one to study this question, but in my view, the current data don’t support the conclusions of the paper. Apart from these criticisms, the manuscript is very well written and the figures are overall easy to interpret.

      We thank the reviewer for the critical assessment of our work and the time invested in the process. We have modified our manuscript following the recommendations, provided new data and we are convinced that our main results are now fully supported by the data.

      Reviewer #2 (Public Review):

      This manuscript presents data on multiple experiments regarding the co-evolution of poly-lysogenic and phage-susceptible Klebsiella pneumoniae strains. In particular, the manuscript aimed to determine the mechanisms of resistance that would shape bacterial competition over co-evolutionary timescales. The major finding is that the potential for lysogenization as a phage resistance mechanism is narrow and only likely to occur given certain circumstances. Moreover, the manuscript again reinforces the importance of receptor changes -initially loss, but modification in structure or expression over longer time scales- as a major mechanism of phage resistance that influences bacterial competition.

      Strengths

      A major strength of this manuscript is the care in designing experiments and conducting follow-up experiments to isolate the essential elements to support each of the conclusions. This includes using orthogonal methods such as sequencing and modeling to support or expand the findings from culturing and experimental evolution. The study features results that were beautifully replicated (e.g. Figure 3) lending confidence to the findings.

      Weaknesses

      Two weaknesses of the manuscript in its current form are: 1) a need to discuss other studies that also have found context-dependent results and 2) more focus on delivering the key overall "message" of the paper to the reader. Finally, not a weakness, but a (necessary) limitation is the study system, but this manuscript sets a bar for other groups to test in their systems to probe the generality of the findings.

      The support for the conclusions is compelling. The findings were counter to the initial expectation (lysogenization as a major feature) and the manuscript does an admirable job of supporting the unexpected conclusion with thorough experimental work, supplemented with modeling.

      This manuscript will be of great significance in microbial evolution, both for its implications in limiting the scope of lysogenization as a viable phage resistance mechanism in the long term and for its significant experimental rigor, particularly with regard to the co-evolutionary timescale studied. The study has very important implications for the evolution of antimicrobial resistance and phage therapy.

      We thank the reviewer for the time spent and enthusiasm towards our experimental set-up.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors conducted a thorough analysis of the correlation between height and measures of cognitive abilities (what are essentially IQ test components) across four cohorts of children and adolescents in the UK measured between 1957 and 2018. The authors find the strength of the association between height and cognitive measures declined over this time frame--for example, among 10- and 11-year-olds born in 1958, height explained roughly 3% of the variation in verbal reasoning scores; this dropped to approximately 0.6% among those born in 2001. These associations were further attenuated after accounting for proxy measures of social class.

      The authors' analyses were performed carefully and their observations regarding declining height / cognitive measure associations are likely to be robust if we interpret their results with an important caveat: these results reflect measurements aimed at assessing cognition rather than cognition itself. The importance of this distinction is evidenced by the changing correlation structure of the cognitive measures over time. For example, age 11 verbal / math scores were correlated at >= 0.75 at the first two time points but dropped to 0.33 at the most recent time point. Similar patterns are present for the other cognitive measures and time points. The authors conclude that such changes are unlikely to impact their primary findings, but I'm less certain. For example, one interpretation of this finding is that older cognitive measures were simply worse at indexing distinct cognitive domains and instead reflected a combination of cognitive ability together with non-specific factors relating to opportunity, health, class, etc. Further, height was historically a stronger proxy for class and economic status than it is today (e.g., by capturing adequate nutritional intake, risk for childhood disease, etc.). Together, then, previously high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today (which is still non-negligible).

      We agree, it is possible that our results could in principle be explained by changes to the measures. We have provided further analysis to attempt to inform the likelihood of this suggestion and have expanded our discussion of this issue (Discussion, explanation of findings section; copied below).

      First, we conducted additional sensitivity analyses repeating our main analysis using cognition measures in which the number of response options was set to be the same for each test (the lowest common denominator across all cohorts). This was tested using two separate approaches: 1) by reducing the number of categories to the same number in each cohort; and 2) by picking a random sample of question items for each category. Our main findings were unchanged (described in the “Additional and sensitivity analyses” section, Figs S20-S21).

      Regarding the suggestion that “high height / cognitive measure correlations might reflect the fact that both phenotypes previously indexed socio-economic factors to a greater extent than they might today” – we sought to account for this by adjustment for measured indicators of socioeconomic position, and found the trend remained after adjustment (Fig 1 panel 2). As in other observational studies, however, we cannot fully rule out the possibility of residual confounding (Discussion, Explanation of findings paragraph 2).

      “The multi-purpose and multidisciplinary cohorts used cognition tests which differed slightly in each cohort. It is therefore possible that differences in testing could have either: 1) entirely generated the pattern of results we observed, such that if identical tests were used the association between cognition and height would otherwise have been identical in each cohort; in contrast to previous findings which reported using identical tests20; or 2) biased our results, such that if identical tests were used the decline in association between cognition and height would have been less marked than we reported. While we cannot directly falsify this alternative hypothesis given our reliance on historical data sources, a number of lines of reasoning suggest that the first scenario is unlikely. First, our results were similar when using 4 different cognitive tests (spanning mathematical and verbal reasoning); any bias which generated the results we observed should be similarly present across all 4 tests. Other things being equal, one would expect that more discriminatory tests (i.e., those with a greater number of responses) would have higher accuracy and thus better index cognition. Our results were similar when the youngest cohort had similar numbers of unique scores in cognitive tests compared with the oldest cohort (Verbal @ 11 years: n=41 in 1946c, n=40 in 2001c) and fewer unique scores (Maths @ 7/11: n=51 in 1946c, n=21 in 2001c). Our results were also similar in sensitivity analyses in which the number of response options was set to be the same in each cohort. Higher random measurement error in the independent variable (cognition) would lead to weakened observed associations with the outcome (height),52 yet we do not a priori anticipate that such error was higher in younger cohorts across all tests in such a manner that would have led to the correlation we observed. Ensuring comparability of exposure is a major challenge across such large timespans. Reassuringly, our results are consistent with those from a previous study which reported consistent tests being used (from 1939-1967).20 However, even seemingly identical tests require modification across time (e.g., for verbal reasoning/vocabulary there is typically a need to adapt question items due to societal and cultural changes over time in vocabulary and numerical use); further, changes to education such as increases in testing may have led to greater preparedness for and familiarity with testing than in the past, even where identical tests are used.

      Interestingly, we observed a marked reduction in the correlation between cognitive tests across time (e.g., between verbal and maths scores). This trend has been reported in previous studies53 54 and warrants future investigation; it is consistent with evidence that IQ gains across time seemingly differ by cognitive domain,45 potentially capturing differences across time in cognitive skill use and development in the population. Previous studies using three (1958-2001c) of the included cohorts have also reported changing associations between cognition (verbal test scores at 10/11 years) and other traits: a declining negative association with birth weight19 and a change in direction of association with maternal age (from negative to positive);55 each finding has plausible explanations based on changes across time in relevant societal phenomena (improved medical conditions19 and changes in parental characteristics,55 respectively), yet also cannot conclusively falsify the notion that differences in the tests used influence the results obtained. In this paper, we used multiple tests and sensitivity analyses to attempt to address this.”
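
      The point about random measurement error weakening observed associations can be illustrated with the classical attenuation formula, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. This is a minimal sketch under classical test theory assumptions (errors uncorrelated with true scores and with each other); the function name and all numerical values are hypothetical and are not taken from the cohorts.

```python
import math


def attenuated_correlation(true_r: float,
                           reliability_x: float,
                           reliability_y: float = 1.0) -> float:
    """Observed correlation expected under classical (random) measurement error."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    if not -1.0 <= true_r <= 1.0:
        raise ValueError("true_r must lie in [-1, 1]")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical scenario: the same true correlation measured with a more
# reliable test versus a less reliable test.
true_r = 0.17
observed_reliable = attenuated_correlation(true_r, reliability_x=0.9)
observed_noisy = attenuated_correlation(true_r, reliability_x=0.5)

print(f"observed r, reliability 0.9: {observed_reliable:.3f}")
print(f"observed r, reliability 0.5: {observed_noisy:.3f}")
```

      In this framing, a decline in the observed correlation could only be produced by measurement error alone if test reliability had fallen substantially in the more recent cohorts.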

      Additionally, their findings add an interesting data point to a collection of recent results suggesting that the relationship between cognitive and anthropometric measures is complex and difficult to interpret. For example, studies using genetic markers to examine shared genetic bases have virtually all relied on methods assuming mating is random, which is not the case empirically. Howe et al. (doi.org/10.1038/s41588-022-01062-7) recently reported that the ostensible genetic correlation of -.32 between years of education and BMI attenuates to -.05 when using direct-effect estimates, which should theoretically be immune to the effects of non-random mating and other confounding variables. Likewise, Keller et al. (doi.org/10.1371/journal.pgen.1003451) and Border et al. (doi.org/10.1101/2022.03.21.485215) used very different approaches to arrive at the same conclusion that ~50% of the nominal genetic correlation between IQ and height could be attributed to bivariate assortative mating rather than shared causal biological factors. Given that assortative mating on both IQ measures and height involves many other traits (not just two as assumed in such bivariate models), the true extent to which height / IQ correlations reflect causal factors is plausibly even lower than these estimates suggest. For these reasons, I do not entirely agree with the authors' review of previous findings in the introduction, where they write "recent studies have suggested that links between higher cognition and taller height can be largely explained by genetic factors", though it is certainly true that this claim has been made.

      We have revised our introduction to better reflect the complexity of previous findings and to qualify this claim.

      Reviewer #2 (Public Review):

      The authors use birth cohorts with extensive cognitive assessments and height measurements along with data on parental height and socioeconomic status. The authors estimate that the correlation between height and cognitive ability has approximately halved in the last 60 years.

      Quantile regression results suggest that this is due to a stronger association between low cognitive ability and short stature in older cohorts, potentially due to environmental factors that cause both and that have been removed by improvements in the environment in the last 60 years.

      While this is a plausible hypothesis, the evidence presented in the manuscript is unable to rule out alternative hypotheses, such as changes in assortative mating.

      The results in the manuscript will be of interest to researchers investigating how genetics and environment lead to correlations between cognitive and physical/health traits, and to researchers interested in the relationship between social and health inequalities.

      While my sense of the evidence presented is that there is fairly solid statistical evidence for a trend where the correlation between cognitive ability and height declines over time, there is no formal quantification of this trend nor measurement of the uncertainty in the trend.

      We now include additional statistical tests to compare estimates in each cohort (Fig S6). We have opted to include this in supplemental material given the large number of tests included already.
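
      One common way to compare correlations estimated in independent samples is a Fisher z-test. The response does not state which tests were used for Fig S6, so the following is only a minimal sketch of that general technique; the function name, correlations and sample sizes are hypothetical.

```python
import math


def compare_independent_correlations(r1: float, n1: int,
                                     r2: float, n2: int) -> tuple[float, float]:
    """Fisher z-test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("correlations must lie strictly between -1 and 1")
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # SE of the difference
    z = (z1 - z2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Hypothetical inputs: two cohorts of 10,000 with r = 0.173 and r = 0.077.
z_stat, p_value = compare_independent_correlations(0.173, 10_000, 0.077, 10_000)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.2e}")
```

      A formal trend test across all four cohorts would need a different approach (for example, a pooled model with a cohort-by-height interaction), which this pairwise sketch does not cover.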

      Similarly, the quantile regression plots in Figure 2 appear to show a trend across the height deciles for the two oldest cohorts, but no quantification of how strong this is nor what uncertainty exists is calculated. Furthermore, if the apparent trend in the quantile regression plots is true, wouldn't this imply a non-linear association between height and cognitive ability for the older cohorts? Can this be seen in the scatterplots or in a non-linear regression?

      We included 95% confidence intervals in our quantile regression analyses, which provide an indication of uncertainty. Given the substantial number of analyses already included (across 4 historical cohorts and 4 cognition tests; 23 supplemental results), we believe that additional statistical exploration of both quantile regression and non-linear associations would be best undertaken in further work. We would be happy to reconsider this if requested.

      I think the authors could have done more with their data to investigate the contribution of assortative mating to the observed trend. Looking at Figure S4, it looks like the correlation between mother's education and father's height in the 2001 cohort is substantially lower than for previous cohorts. While cognitive ability may not be available for parents, one could look at, for example, father's education and mother's height across the cohorts and see if there is a downward trend in correlation.

      We now include in Figure S5 a cross-cohort investigation of the correlation between parental height and maternal education. We find that the correlation is similar across 1946c, 1958c, and 1970c, yet is weaker in 2001c (Fig S5). We comment on this in the paper (see revised discussion, explanation of findings section). Interpretation of these results is complicated by measurement error in parental education (typically reported for both parents by mothers). Further, interpretation may be complicated by reductions in the socioeconomic patterning of height across time (see https://www.thelancet.com/journals/lanpub/article/PIIS2468-2667(18)30045-8/fulltext). Future work which focuses on assortative mating could investigate these issues.

      Reviewer #3 (Public Review):

      A difficulty with the paper is the different cognitive tests used in the different cohorts; the authors address this at some length in the discussion. However, I am afraid that this matter makes the results hard or impossible to interpret along the lines of their research question. One would need to know that, if these cognitive tests were administered in a single cohort at one time, they would have the same correlation with height.

      Please see our responses to Reviewer 1 and our revised Discussion. We are reliant upon imperfect historical data to make inferences on long-run trends, in the absence of ideal data for this paper (eg, the same tests used in all cohorts born in 1946, 1958, 1970 and millennium; though even in this instance some changes would be required (eg, to the words chosen in verbal reasoning tasks; see Discussion, explanation of findings section)).

      I judge that the main limitation of the method is the fact that different cognitive tests are used in the different cohorts. The tests in themselves are valid tests of cognitive functions. However, given that the focus of the study is on the change in correlations across time, then it is a worry that the tests are different; that is, the authors have the burden of proving to us that, if the environmental/social changes had NOT been operative across time, then the height-cognitive test correlations would be the same. What can the authors do to prove to us that if, say, all of these different-cohort verbal tests had been given to a single cohort on a single occasion, then they would have the same correlations with height? The same goes for the mathematics based tests. I note the tests' somewhat different distributions in Figure 1, but that is not the only thing that could lead to different correlations with, say, height. I am aware that all cognitive tests tend to correlate positively and that they all have loadings on general intelligence; however, different tests will not necessarily have the same correlations with outside variables (e.g. height). This will depend on things such as their content, their reliability/internal consistency etc.

      In the Results the authors state: "Cognitive test scores were strongly-moderately positively correlated with each other, with the size of the correlation weakening across time." That's true, but perhaps also a major concern for this study. One possible reason for the decline in verbal-maths test correlations across cohorts (old to recent) is that the nature of these tests has changed across time, either/both in terms of content (what capabilities are assessed) or something such as reliability/internal consistency/ceiling-or-floor effects (how well the capabilities are assessed). That is, given that the height-cognitive test correlations show a similarly declining pattern of correlations over cohorts, it could be that the differing contents of the tests are partly or wholly responsible. I raise that as a possibility only, and I appreciate that it might be correct, as the authors prefer, that there is an inherent lowering of intelligence-height correlations over time, but I do not think that one can rule out, with the present study's design, that it might have been due to the change in tests. For example, a reading-math correlation of 0.74 in 1946 lowered to a correlation of .32 in 2001, in the face of different tests. To show that this is not due to the different tests being used would require more information. If this is a true result, it is big news.

      Please see our responses to Reviewer 1. This includes additional analysis and an expanded discussion of this possible cause of bias. We hope our manuscript now provides further evidence and discussion to inform the likelihood of this possibility.

      I have a suggestion: if the authors wish to rule out the possibility that the lowering intelligence-height correlations across cohorts are due to different cognitive tests being used, they should take all the cognitive tests used here and apply them cross-sectionally to single-year-born samples (of 11- and 16-year olds) that have also been measured for height. If the cognitive tests all correlate at the same level with height within each of these two samples (they needn't do so across the 11- and 16-year olds), then one could proceed more safely with between-cohorts (1946, 1958, 1970, 2001) comparisons of the correlations.

      We thank the reviewer for this suggestion. However, we are unsure whether we have understood the suggested analysis or whether it is tractable given our data: the cohorts we used were born in 1946, 1958, 1970, or around 2000. We do not have cross-sectional samples of 11- and 16-year-olds measured at the same time.

    1. Author Response:

      Dear eLife Editorial Board, dear reviewers, dear readers,

      We very much thank the eLife editors and reviewers for their overall very positive review and encouraging assessment of our manuscript, and for highlighting our study’s innovation and relevance for using genomic approaches for the conservation of biodiversity.

      We very much thank the reviewers for pointing out parts of the manuscript that could be described more clearly or in more detail to make the study fully reproducible, and have therefore rewritten parts of the manuscript. We importantly follow reviewer 1’s specific recommendation to focus the main text on clearly understandable results, and therefore now only showcase the application of selective nanopore sequencing (aka adaptive sampling) to one soil sample, which we hope will make the flow of the manuscript easier to understand.

      We further agree that parts of the study could have been conducted more extensively (e.g. include more samples and thereby showcase the broad applicability of the approach), which was unfortunately not feasible since I, as the lead author, left New Zealand to take up another position abroad. We are, however, following up on this work with another controlled large-scale study.

      We further agree that both qPCR and metabarcoding have their advantages and disadvantages. Metabarcoding approaches, however, importantly deliver more information about the biodiversity of a location than just the presence of a single species; this, in our case, includes other endangered species and evidence of kākāpō predators. We further show that the chosen marker gene region (12S rRNA) is species-specific enough to distinguish kākāpō from its two closest relatives. While qPCR has been shown to be more sensitive for some species, the difference is often minimal (see e.g., Harper et al., Ecol Evol. 2018 Jun; 8(12): 6330–6341), and for some species metabarcoding has been shown to be equally sensitive (Schneider et al., PLoS ONE 2016, 11, e0162493). qPCR approaches further require the careful design of species-specific primers, and hence access to samples and DNA of the target species and of closely related species, none of which are necessarily at hand, especially not for conservationists who want to use these approaches regularly in the future, and in countries like New Zealand where genomic work with material from any “treasured” species has to be approved in a long and detailed process according to national regulations and the Nagoya Protocol. Given all these reasons, and the generally good performance of our metabarcoding approach (also in detecting our species of interest), we do not see the necessity of applying a qPCR approach in this study.

      To avoid any confusion, we now also describe the sampling sites in more detail and use their labels consistently throughout the manuscript. Briefly, samples were always taken directly at each site, and at 4m and 20m distance, all in replicates, as described in detail in the manuscript. Specifically, the “abandoned nests” had only been abandoned ~30 days before sampling, as described in the Methods, and this is why kākāpō DNA is still present.

      We further thank reviewer 2 for suggesting that we discuss the impact of selective nanopore sequencing on pore efficiency in more depth, and have added a corresponding sentence to the Discussion. More generally, we have added further references and broader scientific context to the Discussion.

      Thank you again for this very helpful review of our work.

      With best regards,
      Lara Urban

    1. Author Response:

      We are grateful for the detailed feedback provided by the two anonymous reviewers. We provide a point-by-point response to their reviews below:

      Reviewer #1 (Public Review):

      Medwig-Kinney et al perform the latest in a series of studies unraveling the genetic and physical mechanisms involved in the formation of the C. elegans gonad. They have paid particular attention to how two different cell fates are specified, the ventral uterine (VU) or anchor cell (AC), and the behaviors of these two cell types. This cell fate choice is interesting because the anchor cell performs an invasive migration through a basement membrane, a process that is required for correct C. elegans gonad formation and that can act as a model for other invasive processes, such as malignant cancer progression. The authors have identified a range of genes that are involved in the AC/VU fate choice, and that impart the AC with its ability to arrest the cell cycle and perform an invasive migration. Taking advantage of a range of genetic tools, the authors show that the transcription factor NHR-67 is strongly expressed in the AC. The authors also present evidence that NHR-67 could function as a transcriptional repressor through interactions with a Groucho and also a TCF homolog, and they also suggest that these proteins are forming repressive condensates through phase separation.

      The authors have produced an extensive dataset to support their two primary claims: that NHR-67 expression levels determine whether a cell is invasive or proliferative, and also that NHR-67 forms a repressive complex through interactions with other proteins. The authors should be commended for clearly and honestly conveying what is already known in this area of study with exhaustive references. But absent data unambiguously linking the formation and dissolution of NHR-67 condensates with the activation of downstream genes that NHR-67 is actively repressing, the novelty of these findings is limited.

      Response 1.1: We thank the reviewer for recognizing the extensive dataset we provide in this manuscript in support of our claims that, (1) NHR-67 expression levels are important for distinguishing between AC and VU cell fates, and (2) NHR-67 interacts with transcriptional repressors in VU cells. We acknowledge that a complete mechanistic understanding of the functional significance of NHR-67 puncta is not possible without knowing direct targets of NHR-67 in the AC. Unfortunately, tools to identify transcriptional targets in individual cells or lineages in C. elegans do not exist, and generation of such tools would be beyond the scope of this work. This is evidenced by the fact that the first successful attempt to transcriptionally profile the AC was only posted as a preprint one month ago (Costa et al., doi: 10.1101/2022.12.28.522136). It is our hope that the findings we present here can be integrated with future AC- and VU-specific profiling efforts to provide a more complete picture of the functional significance of NHR-67 subnuclear organization.

      Reviewer #2 (Public Review):

      Medwig-Kinney et al. explore the role of the transcription factor NHR-67 in distinguishing between AC and VU cell identity in the C. elegans gonad. NHR-67 is expressed at high levels in AC cells where it induces G1 arrest, a requirement for the AC fate invasion program (Matus et al., 2015). NHR-67 is also present at low levels in the non-invasive VU cells and, in this new study, the authors suggest a role for this residual NHR-67 in maintaining VU cell fate. What this new role entails, however, is not clear. The model in Figure 7E shows NHR-67 switching from a transcriptional activator in ACs to a transcriptional repressor in VUs by virtue of recruiting transcriptional repressors. In this model, NHR-67 actively suppresses AC differentiation in VU cells by binding to its normal targets and acting as a repressor rather than an activator. Elsewhere in the text, however, the authors suggest that NHR-67 is "post-translationally sequestered" (line 450) in nuclear condensates in VU cells. In that model, the low levels of NHR-67 in VU cells are not functional because they are inactivated by sequestration in condensates away from DNA. Neither model is fully supported by the data, which may explain why the authors seem to imply both possibilities. This uncertainty is confusing and prevents the paper from arriving at a compelling conclusion. What is the function, if any, of NHR-67 and so-called "repressive condensates" in VU cells?

      Response 2.1: As the reviewer correctly notes, we present two possible models in this manuscript. The interaction between NHR-67 and the Groucho/TCF complex in the VU cells could (1) switch the role of NHR-67 from a transcriptional activator to a transcriptional repressor, or (2) sequester NHR-67 away from its transcriptional targets. Indeed, we cannot definitively exclude the possibility of either model. In our resubmission, we will attempt to make this more clear in the text and by presenting both possible models in the summary figure (Fig. 7E).

      Below we list problems with data interpretation and key missing experiments:

      1) The authors report that NHR-67 forms "repressive condensates" (aka. puncta) in the nuclei of VU cells and imply that these condensates prevent VU cells from becoming ACs. Fig. 3A, however, shows an example of an AC that also assembles NHR-67 puncta (these are less obvious simply due to the higher levels of NHR-67 in ACs). The presence of NHR-67 puncta in the AC seems to directly contradict the authors' assumption that the puncta repress the AC fate program. Similarly, Figure 5-figure supplement 1A shows that UNC-37 and LSY-22 also form puncta in ACs. The authors need to analyze both AC and VU cells to demonstrate that NHR-67 puncta only form in VUs, as implied by their model.

      Response 2.2: The puncta formed by NHR-67 in the AC are different in appearance from those observed in the VU cells and, furthermore, do not exhibit strong colocalization with those of UNC-37 or LSY-22. The Manders’ overlap coefficient between NHR-67 and UNC-37 is 0.181 in the AC, whereas it is 0.686 in the VU cells. Likewise, the Manders’ overlap coefficient between NHR-67 and LSY-22 is 0.189 in the AC compared to 0.741 in the VU cells. We speculate that the areas of NHR-67 subnuclear enrichment in the AC may represent concentration around transcriptional targets, but testing this would require knowledge of direct targets of NHR-67.
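
      For readers unfamiliar with the metric, the Manders’ overlap coefficient is commonly written as the sum of the pixel-wise products of the two channels divided by the square root of the product of their summed squared intensities. The response does not specify which variant or software was used, so the following is only a minimal sketch of that general formula; the function name and the toy pixel values are hypothetical.

```python
import math
from typing import Sequence


def manders_overlap(channel_a: Sequence[float],
                    channel_b: Sequence[float]) -> float:
    """Overlap coefficient: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).

    Both inputs are flattened, non-negative pixel intensities from the
    same region of interest, in the same pixel order.
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    numerator = sum(a * b for a, b in zip(channel_a, channel_b))
    denominator = math.sqrt(sum(a * a for a in channel_a) *
                            sum(b * b for b in channel_b))
    if denominator == 0:
        raise ValueError("at least one channel has no signal")
    return numerator / denominator


# Toy data: signal concentrated in the same pixels versus different pixels.
colocalized = manders_overlap([0, 5, 9, 1], [0, 4, 8, 1])
separate = manders_overlap([9, 0, 1, 0], [0, 8, 0, 1])
print(f"colocalized: {colocalized:.3f}, separate: {separate:.3f}")
```

      Values near 1 indicate that the two signals occupy the same pixels in similar proportions, and values near 0 indicate that they occupy largely different pixels.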

      2) While a pool of NHR-67 localizes to "repressive condensates", it appears that a substantial portion of NHR-67 also exists diffusively in the nucleoplasm. This would appear to contradict a "sequestration model" since, for such a model to work, a majority of NHR-67 should be in puncta. What proportion of NHR-67 is in puncta? Is the concentration of NHR-67 in the nucleoplasm lower in VUs compared to ACs and does this depend on the puncta?

      Response 2.3: The proportion of NHR-67 localizing to puncta versus the nucleoplasm is dynamic, as these puncta form and dissolve over the course of the cell cycle. However, we estimate that approximately 25-40% of NHR-67 protein resides in puncta based on segmentation and quantification of fluorescence intensity of sum Z-projections. We also measured NHR-67 concentration in the nucleoplasm of VU cells and found that it is only 28% of what is observed in ACs (n = 10). We disagree with the notion that the majority of NHR-67 protein should be located in puncta to support the sequestration model. As one example, previously published work examining phase separation of endogenous YAP shows that it is present in the nucleoplasm in addition to puncta (Cai et al., 2019, doi: 10.1038/s41556-019-0433-z). In our system, it is possible that the combination of transcriptional downregulation and partial sequestration away from DNA is sufficient to disrupt the normal activity of NHR-67.
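
      The proportion of signal residing in puncta can be expressed as the summed intensity inside the segmented puncta divided by the summed intensity of the whole nucleus. The segmentation procedure itself is not described in the response, so the following is only a minimal sketch of that final ratio, assuming a puncta mask is already available; the function name and pixel values are hypothetical.

```python
from typing import Sequence


def fraction_in_puncta(nuclear_pixels: Sequence[float],
                       puncta_mask: Sequence[bool]) -> float:
    """Share of total nuclear intensity that falls inside segmented puncta.

    nuclear_pixels: background-corrected intensities from a sum projection,
                    restricted to the nucleus.
    puncta_mask:    True where the corresponding pixel belongs to a punctum.
    """
    if len(nuclear_pixels) != len(puncta_mask):
        raise ValueError("pixel list and mask must have the same length")
    total = sum(nuclear_pixels)
    if total <= 0:
        raise ValueError("nucleus has no signal")
    in_puncta = sum(v for v, m in zip(nuclear_pixels, puncta_mask) if m)
    return in_puncta / total


# Toy nucleus: six diffuse pixels and two brighter pixels flagged as puncta.
pixels = [10, 10, 10, 10, 10, 10, 20, 20]
mask = [False] * 6 + [True, True]
puncta_fraction = fraction_in_puncta(pixels, mask)
print(f"fraction of signal in puncta: {puncta_fraction:.2f}")
```

      Because puncta form and dissolve over the cell cycle, any such ratio depends on the time point imaged and on the threshold used to define the mask.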

      3) The authors do not report whether NHR-67, UNC-37, LSY-22, or POP-1 localization to puncta is interdependent, as implied in the model shown in Fig. 7.

      Response 2.4: It is difficult to test whether localization of these proteins to puncta is interdependent, as perturbation of UNC-37, LSY-22, and POP-1 result in ectopic ACs. Trying to determine if loss of puncta results in VU-to-AC transdifferentiation or vice versa becomes a chicken-egg argument. It is also possible that UNC-37 and LSY-22 are at least partially redundant in this context. We based our model, shown in Fig. 7E, on known or predicted protein-protein interactions, which we confirmed through yeast two-hybrid analyses (Fig. 7D; Fig. 7-figure supplement 1).

      4) The evidence that the "repressor condensates" suppress AC fate in VUs is presented in Fig. 4D where the authors deplete the presumed repressor LSY-22. First, the authors do not examine whether NHR-67 forms puncta under these conditions. Second, the authors rely on a single marker (cdh-3p::mCherry::moeABD) to score AC fate: this marker shows weak expression in cells flanking one bright cell (presumably the AC) which the authors interpret as a VU AC transformation. The authors, however, do not identify the cells that express the marker by lineage analyses and dismiss the possibility that the marker-positive cells could arise from the division of an AC-committed cell. Finally, the authors did not test whether marker expression was dependent on NHR-67, as predicted by the model shown in Fig. 7.

      Response 2.5: For the auxin-inducible degron experiments, strains contained labeled AID-tagged proteins, a labeled TIR1 transgene, and a labeled AC marker. Thus, we were limited by the number of fluorescent channels we could co-visualize and therefore could not also visualize NHR-67 (to assess for puncta formation) or another AC marker (such as LAG-2). We could have generated an AID-tagged LSY-22 strain without a fluorescent protein, but then we would not be able to quantify its depletion, which this reviewer points out is important to measure. We did visualize NHR-67::GFP expression following RNAi-induced knockdown of POP-1 and observed consistent loss of puncta in ectopic ACs. However, this again becomes a chicken-egg argument as far as whether cell fate change or loss of puncta causes the other.

      5) Interaction between NHR-67 and UNC-37 is shown using Y2H, but not verified in vivo. Furthermore, the functional significance of the NHR-67/UNC-37 interaction is not tested.

      Response 2.6: We attempted to remove the intrinsically disordered region found at the C-terminus of the endogenous nhr-67 locus, using CRISPR/Cas9, as this would both confirm the NHR-67/UNC-37 interaction in vivo and allow us to determine the functional significance of this interaction. However, we were unable to recover a viable line after several attempts, suggesting that this region of the protein is vital.

      6) Throughout the manuscript, the authors do not use lineage analysis to confirm fate transformation as is the standard in the field.

      Response 2.7: The timing between AC/VU cell fate specification and AC invasion (the point at which we look for differentiated ACs) is approximately 10-12 hours at 25 °C. With our imaging setup, we are limited to approximately 3-4 hours of live-cell imaging. Therefore, lineage tracing was not feasible for our experiments. Instead, we relied on visualization of established markers of AC and VU cell fate to determine how ectopic ACs arose. In Fig. 6B,C we show that expression of two AC markers (cdh-3 and lag-2) turns on while a VU marker (lag-1) is downregulated within the same cell. In our opinion, live-imaging experiments that show changes in cell fate in real time via reporters were the most definitive way to observe the phenotype.

      There are 4 multipotential gonadal cells with the potential to differentiate into VUs or ACs. Which ones contribute to the extra ACs in the different genetic backgrounds examined was not determined, which complicates interpretation. The authors should consider and test the following possibilities: disruption of NHR-67 regulation 1) causes extra pluripotent cells to directly become ACs early in development, 2) causes VU cells to gradually trans-fate to an AC-like fate after VU fate specification (as implied by the authors), or 3) causes an AC to undergo extra cell division(s). In Fig. 1F, 5 cells are designated as ACs, which is one more than the 4 precursors depicted in Fig. 1A, implying that some of the "ACs" were derived from progenitors that divided.

      Response 2.8: When trying to determine the source of the ectopic ACs, we considered the three possibilities noted by the reviewer: (1) misspecification of AC/VU precursors, (2) VU-to-AC transdifferentiation, or (3) proliferation of the AC. We eliminated option 3 as a possibility, as the ectopic ACs we observed here were invasive and all of our previous work has shown that proliferating ACs cannot invade and that cell cycle exit is necessary for invasion (Matus et al., 2015; Medwig-Kinney & Smith et al., 2020; Smith et al., 2022). Specifically, NHR-67 is upstream of the cyclin-dependent kinase inhibitor CKI-1 and we found that induced expression of NHR-67 resulted in slow growth and developmental arrest, likely because of inducing cell cycle exit. For our experiment using hsp::NHR-67, we induced heat shock after AC/VU specification. For POP-1 perturbation, we explicitly acknowledged that misspecification of the AC/VU precursors could also contribute to ectopic ACs (Fig. 6A; lines 364-402). We could not achieve robust protein depletion through delayed RNAi treatment, so instead we utilized timelapse microscopy and quantification of AC and VU cell markers (Fig. 6B,C; see response 2.7 above).

      In conclusion, while the authors report on interesting observations, in particular the co-localization of NHR-67 with UNC-37/Groucho and POP-1 in nuclear puncta, the functional significance of these observations remains unclear. The authors have not demonstrated that the "repressive condensates" are functional and play a role in the suppression of AC fate in VU cells as claimed. The colocalization data suggest that NHR-67 interacts with repressors, but additional experiments are needed to demonstrate that these interactions are specific to VUs, impact VU fate, and sequester NHR-67 from its targets or transform NHR-67 into a transcriptional repressor.

      Response 2.9: We agree that, at this time, we cannot pinpoint the precise mechanism through which NHR-67 puncta function (i.e., by sequestering NHR-67 from DNA or switching the role of NHR-67 from activating to repressing). However, identification of NHR-67 puncta and their colocalization with UNC-37, LSY-22, and POP-1 in VU cells allowed us to discover an undescribed role for the Groucho/TCF complex in maintaining VU cell fate. This, combined with our evidence demonstrating that NHR-67 transcriptional regulation is important for distinguishing between AC and VU cell fate, are the main contributions of our study.

    1. Author Response:

      Reviewer #1 (Public Review):

      Vaparanta et al propose a new bioinformatic algorithm for pathway discovery from multi-omics data sources at one time point, and validate some of their algorithm's predictions using functional experiments. The authors should be commended for their detailed experimental work and comprehensive data collection around TYRO3 signaling in melanoma, which will likely be of value to that field. They also provide a mature software package that is well documented for implementing their bioinformatic methods. The reviewer's experience with the software was that it is computationally efficient/fast with well written code. The biological data (both multiomics and functional validation studies) will be of interest to melanoma research as well as scientists interested in TYRO3 signaling.

      The authors wish to thank the Reviewer for the positive comments.

      At this time, however, the bioinformatics algorithm proposed is of unclear utility to the broader multiomics community for the following reasons:

      First, the algorithm itself has numerous hyperparameters, which can make it challenging to use and potentially highly sensitive to these user inputs. Just the regulatory complex inference step has 10 hyperparameters/settings required to be selected.

      We have now reduced the number of parameters in the code by automating the choice for 2 of the parameters. The manuscript is now accompanied by a sensitivity analysis on all the key parameters in the code (new Supplementary Figures 5-11) and we have created a script to inform the choice of the key parameter S (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10). We have additionally thoroughly revised the accompanying documentation to help the user choose the right settings for their datasets (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3).

      Second, the algorithm is presented in an ad hoc manner without mathematical/statistical justifications of the many design decisions and steps in the analysis. For example, the authors write "The inference of regulatory complexes from the combined score follows the nearest neighbor principle, assuming that while a single high combined score can be random chance, the combination of combined scores between 3 cell signaling molecules would be predictive". It is mathematically unclear that this is true…

      We have now tested the effect of the design decisions of the algorithm on the ability to discover known associations in omics datasets (new Supplementary Figure 4). Adhering to the design decisions of the algorithm greatly increases the number of known associations found in real omics data.

      …and thus this reviewer attempted to test the algorithm using simulated uncorrelated Gaussian noise (see code/outputs at end of the review) in 10K genes and 10 samples using a best attempt at hyperparameter selection per the code comments and documentation. It appears that nearly 1/3 of all genes (i.e., 3205 of 10K) were erroneously grouped into complexes (assuming no mistakes in reviewer's usage of the code). In general, "unbiased" pathway analysis in multiomics that is not relying on prior knowledge will require solving the extraordinarily challenging task of estimating a very large covariance matrix from statistically small sample sizes. This puts the method at high risk of producing spurious results.

      The Reviewer raises an important topic that should be considered in de novo analyses. However, the test dataset the reviewer used is not truly representative of the omics datasets that should be used to evaluate the performance of the algorithm. First, the algorithm should be only used with positive expression values due to the way the stoichiometry score is calculated. This is now more clearly indicated in the accompanying documentation (available in Mendeley data: https://data.mendeley.com/datasets/m3zggn6xx9/draft?a=71c29dac-714e-497e-8109-5c324ac43ac3). The Gaussian noise used by the reviewer does not represent any positive expression values of any omics datasets.

      Second, because of the way the algorithm is constructed, it will try to find an association for every feature in the dataset if so instructed by the parameters. To this end, we have now added a new parameter (parameter S) to the algorithm to better control this setting. If this parameter is used correctly with the Reviewer's test dataset, the algorithm now returns 0 complexes. The authors also wish to point out that they strongly believe that the number of features with no real association with other features is very low in real omics data, since most intracellular molecules have common upstream regulators. This poses a problem only if the dataset has a very limited number of features.

      Third, it seems to the authors that instead of testing the limits of the algorithm with totally randomized data, it would be more valuable to assess whether the algorithm can find true positives among randomized data. To this end we estimated the true positive and false positive rate with normally, negative binomial and beta distributed simulated data (new Supplementary Figures 7-9). Indeed, the algorithm can discover only the true positives among the false positives as long as the S parameter is not set too low. We now provide a separate script (suggest parameter S value for regulatory complex inference, new Supplementary Figure 10) that will help the user to choose the parameter S for their data so that the amount of false positives in the inference is minimized.

      Fourth, the data produced by the standard normal distribution have a relatively low variance: 68% of values fall between -1 and 1, and 95% of values fall between -2 and 2. If you simulate 10000 random rows with a sample size of 10 from such a low-variance distribution, there is a high chance of creating highly correlating rows that would actually be representative of true positives in the dataset, given the generally high variation within omics data. Therefore, it is exceedingly hard to interpret whether the features were erroneously assigned into complexes or not, because the chosen simulation method could have by chance created associations that represent true positives in the dataset.
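
The question of how often strongly correlating rows arise by chance at a sample size of 10 can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it uses plain Pearson correlation rather than the authors' combined score, 200 features instead of 10000 to keep the run short, and an arbitrary threshold of |r| > 0.8. The function names are hypothetical and are not part of the authors' software.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_correlation_summary(n_features=200, n_samples=10, threshold=0.8, seed=0):
    """Count feature pairs whose |r| exceeds `threshold` in independent noise.

    Every row is drawn independently from a standard normal distribution,
    so any strong correlation found here arises by chance alone.
    """
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_features)]
    abs_r = [abs(pearson(a, b)) for a, b in itertools.combinations(rows, 2)]
    n_strong = sum(r > threshold for r in abs_r)
    return {
        "n_pairs": len(abs_r),
        "n_strong": n_strong,
        "fraction_strong": n_strong / len(abs_r),
        "max_abs_r": max(abs_r),
    }


summary = chance_correlation_summary()
print(summary)
```

Changing `n_samples` shows how quickly the fraction of chance correlations above the threshold shrinks as the sample size grows.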

      Fifth, we also analyzed the standard normal distributed simulated data with WGCNA, which is still the most widely used module discovery method. WGCNA assigned almost all the features into modules. However, we think it is clear from its wide use that the analysis can still offer valuable insight into biological processes. Therefore, the authors are not sure how concerned they should be about the results of this test.

      Third, pathway analysis has long been a bioinformatic goal in the literature, with the authors citing a landmark paper for the WGCNA method from 2008. As such, there are numerous and long-standing discussions in the literature regarding challenges of pathway analysis (i.e., omics data often has dimensionality D far larger than sample size N, and correlation matrix estimation requires D^2 >> N parameters to be estimated) and its potential for spurious correlations. Some authors use sophisticated statistical tools (e.g., "Biological network inference using low order partial correlation" 2014, "Learning Large‐Scale Graphical Gaussian Models from Genomic Data" 2005, "Incorporating prior knowledge into Gene Network Study" 2013) to attempt to deal with this issue.
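
The scale of the estimation problem raised here can be made concrete with a short calculation. The sketch below is an illustration only: it counts the distinct off-diagonal entries of a D x D correlation matrix, D(D-1)/2, and compares that count with the number of measured values, using the 10K genes and 10 samples from the Reviewer's simulation as an example. The function name is hypothetical.

```python
def n_pairwise_correlations(d):
    """Number of distinct off-diagonal entries in a d x d correlation matrix."""
    return d * (d - 1) // 2


n_features = 10_000  # D: features (genes) in the Reviewer's simulated dataset
n_samples = 10       # N: samples in the Reviewer's simulated dataset

n_pairs = n_pairwise_correlations(n_features)
n_values = n_features * n_samples

print(f"pairwise correlations to estimate: {n_pairs:,}")
print(f"measured values available:         {n_values:,}")
print(f"correlations per measured value:   {n_pairs / n_values:.1f}")
```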

      The authors agree that if by spurious the Reviewer means non causal indirect associations like in the paper by Zuo et al. (Zuo et al., 2014. Biological network inference using low order partial correlation. Methods 69:266-73. doi: 10.1016/j.ymeth.2014.06.010.), then, indeed, the algorithm has not been designed to find directed networks. Instead, the algorithm has been designed to find common upstream regulators.

      Furthermore, the authors indicate that their approach is the first to attempt pathway analysis in multi-omics setting, stating "Integrative approaches combining more than one robust molecular association measure, however, have not been explored", but one can find attempts such as "An Integrative Transcriptomic and Metabolomic Study of Lung Function in Children With Asthma" to build on WGCNA for work in multiomics datasets.

      Indeed, the Reviewer is correct that correlation networks and WGCNA have been previously used with multi-omics datasets. What the authors meant to convey is that these previous approaches rely only on one measure of molecular association, which in the case of correlation networks is correlation and WGCNA covariation, while our method is the first that combines two measures of molecular association, the correlation and stoichiometry score. We have now amended the sentence in the manuscript (lines 51-52).

      The 2020 review paper "Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources" seems to identify multiple published methods dealing with pathway estimation in multiomics datasets. As the paper stands, this reviewer cannot adequately assess the impact of the proposed bioinformatic algorithm and its results against the existing body of literature for pathway inference.

      We have now benchmarked our method against existing module discovery, network and multi-omics integration methods and provide evidence that our method outperforms these methods (new Figure 4).

      Reviewer #2 (Public Review):

      The authors describe a bioinformatic platform that allows for unbiased pathway analysis from multiomics data. The concept is based on correlation, stoichiometry scores and their combination to evidence interaction between two proteins, transcripts or phosphosites in an omic dataset. This platform was developed and validated on both previously published and in house omics data. I really appreciate that the paper is well written and clear, and I would like to acknowledge the amount of work generated to produce the in-house dataset.

      The authors wish to thank the Reviewer for the encouraging words.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors' conclusions presented herein are supported by a well-established mouse genetic conditional approach and an extensive array of phenotypic analyses.

      Strengths:

      1. The authors utilized well-described genetic tools, AdipoQCre, to target preadipocyte-like progenitor cell populations in bone marrow, as well as Csf1 floxed alleles. They further sifted through the cell population by showing that mature lipid-laden adipocytes express Csf1 at a much lower level, and determined that AdipoQCre-marked progenitor cell population presents a major cellular source of M-CSF,

      2. The reanalysis of published scRNAseq datasets in Figure 1, as well as the following phenotypic analyses of the mutant mice are well-conducted. The analyses include a broad range of experiments both in vivo (3DmicroCT, histology, flow cytometry) and ex vivo (osteoclastogenesis assay in bone marrow cell culture). The confidence of the reported findings is high.

      3. The data presented in this manuscript are of very high quality.

      Weaknesses:

      1. The role of AdipoQ-lineage progenitors as a source of M-CSF is overstated. The authors claim in many instances that "mature bone adipocytes do not express M-CSF", "These cells however do not produce Csf1", "...these peripheral AdipoQ+ cells nearly do not produce M-CSF". However, the authors' qPCR experiments only show four times differences in Csf1 expression. Therefore, the claim that AdipoQ-lineage progenitors are an exclusive source of M-CSF is not well substantiated. In line with this, some of the recent literature reporting conditional deletion of M-CSF in other bone cells (JBMR Plus. 4:e10080., Nature. 590:457-462) are not included.

      We thank the reviewer for this important question. We have performed the below experiments to further clarify and support our conclusion:

      1) We increased the number of replicates in each group of cells in Fig. 3A (the old Fig. 1E) to five per group and, following Reviewer 3’s recommendation on housekeeping gene usage, we found that the mRNA expression of Csf1 in bone marrow AdipoQ-lineage progenitor cells is 20-30 fold higher than that in mature adipocytes. This result has been updated in Fig. 3A.

      2) We further performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). In contrast, M-CSF expression was not detected in mature bone marrow adipocytes (Perilipin1+) (Fig. 3C, 0 cells out of 115 cells counted, n=3 mice, 0%), indicating that mature bone marrow adipocytes are unlikely a significant source of M-CSF.

      3) We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D).

      These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. The results are described on pg. 5. However, the reviewer’s comment on ‘exclusive source’ is well taken, as osteocytes and the osteo lineage also express certain levels of M-CSF. We deleted ‘exclusive source’ from the manuscript and have added relevant literature and discussion in the Results and Discussion sections on pp. 5 and 9.

      2. Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (AdipoQCre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not show that Csf1 mRNA or M-CSF protein is expressed in AdipoQ-lineage progenitors using histological methods. Current evidence is only based on scRNAseq and qPCR of isolated cells. Whether there was any change in circulating bone resorption markers in CKO mice was not shown. Cortical bone parameters were not included in the 3D-microCT analyses. These missing pieces of information would be important to correctly interpret the phenotypes.

      The het mice (Csf1f/+;AdipoQ Cre) do not show abnormal bone phenotype, which is now shown in Fig. 4-figure supplement 4. We performed immunofluorescence staining of M-CSF on bone slices, and found that the majority of bone marrow AdipoQ-expressing progenitor cells express M-CSF (Fig. 3B, 1865 cells out of 2001 cells counted, n=3 mice, 93.2%). We tested serum TRAP level in mice, and found that the Csf1 deficiency in Csf1∆AdipoQ mice significantly decreased the TRAP level in serum, compared to that in the WT control mice (Fig. 5B). Csf1∆AdipoQ mice do not exhibit abnormal cortical bone phenotype. The cortical bone parameters are now included in Fig. 4G.

      3. Which bone marrow cell population(s) are marked by AdipoQCre remain largely unclear. It is possible that AdipoQCre also marks at least part of MSPC-osteo cluster in addition to MSPC-adipo. Adipo-lineage progenitors may not stay entirely as adipoprogenitors and drift toward osteoblasts or their precursor cells.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and positive cell proportion are much higher in bone marrow Adipoq lineage progenitors than the levels seen in osteoblast lineage (Fig.1, Fig.2, (22, 27, 31)) or endothelial cells in bone marrow (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level (scaled mean expression level at 3.01 per (27) at Single Cell Portal) of Adipoq seen among bone marrow cells analyzed. In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level at 0.68 per (27) at Single Cell Portal). Taking into account both the average expression level and the cell numbers in each cluster, the relative overall contribution to Adipoq expression by MSPC-osteo vs the Adipoq-lineage progenitors is 7.8% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in the MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker to identify bone marrow Adipoq lineage progenitors. Overall, our work not only validates prior research identifying adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also further extends the scope of their functions to encompass M-CSF production and regulation of macrophages.
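
The relative-contribution estimate above is a ratio of cell-count-weighted expression levels, and it can be reproduced in a few lines. The sketch below is only a worked version of the arithmetic quoted in the response: the cell counts and scaled mean expression values are taken from the text, while the function and variable names are hypothetical. The ratio evaluates to just under 8%, in line with the roughly 7.8% quoted.

```python
def weighted_expression(n_cells, scaled_mean_expression):
    """Cluster-level contribution: cell count times scaled mean expression."""
    return n_cells * scaled_mean_expression


# Values quoted in the response for the two clusters.
mspc_osteo = weighted_expression(n_cells=2247, scaled_mean_expression=0.68)
mspc_adipo = weighted_expression(n_cells=6441, scaled_mean_expression=3.01)

relative_contribution = mspc_osteo / mspc_adipo
print(f"MSPC-osteo relative to MSPC-adipo: {relative_contribution:.3f}")
```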

      These points have been added to the Discussion sections on pp. 9-10.

      4. The OVX data in Figure 5 are not very well explained. The data do not seem to support the authors' conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. The CKO mice lose bone mass almost to the same extent as WT mice upon OVX.

      To address the reviewer’s question, we calculated the changes in the μCT parameter values between the Sham and OVX groups in the WT control and Csf1∆AdipoQ mice. Significant differences were identified between the control and Csf1∆AdipoQ mice in several μCT parameters. For example, a decrease in trabecular BV/TV after OVX: 35.1% in the control vs 20.9% in Csf1∆AdipoQ mice; a decrease in Tb.N after OVX: 11.34% in the control vs 7.97% in Csf1∆AdipoQ mice; a decrease in Conn-Dens after OVX: 39.7% in the control vs 14.56% in Csf1∆AdipoQ mice; an increase in Tb.Sp after OVX: 12.51% in the control vs 1.97% in Csf1∆AdipoQ mice. These results support our conclusion that M-CSF deficiency in AdipoQ-lineage progenitors alleviates estrogen-deficiency induced osteoporosis. These value changes have been included in Fig. 7C and discussed on pg. 7.
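
The comparison above rests on the percent change of each μCT parameter between the Sham and OVX groups within a genotype. The sketch below shows one plausible way to compute such a change, assuming it is taken relative to the Sham group mean; the response does not state the exact formula, and the numbers in the example are toy placeholders, not measurements from the study.

```python
def percent_change(sham_mean, ovx_mean):
    """Percent change from Sham to OVX, relative to the Sham group mean.

    Negative values indicate a decrease after OVX, positive values an increase.
    """
    if sham_mean == 0:
        raise ValueError("Sham mean must be non-zero")
    return (ovx_mean - sham_mean) * 100.0 / sham_mean


# Toy placeholder values for illustration only (NOT data from the study).
toy_sham_mean = 100.0
toy_ovx_mean = 80.0

toy_change = percent_change(toy_sham_mean, toy_ovx_mean)
print(f"change after OVX: {toy_change:.1f}%")
```

Applying the same function to each genotype separately gives the pair of values that is then compared, as in the BV/TV, Tb.N, Conn-Dens and Tb.Sp figures listed above.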

      Reviewer #3 (Public Review):

      Macrophage colony-stimulating factor (M-CSF) plays key roles in the differentiation of myeloid-lineage cells, including monocytes, macrophages and osteoclasts. The latter mediate bone resorption, which is important for physiological bone remodelling but, unrestrained, contributes to bone loss in conditions such as in post-menopausal osteoporosis. M-CSF production within the bone marrow is implicated in the maintenance of myeloid and skeletal homeostasis, but the cellular source of bone marrow M-CSF has remained elusive. In this study, Inoue et al address this issue through advanced transcriptomic and gene targeting approaches. They conclude that a population of Adipoq-expressing progenitors within the bone marrow, designated "AdipoQ-lineage progenitors", is the key cellular source of M-CSF. Consistent with this, they find that transgenic deletion of M-CSF from these cells disrupts macrophage and osteoclast development, leading to osteopetrosis and possibly preventing bone loss following ovariectomy. However, they have not adequately addressed the possibility that M-CSF production from other cell types, particularly adipocytes in peripheral adipose tissues, may also be influencing these phenotypes. Specific strengths and weaknesses are as follows:

      Strengths:

      1. The manuscript is written in a clear, succinct manner and the data are generally nicely presented. It is therefore a pleasure to read.

      2. The analysis of single-cell transcriptomic data is clear and convincing, and the skeletal phenotyping has been done to a high standard.

      Weaknesses:

      1. The authors underplay the potential contribution of M-CSF production from other cell types, particularly from adipocytes in peripheral adipose tissues. They show that M-CSF expression from these cells is lower than from the bone marrow progenitors that they focus on; however, based on this they allude to "no expression" of M-CSF from these other adipocytes. This overlooks the findings of other studies showing that peripheral adipocytes produce M-CSF and that this has biological functions. Whether their knockout model alters M-CSF expression in peripheral adipose tissue, whether for whole tissue or for isolated adipocytes, has not been tested.

      We performed western blot to analyze M-CSF protein expression in peripheral adipose. As shown in Fig. 3D, the stromal vascular fraction (SVF) cells in adipose, which contain multiple cell populations including adipogenic progenitors, express M-CSF. On the contrary, M-CSF was nearly undetectable in the peripheral mature adipocytes isolated from adipose (Fig. 3D). These data collectively support that mature adipocytes are not a significant source of M-CSF, as evidenced by nearly undetectable M-CSF expression compared to the Adipoq-lineage progenitors. However, we understand that current techniques may be limited in their ability to detect trace amounts of M-CSF. We thus deleted descriptions such as ‘exclusive’ or ‘do not produce/express…’ in the revised manuscript.

      2. The decreases in M-CSF have been assessed at the transcript level, but not for M-CSF protein. Whether their knockout model

      We performed immunofluorescence staining of M-CSF on bone slices, and found a drastic decrease in M-CSF protein in bone marrow AdipoQ+ cells in Csf1∆AdipoQ mice compared to the WT control mice. The results are shown in Fig. 4B, and Fig. 3B-D.

      3. It is also unclear if the Adipoq-lineage progenitors consist exclusively of adipogenic cells, or if osteogenic progenitors are also part of this population.

      We thank the reviewer for the insightful comment on this interesting mystery and complicated question, which is drawing more attention in the field.

      In addition to Adipoq-lineage progenitors, Adipoq Cre also labels other clusters. However, the expression levels of Adipoq and frequency of Adipoq+ cells in other cell populations are relatively low. For example, the integrated scRNAseq dataset we analyzed shows that Adipoq is expressed at a low level (with scaled mean expression at 0.68, (27)) in a small proportion of MSPC-osteo cells (Fig. 1), and small amounts (31, 37) (about 4%) of osteoblasts in 8 or 12-week-old mice are Adipoq-lineage. A recent report found that in 24-week-old mice, about 15-40% of osteoblasts are marked with Adipoq Cre (37). This raises a few important possibilities that will need to be distinguished in future work. One possibility is that the Adipoq-lineage cells (adipo-CAR cells/MALPs) have minor or latent osteogenic potential that may become more evident under specific conditions, such as in older animals. However, balanced against this is the alternative that Adipoq-cre could primarily target a population of solely adipogenic adipo-CAR cells but that its specificity is imperfect, leading to progressive low levels of deletion in a separate population expressing very low levels of Adipoq, such as osteo-CAR cells. An additional possibility is that the Adipoq-lineage cells may themselves actually be further subdivided into multiple component cell types, including a major adipogenic and a separate minor osteogenic subpopulation. Ultimately, at the root of these issues is that Adipoq cre primarily defines one or possibly more lineages of cells rather than a cell type within those lineages. Therefore, application of further markers to fractionate the adipoq-lineage into its component cell types will be needed to resolve these possibilities, focusing on whether any potential osteogenic activity present can be fractionated away from the primary adipogenic activity present.

      Of note, the Adipoq expression level and positive cell proportion are much higher in bone marrow Adipoq lineage progenitors than the levels seen in osteoblast lineage (Fig.1, Fig.2, (22, 27, 31)) or endothelial cells in bone marrow (38, 39). For example, the MSPC-Adipo cluster (Adipoq-lineage progenitors) has 6441 cells with the highest level (scaled mean expression level at 3.01 per (27) at Single Cell Portal) of Adipoq seen among bone marrow cells analyzed. In contrast, the MSPC-osteo cluster consists of 2247 cells with a very low Adipoq expression level (scaled mean expression level at 0.68 per (27) at Single Cell Portal). Taking into account both the average expression level and the cell numbers in each cluster, the relative overall contribution to Adipoq expression by MSPC-osteo vs the Adipoq-lineage progenitors is 7.8% ((2247 x 0.68)/(6441 x 3.01)). Therefore, the expression of Adipoq in the MSPC-osteo cluster is marginal compared to that in the Adipoq-lineage progenitors. These data make Adipoq an important marker to identify bone marrow Adipoq lineage progenitors. Overall, our work not only validates prior research identifying adipoq-lineage cells, identified as MALPs (22, 31), as a key osteoclast regulatory population, but also further extends the scope of their functions to encompass M-CSF production and regulation of macrophages.

      These points have been added to the Discussion section on pp. 9-10.

      If these weaknesses are addressed then this work has potential to yield firm conclusions and new insights into the regulation of myeloid and skeletal homeostasis, both in normal physiology and in clinically relevant conditions.

      Yes, we have addressed the above 3 major questions.

    1. Author Response

      Reviewer #1 (Public Review):

      The current study proposed a drug discovery pipeline to accelerate the process of identifying drug candidates for LCA10 patients using cells from mouse retinal organoid for initial screening, human patient iPSC-derived retinal organoid for further testing, and then mouse mutants for in vivo validation. Reserpine was identified as the top candidate, possibly through modulating proteostasis and autophagy to promote cilium assembly. The study was with high translational value. However, the rationale using dissociated cells from mouse retinal organoid for initial drug screening needs to be justified. In addition, the consistency of phenotypic characteristics in human patient iPSC-derived retinal organoid needs to be reported. It was unclear if the rescued phenotypic changes were from the drug effects or a result of phenotypic variations in organoids.

      We thank the reviewer for the comments and suggestions. Please see the response provided in the “Essential Revisions” earlier. Briefly, single-cell cultures were used for screening to compensate for the variation of the Nrl-GFP signal in rd16 organoids, so that each compound was presented to homogeneous cells. In addition, we performed a large-scale screening with 11 concentrations and 2 duplicates of over 6000 compounds. It was thus not feasible to perform the screening manually. We used a semi-automatic electronic dispenser to set up the screens in 1536-well plates and a liquid handling system to add the compounds. Intact mouse retinal organoids are too big to be dispensed and would be damaged during the process. They are also too big to fit into one well of a 1536-well plate or even a 384-well plate. Therefore, single-cell cultures are preferable to intact organoids in this application. We understand the potential pitfalls, and the positive hits were therefore verified in intact organoids in the secondary assays.

      We have now tested reserpine on retinal organoids derived from 2 clones of each (a total of 4) of LCA1 and LCA2 patients. As suggested by the reviewers, we quantified the fluorescence intensity of rod marker rhodopsin staining in multiple sections from at least two batches of differentiation (Figure 3C and Figure 3—figure supplement 2). Although the organoids showed variability, as predicted, reserpine treatment significantly increased the fluorescence intensity of rhodopsin in retinal organoids differentiated from multiple lines (Figure 3C), further validating the rescue effect of reserpine.

      Reviewer #2 (Public Review):

      In this manuscript, a drug discovery pipeline was developed using a human iPSC derived organoid-based high-throughput screening platform to be used to identify drug candidates for maintaining photoreceptor survival in LCA10 retinopathies. Reserpine proved effective in patient organoids and in mutant mouse retina in vivo to improve photoreceptor survival and outer segment structure. Protein homeostasis was restored after reserpine treatment by increasing p62 levels, decreasing the 20S proteasome, and increasing proteasome activity. The manuscript is clearly written, contains a large amount of valuable and high-quality data and demonstrates that rebalancing proteostasis can stabilize photoreceptor overall homeostasis in the presence of a mutation that causes retinal degeneration.

      The manuscript may lack functional in vivo data on the treatment by reserpine in RD16 mice such as ERG measurements or other functional tests (the authors also refer to it as future direction). Nevertheless, in my view, the study provides a solid and convincing set of data and substantially advances our understanding on the neuroprotective effects of reserpine beyond the scope of the retina and therefore can be expected to have widespread influence on a readership interested in the principles of neuroprotection rebalancing proteostasis.

      We sincerely thank the reviewer for the positive comments and suggestions. This study has taken many years to materialize. We agree and have now performed full-field electroretinogram (ERG) recordings of untreated and reserpine-treated rd16 retina (as stated in response to an earlier comment). The scotopic a-wave was only marginally increased, yet the scotopic b-wave displayed a significantly higher amplitude, suggesting improved rod photoreceptor function (Figure 6D).

      Reviewer #3 (Public Review):

      Chen et al. perform an innovative screen using retinal organoids derived from rd16 mice to identify small molecules to treat CEP290 hypomorphic mutations linked to ciliopathies such as LCA. The authors identify reserpine which promotes photoreceptor development and viability in retinal organoids derived from LCA patient iPSCs and rd16 mouse retinas. The authors finally propose a mechanistic model where reserpine restores proteostasis thereby improving ciliogenesis.

      The authors present a highly effective drug screen that utilizes the benefits of retinal organoids while also accounting for the inherent variability of retinal organoids by performing a screen on 2D cultures derived from the organoids. This is an innovative approach to using retinal organoids in drug screens and is of interest to the greater community. The success of the screen is reflected in the effectiveness of reserpine in the in vivo rd16 mouse retinal model, where it promotes photoreceptor survival. However, there are multiple issues with the LCA patient organoid screen that must be resolved.

      We are grateful to the reviewer for generous comments. We have incorporated the suggestions and performed additional work to resolve the issues, as mentioned earlier in this response as well as below.

      The patient derived iPSC lines are not controlled sufficiently enough to make conclusions stated in the manuscript. The authors rely on single iPSC clones from disease patients to perform experiments, and it is not clear whether karyotyping to validate normal chromosomal integrity was performed. In the case of the RNAseq experiment one patient clone does not show any differences calling into question the findings from the other clone. Patient derived iPSC studies would benefit from the use of multiple independently derived iPSC clones per patient, or rescuing the LCA10 mutation using CRISPR editing to validate the correlation of the mutation with the differences observed.

      This study could be strengthened by parallel RNAseq studies in the rd16 mouse retina and patient iPSC retinal organoids.

      Thanks for the suggestions. As mentioned earlier in “Essential Revisions” and response to other reviewers, we have performed additional experiments using multiple iPSC clones and from three patients (2 each from LCA1 and LCA2). These iPSC lines have been characterized previously (Shimada et al. 2017). We have now provided more details on iPSC derivation, iPSC maintenance, and differentiation. Karyotypes of all human and mouse iPSC lines were provided in Figure 1—figure supplement 1. Retinal organoids were generated using iPSC lines within 10 passages of test cells.

      The purpose of the RNA-seq data is to provide initial leads on the signaling pathways modulated by reserpine treatment. The rescue effect of reserpine suggests that these pathways might be implicated in disease pathogenesis. Based on our RNA-seq data, we have validated the dysregulation of the proteostasis pathway in patient-derived retinal organoids and in the rd16 retina in vivo. Further investigations are needed to validate other pathways but are beyond the scope of this manuscript. Although RNA-seq studies have advantages, more detailed molecular and functional assays are needed to validate their findings; we therefore argue that performing additional RNA-seq on different clones, cell lines, or mouse retina would not provide more solid information.

      According to our quantification of rhodopsin staining intensity (Figure 3C and Figure 3—figure supplement 2), LCA1 organoids are more responsive to reserpine than LCA2 organoids, which is not surprising given the variation in patient responsiveness to drug treatments in previous clinical studies. We note that reserpine is not a transcription factor; thus, the differentially expressed genes in reserpine treatments are secondary effects, and the change in gene profiles upon reserpine treatment could vary in time and intensity, which could explain the few differentially expressed genes observed in LCA2. Nevertheless, the mechanisms of action of reserpine that we identified in LCA1 could be validated in LCA2 (Figure 5—figure supplement 3), further strengthening our findings.

      We performed RNA-seq on treated organoids rather than treated mice in order to identify the signaling pathways modulated by reserpine in a well-controlled manner that captures small changes. Compared to reserpine treatment of organoid cultures, in which the organoids have stable and constant contact with reserpine, intravitreal injection of reserpine into P7 mice is technically challenging and leads to substantial variation. In this case, some small changes might be missed or masked by the variation.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors sought to be able to examine what cellular mechanisms underlie increases in mature blood cell production upon immune challenge. To this end they devised a new in vitro organ culturing system for the lymph gland, the main hematopoietic organ of the fruit fly Drosophila melanogaster; the fly serves as an excellent model for studying fundamental questions in immunology, as it allows live imaging combined with genetic manipulation, and the molecular pathways and cellular functions of its innate immune system are highly conserved with vertebrates.

      The authors provide compelling evidence that the cultured lymph gland shows a similar time scale, dynamics, and capacity for cell division as was observed in vivo, and does not undergo undue oxidative stress in their optimized culture conditions. This technique will prove extremely useful to the large community studying the fly lymph gland, and potentially vertebrate immunologists seeking to expand the models they utilize.

      In these cultured glands, the authors identify progenitors undergoing symmetric cell divisions and provide some evidence that is consistent with, but does not prove, that these two cells maintain their proliferative capacity. They detect equivalent levels in the two equally sized daughter cells of dome-Meso-GFP, a marker for JAK-STAT activity; however, this could be due to an equal inheritance of the protein from the mother, not an equivalent maintenance of a proliferative capacity.

      This is an interesting question. A close look at our movie (Video 4) of the dome-Meso-GFP marker shows the following sequence of events: the marker is nuclear; the mother cell divides and the nuclear envelope breaks down; cell division is completed; and dome-Meso-GFP re-accumulates at the nucleus of the daughter cells. This sequence of events implies that JAK-STAT is still active in the daughter cells. However, as the reviewer points out, the signal could have been inherited from the mother. If one of the cells were to differentiate, we would expect two things to occur: a differentiation marker would turn on in one of the daughter cells, and there would likely be a slow decrease of the dome-Meso-GFP signal in one of the cells over time. We failed to mention that we accounted for both of these possibilities in our experiments, such as the one shown in Video 5. First, we included eater-dsRed in the genetic background (see Figure 2 figure legend) in which these experiments were undertaken; if differentiation took place, the dsRed level would go up, which we did not observe. Second, long-term tracking of dome-Meso-GFP levels after completion of cell division did not show divergence or a significant decrease of protein levels in the two daughter cells (Figure 2 - figure supplement 2). In any case, to make readers directly aware of this important caveat raised by the reviewer, we added to the Results section (lines 225-230) an explanation mentioning the possibility of inheritance of the marker and why we do not think this was the case.

      The authors develop a technique to conduct tracking of progenitor cell size over time in the cultured lymph glands and identify a switch increase in growth after division, as well as two orientations of the divisions, with the main one occurring 90% of the time.

      They show that bacterial infection results in a significant decrease in the division of Blood progenitors and the elimination of the minor orientation of division, but no obvious change in the rate of division.

      By imaging two markers, Dome-GFP for the progenitor state and Eater dsRed for the differentiated one, they examine the trajectories by which differentiation occurs in the wild-type lymph gland. They describe two main categories of fate transitions. In one that they call linear, the blood cells express high levels of the differentiation marker along with the progenitor marker before turning off the progenitor marker. The dynamics of how these progenitor cells get to the state of expressing both the differentiation and progenitor marker at high levels is not described. In the other, which they call sigmoidal, cells express only high levels of the progenitor marker, and the differentiation marker increases after or as the progenitor marker decreases. The authors show that upon infection there is a large increase in the amount of the linear type of differentiation. But how this change in the type of differentiation upon infection explains the increased amount of differentiation is not clear.

      A potential explanation comes from an aspect of their data that the authors don't comment upon. In their live analysis of lymph glands at a distinct time point in the uninfected state (Fig 7M-N), 95% of the cells they analyze traversing the sigmoidal path are in the intermediate step. This would predict that the cells on this path spend a much longer time stuck in this intermediate state before traversing to the final differentiated one, or that only a small fraction of the cells that become sigmoidal intermediate cells progress onwards to full differentiation. But this does not match the trajectories observed in the real-time analysis for uninfected cultured lymph glands (Fig 7A'-D') marker. Perhaps their algorithm discarded traces from the live imaging in which the differentiation marker did not come up quickly and was thus not analyzed in the trajectories.

      If my interpretation of the single time point analysis is true, this would argue that the linear path is actually much faster/more fruitful than the sigmoidal one and this would explain why a higher level of total progenitor differentiation infection is the result of infection-inducing more differentiation by the linear path. Otherwise, I don't understand how their data explains that observation.

      We understand the reviewer's concern here and would like to state categorically that we did not use an algorithm to “discard” traces. As the reviewer outlines, there is a large concentration of cells in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state. This is an intermediate state for the sigmoid differentiation trajectory. The reviewer suggests two scenarios to explain this. The first scenario is that this is the slowest (and thus rate-limiting) step in the sigmoid differentiation trajectory. However, as the reviewer also notes, our tracking of individual cell trajectories does not show that cells spend a lot of time in this state. This leaves the second scenario the reviewer outlines: that only a small fraction of the cells in the Dome-Meso-GFP (low expressing), eater-dsRed (low expressing) state go on to differentiate (at least in the larval stage). We favor this model because it is consistent with our observations, mainly that manipulating the sigmoid pathway is not a good way to induce the production of mature blood cells following infection, compared to manipulating the linear pathway. As the reviewer correctly points out, the linear pathway provides a powerful way to change the rate of production of mature blood cells: within a few hours of infection, the number of cells found in the intermediate state for this trajectory (Dome-Meso-GFP (high expressing), eater-dsRed (high expressing)) increases 5-6 times. We now mention this specifically in the Discussion (lines 532-539).

    1. Author Response

      Reviewer #1 (Public Review):

      Single-cell sequencing technologies such as 10x, in conjunction with DNA barcoded multimeric peptide MHCs (pMHCs), have enabled high-throughput pairing of T cell receptor transcripts with antigen specificity. However, the data generated through this method often suffer from relatively high background due to ambient DNA barcodes and TCR transcripts leaking into "productive" GEMs that contain a 10X bead and a T cell decorated with antigen-specific barcoded proteins. Such contaminations can affect data analysis and interpretation and have the potential to lead to spurious results such as an incorrect assessment of antigen-TCR pairs or TCR cross-reactivity. To address this problem, Povelsen and colleagues have described a data-driven algorithm called "Accurate T cell Receptor Antigen Pairing through data-driven filtering of sequencing information from single-cells" (ATRAP) that supplies a set of filtering approaches that significantly reduces background and allows for accurate pairing of T cell clonotypes with cognate pMHC antigens.

      This paper is rigorously conducted and will be useful for the field - there are some areas where further clarifications and comparisons will benefit the reader.

      Strengths:

      1) Povelsen and colleagues have systematically evaluated the extent to which parameters in the experimental metadata can be used to assess the likelihood of a GEM to correctly identify the antigen specificity of the associated T cell clonotype.

      2) Povelsen and colleagues have provided elegant data-driven scoring metrics in the form of concordance score, specificity score, and an optimal ratio of pMHC UMI counts between different pMHCs on a GEM, which allows for easy identification of poor quality data points.

      3) Based on the experimental goals, ATRAP allows for customizable filters that could achieve appropriate data quality while maximizing data retention.

      Weakness:

      1) The authors mention that 100% of the 6,073 "productive" GEMs contained more than one sample hashing barcode, and 65% contained pMHC multiplets. While the rest of the paper elaborates on the steps taken to deal with pMHC multiplets issue, not much is said about the extent of multiplet hashing issue and how was it dealt with when assigning cells to individual donors. How is this accounted for? Even a brief explanation would be beneficial.

      We agree that the issue of multiplet hashing was only very briefly discussed in the manuscript. The reason for this is that although cell hashing multiplets exist for every GEM, it is generally a much simpler issue to solve than pMHC multiplets, because one hashing entry most often has much higher counts than the others (see supplementary fig. 3). Moreover, in the experimental design, only one hashing antibody is added to each sample. It is therefore a given that only a single hashing signal should be associated with each GEM, i.e. this does not mirror the complex nature of the pMHC data, where cross-reactivity could result in more than one pMHC being a true binder to a given TCR. Given the simplicity of the hashing signal, we have opted to use an existing tool to annotate cell hashing. We have elaborated on the description of this in the revised manuscript (line 384).
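      The intuition that one hashing entry dominates in each GEM can be illustrated with a minimal sketch. This is not the existing tool used in the manuscript (which is not named here); the function name, the hashtag labels, and the `min_ratio` threshold are all hypothetical choices for illustration.

```python
def assign_hashtag(counts, min_ratio=2.0):
    """Assign a GEM to its dominant hashing barcode.

    counts: dict mapping hashtag name -> UMI count for one GEM.
    Returns the top hashtag if it exceeds the runner-up by at least
    min_ratio (an assumed, illustrative threshold), otherwise None.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no hashing signal at all
    if len(ranked) == 1:
        return ranked[0][0]
    (top_name, top_count), (_, second_count) = ranked[0], ranked[1]
    if second_count == 0 or top_count / second_count >= min_ratio:
        return top_name
    return None  # ambiguous: two hashtags with similar counts

print(assign_hashtag({"HTO1": 120, "HTO2": 4, "HTO3": 1}))
print(assign_hashtag({"HTO1": 10, "HTO2": 9}))
```

      A GEM with one clearly dominant hashtag is assigned to that sample, while a GEM with two comparable hashtag counts is left unassigned.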

      2) It would be helpful for the authors to describe how experimental factors such as the quality of the input MHC protein may affect the outputted data (where different proteins may have different degrees of non-specific binding), and to what degree the ATRAP approach is robust to these changes. As an example, the authors mention that RVR/ A03 was present at high UMI counts across all GEMs and RPH/ B07 was consistently detected at low levels. Are these observations the property of the pMHCs or the barcoded dextran reagent? Furthermore, are there differences in the frequency of each of these multimers in the starting staining library which manifests in consistent high vs low read counts for the pMHC barcodes?

      We understand the reviewer's concern. We have extensive experience from staining with large libraries of different pMHCs in a bulk setting (Bentzen et al 2016), where it is part of the routine analyses to include an aliquot of the barcoded pMHC library taken prior to incubation with cells (input sample). From these data, we know that even if pMHCs are present in uneven amounts prior to cell incubation, this unevenness is not carried over to the final output. That is, if a given barcode (associated with a specific pMHC) is present at levels up to 2x higher than the remaining barcodes, that barcode is not enriched after cell incubation unless T cells recognize the corresponding pMHC. Conversely, a barcode present at lower levels in the input can still be enriched after incubation with cells. From the same type of data, we also have experience with differences in the background associated with different MHC/HLA molecules, i.e. a generally higher level of background related to a certain MHC irrespective of the peptide bound to it. We agree that this could potentially be a confounding factor influencing our results (as it would influence any other results affected by differing background signal between MHC/HLA molecules). In other ongoing studies, we are investigating more broadly whether these differences reflect an inherent biological MHC association or are experimental artifacts. In the current work, we have opted not to define pHLA-specific UMI count thresholds, so that any biologically relevant signal remains unmasked while we can still filter the data to identify the most likely true pMHC-specific interactions.

      3) It would be helpful for the authors to further explain how ATRAP handles TCRs that may be present in only one (or a small number) of GEMs, as seen in Figure 7b, and potentially for the large number of relatively small clonotypes observed for the RVR/A03 peptide in Figure 6 (it is difficult to know if the long tail of clonotypes for RVR is in the range of 1 or 10 GEMs based on the scale bar). Beyond that, is there any effect on expected (or observed) clonal expansion on these data analyses, for example, if samples are previously expanded with a peptide antigen ex vivo or not?

      ITRAP removes any GEM that does not meet the criteria of the selected filters. Small clones are only removed if all GEMs in a clone fail to meet the selected filter criteria. As ITRAP is based on combinations of user-defined filters, one can choose to filter away singlet specificities, i.e. a TCR-pMHC pair observed in only a single GEM. However, this might not be relevant in all cases. We believe that it is a strength of the method that it is flexible and adaptable to the needs of individual users. This also allows the user to impose additional filters, for instance to remove clones with fewer than a certain number of GEMs. With respect to figure 6, we agree that it was difficult to estimate the number of clonotypes within a given peptide plateau, and have updated the figure to include a clonotype count on the x-axis. In relation to the effect of clonotype expansion, we would first like to refer to figure 7. Here, panels a) and b) display the observed T cell frequencies towards the individual pMHCs as obtained by the two different experimental approaches: a) conventional fluorescent multimer staining, and b) GEM counts obtained using the single-cell pipeline described here. This analysis demonstrates a very high concordance between the two approaches, reflected by the vast majority of the responses detected by fluorescent multimer staining also being captured in the single-cell screening (recall of 0.95). This result suggests that the sensitivity of the single-cell approach, in the context of the current pMHC epitope set, is comparable to that of conventional fluorescent multimer staining. With regard to clonotype expansion, we would next like to refer back to figure 3. Even though we have not expanded the clones in vitro, this figure shows that the specificity of a TCR clone can be assigned more confidently when more GEMs are mapped to that clone. Hence, to identify a single TCR-pMHC match, it could in many cases be valuable to expand a given clone prior to the experiments. However, since the 10x pipeline can only include a limited number of cells, we argue that it is valuable to identify pMHC-TCR pairs on unexpanded/unmanipulated material to include as many different pairs as possible.
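      The GEM-level filtering logic described above can be sketched as follows. This is a simplified illustration, not the ITRAP code: the field names (`clonotype`, `pmhc_umi`, `hla_match`), the UMI threshold of 3, and the `min_clone_size` parameter are assumptions made for the example.

```python
from collections import defaultdict

def filter_gems(gems, filters, min_clone_size=1):
    """Keep GEMs passing every filter, grouped by clonotype.

    A clonotype disappears only when none of its GEMs pass (or, optionally,
    when fewer than min_clone_size GEMs remain, a user-imposed extra filter).
    """
    kept = [g for g in gems if all(f(g) for f in filters)]
    by_clone = defaultdict(list)
    for g in kept:
        by_clone[g["clonotype"]].append(g)
    return {c: gs for c, gs in by_clone.items() if len(gs) >= min_clone_size}

# Toy data with hypothetical field names and values.
gems = [
    {"clonotype": "c1", "pmhc_umi": 10, "hla_match": True},
    {"clonotype": "c1", "pmhc_umi": 1, "hla_match": True},
    {"clonotype": "c2", "pmhc_umi": 2, "hla_match": True},
    {"clonotype": "c3", "pmhc_umi": 8, "hla_match": True},
    {"clonotype": "c3", "pmhc_umi": 9, "hla_match": False},
]
filters = [
    lambda g: g["pmhc_umi"] >= 3,  # illustrative UMI threshold
    lambda g: g["hla_match"],      # donor HLA must match the pMHC
]

result = filter_gems(gems, filters)
strict = filter_gems(gems, filters, min_clone_size=2)
print(sorted(result), sorted(strict))
```

      In this toy example clonotype c2 is dropped because its only GEM fails the UMI filter, while c1 and c3 each survive with one GEM; requiring at least two GEMs per clone removes them as well.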

      4) The authors mention a second method, ICON, for conducting these types of analyses, and that the approach leads to significantly more data loss. However, given there could be differences in dataset quality themselves, and given the dataset, ICON is publicly available, it would be helpful for a more explicit cross-comparison to be conducted and presented as a figure in the paper.

      We have conducted such a comparative analysis in a separate manuscript (available at bioRxiv doi.org/10.1101/2023.02.01.526310). The overall conclusion is that both methods allow for effective denoising of the provided data, with an overall advantage in favor of ITRAP. We have extended the discussion in the current manuscript with a brief summary of the main findings from this study.

      Reviewer #2 (Public Review):

      The study by Povlsen, Bentzen et al. describes certain computational pipelines authors used to analyze the results from a single-cell sequencing experiment of pMHC-multimer stained T cells. DNA-barcoded pMHC multimers and single-cell sequencing technologies provide an opportunity for the high-throughput discovery of novel antigen-specific TCRs and profiling antigen-specific T-cell responses to multiple epitopes in parallel from a single sample. The authors' goal was to develop a computational pipeline that eliminates potential noise in TCR-pMHC assignments from single-cell sequencing data. With several reasonable biological assumptions about underlying data (absence of cross-reactivity between these epitopes, same specificity for different T-cells within a clonotype, more similarity for TCRs recognizing the same epitope, HLA-restriction of T cell response) authors identify the optimal strategy and thresholds to filter out artifacts from their data.

      It is not clear if the identified thresholds are optimal for other experiments of this kind, and how the violation of authors' assumptions (for example, inclusion of several highly similar pMHC-multimers recognized by the same clone of cross-reactive T cells) will impact the algorithm performance and threshold selection by the algorithm. The authors do not discuss several recent papers featuring highly similar experimental techniques and the same data filtering challenges:

      https://www.science.org/doi/10.1126/sciimmunol.abk3070

      https://www.nature.com/articles/s41590-022-01184-4

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9184244/

      As described above, we have investigated the use of ITRAP on the large data set provided by 10X Genomics, and further compared the result to that obtained by ICON in an independent publication [bioRxiv doi.org/10.1101/2023.02.01.526310]. We have included a brief summary of the findings of that study in the current manuscript. The overall results and conclusions of the two studies align very well. In both cases, UMI count filtering and donor-HLA matching are the main drivers of denoising. However, the identified UMI thresholds were found to differ between the two data sets. As stated above, we believe this to be a strength of the ITRAP framework, since it demonstrates that the tool can be robustly applied to data originating from very different technical and/or biological settings.

      We acknowledge that ITRAP is highly dependent on the data containing a set of “large” clonotypes for which a single pMHC target can be assigned using the statistical approach outlined in the manuscript. This is because the UMI filtering thresholds are defined based on these clonotypes and their associated peptide annotations. Other than this, however, the method does not exclude identification of cross-reactive TCRs (in contrast to, for instance, ICON). We have expanded the discussion to make this point clearer.

      When it comes to the papers mentioned by the reviewer, these are clearly of high interest to us, and we are currently in the process of analyzing these data using the ITRAP framework. However, we believe these analyses are beyond the scope of the current publication, in particular since we have conducted the parallel benchmark study on the 10X Genomics data mentioned above.

      Unfortunately, I was unable to validate the method on other datasets or apply other approaches to the authors' data because neither code nor raw or processed data were available at the moment of the review.

      All data sets and code have been made publicly available at https://services.healthtech.dtu.dk/suppl/immunology/ITRAP

      One of the weaknesses of this study is that the motivation for the experiment and underlying hypothesis is unclear from the manuscript. Why these particular epitopes were selected, why these donors were selected, are any of the donors seropositive for EBV/CMV/influenza is unclear. Without particular research questions, it is hard to evaluate pipeline performance and justify a particular filtering strategy: for some applications, maximum specificity (i.e. no incorrect TCR specificity assignments) is crucial, while for others the main goal is to retain as many cells as possible.

      We understand this concern and have elaborated on our motivation for the experimental design in the text. The overall motivation for this study was to generate TCR-pMHC data complementing what was available in the public domain at the start of the project, with the purpose of generating novel data for training TCR specificity prediction models. This is also the reason why we explicitly “deselected” T cells specific for the 3 negative control peptides, since these are already covered by large amounts of TCR sequences in the public databases.

      We do not know the serostatus of the donors included, but have determined the antigen-specificities present in the donors prior to initiating the study (evaluated for T cell recognition against 945 common viral specificities, using barcoded pMHC multimers in a bulk setting). The 945 peptides were selected from prevalent epitopes within IEDB. This means that the T cell specificities for the donors selected to be included in the current study were known a priori. We have updated the motivation for performing the study (lines 122-126).

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript "Optimal Cancer Evasion in a Dynamic Immune Microenvironment Generates Diverse Post-Escape Tumor Antigenicity Profiles" by George and Levine describes TEAL - a mathematical model for the dynamics of cancer evolution in response to immune recognition. The authors consider a process in which tumor cells from one clone are characterized by a set of neoantigens that may be recognized by the immune system with a certain probability. In response to the recognition, the tumor may adapt to evade immune recognition, by effective removal of recognizable neoantigens. The authors characterize the statistics of this adaptive process, considering, in particular, the evasion probability parameter, and a possibility of an adaptive strategy when this parameter is optimized in each step of the evolution. The dynamics of the latter process are solved with a dynamic programming approach. In the optimal case, the model captures the tradeoff between a cancer population's need for adaptability in hostile immune microenvironments and the cost of such adaptability to that population. Additionally, immune recognition of neoantigens is incorporated. These two factors, antitumor vs pro-tumor IME as quantified by the Beta penalty term, and the level of immune recognition as quantified by the rate q, form the basis of a characterization of tumors as 'hot' or 'cold'.

      I think this framework is a valuable attempt to formally characterize the processes and conditions that result in immunologically hot vs cold tumors. The model and the analytical work are sound and potentially interesting to a major audience. However, certain points require clarification for evaluation of the relevance of the model:

      1) Tumor clonality

      My main concern is about the lack of representation of the evolutionary process in the model and that the heterogeneity of the tumor is just glossed over.

      The single mention of the problem occurs in Section 2, p2: "Our focus is on a clonal population, recognizing that subclonal TAA distributions in this model may be studied by considering independent processes in parallel for each clone."

      I don't think this assumption resolves the impact of tumor heterogeneity on the immune evasion process. Furthermore, I would claim that the process depicted in Fig 1A is very rare and that cancers rarely lose recognizable neoantigens - typically it would be realized via subclonal evolution, with an already present cancer clone without the neoantigens picking up. Similarly, the adaptation of a tumor clone is an evolutionary process - supposedly the subclones that manage to escape recognition via genetic or epigenetic changes are the ones that persist. It is not clear what the authors assume about the heterogeneity of the adapting/adapted population between different generations, n->(n+1). Is the implicit assumption that the n+1 generation is again clonal, i.e. that the fitness advantage of the resulting subclone was such that the remaining clones were eliminated? Or does the model just focus on the fittest subclone? A discussion on whether these considerations are relevant to the result would clarify the relevance of the result.

      We thank the reviewer for these helpful clarifying points. Empirical evidence in lung cancer exists for genomic changes manifesting as lost neoantigens in treatment-resistant clones (Anagnostou et al. Cancer Discovery 2017), and those lost antigens were also shown to generate functional immune responses. Similar results have been shown for melanoma (Verdegaal et al. Nature 2016), with loss of neoantigens associated with reactivity in TILs. Recent observations (Jaeger et al. Clinical Cancer Research 2020) even show that mutated peptides may be hidden by protein stabilization, in addition to reduced expression patterns. We do, however, wish to clarify that our model implicitly treats antigen loss and the progression of a subpopulation currently adapted to evade immune targeting, whether by direct pruning of the fittest subclone or by stochastic emergence and subsequent growth of a new one lacking the targeted antigens, as equivalent.

      Because we studied, for foundational understanding, the case where a single clonal signature was tracked in time, we under-explained the implementation of such a model in more complicated cases. As mentioned previously, the next most complicated scenario involves a heterogeneous population of cancer cells with disjoint neoantigen profiles. In this case, a parallel process can be studied wherein the effects of recognition in one environment are decoupled from the other (relevant to, for example, spatially distinct sub-populations). This description, however, misses the case where such disparate populations evolve to express shared antigens, and the case where there are both clonal and subclonal antigen targets. Here, our model can still be applied in parallel to study distinct clones but requires additional structure. Namely, in this case we would need to incorporate non-trivial coupling between the possible recognition/selection against certain antigens shared across clones. For example, control of a population with clonal antigens {a,b} but with unique subclones carrying either antigens {w,x} or {y,z} could be considered by studying the process in parallel, and control in the next periods would require recognition/selection against either 1) at least one of {w,x} and at least one of {y,z}, or 2) at least one of {a,b}. In this more general framework, the arrival of new subclones with distinct features from the parent clone in question could also be incorporated and studied across time periods. This strategy of subdividing more complicated evolutionary structures has now been further elaborated on in the Methods section, and we have expounded these points in the discussion (see additions given under Editor Comment 2).
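      The control condition in the {a,b} / {w,x} / {y,z} example can be written out as a short sketch. The code below is illustrative only and is not taken from the manuscript: the function names are hypothetical, and it assumes that each antigen is recognized independently with the same probability q during one time interval.

```python
# Illustrative sketch (hypothetical names, not the authors' implementation).
# Assumption: every antigen is recognized independently with probability q.

CLONAL = {"a", "b"}                    # antigens shared by all cells
SUBCLONES = [{"w", "x"}, {"y", "z"}]   # antigens unique to each subclone


def controlled(recognized):
    """True if the recognized antigens suffice to control the population.

    Control requires a hit on at least one clonal antigen, or a hit on at
    least one antigen in every subclone.
    """
    hit_clonal = bool(CLONAL & recognized)
    hit_every_subclone = all(sub & recognized for sub in SUBCLONES)
    return hit_clonal or hit_every_subclone


def p_any(q, n):
    """Probability that at least one of n independent antigens is recognized."""
    return 1.0 - (1.0 - q) ** n


def p_control(q):
    """Probability of control in one interval under the independence assumption."""
    p_clonal = p_any(q, len(CLONAL))
    p_all_subclones = 1.0
    for sub in SUBCLONES:
        p_all_subclones *= p_any(q, len(sub))
    # The two routes use disjoint antigen sets, so they are independent.
    return 1.0 - (1.0 - p_clonal) * (1.0 - p_all_subclones)


print(controlled({"w"}), controlled({"w", "z"}), controlled({"a"}))
print(p_control(0.5))
```

      Under these assumptions, recognizing only w leaves the {y,z} subclone uncontrolled, whereas recognizing w and z, or a alone, is sufficient.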

      2) Time scales

      Section 2, p2: "We assume henceforth that the recognition-evasion pair consists of the T cell repertoire of the adaptive immune system and a cancer cell population, recognizable by a minimal collection of s_n TAAs present on the surface of cancer cells in sufficient abundance for recognition to occur over some time interval n.".

      How do the results depend on the duration of interval n? The duration should be long enough to allow for recognition and, up to some limiting duration, proportional to the TAA recognition probability q. However, it should not be so long that the state of the system can change significantly. A clarification on this point is needed.

      We agree with the reviewer that these points should be elaborated upon when discussing the time interval. Very briefly, we opted for a discrete-time model tracking a cancer population under selective immune pressure. In order for 𝒒 to represent the total recognition probability of an immune system against a particular TAA, the time interval 𝚫𝒏 in question is a coarse-grained feature representing the time between the earliest chance that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape. This time period may vary substantially across cancer subtypes and depends on the cancer per-cell division rate, for example (George, Levine. Can Res 2020). As the reviewer pointed out, in implementing such a model there is an asymmetric risk to considering 𝚫𝒏 too large, as the future state of the system may not be well-reflected by the simple loss and addition of new TAAs. On the other hand, considering small time intervals 𝚫𝒏, while possible, would require the incorporation of additional intermediate states ending in neither cancer elimination nor cancer escape.

      We have clarified the points that the reviewer has brought up by adding them to the discussion section: In this discrete-time evolutionary model, the intertemporal period considered represents the time period between the earliest moment that the adaptive immune system may identify a cancer clone and the latest point after which such a recognition event would no longer be able to prevent cancer escape (George, Levine. Can Res 2020). This effectively gives 𝒒 a probabilistic representation for the total rate of opportunity to recognize a given TAA during cancer progression. Implementing this model in cancer subtype-specific contexts thus requires a consideration of the per-cell division rates, for example.

      Reviewer #3 (Public Review):

      Cancer cell populations co-evolve under the pressure exerted by the recognition of tumor-associated antigens by the adaptive immune system. Here, George and Levine analyze how cancers could dynamically adapt the rate of tumor-associated antigen loss to optimize their probability of escape. This is an interesting hypothesis that if confirmed experimentally could potentially inform treatments. The authors analyze mathematically how such optimally adapting tumors gain and lose tumor-associated antigens over time. By simplifying the complex interplay of immune recognition and tumor evolution in a toy model, the authors are able to study questions of practical interest analytically or through stochastic simulations. They show how different model parameters relating to the tumor microenvironment and immune surveillance lead to different dynamics of tumor immunogenicity, and more immunologically hot or cold tumors.

      Simple models are important because they allow an exhaustive study of dynamical regimes for different parameters, such as has been done elegantly in this study. However, in this quest for simplification, the authors have not considered biological features that are likely to be of importance for understanding the process of cancer immune co-evolution in generality: tumor heterogeneity and immune recognition that only stochastically results in cancer elimination. In this sense, this paper might be seen as the opening act in a series of more sophisticated models, and the authors discuss avenues towards such further developments.

      We share the reviewer’s credence in foundational modeling for comprehensive predictions on available dynamical behavior for the important problem at hand. The reviewer also correctly points out that future model refinement will be needed to further develop the foundational model presented in this work. To illustrate one of the more reasonable generalizations, which is to include nontrivial sub-clonal heterogeneity in tumor antigens, we now describe how one would go about enhancing the existing model to address this. This description has been added to the Methods and Discussion sections (see additions given under Editor Comment 2).

    1. Author Response

      Reviewer #1 (Public Review):

      N1-methyladenosine (m1A) is a rather intriguing RNA modification that can affect gene expression, RNA stability, and other processes. The manuscript presents an exploration of m1A modification of RNAs in normal and OGD/R-treated neurons and the effects of m1A on diverse RNAs. The authors showed that m1A modification can mediate the circRNA/lncRNA-miRNA-mRNA mechanism and that 3'UTR methylation of mRNAs can disturb miRNA-mRNA binding.

      The manuscript provides evidence for the following,

      1) The OGD/R can have impacts on various functions of m1A mRNAs and neuron fates.

      2) The m1A methylation of mRNA 3'UTRs disturbs the miRNA-mRNA binding.

      3) The authors identified three possible patterns of m1A modification regulation in neurons.

      The main merit of the manuscript is that the authors identified some critical features and patterns of m1A modification in neurons and OGD/R-treated neurons. Moreover, the authors identified m1A modifications on different RNAs and explored the possible effects of m1A modification on the functions of different RNAs and the overall posttranscriptional regulation mechanism via an integrated approach of omics and bioinformatics. The major weakness of the manuscript is that technical details for many results are missing. Moreover, language inconsistencies can be found throughout the manuscript. My general feeling about the manuscript is that some conclusions are rather superficial and therefore require validation and discussion.

      We appreciate your endorsement and constructive opinion concerning our work. Our study provides a comprehensive exploration of the characteristics of m1A modifications in neurons. Following your suggestions, we have specified the technical details in the revised manuscript and have included our perspectives on some of the conclusions in the Discussion section. In addition, we have corrected language inconsistencies throughout the manuscript. We hope that the revisions made are acceptable and meet your requirements.

      Reviewer #2 (Public Review):

      In this manuscript, investigators explore the m1A modification, an important post-transcriptional regulatory mechanism, in primary normal neuron and OGD/R treated neuron. As far as I know, the regulatory m1A modification remains poorly characterized in neuron. This is an interesting topic in the context of epitranscriptomics. This paper not only provided us with a landscape of m1A modifications in neuron, but also explored the impact of m1A modifications on the biological functions of different RNA (mRNA, lncRNA, circRNA). In addition, the argument that m1A modification affects miRNA binding to other RNAs is of interest to reader, and the authors have performed a dual luciferase validation here to add feasibility to this conclusion.

      Thank you for your careful review of our study, and thank you for your appreciation on our work. The aim of this work was to explore the characteristics of m1A modification in neurons. We believe that incorporating your advice into the revised manuscript has enhanced the quality of our article.

      Reviewer #3 (Public Review):

      Overall, this is an interesting and well-performed study that describes a comprehensive landscape of m1A modification in primary neurons and investigates the role of m1A in the circRNA/lncRNA‒miRNA-mRNA regulatory network following OGD/R. The focus on the two different complex regulatory networks for differential expression and differential methylation is important and it will be a valuable resource for the research community that focuses on epitranscriptomics and central nervous system diseases. Collectively, the authors present an exciting piece of work that certainly adds to the literature regarding epitranscriptomic features in neurons. While interesting results were obtained and the paper is nicely written, I have the following suggestions for minor revisions to improve the paper.

      We are grateful for your many positive comments and recognition of the potential of our work. Due to your suggestion, we found some shortcomings in our current manuscript. These suggestions were introduced and added value to our article. Our future research will continue to explore some conclusions obtained from this work. And we will continue to contribute our research outcomes in this field. Thank you again for your excellent suggestions!

      1) The authors have explored the role of m1A modification in neuron, but it would have been helpful if the authors described the significance of these findings in depth in some sections (Figure 5 and Figure 6) to enhance the value of the article.

      Thank you for your insightful suggestion. We agree with the comment that the significance of these findings should be described in detail. As such, we have added corresponding content to the Results (lines 407-424) and Discussion (lines 532-550) sections, respectively.

      2) The authors should describe in detail the current research state of m1A modification and the significance of this study to the field of epitranscriptomics in the introduction and Discussion section.

      Thank you for your insightful suggestion. There is relatively little knowledge in the m1A modification area. It is really important to summarize the existing knowledge and research progress in a comprehensive and detailed manner. We conducted a comprehensive latest literature search and added corresponding content to the Introduction (line 78-83) and Discussion section (line 505-511, line 532-562) as you suggested.

    1. Author Response

      Reviewer 1 (Public Review):

      Protein oligomerization is essential to their in vivo function, and it is generally challenging to determine the distribution of oligomeric states and the corresponding conformational ensembles. By combining coarse-grained molecular dynamics simulations and experimental small-angle X-ray scattering profiles at different protein concentrations, the authors have established a robust approach to self-consistently determine the oligomeric state(s) and the conformational ensemble. The approach has been applied specifically to the speckle-type POZ protein (SPOP) and generated new insights into the conformational ensemble and structural features that determine the ensemble. The model was further tested by the analysis of several relevant mutants as well as models with different types of structural restraints. The results also support the isodesmic self-association model, with KD values comparable to those measured from independent experiments in the literature. The approach is potentially applicable to a broad set of systems.

      We thank the reviewer for taking the time to assess our work.

      Reviewer 2 (Public Review):

      This manuscript applied the SAXS data analysis of protein self-assembly by implementing the simultaneous fitting of intra- and intermolecular motions/conformations against SAXS data at a series of oligomerization states/concentrations. Despite several major assumptions hinted, a diverse pool of conformational and oligomeric candidates was generated from CG simulations, and more importantly, these candidates were fitted into these SAXS data to reach a reasonable agreement, suggesting some degree of convergence (even if the ensemble fitting could well be at a local minimum). This is considered a technical advance, given the fairly large numbers of both the oligomer fraction phi_i (i=1, ..., N) and the conformational weight w_k (k=1, ..., n), where N is the number of oligomers and n is the number of internal conformational states.

      We thank Prof. Yang for taking the time to assess our work.

      Central is optimizing phi_i and w_k, simultaneously. The former has been illustrated in Fig. 4 and SI-Fig. 7 for the total number of 60mers. The latter relies on an overfitting-preventing strategy, as shown in SI_Fig. 1, where an effective fraction cutoff was used from 0.1 to 1.0, as opposed to the number of conformational states. What are the numbers of conformational states for these oligomers? This should be quantifiable, e.g., defining the conformational differences by chi_2.

      The reviewer is correct that the entropy-based term for preventing overfitting is a key aspect of the method. In contrast to some of the other methods to combine experiments with simulations, our approach does, however, not require us to define individual conformational states. Instead, the weights in the entropy term refer to individual configurations rather than states, and we can thus integrate the SAXS experiments and simulations without, for example, clustering the conformations. Indeed, for most of the collective variables that we have calculated from the ensembles, such as the radii of gyration, end-to-end distances, and MATH-MATH distances, we observe continuous monomodal probability distributions, which suggests that it might be difficult to define a few distinct conformational states. For the MATH-BTB/BACK distance, we observe a trimodal distribution, and these distinct conformational states are shown as overlaid structures in Fig. 4i. Thus, while these “states” change populations during reweighting, this is the result from changing weights of the individual configurations.
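      To make the role of per-configuration weights concrete, the following sketch shows one common way an entropy term and an effective fraction of configurations are computed in maximum-entropy style reweighting. It is an illustration under stated assumptions rather than the code used in this work: the function names, the parameter theta, and the specific definitions (relative entropy against the prior weights, effective fraction as its exponential, and a penalized objective of the form chi2 - theta * S) are assumptions introduced here.

```python
import math

# Illustrative sketch (assumed definitions, hypothetical names).
# weights: reweighted probability of each simulated configuration (sums to 1)
# prior:   initial probability of each configuration (sums to 1)


def relative_entropy(weights, prior):
    """S = -sum_i w_i * ln(w_i / w0_i); equals 0 when weights match the prior."""
    return -sum(w * math.log(w / w0)
                for w, w0 in zip(weights, prior) if w > 0.0)


def effective_fraction(weights, prior):
    """exp(S): close to 1 if all configurations still contribute, small if few do."""
    return math.exp(relative_entropy(weights, prior))


def penalized_objective(chi2, weights, prior, theta):
    """Agreement with experiment (chi2) balanced against departure from the prior."""
    return chi2 - theta * relative_entropy(weights, prior)


uniform = [0.25, 0.25, 0.25, 0.25]
print(effective_fraction(uniform, uniform))               # weights unchanged
print(effective_fraction([1.0, 0.0, 0.0, 0.0], uniform))  # one configuration kept
```

      Because the entropy is evaluated over individual configurations, no clustering into discrete conformational states is needed before reweighting; populations of any "states" defined afterwards simply follow from the configuration weights.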

      Reviewer 3 (Public Review):

      Molecular-level interpretations of SAXS data are challenging, especially for oligomeric systems of variable length with intrinsic flexibility and the possibility of multiple association interfaces. In order to make this challenge tractable, a number of assumptions are made here: 1) There is a single pathway by which individual domains associate first into homodimers and then into longer oligomers; 2) the association kinetics is isodesmic, which allows the direct calculation of oligomer distributions based on the given value of a single dissociation constant; 3) the internal dynamics within dimers is restricted essentially to relative domain-domain motions, that are sampled comprehensively via MD simulations. As a result, excellent fits to the SAXS data are obtained and the underlying conformational ensembles are highly plausible. The resulting models are useful to further understand SPOP function, especially in the context of liquid-liquid phase separation.

      We thank the reviewer for taking time to read our work and for their various suggestions.
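      The second assumption listed by the reviewer, that a single dissociation constant fixes the oligomer distribution, can be illustrated with a generic isodesmic model. This is a minimal sketch and not the implementation used in the manuscript: the function names are hypothetical, "subunit" stands for whatever building block associates with constant KD at every step, and the series is truncated at an arbitrary maximum size. It uses the textbook relations c_i = KD * (c_1/KD)^i and c_total = c_1 / (1 - c_1/KD)^2.

```python
import math

# Illustrative sketch of a generic isodesmic self-association model
# (hypothetical names; concentrations and kd in the same units).


def free_subunit_conc(c_total, kd):
    """Solve c_total = c1 / (1 - c1/kd)**2 for the free subunit concentration c1."""
    load = c_total / kd
    # Physical root (x < 1) of load*x**2 - (2*load + 1)*x + load = 0,
    # written in a form that avoids cancellation at low concentration.
    x = 2.0 * load / ((2.0 * load + 1.0) + math.sqrt(4.0 * load + 1.0))
    return x * kd


def oligomer_concs(c_total, kd, max_size):
    """Molar concentrations of i-mers for i = 1..max_size (truncated series)."""
    x = free_subunit_conc(c_total, kd) / kd
    return [kd * x ** i for i in range(1, max_size + 1)]


concs = oligomer_concs(c_total=10.0, kd=1.0, max_size=200)
mass = sum(i * c for i, c in enumerate(concs, start=1))
print(mass)  # should recover c_total up to truncation error
```

      In a fit to concentration-dependent data, distributions like this one would be recomputed for each trial value of KD, so that only one parameter controls all oligomer fractions.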

    1. Author Response

      Reviewer #1 (Public Review):

      This work provides a new general framework for estimating missing data on cervical cancer epidemiology, including sexual behavior, HPV prevalence, and cervical cancer incidence. These data are useful to determine impact projections of cervical cancer prevention. The authors suggest a three-step approach: 1) a clustering method applied on registries with an intermediate level of data availability to cluster cervical cancer incidence based on a Poisson-regression-based CEM algorithm, 2) a classification method applied on registries with a low level of data availability to classify cervical cancer incidence based on a Random Forest, 3) a projection method applied on missing data based on the mean of available data. The authors use India as a case study to implement this new methodology. Results indicate that two patterns of cervical cancer incidence are identified in India (high and low incidence), classifying all Indian states with missing data to a low incidence. From this classification, missing data is approximated using the mean of the available data within each cluster.
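      The third step summarized above, approximating missing values by the mean of the available data in the same cluster, can be sketched as follows. The code is illustrative only: the function name, the unit labels, and the numbers are hypothetical and are not data from the study, and the cluster assignments are taken as given (in the framework they would come from the clustering and classification steps).

```python
from statistics import mean

# Illustrative sketch of cluster-mean imputation (hypothetical names and data).


def impute_by_cluster_mean(values, clusters):
    """Fill missing values (None) with the mean of observed values in the same cluster.

    values:   dict mapping geographical unit -> observed value or None
    clusters: dict mapping geographical unit -> cluster label
    Raises KeyError if a cluster has no observed value at all.
    """
    observed = {}
    for unit, value in values.items():
        if value is not None:
            observed.setdefault(clusters[unit], []).append(value)
    cluster_means = {label: mean(vals) for label, vals in observed.items()}
    return {unit: value if value is not None else cluster_means[clusters[unit]]
            for unit, value in values.items()}


# Made-up incidence values per 100,000 for five hypothetical units.
incidence = {"A": 10.0, "B": 14.0, "C": None, "D": 20.0, "E": None}
cluster = {"A": "low", "B": "low", "C": "low", "D": "high", "E": "high"}
print(impute_by_cluster_mean(incidence, cluster))
```

      A cluster without any observed value cannot be imputed this way, which mirrors the point that a minimum set of data is needed within each cluster.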

      A strength of this approach is that this methodology can be applied to regions with missing data, although a minimum set of information is needed. This makes it possible to have individual data for each unit in the region.

      One of the weaknesses of this methodology is the need for a minimum set of epidemiological data to enable impact projections. It is true that when epidemiological cervical cancer data is not available, authors mentioned that general indicators (e.g., human development index, geography) can be used but projections will be probably less realistic. As observed with other techniques, countries with fewer resources have less data available and cannot benefit from these types of techniques to have more adequate guidelines.

      Imputation of missing data is always a challenging issue. The technique proposed in this manuscript is an interesting new approach to missing data imputation that could be applied with a minimum set of available data. However, we must focus on obtaining reliable data from each region of the world to help local health authorities implement better preventive measures for the local population.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      As the reviewer has pointed out, the applicability of the proposed methodology depends on the available data. In our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. In fact, we believe that our framework is flexible enough to address missing data in many situations. To clarify this point, we have included the following sentences in the Discussion (lines 363-376, page 18): “It is important to note that, in general, the applicability of the proposed framework depends on the actual amount of data available. However, in our opinion, it is a general challenge for approximating missing data, rather than a weakness particular to our methodology. By allowing possible adaptations, we believe that our framework is sufficiently flexible to address missing data in many situations.”

      Finally, we fully agree with the reviewer that we should continue our effort to collect more data for countries where these are not available. The proposed framework should be considered as a solution to the situation in which collection of additional data is not or not yet possible.

      Reviewer #2 (Public Review):

      The burden of cervical cancer worldwide is well recognized. While prevention strategies, including vaccination against human papillomavirus (HPV), cervical cancer screening, and pre-cancer treatment, can reduce the burden of cervical cancer, access to these measures is still limited, especially in low- and middle-income countries. Since the impact of prevention strategies is heavily dependent on the disease's burden on a particular population, we need to know the latter to assess the impact of these context-specific prevention strategies.

      However, epidemiological data on cervical cancer are not always available for all geographical areas. This paper uses India as a case study to propose a framework called "Footprinting" to comprehensively evaluate the burden of cervical cancer. The authors applied a three-step analytical strategy to impute cervical cancer epidemiological data in states where this information was unavailable using data from cervical cancer incidence, HPV prevalence, and sexual behaviour from other regions. The findings suggest a high and low incidence of cervical cancer incidence in different parts of India; all Indian states with missing data were classified as low incidence.

      The proposed analytical strategy presents an important solution for imputing data from geographic areas of a country where data are missing.

      We thank the reviewer for the considerate comments and suggestions and have tried to incorporate them as much as possible in the revised manuscript.

      One conceptual limitation of this work is the lack of explanation or evidence that sexual behaviour can be used to approximate cervical cancer and/or HPV rates.

      A similar comment was raised by Reviewer #1. It is well established that sexual contact is the only transmission route of carcinogenic HPV infection, and hence necessary for the occurrence of cervical cancer [ref #26 Vaccerella 2006, Muñoz 1992 Int J Cancer 52, 743-749].

      We have included sexual behaviour variables that have previously been shown to be risk factors of HPV infection and cervical cancer risk, e.g., age of sexual debut and number of sexual partners [ref #26 Vaccerella 2006, ref #27 Schulte-Frohlinde 2021]. Furthermore, we used variables that are commonly available so that the analyses can be easily applied to other settings.

      As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance in the India case study shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.

      To clarify these points we have included the following paragraph in the Discussion (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available so that the analyses can be easily applied to other settings, e.g., age of sexual debut and number of sexual partners [26, 27]. As far as we know, there is no established set of sexual behaviour variables for predicting the patterns of HPV prevalence and cervical cancer incidence. The good prediction performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”

      Also, full information on the three main indicators is only available in two states. This is used to impute the values for the other states.

      Indeed, HPV prevalence data were only available for two states. While we acknowledge that this affects the certainty in the imputed HPV prevalence, we considered the imputed results to be satisfactory based on the good accordance with the cervical cancer incidence data we found in the validation step (lines 286-23, page 14). We verified that the ratio of HPV prevalence between the high- and low-incidence clusters (1.7-fold) was very similar to the ratio of age-standardized cervical cancer incidence (1.9-fold).

      Furthermore, we note that previous modelling works on India relied on even less data, namely one source of HPV prevalence and cervical cancer incidence data [ref #29 Brisson 2020, Diaz 2008 Br J Cancer].

      Moreover, the available data used in this study also present some limitations; for example, cervical cancer incidence data were from 2012 to 2016, while sex behaviour data were from 2006. This large gap is likely to have a significant cohort effect, especially given changes in sexual norms in Western countries over the last few decades, which may have gradually influenced other countries, especially in this age of the internet and social media.

      In our opinion, for the purpose of modelling the natural history of cervical cancer, it is not necessarily more appropriate to use the most recent sexual behaviour data. Arguably, as sexual behaviour is the “exposure” for the “outcome” cervical cancer, calibration of HPV transmission and cervical cancer models is best done with data on sexual behaviour and cervical cancer from the same cohorts, and hence with sexual behaviour data from an earlier period than the cervical cancer data.

      In addition, if changes of sexual behaviour occur across the country, it should not affect the clustering much.

      Finally, due to delay in reporting, cervical cancer incidence from the period 2012-2016 is the most recent edition at the moment of writing. Regarding sexual behaviour data, there is at the moment no later edition of the NACO report published after that of year 2006.

      Finally, it would be interesting to validate this methodology to confirm its utility.

      We agree that it would be very interesting to validate this proposed methodology in other regions. Unfortunately, it was beyond the scope of this work. Currently, we are working on a project in which we try to apply footprinting to a collection of low- and middle-income countries.

      The proposed framework's strength is difficult to evaluate because the steps and justification for the model variables were not clearly presented, nor were the models validated.

      We acknowledge that the framework could be more clearly presented and have added additional explanation in the following places to do so:

      • Concerning the framework steps, in Method (144-163, pages 7-8): “For convenience of explanation, we assumed earlier that data availability occurs hierarchically. However, the framework can also be applied with less stringent data requirements. First, the source of Footprint data need not necessarily cover all geographical units. It is still possible to train a classifier in the classification step with Footprint data available for only a part of the clustered geographical units. Second, if none of the key cervical cancer epidemiological data (sexual behavior, HPV prevalence, and cervical cancer incidence data) have large enough coverage to serve as Footprint data, alternative indicators of similarity, such as human development index and geographical distance, could also be used as a substitute. However, the resulting classification performance might be suboptimal, as we expect these indicators to correlate less well with cervical cancer risk. Third, for the projection step, data of cervical cancer incidence, sexual behavior, and HPV prevalence needed for calibration of projection models need not necessarily belong to the same geographical unit. Calibration can be performed as long as the three types of data are available within each cluster.

      With these less stringent data requirements, the proposed framework should be sufficiently flexible to be applied to many situations. However, one should still be cautious in applying the framework when there are little data. This means that, in some cases, we might need to exclude from the analysis some geographical units with too little data or redefine bigger geographical units if the data are not granular enough. Furthermore, we should assess the goodness-of-fit of the obtained clustering, performance of classification, correlation of data within different clusters, and calibration fits to ensure the validity of the final impact projections.”

      • Concerning selection of model variables (lines 319-325, page 16): “In our analysis of classifying clusters of cervical cancer incidence, we only included some of the sexual behaviour variables available in the NACO report [15]. We selected variables that were previously shown to be risk factors of HPV infection and cervical cancer risk and that are commonly available (e.g., age of sexual debut and number of sexual partners) so that the analyses can be easily applied to other settings [26, 27]. In the India case study, the good classification performance shows that using the selected set is sufficient. As sexual behaviour variables are highly correlated, including more variables might even risk overfitting.”

      Based on the authors' interpretation of the framework findings, this framework may help extrapolate data from one country to another. I'm curious as to whether this framework could be applied across states and countries.

      We thank the reviewer for this comment. Currently, we are working on a multi-year project in which we try to apply the framework to all low- and middle-income countries.

    1. Author Response:

      eLife assessment

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plants subjected to water limitation imposed using different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. However, a lack of water relations measurements, incomplete experimental design, and lack of critical evaluation of these methods in light of previous results render the proposed new methodology inadequate.

      We thank eLife for the initial assessment of and comments on our work. In our revised manuscript we plan to address the main concerns raised by reviewers. Specifically, we plan to perform water relations measurements for all our treatment assays, as well as explore the separate effects that agar hardening and nutrient concentration have in our low-water agar assay. We will also provide a more in-depth critical review of our results compared to previously published results.

      Reviewer #1 (Public Review):

      High-throughput genetic screening is a powerful approach to elucidate genes and gene networks involved in a variety of biological events. Such screens are well established in single-celled organisms (i.e. CRISPR-based K/O in tissue culture or unicellular organisms; screens of natural variants in response to drugs). It is desirable to extend such methodology, for example to Arabidopsis where more than 1000 ecotypes from around the Northern hemisphere are available for study. These ecotypes may be locally adapted and are fully sequenced, so the system is set up for powerful exploration of GxE. But to do so, establishing consistent "in vitro" conditions that mimic ecologically relevant conditions like drought is essential. 

      The authors note that previous attempts to mimic drought response have shortcomings, many of which are revealed by 'omics type analysis. For example, three treatments thought to induce osmotic stress (the addition of PEG, mannitol, or NaCl) fail to elicit a transcriptional response that is comparable to that of bona fide drought. As an alternative, the authors suggest using a low-water agar assay, which, for the responses they measure, does a better job of mimicking osmotic stress responses. The major issue with this assay, however, is that it introduces another set of problems; for example, changing agar concentration can lead to mechanical effects, as illustrated nicely in the work of Olivier Hamant's group.

      We thank the reviewer for their comments. We hypothesize that our low-water agar assay is able to replicate drought gene expression patterns through a combination of hardened agar and higher nutrient concentration. However, we did not explore the separate effects each of these factors may play in eliciting such responses. Thus, in our revised manuscript, we will explore what role the mechanical effects of changing agar concentration have on root gene expression. However, we suspect that the mechanical effects introduced by hard agar do not introduce another issue per se, but in fact may help with replicating the transcriptional effects seen under drought.

      Reviewer #2 (Public Review):

      […] The authors have not always considered literature that would be relevant to their topic. For example, there are a number of studies that have reported (and deposited in public databases) transcriptome analysis of plants on PEG-plates or plants exposed to well-controlled, moderate severity soil drying assays (for the latter, check the paper of Des Marais et al. and others; for the former, Verslues and colleagues have published a series of studies using PEG-agar plates). They also overlook studies that have recorded growth responses of wild type and a range of mutants on properly prepared PEG plates and found that those results agree well with results when plants are exposed to a controlled, partial soil drying to impose a similar low water potential stress. In short, the authors need to make such comparisons to other data and think more about what may be wrong with their own experimental designs before making any sweeping conclusions about what is suitable or not suitable for imposing low water potential stress.

      To solve the problem of using these other systems to impose low water potential stress, the authors propose the seemingly logical (but overly simplistic) idea of adding less water to the same mix of nutrients and agar. Because the increased agar concentration does not substantially influence water potential (the agar polymerizes and thus is not osmotically active), what they are essentially doing is using a concentrated solution of macronutrients in the growth media to impose stress. This is a rediscovery of an old proposal that concentrated macronutrient solutions could be used to study the osmotic component of salt stress (see older papers of Rana Munns). There are also effects of using very hard agar whose relationship to actual drought stress and low water potential is unclear. Thus, I see no reason to think that this would be a better method to impose low water potential.

      We thank the reviewer for their comments. In our revised manuscript, we will address points regarding plant and soil water potential; similar concerns were also raised by Reviewer 1 and 3. We note that we report vermiculite water content in Supplementary Table 4.

      We would like to clarify that both the PEG media and overlay solution were buffered. We did not include this in the written description in the methods, but will do so in our revised manuscript.

      We agree with the reviewer’s concern that it may be problematic to compare the transcriptomic profiles of seedlings and mature plants. In light of this, we plan to explore what effects our treatment media have on mature rosettes.

      We note that we do not claim that PEG is unable to produce low-water potential responses similar to partial soil drying. Indeed, we indicate that it is a good technique for eliciting phenotypes comparable to drought at the physiological level (line 48). Rather, we claim that PEG is unable to produce gene expression responses that are sufficiently similar to partial vermiculite drying.

      Reviewer #3 (Public Review):

      […] The authors observed that gene expression responses of roots in their 'low-water agar' assay resembled more closely the water deficit in pots compared to the PEG, mannitol, and salt treatments (all at the highest dose). In particular, 28 % of PEG led to the down-regulation of many genes that were up-regulated under drought in pots. Through GO term analysis, it was pointed out that this may be due to the negative effect of PEG on oxygen solubility since downregulated genes were over-represented in oxygen-related categories. The data also show that the treatment with abscisic acid on plates was very good at simulating drought in roots. Gene expression changes in shoots showed generally a high concordance between all treatments at the highest dose and water deficit in pots, with mannitol being the closest match. This is surprising, since plants grow in plates under non-transpiring conditions, while a mismatch between water loss by transpiration and water supply via the roots leads to drought symptoms such as wilting in pot and field-grown plants. The authors concluded that their 'low-water agar' assay provides a better alternative to simulate drought on plates.

      Strengths: 

      The development of a more robust assay to simulate drought on plates to allow for high-throughput screening is certainly an important goal since many phenotypes that are discovered on plates cannot be recapitulated on the soil. Adding less water to the media mix and thereby increasing agar strength and nutrient concentration appears to be a good approach since nutrients are also concentrated in soils during water deficit, as pointed out by the authors. To my knowledge, this approach has not specifically been used to simulate drought on plates previously. Comparing their new 'low-water agar' assay to popular treatments with PEG, mannitol, salt, and abscisic acid, as well as plants grown in pots on vermiculite led to a comprehensive overview of how these treatments affect gene expression changes that surpass previous studies. It is promising that the impact of 'low-water agar' on the shoot size of 20 diverse Arabidopsis accessions shows some association with plant fitness under drought in the field. Their methodology could be powerful in identifying a better substitute for plate-based high-throughput drought assays that have an emphasis on gene expression changes. 

      Weaknesses: 

      While the authors use a good methodological framework to compare the different drought treatments, gene expression changes were only compared between the highest dose of each stress assay (Fig. 2B, 3B). From Fig. 1F it appears that gene expression changes depend significantly on the level of stress that is imposed. Therefore, their conclusion that the 'low-water agar' assay is better at simulating drought is only valid when comparing the highest dose of each treatment and only for gene expression changes in roots. Considering how comparable different levels of stress were in this study leads to another weakness. The authors correctly point out that PEG, mannitol, and salt are used due to their ability to lower the water potential through an increase in osmotic strength (L. 45/46). In soils, water deficit leads to lower water potential, due to the concentration of nutrients (as pointed out in L. 171), as well as higher adhesion forces of water molecules to soil particles and a decline in soil hydraulic conductivity for water, which causes an imbalance between supply and demand (see Juenger and Verslues, The Plant Cell 2022 for a recent review). While the authors selected three different doses for each treatment that are commonly used in the literature, these are not necessarily comparable on a physiological level. For example, 200 mM mannitol has an approximate osmotic potential of around -5 bar (Michel et al. Plant Physiol. 1983) whereas 28 % PEG has an osmotic potential closer to -10 bar (Michel et al. Plant Physiol. 1973). It also remains unclear how the increase in agar concentration versus the increase in nutrient concentration in the 'low-water agar' affect water potentials. 
      For these reasons it cannot be known whether a better match of the 'low-water agar' at the 28% dose to water deficit in pots for roots in comparison to the other treatments is due to a good match in stress levels with the 'low-water agar' or to adverse side effects of PEG, mannitol, or salt on gene regulation. Lastly, since only two biological replicates for RNA sequencing were collected per treatment, it is not possible to know how much variance exists and if this variance is greater than the treatments themselves.
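
      For orientation, the mannitol figure quoted above can be checked roughly with the ideal-solution van 't Hoff relation, osmotic potential = -iCRT. The sketch below is illustrative only: the function name, the 25 °C temperature, and the assumption of ideal behaviour (i = 1 for a non-dissociating solute) are assumptions, not values taken from the manuscript. It should not be applied to PEG, whose osmotic potential does not scale linearly with concentration and is usually taken from empirical calibrations.

```python
# Ideal-solution (van 't Hoff) estimate of osmotic potential.
# Assumptions: 25 degrees C, ideal dilute solution, i = 1 for mannitol.
R = 8.314  # gas constant, J mol^-1 K^-1


def osmotic_potential_bar(molar_conc, temp_c=25.0, i=1.0):
    """Return the osmotic potential in bar for a solute at molar_conc (mol/L)."""
    conc_mol_m3 = molar_conc * 1000.0                    # mol/L -> mol/m^3
    pascals = -i * conc_mol_m3 * R * (temp_c + 273.15)   # Pa
    return pascals / 1e5                                 # Pa -> bar


mannitol_200mM = osmotic_potential_bar(0.200)
print(round(mannitol_200mM, 2))  # close to the approximately -5 bar quoted above
```

      Under these assumptions, 200 mM mannitol comes out just under -5 bar, consistent with the approximate value cited by the reviewer.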

      We thank the reviewer for their comments. In our statistical analyses, we found that dose-responsive genes (as fit by a linear model) were very similar to those genes found differentially expressed at the highest dose. Thus, for clarity, we decided to simply present the genes differentially expressed at the highest dose. We see now that this might have been an oversimplification. In our revised manuscript, we will present genes that are dose responsive across the range of treatment doses, thus providing more evidence that lower doses of low-water agar are also capable of simulating drought (as is suggested by overlap analysis of Figure 2A).
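
      The dose-response fit mentioned above can be pictured, per gene, as an ordinary least-squares line of expression against dose. The following is a minimal sketch only: the function name `dose_slope`, the dose coding, and the expression values are hypothetical, and the model actually used in the study (including its normalization and significance testing) is not specified here.

```python
def dose_slope(doses, expression):
    """Least-squares slope and intercept of expression regressed on dose."""
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical gene: doses coded 0 (control) to 3 (highest dose),
# expression given as log-scale values.
doses = [0, 1, 2, 3]
expression = [5.0, 6.0, 7.1, 7.9]
slope, intercept = dose_slope(doses, expression)
print(slope, intercept)
```

      A gene with a clearly non-zero slope across doses would count as dose responsive in this simplified picture, whereas a comparison restricted to the highest dose uses only the first and last points.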

      Additionally, we will also explore the osmotic potential of each of our different assays to provide a better benchmark of how comparable our treatments are (as similarly requested by Reviewers 1 and 2). Lastly, to address concerns regarding the size of variance in gene expression, we will sequence a third RNA replicate.

    1. Author Response

      Reviewer #2 (Public Review):

      1) Although the images and videos were of great quality, the results derived from them provided little new knowledge and few conceptual insights into male reproductive tract biology and basically confirmed what has been published using traditional methods. For example, the high intensity of the vascular network in the initial segment was previously reported by Abe in 1984 and Suzuki in 1982; the pattern of the major lymphatic vessel and drainage was beautifully depicted by Perez-Clavier, 1982.

      We thank the reviewer for his/her appreciative comments regarding the quality of the images/videos we provide in this study. We do not fully agree with his/her assessment of the lack of novelty. Our work confirms earlier reports that are now dated (1980s), which in itself is worth mentioning for the interested community, especially when the confirmation uses the most advanced technologies available today. We have never said that nothing was done in the past, and we have acknowledged all past contributors (including those mentioned by the reviewer) by pointing out the limitations of the technical tools that were available at the time. In addition, our current work provides a more comprehensive and global view by extending our approach to the entire mouse epididymis, whereas previous work was much more limited.

      2) The authors were very cautious when interpreting the results of marker immunostaining; however, these markers were not specific for a definite cell type. For example, as the authors stated, VEGFR3 marks both lymphatic vessels and fenestrated blood vessels. How could the authors claim the VEGFR3+ network was lymphatic? The authors claimed that they used three markers for the lymphatic vessel. But staining results of the networks were very different. How could the authors make conclusions about the network of lymphatic vessels in the epididymis?

      We broadly agree with the reviewer and have made it clear that one cannot be 100% sure that all the VEGFR3+ structures we present are lymphatic. However, in total, we used 4 documented lymphatic markers (not 3 as mentioned by the reviewer): VEGFR3, LYVE1, PROX1 and PDPN. Three of them give very similar profiles, while only PDPN shows some differences. We are currently studying in more detail the expression of PDPN in the mouse epididymis because we speculate that this marker may target a population of pluripotent cells in this tissue. Therefore, with the 3 similar profiles and with the subtraction of PLVAP+ structures, we are pretty confident that what we show corresponds to the different lymphatic structures.

      3) To understand the vascular network development in the epididymis, would the authors please look at the fetal stage when the vascular network is established in the first place? Wolffian duct tissues are much smaller and thinner and would be amenable for 3D imaging probably even without clearing.

      We generally agree with the reviewer that this could be an interesting addition. However, it represents a significant amount of additional work. Organ clearing will certainly be required because it is unlikely that the Wolffian duct will be sufficiently transparent to allow lightsheet microscopy. In the literature, the study of the Wolffian duct relies primarily on whole mounts, inclusions, and cryosections. Besides the fact that this represents a lot of extra work, we are not totally convinced that this would be of much use. A key reason is that the epididymis is an organ that differentiates completely after birth (Robaire and Hinton, 2015). It is reported that differentiation of mouse caput segment 1 occurs around 19DPN (Xu et al., 2016) and is intimately related to the development of the vasculature (Lebarr et al., 1986). Regarding the lymphatic network, Swingen et al. (2012) report that lymphangiogenesis in the mouse testis and epididymis is initiated late in gestation after 15DPC. Videos showing the external lymphatic vessels of the testis and epididymis at 17.5DPC can be seen at https://doi.org/10.1371/journal.pone.0052620.s002. The authors indicate that lymphangiogenesis occurs via sprouting from the adjacent mesonephros. We hypothesize that the more internal lymphatics evolve between birth and 10DPN, which corresponds to the time when we observed LEPC Lyve1pos cells.

      4) Immunofluorescence staining of VEGF factors was not convincing. As a secreted factor, VEGF will be secreted out of the cells, would it be detected more in the interstitium? I am always skeptical about the results of immunostaining secreted growth factors. Would it be possible to perform in situ or RNAscope to confirm the spatial expression pattern of VEGFs?

      Active VEGF factors result from alternative mRNA splicing events and posttranslational proteolytic cleavage. Therefore, in our opinion, the study of VEGF mRNA by in situ hybridization or RNAscope analysis will not be very informative about the actual presence of active forms of VEGF in the epididymis. If necessary, we can provide as supplementary material immunohistochemistry data showing the presence of VEGF-A in the epididymal principal cells. Our major objective with these data was to show that VEGF factors and their respective receptors were present in the epididymis. Nevertheless, in an attempt to convince the reviewer, we provide as accompanying data to this rebuttal letter new sets of figures (Figures VEGF-A-response editor & VEGF-C/VEGF-D-response editor) that we believe can improve the perception of our data. If the editorial office feels it is necessary, these figures could be added to the supplementary figure set (as Figure 6-figure supplement 1 and Figure 6-figure supplement 2). For VEGF-A the data already exist in the literature, as we have indicated (Korpelainen, 1998). Ultimately, our goal was not to show which cell types of the epididymis epithelium produce VEGFs but rather that VEGF factors and their receptors were there to support angiogenic or lymphangiogenic activity in the tissue. In addition, we hypothesize that because septa have been reported to constitute barriers between segments restricting passive diffusion of molecules (Turner et al., 2003; Stammler et al., 2015), the VEGF factors are expected to be produced locally.

      Figure VEGF-A - response editor: Immunofluorescence of the angiogenic ligand VEGF-A in the epididymis. Figure 6 shows that this ligand is mainly found in the caput and more precisely in S1. It is very strongly expressed in the peritubular microvascularization of the SI, which expresses the VEGFR3:YFP transgene, whereas it is less expressed by intertubular blood vessels (asterisk). This seems to indicate that the peritubular vessels are mainly responsible for the angiogenic activity measured in our study. Furthermore, it is expressed by the epithelium as secretory vesicles (IS, S3 and enlargement), which is in agreement with in situ hybridization work performed by Korpelainen et al. (J. Cell Biol., 1998). The enlargement shown in S3_Z shows the sagittal plane of the tubule, where one can distinguish strongly VEGFR3:YFP-positive cells that are also VEGF-A positive, indicating that the same cells of the epithelium express both the receptor and the ligand. Here the transgene is detected directly, without the use of an anti-GFP antibody to enhance the signal.

      Figure VEGF-C / VEGF-D - response editor: Immunofluorescence of VEGF-C and VEGF-D lymphangiogenic ligands in the epididymis. This figure shows that these ligands are mainly found in the interstitial tissue throughout the organ with a higher proportion in the caudal part. This expression may be largely driven by fibroblasts, which are widely represented in the interstitium, or by endothelial cells, since these two ligands are expressed by these cell types. However, as shown in the figures and in the enlargement of panel A, VEGF-C is also produced by epithelial cells within what may appear as secretory vesicles. In contrast, for VEGF-D, we observe only a few weakly positive epithelial cells (panel B). These ligands are also detected in the lumen of epididymal tubules (visible for VEGF-C, Panel A S2). This presence may be explained by lumicrine transfer from the testis, in addition to secretion from epithelial cells. Here the transgene is detected directly, without the use of an anti-GFP antibody to enhance the signal.

      5) The study is descriptive and does not provide functional and mechanistic insights. Maybe, the combination of 3D imaging with lineage tracing of endothelium cells or ligation study (removal/ligation of the certain vessel) would help better understand how the vascular network is established and their functional significance.

      The technical approaches suggested by the reviewer could certainly improve our understanding of the rather complex epididymal vascular network. Taken together, they represent the body of a comprehensive follow-up study that is worth undertaking.

      6) Immune response is among many physiological processes in which vascular networks play significant roles. Discussion would be needed in other physiological processes, such as tissue metabolism and stem/progenitor cell niche microenvironment.

      We agree with the reviewer that the mammalian vasculature is involved in other physiological processes beyond immune/inflammatory responses. We have deliberately chosen to focus our discussion on the inflammatory and immune context of the epididymis, as we believe this is the most relevant aspect. It is also in full agreement with the research that our team has been conducting for 15 years to try to understand the complex orchestration of tolerance versus immune surveillance in this territory. This is a finely tuned process that, if properly understood, can help to understand and appropriately treat clinical situations of infertility and/or urological problems. As our discussion section is already quite long, we feel that it was not justified to extend it further on other aspects. However, in response to the reviewer's suggestion, we now mention at the end of the first paragraph of the discussion that the epididymal vascular network is likely to serve different processes in this tissue (page 9, lines 299 to 303).

      7) How could the author determine the Cd-A labeled vessel in Fig 1 was an artery, not a vein? This leads to another critical question. Would it be possible to stain with artery and vein markers to help illustrate the blood flow directions of the vessel?

      The reviewer is right on the fact that we arbitrarily called the Cd-A vessel in Figure 1 an artery. Cd-A is not an acronym we use anymore. What we have done is to use the acronym SEA (superior epididymal artery) to indicate what we firmly believe to be an artery, as also suggested by previous literature (e.g., Suzuki, 1982; Abe et al, 1982) in which this same structure has been consistently referred to as an artery. For other blood vessels, we now have used the acronym "Cd-BV" because we do not know whether we are dealing with a vein or an artery as rightfully pointed out by the reviewer. This is clearly stated in the legend of Figure 1.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript reassesses the strength of evidence for rapid human germline mutation spectrum evolution, using high coverage whole genome sequencing data and paying particular attention to the potential impact of confounders like biased gene conversion. The authors also refute some recently published arguments that historical changes in the age of reproduction might explain the existence of such mutation spectrum changes. My overall impression is that the paper presents a useful new angle for studying mutation spectrum evolution, and the analysis is nicely suited to addressing whether a particular model such as the parental age model can explain a set of observed polymorphism data. My main criticism is that the paper overstates certain weaknesses of previously published papers on mutation spectrum evolution as well as the generation time hypothesis; correcting these oversimplifications would more accurately capture what the paper's new analyses add to the state of knowledge in these areas.

      As part of the motivation for the current study, the introduction states in lines 97-99 that "it thus remains unclear if the numerous observed [mutation spectrum] differences across human populations stem from rapid evolution of the mutation process itself, other evolutionary processes, or technical factors." This seems to overstate the uncertainty that existed prior to this study, given that Speidel, et al. 2021 found elevated TCC>TTC fractions in ancient genomes from a specific ancient European population, which seems like pretty airtight evidence that this historical mutation rate increase really happened. In addition, earlier papers (Harris 2015, Mathieson & Reich 2016, Harris & Pritchard 2017) already presented analyses rejecting the hypothesis that biased gene conversion or genetic drift could explain the reported patterns. In fact, the Mathieson & Reich paper reports one mutation spectrum difference between populations that they conclude is an artifact caused by the Native American population bottleneck, but they conclude that other mutation spectrum differences appear more robust.

      We completely agree with the reviewer that there has been compelling evidence from multiple independent groups supporting transient elevation of TCC>TTC mutation rate in Europeans. Beyond the TCC signal, however, the mechanisms underlying the observed differences in mutation spectrum across populations remain unclear. In particular, several biological and technical factors impact the mutation spectrum and none of the previous studies have investigated their effects, independently or altogether. Thus, it remains unclear if the mutation rate is evolving rapidly across populations, or if one or more factors (like biased gene conversion) differ across groups or over evolutionary time. Our analysis framework attempts to control these effects together to more reliably investigate the effects of various factors and examine when and how often there has been evolution of mutation rate over the course of human evolution.

      As the authors acknowledge in the discussion of their own results, biased gene conversion and non-equilibrium demography are difficult confounders to deal with, and neither previous papers nor the current paper are able to do this in a way that is 100% foolproof. The current manuscript makes a valuable contribution by presenting new ways of dealing with these issues, particularly since previous papers' work on this topic was often confined to supplementary material, but it seems appropriate to acknowledge that earlier papers discussed the potential impacts of biased gene conversion and demographic complexity and presented their own analyses arguing that these phenomena were poor explanations for the existence of mutation spectrum differences between populations.

      For the most part, I found the paper's introduction to be a useful summary of previous work, but there are a few additional places where the limitations of previous work could be described more clearly. I'd suggest noting that the data artifacts discovered by Anderson-Trocmé, et al. were restricted to a few old samples and that the large differences the current manuscript focuses on were never implicated as potential cell line artifacts. In addition, when the authors mention that their new approach includes "minimiz[ing] confounding effects of selection by removing constrained regions and known targets of selection" (lines 106-107), they should note that earlier papers like Harris & Pritchard 2017 also excluded conserved regions and exons.

      We agree with the reviewer that some of the previous work also attempted to account for the contributions of selection or other factors in post hoc ways; we now acknowledge this in the Results section more explicitly. However, we note that our contribution is in introducing a framework to account for these effects a priori and then assess if there are differences in mutation spectrum across populations and over the course of human evolution. In particular, an innovation of our framework is to better control for the effect of gBGC, which has not been done in previous studies.

      One innovative aspect of the current paper's approach is the use of allele ages inferred by Relate, which certainly has advantages over using allele frequencies as a proxy for allele age. Though the authors of Relate previously used this approach to study mutation spectrum evolution, they did not perform such a thorough investigation of ancient alleles and collapsed mutation type ratios. I like the authors' approach of building uncertainty into the use of Relate's age estimates, but I wonder about the validity of assuming that the allele age posterior probability is distributed uniformly between the upper and lower confidence bounds. Can the authors address why this is more appropriate than some kind of peaked distribution like a beta distribution?

      The lower and upper bounds of the allele age reported by Relate reflect the start and end points of the branch that the mutation falls on in the reconstructed genealogical tree. If Relate does a perfect job in reconstructing the tree and estimating the branch lengths, the mutation age should be uniformly distributed in the inferred interval. It is unrealistic that Relate can perform perfectly in tree building, and there is likely considerable uncertainty and even bias in the time to endpoints of the branch. Unfortunately, Relate does not report the uncertainty in the lower and upper bounds of the mutation age, so we were not able to model the posterior distribution of the allele age properly. However, assuming a uniform distribution of the mutation age between the upper and lower confidence bounds should be valid to first approximation.
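
      To make the uniform-age assumption concrete, one possible way to use it is to spread each mutation fractionally over time bins in proportion to how much of its [lower, upper] interval falls in each bin. This is a sketch under stated assumptions rather than the analysis pipeline itself: the function name `bin_weights`, the bin edges, and the example ages are hypothetical.

```python
def bin_weights(lower, upper, bin_edges):
    """Fraction of a uniform age interval [lower, upper] that falls in each time bin."""
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    width = upper - lower
    weights = []
    for start, end in zip(bin_edges[:-1], bin_edges[1:]):
        # Length of the overlap between the age interval and this bin (0 if disjoint).
        overlap = max(0.0, min(upper, end) - max(lower, start))
        weights.append(overlap / width)
    return weights


# Hypothetical mutation whose branch spans 500 to 3000 generations,
# with hypothetical time-bin edges (in generations).
edges = [0, 1000, 2000, 5000, 10000]
print(bin_weights(500, 3000, edges))
```

      A peaked distribution, as the reviewer suggests, would replace the overlap-length weighting with the probability mass of that distribution in each bin; the uniform case is simply the one that needs no extra parameters beyond the two bounds.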

      I would also argue that the statement on line 104 about Relate's reliability is not yet supported by data. There is certainly value in using Relate ages to investigate mutation spectrum change over time and compare this to what has been seen using allele frequencies, but I don't think we know enough yet to say that the Relate ages are definitely more reliable. Relate's estimates might be biased by the same processes like selection and demography that make allele frequencies challenging to interpret. The paper's statements about the limitations of allele frequencies are fair, but there is always a tradeoff between the clear drawbacks of simple summary statistics and the more cryptic possible blind spots of complicated "black box" algorithms (in the case of Relate, an MCMC that needs to converge properly). DeWitt, et al. 2021 noted that the demographic history inferred by Relate doesn't accurately predict the underlying data's site frequency spectrum, indicating that the associated allele ages might have some problems that need to be better characterized. While testing Relate for biases is beyond the scope of this work, the introduction should acknowledge that the accuracy and precision of its time estimates are still somewhat uncertain.

      We agree with the reviewer and have now added a paragraph in the Discussion highlighting some issues of Relate regarding mutation age estimation and ancestral allele polarization.

      The paper's results on C>T mutations in Europeans versus Africans are a nice confirmation of previous results, including the observation from Mathieson & Reich that neither SBS7 nor SBS11 is a good match for the mutational signature at play. More novel is the ancient mutational signature enriched in Africa and the interrogation of the ability of parental age to explain the observed patterns. I just have a few minor suggestions regarding these analyses:

      1) I like the idea of using maternal age C>G hotspots to test the plausibility of the maternal age as an explanatory factor, but I think this would be more convincing with the addition of a power analysis. Given two populations that have average maternal ages of 20 and 40, and the same population sample sizes available from 1000 Genomes, can the authors calculate whether the results they'd predict are any different from what is observed (i.e. no significant differences within the maternal hotspots and significant differences outside of these regions)?

      We thank the reviewer for this suggestion. We performed simulations to estimate the power of observing significant inter-population differences within and outside the maternal C>G mutation hotspots, under the assumption that all differences in the mutation spectrum between the two populations are related to the parental age (i.e., generation time). We found that, because of the extraordinarily strong maternal age effects in the maternal mutation hotspots, the power for detecting variation in C>G/T>A ratio due to change in generation age is much greater within maternal hotspots than outside, despite the smaller total size of the maternal hotspot regions (and hence fewer SNPs; Figure 3 – figure supplement 4). For example, even with an age difference of five years, there is nearly 100% power to detect significant differences in the maternal hotspots, compared to <12% for regions outside the maternal hotspots. In other words, if inter-population differences in the mutation spectrum are driven by differences in maternal age across populations, we should have enough power to observe a signal in the maternal hotspot regions alone, the lack of which (Figure 2C) strongly suggests that maternal age is not driving these signals.
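
      The logic of such a power analysis can be illustrated with a toy simulation: a small region with a large shift in the C>G fraction can give more power than a large region with a tiny shift. The sketch below is not the simulation reported in the manuscript. The SNP counts, C>G fractions, function name, and the use of a two-proportion z-test with a normal approximation to binomial sampling are all hypothetical choices made for illustration.

```python
import math
import random


def simulate_power(n_snps, p_a, p_b, reps=500, z_crit=1.96, seed=1):
    """Share of simulated datasets in which two populations differ significantly.

    p_a and p_b are the expected fractions of C>G among (C>G + T>A) SNPs in
    populations A and B; each population contributes n_snps SNPs.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Normal approximation to binomial sampling of the observed C>G fraction.
        fa = rng.gauss(p_a, math.sqrt(p_a * (1 - p_a) / n_snps))
        fb = rng.gauss(p_b, math.sqrt(p_b * (1 - p_b) / n_snps))
        pooled = (fa + fb) / 2
        se = math.sqrt(2 * pooled * (1 - pooled) / n_snps)
        if abs(fa - fb) / se > z_crit:  # two-sided two-proportion z-test
            hits += 1
    return hits / reps


# Hypothetical scenario: few SNPs but a strong age effect inside hotspots,
# many SNPs but a weak age effect elsewhere.
power_hotspot = simulate_power(n_snps=2_000, p_a=0.50, p_b=0.56)
power_background = simulate_power(n_snps=50_000, p_a=0.500, p_b=0.502)
print(power_hotspot, power_background)
```

      With these made-up inputs, power is high in the hotspot-like scenario and low in the background-like scenario, which mirrors the qualitative argument above without reproducing its numbers.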

      2) Is it possible that the T>C/T>G ratio is elevated in all variants above a certain age but shows up as an African-specific signal because the African population retains more segregating variation in this age range, whereas non-African populations have fixed or lost more of this variation? Since Durvasula & Sankararaman identified putative tracts of super-archaic introgression within Africans, is it possible to test whether the mutation spectrum signal is enriched within those tracts?

      The observation that the T>C / T>G signal is driven by TpG>CpG mutations (which might be mis-polarized CpG transitions) casts doubt on the signal. Given the unresolved technical issue, we have now removed any discussion of the biological explanations behind the signal and instead focus on describing the challenges with ancestral allele polarization under context-dependent mutation rate variation.

      3) Although Coll Macià, et al. argued that generation time is capable of explaining all mutation spectrum differences between populations, including the excess of TCC>TTC in Europeans, Wang et al. argue something slightly different. They exclude TCC>TTC and the other major components of the European signature from their analysis and then argue that parental age can explain the rest of the differences between populations. I think the analysis in this paper convincingly refutes the Coll Macià, et al. argument, but refuting the Wang, et al. version would require excluding the same mutation types that are excluded in that paper.

      Although we did not present an analysis that explicitly excludes TCC>TTC mutations, our analysis still shows that generation time alone cannot explain the remaining variation in the mutation spectrum observed (Figure 4). Specifically, the temporal trend of the T>C/T>G ratio would suggest a decreasing generation time of Europeans with time, whereas the C>G/T>A ratio suggests the opposite. In addition, the power analysis for C>G maternal hotspots (suggested by the reviewer) further supports that the inter-population differences observed cannot be entirely driven by differences in parental ages. These observations, which do not involve TCC>TTC mutations, strongly suggest that generation time is not the sole or primary driver of differences in mutation spectrum across populations. Further, our analysis shows that several technical issues and biological processes, in addition to changes in life history traits, can lead to changes in the mutation spectrum of polymorphisms. Therefore, inferring generation time from changes in mutation spectrum is not as straightforward as Wang et al. proposed, because generation time is not the only or dominant factor impacting the mutation spectrum.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an awesome comprehensive manuscript. Authors start by sorting putative stromal cell-containing BM non-hematopoietic (CD235a-/CD45-) plus additional CD271+/CD235a-/CD45- populations to identify nine individual stromal identities by scRNA-seq. The dual sorting strategy is a clever trick as it enriches for rare stromal (progenitor) cell signals but may suffer a certain bias towards CD271+ stromal progenitors. The lack of readable signatures already among CD45-/CD45- sorts might argue against this fear. This reviewer would appreciate a brief discussion on number & phenotype of putative additional MSSC phenotypes in light of the fact that the majority of 'blood lineage(s)'-negative scRNA-seq signatures identified blood cell progenitor identities (glycophorin A-negative & leukocyte common antigen-negative). The nine stromal cell entities share the CXCL12, VCAN, LEPR main signature. Perhaps the authors could speculate if future studies using VCAN or LEPR-based sort strategies could identify additional stromal progenitor identities?

      We would like to thank the reviewer for critically evaluating our work and for the generally positive evaluation of the paper. We apologize for the delayed resubmission, as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses.

      The reviewer asks for a brief discussion on the cell numbers and phenotypes of MSSC phenotypes. The cell numbers and percentages of MSSC in sorted CD45low/-CD235a- and CD45low/-CD235a-CD271+ cells can be found in Supplementary File 3 and we have added a summary of the phenotypes of MSSC in the new Supplementary File 7.

      Due to the extremely low frequency of stromal cells in human bone marrow, we chose a sorting strategy that also included CD45low cells (Fig 1A) to ensure that no stromal cells were excluded from the analysis. Although stromal elements are certainly enriched using this approach, the CD45low population contains several different hematopoietic cell types. These include CD34+ HSPCs which are characterized by low CD45 expression2, as well as the CD45low-expressing fractions of other hematopoietic cell populations such as B cells, T cells, NK cells, megakaryocytes, monocytes, dendritic cells, and granulocytes. Furthermore, CD235a- late-stage erythroid progenitors, which are negative for CD45, are represented as well. Of note, our data are consistent with previously reported murine studies showing the presence of a number of hematopoietic populations in CD45- cells, which accounted for the majority of CD45-Ter119-CD31- murine BM cells3,4. However, despite a certain enrichment of stromal elements in the CD45low cell fraction, frequencies were still too low to allow for a detailed analysis of this important bone marrow compartment. This prompted us to adopt the stromal cell-enrichment strategy as described in the manuscript to achieve a better resolution of the stromal compartment. In fact, sorting based on CD45low/-CD235a-CD271+ allowed us to sufficiently enrich bone marrow stromal cells to be clearly detectable in scRNAseq analysis. According to the reviewer’s suggestion, a brief discussion on this issue is now included in the Discussion (page 28, lines 10-15).

      The reviewer also suggested using VCAN or LEPR-based sorting strategy to identify additional stromal identities in future studies.

      However, as an extracellular matrix protein, FACS analysis of cellular VCAN expression can only be achieved based on its intracellular expression after fixation and permeabilization5,6. Additionally, while VCAN is highly and ubiquitously expressed by stromal clusters, VCAN is also expressed by monocytes (cluster 36). Therefore, VCAN is not an optimal marker to isolate viable stromal cells.

      LEPR is the marker that was reported to identify the majority of colony-forming cells in adult murine bone marrow7. We have previously reported that the majority of human adult bone marrow CFU-Fs is contained in the LEPR+ fraction8. In our current scRNAseq surface marker profiling analysis, group A cells showed high expression of several canonical stromal markers including VCAM1, PDGFRB, ENG (CD105), as well as LEPR (Fig. 4A). However, the four stromal clusters in Group A could not be separated based on the expression of LEPR. Therefore, we chose not to use LEPR as a marker to prospectively isolate the different stromal cell types.

      The authors furthermore localized CD271+, CD81+ and NCAM/CD56+ cells in BM sections in situ. Finally, referring to the strong background of the group in HSC research, in silico prediction by CellPhoneDB identified a wide range of interactions between stromal cells and hematopoietic cells. Evidence for functional interdependence of CFU-F-forming cells completes the novel and clearer picture of bone marrow stromal cells.

      We thank the reviewer for the positive comments.

      An illustrative abstract naming the top9 stromal identities in their top4 clusters by their "top10 markers" + functions would be highly appreciated.

      We thank the reviewer for the suggestion. A summary of the characteristics of stromal clusters is now shown in the new Supplementary File 7, which we hope matches the reviewer’s expectations.

      Reviewer #2 (Public Review):

      Knowledge about composition and function of the different subpopulations of the hematopoietic niche of the BM is limited. Although such knowledge about the mouse BM has been accumulating in recent years, a thorough study of the human BM still needs to be performed. The present manuscript of Li and coworkers fills this gap by performing single cell RNA sequencing (scRNAseq) on control BM as well as CD271+ BM cells enriched for non-hematopoietic niche cells.

      We apologize for the delayed resubmission, as it took a long time for a specific antibody to arrive to complete the confocal microscopy analyses. We thank the reviewer for the critical expert review and overall positive comments.

      Based on their scRNAseq, the authors propose 41 different BM cell populations, ten of which represented non-hematopoietic cells, including one endothelial cell cluster. The nine remaining skeletal subpopulations were subdivided into multipotent stromal stem cells (MSSC), four distinct populations of osteoprogenitors, one cluster of osteoblasts and three clusters of pre-fibroblasts. Using bioinformatic tools, the authors then compare their results and divisions of subpopulations to some previously published work from others and attempt to delineate lineage relationships using RNA velocity analyses. From these, they propose different paths from which MSSC enter the progenitor stages, and might differentiate into pre-osteoblasts and -fibroblasts.

      It is of interest to note that apparently adipo-primed cells may also differentiate into osteolineage cells, something that should be further explored or validated. Furthermore, although this analysis yields a large adipo-primed population, pre-adipocytes and mature adipocytes appear not to be included in the data set the authors used, which should also be explained.

      We thank the reviewer for this comment. We chose to annotate Cluster 5 as an adipo-primed cluster based on the higher expression of adipogenic differentiation markers as well as a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B-C, Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors1. Although at considerably lower levels compared to adipogenic genes, osteogenic genes were also expressed in cluster 5 cells (Fig. 2B and D), indicating the multipotent potential of this cluster. Therefore, our initial annotation of these cells as adipo-primed progenitors was too narrow as it did not include the possible osteogenic differentiation potential. We apologize for the confusion caused by the inappropriate annotation and, in order to avoid any further confusion, cluster 5 has now been re-annotated as ‘highly adipocytic gene-expressing progenitors (HAGEPs)’, which we believe is a better representation of the cells. We furthermore agree with the reviewer that in-vivo differentiation needs to be performed to address potential differentiation capacities in future studies.

      With regard to the lack of adipocytes in our data set, we described in the Materials and Methods section that human bone marrow cells were isolated based on density gradient centrifugation. After centrifugation, the mononuclear cell-containing monolayers were harvested for further analysis. However, the resulting supernatant containing mature adipocytic cells was discarded14. Therefore, adipocyte clusters were not identified in our dataset. We have amended the manuscript accordingly (page 5, line 7).

      Regarding the pre-adipocytes, we are not aware of any specific markers for pre-adipocytes in the bone marrow. We examined the only known markers (ICAM1, PPARG, FABP4) that have been shown to mark committed pre-adipocytes in human adipose tissue15. As illustrated in Fig. R1 (below), low expression of all three markers was not restricted to a single distinct cluster but could be found in almost all stromal clusters. These data thus allow us to neither confirm nor exclude the presence of pre-adipocytes in the dataset. Due to the lack of specific markers for pre-adipocytes and the absence of mature adipocytes in the current dataset, it is therefore difficult to identify a well-defined pre-adipocyte cluster.

      Figure R1. UMAP illustration of the normalized expression of the markers for pre-adipocytes in stromal clusters.

      In addition, based on a separate analysis of surface molecules, the authors propose new markers that could be used to prospectively isolate different human subpopulations of BM niche cells by using CD52, CD81 and NCAM1 (=CD56). Indeed, these analyses yield six different populations with differential abilities to form fibroblast-like colonies and differentiate into adipo-, osteo-, and chondrogenic lineages. To explore how the scRNAseq data may help to understand regulatory processes within the BM, the authors predict possible interactions between hematopoietic and non-hematopoietic subpopulations in the BM. These should be further validated, to support statements as the suggestion in the abstract that separate CXCL12- and SPP1-regulated BM niches might exist.

      We agree with the reviewer that functional validation of the CellPhoneDB results using, for example, in vivo humanized mouse models would be needed to demonstrate the presence of different niches in the bone marrow. At this point in time, we only put forward the hypothesis that different niche types exist, and we will work on providing experimental proof in our future studies.

      The scRNAseq analysis is indeed a strong and important resource, also for later studies meant to increase knowledge about the hematopoietic niche of the BM. Although the analyses using different bioinformatic tools are very helpful, they remain mostly speculative, since validatory experiments, as already mentioned, are missing. As such, I feel the authors did not succeed in achieving their goals of understanding how non-hematopoietic cells of the BM regulate the different hematopoietic processes within the BM. Nevertheless, they have created valuable resources, both in the scRNAseq data they generated, as well as the different predictions about different cell populations, their lineage relationships, and how they might interact with hematopoietic cells.

      We thank the reviewer for the appreciation of the value of this dataset. We agree with the reviewer that it is of great importance to validate the contribution of potential driver genes for stromal cell differentiation and verify the in vitro data and in-silico prediction using in-vivo models. As the main goal of the current study was to formulate hypotheses based on the scRNAseq data for future studies, we believe that in vivo validation experiments using engineered human bone marrow models or humanized bone marrow ossicles are out of the scope of the current study, but certainly need to be performed in the future.

      The impact of this work is difficult to envision, since validations still need to be performed. Also, it has to be borne in mind that humans are not mice, which can be studied in neat homogeneous inbred populations. Human populations, on the other hand, are quite diverse, so that the data generated in this manuscript and others will probably have to be combined to extrapolate data relevant to the whole of the human population. However, as it is equally difficult to generate reliable scRNAseq data from human BM, it seems likely that the data will indeed be an important resource when more data from different donors become available.

      We thank the reviewer for the generally positive evaluation of this study.

      Taken at face value, the authors provide evidence that human counterparts exist to several BM populations described in mice. In my opinion, the lineage relationships predicted using the RNA velocity analyses need more substance, as it seems the differentiation paths may diverge from what is known from mice. If so, this issue should be studied more stringently. Similarly, the paper would have been strengthened considerably if a relevant experimental validation had been attempted, perhaps by using genetically modified (knockdown) MSSC, similar to Battula et al. (doi: 10.1182/blood-2012-06-437988).

      In the study from Welner’s group, stromal differentiation trajectory was inferred based on scRNAseq analysis of murine bone marrow cells using Velocyto16. Velocyto identified MSCs as the ‘source’ cell state with pre-adipocytes, pro-osteoblasts, and pro-chondrocytes being end states. In our study, the MSSC population was predicted to be at the apex of the trajectory and the pre-osteoblast cluster was placed close to the terminal state of differentiation, which is consistent with the murine study. However, different stromal cell types were identified in mice compared with humans. For example, we have identified pre-fibroblasts in our dataset which are absent in the murine study, while a well-defined murine pre-adipocyte population was not identified in our human dataset. Therefore, it is not surprising to find some discrepancies between human and murine stromal differentiation trajectories. Of course and as mentioned before, critical in-vivo functional validations need to be carried out to address these important issues in the future.

      In summary, this is a very interesting but also descriptive paper with highly important resources. However, to prospectively identify or isolate human non-hematopoietic/non-endothelial niche populations, more stringent validations should have been performed to strengthen the validity of the different analyses that have been performed. As such, it remains an open question which niche subpopulations have the most impact on the different hematopoietic processes important for normal and stress hematopoiesis, as well as malignancies.

      Thank you for this comment. We completely agree that more stringent validations are necessary but are outside of the aim of our current hypothesis-generating study. Accordingly, we are planning functional verification studies using genetically manipulated stromal cells in combination with in-vivo humanized ossicles. Furthermore, other groups will hopefully use our database and contribute with functional studies in model systems that are currently not available to us, e.g. iPS-derived bone marrow in-vitro proxies.

      Specific remarks

      • Since CD45, CD235a, and CD271 are used as distinguishing markers in the sample preparation of the scRNAseq, it would be helpful to highlight these markers in the different analyses (Figures 1D, 2B, 2C-F, and 4A), and restrict the analyses to those cells that also do not express CD45, CD235a (why use CD71?) and highly express CD271.

      Thank you for this comment. As shown in Fig. R2, we have modified figures Fig. 1D, 2B, and 4A showing now also the expression of PTPRC (CD45), GYPA (CD235a), and NGFR (CD271) on the top (Fig. 1D and 2B) or right (Fig. 4A) panel of the figures. To complement Fig. 2C-F, we have generated new stacked violin plots showing the expression level of three markers by all 9 stromal clusters (Fig. R2B). As we believe that including these three markers in the figures does not provide a better strategy to improve the analyses, we decided to leave the original figures unchanged in this respect.

      Figure R2. (A) Modified Fig. 1D, 2B and 4A with PTPRC (CD45), GYPA (CD235a) and NGFR (CD271) expression. (B) Stacked violin plots of PTPRC, GYPA and NGFR expressed by stromal clusters to complement Fig. 2C-F.

      With regard to cell exclusion based on CD45, as shown in the modified Figure corresponding to Fig 1A in the manuscript (Fig R2A), CD45 gene expression is observed also in the endothelial cluster, basal cluster, and neuronal cluster (Fig. R2A). These clusters represent non-hematopoietic clusters that we would like to keep in our dataset for further analysis, such as cell-cell interaction. Therefore, we choose to not restrict the analysis to solely CD45 non-expressing cells.

      With regard to CD235a (GYPA), expression of CD235a is not detected in any of the non-hematopoietic clusters. Thus, CD235a-expressing cell exclusion is not necessary.

      For CD271, according to our previous results (own unpublished data, belonging to a dataset of which only significantly expressed genes were reported in Li et al.8), protein expression of CD271 is not necessarily reflected by gene expression. In other words, stromal cells with CD271 protein expression do not always have high mRNA expression. A significant fraction of stromal cells would be excluded if we restricted the analyses only to those cells that show high CD271 gene expression, which would not reflect the real cellular composition of human bone marrow stroma. In order to not risk losing stromal cells, we therefore kept our previous analyses which included stromal cells with various CD271 expression levels.

      With regard to using CD71 as an exclusion marker, please see also the comments to reviewer 1. Briefly, according to our data, CD71 (TFRC)-expressing erythroid precursors could still be found after excluding CD45 and CD235a positive cells (Figure 1-figure supplement 1B and R3). As furthermore shown in Figure 1-figure supplement 1G and R2, CD71 expression in the stromal clusters is negligible. Therefore, we believe that this justifies the use of CD71 as an additional marker to exclude erythroid cells. We have amended the discussion to address this issue (page 19, lines 7-8).

      Figure R3. FACS plots illustrating the expression of (A) CD71 (TFRC) vs CD271 in CD45-CD235a- cells and (B) FSC-A vs CD81 in CD45-CD235a-CD271+CD71+ cells following exclusion of doublets and dead cells.

      • Despite a distinct neuronal cluster (39), there does not seem to be a distinctive marker for these cells. Is this true?

      Yes, the reviewer is correct that there is no significantly-expressed distinctive marker for neuronal cells. Multiple markers indicating the presence of different cell types were identified in cluster 39 (Supplementary File 4). Among them, several neuronal markers (NEUROD1, CHGB, ELAVL2, ELAVL3, ELAVL4, STMN2, INSM1, ZIC2, NNAT) were found to be enriched in this cluster (Supplementary File 4 and Fig. 1D) with higher fold changes compared to other identified genes. However, the expression of these genes was not statistically significant, which is mainly due to the heterogeneity of the cluster and thus does not allow us to draw any firm conclusions.

      Several genes including MALAT1, HNRNPH1, AC010970.1, and AD000090.1 were identified to be statistically highly expressed by cluster 39 (Supplementary File 4). The expression of these genes is not restricted to any specific cell type. It is therefore impossible to annotate the cluster based on this and our data thus indicated that cluster 39 is a heterogeneous population containing multiple cell types. Based on the expression of neuronal markers, we nevertheless chose to annotate Cluster 39 as “neuronal” as the prominent expression of neuronal markers indicated the presence of neurons in this cluster. To be more accurate, the annotation of cluster 39 has been changed to ‘neuronal cell-containing cluster’ to correctly reflect the presence of non-neuronal gene expressing cells as well (page 29, lines 3-8).

      • Since based on 2C and 2D, the authors are unable to distinguish adipo- from osteogenic cells, would the authors use the same molecules to distinguish different populations of 2C-D, or would they use other markers, if so which and why.

      We agree with the reviewer that at first glance adipo-primed (cluster 5, now annotated as “highly adipocytic gene-expressing progenitors”, HAGEPs), balanced progenitors (cluster 16), and pre-osteoblasts (cluster 38) shared a similar expression pattern according to the violin plots in Fig. 2C and 2D. However, as illustrated in the heatmap (Fig. 2B), the expression patterns of adipo-primed (HAGEP) and balanced progenitors were quite different in terms of their expression of adipogenic and osteogenic markers. Both adipogenic and osteogenic marker expression was detected in HAGEPs, balanced progenitors, and pre-osteoblasts. Thus, as violin plots summarize the overall expression levels of a certain marker in a certain cluster, these plots tend to make it more difficult to detect differential expression patterns between different clusters. In this case, the heatmap shown in Fig. 2B is a good complement to the violin plots as it demonstrates the different expression patterns of every cell in the different stromal clusters.

      Additionally, cluster 5 showed the expression of a group of stress-related transcription factors (FOS, FOSB, JUNB, EGR1) (Fig. 2B and Figure 2-figure supplement 1C), some of which had been shown to mark bone marrow adipogenic progenitors1. The expression of the abovementioned stress-related transcription factors (putative adipogenic progenitor markers) was generally lower in cluster 38 compared to cluster 5, further demonstrating that clusters were different.

      Furthermore, there was a gradual upregulation of more mature osteogenic markers such as RUNX1, CDH11, EBF1, and EBF3 from cluster 5 to cluster 16 and finally cluster 38. As shown in Fig. 2D, the expression of these markers was higher in cluster 38 compared to cluster 5. Therefore, cluster 38 was annotated as pre-osteoblasts.

      Most of the stromal clusters form a continuum (Fig. 2A), which correlates very well with the gradual transition of different cellular states during stromal cell development. It is highly unlikely that abrupt and dramatic gene expression changes would occur during the cellular state transition of cells of the same lineage. Therefore, it is not surprising to find that the gene expression profiles of the stromal clusters share a certain level of similarity.

      In summary, we rely on several factors to distinguish different stromal clusters, which include canonical adipo-, osteo- and chondrogenic markers, stress markers, heatmap, violin plots, and the gradual up-regulation of certain lineage-specific markers.

      To directly answer the reviewer’s question, we believe that we are able to distinguish different stromal clusters based on our data.

      • In de Jong et al., an inflammatory MSC population (iMSC) is defined. Since the Schneider group showed that inflammatory S100A8 and A9 are expressed by inflamed MSC, is it possible that some of the designated pre-fibroblasts actually correspond to these S100A8/A9-expressing iMSC?

      We thank the reviewer for raising this interesting question.

      First of all, we would like to point out that scRNAseq was performed using viably frozen bone marrow aspirates in de Jong’s study while freshly isolated bone marrows were used in our study. There might be discrepancies between frozen and fresh bone marrow samples in terms of cellular composition including stromal composition and, importantly, processing-induced stress-related gene expression profiles.

      To investigate if designated pre-fibroblasts actually correspond to iMSCs as suggested by the reviewer, we have re-examined the expression of some of the key iMSC genes as reported by de Jong et al.17. As shown in Fig. R6, the markers that can distinguish iMSC from other MSC clusters in the de Jong et al. study were not exclusively expressed by pre-fibroblasts, but also by other stromal cell types including HAGEPs, balanced progenitors, and pre-osteoblasts.

      In the study by R. Schneider’s group18, significant upregulation of S100A8/S100A9 was observed in stromal cells from patients with myelofibrosis. Furthermore, baseline expression of S100A8/A9 was also observed in the fibroblast clusters in the control group, which correlates very well with our data of S100A8/9 expression in pre-fibroblasts in normal donors (Fig. 2F). Our data thus indicate - in line with Schneider’s findings - that there is a baseline level of S100A8/9 expression in fibroblasts in hematologically normal samples and that the expression of S100A8/9 is not restricted to inflamed MSC.

      In summary, the gene expression profiles observed in our study do not indicate the presence of iMSC in the healthy bone marrow.

      • Figure 3A: Do human adipo-primed cells (cluster 5) indeed differentiate into osteogenic cells (clusters 6, 38, and 39)? This would be highly unexpected. Can the authors substantiate this "reliable outcome of the RNA velocity analysis"?

      Please refer to our previous responses regarding this topic. Briefly, as shown in Fig. 2B and D, both osteogenic and adipogenic genes are expressed in cluster 5, indicating the multipotent potential of this cluster. Although the cluster was initially annotated as adipo-primed progenitors, this was not intended to exclude the osteogenic differentiation potential of these progenitors. Nevertheless, this annotation did not correctly reflect the differentiation potential and might thus have caused confusion, for which we apologize. In order to more correctly describe the characteristics of these cells, cluster 5 has now been re-annotated as ‘highly adipocytic gene-expressing progenitors (HAGEPs)’.

      In general, the outcome of the RNA velocity analysis needs to be corroborated by in-vivo differentiation experiments. But we believe that functional verification, which would be extensive, is out of the scope of the current study and we will address these questions in future studies.

      • How statistically certain are the authors, that the populations in Figure 4B as defined by flow cytometry, correspond to MSSC, adipo-primed cells, osteoprogenitors, etc., as defined by scRNAseq?

      To address this question, we sorted the A1-A4 populations and performed RT-PCR to examine the CD81 expression level in each cluster. As shown in Figure 4-figure supplement 1B, CD81 expression levels were higher in A1 and A2 compared with A3 and A4, which is consistent with the scRNAseq data that showed the highest CD81 expression in MSSCs compared to other clusters (Supplementary File 4).

      The phenotypes defined in this study allowed us to isolate different stromal cell types which demonstrated significant functional differences as described in the manuscript (page 19, lines 17-25; page 20, lines 1-11). These results, in combination with the quantitative real-time PCR results (Figure 4-figure supplement 1B), demonstrated that the A1-A4 subsets in FACS are functionally distinct populations and are likely to be – at least in large parts – identical or equivalent to the transcriptionally identified clusters in group A stromal cells. However, at this point, we have not performed the required experiments (scRNAseq of sorted cells) that would provide sufficient proof to confirm this statement statistically.

      • The immunohistochemistry results shown do not allow distinct conclusions as the colors give unequivocal mix-colors, and surface expression cannot be distinguished from intracellular expression. Please use a 3D (confocal) method for such statements.

      We thank the reviewer for the suggestion and we have performed additional confocal microscopy analysis of human bone marrow biopsies as suggested by the reviewer. Representative confocal images are now presented in the middle and right panel of Fig. 6E. We also include a separate file (Supplemental confocal image file). Here, confocal scans of all marker combinations are shown as ortho views in addition to detailed intensity profile analyses of the cells of interest clearly distinguishing surface staining from intracellular staining.

      Confocal analysis of bone marrow biopsies confirmed our findings presented in the manuscript. As observed in the scanning images, CD271-expressing cells were negative for CD45 and were located in perivascular, endosteal, and peri-adipocytic regions. CD271/CD81 double-positive cells could be found either in the peri-adipocytic regions or perivascular regions while CD271/NCAM1 double-positive cells were exclusively situated at the bone-lining endosteal regions. The results of the confocal analysis have been added to the revised manuscript (page 21, lines 15-17).

      • Figure 5A: as all cells seem to interact with all other cells, this figure does not convey relevant information about BM regions using for instance CXCL12 or SPP1. Please reanalyze to show specificity of the interactions of the single clusters. Also, since it is unlikely the CellPhoneDB2-predicted interactions are restricted to hematopoietic responders, please also describe the possible interactions between non-hematopoietic cells.

      Fig. 5A was used to demonstrate the complexity of the interactions between hematopoietic cells and stromal cells.

      To gain a more detailed understanding of the interactions, we also performed an analysis with the top-listed ligand-receptor pairs as shown in Fig. 5B-C and Figure 5-figure supplement 1B. Here, each dot represents the interaction of a specific ligand-receptor pair listed on the x-axis between the two individual clusters indicated in the y-axis, which we believe shows what the reviewer is asking for.

      The specificity of the interactions between single clusters was shown in Fig. 5B-C and Figure 5-figure supplement 1B. The CXCL12- and SPP1-mediated interactions between MSSC/OC and hematopoietic clusters clearly suggested stromal cell type-specific interactions.

      Regarding non-hematopoietic cells, both inter- and intra-stromal interactions were identified to be operative between different stromal subsets as well as within the same stromal cell population as shown in Figure 5-figure supplement 3B. In addition, we have also analyzed the interaction pattern between endothelial cells and hematopoietic cells as shown in Fig. 7A, and thus we believe that we have sufficiently described these interactions as requested by the reviewer.

    1. Author Response

      Reviewer #2 (Public Review):

      This study identifies the neural circuits inhibited by activation of opioid receptors using complex experimental approaches such as electrophysiology, pharmacology, and optogenetics and combined them with retrograde and anterograde tracings. The authors characterize two key regions of the brainstem, the preBötzinger Complex, and the Kolliker-Fuse, and how these neuronal populations interact. Understanding the interactions of these circuits substantially increases our understanding of the neural circuits sensitive to opioid drugs which are critical to understand how opioids act on breathing and potentially design new therapies.

      Major strengths.

      This study maps the excitatory projections from the Kolliker-Fuse to the preBötzinger Complex and rostral ventral respiratory group and shows that these projections are inhibited by opioid drugs. These Kolliker-Fuse neurons express FoxP2, but not the calcitonin gene-related peptide, which distinguishes them from parabrachial neurons. In addition, the preBötzinger Complex is also hyperpolarized by opioid drugs. The experiments performed by the authors are challenging, complex, and the most appropriate types of approaches to understanding pre- and post-synaptic mechanisms, which cannot be studied in vivo. These experiments also used complex tracing methods using adenoassociated virus and cre-lox recombinase approaches.

      Limitations.

      (1) The roles of the mechanisms identified in this study have not been established in models recording opioid-induced respiratory depression or respiratory activity. This study does not record, modulate, or assess respiratory activity in-vitro or in-vivo, without or with opioid drugs such as fentanyl or morphine.

      (2) Experiments are performed in-vitro which do not mimic the effects of opioids observed in-vivo or in freely-moving animals. However, identification of pre- and post- synaptic mechanisms, as well as projections, cannot be performed in-vivo, so the authors use the right approaches for their experiments.

      We agree with both of these points. We hope this study lays the groundwork for future studies assessing the impact of these projections on respiratory activity in vitro and in vivo.

      (3) The type of neurons projecting from KP to preBötzinger Complex or ventral respiratory group have not been identified. Although some of these cells are glutamatergic, optogenetic experiments could have been performed in other cre-expressing cell populations, such as neurokinin-1 receptors.

      There are indeed many different cell populations that could be interrogated. In addition to the optogenetic identification of glutamatergic projections, we identified immunohistochemically that at least some opioid receptor-expressing, medullary-projecting KF neurons express FoxP2, and not CGRP. Further dissection of other cell populations, such as those expressing Lmx1b and Phox2b, is an excellent future direction.

      Reviewer #3 (Public Review):

      This manuscript reveals opioid suppression of breathing could occur via multiple mechanisms and at multiple sites in the pontomedullary respiratory network. The authors show that opioids inhibit an excitatory pontomedullary respiratory circuit via three mechanisms: 1) postsynaptic MOR-mediated hyperpolarization of KF neurons that project to the ventrolateral medulla, 2) presynaptic MOR mediated inhibition of glutamate release from dorsolateral pontine terminals onto excitatory preBötC and rVRG neurons, and 3) postsynaptic MOR-mediated hyperpolarization of the preBötC and rVRG neurons that receive pontine glutamatergic input.

      This manuscript describes in detail a useful method for dissecting the relationship between the dorsolateral pons and the rostral medulla, which will be useful for various researchers. It's also great to see how many different methods have been applied to improve the accuracy of the results.

      1. Relationship between the dorsolateral pons and rostral ventrolateral medulla.

      The method of this paper is a good paper to show a very precise relationship between the presence of opioid receptors and the dorsolateral pons and rostral ventrolateral medulla, and for opioid receptors, based on the expression of Oprm1, the use of genetically modified mice with anterograde or retrograde viruses with additional fluorescent colors showed both anterograde and retrograde projections, revealing a relationship between the dorsolateral pons and rostral ventrolateral medulla.

      For example, to visualize dorsal pontine neurons expressing Oprm1, Oprm1Cre/Cre mice were crossed with Ai9tdTomato Cre reporter mice to generate Ai9tdT/+ oprm1Cre/+ mice (Oprm1Cre/tdT mice) expressing tdTomato on neurons that also express MOR at any point during development, and the retrograde virus encoding Cre-dependent expression of GFP (retrograde AAV-hSIN-DIO-eGFP) was injected into the respiratory center of Oprm1Cre/+ mice and into the ventral respiratory neuron group, showing that KF neurons expressing Oprm1 project to the respiration-related nucleus of the ventrolateral medulla.

      However, although the authors have also corrected it, the virus may spread to other places as well as where they thought it would be injected, and it is important to note that it is injected accordingly to mark the injection site with an anterograde virus encoding a different fluorescent color mCherry, and the extent of the injection is quantified, which is excellent as a control experiment.

      In addition, the respiratory center seems to be related not only to preBötC but also to pFRG recently, so if the relation with it is described, it is important from the viewpoint of the effect on the respiratory center and the effect on the rhythm.

      Our injections centered in preBötC, rVRG or BötC did not spread extensively to slices containing 7N/pFRG (Figure 2C and Figure 2-supplement 1D, Bregma -6.0 to -6.4, shaded region labeled 7N).

    1. Author Response:

      eLife assessment

      This manuscript analyzes large-scale Neuropixels recordings from visual areas and hippocampus of mice passively viewing repeated clips of a movie and reports that neurons respond with elevated firing activities to specific, continuous sequences of movie frames. The important results support a role of rodent hippocampal neurons in general episode encoding and advance understanding of visual information processing across different brain regions. The strength of evidence for the primary conclusion is solid, but some technical limitations of the study were identified that merit further analyses.

      We thank the editors and reviewers for the assessment and reviews. We have provided clarifications and updated the manuscript to address the seeming technical limitations, which are perhaps due to some misunderstanding; please see below. We provide additional results that isolate the contribution of pupil diameter, sharp-wave ripples and theta power to show that movie tuning cannot be explained by these nonspecific effects. Nor are these mere time cells or some other internally generated patterns, owing to the many differences highlighted below.

      Reviewer #1 (Public Review):

      Taking advantage of a publicly available dataset, neuronal responses in both the visual and hippocampal areas to passive presentation of a movie are analyzed in this manuscript. Since the visual responses have been described in a number of previous studies (e.g., see Refs. 11-13), the value of this manuscript lies mostly on the hippocampal responses, especially in the context of how hippocampal neurons encode episodic memories. Previous human studies show that hippocampal neurons display selective responses to short (5 s) video clips (e.g. see Gelbard-Sagiv et al, Science 322: 96-101, 2008). The hippocampal responses in head-fixed mice to a longer (30 s) movie as studied in this manuscript could potentially offer important evidence that the rodent hippocampus encodes visual episodes.

      We have now included citations to Gelbard-Sagiv et al. Science 2008 paper and many other references too, thank you for pointing that out. There are major differences between that study and ours.

      • The movies used in previous study contained very familiar, famous people and famous events, and the experiment was about the patient’s ability to recall those famous movie episodes. In our case the mice had seen this movie clip only twice before.

      • They did not look at the fine structure of neural responses below half a second whereas we looked at the mega-scale representations from 30ms to 30s.

      • The movie clips in that study were in full color with audio, we used an isoluminant, black-and-white, silent movie clip.

      • Their movie clips contained humans and were observed by humans, whereas in our study mice observed a movie clip with humans and no mice or other animals.

      The analysis strategy is mostly well designed and executed. A number of factors and controls, including baseline firing, locomotion, frame-to-frame visual content variation, are carefully considered. The inclusion of neuronal responses to scrambled movie frames in the analysis is a powerful method to reveal the modulation of a key element in episodic events, temporal continuity, on the hippocampal activity. The properties of movie fields are comprehensively characterized in the manuscript.

      Thank you.

      Although the hippocampal movie fields appear to be weaker than the visual ones (Fig. 2g, Ext. Fig. 6b), the existence of consistent hippocampal responses to movie frames is supported by the data shown. Interestingly, in my opinion, a strong piece of evidence for this is a "negative" result presented in Ext. Fig. 13c, which shows higher than chance-level correlations in hippocampal responses to same scrambled frames between even and odd trials (and higher than correlations with neighboring scrambled frames). The conclusion that hippocampal movie fields depend on continuous movie frames, rather than a pure visual response to visual contents in individual frames, is supported to some degree by their changed properties after the frame scrambling (Fig. 4).

      Yes, hippocampal selectivity is not entirely abolished with scrambled movie, as we show in several figures (Fig 4d,g and Extended Data Fig. 16), but it is greatly reduced, far more than in the afferent visual cortices. The fraction of tuned cells for scrambled movies dropped to 4.5% in hippocampus, which is close to the chance level of 3%. In contrast, in visual areas selectivity was still above 80%.

      Significant overlap between even and odd trials is to be expected for the tuned cells. Without a significant overlap, i.e. a stable representation, they would not be tuned. Despite this, the correlation between even and odd trials for the (only 4.5% of) tuned cells in the hippocampus was more than 2-fold smaller than that for the (more than 80% of) cells in visual cortices. This strongly supports our hypothesis that, unlike visual cortices, hippocampal subfields depended very strongly on the continuity of visual information. We will clarify this in the main text.

      However, there are two potential issues that could complicate this main conclusion.

      One issue is related to the effect of behavioral variation or brain state. First, although the authors show that the movie fields are still present during low-speed stationary periods, there is a large drop in the movie tuning score (Z), especially in the hippocampal areas, as shown in Ext. Fig. 3b (compared to Ext. Fig. 2d). This result suggests a potentially significant enhancement by active behavior.

      There seems to be some misunderstanding here. There was no major reduction in movie tuning during immobility or active running. As we wrote in the manuscript, the drop in selectivity during purely immobile epochs is because of a reduction in the amount of data, not a reduction in selectivity per se. Specifically, as the amount of data reduces, the statistical strength of tuning (z-scored sparsity) reduces. For example, if we split the total of 60 trials worth of data into two parts, the amount of data reduces to about half in each part, leading to a seeming reduction in selectivity in both halves. Extended figure 2B shows nearly identical tuning in all brain regions during immobility and in equivalent subsamples chosen randomly from the entire data, including mobility and immobility. We will include additional data in the revised manuscript to demonstrate this more clearly. Please see below for more details.

      Second, a general, hard-to-tackle concern is that neuronal responses could be greatly affected by changes in arousal or brain state (including drowsy or occasional brief slow-wave sleep state) in head-fixed animals without a task. Without the analysis of pupil size or local field potentials (LFPs), the arousal states during the experiment are difficult to know.

      In the revised manuscript we will show that the behavioral state effects cannot explain movie tuning. Specifically:

      • a. We compare sessions in which the mouse was mostly immobile versus sessions in which the mouse was mostly running. Movie tuned cells were found in both these cases (Extended Data Fig. 7).

      • b. We detect and remove all data around sharp-wave ripples (SWR). Movie tuning was unchanged in the remaining data.

      • c. As a further control, we quantified arousal by two standard metrics. First within a session, we split the data into two groups, segments with high theta power and segments with low theta power. Significant movie tuning persisted in both.

      • d. Finally, pupil dilation is another common method to estimate arousal, so data within a session were split into two parts: those with pupil dilation versus constriction. Movie tuning remained significant in both parts. See the new Extended Data Fig. 7.

      Many example movie fields in the presented raw data (e.g., Fig. 1c, Ext. Fig. 4) are broad with low-quality tuning, which could be due to broad changes in brain states. This concern is especially important for hippocampal responses, since the hippocampus can enter an offline mode indicated by the occurrence of LFP sharp-wave ripples (SWRs) while animals simply stay immobile. It is believed that the ripple-associated hippocampal activity is driven mainly by internal processing, not a direct response to external input (e.g., Foster and Wilson, Nature 440: 680, 2006). The "actual" hippocampal movie fields during a true active hippocampal network state, after the removal of SWR time periods, could have different quantifications that impact the main conclusion in the manuscript.

      We included the broadly tuned hippocampal neurons to demonstrate the movie-field broadening compared to those in visual areas. We will include more examples with sharp movie fields in the hippocampal regions (Main figure 1a-d right column, 2d and h, Extended Data Fig 5 and 8). Further, as stated above, we detected sharp-wave ripples and removed one second of data around SWR. Movie tuning was unchanged in the remaining data. Thus, movie tuning is not generated internally via SWR (Extended Data Fig. 6). See also Extended Data 7 and 8 and the response above.
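
      As an illustration of the kind of exclusion described above, the following is a minimal sketch, not the authors' pipeline. The function name `keep_outside_events` and the default `half_window_s=0.5` are assumptions: the response says only that one second of data around each SWR was removed, and a window centered on the event is one possible reading.

```python
def keep_outside_events(sample_times, event_times, half_window_s=0.5):
    """Return a mask that is True where a sample lies farther than
    half_window_s (seconds) from every event time.

    half_window_s=0.5 assumes a 1 s window centered on each event;
    this is an illustrative choice, not a value confirmed by the text.
    """
    return [
        all(abs(t - e) > half_window_s for e in event_times)
        for t in sample_times
    ]


# Hypothetical sample times (s) and a single hypothetical ripple at t = 1.0 s.
sample_times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
ripple_times = [1.0]

mask = keep_outside_events(sample_times, ripple_times)
kept = [t for t, keep in zip(sample_times, mask) if keep]
print(kept)
```

      Tuning metrics would then be recomputed on the retained samples only and compared with the values obtained from the full data.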

      Another issue is related to the relative contribution of direct visual response versus the response to temporal continuity in movie fields. First, the data in Ext. Fig. 8 show that rapid frame-to-frame changes in visual contents contribute largely to hippocampal movie fields (similarly to visual movie fields).

      There seems to be some misunderstanding here. That figure showed that the frame-to-frame changes in the visual content had the highest effect on MSUA in visual areas and a much weaker effect in the hippocampus (Extended Data Fig. 8, as per previous version). For example, the depth of modulation (max – min) / (max + min) for MSUA was 21% and 24% for V1 but below 6% for hippocampal regions. Similarly, the MSUA was more strongly (negatively) correlated with F2F correlation for visual areas (r=0.48 to 0.56) than hippocampal (0.07 to 0.3). Similarly, comparing the number of peaks or their median widths, visual regions showed stronger correlation with F2F, and larger depth of modulation, than hippocampal regions, barring a handful of exceptions (like CA3 correlation between F2F and median peak duration). This strongly supports our claim that visual regions generated a far greater response to the frame-to-frame changes in the movie than hippocampal regions.
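
      The depth-of-modulation formula quoted above, (max – min) / (max + min), can be written as a short function. This is a minimal sketch; the function name and the example rates are hypothetical and are not taken from the data.

```python
def depth_of_modulation(rates):
    """Return (max - min) / (max + min) for a sequence of non-negative rates."""
    highest, lowest = max(rates), min(rates)
    if highest + lowest == 0:
        return 0.0  # a silent response carries no modulation
    return (highest - lowest) / (highest + lowest)


# Hypothetical multi-unit firing rates (Hz), one value per time bin.
example_msua = [8.0, 10.0, 12.0]
depth = depth_of_modulation(example_msua)
print(depth)
```

      The metric is bounded between 0 (flat response) and 1 (response that falls to zero in at least one bin), which is what makes percentages such as those quoted above comparable across regions with different mean rates.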

      Interestingly, the data show that movie-field responses are correlated across all brain areas including the hippocampal ones.

      The changes in multiunit activity are strongly correlated only between visual areas and some of the hippocampal region pairs. The correlation is much weaker for hippocampal areas, or hippocampal-visual area pairs. This will be quantified explicitly in the revised text Extended Data Fig. 11 with an additional correlation matrix. Further, in Fig 3c we compared the MSUA responses with normalization between brain regions. Amongst the 21 possible brain region pairs, 5 were uncorrelated, 7 were significantly negatively correlated and 9 were significantly correlated.

      This could be due to heightened behavioral arousal caused by the changing frames as mentioned above, or due to enhanced neuronal responses to visual transients, which supports a component of direct visual response in hippocampal movie fields.

      As shown in Extended Data 7 and 8 and described above, the effect of arousal as quantified by theta power or pupil diameter cannot explain the results in hippocampal areas, and the correlations in multiunit responses are unrelated across many brain areas.

      Second, the data in Ext. Fig. 13c show a significant correlation in hippocampal responses to same scrambled frames between even and odd trials, which also suggests a significant component of direct visual response.

      This is plausible. The fraction of hippocampal cells which were significantly tuned for the scrambled presentation (4.5%) was close to chance level (3%), and this small subset of cells was used to compute the population overlap between even and odd trials in Ext Fig. 13 (old numbering). As described above, this significant but small amount of tuning could generate significant population overlap, which is to be expected by construction.

      Is there a significant component purely due to the temporal continuity of movie frames in hippocampal movie fields? To support that this is indeed the case, the authors have presented data that hippocampal movie fields largely disappear after movie frames are scrambled. However, this could be caused by the movie-field detection method (it is unclear whether single-frame field could be detected).

      As described in the methods section, the movie-field detection algorithm had a resolution of 3.3 ms, which ensured that we could detect single-frame fields. As reported, we did find such short movie fields in several cells in the visual areas. The sparsity metric used is agnostic to the ordering of the responses, and hence single-frame fields, and the resultant significant movie tuning, if present, can be detected by our methods.

      Another concern in the analysis is that movie-fields are not analyzed on re-arranged neural responses to scrambled movie frames. The raw data in Fig. 4e seem quite convincing. Unfortunately, the quantifications of movie fields in this case are not compared to those with the original movie.

      We saw very few (3.6-4.9%) cells with significant movie tuning for scrambled presentation in the hippocampus. Hence, we did not quantify this earlier. This is now provided in new Extended Data Fig. 16. The amount of movie tuning for the scrambled presentation taken as-is, or after rearranging the frames is below 5% for all hippocampal brain regions.

      Reviewer #2 (Public Review):

      […] The authors have concluded that the neurons in the thalamo-cortical visual areas and the hippocampus commonly encode continuous visual stimuli with their firing fields spanning the mega-scale, but they respond to different aspects of the visual stimuli (i.e., visual contents of the image versus a sequence of the images). The conclusion of the study is fairly supported by the data, but some remaining concerns should be addressed.

      1) Care should be taken in interpreting the results since the animal's behavior was not controlled during the physiological recording.

      This was done intentionally since plenty of research shows that task demand (e.g., Aronov and Tank, Nature 2017) can not only modulate hippocampal responses but also dramatically alter them. We have now provided additional figures (Extended Data Fig. 6 and 7) where we quantified the effects of the behavioral states (sharp wave ripples, theta power and pupil diameter), as well as the effect of locomotion (Extended Data Fig. 4). Movie tuning remained unaffected with these manipulations. Thus, movie tuning cannot be attributed to behavioral effects.

      It has been reported that some hippocampal neuronal activities are modulated by locomotion, which may still contribute to some of the results in the current study. Although the authors claimed that the animal's locomotion did not influence the movie-tuning by showing the unaltered proportion of movie-tuned cells with stationary epochs only, the effects of locomotion should be tested in a more specific way (e.g., comparing changes in the strength of movie-tuning under certain locomotion conditions at the single-cell level).

      Single cell analysis of the effect of locomotion and visual stimulation is underway, and beyond the scope of the current work. As detailed in Extended Data Fig. 4, we have ensured that movie tuning persists in spite of the removal of running or stationary epochs, as well as the removal of sharp wave ripple events (Extended Data Fig. 6). Further, we will provide examples of strongly tuned cells from sessions with predominantly running or predominantly stationary behavior (Extended Data Fig. 7).

      2) The mega-scale spanning of movie-fields needs to be further examined with a more controlled stimulus for reasonable comparison with the traditional place fields. This is because the movie used in the current study consists of a fast-changing first half and a slow-changing second half, and such varying and ununified composition of the movie might have largely affected the formation of movie-fields. According to Fig. 3, the mega-scale spanning appears to be driven by the changes in frame-to-frame correlation within the movie. That is, visual stimuli changing quickly induced several short fields while persisting stimuli with fewer changes elongated the fields.

      Please note that the speed at which the movie scene changed across frames was strongly correlated with movie-field width in the visual areas, but that correlation was much weaker in the hippocampal areas (see above). Please see Extended Data Fig. 11 and the quantification of correlation between frame-to-frame changes in the movie and the properties of movie fields.

      The presentation of persisting visual input for a long time is thought to be similar to staying in one place for a long time, and the hippocampal activities have been reported to manifest in different ways between running and standing still (i.e., theta-modulated vs. sharp wave ripple-based). Therefore, it should be further examined whether the broad movie-fields are broadly tuned to the continuous visual inputs or caused by other brain states.

      As shown in Extended Data Fig. 6, movie field properties are largely unchanged when SWR are removed from the data, or when the effects of pupil diameter or theta power are accounted for (Extended Data Fig. 7).

      3) The population activities of the hippocampal movie-tuned cells in Fig. 3a-b look like those of time cells, tiling the movie playback period. It needs to be clarified whether the hippocampal cells are actively coding the visual inputs or just filling the duration.

      Tiling patterns would be observed when the maxima are sorted in any data, even for random numbers. This alone does not make them time cells. The following observations suggest that movie fields cannot be explained as being time cells.

      • a. Time cells mostly cluster at the beginning of a running epoch (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011) and they taper off towards the end. Such large clustering is not visible in these tiling plots for movie tuned cells.

      • b. Time fields become wider as the encoded temporal duration increases (Pastalkova et al. Science 2008, MacDonald et al. Neuron 2011). This is not evident in any movie fields.

      • c. Widths of movie fields in visual areas, and to a smaller extent in the hippocampal areas, were clearly modulated by the visual content, like the change from one frame to the next (F2F correlation, Extended Data Fig. 11).

      • d. Tiling pattern of movie fields was found in visual areas too, with qualitatively similar pattern as hippocampus. Clearly, visual area responses are not time cells, as shown by the scrambled stimulus experiment. Here, neural selectivity could be recovered by rearranging them based on the visual content of the continuous movie, and not the passage of time.

      The scrambled condition in which the sequence of the images was randomly permutated made the hippocampal neurons totally lose their selective responses, failing to reconstruct the neural responses to the original sequence by rearrangement of the scrambled sequence. This result indirectly addressed that the substantial portion of the hippocampal cells did not just fill the duration but represented the contents and temporal order of the images. However, it should be directly confirmed whether the tiling pattern disappeared with the population activities in the scrambled condition (as shown in Extended Data Fig. 11, but data were not shown for the hippocampus).

      As stated above for the continuous movie, tiling pattern alone does not mean those are time cells. Further, tuning, and tiling pattern remained intact with scrambled movie in the visual cortices but not in hippocampus.
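
      The point that sorting by peak position produces a tiling pattern even for random numbers can be checked with a small simulation. This is an illustrative sketch only: the matrix size, the seed and the helper names are arbitrary assumptions, and the "responses" are pure noise.

```python
import random


def peak_positions(responses):
    """Column index of the maximum of each row (one row per cell)."""
    return [row.index(max(row)) for row in responses]


def sort_by_peak(responses):
    """Reorder rows so that cells with earlier peaks come first."""
    return sorted(responses, key=lambda row: row.index(max(row)))


rng = random.Random(0)
n_cells, n_bins = 200, 50

# Pure noise: no cell has any real tuning to time or to stimulus content.
noise = [[rng.random() for _ in range(n_bins)] for _ in range(n_cells)]

sorted_noise = sort_by_peak(noise)
peaks = peak_positions(sorted_noise)

# After sorting, the peak column never decreases from one row to the next,
# so a heat map of sorted_noise shows a diagonal band that "tiles" the axis.
print(peaks[:5], peaks[-5:])
```

      Because the diagonal is produced by the sorting step itself, a control such as sorting on one half of the trials and plotting the other half is needed before a tiling plot can be read as evidence of tuning.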

      Reviewer #3 (Public Review):

      […] The paper is conceptually novel since it specifically aims to remove any behavioral or task engagement whatsoever in the head-fixed mice, a setup typically used as an open-loop control condition in virtual reality-based navigational or decision making tasks (e.g. Harvey et al., 2012). Because the study specifically addresses this aspect of encoding (i.e. exploring effects of pure visual content rather than something task-related), and because of the widespread use of video-based virtual reality paradigms in different sub-fields, the paper should be of interest to those studying visual processing as well as those studying visual and spatial coding in the hippocampal system. However, the task-free approach of the experiments (including closely controlling for movement-related effects) presents a Catch-22, since there is no way that the animal subjects can report actually recognizing or remembering any of the visual content we are to believe they do.

      Our claim is that these are movie scene evoked responses. We make no claims about the animal’s ability to recognize or remember the movie content. That would require an entirely different set of experiments. Meanwhile, we have shown that these results are not an artifact of brain states such as sharp wave ripples, theta power or pupil diameter (Extended Data Fig. 6 and 7) or running behavior (Extended Data Fig. 4). Please see above for a detailed response.

      We must rely on above-chance-level decoding of movie segments, and the requirement that the movie is played in order rather than scrambled, to indicate that the hippocampal system encodes episodic content of the movie. So the study represents an interesting conceptual advance, and the analyses appear solid and support the conclusion, but there are methodological limitations.

      It is important to emphasize that these responses could constitute episodic responses but do not prove episodic memory, just as place cell responses constitute spatial responses but do not prove spatial memory. The link between place cells and place memory is not entirely clear. For example, mice lacking NMDA receptors have intact place cells, but are impaired in spatial memory tasks (McHugh et al. Cell 1996), whereas spatial tuning was virtually destroyed in mice lacking GluR1 receptors, but they could still do various spatial memory tasks (Resnik et al. J. Neuro 2012). Testing episodic memory would require an entirely different set of experiments that involve task demand and behavioral response, which in turn would modify hippocampal responses substantially, as shown by many studies. Our hypothesis here is that, just like place cells, these episodic responses without task demand would play a role, to be determined, in episodic memory. We will emphasize this point in the main text (Ln 432-436 in the revised manuscript).

      Major concerns:

      1) A lot hinges on the cells having a z-scored sparsity >2, the cutoff for a cell to be counted as significantly modulated by the movie. What is the justification of this criterion?

      The z-scored sparsity (z>2) corresponds to p<0.03. This would mean that 3% of the results could appear by chance. Hence, z>2 is a standard method used in many publications. Another advantage of z-scored sparsity is that it is relatively insensitive to the number of spikes generated by a neuron (i.e. the mean firing rate of the neuron and the duration of the experiment). In contrast, sparsity is strongly dependent on the number of spikes which makes it difficult to compare across neurons, brain regions and conditions (See Supplement S5 Acharya et al. Cell 2016). To further address this point, we compared our z-scored sparsity measure with 2 other commonly used metrics to quantify neural selectivity, depth of modulation and mutual information (Extended Data Fig. 3). Comparable movie tuning was obtained from all 3 metrics, upon z-scoring in an identical fashion.
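
      To make the z>2 criterion concrete, the sketch below computes a sparsity value, z-scores it against a shuffled null, and evaluates the one-sided normal tail at z = 2. Several parts are assumptions made for illustration rather than the authors' method: the sparsity form 1 - (Σr)² / (N Σr²), the null built by circularly shifting each trial by a random amount, the number of shuffles, and the toy raster.

```python
import math
import random


def sparsity(rates):
    """1 - (sum r)^2 / (N * sum r^2); 0 for a flat response (assumed form)."""
    n = len(rates)
    total = sum(rates)
    sum_sq = sum(r * r for r in rates)
    if sum_sq == 0:
        return 0.0
    return 1.0 - (total * total) / (n * sum_sq)


def trial_average(trials):
    """Mean response per bin across trials (trials: list of equal-length lists)."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def z_scored_sparsity(trials, n_shuffles=200, seed=0):
    """Z-score the observed sparsity against a circular-shift null (assumed null)."""
    rng = random.Random(seed)
    observed = sparsity(trial_average(trials))
    n_bins = len(trials[0])
    null = []
    for _ in range(n_shuffles):
        shifted = []
        for trial in trials:
            k = rng.randrange(n_bins)  # independent random shift per trial
            shifted.append(trial[k:] + trial[:k])
        null.append(sparsity(trial_average(shifted)))
    mean = sum(null) / len(null)
    std = math.sqrt(sum((s - mean) ** 2 for s in null) / len(null))
    return (observed - mean) / std


def one_sided_p(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy raster: 20 trials x 30 bins, one spike in the same bin on every trial.
tuned = [[0] * 5 + [1] + [0] * 24 for _ in range(20)]
z = z_scored_sparsity(tuned)
p_at_threshold = one_sided_p(2.0)
print(z > 2, round(p_at_threshold, 4))
```

      Under a normal approximation the one-sided tail at z = 2 is about 0.023, consistent with the p<0.03 quoted above. Because both the observed value and the null are computed from the same spikes, the z-score is less sensitive to spike count than the raw sparsity.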

      It should be stated in the Results. Relatedly, it appears the formula used for calculating sparseness in the present study is not the same as that used to calculate lifetime sparseness in de Vries et al. 2020 quoted in the results (see the formula in the Methods of the de Vries 2020 paper immediately under the sentence: "Lifetime sparseness was computed using the definition in Vinje and Gallant").

      The definition of sparsity we used is the one commonly used by hippocampal scientists (Treves and Rolls 1991, Skaggs et al. 1996, Ravassard et al. 2013). The lifetime sparseness equation used by de Vries et al. 2020 differs from ours by just one constant factor, (1-1/N), where N=900 is the number of frames in the movie. This constant factor equals (1-1/900)=0.999. Hence, there is practically no difference between the sparsity values obtained by these two methods. Further, z-scored sparsity is entirely unaffected by such constant factors. We will clarify this in the methods of the revised manuscript.
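
      The relation between the two definitions can be checked numerically. The sketch below assumes sparsity is 1 - <r>^2/<r^2> and takes lifetime sparseness in the Vinje and Gallant form, which divides the same quantity by (1-1/N). The helper names are hypothetical, and random numbers stand in for real tuning curves and shuffle nulls.

```python
import numpy as np

def sparsity(rates):
    """Sparsity of a tuning curve: 1 - <r>^2 / <r^2>."""
    rates = np.asarray(rates, dtype=float)
    return 1.0 - np.mean(rates) ** 2 / np.mean(rates ** 2)

def lifetime_sparseness(rates):
    """Vinje and Gallant form: the same quantity divided by (1 - 1/N)."""
    n = len(rates)
    return sparsity(rates) / (1.0 - 1.0 / n)

def zscore(observed, null_values):
    """z-score of an observed value against a null distribution."""
    null_values = np.asarray(null_values, dtype=float)
    return (observed - null_values.mean()) / null_values.std()

rng = np.random.default_rng(0)
rates = rng.random(900)            # stand-in tuning curve over N = 900 frames
null = rng.random(50)              # stand-in null sparsity values
scale = 1.0 / (1.0 - 1.0 / 900)    # constant relating the two definitions

ratio = lifetime_sparseness(rates) / sparsity(rates)
z_raw = zscore(sparsity(rates), null)
z_scaled = zscore(scale * sparsity(rates), scale * null)
print(ratio, z_raw, z_scaled)      # ratio equals scale; the two z-scores coincide
```

      Multiplying both the observed value and the null by the same constant rescales the mean and the standard deviation together, so the z-score is unchanged.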

      To rule out systematic differences between studies beyond differences in neural sampling (single units vs. calcium imaging), it would be nice to see whether calculating lifetime sparseness per de Vries et al. changed the fraction "movie" cells in the visual and hippocampal systems.

      As stated above, the two definitions of sparsity are virtually identical and we obtained similar results using two other commonly used metrics, which are detailed in Extended Data Fig. 3.

      2) In Figures 1, 2 and the supplementary figures, the sparseness scores should be reported along with the raw data for each cell, so the readers can be apprised of what types of firing selectivity are associated with which sparseness scores, as would be shown for metrics like gridness or Rayleigh vector lengths for head direction cells. It would be helpful to include this wherever there are plots showing spike rasters arranged by frame number & the trial-averaged mean rate.

      As shown in several papers (Aghajan et al Nature Neuroscience 2015, Acharya et al., Cell 2016), raw sparsity (or information content) is strongly dependent on the number of spikes of a neuron. This makes the raw values impossible to compare across cells, brain regions and conditions (please see Supplement S5 from Acharya et al., Cell 2016 for details). Including raw sparsity values would thus cause undue confusion. Hence, we provide z-scored sparsity. This metric is comparable across cells and brain regions, and is now provided above each example cell in Figure 1 and Extended Data Fig. 2.

      3) The examples shown on the right in Figures 1b and c are not especially compelling examples of movie-specific tuning; it would be helpful in making the case for "movie" cells if cleaner / more robust cells are shown (like the examples on the left in 1b and c).

      We did not put the most strongly tuned hippocampal neurons in the main figures, so that these cells are representative of the ensemble rather than the best possible ones and include examples with broad tuning responses. We have clarified in the legend that these cells are some of the best tuned cells. Although they are not the cleanest looking, the z-scored sparsity mentioned above the panels now indicates how strongly they are modulated compared to chance levels. Additional examples, including those with sharply tuned responses, are shown in Extended Data Fig. 5 and 8.

      4) The scrambled movie condition is an essential control which, along with the stability checks in Supplementary Figure 7, provides the most persuasive evidence that the movie fields reflect more than a passive readout of visual images on a screen. However, in reference to Figure 4c, can the authors offer an explanation as to why V1 is substantially less affected by the movie scrambling than its main input (LGN) and the cortical areas immediately downstream of it? This seems to defy the interpretation that "movie coding" follows the visual processing hierarchy.

      This is an important point, one that we find very surprising as well. Perhaps this is related to other surprising observations in our manuscript, such as more neurons appearing to be tuned to the movie than to the classic stimuli. A direct comparison between movie responses and fixed images is not possible at this point due to several additional differences, such as the duration of image presentations and their temporal history. The latency required to rearrange the scrambled responses (60ms for LGN, 74ms for V1, 91ms for AM/PM) supports the anatomical hierarchy. The pattern of movie tuning properties was also broadly consistent between V1 and AM/PM (Fig 2). However, all metrics of selectivity to the continuous movie (Fig 2) showed a consistent pattern that was the exact opposite of the simple anatomical hierarchy: V1 had stronger movie tuning, a higher number of movie fields per cell, narrower movie-field widths, larger mega-scale structure, and better decoding than LGN. V1 was also more robust to the scrambled sequence than LGN. One possible explanation is that there are other sources of input to V1, beyond LGN, that contribute significantly to movie tuning. This is an important insight and we will modify the discussion to highlight this.

      Relatedly, the hippocampal data do not quite fit with visual hierarchical ordering either, with CA3 being less sensitive to scrambling than DG. Since the data (especially in V1) seem to defy hierarchical visual processing, why not drop that interpretation? It is not particularly convincing as is.

      The anatomical organization is well established and an important factor. Even when observations do not fit the anatomical hierarchy, it provides important insights about the mechanisms. All properties of movie tuning (Fig 2), namely the strength of tuning, the number of movie peaks, their width and the decoding accuracy, firmly put visual areas upstream of hippocampal regions. But, just as in visual cortex, there are consistent patterns that do not support a simple feed-forward anatomical hierarchy. We have pointed out these patterns so that future work can build upon them.

      5) In the Discussion, the authors argue that the mice encode episodic content from the movie clip as a human or monkey would. This is supported by the (crucial) data from the scrambled movie condition, but is nevertheless difficult to prove empirically since the animals cannot give a behavioral report of recognition and, without some kind of reinforcement, why should a segment from a movie mean anything to a head-fixed, passively viewing mouse?

      We emphasize once again that our claim is about the nature of the encoding of the movie across these neurons. We make no claims about whether this forms a memory or whether the mouse is able to recognize the content or remember it. Despite decades of research, similar claims are difficult to prove for place cells, with plenty of counterexamples (see the points above). The important point here is that, regardless of any cognitive component, we see remarkably tuned responses in these brain areas. Determining their role in cognition would take a lot more effort and is beyond the scope of the current work.

      Would the authors also argue that hippocampal cells would exhibit "song" fields if segments of a radio song, equally arbitrary for a mouse, were presented repeatedly? (reminiscent of the study by Aronov et al. 2017, but if sound were presented outside the context of a task). How can one distinguish between mere sequence coding vs. encoding of episodically meaningful content? One or a few sentences on this should be added in the Discussion.

      Aronov et al 2017 found encoding of an audio sweep in the hippocampus when the animals were doing a task (releasing a lever at a specific frequency to obtain a reward). However, without a task demand they found that hippocampal neurons did not encode the audio sequence beyond chance levels. This is at odds with our findings with the movie, where we see strong tuning in the absence of any task demand or reward. Our results are consistent with, but go far beyond, our recent findings that hippocampal (CA1) neurons can encode the position and direction of motion of a revolving bar of light (Purandare et al. Nature 2022). Please see Ln 414-420 for related discussion.

      These responses are unlikely to be mere sequence responses, since the scrambled sequence was also a fixed sequence that was presented many times, and it elicited reliable responses in visual areas but not in hippocampus. Hence, we hypothesize that hippocampal areas encode temporally related information, i.e. episodic content. We will modify the discussion to address these points.

    1. Author Response:

      We thank the eLife editorial board and the reviewers for the assessment of our article. We look forward to thoroughly addressing their comments and concerns. We would like to correct one factual error in the consensus public review:

      “Importantly, the authors do not present evidence that value itself is stably encoded across days, despite the paper's title. The more conservative in its claims in the Discussion seems more appropriate: "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC."”

      The imaging sessions in which we identify value coding cells were in fact performed on separate days: Experimental Days 6 and 7 (see Figure 1b), which is evidence of the stability of value coding across consecutive days. Days 6 and 7 correspond to the third day of Odor Set 1 and the third day of Odor Set 2, respectively, which is why we referred to them both as “Day 3” in the manuscript, and this may have led to the confusion about the temporal relationship between these sessions. We will clarify this terminology in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this well-written manuscript, Afshar et al demonstrated the significant transcriptional and proteomic differences between cultured human umbilical vein endothelial cells (HUVECs) and those freshly isolated from the cords. They showed that TGFbeta and BMP signaling target genes were enriched in cord cells compared to those in culture. Extracellular matrix (ECM) and cell cycle-related genes were also different between the two conditions. Because master regulators of EC shear stress response genes, KLF2 and KLF4, were downregulated in culture, the authors sought to restore the in vivo transcriptional profile with the application of shear stress in an orbital shaker and dextran-containing media for various time periods. They showed that after 48 hours of shear stress the transcriptional profile of sheared cells correlated with in vivo transcriptional profile more significantly than static cultures. They also showed, using single cell RNAseq, that EC-smooth muscle cell cocultures resulted in changes in TGFbeta and NOTCH signaling pathways and rescued 9% of the in vivo transcriptional signatures.

      This is an important study that was elegantly executed. The authors should also be commended for making their data public; thereby, creating a valuable resource for vascular biologists.

      We much appreciate the comments and thank the reviewer for the time and effort evaluating the study.

      Reviewer #2 (Public Review):

      The authors profiled the transcriptome and proteome of human umbilical vein endothelial cells freshly isolated from in vivo and compared that with the same cells exposed to in vitro culture under different conditions, including static culture, flow, and co-culture with smooth muscle cells. The experiments were properly designed and performed. The authors also provided a reasonable and sound interpretation of their findings. This study provides valuable insights into how the culturing conditions impact on gene expression, encouraging the field to select their in vitro work setting appropriately. Overall, the manuscript is well-written and easy to follow.

      Several notable strengths include:

      1. Parallel transcriptome- and proteome-wide profiling of endothelial cells enabling the unbiased interrogation of gene expression and a genome-wide view of the impact of in vitro culture on endothelial transcriptome.

      2. The innovative experimental design and comparisons were done with genetically identical ECs (from the same donors) in vivo and in vitro.

      3. The analyses were robust and provided novel information on flow-dependent and cell context-dependent gene regulation, with the native freshly isolated cells as a baseline.

      4. The donor samples used in this study were diverse including Asian, White, Black, Latino, and American Indian samples which reduce racial background bias.

      Some points that can strengthen the study:

      A clear description of experimental and analytical details (e.g. how the comparisons were made) and more in-depth interpretation and discussion of the results, e.g. the complete genes that are rescued by flow and co-culture and potential synergy of these factors.

      We thank the reviewer for highlighting the strengths and appreciate the comments on experimental and analytical details, which have now been addressed in the revised manuscript. Specifically, we have expanded the discussion to cover potential synergy between flow and co-culture and to comment further on the rescued genes. We now also include a clearer description of experimental and analytical details (e.g. how the comparisons were made) and a more in-depth interpretation of the results, including the complete set of genes that are rescued by flow and co-culture.

      Reviewer #3 (Public Review):

      Afshar et al. performed RNA-seq and LC-MS of in vivo and in vitro HUVECs to identify the role of culture conditions on gene expression. Given the widespread use of HUVECs to study EC biology, these findings are interesting and can help design better in vitro experiments. There have been previous papers that compared in vivo and in vitro HUVECs, however, the depth of sequencing and analysis in this manuscript identifies some novel effects which should be accounted for in future in vitro experiments using ECs.

      Strengths:

      1. Major findings of distinct pathways affected by cell culture are novel and interesting. The authors identify major effects on TGFb and ECM gene expression. They also corroborate previous findings of flow response pathways, namely KLF2/4 and Notch pathway regulation.

      2. Use of multiple genomic methods to profile effects of culture conditions. The LC-MS data showed a significant correlation with RNA-seq, however, the data were not as strong so not used for subsequent analyses.

      3. Use of scRNA-seq to show the dynamic effects of co-culture and shear stress on ECs is very novel. However, the heterogeneity in the EC populations is not discussed in this manuscript.

      We would like to thank the reviewer for the in-depth analysis of our study and for highlighting the novelty and strength of the data. Note that we included comments in relation to EC heterogeneity as part of the limitations of this study (in the Discussion).

      Weaknesses:

      1. The physiological relevance of these changes in gene expression is not demonstrated in the manuscript. The authors claim the significance of their data is to improve in vitro culture to better represent in vivo biology. Is this the case with orbital shear stress? Do they rescue some functional effects in ECs with long-term shear stress? An angiogenesis, barrier function, or migration assay for HUVECs exposed to different conditions would help answer this question. A similar assay for cells after EC-VSMC co-culture would validate the importance of these stimuli.

      The reviewer is correct: our manuscript did not extend to physiological readouts, and we have now clearly acknowledged this as part of the limitations of the study. Notably, there is already extensive literature on the effects of different types of flow on several physiological parameters. For example, others have shown that laminar shear stress (by orbital or other means) reduces proliferation and migration (PMID: 31831023; PMID: 22012789; PMID: 12857765; PMID: 21312062; PMID: 15886673; PMID: 17323381), reduces inflammation (PMID: 34747636; PMID: 32951280), and improves barrier function (PMID: 20543206; PMID: 32457386; PMID: 12577139; PMID: 27246807; PMID: 31500313).

      From the outset, our objective was to bring granularity to the transcriptional changes associated with the transition from in vivo to in vitro. Further, it was our goal to identify the cohorts of transcripts that could, and those that could not, be rescued by altering culture conditions. Because we had transcriptional information from the identical samples at the time they were in the vessel, we have been able to fulfill this goal. We feel these are important and currently missing data that will be of value to many investigators.

      2. One explanation for the increased expression of ECM genes in vivo is that these cells are contaminated with VSMCs/fibroblasts. This could be very likely given that cells were not sorted or purified upon isolation. Expression of other VSMC or fibroblast-specific markers (i.e. CNN1, MYH11, SMTN, DCN, FBLN1) would help determine if there is some level of non-EC contamination.

      We thank the reviewer for this comment. Prompted by it, we have included a new figure (Supplemental Figure 1) and new panels in Supplemental Figure 5 that directly address this concern.

      Among the several pieces of data, we included scRNAseq from cells obtained immediately from the umbilical vein: three independent experiments sequenced together and shown in one UMAP (Supplemental Figure 1C). As can be appreciated, the vast majority of cells are endothelial, and the only other cell types present were blood cells (erythrocytes and CD45+ cells). No smooth muscle cells or fibroblasts were detected. These three examples are representative of a large number of scRNAseq datasets (35 from cords and cultures for this and other projects). Furthermore, our cultures are routinely evaluated by FACS (one example is provided in Supplemental Figure 1E). As illustrated in that example, we do not find cells that are not positive for CD31 and VE-Cadherin.

      We hope this information reveals the rigor of our studies and convinces the reviewer that the transcriptional changes observed are from endothelial cells.

      3. The use of scRNA-seq in Figure 4 is interesting. There appear to be 2 distinct EC populations in the co-cultured ECs. What are the marker genes for the 2 populations?

      Indeed, we and others (Kalluri et al., 2019) have noticed two distinct populations in vivo and also in cultured ECs, as pointed out by the reviewer. Evaluating whether these two subpopulations reflect two transcriptionally distinct groups or different states of cyclic expression patterns requires more thorough analysis and lineage-tracing studies, and is distinct from the focus of this manuscript. Nonetheless, we have made a point in the revised manuscript to highlight these possibilities.

      Reference: Kalluri, AS, Vellarikkal, SK, Edelman, ER, Nguyen, L, Subramanian, A, Ellinor PT, Regev, A, Kathiresan, S, Gupta, RM. Single Cell Analysis of the Normal Mouse Aorta Reveals Functionally Distinct Endothelial Cell Populations. Circulation, 2019. 140:147-163.

      4. The modest shifts in gene expression with shear stress and co-culture could be attributed to the batch effect. The authors describe 1 batch correction method (ComBat) in the bulk RNA-seq, but no mention of batch correction was noted in the scRNA-seq methods. The authors should ensure that batch effect correction in all data is adequate, and these results should be added to the manuscript.

      We thank the reviewer for this comment. Indeed, batch effects are a particularly important consideration when samples are prepared separately and/or sequenced at distinct times. Note that this was not the case in this study.

      For the scRNA-seq analysis, we removed the low-quality cells but did not use batch-effect correction methods, because the samples were prepared and run at the same time. That is, isolation was performed in parallel, generation of cDNA libraries was done concurrently, and sequencing was run in the same gel. The quality of the data (and lack of batch effect) was subsequently verified when the two mono-culture biological replicates were evaluated by Seurat and were found to overlap on the UMAP (Figure 4); the same applies to the two co-culture biological replicates. These results clearly indicate that there is no batch effect among these samples (as the samples were not processed in distinct batches).

      5. Table 1 shows ATAC-seq was done, however, no data from these experiments are provided in the manuscript.

      As mentioned (reviewer 2), we had performed ATACseq but decided to remove it from the manuscript for several reasons, and we apologize for overlooking the remaining reference in Table 1. We have now corrected this error.

      6. Shear stress was achieved with an orbital shaker, which the accompanying citation states introduces significant heterogeneity in the ECs. This is based on the location of the culture dish. Was this heterogeneity seen in the scRNA-seq data?

      Correct. We only use the 2/3 peripheral area of the plates and discard the central aspect of the plate. We have added clarifying language to the Methods > Shear stress application to reflect this: “Orbital shear stress (130 rpm) was applied to confluent cell cultures by using an orbital shaker positioned inside the incubator as previously discussed (32). The shear stress within the cell culture well corresponds to arterial magnitudes (11.5 dynes/cm2) of shear stress. To reduce issues associated with uniformity of shear stress, the endothelial cell monolayers in 6-well plates were lysed after removing center region using cell scraper (BD Falcon #35-3085) and washing with 1X HBSS (Corning #21-022-CV). The 1.8cm blade was circumferentially used in the center of the 6-well plate to remove the center of the monolayer that did not see the higher shear stress.”
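
      For readers who want to relate shaker speed to shear magnitude, a commonly used estimate for the maximal wall shear stress near the periphery of a dish on an orbital shaker is tau = a * sqrt(rho * eta * (2*pi*f)^3), where a is the orbit radius, rho the medium density, eta its dynamic viscosity and f the rotation frequency. The sketch below only illustrates that formula. The orbit radius and viscosity in the example are placeholder assumptions, not values reported by the authors, so the output is not expected to reproduce the 11.5 dynes/cm2 quoted above.

```python
import math

def orbital_shear_stress(rpm, orbit_radius_cm, viscosity_poise, density_g_cm3=1.0):
    """Estimate peak wall shear stress (dyn/cm^2) for a dish on an orbital shaker.

    Uses tau = a * sqrt(rho * eta * omega^3) with CGS units throughout:
    radius in cm, viscosity in poise (g/(cm*s)), density in g/cm^3.
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return orbit_radius_cm * math.sqrt(density_g_cm3 * viscosity_poise * omega ** 3)

# Placeholder inputs: 1 cm orbit radius and a water-like viscosity of 0.01 poise.
tau = orbital_shear_stress(rpm=130, orbit_radius_cm=1.0, viscosity_poise=0.01)
print(f"estimated peak shear stress: {tau:.2f} dyn/cm^2")
```

      In this estimate, shear stress scales linearly with orbit radius and with the square root of viscosity, which is one reason viscosity-raising additives such as dextran are used to reach higher shear at a fixed shaker speed.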

      7. It would be important to know whether the authors reproduce the findings from other papers that CD34 expression is reduced in cultured HUVECs:

      Muller AM, Cronen C, Muller KM, Kirkpatrick CJ: Comparative analysis of the reactivity of human umbilical vein endothelial cells in organ and monolayer culture. Pathobiology 1999;67:99-107. Delia D, Lampugnani MG, Resnati M, Dejana E, Aiello A, Fontanella E, Soligo D, Pierotti MA, Greaves MF: Cd34 expression is regulated reciprocally with adhesion molecules in vascular endothelial cells in vitro. Blood 1993;81:1001-1008.

      Thank you for this suggestion. Supplemental Excel 4 allows the reader to review single genes that are modulated by condition and in fact, consistent with all previous literature, CD34 expression is one of the most significantly decreased genes in cultured HUVECs (0.9, p=1E-5).

    1. Author Response

      Reviewer #1 (Public Review):

      1) I was confused about the nature of the short-term plasticity mechanism being modeled. In the Introduction, the contrast drawn is between synaptic rewiring and various plasticity mechanisms at existing synapses, including long-term potentiation/depression, and shorter-term facilitation and depression. And the synaptic modulation mechanism introduced is modeled on STDP (which is a natural fit for an associative/Hebbian rule, especially given that short-term plasticity mechanisms are more often non-Hebbian).

      Indeed, because of its associative nature, the modulation mechanism was envisioned to be STDP-like, i.e. on faster time scales than the complete rewiring of the network (via backpropagation) but slower time scales than things like STSP which, as the reviewer points out, are usually not considered associative. One thing we do want to highlight is that backpropagation and the modulation mechanism are certainly not independent of one another. During training, the network’s weights that are being adjusted by backpropagation are experiencing modulations, and said modulations certainly factor into the gradient calculation.

      We have edited the abstract and introduction to try to make the distinction of what we are trying to model clearer.

      1) cont: On the other hand, in the network models the weights being altered by backpropagation are changes in strength (since the network layers are all-to-all), corresponding more closely to LTP/LTD. And in general, standard supervised artificial neural network training more closely resembles LTP/LTD than changing which neurons are connected to which (and even if there is rewiring, these networks primarily rely on persistent weight changes at existing synapses).

      Although we did not highlight this particular biological mechanism because we wanted to keep the updates as general as possible, one could view the distinction in terms of early versus late LTP. We have added to the discussion section an account of how the associative modulation mechanism and backpropagation might map biologically onto this mechanism.

      1) cont: Moreover, given the timescales of typical systems neuroscience tasks with input coming in on the 100s of ms timescale, the need for multiple repetitions to induce long-term plasticity, and the transient nature/short decay times of the synaptic modulations in the SM matrix, the SM matrix seems to be changing on a timescale faster than LTP/LTD and closer to STP mechanisms like facilitation/depression. So it was not clear to me what mechanism this was supposed to correspond to.

      We note that although the structure of the tasks certainly resembles known neuroscience experiments that happen on shorter time scales (and with the introduction of the 19 new NeuroGym tasks, even more so), we did not have a particular time scale for task effects in mind. So each piece of “evidence” in the integration tasks may indeed occur over significantly slower time scales and could abstractly represent multiple repetitions in order to induce (say) early phase LTP.

      Given that the separation between the two plasticity mechanisms may be clearer for STSP, and indeed many of the tasks we investigate may more naturally be mapped to tasks that occur on time scales more relevant to STSP, we have introduced a second modulation rule that is only dependent upon the presynaptic firing rates. See our response to the Essential Revisions above for additional details on these new results.

      2) A number of studies have explored using short-term plasticity mechanisms to store information over time and have found that these mechanisms are useful for general information integration over time. While many of these are briefly cited, I think they need to be further discussed and the current work situated in the context of these prior studies. In particular, it was not clear to me when and how the authors' assumptions differed from those in previous studies, which specific conclusions were novel to this study, and which conclusions are true for this specific mechanism as opposed to being generally true when using STP mechanisms for integration tasks.

      We have added additional works to the related works sections and expanded the introduction to better convey the differences between our work and previous studies. Briefly, our assumptions differed from those of previous studies mainly in that we considered a network that relied only on synaptic modulations to do computations, rather than a network with both recurrence and synaptic modulations. This allowed us to isolate the computational power and behavior of computing using synaptic modulations alone.

      It is hard to say which of the conclusions are generally true when using STP mechanisms for integration tasks without a comprehensive comparison of the various models of STP on the same tasks we investigated here. That being said, we believe this work presents conclusions that are not present in other works (as far as we are aware), including: (1) a demonstration of the strength of computing with synaptic modulations on a large variety of sequential tasks, (2) an investigation into the dynamics of such computations and how they might manifest in neuronal recordings, and (3) a brief look at how these different dynamics might be computationally beneficial in neuroscience-relevant areas. We also note that one reason for the simplicity of our mechanism is that we believe it captures many effects of synaptic modulations (e.g. a gradual increase/decrease of synaptic strength that eventually saturates) with a relatively simple expression, and so we believe other STP mechanisms would yield qualitatively similar results. We have edited the text to clarify when conclusions are novel to this study and when we are referencing results from other works.

      Reviewer #2 (Public Review):

      On the other hand, the general principle appears (perhaps naively) very general: any stimulus-dependent, sufficiently long-lived change in neuronal/synaptic properties is a potential memory buffer. For instance, one might wonder whether some non-associative form of synaptic plasticity (unlike the Hebbian-like form studied in the paper), such as short-term synaptic plasticity which depends only on the pre-synaptic activity (and is better motivated experimentally), would be equally effective. Or, for that matter, one might wonder whether just neuronal adaptation, in the hidden layer, for instance, would be sufficient. In this sense, a weakness of this work is that there is little attempt at understanding when and how the proposed mechanism fails.

      We have tried to address whether the simplicity of the tasks considered in this work may be a reason for the MPN’s success by training it on 19 additional neuroscience tasks (see response to Essential Revisions above). Across all these additional tasks, we found the MPN performs comparably to its RNN counterparts.

      To address whether associativity is necessary in our setup we have introduced a version of the MPN that has modulation updates that are only presynaptic dependent. We call this the “MPNpre” and have added several results across the paper addressing its computational abilities (again, additional details are provided above in Essential Revisions). We find the MPNpre has dynamics that are qualitatively the same as its MPN counterpart and has very comparable computational capabilities.

      Certainly, some of the tasks we consider may also be solvable by introducing other forms of computation such as neuronal adaptation. Indeed, we believe the ability of the brain to solve tasks in so many different ways is one of the things that makes it so difficult to study. Our work here has attempted to highlight one particular way of doing computations (via synapse dynamics) and compared it to one particular other form (recurrent connections). Extending this work to even more forms of computation, including neuronal dynamics, would be very interesting and further help distinguish these different computational methods from one another.

      Reviewer #3 (Public Review):

      Because the MPN is essentially a low-pass filter of the activity, and the activity is the input - it seems that integration is almost automatically satisfied by the dynamics. Are these networks able to perform non-integration tasks? Decision-making (which involves saddle points), for instance, is often studied with RNNs.

      We have tested the MPN on 19 additional supervised learning tasks found in the NeuroGym package (Molano-Mazon et al., 2022), which includes several decision-making-based tasks, and added these results to the main text (see response to Essential Revisions above, and also Figs. 7i & 7j). Across all tasks we investigated, we found the MPN performs at comparable levels to its RNN counterparts.

      Manuel Molano-Mazon, Joao Barbosa, Jordi Pastor-Ciurana, Marta Fradera, Ru-Yuan Zhang, Jeremy Forest, Jorge del Pozo Lerida, Li Ji-An, Christopher J Cueva, Jaime de la Rocha, et al. “NeuroGym: An open resource for developing and sharing neuroscience tasks”. (2022).

      The current work has some resemblance to reservoir computing models. Because the M matrix decays to zero eventually, this is reminiscent of the fading memory property of reservoir models. Specifically, the dynamic variables encode a decaying memory of the input, and - given large enough networks - almost any function of the input can be simply read out. Within this context, there were works that studied how introducing different time scales changes performance (e.g., Schrauwen et al 2007).

      Thank you for pointing out this resemblance and work. In our setup, the fact that lambda is the same for the entire network means all elements of M decay uniformly (though the learned modulation updates may allow for the growth of M to be non-uniform). One modification that we think would be very interesting to explore is the effect on the dynamics of non-uniform learning rates or decays across synapses. In this setting, the M matrix could have significantly different time scales and may even further resemble reservoir computing setups. We have added a sentence to the discussion section discussing this possibility.

      Another point is the interaction of the proposed plasticity rule with hidden-unit dynamics. What will happen for RNNs with these plasticity rules? I see why introducing short-term plasticity in a "clean" setting can help understand it, but it would be nice to see that nothing breaks when moving to a complete setting. Here, too, there are existing works that tackle this issue (e.g., Orhan & Ma, Ballintyn et al, Rodriguez et al).

      Thank you for pointing out these additional works. They are indeed very relevant, and we have added them all to the text where appropriate.

      Here we believe we have shown that either recurrent connections or synaptic dynamics alone can be used to solve a wide variety of neuroscience tasks. We don’t believe a hybrid setting with both synaptic dynamics and recurrence (e.g. a Vanilla RNN with synaptic dynamics) would “break” any part of this setup. Since the network could learn to suppress either computational mechanism, it could simply solve the task by relying on only one of the two. For example, it could use a strictly non-synaptic solution by driving eta (the learning rate of the modulations) to zero, or it could use a non-recurrent solution by driving the influence of recurrent connections to be very small. Orhan & Ma mention they have a hard time training a Vanilla RNN with Hebbian modulations on the recurrent weights for any modulation effect that goes back more than one time step, but unlike our work they rely on a fixed modulation strength.

      Indeed, we think how networks with multiple computational mechanisms will solve tasks is a very interesting question to be further investigated, and a hybrid solution may be likely. We believe our work is valuable in that it illuminates one end of the spectrum that is relatively unexplored: how such tasks could be solved using just synaptic dynamics. However, what type of solution a complete setup ultimately lands on is likely largely dependent upon both the initialization and the training procedure, so we felt exploring the dynamics of such networks was outside the scope of this work.

      One point regarding biological plausibility - although the model is abstract, the fact that the MPN increases without bounds is hard to reconcile with physical processes.

      Note that although the MPN expression does not have explicit bounds, in practice the exponential decay eventually does balance with the SM matrix updates, and so we observe a saturation in its size (Fig. 4c, except for the case of lambda=1.0, which is not considered elsewhere in the text). However, we explicitly added modulation bounds to the M matrix update expression and did not find it significantly changed the results (see comments on Essential Revisions above for details).
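
      The balance between decay and updates described above can be illustrated with a minimal sketch. It assumes a scalar stand-in for one element of M with an update of the form m_t = lambda * m_{t-1} + eta * drive, where the constant `drive` is a hypothetical placeholder for the activity-dependent term; the actual MPN update is matrix-valued and its exact form is not given in this response.

```python
# Scalar stand-in for one modulation element (an assumed form, not the paper's code):
#   m_t = lam * m_{t-1} + eta * drive
# With 0 <= lam < 1 the geometric series converges to eta * drive / (1 - lam).

def simulate_modulation(lam, eta, drive, steps, m0=0.0):
    """Iterate the decaying update and return the full trajectory."""
    m = m0
    trace = []
    for _ in range(steps):
        m = lam * m + eta * drive
        trace.append(m)
    return trace


def fixed_point(lam, eta, drive):
    """Saturation value of the update; only defined when decay is present."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("a finite fixed point requires 0 <= lam < 1")
    return eta * drive / (1.0 - lam)
```

      Under these assumptions the trajectory saturates whenever lambda is below one, while lambda=1.0 grows linearly with the number of steps, which is consistent with that case being the stated exception.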

    1. Author Response

      Reviewer #2 (Public Review):

      Here I will mainly comment on the biology of adipocytes, which is my specialty.

      In this manuscript, it has been very convincingly shown that O-GlcNAc acts as an important regulator of MSC differentiation in mice, and given previous studies in which O-GlcNAc is regulated by aging and nutritional status, it makes sense that this PTM determines differentiation and BM niche.

      The point that O-GlcNAc regulates adipocyte differentiation is convincing, but there are already previous studies using 3T3-L1 (e.g., Biochemical and Biophysical Research Communications 417 (2012) 1158-1163), and a more step-by-step demonstration of the molecular mechanism would make this an excellent paper that can be extended to adipocyte research in general, not just BM.

      While O-GlcNAc has been shown to regulate many aspects of metabolic physiology, our understanding of its role in adipogenesis has been limited so far. As the reviewer pointed out, there was an in vitro report on its inhibition of adipogenesis in 3T3-L1 cells (Ji et al., 2012). Two recent publications from Dr. Xiaoyong Yang’s group revealed the profound role of OGT in mature white adipocytes in regulating lipolysis and obesity (Li et al., 2018; Yang et al., 2020). To my knowledge, our manuscript is the first attempt to address the regulation of adipogenesis by O-GlcNAc in vivo. While using the BMSCs as a non-conventional model, we speculate our molecular mechanisms (i.e., O-GlcNAc inhibition of C/EBPβ) could be conserved in peripheral adipose organs, including white and brown adipose tissues. Future experiments are warranted in the lab to extend the current knowledge to these adipocyte progenitors. Nonetheless, I would also like to point out that, due to the broad actions of OGT and the current lack of adipocyte progenitor-specific Cre animal tools, such efforts might be futile, as results can be confounded by defects in other organs/cells.

      It is somewhat unclear whether or not the authors' in vitro experiments using 10T1/2 cells accurately reflect what is happening in vivo in knockout mice. The PDGFRa+VCAM1+ population of adipocyte progenitors shown by the authors is upregulated by about 30% by knockout of Ogt (Figure 4C). How significant is this difference? Rather, might the expression of Pparg, which indicates lineage commitment, be the underlying mechanism? In any case, this manuscript is highly impactful in the sense that the differentiation of adipocytes forming the BM niche can be controlled using tissue-specific knockouts of the Ogt gene.

      We agree with the reviewer that the role of OGT in BMSC fate determination and adipogenesis might be multifaceted. The 30% increase in PDGFRa+VCAM1+ BM adipose progenitors cannot fully explain the massive adipogenesis observed in OgtΔOsx animals (Fig. 4A). Indeed, we provided in vitro evidence that genetic deletion or chemical inhibition of OGT activates adipogenesis (Fig. 4D-I). Mechanistically, we found that O-GlcNAcylation of the C/EBPβ protein (but not PPARγ) is responsible for the inhibition, which leads to reduced expression of adipogenic genes, including Pparg (Fig. 4H).

    1. Author Response

      Reviewer #1 (Public Review):

      The paper states that they observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 insertion/deletions (In/Dels) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA mutations obtained in a single study to date. However, A study with more somatic mtDNA mutations by the LostArc method (PMID 32943091) revealed 35 million deletions (~ 470,000 unique spans) in skeletal muscle from 22 individuals with and 19 individuals without pathogenic variants in POLG. Thus, the authors should reword this part to say that this study represents the largest collections of mouse mtDNA point mutations detected, but not the largest amount of mutations (deletions exceed this number).

      Thank you for pointing this out. When we wrote that sentence, we were referring more to small polymerase-based errors, as opposed to larger structural variants that likely arise from a different mechanism. However, the distinction between these two event classes is poorly defined. We have amended our statement and have added a citation to Lujan et al. Our statement now reads “We observed a combined total of 77,017 single-nucleotide variants (SNVs) and 12,031 small insertion/deletions (In/Dels) (≲15bp in size) across all tissue, age, and intervention groups. Collectively, these data represent the largest collection of somatic mtDNA point mutations obtained in a single study to date and is second only to Lujan et al. in terms overall In/Del counts (Lujan et al., 2012).” (Lines 252-256)

      What is the theoretical limit of pt mutations in the mitochondrial genome, assuming only one pt mutation per genome? Doesn't 77000 detected independent pt mutations approach that limit? Can the authors estimate how many molecules contained two or more pt mutations? Did the analysis reveal any un-mutated regions implying an essential function? For example, on p.9 can the authors provide an explanation of why OriL and other G/C-rich regions were not uniformly covered as compared to the rest of the genome?

      This is an interesting question and one we’ve given some thought to. In fact, this basic question was the inspiration for our recent Nucleic Acids Research paper (PMC8565317) where we asked how mutations were distributed in the genome. The short answer is that we likely exceed the limit for only dG site mutations (and only for G>A mutations, at that), but not the other reference sites. The reason is that there are only 2013 dG sites and the mutation spectrum is heavily skewed toward G>X (there are 47,680 dG site mutations, 42,924 of which are G>A). In comparison, we observe only 4,421 A>X, 9,277 T>X, and 15,632 C>X mutations, but with 5,629, 4,681, and 3,976 dA, dT, and dC genomic sites, respectively. Even assuming the mutations are uniformly distributed along the genome (which they are not; see our NAR paper), random binomial sampling would require considerably more mutations to reach saturation for the other genomic sites. The uneven distribution increases this number further.
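
      The saturation argument can be made concrete with a small sketch. It assumes independent, uniform sampling of mutation targets (which, as noted above, does not hold for the real data) and counts three possible substitutions per non-G reference site; the function name and the choice of targets are illustrative, while the site and mutation counts are the ones quoted in this response.

```python
# Expected fraction of distinct mutation targets observed at least once when
# n_mutations are drawn uniformly and independently from n_targets possibilities.
# This is an idealized occupancy calculation, not the authors' analysis.

def expected_fraction_hit(n_mutations, n_targets):
    """Return 1 - P(a given target is never drawn)."""
    return 1.0 - (1.0 - 1.0 / n_targets) ** n_mutations


# Counts quoted in the response: (reference sites, observed mutations).
COUNTS = {
    "G>A": (2013, 42924),   # one substitution type per dG site
    "A>X": (5629, 4421),    # three substitution types per dA site
    "T>X": (4681, 9277),
    "C>X": (3976, 15632),
}


def coverage_by_class(counts=COUNTS):
    """Expected target coverage per class under the uniform-sampling assumption."""
    result = {}
    for label, (sites, mutations) in counts.items():
        per_site = 1 if label == "G>A" else 3
        result[label] = expected_fraction_hit(mutations, sites * per_site)
    return result
```

      Under these simplifying assumptions, G>A targets are expected to be essentially fully covered, whereas a substantial share of the possible dA, dT, and dC substitutions would remain unobserved.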

      With regard to the second question, we can’t actually do this estimation with this data set. The reason is that the ~77,000 mutations aren’t found in a single sample, but are distributed across many independent or semi-independent samples (i.e., different organs within a mouse), which means that most, if not all, of the mutations are necessarily on different mtDNA molecules.

      With regard to the OriL and G/C rich regions, these presumably have some sort of secondary structure that prevents the sequencer from obtaining any useful information. However, this is all speculative and we don’t know why. Interestingly, human mtDNA doesn’t show this dip at the OriL, despite a similar function and location in the mtDNA.

      Given that mitochondrial disease usually doesn't present until >60% of the genomes are affected, the very low level of detected pt mutations observed in the mouse (and presumably similar to human) would mean that they are well below a physiological level. Thus, these low-level pt mutations are well tolerated. Can the authors estimate a theoretical age of the mouse (well beyond their life span) where over 50% of the genomes carry at least one pt mutation?

      The reviewer brings up a frequently noted point in mitochondrial biology that is very much worth addressing in this manuscript. The often-cited statistic that mitochondrial disease doesn’t present until ~60% of genomes are affected is, while true, only pertinent to overt mitochondrial diseases, such as LHON, MERRF, etc., where all or nearly all cells in an individual are affected by the mutation. However, the impact of mtDNA mutations is contingent not only on how many cells have the mutation, but also on the fraction of mtDNA molecules within a cell that harbor the variant. Because the deleterious effects of a mtDNA mutation act at the level of individual cells, it is important to know both how many cells harbor a mutation and what the heteroplasmic level is within the cell before making claims about its pathological impact.

      To date, nearly all studies on mtDNA mutations rely on bulk DNA analysis from thousands to millions of cells, which necessarily decouples variant phasing information between any two reads, resulting in a loss of important biological information such as the heteroplasmic level within any given cell. As such, with bulk sequencing it is impossible to tell the difference between a homoplasmic mutation in a small subset of cells and heteroplasmic mutation in all cells. In the first case, the cells harboring this mutation would be negatively impacted, whereas in the second example, it is unlikely. One can imagine a scenario where every cell contains a different homoplasmic pathogenic mutation which would negatively affect cellular function for every cell. In this case, mutations would be highly prevalent (100% of cells), yet individually rare. However, bulk sequencing would give the appearance that no mutation comes close to exceeding the phenotypic threshold. We highlight this issue in a recent review (Sanchez-Contreras and Kennedy, 2022; PMC8896747).
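
      The ambiguity of bulk measurements can be shown with a toy calculation. The cell counts, copy numbers, and the 60% threshold used below are illustrative assumptions chosen only to make the two scenarios produce the same bulk variant fraction.

```python
# Each cell is represented as (mutant_mtDNA_copies, total_mtDNA_copies).
# Hypothetical numbers: 100 cells with 1000 mtDNA copies each.

def bulk_allele_fraction(cells):
    """Variant fraction seen when all cells are pooled before sequencing."""
    mutant = sum(m for m, _ in cells)
    total = sum(t for _, t in cells)
    return mutant / total


def fraction_cells_above(cells, threshold):
    """Fraction of cells whose own heteroplasmy exceeds the threshold."""
    return sum(1 for m, t in cells if m / t > threshold) / len(cells)


# Scenario 1: homoplasmic mutation in a small subset of cells.
clonal = [(1000, 1000)] * 10 + [(0, 1000)] * 90
# Scenario 2: low-level heteroplasmic mutation in every cell.
diffuse = [(100, 1000)] * 100
```

      Both populations give a bulk variant fraction of 10%, yet only the first contains cells above a 60% within-cell threshold, which is the distinction that pooled sequencing cannot recover.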

      The point that the reviewer brings up is extremely important, so we have added a section in the discussion related to heteroplasmy versus clones.

      Also, the problem with this low level of pt mutations is that they are not physiological, the effect of the drug treatment causing a reduction in ROS-mediated transversions would not be expected to have a detectable effect on mitochondria. The improvement on mitochondrial seen by others is most likely independent of the mutations in the genome. There needs to be a cause and effect here and I don't see one.

      It is important to note that we do not make the claim (nor do we want to imply) that the reduction of mutations is the reason behind the improvements in mitochondrial function by these interventions. Instead, we believe that loss of ROS-linked mutations is a consequence of the mechanism by which these interventions work. We do hypothesize that the reduction in ROS-linked mutations suggests that “there is tissue specificity in how cells repair and/or destroy oxidatively damaged mitochondria and/or mtDNA resulting in a steady-state of ROS-linked mutations.” (Lines 551-553) and that “We propose that rather than the incidence and impact of ROS damage on mtDNA being minimal, recognition and removal of ROS-linked mutations are maintained at a steady state during aging.” (Lines 572-574).

      In addition, as noted above, how “low level” these mutations are and their impact on cellular function is not easily determined in bulk sequencing studies, so a strong link between cause and effect is not an answerable relationship with this data set.

      There's no mention in this paper and methodology about how point mutations in nuclear-encoded mtDNA (NUMTs) are excluded from the reads and I'm worried that these errors are being read as rare errors in the mtDNA genome. While NUMTs have been documented for decades, a recent report in Science (PMID: 36198798) documents how frequently and fluidly NUMTs occur. Can the authors provide a clear explanation of how mutations in NUMTs are excluded?

      The reviewer is absolutely correct to call attention to this important aspect of mitochondrial biology. We don’t believe NUMTs are an important confounder in our data set for several reasons.

      1) We used isogenic inbred C57BL/6J mice which, frequently, were littermates (siblings). Therefore, any mutations from NUMTs that are present would be expected to be uniform across samples, especially between tissues from a single animal. Unknown variation in NUMTs would certainly be a potentially strong confounder in an outbred population, but the use of one isogenic inbred line for this study likely eliminates this confounder.

      2) We used the mm10 reference genome, which is based on the C57BL/6J strain, so any NUMT-derived variants present in our mtDNA data should preferentially align against the NUMT. Therefore, we perform a BLAST step of all reads containing at least one variant against mm10. BLAST is much more sensitive to sequence variation than bwa but is far slower, so it is impractical to run as the initial aligner. We then reassign the read to whichever genomic location has the lower e-score. The result is that typically around a dozen reads are removed, demonstrating that NUMTs are not likely a major source of false mutations.

      3) Because NUMTs are inherited, any NUMT-derived variants would be found across all the tissues and animals we used in this study. As part of our processing, we mark and remove variants shared between multiple individual samples.
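
      The logic of steps 2 and 3 can be sketched as follows. This is a hypothetical illustration only: the data structures, the function names, and the `chrM` label are assumptions, and the code does not call BLAST or reproduce the authors' pipeline.

```python
from collections import Counter

# Step 2 (sketch): each variant-containing read has candidate alignments given as
# (location, e_score) pairs; the read is assigned to the location with the lowest score.

def assign_read(hits):
    """Return the location whose alignment has the lowest e-score."""
    return min(hits, key=lambda hit: hit[1])[0]


def keep_mito_reads(read_hits, mito_name="chrM"):
    """Keep only read IDs whose best-scoring location is the mitochondrial genome."""
    return [rid for rid, hits in read_hits.items() if assign_read(hits) == mito_name]


# Step 3 (sketch): variants seen in more than one sample are treated as likely
# inherited (for example NUMT-derived) and removed from every sample.

def drop_shared_variants(variants_by_sample):
    """Remove any variant that appears in two or more samples."""
    counts = Counter(
        variant
        for variants in variants_by_sample.values()
        for variant in set(variants)
    )
    return {
        sample: [v for v in variants if counts[v] == 1]
        for sample, variants in variants_by_sample.items()
    }
```

      Keeping the two filters as separate functions mirrors the description above, where read reassignment and removal of shared variants are distinct processing steps.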

      We have made edits to the Methods section (Lines 198-206) to more explicitly highlight the filtering steps and the logic behind them. In addition, we have added a paragraph in the discussion that addresses NUMTs (Starting on line 642).

      Reviewer #2 (Public Review):

      A common problem in mutation analysis is that DNA damage (present in one strand) is difficult to separate from real mutations (present in both strands). One of the approaches to solve this problem based on independent tagging of the two strands by different unique molecular identifiers was developed by the authors about 10 years ago. This study summarizes the application of this method to a wide range of mouse tissues, ages, and drug treatment regimes. Much of the results confirm previous conclusions from this laboratory. This involves overall mutational levels of somatic mtDNA mutations (~10^-6 to 10^-5), their accumulation with age, the prevalence of GA/CT transitions, and their clonality. Although these results were not new, it is important that these were confirmed in a single study with high confidence in a huge number of independent mutations.

      We thank the reviewer for the comment and really hope this data set will be of significant use to other researchers given its breadth of sample types and large number of mutations.

      What really sets this study apart from other studies is the detection of a large proportion of transversion mutations, primarily of the C>A/G>T and C>G/G>C types. Transversions are traditionally considered 'persona non grata' in mtDNA mutational spectra and are typically associated with errors of mutational analysis (which they in fact are). The presence of these mutations in both strands of the duplex makes a good case that these mutations are real, rather than converted damage. However, because this is such a novel discovery and because regular controls do not work (I mean, for example, that these mutations never clonally expand. If there is a clonal expansion, then the mutation is real, only real mutation can expand. But in the case of non-expandable C>A/G>T and C>G/G>C this control does not help to validate these mutations), it would be nice to provide extra assurances that this is not some kind of artifact that somehow slipped through the ds sequencing procedure. I would recommend including in the supplement the data on the abundance of single-stranded base changes as detected by ds sequencing (i.e., changes confirmed in one and not in the other strand of a given molecule). An unusually high presence of such single-stranded changes of the C>A/G>T and C>G/G>C type would be a red flag for me. If ratios of single and double-stranded mutations were similar for transitions and transversions - that would reassure me and hopefully the reader.

      Furthermore, a similar excess of C>A/G>T and C>G/G>C has been observed in a recent paper by Abascal 2021 (cited in the manuscript). In that paper, a UMI-free, but otherwise very similar ds sequencing approach in nuclear DNA (BotSeqS) was demonstrated to suffer from an artifact causing (among other effects) an excess of C>A/G>T and C>G/G>C transversions. This artifact is related to end repair and nick-translation of DNA fragments during library preparation. Because BotSeqS is very similar to ds sequencing, we expect that the same artifact may be taking place in the study under review. We recommend running checks similar to those undertaken by Abascal et al., which include, at the very minimum, checking the distribution of the C>A/G>T and C>G/G>C transversions within the reads (artifacts tend to be concentrated towards the ends of the reads).

      The reviewer is absolutely correct to bring up this extremely important point. We have addressed these concerns in two ways, described on Lines 332-361. 1) We performed an analysis of the single-stranded consensus data, which is a measure of PCR artifacts that frequently arise as a function of DNA damage, across all the tissues of the aged cohort. We noted no differences between tissues, which indicates that the amount of ROS-induced PCR artifacts is no different between the tissues. Thus, for the transversions to be artifacts, the rate at which ROS artifacts lead to false “Duplex consensus” variants would itself have to be tissue specific. The analysis is presented in Figure 3-figure supplement 2. 2) We have included an experiment in which we show that DNA treated after fragmentation with FPG, a glycosylase that targets Fapy-dG and 8-oxo-dG, does not differ from untreated control DNA. Because Duplex-Seq requires that both strands of a parent DNA molecule be present to form a final Duplex Consensus Sequence, the scission of one strand by the lyase activity of FPG would prevent the formation of this final consensus and prevent this sort of error from “bleeding through”. This analysis can now be found in Figure 3-figure supplement 3.
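
      The requirement that both strands be present can be illustrated with a simplified sketch of duplex consensus calling. It assumes each strand has already been collapsed into a single-strand consensus string in the same orientation; the function name and the use of `N` for disagreeing positions are illustrative choices, not a description of the authors' software.

```python
# Simplified duplex consensus: a base is reported only when the two
# single-strand consensus sequences (SSCS) agree at that position.

def duplex_consensus(top_sscs, bottom_sscs):
    """Return the duplex consensus, or None when either strand is missing."""
    if top_sscs is None or bottom_sscs is None:
        # A strand lost before amplification (for example through strand scission)
        # yields no duplex consensus at all.
        return None
    if len(top_sscs) != len(bottom_sscs):
        raise ValueError("strand consensus sequences must have equal length")
    return "".join(
        a if a == b else "N" for a, b in zip(top_sscs, bottom_sscs)
    )
```

      In this simplified picture, a change present on only one strand is masked rather than called, and a molecule with one strand missing contributes nothing to the final data.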

      Of note, even if transversions detected in this study prove to be artifacts of the Abascal type (likely) they still may reflect real ss damage in mtDNA (not instrumental artifacts, like sequencing errors or in vitro DNA damage). This is supported by the strong variation in the levels of transversions across tissues and as a result of the ameliorating drug intervention. Artifacts, in contrast, would be expected to be at a constant level. This logic, however, does not differentiate between real ds mutations and ss damage. So UMI-based ds sequencing evidence remains the only (though very strong) independent proof. So, in my view, whereas the jury may be still out on whether the observed transversions are true ds mutations or some kind of single-stranded damage, this is a critically important observation. Evidence of ss damage that varies greatly between tissues and is detected with such precision at the single-molecule level is a very important finding as well.

      Out of caution, I would recommend mentioning the above-stated uncertainty and noting that more research is needed to fully confirm that C>A/G>T and C>G/G>C changes detected in this study are indeed double-stranded mutations.

      We agree. Together with comments from Reviewer #1 regarding NUMTs (Comment #5), we have added a paragraph in the Discussion about potential alternative explanations for our observations.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, May et al use H2B overexpression driven by Keratin14 Cre-mediated excision of a loxPstop cassette to quantify bulk chromatin dynamics in the live epidermis. They observe heterogeneity of H2B distribution within the basal stem cell layer and a change in distribution when the stem cells delaminate into the suprabasal layers. They further show that these chromatin rearrangements precede cell fate commitment, as detected by adding another Cre-mediated transgene on top (tetO-Cre mediated Keratin10 reporter). Finally, they generate an MS2 stem-loop transgene for the keratin 10 transcript and observe transcriptional bursting.

      We would like to clarify for the reviewer that the H2B system used is a transgenic allele of histone-2B-GFP that is driven directly by the Keratin-14 promoter (Kanda et al., 1998; Tumbar et al., 2004). This system does not rely on any Cre-mediated excision of the LoxP-stop cassette, and these mice do not carry Cre alleles. We will touch on this point below when addressing the comment on Cre expression in cells and the raised question on whether it influences the quantifications of chromatin compaction.

      The manuscript uses elegant in vivo imaging approaches to describe a set of observations that are logically based on a panel of studies that have used genetic approaches to dissect the role of heterochromatin and histone/DNA modifications in epidermal state transitions. In addition, the MS2 stem-loop analysis is a nice technical advance, confirming transcriptional bursting as a general phenomenon of how transcription is regulated in cells (see work from Daniel Larsson, Jonathan Chubb, Arjun Raj, and others).

      We thank the reviewer for their recognition of our contribution to the transcription field. To deepen the connection between our data and previous characterizations of transcriptional dynamics in other systems, we have added new analyses of K10MS2 transcriptional bursting on a finer temporal scale (Fig 5G-K). We find pervasive “transcriptional bursting,” consistent with findings in vitro and in other model organisms, and a surprising variation of burst durations. We believe these additional analyses significantly strengthen our conclusions and the relevance of our study to the overall transcription field.

      The value of the study in my view is recapitulating these known phenomena in a live tissue setting with high-quality imaging and careful quantification. Overall, the analyses appear thorough, although the overall changes appear relatively minor, which is perhaps to be expected from imaging bulk H2B distribution as a proxy for chromatin states.

      There is one major technical concern that might impact the interpretation of the data. The authors combine Cre lines for their key conclusions (Krt10 reporter and SRF KO) and analyze single cells that thus express very high levels of Cre. Knowing that Cre will target non-loxP sites and is genotoxic, it is possible that the effect of chromatin is due to high levels of Cre expression in single cells rather than specific effects due to cell state transitions. I would encourage the authors to carefully quantify the dose-dependent effects of the Cre protein (independent of the LoxP sites) on chromatin organization. Along these lines, is the phenotype of the SRF KO similar in the presence of two Cre alleles versus just one?

      Thank you for these kind words. This is an important potential caveat to consider. We believe that Cre activity does not significantly affect the chromatin compaction profiles for several reasons. First, we interrogated Cre activity. The quantifications in Figure 1A-E and Figure 2B-C are from mice carrying the K14H2B-GFP allele alone, without any Cre allele. When these data were compared to those from mice that had been treated with a high dose of tamoxifen to induce Cre-mediated recombination in the vast majority of cells, the chromatin compaction profiles were not significantly different (Supp Fig 3C). We have added this comparison to Supplemental Figure 3 and addressed this point in the text (page 9). To further determine whether Cre-mediated recombination affects our measurement of chromatin compaction, we also analyzed adjacent basal cells with and without Cre activity in the same animal. K14H2B-GFP; K14CreER; tdTomato mice were induced with a low dose of tamoxifen such that roughly 65% of epidermal cells underwent Cre recombination as demonstrated by expression of the tdTomato fluorescent reporter (Gallini et al., 2022). They also received a punch biopsy performed on the unimaged ear. Three days post injury and six days after Cre induction, the chromatin compaction profiles of cells positive and negative for Cre-mediated recombination were also not significantly different (Rebuttal Figure 1). Together, these direct comparisons between cells exposed to Cre activity and cells not exposed to Cre activity indicate that Cre activity at levels comparable to those used in our experiments has no measurable effect on our measurements of chromatin compaction.

      Rebuttal Figure 1: Effect of Cre expression on chromatin compaction profiles

      The second issue is the conclusion of "chromatin spinning". Concluding that chromatin is spinning would in my view require that the authors demonstrate that the nuclear envelope is not moving or is moving less than the chromatin. To support this conclusion the authors should do double imaging for example with LINC complex proteins, an ER/outer nuclear membrane marker, or equivalent.

      This is an excellent point. While we expect that the entire nucleus is spinning based on observations others have made in in vitro fibroblast systems, we describe our observation as “chromatin spinning” instead of “nuclear spinning” because the K14H2B-GFP allele only allows us to directly visualize chromatin itself (Kumar et al., 2014; Zhu et al., 2018).

      Unfortunately, LINC complex proteins and nuclear membrane proteins have not been fluorescently tagged in mice, which prevents us from visualizing their dynamics in vivo. Establishing these new tools and performing the experiments would take more than a year and is therefore beyond the scope of the current paper. Additionally, their relatively uniform distribution across the nuclear membrane would not allow us to visualize potential spinning of these components. We have made efforts towards the reviewer’s question by asking whether other compartments within the cell also spin in delaminating cells. To do this, we leveraged a mouse line developed by Claudio Franco’s lab (Barbacena et al., 2019), which fluorescently labels both the chromatin (H2B-GFP) and the Golgi (GTS-mCherry). As expected, this model showed a perinuclear and polarized Golgi in skin fibroblasts (Rebuttal Figure 2). However, this tool is incompatible with our questions in epidermal cells for two reasons. First, the system is toxic to epithelial cells in vivo, resulting in apoptosis, nuclear fragmentation, and binucleate cells. Second, the Golgi is not discretely polarized (or even perinuclear) in epithelial cells (Rebuttal Figure 2). As such, although we observe chromatin spinning in delaminating basal cells, we are uncertain as to whether the whole nucleus or any other cellular compartments are spinning in these cells.

      Rebuttal Figure 2: Interrogation of intracellular spinning

      Given the above reasoning and efforts, we have altered the text and specified that we only have the capacity to visualize chromatin through the H2B-GFP allele and that we hypothesize the entire nucleus is spinning (page 11).

      Reviewer #2 (Public Review):

      In this work entitled "Live imaging reveals chromatin compaction transitions and dynamic transcriptional bursting during stem cell differentiation in vivo" the authors use a combination of genetic and imaging tools to characterize dynamic changes in chromatin compaction of cells undergoing epidermal stem cell differentiation and to relate chromatin compaction to transcriptional regulation in vivo. They track this phenomenon by imaging the epithelium at the ear of live mice, thus in a physiological context. By following individual nuclei expressing H2B-GFP along time ranges of hours and up to 3 days, they develop a strategy to quantify the profile of chromatin compaction across different epidermal layers based on normalized intensity profiles of H2B-GFP. They observe that cells belonging to the basal stem cell layer display a considerable level of internuclear variability in chromatin compaction that is cell-cycle independent. Instead, intercellular variability in chromatin compaction appears more related to the differentiation status of the cells as it is stable in the hours range but dynamic in the days range. The authors show that differentiated nuclei in the spinous layer exhibit higher chromatin compaction. They also identified a subset of cells in the basal stem layer with an intermediate profile of chromatin compaction and with the dynamic expression of the early differentiation marker keratin 10. Lastly, they show that the expression of keratin-10 precedes the chromatin compaction establishing relevant temporal relationships in the process of epidermal differentiation.

      This work includes a number of challenging approaches and techniques since it is carried out in living mice. Also, it provides nice tools and methods to study chromatin structure in vivo during multiple days and within a differentiation physiological system. On the other hand, the results are descriptive and, in some respect, expected in line with previous observations.

      Thank you very much for this great summary, kind words, and the recommendations listed below. We will address each of them specifically. We have also deepened the analysis of transcriptional dynamics in ways that are more comparable with how other groups have studied transcription and included those results in Figure 5.

      References

      Kanda, T., Sullivan, K.F., and Wahl, G.M. (1998). Histone–GFP fusion protein enables sensitive analysis of chromosome dynamics in living mammalian cells. Current Biology 8, 377–385. 10.1016/S0960-9822(98)70156-3.

      Tumbar, T., Guasch, G., Greco, V., Blanpain, C., Lowry, W.E., Rendl, M., and Fuchs, E. (2004). Defining the epithelial stem cell niche in skin. Science 303, 359–363. 10.1126/science.1092436.

      Kumar, A., Maitra, A., Sumit, M., Ramaswamy, S., and Shivashankar, G.V. (2014). Actomyosin contractility rotates the cell nucleus. Sci Rep 4, 3781. 10.1038/srep03781.

      Zhu, R., Liu, C., and Gundersen, G.G. (2018). Nuclear positioning in migrating fibroblasts. Seminars in Cell & Developmental Biology 82, 41–50. 10.1016/j.semcdb.2017.11.006.

      Sara Gallini, Nur-Taz Rahman, Karl Annusver, David G. Gonzalez, Sangwon Yun, Catherine Matte-Martone, Tianchi Xin, Elizabeth Lathrop, Kathleen C. Suozzi, Maria Kasper, Valentina Greco. Injury suppresses Ras cell competitive advantage through enhanced wild-type cell proliferation. bioRxiv 2022.01.05.475078; doi: https://doi.org/10.1101/2022.01.05.475078

      Pedro Barbacena, Marie Ouarné, Jody J Haigh, Francisca F Vasconcelos, Anna Pezzarossa, Claudio A Franco. GNrep mouse: A reporter mouse for front-rear cell polarity. Genesis 2019 Jun. DOI: 10.1002/dvg.23299

      Cristiana M Pineda, Sangbum Park, Kailin R Mesa, Markus Wolfel, David G Gonzalez, Ann M Haberman, Panteleimon Rompolas, Valentina Greco. Intravital imaging of hair follicle regeneration in the mouse. Nature Protocols 2015 July. DOI: 10.1038/nprot.2015.070

    1. Author Response

      Reviewer #1 (Public Review):

      Reviewer 1 confirmed the view that your paper provides new insight into YTHDC1 function in regulating SC activation/proliferation but added that some of the data could be improved to fully support the conclusions. Specifically:

      The title "Nuclear m6A Reader YTHDC1 Promotes Muscle Stem Cell Activation/Proliferation by Regulating mRNA Splicing and Nuclear Export" seems a bit overstated. Their data are not sufficient to show that YTHDC1 regulates nuclear export. From Figure 6 we can see that the export of some mRNAs was inhibited upon YTHDC1 loss, but intron retention also occurs on these mRNAs, for example, Dnajc14. Since intron retention could lead to mRNA nuclear retention, the mRNA export inhibition may be caused by splicing deficiency. From the data they provided we cannot conclude that YTHDC1 directly affects mRNA export. I think they should not emphasize this point in the title.

      Thanks for the suggestion. It is true that our initial submission had more data supporting YTHDC1 regulation of mRNA splicing and not enough on nuclear export. A thorough dissection of both mechanisms would take a substantial amount of time and effort. Nevertheless, we argue that our data do provide evidence for YTHDC1 regulation of nuclear export. For example, in Figures 6 C, H, and M, only ~20% of the target mRNAs (such as Dnajc14) showed alteration in both splicing and export upon YTHDC1 loss, while the majority of the export targets showed no splicing deficiency; Btbd7 and Tiparp in Figure 6 N, for instance, showed no intron retention. In addition, we have now performed Co-IP experiments to validate the interaction between YTHDC1 and THOC7 (new result added in Figure 7L), which provides extra evidence to support YTHDC1 function in regulating mRNA nuclear export. We thus would like to keep the original title in order to reflect the multifaceted function of YTHDC1 in muscle stem cells.

      The mechanism by which YTHDC1 promotes muscle stem cell activation/proliferation is not solidified. The authors could strengthen their evidence through bioinformatics analysis or provide more discussion. In addition, previous work by Zhao and colleagues (Zhao et al., Nature 542, 475-478 (2017)) reported that another m6A reader, Ythdf2, promotes m6A-dependent maternal mRNA clearance to facilitate the zebrafish maternal-to-zygotic transition. Does YTHDC1 regulate mRNA clearance during SC activation/proliferation? The authors should explore this possibility by deep-seq data analysis and provide some discussion.

      Thanks for the critical comment. Regarding the first concern, we think YTHDC1 promotes muscle stem cell activation/proliferation through its multi-level gene regulatory capabilities in both transcriptional and post-transcriptional processes and through the myriad targets it regulates. In addition, with the newly added data, we believe that YTHDC1’s function is largely dependent on its synergism with hnRNPG (Figure 7 K). We have added this discussion in lines 421-427 of the revised text. Regarding the second question, our data showed that YTHDC1 predominantly localizes in the nucleus of SCs and myoblasts (Figure 1 F&G); thus it may not have a role in regulating mRNA clearance in the cytoplasm like YTHDF2. Nevertheless, a few existing reports (refs. 1, 2) suggest a possible role in mRNA degradation and stability, which may arise from its transient shuttling to the cytoplasm. We have now added this point in lines 469-472 of the revised text.

      Reviewer #2 (Public Review):

      Reviewer 2 was similarly positive stating that several tour-de-force techniques were used to examine m6A and the biological consequence in satellite cells and that there was a large amount of data supporting the conclusions with only a few minor weaknesses.

      General points: The main body is lengthy, and some content can be reduced or condensed. For example, RNA-seq was used to determine gene expression in WT and cKO cells, but the purpose of this is not well justified given that YTHDC1 mainly functions to regulate splicing and nuclear export of mRNA rather than controlling their expression levels. Does the RNA-seq data suggest that YTHDC1 may also regulate gene expression independent of m6A reader function?

      Thanks for the comment. We have now revised the entire text to condense the content. Nevertheless, we must point out that the purpose of the RNA-seq is to provide extra evidence for the proliferation defect of the YTHDC1 KO cells, not to search for the underlying mechanism. We have now revised lines 159-160 to clarify this.

      Reference:

      1. Shima, H., Matsumoto, M., Ishigami, Y., Ebina, M., Muto, A., Sato, Y., Kumagai, S., Ochiai, K., Suzuki, T. & Igarashi, K. S-Adenosylmethionine Synthesis Is Regulated by Selective N(6)-Adenosine Methylation and mRNA Degradation Involving METTL16 and YTHDC1. Cell Rep 21, 3354-3363 (2017).
      2. Zhang, Z., Wang, Q., Zhao, X., Shao, L., Liu, G., Zheng, X., Xie, L., Zhang, Y., Sun, C. & Xu, R. YTHDC1 mitigates ischemic stroke by promoting Akt phosphorylation through destabilizing PTEN mRNA. Cell Death Dis 11, 977 (2020).
      3. He, P.C. & He, C. m(6) A RNA methylation: from mechanisms to therapeutic potential. EMBO J 40, e105977 (2021).
      4. Widagdo, J., Anggono, V. & Wong, J.J. The multifaceted effects of YTHDC1-mediated nuclear m(6)A recognition. Trends Genet 38, 325-332 (2022).
      5. Sheng, Y., Wei, J., Yu, F., Xu, H., Yu, C., Wu, Q., Liu, Y., Li, L., Cui, X.L., Gu, X., Shen, B., Li, W., Huang, Y., Bhaduri-Mcintosh, S., He, C. & Qian, Z. A Critical Role of Nuclear m6A Reader YTHDC1 in Leukemogenesis by Regulating MCM Complex-Mediated DNA Replication. Blood (2021).
      6. Cheng, Y., Xie, W., Pickering, B.F., Chu, K.L., Savino, A.M., Yang, X., Luo, H., Nguyen, D.T., Mo, S., Barin, E., Velleca, A., Rohwetter, T.M., Patel, D.J., Jaffrey, S.R. & Kharas, M.G. N(6)-Methyladenosine on mRNA facilitates a phase-separated nuclear body that suppresses myeloid leukemic differentiation. Cancer Cell 39, 958-972 e958 (2021).
      7. Chen, C., Liu, W., Guo, J., Liu, Y., Liu, X., Liu, J., Dou, X., Le, R., Huang, Y., Li, C., Yang, L., Kou, X., Zhao, Y., Wu, Y., Chen, J., Wang, H., Shen, B., Gao, Y. & Gao, S. Nuclear m(6)A reader YTHDC1 regulates the scaffold function of LINE1 RNA in mouse ESCs and early embryos. Protein Cell 12, 455-474 (2021).
      8. Xiao, W., Adhikari, S., Dahal, U., Chen, Y.S., Hao, Y.J., Sun, B.F., Sun, H.Y., Li, A., Ping, X.L., Lai, W.Y., Wang, X., Ma, H.L., Huang, C.M., Yang, Y., Huang, N., Jiang, G.B., Wang, H.L., Zhou, Q., Wang, X.J., Zhao, Y.L. & Yang, Y.G. Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing. Mol Cell 61, 507-519 (2016).
      9. Webster, M.T., Manor, U., Lippincott-Schwartz, J. & Fan, C.M. Intravital Imaging Reveals Ghost Fibers as Architectural Units Guiding Myogenic Progenitors during Regeneration. Cell Stem Cell 18, 243-252 (2016).
      10. Yankova, E., Blackaby, W., Albertella, M., Rak, J., De Braekeleer, E., Tsagkogeorga, G., Pilka, E.S., Aspris, D., Leggate, D., Hendrick, A.G., Webster, N.A., Andrews, B., Fosbeary, R., Guest, P., Irigoyen, N., Eleftheriou, M., Gozdecka, M., Dias, J.M.L., Bannister, A.J., Vick, B., Jeremias, I., Vassiliou, G.S., Rausch, O., Tzelepis, K. & Kouzarides, T. Small-molecule inhibition of METTL3 as a strategy against myeloid leukaemia. Nature 593, 597-601 (2021).
      11. Otto, A., Schmidt, C., Luke, G., Allen, S., Valasek, P., Muntoni, F., Lawrence-Watt, D. & Patel, K. Canonical Wnt signalling induces satellite-cell proliferation during adult skeletal muscle regeneration. J Cell Sci 121, 2939-2950 (2008).
      12. Liu, J., Gao, M., He, J., Wu, K., Lin, S., Jin, L., Chen, Y., Liu, H., Shi, J., Wang, X., Chang, L., Lin, Y., Zhao, Y.L., Zhang, X., Zhang, M., Luo, G.Z., Wu, G., Pei, D., Wang, J., Bao, X. & Chen, J. The RNA m(6)A reader YTHDC1 silences retrotransposons and guards ES cell identity. Nature 591, 322-326 (2021).
      13. Xu, W., Li, J., He, C., Wen, J., Ma, H., Rong, B., Diao, J., Wang, L., Wang, J., Wu, F., Tan, L., Shi, Y.G., Shi, Y. & Shen, H. METTL3 regulates heterochromatin in mouse embryonic stem cells. Nature 591, 317-321 (2021).
      14. Roberson, P.A., Romero, M.A., Osburn, S.C., Mumford, P.W., Vann, C.G., Fox, C.D., McCullough, D.J., Brown, M.D. & Roberts, M.D. Skeletal muscle LINE-1 ORF1 mRNA is higher in older humans but decreases with endurance exercise and is negatively associated with higher physical activity. J Appl Physiol (1985) 127, 895-904 (2019).
      15. Mumford, P.W., Romero, M.A., Osburn, S.C., Roberson, P.A., Vann, C.G., Mobley, C.B., Brown, M.D., Kavazis, A.N., Young, K.C. & Roberts, M.D. Skeletal muscle LINE-1 retrotransposon activity is upregulated in older versus younger rats. Am J Physiol Regul Integr Comp Physiol 317, R397-R406 (2019).
    1. Author Response

      Reviewer #1 (Public Review):

      Laurent et al. generate genotyping data from 259 individuals from Cabo Verde to investigate the histories and patterns of admixture in the set of islands that make up Cabo Verde. The authors had previously studied admixture in an earlier study but in a smaller set of individuals from two cities on one island (from Santiago) in Cabo Verde. Here, the authors sample from all the islands of Cabo Verde to study admixture in these islands and reveal that there is a varied picture of admixture in that the demographic histories are distinct amongst this set of islands.

      I found the article interesting and clearly written, and I like that it highlights that admixture is a dynamic process that has manifested differently in distinct geographical regions, which will be of broad interest. It also highlights how genetic ancestry patterns are correlated with the populations that were in power/enslaved during colonial times and proposes that certain social practices (e.g. legally enforced segregation) might have affected the distribution/length of runs of homozygosity.

      We thank the reviewer for this positive and encouraging appreciation of our work.

      My main suggestion is that the authors provide a set of hypotheses regarding admixture that may explain their observations, and it would be nice to see if at least one of these has some support using simulations. Could the authors run simulations under their proposed demographic model for populations in Cabo Verde vs what we would expect in a pseudo-panmictic population with two sources of admixture? The authors probably already have simulations they could use. And then see how pre/post admixture founding events change patterns of ancestry.

      As suggested by the reviewer, in the revised version of the manuscript, we conducted the same MetHis-ABC scenario-choice and posterior parameter inference considering the 225 Cabo Verde-born individuals as a single random-mating population, in addition to our main results considering each island of birth separately. Most interestingly, we find that our ABC inferences fail to accurately reconstruct the detailed admixture history of Cabo Verde when it is considered as a whole rather than for each island of birth separately. This is because admixture histories differ substantially across individuals’ islands of birth, which is also consistent with the significantly differentiated genetic patterns within Cabo Verde obtained from ADMIXTURE, local-ancestry inferences, ROH, and isolation-by-distance analyses. These results are now incorporated throughout the revised version of the manuscript and in supplementary figures and tables. See in particular Results L758-769, and Appendix1-figures and tables, Figure7-figure supplement 1-3, and Appendix 5-table 10.

      Reviewer #2 (Public Review):

      In this article, the authors leveraged patterns in the empirical genomic data and the power of simulations and statistical inference to address a few biologically and culturally relevant questions about the admixture history of the Cabo Verde population during the TAST era. Specifically, the authors provided evidence on which specific African and European populations contributed to the population of each island, on whether the genetic admixture history parallels language evolution, and on the best-fitting admixture scenario, which answers questions on when and which continental populations admixed on which island, and how that has influenced the island population dynamics since then.

      Strengths

      1) This study sets a great example of studying population history through the lens of genetics and linguistics, jointly. Historically most of the genetic studies of population history either ignored the sociocultural aspects of the evidence or poorly (or wrongly) correlated that with genetic inference. This study identified components in language that are informative about cultural mixture (strictly African-origin words versus shared European-African words), and carefully examined the statistical correlation between genetic and linguistic variation that occurred through admixture, providing a complete picture of genetic and sociocultural transformation in the Cabo Verde islands during TAST.

      We thank the reviewer for this very enthusiastic and encouraging comment on our work.

      2) The statistical analyses are carefully designed and rigorously done. I especially appreciate the careful goodness-of-fit checking and parameter error rates estimation in the ABC part, making the inference results more convincing.

      Again, we thank the reviewer for this positive comment.

      Weaknesses

      1) Most of the methods in the main analyses here were previously developed (e.g. MDS, MetHis, RF/NN-ABC). However, when introducing and applying them here, the authors didn't restate the necessary background (strengths and weaknesses, limitations and usage) of these methods to justify them over other methods. For example, why is ASD-MDS used here to examine the genetic relationship between Cabo Verde populations and other worldwide populations, rather than classic PCA and F-statistics?

      As mentioned in the answer to the general comments, we extensively modified our manuscript in both Results and Material and Methods, to clarify and justify our reasoning for each of the analyses conducted, and to discuss the pros and cons of the methods used. We warmly thank the reviewers for this request, as we believe it allowed us to substantially improve the accessibility of our work, in particular for a less specialized audience, and, equally crucially, its replicability for specialists. See in particular Results L185-193, L245-250, L368-371, L380-386, L495-511, L567-571, L606-621, and the corresponding Material and Methods sections.

      For the particular example of PCA raised by the reviewer: see Results L185-193.

      For that of F-statistics, see Results L368-386. Note that we added the F-stat analysis suggested by the reviewer to the revised version of our manuscript (see detailed answers below), Figure 3-figure supplement 2.

      We believe that these changes substantially strengthen our manuscript and enlarge its potential readership, and we thank the reviewer again for this request.

      2) The senior author of this paper has an earlier published article (Verdu et al. 2017 Current Biology) on the same population, using a similar set of methods and drew similar conclusions on the source of genetic and linguistic variation in Cabo Verde. Although additional samples on island levels are added here and additional analyses on admixture history were performed, half of the main messages from this paper don't seem to provide new knowledge than what we already learned from the 2017 paper.

      We substantially modified the text of the revised version of the manuscript to address the concern raised by the reviewer in numerous locations of the Abstract, Introduction and Results and Discussion sections, thus hoping to highlight better what we think is the profound novelty brought by this study. In particular, see Introduction L128-153.

      3) Furthermore, there are a few essential factors that could confound different aspects of the major analyses in this article that I believe should be taken into account and discussed. Such factors include the demographic history of source populations prior to admixture, different scenarios of the recipient population size changes, differences in recombination rates across the genome and between African and European populations, etc.

      We thank the reviewer for these comments, which allowed us to improve the clarity of our manuscript and to raise very interesting discussion points that we had overlooked. As indicated in part in the general answer to reviewers above:

      1) We clarified the design of our methods and extensively discussed their limitations with respect to misspecification of ancestral population sizes. Indeed, ancestral source population sizes are not modeled in our MetHis-ABC approach. Instead, we consider that the observed proxy source populations from Africa and Europe are at drift-mutation equilibrium and have been large since the initial and recent founding of Cabo Verde in the 1460s, and we thus use observed genetic variation patterns in these populations to build virtual gamete reservoirs for the admixture history of Cabo Verde with the MetHis-ABC framework. Therefore, while we cannot explicitly evaluate the influence of differences in ancestral source population sizes on our inferences in Cabo Verde, as we now state in the revised version of our manuscript: “we nevertheless implicitly take the real demographic histories of these source populations into account in our simulations, as we use observed genetic patterns themselves the product of this demographic history to create the virtual source populations at the root of the admixture history of each Cabo Verdean island.” We then discuss the outcome of such an approach, which satisfactorily mimics the real data for ABC inference. See in particular the revised versions of the Material and Methods L1454-1491 novel section “Simulating the admixed population from source-populations for 60,000 independent SNPs with MetHis”, and Results L637-649.

      2) Concerning the possibility of population-size changes in the admixed population in our simulations and ABC inferences, we clarified our Material and Methods and the explanations of our Results to better show that we already consider various possible scenarios (for each island separately). Indeed, with our MetHis simulation design, given values of the model parameters correspond to either a constant, a linearly increasing, or a hyperbolically increasing reproductive size in the admixed population over time. We further clarified our Results and Discussion, pointing out that we do indeed find, a posteriori, different demographic regimes among islands.

      Nevertheless, reviewers are right that we did not test the possibility for bottlenecks. We thus substantially expanded the Results and Discussion sections in multiple locations to highlight this limitation and the challenges involved in overcoming it in future work. See in particular Material and Methods L1386-1404 section “Hyperbolic increase, linear increase, or constant reproductive population size in the admixed population”, Results L739-742, and Discussion L934-941, and Perspectives.

      3) Finally, concerning recombination rate, we considered only independent SNPs in our simulation and inference process, as is now clarified in multiple locations throughout the text. We also discuss recombination specifically with regard to our ROH analyses, as suggested in the reviewer’s detailed comments. In brief, we note that Figure 8 of Pemberton et al. 2012 (AJHG 91:275-292) shows that the occurrence of long ROH at the same genomic location across individuals is correlated with low recombination rates, although the effect is relatively weak except in extreme recombination cold spots. Unless there were many extreme recombination cold spots that differed among the islands or ancestral populations, we anticipate that fine-scale recombination rate differences do not matter very much for total ROH levels in these data. Similarly, we do not expect large genome-wide differences in mutation rate, and therefore we don’t anticipate minor local variation in mutation rates to make a systematic difference in total ROH levels. We now refer to these important points in the revised version of our Results L414-415.
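
      To make the three reproductive-size regimes named in point 2 concrete, the following is a minimal illustrative sketch and not the MetHis implementation. The function names, the curvature parameter `u`, and the particular hyperbola used here are assumptions chosen only so that each trajectory starts at the founding size and ends at the final size; the parameterization actually used in the manuscript is described in its Material and Methods.

```python
def constant_size(n0, generations):
    """Reproductive size stays at n0 in every generation."""
    return [float(n0)] * (generations + 1)


def linear_size(n0, n_end, generations):
    """Reproductive size grows by the same amount each generation."""
    return [n0 + (n_end - n0) * g / generations
            for g in range(generations + 1)]


def hyperbolic_size(n0, n_end, generations, u):
    """Reproductive size follows a hyperbola from n0 to n_end.

    `u` > 0 is a hypothetical curvature parameter: small values keep the
    population small for most of its history and concentrate growth in
    the most recent generations; large values approach linear growth.
    """
    if u <= 0:
        raise ValueError("u must be strictly positive")
    sizes = []
    for g in range(generations + 1):
        x = g / generations                # rescaled time in [0, 1]
        frac = u * x / (1.0 + u - x)       # 0 at x = 0, 1 at x = 1
        sizes.append(n0 + (n_end - n0) * frac)
    return sizes


# Toy values chosen for illustration only, not estimates from the study.
flat = constant_size(100, 20)
line = linear_size(100, 1000, 20)
curve = hyperbolic_size(100, 1000, 20, u=0.1)
```

      Under this sketch, a bottleneck would require a fourth, non-monotonic trajectory, which is the case acknowledged above as not tested.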

      Overall, the paper is of interest to the field of human evolutionary genetics - that not only does it tell the story of a historically important population, but also the methodology behind this paper sets a great example for future research to study genetic and sociocultural transformations under the same framework.

      We would like to thank the reviewer for this very encouraging conclusion and for the detailed revision of our work which, we believe, helped us to substantially improve our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The heat shock effect in the Drosophila lines was not explained in the study. Why did some lines show phenotypes only at 29C but not 22C? The study showed data that ubiquilin 2 expression was not impacted by 29C, so what caused the phenotypic differences? In addition, the methods section did not describe clearly whether a temperature-sensitive promoter was used in the flies.

      The heat inducibility of the UBQLN2 transgenes is likely attributable to heat shock elements in the UAS promoter, as noted on page 6, lines 4-14. The heat inducibility of dUbqln is interesting and may reflect transcriptional and/or posttranscriptional mechanisms. While it is possible that increased UBQLN2 contributes to the severe phenotypes in UBQLN24XALS flies reared at 29C, such phenotypes are not seen in UBQLN2WT and UBQLN2P497H flies. Instead, we postulate that heat stress synergizes with the misfolded UBQLN24XALS protein to disrupt proteostasis and/or endolysosomal function. This clarification has been added to paragraph 2 of the Discussion section (page 16, lines 15-25) of the revised MS: “The reason for enhanced toxicity of UBQLN24XALS is unclear; however, its enhanced aggregation potential may overwhelm cellular proteostasis machinery and/or accelerate disease mechanisms that are slow to manifest in neurons harboring ALS point mutations. This is consistent with the fact that UBQLN24XALS toxicity in flies was unmasked by HS, which is a well-known inducer of proteotoxicity.” We have also explicitly stated the HS inducibility of the UAS-Gal4 in the revised Materials and methods (page 20, lines 24-25).

      2) The study showed data on male and female flies separately in some but not all experiments. In addition, the manuscript largely avoided discussing whether there was a sex difference in those experiments.

      We showed separate male and female eye phenotypes in Figure 1 to clearly demonstrate that UBQLN24XALS toxicity is not sex dependent. Subtle sex differences were seen in the longevity and climbing assays and were reported in Figures 4A and 4D. In Figure 4D, Unc-5 silencing extended the lifespan of Elav>Gal4 female control flies but not Elav>Gal4 male control flies. In Figure 4A, an Unc-5 KK RNAi line rescued climbing of D42>UBQLN24XALS male flies, but not female flies (a second Unc-5 RNAi line rescued both males and females). The reasons for the sex differences in these specific experiments are unclear.

      3) Some data appear to be peripheral with no significant contribution to the main findings. Moreover, some data were introduced but were not explained. For instance, the RNA-Seq analysis (Fig 2) did not contribute much to the study. The rescue effect of UBA* (F594A mutant) in Fig 1-Supplemental 1B was interesting but was not elaborated or followed up. FUS flies in Fig 6-Supplement 2 were abruptly introduced with little discussion.

      The reviewer’s point is well taken. In response, we moved both figures to the supplementary data.

      RNA-Seq (Fig. 2)

      Although not essential, the RNA-Seq adds experimental rigor to the study by providing strong molecular correlates to eye degeneration phenotypes across different UBQLN2 genotypes. It shows the unique toxicity of UBQLN24XALS and reinforces phenotypic similarity between UBQLN2WT and UBQLN2P497H flies, which likely reflects non-specific toxicity of overexpressed UBQLN2 proteins. We have carried out additional data analyses requested by the reviewer and moved the RNA-Seq data to Figure 1-figure supplement 2.

      UBA mutant (Figure1-figure supplement 1)

      Both aggregation and toxicity of UBQLN24XALS were abolished by an inactivating F594A mutation in the UBA domain. While this implicates Ub binding in the biochemical mechanism of UBQLN2 toxicity, we have not followed up on the finding in either fly or iMN models and have chosen to remove the data (Figure1-figure supplement 1) from the revised MS.

      Lack of genetic interaction between FUS and Unc-5 (Figure 3-figure supplement 1).

      This data was included to show that shUnc-5 is not a general suppressor of eye toxicity in Drosophila. This contrasts with lilliputian, whose mutation rescues toxicity phenotypes elicited by FUS, TDP-43, and UBQLN2. We believe that the FUS control data enhances experimental rigor and have retained the data in the revised MS, with some additional clarification on page 10, line 5-8.

      4) The main quadruple (4XALS) mutation used in the study was not found in patients. The relevance of the findings needs to be thoroughly justified.

      The use of combinatorial mutants—either in the same gene or same pathway—can sometimes be used to enhance neurodegenerative phenotypes in cellular and rodent models for neurodegenerative diseases, most notably, Alzheimer’s Disease. In the case of the 4XALS mutant, we reasoned that its enhanced aggregation might drive stronger phenotypes than those elicited by UBQLN2 clinical alleles, whose toxicity is barely discernible in flies (relative to overexpressed UBQLN2WT) or in iMNs. We have clarified the rationale for testing the 4XALS mutant and articulated its potential strengths and weaknesses in Results (page 5, line 14-page 6, line 2) and Discussion (page 16, line 15-25) sections.

      5) ALS and FTD are age-related neurodegenerative diseases, whereas the involvement of axon guidance genes is indicative of disruptions during the developmental stage. The manuscript did not discuss this potential caveat.

      We have inserted the following sentence in the discussion to note this caveat: “Consistent with this notion, UNC5B has been linked to neurodegeneration in the 6-OHDA model of Parkinson’s Disease (PD) and UNC5C has been nominated as a risk allele in late-onset Alzheimer’s Disease. Defining the contributions of pathologic UNC5 signaling to the development or progression of ALS-dementia awaits further study.” on Page 20, line 2-6. We have added a similar sentence to the Limitations paragraph at the end of the Discussion: “Third, it is possible that axon guidance genes are only relevant to UBQLN2 toxicity in the context of the developing nervous system”.

    1. Author Response

      Reviewer #1 (Public Review):

      This work describes a new method, Proteinfer, which uses dilated neural networks to predict protein function, using EC terms and GO terms. The software is fast and the server-side performance is fast and reliable. The method is very clearly described. However, it is hard to judge the accuracy of this method based on the current manuscript, and some more work is needed to do so.

      I would like to address the following statement by the authors: (p3, left column): "We focus on Swiss Prot to ensure that our models learn from human-curated labels, rather than labels generated by electronic annotation".

      There is a subtle but important point to be made here: while SwissProt (SP) entries are human-curated, they might still have their function annotated ("labeled") electronically only. The SP entry comprises the sequence, source organism, paper(s) (if any), annotations, cross-references, etc. A validated entry does not mean that the annotation was necessarily validated manually: but rather that there is a paper backing the veracity of the sequence itself, and that it is not an automatic generation from a genome project.

      Example: 009L_FRG3G is a reviewed entry, and has four function annotations, all generated by BLAST, with an IEA (inferred by electronic annotation) evidence code. Most GO annotations in SwissProt are generated that way: a reviewed Swissprot entry, unlike what the authors imply, does not guarantee that the function annotation was made by non-electronic means. If the authors would like to use non-electronic annotations for functional labels, they should use those that are annotated with the GO experimental evidence codes (or, at the very least, not exclusively annotated with IEA). Therefore, most of the annotations in the authors' gold standard protein annotations are simply generated by BLAST and not reviewed by a person. Essentially the authors are comparing predictions with predictions, or at least not taking care not to do so. This is an important point that the authors need to address since there is no apparent gold standard they are using.
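
      The filtering the reviewer describes can be sketched as follows. This is a minimal illustration, not part of ProteInfer or of any UniProt tooling: the tuple layout, the function names, and the toy records are assumptions made for the example, while the set of evidence codes follows the experimental and high-throughput experimental categories defined by the Gene Ontology.

```python
# GO evidence codes that denote experimental support, including the
# high-throughput experimental variants.
EXPERIMENTAL_CODES = frozenset({
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
})


def experimental_only(annotations):
    """Keep annotations backed by an experimental evidence code.

    `annotations` is an iterable of (protein_id, go_term, evidence_code)
    tuples; this record layout is an assumption made for the example.
    """
    return [a for a in annotations if a[2] in EXPERIMENTAL_CODES]


def proteins_with_only_iea(annotations):
    """List proteins whose every annotation carries the IEA code."""
    codes_by_protein = {}
    for protein_id, _go_term, code in annotations:
        codes_by_protein.setdefault(protein_id, set()).add(code)
    return sorted(p for p, codes in codes_by_protein.items()
                  if codes == {"IEA"})


# Hypothetical toy records, not real database content.
toy = [
    ("PROT_A", "GO:0004252", "IEA"),
    ("PROT_A", "GO:0016020", "IEA"),
    ("PROT_B", "GO:0004252", "IDA"),
    ("PROT_B", "GO:0005515", "IEA"),
]

gold_standard = experimental_only(toy)
electronic_only = proteins_with_only_iea(toy)
```

      In this toy set, only the IDA-supported annotation of the second protein would enter an experimentally grounded benchmark, and the first protein would be flagged as annotated exclusively by electronic inference.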

      The above statement is relevant to GO. But since EC is mapped 1:1 to GO molecular function ontology (as a subset, there are many terms in GO MFO that are not enzymes of course), the authors can easily apply this to EC-based entries as well.

      This may explain why, in Figure S8(b), BLAST retains such a high and even plateau of the precision-recall curve: BLAST hits are used throughout as the gold standard, and therefore BLAST performs so well. This is in contrast to, say, CAFA assessments, which use as a gold standard only those proteins that have experimental GO evidence codes, and in which BLAST therefore performs much more poorly.

      We thank the reviewer for this point. We regret if we gave the impression that our training data derives exclusively, or even primarily, from direct experiments on the amino acid sequences in question. We had attempted to address this point in the discussion with this section:

      "On the other hand, many entries come from experts applying existing computational methods, including BLAST and HMM-based approaches, to identify protein function. Therefore, the data may be enriched for sequences with functions that are easily ascribable using these techniques which could limit the ability to estimate the added value of using an alternative alignment-free tool. An idealised dataset would involve training only on those sequences that have themselves been experimentally characterized, but at present too little such data exists for a fully supervised deep-learning approach."

      We have now added a sentence early in the manuscript reinforcing this point:

      "Despite its curated nature, SwissProt contains many proteins annotated only on the basis of electronic tools."

      We have also removed the phrase "rather than labels generated by a computational annotation pipeline" because we acknowledge that this could be read to imply that computational approaches are not used at all for SwissProt which would not be correct.

      While we agree that SwissProt contains many entries inferred via electronic means, we nevertheless think its curated nature makes an important difference. Curators reconcile, as far as possible, all known data for a protein, often looking for the presence of key residues in the active sites. There are proteins where electronic annotation would suggest functions in direct contradiction to experimental data; such errors are avoided through this curation process. As one example, UniProt entry Q76NQ1 contains a rhomboid-like domain typically found in rhomboid proteases (IPR022764), and therefore inputting it into InterProScan results in a prediction of peptidase activity (GO:0004252). However, this is in fact an inactive protein, as discovered by experiment, and so is not annotated with this activity in SwissProt. ProteInfer successfully avoids predicting peptidase activity as a result of this curated training data. (For transparency, ProteInfer is by no means perfect on this point: there are also cases in which UniProt curators have annotated single proteins as inactive but ProteInfer has not learnt this relationship, due to similar sequences which remain active).

      We had also attempted to address this point by comparing with phenotypes seen in a specific high-throughput experimental assay ("Comparison to experimental data" section).

      We have now added a new analysis in which we assess the recall of GO terms while excluding IEA annotation codes. We find that at the threshold that maximises F1 score in the full analysis, our approach is able to recall 60-75% (depending on ontology) of annotations. Inferring precision is challenging due to the fact that only a very small proportion of the possible function*gene combinations have in fact been tested, making it difficult to distinguish a true negative from a false negative.
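
      The recall calculation described above can be illustrated with a minimal sketch. This is not the authors' code: the function name, the tuple layout, and the toy scores are assumptions made purely for illustration. It filters out annotations carrying the IEA evidence code and reports the fraction of the remainder that the model scores at or above a fixed threshold.

```python
# Hypothetical illustration: names and data layout are assumptions,
# not part of ProteInfer's actual code or API.
def recall_excluding_iea(annotations, scores, threshold):
    """Fraction of non-IEA annotations recovered at a given threshold.

    annotations: iterable of (protein_id, go_term, evidence_code) tuples.
    scores: dict mapping (protein_id, go_term) -> predicted probability.
    """
    # Keep only annotations that are not "inferred from electronic annotation".
    non_iea = {(p, t) for p, t, code in annotations if code != "IEA"}
    if not non_iea:
        return float("nan")
    recalled = sum(1 for key in non_iea if scores.get(key, 0.0) >= threshold)
    return recalled / len(non_iea)


# Toy data: four non-IEA annotations and one IEA annotation.
annotations = [
    ("P1", "GO:0003677", "IDA"),
    ("P1", "GO:0005634", "IEA"),
    ("P2", "GO:0004252", "EXP"),
    ("P3", "GO:0005739", "IMP"),
    ("P3", "GO:0006260", "TAS"),
]
scores = {
    ("P1", "GO:0003677"): 0.91,
    ("P1", "GO:0005634"): 0.99,
    ("P2", "GO:0004252"): 0.20,
    ("P3", "GO:0005739"): 0.75,
    ("P3", "GO:0006260"): 0.55,
}
recall = recall_excluding_iea(annotations, scores, threshold=0.5)
print(recall)  # 3 of the 4 non-IEA annotations are recovered
```

      Precision is deliberately not computed here, for the reason given above: an unannotated protein-term pair cannot be assumed to be a true negative.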

      "We also tested how well our trained model was able to recall the subset of GO term annotations which are not associated with the "inferred from electronic annotation" (IEA) evidence code, indicating either experimental work or more intensely-curated evidence. We found that at the threshold that maximised F1 score for overall prediction, 75% of molecular function annotations could be successfully recalled, 61% of cellular component annotations, and 60% of biological process annotations."

      Pooling GO DAGs together: It is unclear how the authors generate performance data over GO as a whole. GO is really 3 disjoint DAGs (molecular function ontology or MFO, Biological Process or BPO, Cellular component or CCO). Any assessment of performance should be over each DAG separately, to make biological sense. Pooling together the three GO DAGs which describe completely different aspects of the function is not informative. Interestingly enough, in the browser applications, the GO DAG results are distinctly separated into the respective DAGs.

      Thank you for this suggestion. To answer the question of how we were previously generating performance data: this was simply by treating all terms equivalently, regardless of their ontology.

      We agree that it would be helpful to the reader to split out results by ontology type, especially given clear differences in performance.

      We now provide PR-curve graphs split by ontology type.
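
      For readers who wish to reproduce a per-ontology summary, the following is a minimal sketch of how a maximum F1 score could be computed separately for each ontology. The function names, the input layout, and the toy data are assumptions for illustration and do not reflect the actual evaluation code.

```python
# Hypothetical illustration: a per-ontology max-F1 calculation on toy data.
def precision_recall_f1(pairs, threshold):
    """pairs: list of (score, is_true) for every candidate (protein, term) pair."""
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def max_f1_by_ontology(pairs_by_ontology, thresholds):
    """Sweep thresholds independently within each ontology and keep the best F1."""
    return {
        ontology: max(precision_recall_f1(pairs, t)[2] for t in thresholds)
        for ontology, pairs in pairs_by_ontology.items()
    }


toy = {
    "molecular_function": [(0.9, True), (0.8, True), (0.3, False), (0.2, False)],
    "cellular_component": [(0.9, True), (0.6, False), (0.4, True), (0.1, False)],
}
thresholds = [i / 10 for i in range(1, 10)]
result = max_f1_by_ontology(toy, thresholds)
print(result)
```

      Sweeping the threshold within each ontology, rather than over the pooled terms, keeps the three graphs from influencing one another's operating point.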

      We have also added the following text:

      "The same trends for the relative performance of different approaches were seen for each of the directed acyclic graphs that make up the GO ontology (biological process, cellular component and molecular function), but there were substantial differences in absolute performance (Fig S10). Performance was highest for molecular function (max F1: 0.94), followed by biological process (max F1: 0.86) and then cellular component (max F1: 0.84)."

      Figure 3 and lack of baseline methods: the text refers to Figures 3A and 3B, but I could only see one figure with no panels. Is there an error here? It is not possible at this point to talk about the results in this figure as described. It looks like Figure 3A is missing, with Fmax scores. In any case, Figure 3(b?) has precision-recall curves showing the performance of predictions is the highest on isomerases and lowest on hydrolases. It is hard to tell the Fmax values, but they seem reasonably high. However, there is no comparison with a baseline method such as BLAST or Naive, and those should be inserted. It is important to compare ProteInfer with these baseline methods to answer the following questions: (1) Does ProteInfer perform better than the go-to method of choice for most biologists? (2) Does it perform better than what is expected given the frequency of these terms in the dataset? For an explanation of the Naive method which answers the latter question, see: ( https://www.nature.com/articles/nmeth.2340 )

      We apologise for the errors in figure referencing in the text here. This emerged in part from the two versions of text required to support an interactive and legacy PDF version. We had provided baseline comparisons with BLAST in Fig. 5 of the interactive version (correctly referenced in the interactive version) and in Fig. S7 of the PDF version (incorrectly referenced as Fig 3B).

      We have now moved the key panel of Fig S7 to the main-text of the PDF version (new Fig 3B), as suggested also by the editor, and updated the figure referencing appropriately. We have also added a Naive frequency-count based baseline. This baseline would not appear in Fig 3B due to axis truncation, but is shown in a supplemental figure, new Fig S9. We thank the reviewer and the editor for raising these points.
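
      To make the idea of a frequency-based baseline concrete, here is a minimal sketch in the spirit of the Naive method referred to by the reviewer: every query protein receives the same score for a term, equal to that term's relative frequency in the training set. The function names and toy labels are assumptions for illustration, and the baseline actually used in the manuscript may differ in detail.

```python
from collections import Counter


# Hypothetical illustration of a frequency-count baseline; not the authors' code.
def fit_naive_baseline(training_annotations):
    """training_annotations: dict protein_id -> set of terms.

    Returns a dict term -> fraction of training proteins carrying that term.
    """
    counts = Counter(
        term for terms in training_annotations.values() for term in terms
    )
    n_proteins = len(training_annotations)
    return {term: count / n_proteins for term, count in counts.items()}


def predict_naive(term_frequencies, threshold):
    """Predicted terms are identical for every query; the sequence is ignored."""
    return {t for t, freq in term_frequencies.items() if freq >= threshold}


train = {
    "A": {"EC:3.-.-.-", "EC:3.4.-.-"},
    "B": {"EC:3.-.-.-"},
    "C": {"EC:5.-.-.-"},
    "D": {"EC:3.-.-.-", "EC:3.4.-.-"},
}
freqs = fit_naive_baseline(train)
predicted = predict_naive(freqs, threshold=0.5)
print(sorted(predicted))
```

      Because the prediction ignores the query sequence entirely, any method that fails to outperform this baseline is learning little beyond label frequencies.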

      Reviewer #2 (Public Review):

      In this paper, Sanderson et al. describe a convolutional neural network that predicts protein domains directly from amino acid sequences. They train this model with manually curated sequences from the Swiss-Prot database to predict Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. This paper builds on previous work by this group, where they trained a separate neural network to recognize each known protein domain. Here, they train one convolutional neural network to identify enzymatic functions or GO terms. They discuss how this change can deal with protein domains that frequently co-occur and more efficiently handle proteins of different lengths. The tool, ProteInfer, adds a useful new tool for computational analysis of proteins that complements existing methods like BLAST and Pfam.

      The authors make three claims:

      1) "ProteInfer models reproduce curator decisions for a variety of functional properties across sequences distant from the training data"

      This claim is well supported by the data presented in the paper. The authors compare the precision-recall curves of four model variations. The authors focus their training on the maximum F1 statistic of the precision-recall curve. Using precision-recall curves is appropriate for this kind of problem.

      2) "Attribution analysis shows that the predictions are driven by relevant regions of each protein sequence".

      This claim is very well supported by the data and particularly well illustrated by Figure 4. The examples on the interactive website are also very nice. This section is a substantial innovation of this method. It shows the value of scanning for multiple functions at the same time and the value of being able to scan proteins of any length.

      3) "ProteInfer models create a generalised mapping between sequence space and the space of protein functions, which is useful for tasks other than those for which the models were trained."

      This claim is also well supported. The print version of the figure is really clear, and the interactive version is even better. It is a clever use of UMAP representations to look at the abstract last layer of the network. It was very nice how each sub-functional class clustered.

      The interactive website was very easy to use with a good user interface. I expect it will be accessible to experimental and computational biologists.

      The manuscript has many strengths. The main text is clearly written, with high-level descriptions of the modeling. I initially printed and read the static PDF version of the paper. The interactive form is much more fun to read because of the ability to analyze my favorite proteins and zoom in on their figures (e.g. Figure 8). The new Figure 1 motivates the work nicely. The website has an excellent interactive graphic showing how the number of layers in the network and the kernel size change how data is pooled across residues. I will use this tool in my teaching.

      We are grateful for these comments. We are excited that the reviewer hopes to use this figure for teaching, which is exactly the sort of impact we hoped for this interactive manuscript. We agree that the interactive manuscript is by far the most compelling version of this work.

      The manuscript has only minor weaknesses. It was not clear if the interactive model on the website was the Single CNN model or the Ensemble CNN model.

      We thank the reviewer for pointing out the ambiguity here. The model shown on the website is a Single CNN model, and is chosen with hyperparameters that achieve good performance whilst being readily downloadable to the user's machine for this demonstration without use of excessive bandwidth. We have added additional sentences to address this better in the manuscript.

      "When the user loads the tool, lightweight EC (5MB) and GO (7MB) prediction models are downloaded and all predictions are then performed locally, with query sequences never leaving the user's computer. We selected the hyperparameters for these lightweight models by performing a tuning study in which we filtered results by the size of the model's parameters and then selected the best performing models. This approach uses a single neural network, rather than an ensemble. Inference in the browser for a 1500 amino-acid sequence takes < 1.5 seconds for both models."

      Overall, ProteInfer will be a very useful resource for a broad user base. The analysis of the 171 new proteins in Figure 7 was particularly compelling and serves as a great example of the utility and power of ProteInfer. It complements leading tools in a very valuable way. I anticipate adding it to my standard analysis workflows. The data and code are publicly available.

      Reviewer #3 (Public Review):

      In this work, the authors employ a deep convolutional neural network approach to map protein sequence to function. The rationales are that (i) once trained, the neural network would offer fast predictions for new sequences, facilitating exploration and discovery without the need for extensive computational resources, (ii) that the embedding of protein sequences in a fixed-dimensional space would allow potential analyses and interpretation of sequence-function relationships across proteins, and (iii) predicting protein function in a way that is different from alignment-based approaches could lead to new insights or superior performance, at least in certain regimes, thereby complementing existing approaches. I believe the authors demonstrate i and iii convincingly, whereas ii was left open-ended.

      A strength of the work is showing that the trained CNNs perform generally on par with existing alignment-based methods such as BLASTp, with a precision-recall tradeoff that differs from BLASTp. Because the method is more precise at lower recall values, whereas BLASTp has higher recall at lower precision values, it is indeed a good complement to BLASTp, as demonstrated by the top performance of the ensemble approach containing both methods.

      Another strength of the work is its emphasis on usability and interpretability, as demonstrated in the graphical interface, use of class activation mapping for sub-sequence attribution, and the analysis of hierarchical functional clustering when projecting the high-dimensional embedding into UMAP projections.

      We thank the reviewer for highlighting these points.

      However, a main weakness is the premise that this approach is new. For example, the authors claim that existing deep learning "models cannot infer functional annotation for full-length protein sequences." However, as the proposed method is a straightforward deep neural network implementation, there have been other very similar approaches published for protein function prediction. For example, Cai, Wang, and Deng, Frontiers in Bioengineering and Biotechnology (2020), the latter also being a CNN approach. As such, it is difficult to assess how this approach differs from or builds on previous work.

      We agree that there has been a great deal of exciting work on the application of deep learning to protein sequences. Our core code has been publicly available on GitHub since April 2019, and our preprint has now been available for more than a year. We regret the time taken to release a manuscript and for it to reach review: this was in part due to the SARS-CoV-2 pandemic, in the scientific response to which the first author was heavily involved. Nevertheless, we believe that our work has a number of important features that distinguish it from much other work in this space.

      ● We train across the entire GO ontology. In the paper referenced by the reviewer, training is with 491 BP terms, 321 MF terms, and 240 CC terms. In contrast, we train with a vocabulary of 32,102 GO labels, and the majority of these are predicted at least once in our test set.

      ● We use a dilated convolutional approach. In the referenced paper the network used is instead of fixed dimensions. Such an approach means there is an upper limit on how large a protein can be input into the model, and also means that this maximum length defines the computational resources used for every protein, including much smaller ones. In contrast, our dilated network scales to any size of protein, but when used with smaller input sequences it performs only the calculations needed for this size of sequence.

      ● We use class-activation mapping to determine regions of a protein responsible for predictions, and therefore potentially involved in specific functions.

      ● We provide a TensorFlow.js implementation of our approach that allows lightweight models to be tested without any downloads.

      ● We provide a command-line tool that provides easy access to full models.
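
      The point about dilated convolutions and variable-length input can be illustrated with a small, self-contained sketch. It is a toy, not the ProteInfer architecture: the scalar amino-acid encoding, the two hand-picked filters, and all function names are assumptions chosen for brevity. It shows that stacking dilated convolutions and then mean-pooling over positions yields an embedding of fixed size for sequences of any length, with computation proportional to the actual sequence length.

```python
# Hypothetical toy model: illustrates dilated convolution plus mean pooling only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(sequence):
    """Toy scalar encoding of residues (a real model would use one-hot vectors)."""
    return [(AMINO_ACIDS.index(aa) + 1) / len(AMINO_ACIDS) for aa in sequence]


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D dilated convolution that preserves the input length.

    Assumes an odd kernel size so the padding is symmetric.
    """
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def embed(sequence, filters, dilations):
    """One pooled value per filter, regardless of sequence length."""
    channels = []
    for kernel in filters:
        x = encode(sequence)
        for d in dilations:
            x = [max(0.0, v) for v in dilated_conv1d(x, kernel, d)]  # ReLU
        channels.append(sum(x) / len(x))  # mean pool over positions
    return channels


filters = [[0.5, 1.0, 0.5], [-1.0, 0.0, 1.0]]
dilations = [1, 2, 4]
short_embedding = embed("MKT", filters, dilations)
long_embedding = embed("MKTAYIAKQRQISFVKSHFSRQ", filters, dilations)
print(len(short_embedding), len(long_embedding))
print(receptive_field(3, dilations))
```

      With kernel size 3 and dilations 1, 2 and 4, each output position sees 15 input residues, so the receptive field grows quickly with depth while the number of weights per layer stays constant.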

      We have made some changes to bring out these points more clearly in the text:

      "Since natural protein sequences can vary in length by at least three orders of magnitude, this pooling is advantageous because it allows our model to accommodate sequences of arbitrary length without imposing restrictive modeling assumptions or computational burdens that scale with sequence length. In contrast, many previous approaches operate on fixed sequence lengths: these techniques are unable to make predictions for proteins larger than this sequence length, and use unnecessary resources when employed on smaller proteins."

      We have added a table that sets out the vocabulary sizes used in our work (5,134 for EC and 32,109 for GO):

      "Gene Ontology (GO) terms describe important protein functional properties, with 32,109 such terms in Swiss-Prot (Table S6) that cover the molecular functions of proteins (e.g. DNA-binding, amylase activity), the biological processes they are involved in (e.g. DNA replication, meiosis), and the cellular components to which they localise (e.g. mitochondrion, cytosol)."

      A second weakness is that it was not clear what new insights the UMAP projections of the sequence embedding could offer. For example, the authors mention that "a generalized mapping between sequence space and the space of protein functions...is useful for tasks other than those for which the models were trained." However, such tasks were not explicitly explained. The hierarchical clustering of enzymatic proteins shown in Fig. 5 and the clustering of non-enzymatic proteins in Fig. 6 are consistent with the expectation of separability in the high-dimensional embedding space that would be necessary for good CNN performance (although the sub-groups are sometimes not well-separated. For example, only the second level and leaf level are well-separated in the enzyme classification UMAP hierarchy). Therefore, the value-added of the UMAP representation should be something like using these plots to gain insight into a family or sub-family of enzymes.

      We thank the reviewer for highlighting this point. There are two types of embedding which we discuss in the paper. The first is the high-dimensional representation of the protein that the neural network constructs as part of the prediction process. This is the embedding we feel is most useful for downstream applications, and we discuss a specific example of training the EC-number network to recognise membrane proteins (a property on which it was not trained): "To quantitatively measure whether these embeddings capture the function of non-enzyme proteins, we trained a simple random forest classification model that used these embeddings to predict whether a protein was annotated with the intrinsic component of membrane GO term. We trained on a small set of non-enzymes containing 518 membrane proteins, and evaluated on the rest of the examples. This simple model achieved a precision of 97% and recall of 60% for an F1 score of 0.74. Model training and data-labelling took around 15 seconds. This demonstrates the power of embeddings to simplify other studies with limited labeled data, as has been observed in recent work (43, 72)."
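
      As a quick arithmetic check of the figures quoted above, the F1 score is the harmonic mean of precision and recall. The helper below is a generic illustration, not code from the manuscript.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision of 97% and recall of 60%, as reported for the membrane classifier.
f1 = f1_score(0.97, 0.60)
print(round(f1, 2))  # 0.74
```

      The harmonic mean is pulled towards the lower of the two values, which is why an F1 of 0.74 sits much closer to the recall of 60% than to the precision of 97%.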

      As the reviewer points out, there is a second embedding created by compressing this high-dimensional representation down to two dimensions using UMAP. This embedding can also be useful for understanding the properties seen by the network, for example the GO terms highlighted in Fig. 7, but in general it will contain less information than the higher-dimensional embedding.

      The clear presentation, ease of use, and computationally accessible downstream analytics of this work make it of broad utility to the field.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Kschonsak et al. describes the rational structure-based design of novel hybrid inhibitors targeting human Nav1.7 channel. CryoEM structure of arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex confirmed binding pose observed in x-ray structure GX-936 - VSD4 Nav1.7-NavAb channel. Remarkably, cryoEM structure of acylsulfonamide (GDC-0310) - VSD4 NaV1.7-NaVPas channel complex revealed a novel binding pocket between the S3 and S4 helices, with the S3 segment adopting a distinct conformation compared to the arylsulfonamide (GNE-3565) - VSD4 NaV1.7-NaVPas channel complex. Creatively, the authors designed a novel class of hybrid inhibitors that simultaneously occupy both the aryl- and acylsulfonamide binding pockets. This study underscores the power of structure-guided drug design to target transmembrane proteins and will be useful to develop safer and more effective therapeutics.

      We thank this Reviewer for the very positive feedback and for highlighting the importance of our work in utilizing structure-based drug design to target key membrane targets.

      Reviewer #2 (Public Review):

      In this manuscript, the authors identify a critical unmet need for the (structure-based) drug design of human Nav channels, which are of clinical interest. They cleverly rationalized a hybrid strategy for developing target-specific small molecule inhibitors, which integrate binding mechanisms of two drug candidates that act orthogonally on the VSD4 of Nav 1.7. Thus, the authors illustrate a promising outlook on pharmaceutical intervention on Nav channels.

      Overall, the cryo-EM structures of the ligand-bound Nav channels are convincing, with a clear indication of the site-specific, distinct density of the small molecules. At the moment, it is difficult to tell how innovative the pipeline is compared to conventional cryo-EM structure determination.

      We thank this Reviewer for these positive comments and for the very helpful suggestions. We are addressing the concerns regarding our cryoEM pipeline.

      Reviewer #3 (Public Review):

      This is an excellent manuscript, describing a few lines of discoveries:

      1. Establishment of a structural biological pipeline for iterative structural determination of an engineered Nav1.7;

      2. Illumination of the novel compound binding mode;

      3. Structure-based development of the hybrid compounds, which led to the novel Nav1.7 inhibitor;

      The cryo-EM study on the engineered Nav1.7 consistently reveals the map at the mid to low 2 Å range, which is unprecedented and impressive, thus, demonstrating the high value of this workflow. The further strength of this study is that the authors were able to develop a new compound by combining structural information gained from the two Nav1.7 structures complexed to two different compounds with different binding modes. Overall, the depth and quality of this study are excellent.

      We thank this Reviewer for highlighting the importance of this manuscript and specifically recognizing our accomplishments in enabling iterative high-resolution structure determination for this target, which allowed us to perform SBDD and design a new series of hybrid compounds. We are also grateful to the Reviewer for noting the excellence of our studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, McQuate et al. use serial block face SEM to provide a high resolution, 3D analysis of mitochondrial structure in hair cells and surrounding supporting cells of the zebrafish lateral line. They first demonstrate that hair cells have a higher mitochondrial volume as compared to supporting cells, which likely reflects the high metabolic load of these sensory cells. Their deeper analysis of mitochondrial morphology in hair cells reveals that the base of the hair cell - near the presynapse is dominated by a large, networked mitochondrion, while the apex of the cell is dominated by many small mitochondria. By examining hair cells at different stages of development, the authors show that specialized features of hair cell mitochondria are gradually established over the course of development. Finally, by examining hair cells in mutants that lack mechanosensation or presynaptic calcium responses, McQuate et al. reveal that cellular activity contributes to the development of appropriate mitochondrial morphology and localization within hair cells. This dataset, which will be made publicly available, is an immense resource to the community and will facilitate the generation of novel hypotheses about hair cell mitochondrial function in health and disease.

      Strengths:

      1. The painstaking acquisition and analysis of hair cell EM data in a genetically tractable system that is easily accessible for in vivo functional experiments to address hypotheses that emerge from this work.

      2. The use of multiple datasets and analysis methods to cross-validate results.

      3. The thoughtful, careful analysis of the data highlights the richness of the dataset.

      4. The use of both wild-type and mutant animals substantially adds to the manuscript, providing significantly more insight than wild-type data alone.

      Weaknesses:

      1. The manuscript could more strongly highlight the utility of this dataset and facilitate its future use by providing a summary table that lists each sample together with salient details.

      2. The authors examine an opa-1 mutant with altered mitochondrial fusion (which consequently has changes in mitochondrial morphology and organization) to suggest that aberrant mitochondrial architecture negatively impacts mitochondrial function. However, mitochondrial fusion is thought to be critical for mitochondrial health beyond just altered architecture. Because fusion has other roles, it is difficult to use this manipulation to conclude that it is simply disruptions in mitochondrial architecture that alter function.

      3. Although the work of acquiring and reconstructing EM data is labor-intensive, ideally, multiple fish would be examined for each genotype. Readers should take into consideration that one of the mutant datasets is derived from just one animal.

      We thank Reviewer 1 for pointing out the “painstaking acquisition” that went into this study, the “thoughtful, careful analysis,” and the “richness of the dataset.” We believe we have addressed the aforementioned weaknesses.

      Reviewer #2 (Public Review):

      Sensory hair cells have high metabolic demands and rely on mitochondria to provide energy as well as regulate homeostatic levels of intracellular calcium. Using high-resolution serial block face SEM, the authors examined the influences of both developmental age and hair cell activity on hair cell mitochondrial morphology. They show that hair cell mitochondria develop a regionally specific architecture, with the highest volume mitochondria localized to the basolateral presynaptic region of hair cells. Data obtained from mutants lacking either mechanotransduction or presynaptic calcium influx provide evidence that hair cell activity shapes regional mitochondrial morphology. These observed specializations in mitochondrial morphology may play an important role in mitochondrial function, as mutants showing disrupted hair cell mitochondrial architecture showed depolarized mitochondrial potentials and impaired evoked mitochondrial calcium influx.

      This work provides novel and intriguing evidence that mechanotransduction and presynaptic calcium influx play important roles in shaping subcellular mitochondrial morphology in sensory hair cells. Yet there was a lack of consistency in the analysis and presentation of the data which made it difficult to contextualize and interpret the results. This study would be greatly strengthened by i) consistent definitions for hair cell maturation, ii) comparable data analysis of cav1.3a mutant and cdh23 mutant mitochondrial morphologies, and iii) more detailed descriptions and interpretations of the UMAP analysis.

      We thank Reviewer #2 for thinking the work is “novel and intriguing”. We have addressed the weaknesses raised.

      Reviewer #3 (Public Review):

      McQuate et al. have succeeded in reconstructing 3D images of mitochondria and discovered unique structural features of mitochondria in zebrafish hair cells. Compared to other cell types, such as central and peripheral support cells, hair cells have many elongated and connected mitochondria, and these seem to be involved in hair cell and ribbon synapse development. These findings will contribute to understanding the mechanisms of mitochondrial network regulation.

      Using the SBFSEM technique, the authors provide clear 3D images of hair cells, and the technique improves image resolution enough to capture the structural parameters of not only mitochondria but also ribbon synapses, compared to typical fluorescent imaging. These results are very attractive and have the high potential to broadly apply to 3D imaging of any type of organelles, cells, and tissues. On the other hand, however, the authors provide data from a small sample size, and the functional experiments needed to support the conclusions are lacking. Some missing representative images and the inconsistent grouping methods used in the analysis are also of concern.

      We thank the Reviewer for thinking the results are “very attractive and have the high potential to broadly apply to 3D imaging of any type of organelles, cells, and tissues.” We agree. We have addressed the weaknesses raised.

    1. Author Response

      Reviewer #1 (Public Review):

      The article from Dumoux et al. shows the use of plasma-based focused ion beams for volume imaging on cryo-preserved samples. This exciting application can potentially increase the throughput and quality of the data acquired through serial FIB-SEM tomography on cryo-preserved and unstained biological samples. The article is well-written, and it is easy to follow. I like the structure and the experimental description, but I miss some points in the analyses, without which the conclusions are not adequately supported.

      The authors state the following: "the application of serial FIB/SEM imaging of non-stained cryogenic biological samples is limited due to low contrast, curtaining, and charging artefacts. We address these challenges using a cryogenic plasma FIB/SEM (cryo-pFIB/SEM)".

      Reading the article, I do not find that the challenges are addressed; it appears that some of these are evaluated when the samples are prepared using plasma-based beams. To support the fact that charging, contrast, and curtaining are addressed, a comparison should be made with the current state of the art, or it is otherwise impossible to determine whether these systems bring any advantage.

      Charging is an issue that is not described in detail, nor has it been adequately analysed. The effect of using plasma beams is independent of the presented algorithm for charging suppression, which is purely image processing based, although very interesting. Given that the focus of the work is on introducing the benefit of using plasma ion beams (from the title) and given that a great deal of data is presented on the effect of the multiple ion sources, one would expect to have comparable images acquired after the surfaces have been prepared with the different beams. This should also be compared against the current state-of-the-art (gallium) to provide a baseline for different beams' benefits. I realise that this requires access to another microscope and that this also imposes controls on the detector responses on each instrument to have a normalised analysis. Still, it also provides the opportunity to quantify the benefits of each instrumentation.

      We have provided a response to the charging comments outlined here in the main rebuttal above. The SEM we used in this study was selected based on its optimal performance at low electron voltages due to its immersion field. The low kV capability is particularly of interest in the case of charging (cross over energy). There is the possibility the interaction of the sample surface with chemically inert or reactive ion species could change the surface potential (either positively or negatively). The Vero cells imaged during a serial pFIB/SEM using nitrogen plasma still exhibit charging as well as the argon plasma we canonically used, suggesting that charging is ion beam independent.

      Regarding gallium, a like-for-like comparison would require prolonged access to another very bespoke microscope, and there are already studies (e.g. Schertel et al., 2013 and Scher et al., 2021) that show SEM data of cryogenic sample surfaces milled with gallium. Therefore, we consider such a study outside the scope of this manuscript.

      The curtaining scores. This is a good way to explain the problem, though a few aspects need to be validated. For example, curtains appear over time when milling, and it would be useful to understand how different sources behave over time in FIB/SEM tomography sessions. The score is currently done from individual windows milled, which gives a good indication of the performance. However, it would make sense to check that the behaviour remains identical in an imaging setting and with the moving milling windows (or lines). This will show the counteracting effect to the redeposition and etching effect reported when imaging with the E-beam the milled face.

      Please see our response in the main rebuttal points.

      No detail about the milling resolution has been reported. Since different currents and beams have different cross-sections, it is expected to affect the z-resolution achievable during an imaging session. It would be useful to have a description of the beam cross-sections at the various conditions used and how or whether these interfere with the preparation.

      Please see our response in the main rebuttal points.

      Contrast. No analysis of plasma FIBs' benefits on image contrast compared to the current state of the art has been provided. Measuring contrast is complex, especially when this value can change in response to the detector settings. Still, attempts can be made to quantify it through the FRC and through the analysis of the image MTF (amplitude and fall off), given that membranes are the most prominent and visible features in cryoFIB/SEM images of biological samples.

      We agree that measuring contrast is complex, and therefore the following parameters, as stated on page 6, lines 6 to 7, were kept consistent throughout data collection: voltage, current, line integration, exposure, detector voltage offset and gain. In Figure 4, we either kept the working distance (focus) constant or varied it, and compared the FRC as well as the contrast. As discussed above, a like-for-like comparison with the state of the art (gallium) is not currently possible, making this experiment/analysis outside the scope of this manuscript.
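      For readers unfamiliar with the FRC mentioned in this exchange, the following is a minimal sketch of how a Fourier ring correlation curve between two images of the same field of view might be computed. It is an illustration only, not the implementation used in the manuscript; the function name `fourier_ring_correlation` and the choice of integer-radius rings up to Nyquist are assumptions made for this example.

```python
import numpy as np


def fourier_ring_correlation(img_a, img_b):
    """Return the FRC value for each integer-radius ring (hypothetical helper).

    Both inputs must be square 2D arrays of identical shape, e.g. two
    independent acquisitions of the same region.
    """
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("expected two square 2D images of identical shape")

    n = a.shape[0]
    # Centre the zero-frequency component so rings are concentric circles.
    fa = np.fft.fftshift(np.fft.fft2(a))
    fb = np.fft.fftshift(np.fft.fft2(b))

    # Integer ring index of every Fourier pixel, measured from the centre.
    yy, xx = np.indices(a.shape)
    radius = np.hypot(yy - n // 2, xx - n // 2).astype(int)

    max_r = n // 2  # only keep rings below the Nyquist frequency
    mask = radius < max_r
    r = radius[mask]

    # Ring-wise sums of the cross-spectrum and of the two power spectra.
    num = np.bincount(r, weights=(fa * np.conj(fb)).real[mask], minlength=max_r)
    pow_a = np.bincount(r, weights=(np.abs(fa) ** 2)[mask], minlength=max_r)
    pow_b = np.bincount(r, weights=(np.abs(fb) ** 2)[mask], minlength=max_r)

    den = np.sqrt(pow_a * pow_b)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

      In this sketch, two identical images give an FRC of 1 in every ring, while two images containing only independent noise give values scattered around 0. The spatial frequency at which the curve drops below a chosen threshold is commonly reported as the resolution estimate; the threshold itself is a convention and is not specified here.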

      Figure S4 points out that electrons that hit the sample at normal incidence give better signal/contrast or imaging quality than when the sample is imaged at a tilt. This fact is expected to significantly affect large areas as the collection efficiency will vary across the sample, particularly as regions get further away from the optimal location. The dynamic focusing option available on all SEM will compensate for the focal change but not the collection efficiency. Even though this is a fact, the authors show a loss of resolution, which is not explained by the tilt itself. In particular, the generation of secondary electrons is known to increase with the increased tilt, and to consider that the curtains (that are the prominent feature on the surface) are running along the tilt direction, it would be expected to see no contrast difference between the background and the edge of each curtain as the generation of secondary electrons will increase with tilt for both the edges and the background. Therefore, the contrast should be invariant, at least on the curtains.

      Looking at the images presented in the figure, they appear astigmatic and not properly focused when imaged at a tilt. As evidence of this claim, the cellular features do not measure the same, and the sharpness of the edge of the curtains is gone when tilted. This experience comes from improper astigmatism correction, which in turn, in scanning systems, leads to the impossibility of focusing. The tilt correction provides not only dynamic focusing but also corrects for the anisotropy in the sampling due to the tilt. If all imaging is set up correctly, the two images should show the imaged features with the exact sizes regardless of the resolution (which, in the presented case, is sufficient), and the sharpness of the curtain edges should be invariant regardless of the tilt, at least while or where in focus. Only at that point, the comparison will be fair.

      Please see our response in the main rebuttal points.

      Finally, the resolution measurements presented in the last supplementary figures have no impact on or relation to the use of plasma FIB/SEM. It is an effect related to the imaging conditions used in the SEM regardless of the ion beam nature. The distribution of the resolution within images appears predominantly linked to local charging and the local sample composition (from Figure 8). Given that the focus is on introducing or presenting the use of plasma-based beams, the results should be presented with that focus in mind, with a comparison between beams.

      This figure is intended to show the absence of degradation in image quality over the dataset. As the stage is moving during imaging at 90°, it would be possible for the focus to be lost over a longer data acquisition session. However, this figure demonstrates that the focus remains well adjusted throughout data acquisition. We also considered potential beam damage accumulation, which does not seem to be detectable with our method.

      Reviewer #2 (Public Review):

      The authors present a manuscript highlighting recent advancements in cryo-focused ion beam/scanning electron microscopy (cryo-FIB) using plasma ion sources as an alternative to positively-charged gallium sources for cryo-FIB milling and volumetric SEM (cryo-FIB/SEM) imaging. The authors benchmark several sources of plasma and determine argon gas is the most suitable source for reducing undesirable curtaining effects during milling. The authors demonstrate that milling with an argon source enables volumetric imaging of vitrified cells and tissue with sufficient contrast to glean biological insight into the spatial localization of organelles and large macromolecular complexes in both vitrified human cells and in high-pressure frozen mouse brain tissue slices. The authors also show that altering the sample angle from 52 to 90 degrees relative to the SEM beam enhances the contrast and resolution of biological features imaged within the vitrified samples. Importantly, the authors also demonstrate that the resolution of SEM images after serial milling with argon and nitrogen plasma sources does not appear to significantly affect resolution, suggesting that resolution does not vary over an acquisition series. Finally, the authors test and apply a neural network-based approach for mitigating image artifacts caused by charging due to SEM imaging of biological features with high lipid content, such as lipid droplets in yeast, thereby increasing the clarity and interpretability of images of samples susceptible to charging.

      Strengths and Weaknesses:

      The authors do a fantastic job demonstrating the utility of plasma sources for increased contrast of biological features for cryo-FIB/SEM images. However, they do not specifically address the lingering question of whether or not it is possible to use this plasma source cryo-FIB/SEM volumetric imaging for the specific application of localizing features for downstream cryo-ET imaging and structural analyses. As a reader, I was left wondering whether this technique is ideally suited solely for volumetric imaging of cryogenic samples, or if it can be incorporated as a step in the cellular cryo-ET workflow for localization and perhaps structure determination. Another biorxiv paper (doi.org/10.1101/2022.08.01.502333) from the same group establishes a plasma cryo-FIB milling workflow to generate lamella of sufficient quality to elucidate sub-nanometer reconstructions of cellular ribosomes. However, I anticipate the real impact on the field will be from the synergistic benefits of combining both approaches of volumetric cryo-FIB/SEM imaging to localize regions of interest and cryo-ET imaging for high-resolution structural analyses.

      Additional experiments were undertaken to demonstrate that serial cryo pFIB/SEM can be used in a variety of correlative imaging workflows, including follow-on cryoET. However, we have yet to carefully determine the consequences of such imaging modalities for downstream high spatial frequencies, e.g., for subvolume averaging. The roles of SEM imaging, ion beam damage, etc. have yet to be analysed or optimised in detail. This work is outside the scope of this manuscript.

      Another weakness is the lack of demonstration that the contrast gained from plasma cryo-FIB/SEM is sufficient to apply neural network-based approaches for automated segmentation of biological features. The ability to image vitrified samples with enhanced contrast is huge, but our interpretation of these reconstructions is still fundamentally limited in our ability to efficiently analyze subcellular architecture.

      We have demonstrated that the segmentation of subcellular features such as mitochondria within a serial pFIB-SEM data set of heart tissue can be automated using SuRVos2 – a neural network based automated segmentation software. These comparisons are included in an additional figure (Figure 11).

    1. Author Response

      Reviewer #2 (Public Review):

      Charme is a long non-coding RNA reported by the authors in their previous studies. Their previous work, mainly using skeletal muscles as a model, showed the functional relevance of Charme, and presented data demonstrating its nuclear role, primarily via modulating the sub-nuclear localization of Matrin 3 (MATR3). Their data from skeletal muscles suggested that loss of the intronic region of Charme affects the local 3D genome organization, affecting MATR3 occupancy and thus gene expression. Loss of Charme in vivo leads to cardiac defects. In this manuscript, they characterize the cardiac developmental defects and present molecular data supporting how the loss of Charme affects the cardiac transcriptome repertoire. Specifically, by performing whole transcriptome analysis in E12.5 hearts, they identify gene expression changes affected in developing hearts due to loss of Charme. Based on their previous study in skeletal muscles, they assume that Charme regulates cardiac gene expression primarily via MATR3 also in developing cardiomyocytes. They provide CLIP-seq data for MATR3 (transcriptome-wide foot printing of MATR3) in wild-type E15.5 hearts and connect the binding of MATR3 to gene expression changes observed in Charme knockout hearts. I credit the authors for providing CLIP seq data from in vivo embryonic samples, which is technically demanding.

      Major strengths:

      As previously indicated by the authors in Charme knockout mice, the major strength is the effect of Charme on cardiac development. While the phenotype might be subtle, the functional data indicate that the role of Charme is essential for cardiac development and function. The combinatorial analysis of MATR3 CLIP-seq and transcriptional changes in the absence of Charme suggests a role of Charme that could be dependent on MATR3.

      We thank this reviewer for appreciating our methodological efforts and the importance of the MATR3 CLIP-seq data from in vivo embryonic samples.

      Weakness:

      (i) Nuclear lncRNAs often affect local gene expression by influencing the local chromatin.

      The Charme locus is in close proximity to MYBPC2, which is essential for cardiac function, sarcomerogenesis, and sarcomere maintenance. It is important to rule out that the cardiac-specific developmental defects due to Charme loss are due to (a) the influence of Charme on MYBPC2 or, for that matter, other neighboring genes, or (b) local chromatin changes or enhancer-promoter contacts of MYBPC2 and other immediate neighbors (both aspects in the developmental time window when Charme expression is prominent in the heart, ideally from E11 to E15.5).

      Although cis-activity represents a mechanism of action for several lncRNAs, our previous work did not reveal this kind of activity for pCharme. To add stronger evidence, we have now analysed the expression of pCharme neighbouring genes in cardiac muscle. Genes were selected by focusing the analysis not only on genes in “linear” proximity but also on possible chromatin contacts, which may point to candidates for in cis regulation. For this purpose, we made use of the analyses that were meanwhile in progress (to answer point iv) on available Hi-C datasets (Rosa-Garrido et al. 2017). Starting from a 1 Mb region around the Charme locus, we found that most of the interactions with Charme occur in a region spanning from 240 kb upstream to 115 kb downstream of Charme, for a total of 370 kb (Rev#2_Capture Fig. 1A). This region includes 39 genes, 9 of them expressed in the neonatal heart but none showing significant deregulation (see Table S2). Of note, this genomic region also includes the MYBPC2 locus, for which we did not find decreased expression in the heart in our RNA-seq data (Revised Figure 2-figure supplement 1C and Table S2). This trend was confirmed through RT-qPCR analyses of several genes from E15.5 extracts, which revealed no significant difference in their abundance upon Charme ablation (Rev#2_Capture Fig. 1B).

      Fig. 1. A) Contact map depicting Hi-C data of left ventricular mouse heart retrieved from GEO accession ID GSM2544836. Data related to the 1 Mb region around the Charme locus were visualized using the Juicebox Web App (https://aidenlab.org/juicebox/). B) RT-qPCR quantification of Charme and its neighbouring genes in CharmeWT vs CharmeKO E15.5 hearts. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools. Data information: *p < 0.05; **p < 0.01; ***p < 0.001, unpaired Student’s t test.
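      As an illustration of what "normalized to GAPDH mRNA" and "means ± SEM" can mean in practice, the sketch below uses the common 2^-ΔCt relative quantification. The response does not state which formula was applied, so this choice, the helper names `relative_expression` and `mean_sem`, and the Ct values in the usage lines are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene as 2^-(ΔCt).

    ct_target and ct_reference are the threshold cycles measured for the
    gene of interest and for the reference gene (e.g. GAPDH) in one sample.
    """
    return 2.0 ** -(ct_target - ct_reference)


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical Ct values for three WT pools: (target Ct, GAPDH Ct).
wt_pools = [(24.1, 18.0), (24.4, 18.2), (23.9, 17.9)]
wt_levels = [relative_expression(t, ref) for t, ref in wt_pools]
wt_mean, wt_sem = mean_sem(wt_levels)
```

      The same two steps would be applied to the KO pools, after which the groups could be compared with an unpaired t test as stated in the legend.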

      For a better understanding, we also checked possible “local” Charme activities in skeletal muscle cells, from previous datasets (Ballarino et al., 2018). We found that in murine C2C12 cells treated with two different gapmers against Charme, three of its neighbouring genes were expressed (Josd2, Emc10 and Pold1), but none showed significant alterations in their expression levels in response to Charme knock-down (Rev#2_Capture Fig. 2).

      Taken together, these results would exclude the possibility of Charme in cis activity as responsible for the phenotype.

      Fig. 2: Average expression from RNA-seq (FPKM) quantification of Charme neighbouring genes in C2C12 differentiated myotubes treated with Gap-scr vs Gap-Charme. Values for Gap-Charme represent the average values of gene expression after treatment with two different gapmers (GAP-2 and GAP-2/3).

      (ii) The authors provide data indicating cardiac developmental defects in Charme knockouts. Detailed developmental phenotyping is missing, which is necessary to pinpoint the exact developmental milestones affected by Charme. This is critical when reporting the cell type/ organ-specific developmental function of a newly identified regulator.

      We did our best to answer this concern.

      Let us first emphasise that, since their generation, we have never observed any particular tissue alteration, morphological or physiological, when dissecting the CharmeKO animals other than the muscular ones. The high specificity of pCharme expression, as also shown here by ISH (Figure 1C-D, Figure 1-figure supplement 1A-B, Figure 3A), together with the minimal alteration applied to the locus for CRISPR-Cas-mediated KO (PolyA insertion), strongly excludes the presence of an alteration in other tissues and their involvement in the development of the phenotype.

      Nevertheless, we now add more developmental details to the cardiac phenotype (see also Essential revision point 2).

      1- First of all, gene expression analyses performed at E12.5, E15.5, E18.5 and neonatal (PN2) stages allowed us to identify, at the molecular level, the developmental time point at which CharmeKO effects on the cardiac muscle can be detected. Our new results clearly indicate that the pCharme-mediated regulation of morphogenic and cardiac differentiation genes is detectable from the E15.5 fetal stage onward (Rev#2_Capture Fig. 3/Revised Figure 2E). Together with the analysis of pCharme targets, and consistent with the altered cardiac maturation and performance, this evidence is also supported by the analysis of the Myh6/Myh7 myosin ratio, whose decrease in CharmeKO hearts starts at E15.5 and reaches 69% of control levels at PN stages (Revised Figure 2F).

      2- Hematoxylin-eosin staining of dorso-ventral cryosections from CharmeWT and CharmeKO hearts confirmed the fetal malformation at the E15.5 stage (Revised Figure 2G). Moreover, the hypotrabeculation phenotype of CharmeKO hearts, which was initially examined by immunofluorescence, is now confirmed by the analysis of key trabecular markers (Irx3 and Sema3a), whose expression significantly decreases upon pCharme ablation (Rev#1_Capture Fig. 3B/Revised Figure 2-figure supplement 1G).

      3- Finally, the gene expression analysis of Ki-67, Birc5 and Ccna2 (Revised Figure 2-figure supplement 1E) definitively rules out an influence of pCharme ablation on cell-cycle genes and cardiomyocyte proliferation, thus allowing a more careful interpretation of the embryonic phenotype. Note that, consistent with the involvement of the lncRNA at later stages of development, the expression of important cardiac regulators, such as Gata4, Nkx2-5 and Tbx5, is not altered by its ablation at any of the tested time points (Rev#2_Capture Fig. 3), while pCharme absence mainly affects genes that are expressed downstream of these factors.

      These new results have been included in the revised version of the manuscript and better discussed.

      Fig. 3: RT-qPCR quantification of Gata4, Nkx2-5 and Tbx5 in CharmeWT and CharmeKO cardiac extracts at E12.5, E15.5 and E18.5 of embryonic development. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools.

      (iii) Along the same line, at the molecular level, the authors provide evidence indicating a change in the expression of genes involved in cardiogenesis and cardiac function. Based on changes in mRNA levels of the genes affected due to loss of Charme and based on immunofluorescence analysis of a handful of markers, they propose a role of Charme in cell cycle and maturation. Such claims could be toned down or warrant detailed experimental validation.

      See above, response to Reviewer #2 (Public Review) weakness (ii).

      (iv) The authors extrapolate the mechanistic finding in skeletal muscle they reported for Charme to the developing heart. While the data support this hypothesis, it falls short in extending the mechanistic understanding of Charme beyond the papers previously published by the authors. CLIP-seq data is a step in the right direction. MATR3 is a relatively abundant RBP, binding transcriptome-wide, mainly in the intronic region, based on currently available CLIP-seq data, as well as shown by the authors' own CLIP seq in cardiomyocytes. It is also shown to regulate pre-mRNA splicing/alternative splicing along with PTB (PMID: 25599992) and 3D genome organization (PMID: 34716321). In addition, the authors propose a MATR3-dependent molecular function for Charme primarily dependent on the intronic region of Charme and due to the binding of MATR3. Answering the following questions would enable a better mechanistic understanding of how Charme controls cardiac development.

      (i) What are the proximal genomic regions in the 3D space to the Charme locus in embryonic cardiomyocytes? The authors can re-analyse published Hi-C data sets from embryonic cardiomyocytes or perform a 4-C experiment using the Charme locus for this purpose.

      See above, response to Reviewer #2 (Public Review) weakness (i).

      (ii) does the loss of Charme affect the splicing landscape of MATR3 bound pre-mRNAs in E12.5 ventricles in general and those arising from the NCTC region specifically?

      This is an intriguing issue, as also highlighted by new evidence showing that the reactivation of fetal-specific RNA-binding proteins, including MATR3, in the injured heart drives transcriptome-wide switches through the regulation of early steps of RNA transcription and processing (D'Antonio et al., 2022).

      Using the rMATS software on our neonatal RNA-Seq datasets, we then investigated the effect of pCharme depletion on splicing, with a focus on NCTC. As shown in Rev#2_Capture Fig. 4A, all classical splicing alterations were investigated, such as exon skipping, alternative 5’ splice site, alternative 3’ splice site, mutually exclusive exons and intron retention. Intriguingly, we did observe a slight alteration in the splicing patterns, in particular for exon skipping events (62%, corresponding to 381 events). Among them, the majority corresponded to exon exclusion events (237 events = 209 genes), while a smaller fraction corresponded to exon inclusion (144 events = 133 genes). Moreover, by intersecting these genes with the MATR3-bound RNAs we found a weakly significant enrichment (p = 0.038) for exon inclusion (Rev#2_Capture Fig. 4B).

      Regarding the NCTC locus, we demonstrate that in hearts pCharme acts through different target genes. Indeed, none of the NCTC-arising transcripts are bound by MATR3 (see Table S4) or substrate for alternative splicing regulation.

      While these results are very interesting for deepening the investigation of the pCharme/MATR3 interplay, their biological significance needs to be further investigated through one-by-one analysis of specific transcripts. As a continuation of the project, Nanopore sequencing of these samples on a MinION platform is currently under way in the lab to obtain a better characterization of alternative splicing events in response to the lncRNA ablation during development.

      Fig. 4: A) Left and middle panels: Pie charts depicting the proportion of significantly altered (FDR < 0.05) splicing events detected by rMATS comparing neonatal CharmeWT and CharmeKO RNA-seq samples. All classical splicing alterations were investigated, such as exon skipping, alternative 3’ splice site (A3SS), intron retention, alternative 5’ splice site (A5SS) and mutually exclusive exons (MXE). Right panel: Volcano plot depicting significant exon skipping events in CharmeKO (FDR < 0.05, PSI<0 for excluded and included exons, FDR >= 0.05 for invariant exons). The x-axis represents the exon-inclusion ratio or Percentage Spliced In (PSI), while the y-axis represents –log10 of the p-value. B) Pie charts representing the fraction of transcripts with at least one significant excluded (left panel), invariant (middle panel) and included (right panel) exon that are bound by MATR3. P-values of MATR3 target enrichment for each comparison are shown below. Statistical significance was assessed with the Fisher exact test.
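      The enrichment statistic named in this legend can be sketched as follows. This is a minimal, assumption-laden illustration of a one-sided Fisher exact test on a 2x2 table (transcripts in a splicing category vs. background, MATR3-bound vs. unbound); whether the authors used a one- or two-sided test is not stated, the function name `fisher_exact_greater` is hypothetical, and the counts in the usage line are invented for demonstration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in a 2x2 table.

    Table layout (counts of transcripts):
                      MATR3-bound   not bound
        in category        a            b
        background         c            d

    Returns the hypergeometric probability of observing `a` or more bound
    transcripts in the category, given fixed row and column totals.
    """
    row1 = a + b          # size of the category of interest
    col1 = a + c          # total number of MATR3-bound transcripts
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Invented counts, for illustration only.
p_value = fisher_exact_greater(30, 103, 900, 5000)
```

      A small p-value here would indicate that MATR3-bound transcripts are over-represented in the category relative to the background set; the choice of background strongly affects the result and should be reported alongside the p-value.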

      (iii) MATR3 binds DNA, as also shown by authors in previous studies. Is the MATR3 genomic binding altered by Charme loss in cardiomyocytes globally, as well as on the loci differentially expressed in Charme knockout heart? Overlapping MATR3 genomic binding changes and transcriptome binding changes to differentially expressed genes in the absence of Charme would better clarify the MATR3-centric mechanisms proposed here. Further connecting that to 3D genome changes due to Charme loss could provide needed clarity to the mechanistic model proposed here.

      Previous experience from our lab (Desideri et al., 2020) and others (Zeitz et al., 2009, J Cell Biochem) indicates that chromatin IP is not the most suitable approach for identifying specific MATR3 targets because of the broad distribution of MATR3 over the genome. Given the number of animals that would need to be sacrificed, we instead strengthened our MATR3 CLIP evidence by adding i) the CharmeKO MATR3 CLIP-seq control and ii) the combinatorial analysis of MATR3 CLIP-seq with the RNA-seq data.

      We have better explained the reasoning within the text, which now reads “The known ability of MATR3 to interact with both DNA and RNA and the high retention of pCharme on the chromatin may predict the presence of chromatin and/or specific transcripts within these MATR3-enriched condensates. In skeletal muscle cells, we have previously observed on a genome-wide scale, a global reduction of MATR3 chromatin binding in the absence of pCharme (Desideri et al., 2020). Nevertheless, the broad distribution of the protein over the genome made the identification of specific targets through MATR3-ChIP challenging.” (lines 274-279).

      Indeed, we found that MATR3 binding was significantly decreased on numerous peaks (434/626), while its increase was observed on a smaller fraction of regions (192/626) (Revised Figure 5C). As a control, we performed MATR3 motif enrichment analysis on the differentially bound regions, revealing its proximity to the peak summit (+/- 50 nt) (Revised Figure 5-figure supplement 1D), close to the strongest enrichment of MATR3, further confirming a direct and highly specific binding of the protein to these sites. To better characterise the relationship between MATR3 and pCharme, we then intersected the newly identified regions with the MATR3-bound transcripts whose expression was altered by Charme depletion. While gain peaks were equally distributed across DEGs, loss peaks were significantly enriched in a subset of pCharme down-regulated DEGs (Revised Figure 5D), suggesting a crosstalk between the lncRNA and the protein in regulating the expression of this specific group of genes. Interestingly, these RNAs mainly distribute across the same GO categories as pCharme downregulated DEGs and include genes, such as Cacna1c, Notch3, Myo18B and Rbm20, involved in embryo development and validated as pCharme/Matr3 targets in primary cardiac cells (Revised Figure 5D, lower panel and 5E).

    1. Author Response

      Reviewer #2 (Public Review):

      1) My main reservation is the presentation of the work. The writing style is conversational and expansive, which makes it challenging for the reader. Furthermore, long paragraphs shift from one topic to the next rather than using separate paragraphs with strong topic sentences to cover each topic. I suggested a few places to start new paragraphs, but many more paragraphs could be divided.

      We have also made significant efforts to reduce the text of the manuscript in each section, with more compact phrasing (including the headlines for the different results sections) and shorter paragraphs to make the paper more readable. This has resulted in an overall reduction in the total number of words in the manuscript from ~11,000 to 9,000 (including Abstract, Introduction, Results, Discussion, Materials and Methods, and Figure legends sections), equivalent to approximately four pages of typed text.

      2) Most of the figures are also overly complicated. I did not attempt to edit one of them, but I am sure that findings will be much clearer with about half of the panels moved to supplemental materials, so the reader can concentrate on the most important data.

      As recommended by the reviewer, we have significantly reduced the number of panels within the figures in the revised manuscript. Accordingly, the total number of panels in the modified figures compared to the original version is as follows: Figure 1 (7 vs 8); Figure 2 (8 vs 10); Figure 3 (7 vs 10); Figure 4 (7 vs 12); Figure 5 (6 vs 11); Figure 6 (4 vs 8).

      The remaining panels, including quantitative data such as cable-to-patch ratios, or percentages of septated/multiseptated cells, among others, have been moved to existing and new supplementary figures. The total number of supplementary figures is now 9 versus 6 in the original version.

    1. Author Response

      Reviewer #1 (Public Review):

      This study combines the biologging method with captive experiments and DNA metabarcoding to detail the hunting behavior of a bat species in the wild. Specifically, it shows that bats use two foraging strategies (echolocating small prey in the air and capturing large ground prey with passive listening) with different success rates and energetic gains. This result highlights that a species believed to be a specialist forager can, in fact, have mixed strategies depending on the condition and environment.

      The detailed foraging behavior they show for such a small animal is impressive. A combination of several different methods, including captive experiments, is a major strength of the paper. I especially like the mastication sound analysis, although I don't know how new it is. However, I have a major concern about the presentation of this study. The manuscript is apparently written for a bat community, and it's hard to understand the significance of the results in the field of animal ecology.

      Thank you for your helpful feedback. We agree that the framing of the ms was too narrow for the audience of eLife, and we have framed the introduction for a broader audience of animal ecology.

      Reviewer #2 (Public Review):

      This paper has huge potential for influencing the way we think about bats as foragers. But, I think that it can be improved.

      Specifically, there is no clearly articulated hypothesis underlying the work. Second, there should be specific testable predictions arising from the hypothesis. This change, while relatively minor, will vastly improve the focus of the work, and hence its impact on the reader.

      Thank you for highlighting the need for clear hypotheses. We have added three specific hypotheses to guide the reader (line: 54-56) in the introduction. We have also reformatted the discussion section to address each hypothesis in succession using subheadings with clear take-home messages (line: 223-224, 271-272, 293, 318).

      Reviewer #3 (Public Review):

      The study addresses a tough question in the study of wild bats: what and where they eat, using both acoustic bio-logging and DNA metabarcoding. As a result, it was found that greater mouse-eared bats made more frequent attack attempts against passively gleaning prey with lower predation success but higher prey profitability than aerial hawking with higher predation success. This is a precious study that reveals essential new insights into the foraging strategies of wild bats, whose foraging behavior has been challenging to measure. On the other hand, the detection of capture attempts, success or failure of predation, and whether it was by passively gleaning prey or aerial hawking were determined from the audio and triaxial accelerometer analysis, and all results of this study depend entirely on the veracity of this analysis. Also, although two different weights and a tag nearly 15% of its weight were used, it is essential for the results of this data that there be no effect on foraging behavior due to tag attachment. Since this is an excellent study design using state-of-the-art methods and very valuable results, readers should carefully consider the supplemental data as well.

      Thank you for the kind words. We agree that it is critically important that the two foraging strategies are unaffected by tagging effects. In the revised ms, we have added tag weights, tag types and change in body weight during instrumentation as explanatory factors in our statistical models and found no effect of the tag weight on our results. We have also addressed this important issue in the method section (model 1: line 520-539, model 3: 568-590).

    1. Author Response

      Reviewer #1 (Public Review):

      Zeng and colleagues investigated the neural underpinnings of visual-vestibular recalibration. Specifically, they measured changes in three monkeys' perception of unisensory heading cues as well as associated changes in neuronal responses to these cues in three different cortical areas following prolonged exposure to systematic visual-vestibular discrepancies. Behavioral responses in a motion direction discrimination task indicate unisensory perceptual shifts in opposite directions that account for the cross-modal discrepancy the monkeys were exposed to. Neuronal firing patterns, related to motion discrimination judgments by means of neurometric functions indicated analogous shifts in neuronal tuning in areas MSTd and PIVC. In contrast, in area VIP tuning for visual heading stimuli shifted in the same direction as tuning for vestibular stimuli and thus in contradiction to the observed perceptual shifts.

      The shifts observed in MSTd and PIVC fit nicely with existing theories and results regarding cross-modal recalibration and substantiate claims that activity in these areas might underlie perceptual decisions. The shift of visual tuning in VIP is surprising and will certainly spark further investigation.

      Overall the results are really interesting, yet, the manuscript in its current form needs revisions along two dimensions, 1) data analysis and 2) writing.

      We thank the reviewer for the positive comments and thoughtful suggestions, which have greatly helped us improve the data analysis and writing. Also, thank you for the thorough list of specific suggestions for improved writing and phrasing. This considerably helped us clarify these aspects in our manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Zeng and colleagues aims to investigate how neural representations of sensory cues in two modalities (visual and vestibular) change when conflicts are introduced between the cues. The manuscript convincingly demonstrates that this recalibration process differs between areas MSTd (a multisensory region), where sensory responses recalibrated differently for visual and vestibular cues, following each modality's conflict, and area VIP (a higher-level region), where responses follow the vestibular cue. More limited insights are present for area PIVC, where visual responses are limited.

      The analyses generally support the conclusions of the authors, but I have two major suggestions to strengthen the statistical robustness of the manuscript:

      1) The analysis about the lack of visual recalibration in area PIVC would have been more convincing if the authors had used Bayesian statistics instead of regular t tests. In this way it would have been possible to estimate if the lack of visual recalibration in this area, for those few neurons that show visual tuning, can be taken as evidence for the absence of an effect or not. In the absence of this additional analysis, it is in fact difficult to properly interpret the results about area PIVC. Is PIVC more in line with MSTd, in view of the lack of visual responses? Or is there actually no visual recalibration, in contrast to both MSTd and VIP?

      In response to this comment, we calculated the Bayesian Pearson correlation for visual recalibration in area PIVC, with the alternative hypothesis (H1) of a correlation between neuronal shifts and perceptual shifts and the null hypothesis (H0) of no correlation: Pearson's r = 0.26, and BF10 = 0.49. Thus, the evidence supports neither H1 nor H0. We agree that this lack of evidence for (or against) visual recalibration in PIVC primarily reflects the lack of robust tuning to visual heading stimuli in PIVC. Accordingly, in the manuscript, we do not argue for or against the recalibration of visual heading tuning in PIVC. Rather, we highlight that neurons in PIVC respond strongly to vestibular signals, but not to visual heading stimuli, and that the vestibular responses undergo recalibration. We interpret the observed shifts in vestibular tuning in PIVC as lower-level, sensory, recalibration (similar to MSTd) based on the broader understanding that PIVC encodes lower-level vestibular signals, with transient time-courses, and impoverished visual tuning (Chen et al., 2016; Chen et al., 2021). Our results are in line with this interpretation, and there is no reason to suspect that PIVC reflects more complex multisensory recalibration (like VIP). Nonetheless, the data could also be in line with alternative interpretations. Therefore, in the revised manuscript we now explain this argument more explicitly and have added its limitations and alternative interpretations to the Discussion (in subsection “Limitations and future directions”, paragraph 2).
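
To make the reading of BF10 = 0.49 concrete, the sketch below classifies a Bayes factor using the commonly quoted cutoffs of 3 and 1/3. The function name `interpret_bf10` and the default threshold are illustrative assumptions and are not taken from the authors' analysis.

```python
def interpret_bf10(bf10: float, threshold: float = 3.0) -> str:
    """Classify a Bayes factor BF10 (evidence for H1 relative to H0).

    The cutoff of 3 (and its reciprocal 1/3) is a common convention,
    assumed here purely for illustration.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 >= threshold:
        return "supports H1"
    if bf10 <= 1.0 / threshold:
        return "supports H0"
    return "inconclusive"


# Value reported above for visual recalibration in PIVC.
print(interpret_bf10(0.49))
```

Under this convention a value between 1/3 and 3 favours neither hypothesis, which matches the statement that the evidence supports neither H1 nor H0.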

      2) For all statistical analyses, multi-level statistics would have been more appropriate than simple t-tests. In fact, since recordings come from few subjects, which in turn have relatively few recording sessions, there is a risk that the results are influenced by one subject and do not represent the full population. Admittedly, this is unlikely in view of the apparently large effect size and low p values. Nonetheless, a more appropriate statistical analysis would make the results more robust and convincing.

      Thank you. We agree with this suggestion and have now: 1) added summary statistics for the individual monkeys, and 2) performed linear mixed model (LMM) analyses (please see our response to Essential Revisions Comment #1, for further details).

      Once these issues are addressed, I believe that the manuscript would provide relevant evidence supporting the hypothesis that multisensory processing in the cortex is an area-specific phenomenon, and that effects observed in one area cannot be simply expected to operate elsewhere. This will therefore elucidate the mechanisms of multimodal plasticity.

      Reviewer #3 (Public Review):

      This study documents an empirical investigation of a fundamental brain process: adaptation to systematic cross-sensory discrepancies. The question is important, the experiment is carefully designed, and the results are striking. Following an unsupervised recalibration block, perceptual judgments of self-motion on the basis of visual and vestibular cues are systematically altered. These behavioral effects are mirrored by changes in the response properties of single neurons in areas MSTd and PIVC (provided that neurons in these areas exhibited selectivity for the sensory cue). Remarkably, neurons in downstream area VIP adjust their response properties in a very different manner, seemingly exclusively reflecting vestibular recalibration (which is opposite in direction to visual perceptual shifts). In the former two areas, the neural-behavior association follows the stimulus dynamics. In VIP, this association remains high beyond the life span of the stimulus. VIP typically exhibits strong choice signals. These decreased in strength after recalibration (an effect unique to area VIP). Together, these findings further dissociate VIP's functional role from that of MSTd and PIVC, without however, fully revealing what that role may be. These results offer a novel perspective on the neural basis of cross-sensory recalibration and will inspire future modeling studies of the neural basis of perception of self-motion.

      We thank the reviewer for the supportive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Wei & Robles et al seek to estimate the heritability contribution of Neanderthal Informative Markers (NIM) relative to SNPs that arose in modern humans (MH). This is a question that has received a fair amount of attention in recent studies, but persistent statistical limitations have made some prior results difficult to interpret. Of particular concern is the possibility that heritability (h^2) attributed to Neanderthal markers might be tagging linked variants that arose in modern humans, resulting in overestimation of h^2 due to Neanderthal variants. Neanderthal variants also tend to be rare, and estimating the contribution of rare alleles to h^2 is challenging. In some previous studies, rare alleles have been excluded from h^2 estimates.

      Wei & Robles et al develop and assess a method that estimates both total heritability and per-SNP heritability of NIMs, allowing them to test whether NIM contributions to variation in human traits are similar or substantially different than modern human SNPs. They find an overall depletion of heritability across the traits that they studied, and found no traits with enrichment of heritability due to NIMs. They also developed a 'fine-mapping' procedure that aims to find potential causal alleles and report several potentially interesting associations with putatively functional variants.

      Strengths of this study include rigorous assessment of the statistical methods employed with simulations and careful design of the statistical approaches to overcome previous limitations due to LD and frequency differences between MH and NIM variants. I found the manuscript interesting and I think it makes a solid contribution to the literature that addresses limitations of some earlier studies.

      My main questions for the authors concern potential limitations of their simulation approach. In particular, they describe varying genetic architectures corresponding to the enrichment of effects among rare alleles or common alleles. I agree with the authors that it is important to assess the impact of (unknown) architecture on the inference, but the models employed here are ad hoc and unlikely to correspond to any mechanistic evolutionary model. It is unclear to me whether the contributions of rare and common alleles (and how these correspond with levels of LD) in real data will be close enough to these simulated schemes to ensure good performance of the inference.

      In particular, the common allele model employed makes 90% of effect variants have frequencies above 5% -- I am not aware of any evolutionary model that would result in this outcome, which would suggest that more recent mutations are depleted for effects on traits (of course, it is true that common alleles explain much more h^2 under neutral models than rare alleles, but this is driven largely by the effect of frequency on h^2, not the proportion of alleles that are effect alleles). Likewise, the rare allele model has the opposite pattern, with 90% of effect alleles having frequencies under 5%. Since most alleles have frequencies under 5% anyway (~58% of MH SNPs and ~73% of NIM SNPs) this only modestly boosts the prevalence of low frequency effect alleles relative to their proportion. Some selection models suggest that rare alleles should have much bigger effects and a substantially higher likelihood of being effect alleles than common alleles. I'm not sure this situation is well-captured by the simulations performed. With LD and MAF annotations being applied in relatively wide quintile bins, do the authors think their inference procedure will do a good job of capturing such rare allele effects? This seems particularly important to me in the context of this paper, since the claim is that Neanderthal alleles are depleted for overall h^2, but Neanderthal alleles are also disproportionately rare, meaning they could suffer a bigger penalty. This concern could be easily addressed by including some simulations with additional architectures to those considered in the manuscript.

      We thank the reviewers for their thoughtful comments regarding rare alleles, and we agree that our RARE simulations only moderately boosted the enrichment of rare alleles among causal mutations. To address this, we added new simulations, ULTRA RARE, in which SNPs with MAF < 0.01 constitute 90% of the causal variants. As in our previous simulations, we use 100,000 and 10,000 causal variants to mimic highly polygenic and moderately polygenic phenotypes, and heritabilities of 0.5 and 0.2 for highly and moderately heritable phenotypes. We again ran three replicate simulations for each combination and partitioned the heritability with the Ancestry-only annotation, Ancestry+MAF annotation, Ancestry+LD annotation, and Ancestry+MAF+LD annotation. Our Ancestry+MAF+LD annotation remains calibrated in this setting. We believe this experiment strengthens our paper and have added it as Fig S2.
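
The causal-variant selection described for the ULTRA RARE setting can be sketched as follows. This is a minimal illustration, assuming a simple two-class draw (MAF below or above the cutoff); the function name `sample_causal_variants`, its parameters, and the toy allele frequencies are hypothetical and do not reproduce the authors' simulation code.

```python
import random


def sample_causal_variants(mafs, n_causal, rare_fraction=0.9,
                           maf_cutoff=0.01, seed=0):
    """Pick causal variant indices so that `rare_fraction` of them
    have MAF below `maf_cutoff` (hypothetical helper)."""
    rng = random.Random(seed)
    rare = [i for i, m in enumerate(mafs) if m < maf_cutoff]
    common = [i for i, m in enumerate(mafs) if m >= maf_cutoff]
    n_rare = round(n_causal * rare_fraction)
    n_common = n_causal - n_rare
    if n_rare > len(rare) or n_common > len(common):
        raise ValueError("Not enough variants in one of the MAF classes.")
    # Sample without replacement within each class.
    return rng.sample(rare, n_rare) + rng.sample(common, n_common)


# Toy data: 2,000 made-up variants, half of them forced below the cutoff.
toy_rng = random.Random(1)
toy_mafs = [toy_rng.uniform(0.0005, 0.5) if i % 2
            else toy_rng.uniform(0.0005, 0.0099)
            for i in range(2000)]
causal = sample_causal_variants(toy_mafs, n_causal=100)
n_rare_causal = sum(toy_mafs[i] < 0.01 for i in causal)
print(n_rare_causal, len(causal))
```

Effect sizes and the target heritability (0.5 or 0.2) would be assigned in a later step that is not shown here.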

      While we agree that these architectures are ad hoc and are unlikely to correspond to realistic evolutionary scenarios, we have chosen them to span the range of possible architectures, so that the skew towards common or rare alleles that we have explored is extreme. The finding that our estimates are calibrated across the range that we have explored leads us to conclude that our inferences should be robust.

      More broadly, we concur with the reviewer that our results (as well as others in the field) may need to be revisited as our view of the genetic architecture of complex traits evolves. The methods that we propose in this paper are general enough to explore such architectures in the future by choosing a sufficiently large set of annotations that match the characteristics across NIMs and MH SNPs. A practical limitation to this strategy is that the use of a large number of annotations can result in some annotations being assigned a small number of SNPs which would, in turn, reduce the precision of our estimates. This limitation is particularly relevant due to the smaller number of NIMs compared to MH SNPs (around 250K vs around 8M).

      Reviewer #2 (Public Review):

      The goal of the work described in this paper is to comprehensively describe the contribution of Neanderthal-informative mutations (NIMs) to complex traits in modern human populations. There are some known challenges in studying these variants, namely that they are often uncommon, and have unusually long haplotype structures. To overcome these, the authors customized a genotyping array to specifically assay putative Neanderthal haplotypes, and used a recent method of estimating heritability that can explicitly account for differences in MAF and LD.

      This study is well thought-out, and the ability to specifically target the genotyping array to the variants in question and then use that information to properly control for population structure is a massive benefit. The methodology also allowed them to include rarer alleles that were generally excluded from previous studies. The simulations are thorough and convincingly show the importance of accounting for both MAF and LD in addition to ancestry. The fine-mapping done to disentangle effects between actual Neanderthal variants and Modern human ones on the same haplotype also seems reasonable. They also strike a good balance between highlighting potentially interesting examples of Neanderthal variants having an effect on phenotype without overinterpreting association-based findings.

      The main weakness of the paper is in its description of the work, not the work itself. The paper currently places a lot of emphasis on comparing these results to prior studies, particularly on its disagreement with McArthur, et al. (2021), a study on introgressed variant heritability that was also done primarily in UK Biobank. While they do show that the method used in that study (LDSR) does not account for MAF and LD as effectively as this analysis, this work does not support the conclusion that this is a major problem with previous heritability studies. McArthur et al. in fact largely replicate these results that Neanderthal variants (and more generally regions with Neanderthal variants) are depleted of heritability, and agree with the interpretation that this is likely due to selection against Neanderthal alleles. I actually find this a reassuring point, given the differences between the variant sets and methods used by the two studies, but it isn't mentioned in the text. Where the two studies differ is in specifics, mainly which loci have some association with human phenotypes; McArthur et al. also identified a couple groups of traits that were exceptions to the general rule of depleted heritability. While this work shows that not accounting for MAF and LD can lead to underestimating NIM heritability, I don't follow the logic behind the claim that this could lead to a false positive in heritability enrichment (a false negative would be more likely, surely?). There are also more differences between this and previous heritability studies than just the method used to estimate heritability, and the comparisons done here do not sufficiently account for these. A more detailed discussion to reconcile how, despite its weaknesses, LDSR picks up similar broad patterns while disagreeing in specifics is merited.

      We agree with the reviewer that our results are generally concordant with those of McArthur et al. 2021 and this concordance is reassuring given the differences across our studies. The differences across the studies, wherein McArthur et al. 2021 identify a few traits with elevated heritability while we do not, could arise due to reasons beyond the methodological differences, such as differences in the sets of variants analyzed. We have partially explored this possibility in the revised manuscript by analyzing the set of introgressed variants identified by the Sprime method (which was used in McArthur et al. 2021) using our method: we continue to observe a pattern of depletion with no evidence for enrichment. We hypothesize that the fact that LDSR picks up similar overall patterns despite its limitations is indicative of the nature of selection on introgressed alleles (which, in turn, influences the dependence of effect size on allele frequency and LD). Investigating this hypothesis will require a detailed understanding of how the LDSR results depend on parameters such as the MAF threshold on the regression SNPs and the LD reference SNPs and the choice of the LD reference panel.

      Not accounting for MAF and LD can underestimate NIM heritability but can both underestimate and overestimate heritability at MH SNPs. Hence, tests that compare per-SNP heritability at NIMs to MH SNPs can lead to false positives both in the direction of enrichment and depletion.

      We have now written in the Discussion: “In spite of these differences in methods and NIMs analyzed, our observation of an overall pattern of depletion in the heritability of introgressed alleles is consistent with the findings of McArthur et al. The robustness of this pattern might provide insights into the nature of selection against introgressed alleles.”

      In general this work agrees with the growing consensus in the field that introgressed Neanderthal variants were selected against, such that those that still remain in human populations do not generally have large effects on phenotypes. There are exceptions to this, but for the most part observed phenotypic associations depend on the exact set of variants being considered, and, like those highlighted in this study, still lack more concrete validation. While this paper does not make a significant advance in this general understanding of introgressed regions in modern populations, it does increase our knowledge in how best to study them, and makes a good attempt at addressing issues that are often just mentioned as caveats in other studies. It includes a nice quantification of how important these variables are in interpreting heritability estimates, and will be useful for heritability studies going forward.

    1. Author Responses

      Reviewer #1 (Public Review):

      The authors present a very detailed short report on a previously undocumented behaviour where flying squirrels are believed to have created grooves in various species of nuts to aid their secure storage in the crotch or forks of twigs. The behaviour is suggested to have evolved as an adaptive strategy in this population of flying squirrels because of the challenges for nut caching in a rainforest environment.

      Thanks

      Using detailed photographs, GPS locations, measurements and camera trap videos, the authors describe the behaviour in great depth providing a useful base for comparative and future studies. However, the weakest point of this study is that the authors did not detect any squirrels making the grooves and only monitored nuts once they were cached. Therefore more research needs to be done to ascertain who, how and where the grooves are produced in the first place.

      Three new videos are attached to show that both squirrel species rotate and carve the nuts to create the grooves. In the new videos, we can also observe that squirrels re-fixed the nuts between the twigs by carving them. These direct observations support the claim better. See Supplementary Media files 6-8.

      This work will be of great interest to scholars of animal behaviour and cognition and draws attention to a novel behaviour that warrants further study in similar species.

      Yes, it is. Thanks

      Reviewer #2 (Public Review):

      The authors describe observations of an innovative food caching behavior attributed to two species of flying squirrels and likened the behavior to architectural joints used by humans. The discovery of nuts stored in the crook of shrub branches, facilitated by indented rings seemingly carved by squirrels, possibly represents an interesting food handling innovation that may function to prevent spoilage in a damp tropical ecosystem.

      Thanks!

      I applaud the efforts to survey the area multiple times after the initial discovery, and the use of trail cameras to try capture evidence of animal associations. For what is in essence a natural history note, the authors did a great job of trying to gather a variety of supporting evidence. The videos capturing squirrels visiting and retrieving the cached nuts were compelling, and the shaking of the shrubs demonstrating the difficulty in dislodging the nuts helps build the case that the nuts are cached effectively.

      Thanks!

      The most glaring gap in the evidence is that there is no direct observation of the squirrels actually performing this nut carving behavior, only associating with the nuts after they have been cached. There must be more documentation provided to explicitly link the causality between squirrels and this caching innovation.

      We have included three additional videos to demonstrate that squirrels of both species rotate and carve the nuts to create the grooves. These new videos also show that squirrels can fit the nuts between twigs by carving the nuts. We think that these direct observations clearly support our claim, but agree that it was an oversight not to include them in the first draft. See Supplementary Media files 6-8.

      The second major weakness is more to do with writing style and could be addressed with significant revisions to phrasing and development of ideas. This is namely to do with the claim that this is somehow an evolved behavior, without providing evidence that 1) it is indeed the squirrels performing this behavior, 2) that it confers some kind of fitness benefit, and 3) hard evidence that this caching method does indeed prevent decomposition/germination in comparison to the more traditional caching methods of these species. Given the limited geographic range of the observations, I wonder how much of this is actually attributable to learning and/or innovation by these individuals. These ideas are not developed fully, and sometimes the writing wanders between learning and evolution without exploring the deep links between the two concepts.

      1) As above, three new videos establish that the squirrels do, in fact, carve the nuts. See Supplementary Media files 6-8.

      2) We added more description to the discussion to suggest how this behavior likely confers a fitness benefit. At this point, however, it is correct to say that we have no hard evidence to demonstrate this, and thus we have attempted to ‘tighten up’ the discussion accordingly so that our arguments (and their limitations) are more understandable.

      3) We revised the statistics about the proportion of nuts that were fresh during each of the surveys, and added some references about how long the nuts take to germinate in natural conditions. L163-172.

      Third, the connection to architecture is attention-grabbing, but I'd like to see this fleshed out a bit more with more text description (and a visual here would help immensely).

      We added more description about how the grooving, caching and checking processes were performed by squirrels and how the principles of this suspension are similar to the mortise-tenon joint as employed by humans. L186-202. As above, three new videos are attached.

      Ultimately this work stands to potentially contribute a fascinating piece of evidence into the growing literature on animal cognition, spatial awareness, caching behavior, innovation, and adaptation, but currently, the claims are unsupported by the evidence presented.

      Thank you for your comments about the potential importance of our work on this interesting system. In this version we try to focus more tightly on the aspects for which we have new information to interpret.

      Reviewer #3 (Public Review):

      The authors were trying to describe and document the grooving behaviour of nuts in two species of flying squirrels (Hylopetes phayrei electilis and H. alboniger) as well as to relate such behaviour to tool use or to the squirrels being smart. To achieve these objectives, the authors conducted three field surveys. They also set out a camera later to capture animal species that interacted with these nuts. They found that these nuts with grooves are fixed between twigs and can be found in different small plant species. Both species of squirrels made grooves on nuts. Shallower grooves are found in nuts that are fixed on live than on dead trees. Ellipsoid nuts have deeper grooves than oblate nuts. They concluded that these nut grooving behaviours are evolved or learned in those flying squirrel populations, and related these behaviours to tool use as well as to the squirrels being smart.

      Thanks!

      One strength of this work is that the data were collected in the field, which may provide hard evidence with video footage showing that the two flying squirrel populations made grooves on nuts as well as fixing them between twigs. This evidence will stimulate new interest in understanding the causes and consequences of such nut grooving behaviour. It may be bold to claim that such behaviour involves advanced cognition or cognitive processes without proper, systematic experiments. Accordingly, whether the squirrels are 'smart' remains unclear. The authors did well in describing and documenting the nut grooving behaviours of the two species of flying squirrels, which has achieved their first aim. However, as mentioned above, whether such behaviour is 'smart' will need more systematic investigations.

      We have removed the description of cognition or cognitive processes from the paper, which now focuses on the grooving behaviours. “Smart” has also been removed, with other words used instead.

    1. Author Response

      Reviewer #3 (Public Review):

      1) (Schichl et al. 2011 JBC 286:38466). This publication is not cited in the current version of the manuscript. The results of Schichl et al. seem particularly relevant for the interpretation of some of the results presented here and should be considered in the final discussion and conclusions of the present work.

      This reference and related text were added to the discussion section of the revised manuscript (lines 508-517).

      2) The ubiquitination of endogenous TTP has not been demonstrated.

      New data assessing the ubiquitination of endogenous TTP was added as Figure 1 – figure supplement 1D.

      3) The type of ubiquitination detected on the overexpressed version of TTP is not characterized. This seems important in view of the results of Schichl et al. who showed non-degradative ubiquitination (K63) of TTP.

      New data on the detection of K48- or K63-linked poly-ubiquitin chains by specific antibodies were added as Figure 1 – figure supplement 1G. These data show that recombinant poly-ubiquitin chains can be readily detected with both antibodies, but that only K48-linked chains were detected on TTP immunoprecipitated from cells.

      4) The half-life of the non-ubiquitinated mutant of TTP (K→R) was not precisely compared to the half-life of the wild-type TTP protein (similar to the experiment presented in 1B).

      New data from TTP-KtoR chase experiments were added as Figure 1 – figure supplement 1E. The half-life increased substantially, from 1.4 h for wtTTP to 5.7 h for the mutant.

      5) The effect of the E1 inhibitor TAK-243 on endogenous TTP levels was not tested.

      New data assessing the effect of TAK-243 on endogenous TTP was added as Figure 1 – figure supplement 1B. Consistent with our data with exogenously expressed TTP, treatment with the inhibitor increased the abundance of endogenous TTP.

      6) While they demonstrate that TTP-HA is efficiently degraded after 3 to 7h of LPS stimulation (Fig 1B) and that the stronger decrease in mCherry-TTP fusion level occurs between 4 and 6h of LPS stimulation, the screen for identification of TTP modulators is performed after 16h of LPS stimulation (Fig 2A). The rationale behind this experimental setting is not explicitly described.

      We found that endogenous TTP and mCherry-TTP levels were substantially lower at 16 h post-LPS stimulation compared to 6 h (see Fig. 1D), and reasoned that this would yield the best genetic screen window in which to identify mutant cells with non-functional degradation mechanisms.

      7) The authors did not directly test the effect of HUWE1 inactivation on endogenous TTP accumulation after blocking protein synthesis. This control seems important as data presented in figure 2E could result both from an effect of Huwe1 level on LPS-induced TTP synthesis and TTP degradation.

      New data from chase experiments with endogenous TTP have been added as Fig. 2G. Consistent with the data presented in Fig. 2E, TTP levels declined during the chase period in sgROSA control cells, with an estimated half-life of 3.7 h. In contrast, TTP levels did not significantly decline during the CHX chase period in Huwe1 KO cells, resulting in an estimated TTP protein half-life of ~20 h in this genotype.
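
As an illustration of how a half-life can be read off a chase time course, the sketch below assumes simple first-order decay and fits it by least squares on the log scale. The fitting procedure actually used by the authors is not stated here, so the function `half_life_from_chase` and the synthetic data points are assumptions for demonstration only.

```python
import math


def half_life_from_chase(times_h, fraction_remaining):
    """Estimate a protein half-life (hours) assuming f(t) = exp(-k * t).

    Fits ln(f) = -k * t by least squares through the origin, where f is
    the protein level relative to the start of the chase. At least one
    time point must be non-zero.
    """
    num = sum(t * math.log(f) for t, f in zip(times_h, fraction_remaining))
    den = sum(t * t for t in times_h)
    k = -num / den
    if k <= 0:
        return math.inf  # no measurable decay over the chase period
    return math.log(2) / k


# Synthetic time course generated with a 3.7 h half-life
# (made-up values, not the authors' measurements).
times = [1.0, 2.0, 4.0, 6.0]
fractions = [0.5 ** (t / 3.7) for t in times]
print(round(half_life_from_chase(times, fractions), 2))
```

When levels barely decline during the chase, as reported for Huwe1 KO cells, the fitted rate constant is close to zero and the estimated half-life becomes very long and imprecise.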

      8) In the data presented in figure 2, it is not entirely clear what exactly the authors are referring to as "endogenous TTP". In Figure 2C endogenous TTP is detected by western blot on cells transfected with an mCherry-TTP fusion. In this case, the size difference allows unambiguous identification of the endogenous form of TTP (although one could not exclude that overexpressing a TTP fusion protein might affect the level of the endogenous protein). However, TTP and mCherry-TTP cannot be distinguished by FACS (Fig2 D and E). If cells used in the experiments shown in 2C and 2D-E are distinct, this should be mentioned more explicitly in the legend of Fig. 2. Otherwise, the detection of endogenous TTP should be performed on cells that do not express mCherry-TTP.

      Results from Fig. 2D/E are indeed from cells that do not express mCherry-TTP. Endogenous TTP is detected in these cells by intracellular antibody staining. The figure legend text has been updated to reflect that panel 2C is with the RAW264.7-Dox-Cas9-mCherry-TTP cell line, and D-E is with the RAW264.7-Dox-Cas9 cell line.

      9) The third part of the manuscript aims to demonstrate that loss of Huwe1 decreases the half-life of pro-inflammatory mRNAs controlled by TTP. In my opinion, this conclusion is reliably supported by the data presented in Figure 3 and Supplementary Figure 3. As the conclusion of this paragraph refers to the effect of TTP on the stability of these mRNAs, the measurement of TNF mRNA stability (Fig. sup. 3C) should be presented in the main part of Fig. 3.

      The TNF mRNA stability figure panel was moved to the main figures as Fig. 3C.

      10) Fig 4E aims to identify kinases and phosphatases potentially involved in TTP stability (line 277, line 298). However, the approach used here (a measure of intracellular TTP level) cannot distinguish between increased production of TTP or a decrease in TTP degradation.

      One of the main points of this experiment was to assess whether the steady-state increase in TTP in HUWE1 KO cells, which stems in large part from increased stability (Fig. 2G), was influenced by TTP phospho-status. Thus, while we do not explicitly measure TTP protein half-life in this particular assay, the readout is very likely to reflect changes in TTP protein stability. This idea is supported by the fact that treatment with p38i, MK2i, and CalycA affected TTP steady-state levels in line with their previously reported effects on TTP protein stability.

      11) Also, the result presented in fig. 4E, are not totally consistent with the results presented in 4A. Fig4D shows a similar level of endogenous TTP accumulating after 2h of LPS stimulation in Huwe1 KO and control cells while a clear difference in TTP level is observable in the same condition in fig. 4A. Could the difference in the TTP detection method (Western vs intracellular FACS) be responsible for this discrepancy?

      We do not know exactly, but agree that this could indeed be influenced by the measurement method per se, as well as small variations in cell density, or total sample numbers in a particular experiment (as this may increase the time outside of the incubator for handling/stimulations). The much larger sample size of the experiment from panel 6E, and having multiple different stimulations, may have contributed to a slightly delayed timing of the Huwe1-dependent phenotype. It is important to note that we have consistently demonstrated, with different measurement methods, that TTP is initially stabilized post-LPS treatment (2-3 h, insensitive to Huwe1 KO), followed by TTP degradation (6-16 h, sensitive to Huwe1 KO).

      12) These experiments and data presented in Fig.5D show that the level of the TTP paralog ZFP36L1 accumulates in huwe1 KO cells but do not demonstrate that HUWE1 affects ZFP36L1 protein stability.

      We agree, and changed all instances in the text that claimed ZFP36L1 ‘stabilization’ to ‘increase in abundance’.

      13) Based on data presented in fig. 6 B and sup. 6B the authors conclude that residues S52 and 178, previously identified as regulators of TTP stability, are unlikely to be involved in HUWE1-dependent TTP accumulation. The data are only based on 2 independent experiments, one of which (fig 6B) shows a difference in TTP S52/S178 mutant in Huwe1 deficient cells as compared to wt TTP. These results seem therefore too preliminary to reliably exclude the implication of S52 and 178 on the HUWE1 accumulation of TTP.

      Additional new data with the S52/178 TTP mutant of six biological replicates has been added to the manuscript as Figure 6 – figure supplement 1C. Data from these experiments are consistent with our other results, and show that protein levels similarly increase for both wtTTP and the S52/178A mutant in Huwe1 KO cells.

      14) From these data, the authors conclude (line 416) that N-terminal deletion does not affect the TTP protein level. However, TTP accumulation in Huwe1 KO cells seems mostly lost in mutant N4. As mentioned above the limited number of replicates (n=2) and the absence of a statistical test makes the interpretation of this result difficult.

      Additional new data with the Δ4 mutant of two biological replicates has been added to the manuscript as Figure 6 – figure supplement 1E. Data from these experiments are consistent with our other results, and show that protein levels similarly increase for the Δ4 mutant in Huwe1 KO cells.

      15) Several TTP C-terminal mutants show a HUWE1-independent accumulation when compared to the wt protein (Fig6. D). Is this region identical to the unstructured region identified by Ngoc (line 1255) as a potent regulator of TTP degradation? If relevant this point should be discussed.

      Ngoc showed that fusion to GFP of either the N-terminal TTP part, or the TTP C-terminal part (aa 214-436), destabilized GFP in cells. Thus, the GFP destabilization was seemingly indiscriminate, and possibly caused by the disordered nature of the fusion construct per se. Since the C-terminal TTP part fused to GFP by Ngoc included aa 214-436, we cannot rule out that part of this effect was HUWE1-dependent. However, the discrepancy with our finding that the TTP N-terminus does not contribute to HUWE1-dependent TTP regulation may suggest that the GFP fusions by Ngoc were destabilized by more general protein principles, rather than HUWE1-specific effects. Additional text conveying this notion was added to the Discussion section (line 490-497).

    1. Author Response

      Reviewer #1 (Public Review):

      Understanding the evolution of nitrogenases is a very important problem in the field of evolutionary biogeochemistry. Ancestral sequence reconstruction at least in theory could offer insights into how this planet altering activity evolved from ancestors that did not reduce nitrogen. But the very many components of the nitrogenase enzyme system make this a very challenging question to answer.

      This paper now demonstrates the first empirical resurrection of functional ancestral nitrogenases both in vivo and in vitro. The nodes that are resurrected are very shallow in the nitrogenase tree and do not help answer how these proteins evolved. The authors' reasoning for choosing these nodes is that they are likely compatible with the metal cluster assembly machinery of their chosen host organism, A. vinelandii. The reader is left to wonder if deeper, more interesting nodes were tried but didn't yield any activity. As the paper stands, it proves that relatively shallow nitrogenase ancestors can be resurrected, but these nodes do not yet teach us anything very fundamental about how these enzymes evolved.

      Technically, this work was no doubt challenging. Genome engineering in A. vinelandii is very difficult and time-consuming. This organism was chosen because it is an obligate aerobe, which makes it easier to handle than the many anaerobic bacteria and archaea that harbor nitrogenases. It does make one wonder if this choice of organism is wise: the authors themselves note that it probably has a set of specialized proteins that allow the nitrogenase to be assembled and function in the presence of oxygen. This may limit A. vinelandii's potential for future ancestral reconstructions deeper in the tree, which according to the authors' reasoning probably require different assembly machinery.

      The ancestral sequence reconstruction is done in two different ways: Two out of three reconstructions are carried out with what appears to be an incorrect algorithm implemented in older versions of RAxML. This algorithm is not a full marginal reconstruction, because it only considers the descendants of the node of interest for the reconstruction. The full algorithm (implemented e.g. in PAML and the newest versions of RAxML) considers all tips for a marginal reconstruction. The fact that this was called a marginal ancestral sequence reconstruction in RAxML's manual is unfortunate - as far as I understand it is in fact just the internal labelling of nodes produced by the pruning algorithm, which is not equivalent to a marginal reconstruction. In this specific case, it is unlikely that this has led to any fundamental issues with the reconstructions (as all are functional nitrogenases, which is to be expected in this part of the tree). For the shallower of the two nodes, the authors in fact verify that they get the same experimental results if they use PAML's full implementation of a marginal reconstruction (which yields a somewhat different sequence for this node). It would have been helpful to point this RAxML-related issue out in the methods, so as to prevent others from using this incorrect implementation of the ASR algorithm.

      One other slightly confusing aspect of the paper is that it contains two different maximum likelihood trees, which were apparently inferred using the same dataset, model, and version of RAxML. It is unclear why they have different topologies. This probably indicates a lack of convergence. Again, this does not cast any doubt on the uncontroversial findings of this paper that shallow nodes within the nitrogenases are also nitrogenases.

      We thank the reviewer for their careful appraisal of our article, and their helpful recommendations for improving its quality. We appreciate the reviewer’s comment regarding the experimental challenges associated with nitrogenase engineering and genetic studies of our bacterial model, Azotobacter vinelandii. The complexity of nitrogen fixation machinery does indeed present several experimental obstacles, though, as we note in our revised article, this feature also makes the systems-level approach we have implemented here ideal for evolutionary studies of nitrogenases and their associated network.

      The reviewer focuses on three central points: 1) the relevance of the targeted ancestral nodes for addressing fundamental questions concerning nitrogenase origins, 2) the applicability of our bacterial model for older reconstructions, and 3) issues associated with the different trees/methods for ancestral sequence reconstruction.

      Addressing the first point, we concede that targeting relatively shallow nodes cannot specifically test hypotheses concerning the earliest stages of nitrogenase evolution (e.g., “how this planet altering activity evolved from ancestors that did not reduce nitrogen”). Our central result is that a specific, enzymatic mechanism for dinitrogen binding and reduction (established for three modern nitrogenases to date) extends back through nitrogenase ancestry over the studied timeline. More broadly, a conserved nitrogenase mechanism in the only surviving family of nitrogenase families suggests that life may have been constrained in its available strategies for achieving this challenging biochemical reaction. By comparison, multiple abiotic pathways for nitrogen fixation are feasible, and another, ecologically vital metabolism, carbon fixation, can proceed by at least seven pathways. Deeper investigations into these possible evolutionary constraints and across deeper portions of the nitrogenase tree will require continued study, which we anticipate will be facilitated by the experimental approach presented in this article.

      Concerning the applicability of our bacterial model, we agree that it is possible that older reconstructions may require different host organisms so as to provide a compatible genetic background. Considerations similar to those we have outlined in our article, including a systematic evaluation of the genetic components that likely accompanied nitrogenase ancestors in their ancient hosts, will likely be necessary. Nevertheless, we foresee that the general, systems-level approach that we have built for Azotobacter can be adapted for additional microbial models, and that these efforts will be worthwhile given the significance of biological nitrogen fixation to evolutionary biogeochemistry and microbial engineering applications.

      Finally, we thank the reviewer for noting the differences in the ancestral sequence reconstruction algorithms of RAxML v.8 and PAML and welcome an explanation of these issues in our revised article. We confirm that RAxML v.8 does not perform full marginal reconstruction (in contradiction to its description in the RAxML manual). Due to this concern, we repeated our ancestral sequence reconstruction with PAML, which, like newer versions of RAxML, does implement the full algorithm. Here, ancestors reconstructed by RAxML v.8 and PAML from equivalent phylogenetic nodes yield comparable experimental results, indicating that the algorithm differences have not significantly impacted the major outcomes of our study. In the second analysis, we repeated the entire phylogenetic ancestral sequence reconstruction workflow, though we did not trim the alignment as we did in the first case (this has now been clarified). This likely explains the differences in our trees that the reviewer notes. We have included these details in the Materials and Methods section of our revised article.
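
      The distinction between a descendants-only reconstruction and a full marginal reconstruction can be illustrated with a minimal sketch. It is not the RAxML or PAML implementation: it assumes a toy symmetric two-state model, a single alignment site, a three-tip tree ((A,B)N,C) with equal branch lengths, and hypothetical function names chosen for illustration only.

```python
import math

# Toy assumptions (not from the manuscript): two character states (0/1),
# symmetric substitution model, uniform state frequencies, and the rooted
# tree ((A,B)N,C)R with every branch of length t.
PI = [0.5, 0.5]

def transition(t):
    """2x2 transition probability matrix for the symmetric two-state model."""
    same = 0.5 * (1.0 + math.exp(-2.0 * t))
    diff = 1.0 - same
    return [[same, diff], [diff, same]]

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def partial_from_descendants(a, b, t):
    """Pruning-algorithm conditional likelihood at node N, using tips A and B only."""
    P = transition(t)
    return [P[s][a] * P[s][b] for s in (0, 1)]

def descendants_only_posterior(a, b, t):
    """State probabilities at N when only the descendants of N are considered."""
    partial = partial_from_descendants(a, b, t)
    return normalize([PI[s] * partial[s] for s in (0, 1)])

def full_marginal_posterior(a, b, c, t):
    """Marginal posterior at N using all tips: sums over the root state and tip C."""
    P = transition(t)
    partial = partial_from_descendants(a, b, t)
    joint = [
        sum(PI[r] * P[r][s] * P[r][c] for r in (0, 1)) * partial[s]
        for s in (0, 1)
    ]
    return normalize(joint)

if __name__ == "__main__":
    # Tips A and B disagree; the outgroup tip C carries state 1.
    print(descendants_only_posterior(0, 1, 0.1))
    print(full_marginal_posterior(0, 1, 1, 0.1))
```

      In this toy case the descendants-only calculation cannot choose between the two states at N, whereas the full marginal calculation also uses tip C and favors the state that C carries. This is the sense in which the two procedures can yield different ancestral sequences for the same node.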

      In addition to expanding upon the points outlined above throughout the revised article, we have included additional text in the Discussion that elaborates on the limitations of our study, and in particular, the need to explore deeper portions of the nitrogenase tree in future work.

      Reviewer #2 (Public Review):

      The authors convincingly show that their reconstructed ancestral nitrogenases are active both in vivo and in vitro, and show similar inhibitory effects as extant/wild-type enzymes.

      The conclusion that, evolutionarily, there is a "single available mechanism for dinitrogen reduction" is not well explored in the paper. This suggests a limitation of using ancestral sequence reconstruction in this instance.

      We thank the reviewer for their comments and appreciate their assessment that the core experimental results are conclusively demonstrated, including in vivo/in vitro activity of ancestral nitrogenase enzymes and that they all exhibit the specific mechanism for dinitrogen binding and reduction, evidenced by hydrogen inhibition.

      We note the reviewer’s concern regarding the evolution of the dinitrogen reduction mechanism described above. Our primary conclusion is that this mechanism is conserved in the studied nitrogenase ancestors, which, together with previous demonstrations of this mechanism in the different nitrogenase isozymes (Mo, V, Fe) of Azotobacter vinelandii, suggests that this is an early evolved feature of the nitrogenase family. These enzymes have thus not only been performing an ecologically vital, metabolic function, but have likely been achieving this challenging biochemical reaction in the same manner for billions of years. We discuss the resulting implications as they relate to evolutionary constraints on biological nitrogen fixation strategies. We clarify that our presented paleomolecular approach cannot directly evaluate alternate evolutionary scenarios that did not persist and were not preserved in extant genomic sequences, as ancestral sequence reconstruction is fundamentally informed by extant sequence diversity. Our approach is a powerful tool for defining the contours of ancestral nitrogenase sequence-function space, which can serve as a basis for engineering and evaluating alternate scenarios. We have clarified these points in our Discussion.

      Reviewer #3 (Public Review):

      In this work, the authors attempt to probe the constraints on the early evolution of nitrogen fixation, the development of which presented a key metabolic transition. Given that life on Earth evolved only once (to our knowledge) which aspects were necessary and which may have taken a different course are open questions. Are there alternative forms of life, metabolic networks, or even enzymatic mechanisms that could have replaced the ones we see today, or is the space of possible biologies limited? This manuscript tests the ability of ancestrally-reconstructed molybdenum-dependent nitrogenase complexes to support diazotrophic growth in Azotobacter vinelandii, as well as in vivo and in vitro activity, which all point towards a conserved mechanism for nitrogen reduction at least since proteobacteria divergence.

      This is an ambitious project, requiring multiple techniques, systems, and approaches, and the successful combination of these is one of the major strengths of this work. Using parallel techniques is an important way to be certain that the overall results are robust, and an appropriate mix of in vivo and in vitro experiments is chosen here. The manuscript should serve as a useful model for how to combine phylogenetics and biochemistry.

      The nature of ASR means that a solid phylogeny and/or understanding of how robust the results are to uncertainty in reconstructed states is essential since all results flow from there. The overall phylogenetic methods used are appropriate and the system is an apt one for the technique, but there is not quite enough detail in the methods to be certain of the results. Given that only the single maximum a posteriori sequence is assayed at every 3 nodes, this may have compounding results in that the sensitivity to uncertainty in the reconstruction is increased. The authors appropriately make qualitative rather than quantitative inferences, but some hesitation towards the overall results still exists.

      The assumption that the Anc1A/B and Anc2 nodes correspond to ancestral states might be undermined by horizontal gene transmission, which has been reported for nif clusters. In particular, there may be different patterns of transmission for each element of the cluster. By performing reconstruction with a concatenated alignment, the phylogenetic signal is potentially maximized, but with the assumption that each gene has an identical history. Discordant transmission may cause an incorrect topology to be recovered.

      Finally, I am unsure if ASR is the most appropriate approach to answer questions of contingency and alternative pathways for protein evolution. ASR may tell what nitrogenase millions or billions of years ago looked like, but it can only say what has already existed. If there are different mechanisms or metabolic pathways enabling nitrogen fixation that simply never came to pass, via contingency and entrenchment or simple chance, ASR would say nothing about them. It is true that a conserved mechanism would point towards a constrained space for evolving nitrogen fixation, but that does not directly address it.

      Overall, despite these issues, the manuscript is compellingly written and the figures are attractive and clear, and help get the major narrative across. This work will be of interest to protein biochemists of evolutionary bent and microbial physiologists with an interest in the origins of life.

      We thank the reviewer for their evaluation of our study and appreciate their comments regarding the experimental effort involved and scientific significance. We have carefully considered their recommendations to improve our article.

      The reviewer’s critical comments concern 1) the level of detail regarding the phylogenetic methodology, 2) the impact of horizontal gene transfer on phylogenetic reconstructions, and 3) the appropriateness of ancestral sequence reconstruction for accessing alternate evolutionary scenarios in the emergence of biological nitrogen fixation.

      We have addressed the first point by including additional methodological details regarding our phylogenetic analyses in our Materials and Methods section, including alignment and model testing tools, as well as our rationale for using two ancestral sequence reconstruction methods, RAxML and PAML.

      Regarding the second point, we acknowledge that horizontal gene transfer has played a significant role in the evolution and distribution of biological nitrogen fixation, which has been established and explored in previous work by others. We have included in our Discussion an additional paragraph that addresses the potential impact of horizontal gene transfer in nitrogenase evolution. Though we do not expect horizontal transfer to contribute a significant source of uncertainty in the timeline studied, for the reasons discussed in the revised manuscript, we agree that it is an important consideration for future work and that it may impact reconstructions in other lineages within the nitrogenase phylogeny.

      Finally, in new text within the Discussion, we also acknowledge that ancestral sequence reconstruction cannot yet directly test alternate historical scenarios. We have clarified our language concerning conservation and constraints in the evolution of biological nitrogen fixation. Because ancestral sequence reconstruction is informed by modern sequences, it is limited to exploring the historical sequence space within their shared ancestry. It is therefore possible that, early in the history of life, there were multiple enzymatic strategies for fixing nitrogen, and that they were outcompeted and thus have left no trace in modern genomes. Another possibility is that these alternate strategies simply never evolved.

      In the present study, we have identified a pattern of conservation with regard to a specific mechanism for dinitrogen binding and reduction, suggesting a level of evolutionary constraint that can be further interrogated. For example, ancestral sequence reconstruction, as implemented in our nitrogenase resurrection strategy, can be used to empirically investigate the underlying sources of these constraints. We note that despite decades of research in this domain, a full understanding of how nitrogenases perform this remarkable metabolic step, both today and in the past, remains elusive (as other reviewers of the present study have also noted). Evolutionarily informed studies of nitrogenase function enabled by ASR can reveal the design principles that have shaped its direct ancestry, which can potentially serve as a basis for engineering alternative molecular strategies for nitrogen fixation. The power of the molecular paleogenetic approach here is in extending functional investigations beyond the sequence space occupied by modern nitrogenase and identifying patterns in their functional variation through their evolutionary histories.

    1. Author Response

      Reviewer #1 (Public Review):

      Because of the importance of brain and cognitive traits in human evolution, brain morphology and neural phenotypes have been the subject of considerable attention. However, work on the molecular basis of brain evolution has tended to focus on only a handful of species (i.e., human, chimp, rhesus macaque, mouse), whereas work that adopts a phylogenetic comparative approach (e.g., to identify the ecological correlates of brain evolution) has not been concerned with molecular mechanism. In this study, Kliesmete, Wange, and colleagues attempt to bridge this gap by studying protein and cis-regulatory element evolution for the gene TRNP1, across up to 45 mammals. They provide evidence that TRNP1 protein evolution rates and its ability to drive neural stem cell proliferation are correlated with brain size and/or cortical folding in mammals, and that activity of one TRNP1 cis-regulatory element may also predict cortical folding.

      There is a lot to like about this manuscript. Its broad evolutionary scope represents an important advance over the narrower comparisons that dominate the literature on the genetics of primate brain evolution. The integration of molecular evolution with experimental tests for function is also a strength. For example, showing that TRNP1 from five different mammals drives differences in neural stem cell proliferation, which in turn correlate with brain size and cortical folding, is a very nice result. At the same time, the paper is a good reminder of the difficulty of conclusively linking macroevolutionary patterns of trait evolution to molecular function. While TRNP1 is a moderate outlier in the correlation between rate of protein evolution and brain morphology compared to 125 other genes, this result is likely sensitive to how the comparison set is chosen; additionally, it's not clear that a correlation with evolutionary rate is what should be expected. Further, while the authors show that changes in TRNP1 sequence have functional consequences, they cannot show that these changes are directly responsible for size or folding differences, or that positive selection on TRNP1 is because of selection on brain morphology (high bars to clear). Nevertheless, their findings contribute strong evidence that TRNP1 is an interesting candidate gene for studying brain evolution. They also provide a model for how functional follow-up can enrich sequence-based comparative analysis.

      We thank the reviewer for the positive assessment. With respect to our set of control genes and the interpretation of the correlation between the evolution of the TRNP1 protein sequence and the evolution of brain size and gyrification, we would like to mention the following: we do think that the set is small, but we took all similarly sized genes with one coding exon that we could find in all 30 species. Furthermore, the control genes are well comparable to TRNP1 with respect to alignment quality and average omega (Figure 1-figure supplement 3). Hence, we think that the selection procedure and the actual omega distribution make them a valid, unbiased set to which TRNP1’s co-evolution with brain phenotypes can be compared. Moreover, we want to point out that by using Coevol, we correlate evolutionary rates, that is, the rate of protein evolution of TRNP1 as measured with omega and the rate of brain size evolution that is modeled in Coevol as a Brownian motion process. We think that this was unclear in the previous version of our manuscript, and appreciate that the reviewer saw some merit in our analyses in spite of it.

      Finding conclusive evidence to link molecular evolution to concrete phenotypes is indeed difficult and necessarily inferential. This said, we still believe that correlating rates of evolution of phenotype and sequence across a phylogeny is one of the most convincing pieces of evidence available.

      Reviewer #2 (Public Review):

      In this paper, Kliesmete et al. analyze the protein and regulatory evolution of TRNP1, linking it to the evolution of brain size in mammals. We feel that this is very interesting and the conclusions are generally supported, with one concern.

      The comparison of dN/dS (omega) values to 125 control proteins is helpful, but an important factor was not controlled. The fraction of a protein in an intrinsically disordered region (IDR) is potentially even more important in affecting dN/dS than the protein length or number of exons. We suggest comparing dN/dS of TRNP1 to another control set, preferably at least ~500 proteins, which have similar % IDR.

      Thank you for this interesting suggestion. As mentioned in the public response to Reviewer #1, we are sorry that we did not explain the rationale of the approach very well in the previous version of the manuscript. As also argued above, we think that our control proteins are an unbiased set as they have a comparable alignment quality and an average omega (dN/dS) similar to TRNP1 (Figure 1-figure supplement 3). While IDR domains tend to have a higher omega than their respective non-IDR counterparts, we do not think that the IDR content should be more relevant than omega itself as we do not interpret this estimate on its own, but its covariance with the rate of phenotypic change. Indeed, the proteins of our control set that have a higher IDR content (D2P2, Oates et al. 2013) do not show stronger evidence to be coevolving with the brain phenotypes (IDR content vs. absolute brain size-omega partial correlation: Kendall's tau = 0.048, p-value = 0.45; IDR content vs. absolute GI-omega partial correlation: Kendall’s tau = -0.025, p-value = 0.68; 88 proteins (71%) contain >0% IDRs; 8 proteins contain >62% (TRNP1 content) IDRs).
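
      For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how a Kendall's tau (tau-b, which corrects for ties) can be computed between two per-protein quantities. The function name and the toy input values are hypothetical and chosen only for illustration; they are not the values from the analysis, and the sketch does not compute a p-value.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes to neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    )
    if denom == 0:
        return float("nan")
    return (concordant - discordant) / denom

if __name__ == "__main__":
    # Hypothetical toy values: fraction of disordered residues per protein and
    # the absolute partial correlation between omega and a brain phenotype.
    idr_fraction = [0.00, 0.10, 0.25, 0.40, 0.62, 0.80]
    abs_partial_corr = [0.30, 0.05, 0.22, 0.41, 0.12, 0.18]
    print(kendall_tau_b(idr_fraction, abs_partial_corr))
```

      A tau close to zero, as reported in the response, would indicate that ranking proteins by IDR content says little about ranking them by the strength of their co-evolution with the phenotype.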

      Reviewer #3 (Public Review):

      In this work, Z. Kliesmete, L. Wange and colleagues investigate TRNP1 as a gene of potential interest for the evolution of the mammalian cortex. Previous evidence suggests that TRNP1 is involved in self-renewal, proliferation and expansion in cortical cells in mouse and ferret, making this gene a good candidate for evolutionary investigation. The authors designed an experimental scheme to test two non-exclusive hypotheses: first, that evolution of the TRNP1 protein is involved in the apparition of larger and more convoluted brains; and second, that regulation of the TRNP1 gene also plays a role in this process alongside protein evolution.

      The authors report that the rate of TRNP1 protein evolution is strongly correlated to brain size and gyrification, with species with larger and more convoluted brains having more divergent sequences at this gene locus. The correlation with body mass was not as strong, suggesting a functional link between TRNP1 and brain evolution. The authors directly tested the effects of sequence changes by transfecting the TRNP1 sequences from 5 different species in mouse neural stem cells and quantifying cell proliferation. They show that both human and dolphin sequences induce higher proliferation, consistent with larger brain sizes and gyrifications in these two species. Then, the authors identified six potential cis-regulatory elements around the TRNP1 gene that are active in human fetal brain, and that may be involved in its regulation. To investigate whether sequence evolution at these sites results in changes in TRNP1 expression, the authors performed a massively parallel reporter assay using sequences from 75 mammals at these six loci. The authors report that one of the cis-regulatory elements drives reporter expression levels that are somewhat correlated to gyrification in catarrhine monkeys. Consistent with the activity of this cis-regulatory sequence in the fetal brain, the authors report that this element contains binding sites for TFs active in brain development, and contains stronger binding sites for CTCF in catarrhine monkeys than in other species. However, the specificity or functional relevance of this signal is unclear.

      Altogether, this is an interesting study that combines evolutionary analysis and molecular validation in cell cultures using a variety of well-designed assays. The main conclusions - that TRNP1 is likely involved in brain evolution in mammals - are mostly well supported, although the involvement of gene regulation in this process remains inconclusive.

      Strengths:

      • The authors have done a good deal of resequencing and data polishing to ensure that they obtained high-quality sequences for the TRNP1 gene in each species, which enabled a higher confidence investigation of this locus.

      • The statistical design is generally well done and appears robust.

      • The combination of evolutionary analysis and in vivo validation in neural precursor cells is interesting and powerful, and goes beyond the majority of studies in the field. I also appreciated that the authors investigated both protein and regulatory evolution at this locus in significant detail, including performing a MPRA assay across species, which is an interesting strategy in this context.

      Weaknesses:

      • The authors report that TRNP1 evolves under positive selection, however this seems to be the case for many of the control proteins as well, which suggests that the signal is non-specific and possibly due to misspecifications in the model.

      • The evidence for a higher regulatory activity of the intronic cis-regulatory element highlighted by the authors is fairly weak: correlation across species is only 0.07, consistent with the rapid evolution of enhancers in mammals, and the correlation in catarrhine monkeys seems driven by a couple of outlier datapoints across the 10 species. It is unclear whether false discovery rates were controlled for in this analysis.

      • The analysis of the regulatory content in this putative enhancer provides some tangential evidence but no reliable conclusions regarding the involvement of regulatory changes at this locus in brain evolution.

      We thank the reviewer for the detailed comments. Indeed, TRNP1 overall has a rather average omega value across the tree and hence also the proportion of sites under selection is not hugely increased compared to the control proteins. This is good because we want to have comparable power to detect a correlation between the rate of protein evolution (omega) and the rate of brain size or GI evolution for TRNP1 and the control proteins. Indeed, what makes TRNP1 special is the rather strong correlation between the rate of brain size change and omega, which was only stronger in 4% of our control proteins. Hence, we do not agree with the weakness of model misspecification for TRNP1 protein evolution.

      We agree that the correlation of the activity induced by the intronic cis-regulatory element (CRE) with gyrification is weak, but we dispute that the correlation is due to outliers (see residual plot below) or violations of model assumptions (see new permutation analysis in the Results section). There are many reasons why we would expect such a correlation to be weak, including that an MPRA takes the CRE out of its natural genomic context. Our conclusions do not solely rest on those statistics, but also on independent corroborating evidence: Reilly et al (2015) found a difference in the activity of the TRNP1 intron between human and macaque samples during brain development. Furthermore, we used their and other public data to show that the intron CRE is indeed active in humans and bound by CTCF (new Figure 4 - figure supplement 2).
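
      The general idea behind a permutation analysis can be sketched as follows. This is a generic permutation test on a Pearson correlation and is an assumption made for illustration: the response does not describe the authors' actual procedure, which may differ (for example, by accounting for phylogenetic relatedness, which this sketch ignores). All function names and inputs are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    The phenotype values y are shuffled across species to break any real
    association; the p-value is the share of shuffles whose absolute
    correlation is at least as large as the observed one (with a +1
    correction so that the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical toy input: reporter activity and a phenotype for 10 species.
    activity = [float(v) for v in range(10)]
    phenotype = [2.0 * v + 1.0 for v in activity]
    print(permutation_p_value(activity, phenotype))
```

      Because the null distribution is built from the data themselves, a test of this kind does not rely on distributional assumptions about the residuals, which is why it is a common check when model assumptions are questioned.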

      We believe that the combined evidence suggests a likely role for the intron CRE for the co-evolution of TRNP1 with gyrification.

    1. Author Response

      Reviewer #1 (Public Review):

      The study's primary motivating goal is understanding how nutrigenomic signaling works in different contexts. The authors propose that OGT, a sugar-sensing enzyme, connects sugar levels to chromatin accessibility. Specifically, the authors hypothesize that the OGT/Pcl-PRC axis in sweet taste neurons interprets the sugar levels and alters chromatin accessibility in sugar-activated neurons. However, the detailed model presented by authors on OGT/PRC/Pcl Rolled in regulating nutrigenomic signaling relies on pharmacological treatments and overexpression of transgenes to derive genetic interactions and pathways; these approaches provide speculative rather than convincing evidence. Secondly, evidence is absent to show that PRC occupancy remains the same in other neurons (non-sweet taste neurons) under varied sugar levels or OGT manipulations. Hence, the claim that OGT-mediated access to chromatin via PRC-Pcl is a key regulatory arm of nutrigenomic signaling needs further substantiation.

      We thank the reviewer for their thoughtful reading of the manuscript and their suggestions. We disagree with the reviewer’s assessment that our work relies solely on overexpression and pharmacological treatments and that this provides only “speculative” evidence. Indeed, both of the other two reviewers praised our approach:

      Reviewer 2: “This is an elegant group of experiments revealing mechanisms for how nutrigenomic signaling triggers cellular responses to nutrients”

      Reviewer 3: “Strengths: Good genetically targeted interventions; Thorough exploration of the epistatic relationships between different players in the system … The conclusions in this manuscript are mostly well or at least reasonably supported by data.”

      All of our experiments combine genetic manipulations with dietary and/or pharmacological treatments to show that molecular, neural, and behavioral taste phenotypes arise only in specific contexts, so no single phenotype occurs due to nonspecific manipulations. Without this approach, most of these epistatic relationships would be largely inaccessible in this system. We have also used a combination of both genetic and pharmacological tools to implicate not only genes but also their function (i.e., enzymatic activity) in nutrient-specific effects. Finally, we established causality and relationship by inducing and rescuing the molecular, behavioral, and electrophysiological phenotypes. Thus, our model is based on a combination of direct and indirect data (genetic manipulations are by nature inferential) obtained from a controlled and careful set of experiments. Limitations of our approach were laid out under the “Limitation” section of the discussion, as well as alternative interpretations or possibilities. In the manuscript's revised version, we added additional genetic experiments to further support and validate our model and expanded data analyses as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Nutrigenomics has advanced in recent years, with studies identifying how the food environment influences gene expression in multiple model organisms. The molecular mechanisms mediating these food-gene interactions are poorly understood. Previous work identified the enzyme O-GlcNAc transferase (OGT) in mediating the decreased sensitivity in sweet-taste cells when exposed to a high-sugar diet. The present study, using fly gustatory neurons as a model, provides mechanistic insight into how nutrigenomic signaling encodes nutritional information into cellular changes. The authors expand previous work by showing that OGT is associated with neural chromatin at introns and transcriptional start sites, and that diet-induced changes in chromatin accessibility were amplified at loci with presence of both OGT and PRC2.1. The work also identifies Mitogen Activated Kinase as a critical mediator in this pathway. This is an elegant group of experiments revealing mechanisms for how nutrigenomic signaling triggers cellular responses to nutrients.

      We thank the reviewer for their thoughtful reading of the manuscript and their positive and actionable suggestions. We have addressed these in the revised manuscript.

      Reviewer #3 (Public Review):

      This paper dissects the molecular mechanisms of diet-induced taste plasticity in Drosophila. The authors had previously identified two proteins essential for sugar-diet derived reduction of sweet taste sensitivity - OGT and PRC2.1. Here, they showed that OGT, an enzyme implicated in metabolic signaling with chromatin binding functions, also binds a range of genomic loci in the fly sweet gustatory receptor neurons where binding in a subset of those sites is diet composition dependent. Furthermore, a minority of OGT binding sites overlapped with PRC2.1 recruiter Pcl, where collectively binding of both proteins increased under sugar-diet while chromatin accessibility decreased. The authors demonstrate that the observed taste plasticity requires catalytic activity of OGT, which impacts chromatin accessibility at shared OGT x Pcl but not diet-induced occupancy. In an effort to identify transcriptional mechanisms that instantiate the plastic changes in sensory neuron functions the authors looked for transcription factors with enriched motifs around OGT binding sites and identified Stripe (Sr) as a transcription factor that yielded sugar taste phenotypes upon gain and loss of function experiments. In follow-up overexpression experiments, they show that this results in reduced taste sensitivity and reduced taste evoked spiking in gustatory receptor neurons. Notably the effects of Sr on taste sensitivity also depend on OGT catalytic activity as well as PRC2.1 function. Finally, they explore the function of rolled (rl) - an extracellular-signal regulated kinase (ERK) ortholog in Drosophila, suggested to function upstream of Sr - in diet-induced gustatory plasticity. The authors showed that the overexpression of the constitutively active form of rl kinase results in reduced neuronal and behavioral responses to sucrose which was dependent on OGT catalytic activity.
In sum, these findings reveal several new players that link dietary experience to sensory neuron plasticity and open up clear avenues to explore up- and downstream mechanisms mediating this phenomenon.

      Strengths:

      • Good genetically targeted interventions

      • Thorough exploration of the epistatic relationships between different players in the system

      • Identification of several new signaling systems and proteins regulating diet derived gustatory plasticity

      Weaknesses:

      • The GO term enrichment analyses with little functional follow up has limited explanatory power

      • ERK/rl data is a bit hard to interpret since any imbalance in this system appears to reduce gustatory sensitivity.

      The conclusions in this manuscript are mostly well or at least reasonably supported by data.

      We appreciate the reviewer’s thoughtful read of the manuscript and their feedback. We were pleased to read the reviewer’s positive comments on the experimental treatment of epistatic relationships and the identification of new pathways; we have addressed the reviewer’s comments and suggestions in the revised manuscript.

      We agree with the reviewer about the limited explanatory power of the GO term analysis. We have expanded our computational analysis of the OGT/PRC2 genes in Figure 5 and selected several of these genes for functional analysis. In the revised version of the manuscript, we show that several of the genes affected by diet via this nutrigenomic pathway impact sugar taste sensation as measured by PER. We also agree with the reviewer that the Erk data are harder to interpret than those from OGT or PRC2; this effect is somewhat expected, given the reported action of this kinase in neural activity and plasticity. Importantly, the epistatic interactions between ERK/Sr and OGT/PRC2 we discovered are intriguing and may be involved in other cellular processes beyond taste.

      Below are a few recommendations for improvement:

      • The paper claims to address cell-type-specific nutrigenomic regulatory mechanisms. However, this work only explores nutrigenomic mechanisms in a single cell type (Gr5a+ sweet sensing cells) and we don't really learn whether these nutrigenomic mechanisms exist in all other cell types or just Gr5a+ cells. It would be valuable to see how specific OGT and PRC2.1 binding locations and effects on chromatin accessibility are in a different cell type - e.g. bitter sensing Gr66a. This would reveal how global in nature these findings are and or which aspects of nutrigenomic signaling are specific for sweet sensory cells.

      This study is a cell-specific investigation of nutrigenomic mechanisms in the Gr5a+ sweet taste neurons, which is what we set out to do. It was not our intention for this study to examine mechanisms across different cell types. However, we can understand the reviewer’s comment after rereading the abstract and introduction. As such, we have rewritten part of the manuscript to better introduce the rationale behind the study as the integration of metabolic signaling and cellular contexts. We hope this is now an improved framing for the study rationale.

      (As in response to the author’s recommendations): About analyzing the effects of diet on other cells; no doubt this is an interesting question. However, this also signifies embarking on a completely separate project that would take, optimistically speaking, at least one year to complete and require a budget of ~ $130,000 (see breakdown). Thus, this suggestion doesn’t seem in line with the peer review and editorial philosophy of eLife. Carrying out this new project would result in an additional 6-7 figures but would not fundamentally change the conclusion of the current work; in fact, it may even take away from the targeted integration of molecular biology and neuroscience we have tried to achieve. Beyond this, we do not have such an unallocated budget, and so this new project would require us first to generate preliminary data on the bitter neurons and then to write a grant proposal to fund it; as you can appreciate, this would take longer than a year, especially since we do not even know if the bitter gustatory neurons are affected by a high-sugar diet. Beyond this, looking at the bitter neurons would do little to prove specificity. If we found no effects of this pathway on the activity of the bitter neurons, it wouldn’t establish that the changes in the sweet taste neurons are specific. In fact, the same pathway could be acting in some of the other thousands of fly circuits that were not investigated (Black swan effect). If we did find that OGT/PRC2/Sr play a role in the bitter neurons, it would also do little to disprove specificity since their targets would likely be different because the sets of genes expressed in these two sensory neurons are different. By analogy, the protein sensor mTOR is expressed and active in every cell, where it modulates some of the same targets (i.e., S6K); however, the effects of the pathway may be different due to the distinct metabolic and genetic idiosyncrasies of cells, as well as cellular compartments. 
This lack of specificity doesn’t mean that mTOR is not important. Finally, we would like to note that we have tested the effects of manipulating OGT levels in other neurons (dopamine and Mushroom Body Output Neurons) without effects on behavior or neural responses (May et al. 2020; Pardo-Garcia et al. 2022); based on these, OGT doesn’t seem to affect neurons indiscriminately.

      Budget = $129,000

      Salary and benefits for PD for 10 calendar months (2 months behavior experiments, 2 months training for molecular biology experiments and troubleshooting in new neurons, 4 months growing flies and conducting experiments, 2 months data analysis and visualization) = $75,000

      DAM ID: Pcl:dam and OGT:dam in CD and SD, with and without OSMI x 4 biological replicates per condition = 32 samples @ $500 per sample (UM Genomics core) = $16,000

      TRAP: Pcl mutant and OSMI in CD and SD x 4 biological replicates per condition + sequencing input = 32 samples @ $500 per sample (UM Genomics core) = $16,000

      Animals: $500 per person/10 months = $5,000

      Reagents: including sequencing kit (32 reactions =$6,000) x 2 = $12,000, and other reagents such as drugs and plastic = $17,000

      Note that this PD would have to be hired and retrained. The first author of the manuscript who carried out the molecular experiments graduated in Dec 2021 but was unable to pass on the technical knowledge due to COVID restrictions at the UM: we were completely shut down until July 2020, and at 20% capacity from March 2020 to July 2021 (people also couldn’t work together to show techniques), and no new people joined the lab in 2020-2022 (most of the 2021 grad student class deferred to 2022).

      ● Behavioral data from the screen identifying Sr is missing. Which other candidates were screened and what were the phenotypes?

      We have now added the screen data in Fig. 5-Supplemental Fig. 1C. We targeted RNAi and OE transgenes against the candidate transcription factors (or control RNAi) to the Gr5a+ neurons and measured PER to 30, 20, and 5% sucrose in fasted flies on a control diet.

      ● Go terms analysis for Figure 4

      We selected a dozen DEGs dependent on OGT and PRC2.1 (purple circle in Fig. 4E) and tested the effects on PER when these were overexpressed or knocked down (depending on the direction of changes in the SD). In Fig. 4F we show the effects of a handful of them on proboscis responses to sucrose.

    1. Author Response

      Reviewer #2 (Public Review):

      The ability of the model to recreate one non-trivial aspect of the crossover distribution is not sufficient to rule out other possible models, which would be necessary to consider this work a significant advance. However, if the authors are able to provide additional, non-trivial predictions relating to this and to other experimental conditions, this would dramatically elevate their ability to claim that a coarsening-based mechanism is indeed the most plausible one to explain crossover distribution. Some of these conditions could involve experimental perturbation of key parameters in the model: HEI10 levels, the number of DSBs or recombination intermediates (the 'substrate' that ends up resulting in crossovers), the length of time coarsening is allowed to proceed, or the volume of the nucleus.

      As discussed above, we have now included additional experiments and modelling investigating the patterning of late-HEI10 foci in a pch2 mutant, which exhibits partial synapsis. We have also demonstrated that the nucleoplasmic coarsening model can explain the recently published massive elevation of COs in zyp1 + HEI10 overexpressor lines (Durand et al., 2022). We hope that these additional results, explaining other non-trivial aspects of CO patterning, sufficiently elevate this work to be considered a significant advance within the field.

      Reviewer #3 (Public Review):

      The new model assumes the possibility of loading HEI10 directly from the nucleoplasm, which of course is logical considering the phenotype of the zyp1 mutant in Arabidopsis. However, in a situation where the SC is fully functional, should not we expect some level of nucleoplasmic coarsening in addition to the dominant SC-mediated coarsening? Should the original model not be corrected, and if it is not necessary (e.g., because it included this effect from the very beginning, or the effect is too weak and therefore negligible), the authors should discuss it. With reference to this observation, it would be worthwhile to compare different characteristics of both types of coarsening (e.g., time course).

      We agree with this reviewer that it seems intuitive and likely that some small amount of nucleoplasmic coarsening will persist even in the wild-type situation. As mentioned above, we have now explicitly modelled a combined version of the coarsening model that incorporates aspects of SC and nucleoplasm-mediated coarsening and compared this to simulation outputs from our original coarsening model (which did not incorporate nucleoplasmic recycling). The effects and implications of combining the two models on coarsening dynamics are now discussed.

      Recently, a preprint from the Raphael Mercier group has been released, in which the authors show a massive increase in crossover frequency in zyp1 mutants overexpressing HEI10. I think this is a great opportunity to check to what extent the parameters adopted by the authors in the nucleoplasmic coarsening model are universal and can correctly simulate such an experimental set-up. Therefore, can the authors perform such a simulation and validate it against the experimental data in Durand et al. doi.org/10.1101/2022.05.11.491364? Can CO sites identified by Durand et al. be used instead of MLH1 foci for the modeling?

      As mentioned above, we have now incorporated additional modelling demonstrating that the nucleoplasmic coarsening model can reproduce the massive increase in COs observed in zyp1 + HEI10 overexpressor lines (Durand et al., 2022). We have compared our model simulations against cytological data from this study (MLH1 counts from male Col-0 plants) as we feel this is the most appropriate data to compare our model against. The remaining CO patterning data in the Durand et al. paper is from genetic experiments, which are not optimal for comparison with model simulations for two main reasons. Firstly, the metric of interference (and coarsening) is microns of axis/SC length and not, for example, Mbp, and we feel that (due to the non-uniform compaction of chromatin along pachytene chromosomes) the coarsening model cannot currently be reliably used to explain genetic mapping data. Secondly, genetic CO data includes both class I and class II COs, whereas the coarsening model only simulates class I CO patterning. Therefore, we strongly feel that, for now, it is better to rely exclusively on cytological data to fit our model against.

    1. Author Response

      Reviewer #2 (Public Review):

      By now, the public is aware of the peculiarities underlying the omicron variant's emergence and dissemination globally. This study investigates the mutational biography underlying how mutation effects and epistasis manifest in binding to therapeutic receptors.

      The study highlights how epistasis and other mutation effect measurements manifest in phenotypes associated with antibody binding with respect to spike protein in the omicron variant. It rigorously tests a large suite of mutations in the omicron receptor binding domain, highlighting differences in how mutation effects affect binding to certain therapeutic antibodies.

      Interestingly, mutations of large effect drive escape from binding to certain antibodies, but not others (S309). The difference in the mutational signature is the most interesting finding, and in particular, the signature of how higher-order epistasis manifests in the partial escape in S309, but less so in the full escape of other antibodies.

      The results are timely, the scope enormous, and the analyses responsible.

      My only main criticisms walk the stylistic/scientific line: many of the authors have pioneered discussions and methods relating to the measurement of epistasis in proteins and other biomolecules. While I recognize that the purpose of this study is focused on the public health implications, I would have appreciated more of a dive into the peculiarity of the finding with respect to epistasis. I think the authors could achieve this by doing the following:

      a) Reconciling discussions around the mutation effects in light of contemporary discussions of global epistasis "vs" idiosyncratic epistasis, etc. Several of the authors of the manuscript have written other leading manuscripts on the topic. I would appreciate it if the authors couched the findings within other studies in this arena.

      We added a discussion related to global epistasis at the end of the “Epistasis Analysis” methods section. We tried to highlight that the cause and relevance of global epistasis phenomena are quite different at molecular and at organismic level.

      b) While the methods used to detect epistasis in the manuscript make sense, the authors surely realize that the methods used to measure epistasis are a contentious dimension of the field. I'd appreciate an appeal/explanation as to why their methods were used relative to others. For example, the Lasso correction makes sense, but there are other such methods. Citations and some explanation would be great.

      We added more context and justification in the methods section (Epistasis Analysis). We used Lasso correction not particularly to obtain a sparser representation of the epistasis coefficients (an assumption that is not always valid, particularly within proteins) but rather to reduce instabilities created by the Tobit model inference. In this inference, the model coefficients are unbounded. Thus, if one mutation causes a complete binding loss, all epistatic terms associated with this mutation are not constrained and can become very large in magnitude. A Lasso term with a small coefficient constrains these coefficients but will have a limited influence on the other coefficients.
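      The stabilizing role of the Lasso term described above can be illustrated with a minimal sketch. This is not the authors' code: it is a toy, one-coefficient version of a left-censored (Tobit-type) likelihood, and every number in it (baseline, detection limit, noise level, penalty weight, grid) is a hypothetical value chosen only for illustration. When every observation carrying a mutation falls below the detection limit, the likelihood keeps improving as the coefficient becomes more negative; a small L1 penalty gives the objective a finite minimum.

```python
import math

def log_norm_cdf(z):
    """Numerically stable log of the standard normal CDF."""
    if z < 0:
        return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
    return math.log1p(-0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_nll(beta, n_censored=10, baseline=9.0, limit=6.0, sigma=0.5):
    """Negative log-likelihood of n left-censored observations.

    Each observation is only known to lie below `limit`; the model
    predicts a mean of baseline + beta. All values are hypothetical.
    """
    z = (limit - (baseline + beta)) / sigma
    return -n_censored * log_norm_cdf(z)

def objective(beta, lam):
    """Censored likelihood term plus an L1 (Lasso) penalty of weight lam."""
    return censored_nll(beta) + lam * abs(beta)

# Candidate coefficients from 0 down to -20 in steps of 0.1.
grid = [-k / 10.0 for k in range(201)]

# Without a penalty the best value is simply the most negative one tried:
# the data place no lower bound on the coefficient.
best_unpenalized = min(grid, key=lambda b: objective(b, 0.0))

# With a small penalty the minimum sits at a finite, interior value.
best_penalized = min(grid, key=lambda b: objective(b, 0.1))

print(best_unpenalized, best_penalized)
```

      In this toy setting the penalty only matters where the likelihood is nearly flat, which is in line with the response's point that a small Lasso coefficient has limited influence on coefficients that the data already constrain.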

      Lastly (somewhat relatedly), I found myself wanting the discussion to be bolder and more ambitious. The summary, as I read it, is on the nose and very direct (which is appropriate), but I want more: What do the findings say for greater discussions surrounding evolution in sequence space? For discussions of epistasis in proteins of a certain kind? In my view, this data set offers fodder for fundamental discussion in evolutionary biology and evolutionary medicine. I recognize, however, the constraints: such topics may not be within the scope of a single paper, and such discussions may distract from the biomedical applications, which are more relevant for human health.

      But I might say something similar about the biomedical implications: the authors do a good job outlining exactly what happened, but what does this say about patterns (the role of mutations of large effect vs. higher-order epistasis) in some traits vs others? Why might we expect certain patterns of epistasis with respect to antibody binding relative to other pathogenic virus phenotypes?

      We agree that these are interesting questions, and have added a paragraph in the discussion to explore these points.

      In summary: rigorous and important work, and I congratulate the authors.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, the authors investigate a means of cell communication through physical connections they call membrane tubules (similar or identical to the previously reported nanotubes, which they reference extensively). They show that Cas9 transfer between cells is facilitated by these structures rather than exosomes. A novel contribution is that this transfer is dependent on the pair of particular cell types and that the protein syncytin is required to establish a complete syncytial connection, which they show is open-ended using electron microscopy.

      The data is convincing because of the multiple readouts for transfer and the ultrastructural verification of the connection. The results support their conclusions. The implications are obvious, since it represents an avenue of cellular communication and modifications. It would be exciting if they could show this occurring in vivo, such as in tissue. The implication of this would be that neighboring cells in a tissue could be entrained over time through transfer of material.

      We thank the reviewer for their comments and suggestions. It’s possible that the thick tubular connections found in this study also exist in vivo. Previous studies reported that TNT-like structures were found in mouse or human primary tumor cells (PMID: 34494703; PMID: 34795441). Our transfer assays could be adopted to evaluate such transfer in primary cultures and in vivo. We anticipate this for future work.

      Reviewer #2 (Public Review):

      There is a lot of interest in how cells transfer materials (proteins, RNA, organelles) by extracellular vesicles (EV) and tunneling nanotubes (TNTs). Here, Zhang and Schekman developed quantitative assays, based on two different reporters, to measure EV and direct contact-dependent mediated transfer. The first assay is based on transfer of Cas9, which then edits a luciferase gene, whose enzymatic activity is then measured. The second assay is based on a split-GFP system. The experiments on EV trafficking convincingly show that purified exosomes, or any other diffusible agent, are unable to transfer functional Cas9 (either EV-tethered or untethered) and induce significant luciferase activity in acceptor cells. The authors suggest a plausible model by which Cas9 (with the gRNA?) gets "stuck" in such vesicles and is thus unable to enter the nucleus to edit the gene.

      To test alternative pathways of transfer, e.g. by direct cell-cell contact, the authors co-cultured donor and acceptor cells and detect significant luciferase activity. The split GFP assay also showed successful transfer. The authors further characterize this process by biochemical, genetic and imaging approaches. They conclude that a small percentage of cells in the population produce open-ended membrane tubules (which are wider and distinct from TNTs) that can transfer material between cells. This process depends on actin polymerization but not endocytosis or trogocytosis. The process also seems to depend on endogenously expressed Syncytin proteins - fusogens which could be responsible for the membrane fusion leading to the open ends of the tubules.

      The paper provides additional solid evidence to what is already known about the inefficiency of EV-mediated protein transport. Importantly, it provides an interesting new mechanism for contact-dependent transport of cellular material and assigns valuable new information about the possible function of Syncytins. However, the evidence that the proteins and vesicles transfer through the tubules is incomplete and a few more experiments are required. In addition, certain inconsistencies within the paper and with previous literature need to be resolved. Finally, some parts of the text, methods and the figures require re-writing or additional information for clarity.

      Major comments

      1) In Figure 1F, the authors compare the function of exosome-transported SBP-Cas9-GFP vs. transient transfection of SBP-Cas9-GFP. It is not clear if the cells in the transiently transfected culture also express the myc-str-CD63 and were treated with biotin. It is important to determine if CD63-tethering itself affects Cas9 function.

      We thank the reviewer for their comments and suggestions. We now show in Figure 1- figure supplement 1D that CD63-tethering itself does not affect Cas9 function.

      2) The authors do not rule out that TNTs are a mode of transfer in any of their experiments. Their actin polymerization inhibition experiments are also in-line with a TNT role in transfer. This possibility is not discussed in the discussion section.

      Yes, the results in this study do not rule out a role for TNTs in the transfer. At present, we are not aware of conditions that would functionally distinguish transfer mediated by TNTs and thick tubules. We have now included this in the Discussion section.

      3) Issues with the Split GFP assay:

      a) On page 4, line 176, the authors claim that "A mixture of cells before co-culture should not exhibit a GFP signal". However, this result is not presented.

      The results of the mixture experiment are included in Figure 2-figure supplement 1D, E.

      b) The authors show in Figure 2C and F that in MBA/HEK co-culture or only HEK293T co-culture, there are dual-labeled, CFP-mCherry, cells. First - what is the % of this sub-population? Second, the authors dismiss this population as cell adhesion (Page 5, line 192) - but in the methods section they claim they gated for single particles (page 17, line 642), supposedly excluding such events. There is a simple way to resolve this - sort these dual labeled cells and visualize under the microscope. Finally - why do the authors think that the GFP halves can transfer but not the mature CFP or mCherry?

      The plots in Figure 2C and F are displayed in an all-cell mode, not in singlet mode. The percentage of dual-labeled CFP-mCherry cells among singlets was 0-0.2%. Thus, most of the signal was from doublets, or cell adhesion. We did not claim that the mature CFP or mCherry cannot be transferred. We suggested that the GFP signal of split-GFP recombination may be a more accurate reflection of cytoplasmic transfer between cells. In contrast, mature CFP or mCherry may simply attach to the cell surface but not enter into the other cells.

      c) In the Cas9 experiments - the authors detect an increase in Nluc activity similar in order of magnitude to that of transient transfection with the Cas9 plasmid - suggesting most acceptor cells now express Nluc. However, only 6% of the cells are GFP positive in the split-GFP assay. Can the authors explain why the rate is so low in the split-GFP assay? One possibility (related to item #2 above) is that the split-GFP is transferred by TNTs.

      The Cas9-based Nluc activity assay is more sensitive as it measures an enzyme with a very high turnover number. The split-GFP assay requires a transfer of GFP fragments to produce intact GFP molecules where the signal is not amplified. We think this explains the dramatic increase in a signal once Cas9 is transferred. Our cell sorting results suggest that at least 6% of the receptor cells are transferred in the co-cultures. Of course, nothing in either analysis rules out a role for TNTs in this transfer.

      4) The membrane tubules, the membrane fusion and the transfer process are not well characterized:

      a) The suggested tubules are distinct from TNTs by diameter and (I presume, based on the images) that they are still attached to the surface - whereas TNTs are detached. However, how are these structures different from filopodia except that they (rarely) fuse?

      We used TIRF microscopy and found that the thick tubules are not attached to the surface (not shown). Filopodia are much closer in diameter to TNTs (0.1-0.4 micron). The thick tubules we observe are much thicker (2-4 micron in diameter).

      b) Figure 5E shows that the acceptor cells send out a tubule of its own to meet and fuse. Is this the case in all 8 open-ended tubules that were imaged? Is this structure absent in the closed-ended tubules (e.g. as seen in Figures 6 & 8)?

      Around half of open-ended tubules appeared to emanate from acceptor cells. Likewise, for closed-ended tubules, for example, in Figure 6E where a recipient HEK293T cell projected a short tubule.

      c) The authors suggest a model for transport of the proteins tethered to vesicles (via CD63 tethering). However, the data is incomplete.

      i) They show only a single example of this type of transport, without quantification. How frequent is this event?

      The transport of the proteins tethered to vesicles (via CD63 tethering) was found in all 8 open-ended tubules that we detected in this study.

      ii) Furthermore, the labeling does not conclusively show that these are vesicles and not protein aggregates. Labeling of the vesicle - by dye or protein marker will be useful to determine if these are indeed vesicles, and which type.

      In Figure 4B, the moving punctum in a tubular connection appears to contain SBP-Cas9-GFP, Streptavidin-CD63-mCherry, and the cell surface WGA conjugate that may have been internalized into a donor cell endosome, which indicates that the moving punctum is a vesicle. Nonetheless, in general we cannot distinguish the forms of Cas9 that are transferred and become localized to the nucleus of target cells, and we make no claim other than to acknowledge the possibility that Cas9 may be transferred as an aggregate.

      iii) The data from Figure 2 suggest (if I understand correctly) transfer of the CD63-tethered half-GFP, further strengthening the idea of vesicular transfer. However, the authors also show efficient transfer of untethered Cas9 protein (Figure 2A and other figures). Does this mean that free protein can diffuse through these tubules? The Cas9 has an NLS so the un-tethered versions should be concentrated in the nucleus of donor cells. How, then, do they transfer? The authors do not provide visual evidence for this and I think it is important that they do.

      Based on the results using the Cas9-based luciferase assay (His- or SBP-tagged Cas9) (Figure 2A) and split-GFP assay (free GFP1-10) (Figure 2G), we suggest that free protein could be transferred between cells. Our current imaging approach is not designed to quantify protein diffusion. However, we are able to detect from images that Cas9-GFP does not colocalize exclusively with CD63 or concentrate in the nucleus, but also appears in the cytoplasm. These data indicate that both vesicle association and free diffusion may mediate the transfer through tubules. We thank the referee for emphasizing this issue which we will consider for future work to distinguish the transfer types through tubules.

      iv) In Figures 6 & 8, where transfer is diminished, there are still red granules in acceptor cells (representing CD63-mCherry). Does this mean that vesicles do transfer, just not those with Cas9-GFP? Is this background of the imaging? The latter case would suggest that the red granule moving from donor to acceptor cells in Figure 4 could also be "background". This matter needs to be resolved.

      There are a few red puncta in the acceptor cell in Figure 6B. Since the acceptor cell is close to and overlapped with other donor cells containing CD63-mCherry, the red signal may, as the reviewer suggests, be from donor cells and not a result of transfer through tubular connections. However, in donor-acceptor cultures of HEK293T cells, where transfer is not observed, little CD63-mCherry signal was seen in acceptor cells (for example, in Figure 6A), even during several hours of observation (Figure 6- figure supplement video). A minor red signal could arise from exosomes secreted by donor cells that are internalized by acceptor cells. Images of single-culture receptor cells were added in Figure 4- figure supplement 1.

      For Figure 8, we used MDA-MB-231 syncytin-2 knock-down cells containing Fluc:Nluc:mCherry as the receptor cell, thus in these experiments the red signal most likely represents mCherry expressed in the acceptor cells.

      In Figure 4, we observed a moving punctum in a tubular connection which contained co-localized green, red, and purple signals, corresponding to SBP-Cas9-GFP, streptavidin-CD63-mCherry, and the WGA conjugate, respectively. The video of punctum transport (Figure 4-figure supplement video) suggests that the red signal is not “background”.

      5) Why do HEK293T cells not transfer to HEK293T cells?

      a) A major inexplicable result is that HEK293T express high levels of both Syncytin proteins (Figure 7 - supp figure 1A) yet ectopic expression of mouse Syncytin increases transfer (Figure 7E). Why would that be? In addition, Fig 3A shows high transfer rates to A549 cells - which express the least amount of Syncytin. The authors suggest in the discussion that Syncytin in HEK293T might not be functional without real evidence.

      We cannot yet explain why the basal level of syncytin expressed in HEK293 cells is insufficient to promote open-ended tubular connections between these cells. It could be that the proteins are not well represented in a processed form at the cell surface. Nonetheless, ectopic expression of mouse syncytin-A in HEK293T produced some increased transfer but less than when syncytin-A is ectopically expressed in MDA-MB-231 cells (up to 4-fold vs. 30-fold change of Nluc/Fluc signal) (Figure 7E). Furthermore, we have added new results which show that apparent furin-processed forms of syncytin-A, -1 and -2 can be detected by cell surface biotinylation in transfected MDA-MB-231 cells (Figure 8-figure supplement 1D). All we demonstrate is that syncytin in the acceptor cell is required for fusion and we make no claim that it is the only protein or lipid at the cell surface in the acceptor cell required for fusion. Clearly, more work is essential to establish the complexity of this fusion reaction.

      Syncytin-1 is highly expressed in A549 cells; thus, it is possible that syncytin-1 plays a crucial role in the process in these cells.

      b) In addition - previous publications (e.g. PMID: 35596004; 31735710) show that overexpression of syncytin-1 or -2 in HEK293T cells causes massive cell-cell fusion. The authors do not provide images of the cells, to rule out cell-cell fusion in this particular case.

      Overexpression of syncytin-1 or -2 in cells indeed causes massive cell-cell fusion, while overexpression of syncytin-A induced much less cell fusion than syncytin-1 or -2. We have now added new images shown in Figure 8-figure supplement 1A-C to document these observations. It may be that overexpressed human syncytins are better represented in a furin-processed form in both cell types. In contrast, we did not observe donor-acceptor cell fusion at basal levels of expression of syncytin in HEK293T and MDA-MB-231. For example, the Figure 4-figure supplement video shows that tubular structures were seen to form and break during the course of visualization with a tubule fusion event but no cell fusion to form heterokaryons.

      Reviewer #3 (Public Review):

      In this manuscript, Zhang and Schekman investigated the mechanisms underlying intercellular cargo transfer. It has been proposed that cargo transfer between cells could be mediated by exosomes, tunneling nanotubes or thicker tubules. To determine which process is efficient in delivering cargos, the authors developed two quantitative approaches to study cargo transfer between cells. Their reporter assays showed clearly that the transfer of Cas9/gRNA is mediated by cell-cell contact, but not by exosome internalization and fusion. They showed that actin polymerization is required for the intercellular transfer of Cas9/gRNA, the latter of which is observed in the projected membrane tubule connections. The authors visualized the fine structure of the tubular connections by electron microscopy and observed organelles and vesicles in the open-ended tubular structure. The formation of the open-ended tubule connections depends on a plasma membrane fusion process. Moreover, they found that the endogenous trophoblast fusogens, syncytins, are required for the formation of open-ended tubular connections, and that syncytin depletion significantly reduced cargo Cas9 protein transfer.

      Overall, this is a very nice study providing much clarity on the modes of intercellular cargo transfer. Using two quantitative approaches, the authors demonstrated convincingly that exosomes do not mediate efficient transfer via endocytosis, but that the open-ended membrane tubular connections are required for efficient cargo transfer. Furthermore, the authors pinpointed syncytins as the plasma membrane fusogenic proteins involved in this process. Experiments were well designed and conducted, and the conclusions are mostly supported by the data. My specific comments are as follows.

      1) The authors showed that knocking down actin (which isoform?) in both donor and acceptor cells blocked transfer, and more so in the acceptor cells perhaps due to the greater knockdown efficiency in these cells. However, Arp2/3 complex knockdown in donor cells, but not recipient cells, reduced Cas9 transfer. It would be good to clarify whether the latter result suggests that the recipient cells use other actin nucleators rather than Arp2/3 to promote actin polymerization in the cargo transfer process. Are formins involved in the formation of these tubular connections?

      We thank the reviewer for his/her comments and suggestions. Beta-actin was knocked down in this study. We tried a formin inhibitor, SMIFH2, which resulted in a decrease in Cas9 transfer between cells (Figure 3F).

      2) The authors provided convincing evidence to show that the tubular connections are involved in cargo transfer. Intriguingly, in Figure 4-figure supplement video (upper right), protein transfer appeared to occur along a broad cell-cell contact region instead of a single tubular connection. How often does the former scenario occur? Is it possible that transfer can happen as long as cells are contacting each other and making protrusions that can fuse with the target cell?

      In the Figure 4-figure supplement video (upper right), it may be that several membrane tubes from several different donor cells contact the recipient cell at sites close to one another, resulting in the appearance of a broad cell-cell contact. This was a rare observation. In our quantification, only 8 connections were open-ended in 120 cell-cell contact junctions. Once a connection is open-ended, i.e. the plasma membranes have fused, cargo transfer is observed.

      3) The requirement of MFSD2A in both donor (HEK293T) and recipient (MDA-MB-231) cells is consistent with a role for syncytin-1 or 2 in both types of cells. Since HEK293T cells contain both syncytins and MFSD2A but cargo transfer does not occur among these cells, does this suggest that syncytins and/or MFSD2A are only trafficked to the HEK293T cell membrane in the presence of MDA-MB-231 cells?

      A proper answer to this question requires the visualization of syncytins and MFSD2A. The commercial syncytin antibodies were inadequate for immunofluorescence. In advance of the more detailed effort required to tag the genes for endogenous syncytin 1 and 2, we performed live cell imaging and surface biotin labeling of cells transiently transfected to express fluorescently-tagged forms of syncytin-1, -2 and -A. We now show that syncytin-A, -1, and -2 partially localize to the plasma membrane or the cell surface of MDA-MB-231 and at points of cell-cell contact. In fact, overexpression of codon-optimized human syncytin-1 and -2 induced dramatic HEK293T cell-cell fusion. However, at basal levels of syncytin expression, HEK293T could not form open-ended tubular connections, which may be because syncytins at basal levels are not well represented in a processed form at the cell surface or because their activity is limited by unknown factors.

      As an independent test of cell surface localization, we used surface biotinylation to show that a fraction of the syncytins can be labeled externally (Figure 8-figure supplement 1D). This fraction shows evidence of proteolytic processing consistent with furin cleavage, whereas a blot of lysates suggests that the overwhelming majority of transfected syncytins remain in the unprocessed precursor form, consistent with the punctate and reticular fluorescence images (Figure 8-figure supplement 1A-C).

      We used IF and GFP-tagged MFSD2A and found this protein partially localized to the plasma membrane of HEK293T cells (Figure 9E, F). Given that the results reveal that cargos can be transferred among MDA-MB-231 cells (Figure 2G), syncytin and its receptor appear to function in transfer among these cells.

    1. Author Response:

      eLife assessment

      This is a valuable initial study of cell type and spatially resolved gene expression in and around the locus coeruleus, the primary source of the neuromodulator norepinephrine in the human brain. The data are generated with cutting-edge techniques, and the work lays the foundation for future descriptive and experimental approaches to understand the contribution of the locus coeruleus to healthy brain function and disease. However, due to small sample size and the need for additional confirmatory data, the data only incompletely support the main conclusions presented here. With the strengthening of the analyses, this paper, and the associated web application, will be of great interest to neuroscientists working on arousal-based behaviors and neurological and neuropsychiatric phenotypes.

      Thank you for the assessment and comments. Overall, the majority of the issues raised by the reviewers relate either directly or indirectly to limitations of the sample size that precluded further optimization of protocols and expansion of the dataset. We fully acknowledge the limited sample size in this dataset and aim to be transparent about the limitations of the study. This is the first report of snRNA-seq and spatially-resolved transcriptomics in the human locus coeruleus (LC). The LC is a very small nucleus, located deep within the brainstem, which is extremely challenging to study due to its small size, difficult-to-access location, and the very small number of norepinephrine (NE) neurons located within the nucleus, which were of prime interest for this study. This study represents our initial attempt to molecularly and spatially characterize cell types within the human LC. We did not have significant, established funding from extramural sources dedicated to this study, and tissue resources for the LC are difficult to ascertain, contributing to the small sample size in this initial study. We acknowledge that there are limitations in sample size as well as data quality. Findings from this study will be used to inform, improve, and optimize future and ongoing experimental design, as well as technical and analytical workflows for larger-scale studies. As brought up by one of the reviewers, this field is still in its infancy -- pilot experimentation in new brain regions is labor-intensive and these sequencing approaches remain costly. Moreover, due to the small size and difficulties in dissecting, tissue resources from the human brain in this area are a highly limited resource. Hence, notwithstanding limitations, in our view it is important to release the data for community access at this time. Specific responses to the reviewers’ comments are provided point-by-point in the following sections.

      Reviewer #1 (Public Review):

      Weber et al. collect locus coeruleus (LC) tissue blocks from 5 neurotypical European men, dissect the dorsal pons around the LC and prepare 2-3 tissue sections from each donor on a slide for 10X spatial transcriptomics. […] The authors transparently present limitations of their work in the discussion, but some points discussed below warrant further attention.

      Specific comments:

      1) snRNAseq:

      a. Major concerns with the snRNAseq dataset are A) the low recovery rate of putative LC-neurons in the snRNAseq dataset, B) the fact that the LC neuron cluster is contaminated with mitochondrial RNA, and C) that a large fraction of the nuclei cannot be assigned to a clear cell type (presumably due to contamination or damaged nuclei). The authors chose to enrich for neurons using NeuN antibody staining and FACS. But it is difficult to assess the efficacy of this enrichment without images of the nuclear suspension obtained before FACS, and of the FACS results. As this field is in its infancy, more detail on preliminary experiments would help the reader to understand why the authors processed the tissue the way they did. It would be nice to know whether omitting the FACS procedure might in fact result in higher relative recovery of LC-neurons, or if the authors tried this and discovered other technical issues that prompted them to use FACS.

      Thank you for these comments. We agree these are valid concerns in assessing the data quality and validity of the findings from the snRNA-seq dataset. We will respond to these concerns here to the best of our ability, but in some cases, we do not have definitive answers since comparison data are not yet available for this region. In particular, we were limited in resources for this initial study -- some of the results of the study and issues that we identified in attempting to molecularly profile cells in the human LC were surprising to us, and we intend to generate additional samples and troubleshoot these issues to improve data quality and increase recovery in future work. However, (i) these experiments are expensive, (ii) they are time- and labor-intensive, and (iii) the tissue for this region is limited and difficult to ascertain. Given the extremely small size of the LC, the tissue resource is quickly depleted. For this study, we had fixed resources and made best-guess decisions on how to proceed with the experimental design, based on our experience with snRNA-seq in other human brain regions (Tran and Maynard et al. 2021). However, the LC is a unique region, and our experiences with this dataset will guide us to make technical adjustments in future studies. Due to the limitations in the tissue resources and the lack of data currently available to the community, we wanted to share these results immediately while acknowledging the limitations of the study as we work to increase our resource availability to expand molecular and spatial profiling studies in this region of the human brain.

      Regarding the reviewer’s concern that our choice to use FANS to enrich for neurons could have potentially led to more damage and contributed to the low recovery rate of LC-NE neurons and the mitochondrial contamination -- we do not have a definitive answer to this question, since we did not perform a direct comparison with non-sorted data. As noted above, our limited tissue resource dictated that we could not do both. We made the decision to enrich for neurons based on our previous experience with identifying relatively rare populations in other brain regions (e.g. nucleus accumbens and amygdala; Tran and Maynard et al. 2021). Based on this previous work, our rationale was that without neuronal enrichment, we could potentially miss the LC-NE population, given the relative scarcity of this neuronal population. The low recovery rate and relatively lower quality / contamination issues may be due to technical issues that lead to LC-NE neurons being more susceptible to damage during nuclear preparation and sorting. We agree that directly comparing to data prepared without NeuN labeling and sorting is reasonable, as the additional perturbations may indeed contribute to cell damage. As mentioned in the discussion, we do not have a definitive answer to the reasons for increased mitochondrial contamination and we suspect that multiple technical factors may contribute -- including the relatively large size and increased fragility of LC-NE neurons. We agree that systematically optimizing the preparation to increase the recovery rate and decrease mitochondrial contamination is an important avenue for future work.

      b. It is unclear what percentage of cells make up each cluster.

      We will add this information in the clustering heatmaps or as a supplementary plot in a revised version of the manuscript.

      c. The number of subjects used in each analysis was not always clear. Only 3 subjects were used for snRNAseq, and one of them only yielded 4 LC-nuclei. This means the results are essentially based on n=2. The authors report these numbers in the corresponding section, but the first sentence of the results section (and Figure 1C specifically!) creates the impression that n=5 for all analyses. Even for spatial transcriptomics, if I understood it correctly, 1 sample had to be excluded (n=4).

      This is correct. We will update the figures and text in a revised version of the manuscript to make this limitation (small sample size) clearer, and to further emphasize that the intention of this study is to provide initial data to help determine next steps and best practices for a larger scale and more comprehensive study on this region, especially given the limited availability of tissue resources and currently limited data resources available for this region.

      2) Spatial transcriptomics:

      a. It is not clear to me what the spatial transcriptomics provides beyond what can be shown with snRNAseq, nor how these two sets of results compare to each other. It would be more intuitive to start the story with snRNAseq and then try to provide spatial detail using spatial transcriptomics. The LC is not a homogeneous structure but can be divided into ensembles based on projection specificity. Spatial transcriptomics could - in theory - offer much-needed insights into the spatial variation of mRNA profiles across different ensembles, or as a first step across the spatial (rostral/caudal, ventral/dorsal) extent of the LC. The current analyses, however, cannot address this issue, as the orientation of the LC cannot be deduced from the slices analyzed.

      We understand the point of the reviewer. However, we structured the manuscript in this format due to our aims of creating a data resource for the community as well as being transparent about the limitations of our study. Our experiments began with the spatial experiments on the tissue blocks because this (i) helped orient ourselves to the region, and (ii) provided guidance for how best to score the tissue blocks for the snRNA-seq experiments to maximize recovery of LC-NE neurons. Therefore, we also decided to present the results in this sequence.

      The spatial data also provide more information in that the measurements are from nuclei, cytoplasm, and cell processes (instead of nuclei only). This is one of the main differences / advantages between the platforms at this level of spatial resolution. As noted above, we were also working with a finite tissue resource -- if we ran snRNA-seq first and captured no neurons, the tissue block would be depleted. Due to the logistics / thickness of the required tissue sections for Visium and snRNA-seq respectively, running Visium first allowed us to ensure that we could collect data from both assays.

      Regarding a point raised below on why we only ran snRNA-seq on a subset of the donors -- this was due to resource depletion and not enough available tissue remaining on the tissue blocks to run the assay. We have conducted extensive piloting in other brain regions on the amount (mg) of tissue that is needed from variously sized cryosections, and the LC is particularly difficult since these are small tissue blocks and the extent of the structure is small. Hence, in some of the subjects, we did not have sufficient tissue available for the snRNA-seq assay.

      We agree with the reviewer that spatial studies could, in future work, offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other small, challenging brain regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples, e.g. spaced serial sections across the extent of the LC to make these types of insights. Due to the rarity of the tissue, limited availability of information in this region, and high expense of conducting these studies, we want to share this initial data with the community immediately. We also note that in addition to the 10x Genomics Visium platform, which lacks cellular and sub-cellular resolution, many new and exciting spatial platforms are entering the market, which may be able to address questions in very small regions such as the LC at higher spatial resolution.

      b. Unfortunately, spatial transcriptomics itself is plagued by sampling variability to a point where the RNAscope analyses the authors performed prove more powerful in addressing direct questions about gene expression patterns. Given that the authors compare their results to published datasets from rodent studies, it is surprising that a direct comparison of genes identified with spatial transcriptomics vs snRNAseq is lacking (unless this reviewer missed this comparison). Supplementary Figure 17 seems to be a first step in that direction, but this is not a gene-by-gene comparison of which analysis identifies which LC-enriched genes. Such an analysis should not compare numbers of enriched genes using artificial cutoffs for significance/fold-change, but rather use correlations to get a feeling for which genes appear to be enriched in the LC using both methods. This would result in one list of genes that can serve as a reference point for future work.

      We agree this is a good suggestion, and will add additional computational analyses to address this point in a revised version of the manuscript.
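      The threshold-free, gene-by-gene comparison the reviewer describes could be sketched roughly as follows. This is a minimal illustration only, not the analysis planned for the revision: it assumes each method yields one LC-enrichment score (e.g. a log2 fold-change) per gene, and the function names and all numeric values below are hypothetical placeholders rather than results from either dataset.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_shared(lfc_a, lfc_b):
    """Spearman correlation of enrichment scores over genes present in both inputs."""
    genes = sorted(set(lfc_a) & set(lfc_b))
    x = [lfc_a[g] for g in genes]
    y = [lfc_b[g] for g in genes]
    return genes, pearson(rank(x), rank(y))

# Hypothetical placeholder scores, NOT values from the study.
visium_lfc = {"TH": 3.1, "DBH": 2.8, "SLC6A2": 2.5, "GAD1": -0.4, "SNAP25": 0.2}
snrna_lfc = {"TH": 4.0, "DBH": 3.5, "SLC6A2": 3.9, "GAD1": -1.0, "MBP": -2.0}

genes, rho = spearman_shared(visium_lfc, snrna_lfc)
print(genes, round(rho, 3))
```

      Because the comparison uses ranks over all shared genes, no significance or fold-change cutoff is needed, and genes scoring highly in both methods can be read off as a single reference list.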

      c. Maybe the spatial transcriptomics could be useful to look at the peri-LC region, which has generated some excitement in rodent work recently, but remains largely unexplored in humans.

      We agree this is an excellent suggestion -- assessing cross-species convergence, especially of GABAergic cell populations, in the human LC is of high interest. We note that these types of extensions are exactly the reason why we have provided the publicly accessible web app (R/Shiny app, which includes the ability to annotate regions). We hope that others will use these apps for specialized topics they are interested in. As discussed above, we note that our initial dissections precluded the ability to keep track of the exact orientation of our tissue sections on the Visium arrays with respect to their location within the brainstem, so definitive localization of this region across subjects is difficult in our current study. However, it is possible, for example, to investigate whether there is a putative peri-LC region that is densely GABAergic that is homologous with the GABAergic peri-LC region in rodents. We also raise attention to a recent preprint by Luskin and Li et al. (2022), who apply snRNA-seq and spatially-resolved transcriptomics to molecularly define both LC and peri-LC cell types in mice -- in a revised version of our manuscript, we will extend our computational analyses of inhibitory neuronal subtypes in our data (Supplementary Figures 13, 16) to directly compare with those identified in this study in more detail. As noted above, we have now developed a number of specialized technical and logistical strategies for keeping track of the orientation of sections from the tissue block onto a single spatial array, and we feel that, combined with optimized dissection strategies for this region and the guidance of RNAscope for GABAergic markers on serial sections, these will make it possible to annotate the peri-LC region on spatial arrays in future studies.

      3) The comparison of snRNAseq data to published literature is laudable. Although the authors mention considerable methodological differences between the chosen rodent work and their own analyses, this needs to be further explained. The mouse dataset uses TRAPseq, which looks at translating mRNAs associated with ribosomes, very different from the nuclear RNA pool analyzed in the current work. The rat dataset used single-cell LC laser microdissection followed by microarray analyses, leading to major technical differences in terms of tissue processing and downstream analyses. The authors mention and reference a recent 10x mouse LC dataset (Luskin et al, 2022), however they only pick some neuropeptides from this study for their analysis of interneuron subtypes (Figure S13). Although this is a very interesting part of the manuscript, a more in-depth analysis of these two datasets would be very useful. It would likely allow for a better comparison between mouse and human, given that the technical approach is more similar (albeit without FACS), and Luskin et al have indicated that they are willing to share their data.

      As noted above, we plan to extend our comparisons with the dataset from Luskin and Li et al. (2022) in a revised version of the manuscript, which will provide a more in-depth cross-species comparison. In addition, we also note that there are some additional recent studies using TRAPseq of LC-NE neurons in a functional context, i.e. treatment vs. control experiments or in model systems (e.g. Iannitelli et al. 2023), which provide new opportunities for understanding disease context using in-depth cross-species comparisons. By providing our dataset and reproducible code, we will enable others to adapt and extend these types of comparisons (i.e. TRAPseq of LC-NE neurons or LC snRNA-seq following functional manipulations or in the context of disease or behavioral models) in the future.

      4) Statements in the manuscript about the unexpected identification of a 5-HT (serotonin) cell-cluster seem somewhat contradictory. Figure S14 suggests that 5-HT markers are expressed in the LC-regions just as much as anywhere else, but the RNAscope image in Figure S15 suggests spatial separation between these two populations. And Figure S17 again suggests almost perfect overlap between the LC and 5HT clusters. Maybe I misunderstood, in which case the authors should better clarify/explain these results.

      In our view, the most likely scenario is that the 5-HT neurons come from contamination from the dorsal raphe nucleus, based on the spatial separation seen in the RNAscope images, which we agree are more definitive. As mentioned above, since we do not have definitive documentation for the tissue sections in terms of orientation, it is difficult to say with certainty that the regions are the dorsal raphe, or which sub-portion of the dorsal raphe they are. This initial study has now allowed us to optimize and improve our dissection strategy and approaches for retaining documentation of the orientation of the tissue sections from their intact position within the brainstem as they move from cryosection to placement on the array, which will enable us to better annotate regions with definitive anatomical information with respect to the rostral/caudal and dorsal/ventral axes in future experiments. Given that there are reports in the rodent that 5-HT markers have been identified in LC-NE neurons (Iijima 1993; Iijima 1989), and taking into account the technical limitations in our study, we felt that it was premature to definitively conclude in the manuscript that we were sure these signals arose from the dorsal raphe. We will update this language in a revised version of the manuscript to ensure that these limitations are clear (referring to Supplementary Figures S14-15, S17).

      Reviewer #2 (Public Review):

      The data generated for this paper provides an important resource for the neuroscience community. The locus coeruleus (LC) is the known seed of noradrenergic cells in the brain. Due to its location and size, it remains scarcely profiled in humans. Despite the physically minute structure containing these cells, its impact is wide-reaching due to the known neuromodulatory function of norepinephrine (NE) in processes like attention and mood. As such, profiling NE cells has important implications for most neurological and neuropsychiatric disorders. This paper generates transcriptomic profiles that are not only cell-specific but which also maintain their spatial context, providing the field with a map for the cells within the region.

      Strengths:

      Using spatial transcriptomics in a morphologically distinct region is a very attractive way to generate a map. Overlaying macroscopic information, i.e. a region with greater pigmentation, with its corresponding molecular profile in an unbiased manner is an extremely powerful way to understand the specific cellular and molecular composition of that brain structure.

      The technologies were used with an astute awareness of their limitations; as such, multiple technologies were leveraged to paint a more complete and resolved picture of the cellular composition of the region. For example, the lack of resolution in the spatial transcriptomic platform was compensated by complementary snRNA-seq and single molecule FISH.

      This work has been made publicly available and accessible through a user-friendly application such that any interested researcher can investigate the level of expression of their gene of interest within this region.

      Two important implications from this work are 1) the potential that the gene regulatory profiles of these cells are only partially conserved across species (humans and rodents), and 2) that there may be other neuromodulatory cell types within the region that were not previously localized to the LC.

      Weaknesses:

      Given that the markers used to identify cells are not as specific as they need to be to definitively qualify the desired cell type, the results may be over-interpreted. Specifically, TH is the primary marker used to qualify cells as noradrenergic, however, TH catalyzes the synthesis of L-DOPA, a precursor to dopamine, which in turn is a precursor for epinephrine and norepinephrine suggesting some of the cells in the region may be dopaminergic and not NE cells. Indeed, there are publications to support the presence of dopaminergic cells in the LC (see Kempadoo et al. 2016, Takeuchi et al., 2016, Devoto et al. 2005). This discrepancy is further highlighted by the apparent lack of overlap within given Visium spots between TH, SLC6A2, and DBH. While the single-molecule FISH confirms that some of the cells in the region are noradrenergic, others very possibly represent a different catecholamine. As such it is suggested that the nomenclature for the cells be reconsidered.

      We appreciate the reviewer’s comment, and are aware of the reports suggesting the potential presence of dopaminergic cells in the LC. We initially had the same thought as the reviewer when we observed Visium spots in the spatial data with lack of overlap between TH, SLC6A2, and DBH as well as single nuclei in the snRNA-seq data with lack of overlap between TH, SLC6A2, and DBH. This surprising result was exactly why we performed the smFISH/RNAscope experiment with these three marker genes. Given known issues with read depth and coverage in the 10x Genomics assays, we wanted to better understand if this was a technical limitation in the sequencing coverage, or rather a true biological finding. The RNAscope data showed very clearly that nearly every cell body we looked at had co-localization of these three marker genes. We included an image from a single capture array of one tissue section in Supplementary Figure 11, but could, in a revised version of the manuscript, provide additional examples to illustrate how conclusive the images were by visualization. As such, we were quite convinced that the lack of overlap on Visium spots and in single nuclei in the snRNA-seq data was more likely related to technical issues with sequencing coverage, rather than a biological finding. We also note that we checked for the presence of the dopamine transporter, SLC6A3, and as can be appreciated in the iSEE web app for the snRNA-seq data or the R/Shiny web app for the Visium data, there is virtually no expression of SLC6A3 in the dataset, which in our view provides additional evidence against the possibility that there are substantial quantities of dopaminergic cells in this human LC dataset. We will include supplementary plots showing the lack of SLC6A3 expression in a revised version of the manuscript.

      The authors are unable to successfully implement unsupervised clustering with the spatial data; this greatly reduces the impact of the spatial technology, as it implies that the transcriptomic data generated in the study did not have enough resolution to identify individual cell types.

      The reviewer is correct -- this is a fundamental limitation of the 10x Genomics Visium platform, i.e. the spatial resolution captures multiple cells per spot (e.g. around 1-10 cells per spot in human brain tissue). We note that new spatial platforms now provide cellular resolution (e.g. Vizgen MERSCOPE, 10x Genomics Xenium, 10x Genomics Visium HD), which will help address this in future work. However, many of these cellular-resolution in situ sequencing platforms have the limitation that they do not quantify genome-wide expression, and instead require users to select a priori gene panels to investigate. This is a problem if no genome-wide reference datasets are available. Hence, despite the limited spatial resolution of the Visium platform, this dataset is useful precisely for helping investigators choose gene panels for higher-resolution platforms or higher-order smFISH multiplexing.

      We also applied spatial clustering (using BayesSpace; Zhao et al. 2021) to attempt to segment the LC regions within the Visium samples in a data-driven manner as an alternative to the manual annotations, which was unsuccessful (and hence we relied on the manually annotated regions for downstream analyses) (Supplementary Figure S5). However, this is a different application of unsupervised clustering, which is separate from the task of identifying cell types.

      The sample contribution to the results is highly unbalanced, which consequently, may result in ungeneralizable findings in terms of regional cellular composition, limiting the usefulness of the publicly available data.

      We acknowledge the limitations of the work due to the small/unbalanced sample sizes. As mentioned above for Reviewer 1, this was an initial study in this region -- results of which will inform our (and hopefully others’) experimental design and approach to molecular profiling in this difficult to access brain region. Overall, this study was executed with finite tissue and financial resources and was intended to uncover limitations and help develop best practices and design workflows for future studies with larger numbers of donors and samples. Given the limited data availability for this brain region, we wanted to make this dataset available for the research community immediately. In addition, we note that making this genome-wide dataset available will help inform targeted gene panel design for higher-resolution platforms (e.g. 10x Genomics Xenium).

      This study aimed to deeply profile the LC in humans and provide a resource to the community. The combination of data types (snRNA-seq, SRT, smFISH) does in fact represent this resource for the community. However, due to the limitations, of which, some were described in the manuscript, we should be cautious in the use of the data for secondary analysis. For example, some of the cellular annotations may lack precision, the cellular composition also may not reflect the general population, and the presence of unexpected cell types may represent the accidental inclusion of adjacent regions, in this case, serotonergic cells from the Raphe nucleus.

      We agree, and have attempted to explain these limitations in the manuscript. We will clarify the language regarding the interpretation of the annotated cell populations and unexpected cell types, and the limited sample sizes, in a revised version of the manuscript.

      Nonetheless having a well-developed app to query and visualize these data will be an enormous asset to the community especially given the lack of information regarding the region in general.

      Reviewer #3 (Public Review):

      […] This study has many strengths. It is the first reported comprehensive map of the human LC transcriptome, and uses two independent but complementary approaches (spatial transcriptomics and snRNA-seq). Some of the key findings confirmed what has been described in the rodent LC, as well as some intriguing potential genes and modules identified that may be unique to humans and have the potential to explain LC-related disease states. The main limitations of the study were acknowledged by the authors and include the spatial resolution probably not being at the single cell level and the relatively small number of samples (and questionable quality) for the snRNA-seq data. Overall, the strengths greatly outweigh the limitations. This dataset will be a valuable resource for the neuroscience community, both in terms of methodology development and results that will no doubt enable important comparisons and follow-up studies.

      Major comments:

      Overall, the discovery of some cells in the LC region that express serotonergic markers is intriguing. However, no evidence is presented that these neurons actually produce 5-HT.

      The reviewer is correct that we did not provide any additional evidence to show that these neurons actually produce 5-HT. As noted above in the response to Reviewer 1, in our view, the most likely explanation is that these neurons are from dorsal raphe contamination on the tissue section. However, due to technical and logistical limitations in this study, we could not definitively say this because we did not clearly track the orientation of the tissue sections, and we did not have remaining tissue sections from all donor tissue blocks to repeat RNAscope experiments. For some of the donors, where we had remaining tissue sections to go back to repeat RNAscope experiments after completion of the snRNA-seq and Visium assays, we could see clear separation of the LC region / LC-NE neuron core from where putative 5-HT neurons were located (Supplementary Figure 15). However, we did not have sufficient tissue resources to map this definitively in all donors, and the orientation and anatomy of each tissue block were not fully annotated.

      Given this uncertainty, and the fact that there have been reports that LC-NE neurons express serotonergic markers (Iijima 1993; Iijima 1989), we felt that it was premature to conclude that the putative 5-HT neurons we identified were definitively from the raphe. We will clarify the language around this discrepancy in a revised version of the manuscript to ensure that these limitations are clearly described.

      Concerning the snRNA-seq experiments, it is unclear why only 3 of the 5 donors were used, particularly given the low number of LC-NE nuclear transcriptomes obtained, why those 3 were chosen, and how many 100 um sections were used from each donor. It is also unclear if the 295 nuclei obtained are truly representative of the LC population or whether they are just the most "resilient" LC nuclei that survive the process.

      As discussed above for Reviewer 1, the reason we included only 3 of the 5 donors for the snRNA-seq assays was due to the tissue availability on the tissue blocks. We will clarify the language in a revised version of the manuscript to make this limitation more clear. We will also include additional details in the Methods section on the number of 100 μm sections used for each donor (which varied between 10-15, approximating 60-80 mg of tissue).

      The LC displays rostral/caudal and dorsal/ventral differences, including where they project, which functions they regulate, and which parts are vulnerable in neurodegenerative disease (e.g. Loughlin et al., Neuroscience 18:291-306, 1986; Dahl et al., Nat Hum Behav 3:1203-14, 2019; Beardmore et al., J Alzheimer's Dis 83:5-22, 2021; Gilvesy et al., Acta Neuropathol 144:651-76, 2022; Madelung et al., Mov Disord 37:479-89, 2022). It was not clear which part(s) of the LC was captured for the SRT and snRNAseq experiments.

      As discussed above for Reviewer 1, a limitation of this study was that we did not record the orientation of the anatomy of the tissue sections, precluding our ability to annotate the tissue sections with the rostral/caudal and dorsal/ventral axis labels. We agree with the reviewer that additional spatial studies, in future work, could offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other, small, challenging regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples in order to make these types of insights.

      The authors mention that in other human SRT studies, there are typically between 1-10 cells per expression spot. I imagine that this depends heavily on the part of the brain being studied and neuronal density, but it was unclear how many LC cells were contained in each expression spot.

      The reviewer is correct that we did not include this information in the manuscript. We attempted to apply a computational method to count nuclei contained in each gene expression spot based on analyzing the histological H&E images (VistoSeg; Tippani et al. 2022), which we have developed and previously applied in data from the dorsolateral prefrontal cortex (DLPFC) (Maynard and Collado-Torres et al. 2021). Based on the segmentation using this workflow we observe that the counts in this region are similar to what we observed in the DLPFC, i.e., typically between 1-10 LC cells per expression spot, with approximately 1-2 LC-NE neurons (which are characterized by their large size) per expression spot. However, these analyses had several technical issues related to the images themselves, the relatively large size and pigmentation of LC-NE neurons, and parameter settings that had been optimized for different brain regions. We are currently optimizing this analysis workflow for these images to provide more accurate estimates of cell counts per spot to give readers additional context on the number of nuclei per spot in the annotated LC regions and outside the LC regions in a revised version of the manuscript.

      Regarding comparison of human LC-associated genes with rat or mouse LC-associated genes (Fig. 2D-F), the authors speculate that the modest degree of overlap may be due to species differences between rodents and human and/or methodological differences (SRT vs microarray vs TRAP). Was there greater overlap between mouse and rat than between mouse/rat and human? If so, that is evidence for the former. If not, that is evidence for the latter. Also would be useful for more in-depth comparison with snRNA-seq data from mouse LC: https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1.

      We will investigate this question and discuss this in updated results in a revised version of the manuscript.
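
      One simple way to frame the reviewer's comparison is a pairwise overlap score between the three LC-associated gene lists. The sketch below is not the analysis used in the manuscript; it assumes the gene lists have already been mapped to a shared ortholog namespace, and the gene identifiers (`G1` to `G6`) are placeholders rather than real results.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(gene_sets):
    """Return the Jaccard index for every pair of named gene sets."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Placeholder identifiers only (hypothetical, not data from the study).
example_sets = {
    "human": {"G1", "G2", "G3", "G4"},
    "mouse": {"G2", "G3", "G5"},
    "rat": {"G2", "G3", "G5", "G6"},
}

overlaps = pairwise_overlap(example_sets)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

      Under the reviewer's reasoning, a mouse/rat score that clearly exceeds both rodent/human scores would point toward species differences, whereas similar scores across all pairs would point toward methodological differences.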

      The finding of ACHE expression in LC neurons is intriguing, especially in light of work from Susan Greenfield suggesting that ACHE has functions independent of ACH metabolism that contributes to cellular vulnerability in neurodegenerative disease.

      We thank the reviewer for pointing this out. We were very surprised too by the observed expression of SLC5A7 and ACHE in the LC regions (Visium data) and within the LC-NE neuron cluster (snRNA-seq data), coupled with absence of other typical cholinergic marker genes (e.g. CHAT, SLC18A3), and we do not have a compelling explanation or theory for this. Hence, the work of Susan Greenfield and colleagues suggesting non-cholinergic actions of ACHE, particularly in other catecholaminergic neurons (e.g. dopaminergic neurons in the substantia nigra) is very interesting. We will include references to this work and how it could inform interpretation of this expression in a revised version of the manuscript (Greenfield 1991; Halliday and Greenfield 2012).

      High mitochondrial reads from snRNA-seq can indicate lower quality. It was not clear why, given the mitochondrial read count, the authors are confident in the snRNA-seq data from presumptive LC-NE neurons.

      We will include additional analyses to further investigate and/or confirm this finding (e.g. comparing sum of UMI counts / number of detected genes and mitochondrial percentage per nucleus for this population to confirm data quality) in additional supplementary figures in a revised version of the manuscript.
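
      The three per-nucleus quality metrics named above (sum of UMI counts, number of detected genes, and mitochondrial percentage) can be sketched as follows. This is an illustrative calculation only, not the pipeline used in the study; the function name, the `MT-` prefix convention for mitochondrial genes, and the toy count matrix are assumptions.

```python
def per_nucleus_qc(counts, gene_names, mito_prefix="MT-"):
    """Compute simple QC metrics for each nucleus.

    counts: list of rows, one row per nucleus, one UMI count per gene.
    gene_names: gene symbols in the same order as the columns of `counts`.
    mito_prefix: assumed naming convention for mitochondrial genes.
    """
    mito_idx = [
        i for i, g in enumerate(gene_names) if g.upper().startswith(mito_prefix)
    ]
    metrics = []
    for row in counts:
        total = sum(row)                           # sum of UMI counts
        detected = sum(1 for c in row if c > 0)    # genes with at least one UMI
        mito = sum(row[i] for i in mito_idx)       # UMIs from mitochondrial genes
        mito_pct = 100.0 * mito / total if total else 0.0
        metrics.append(
            {"sum_umi": total, "detected_genes": detected, "mito_pct": mito_pct}
        )
    return metrics


# Toy matrix (hypothetical counts): two nuclei, four genes.
genes = ["TH", "DBH", "MT-CO1", "MT-ND1"]
toy_counts = [[5, 3, 1, 1], [0, 2, 6, 2]]
qc = per_nucleus_qc(toy_counts, genes)
for m in qc:
    print(m)
```

      Comparing these metrics between the presumptive LC-NE cluster and the other clusters would show whether a high mitochondrial percentage coincides with low UMI counts and few detected genes, which is the usual signature of low-quality nuclei.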

      References

      • Greenfield (1991), A noncholinergic action of acetylcholinesterase (AChE) in the brain: from neuronal secretion to the generation of movement, Cellular and Molecular Neurobiology, 11, 1, 55-77.

      • Halliday and Greenfield (2012), From protein to peptides: a spectrum of non-hydrolytic functions of acetylcholinesterase, Protein & Peptide Letters, 19, 2, 165-172.

      • Iannitelli et al. (2023), The neurotoxin DSP-4 dysregulates the locus coeruleus-norepinephrine system and recapitulates molecular and behavioral aspects of prodromal neurodegenerative disease, eNeuro, 10, 1, ENEURO.0483-22.2022.

      • Iijima K. (1989), An immunocytochemical study on the GABA-ergic and serotonin-ergic neurons in rat locus ceruleus with special reference to possible existence of the masked indoleamine cells, Acta Histochemica, 87, 1, 43-57.

      • Iijima K. (1993), Chemocytoarchitecture of the rat locus ceruleus, Histology and Histopathology, 8, 3, 581-591.

      • Luskin A.T., Li L. et al. (2022), A diverse network of pericoerulear neurons control arousal states, bioRxiv (preprint).

      • Maynard and Collado-Torres et al. (2021), Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex, Nature Neuroscience, 24, 425-436.

      • Tippani et al. (2022), VistoSeg: processing utilities for high-resolution Visium/Visium-IF images for spatial transcriptomics data, bioRxiv (preprint).

      • Tran M.N., Maynard K.R. et al. (2021), Single-nucleus transcriptome analysis reveals cell-type-specific molecular signatures across reward circuitry in the human brain, Neuron, 109, 3088-3103.

      • Zhao E. et al. (2021), Spatial transcriptomics at subspot resolution with BayesSpace, Nature Biotechnology, 39, 1375-1384.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The authors show that there are several classes of Snf1 targets (Fig. 3e), most notably some that are phosphorylated immediately after Snf1 activation by glucose (<5 min) and others that are only phosphorylated after 15 min. In a simple view, all direct Snf1 targets should be phosphorylated immediately after Snf1 activation. Is that the case? What is the overlap between the direct targets found using the OBIKA assay and the slow and fast responding in vivo targets? What about the phosphorylation motif, does it differ between the groups? These points are not discussed in the text except to point out that the direct Snf1 target Msn4 is among the slowly phosphorylated group.

      This is a very good point and we have performed the suggested analysis, which resulted in an interesting finding that we describe now in the text as follows:

      “Notably, of the 145 confirmed target sites, 81 (i.e. 56%) were significantly regulated after both 5 min and 15 min. Of the remaining 64 sites, 32 responded only after 5 min, while the other 32 responded only after 15 min. Some of the former residues are located within Snf1 itself, the β-subunit of the Snf1 complex (i.e. Sip1), the Snf1-targeting kinase Sak1, or Mig1, while some of the latter are located within the known Snf1-interacting proteins such as Gln3, Msn4, and Reg1. These observations indicate that Snf1-dependent phosphorylation initiates, as expected, within the Snf1 complex and then progresses to other effectors. Interestingly, based on the residues that responded exclusively after 5 min, we retrieved a perfect Snf1 consensus motif (i.e. an arginine residue in the -3 position and a leucine residue in the +4 position; Supplementary figure 2A). The one retrieved for the residues that respond exclusively at 15 min, in contrast, significantly deviated from this consensus motif (Supplementary figure 2B). The slight temporal deferral of Snf1 target phosphorylation may therefore perhaps in part be explained by reduced substrate affinity due to consensus motif divergence.”
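
      The positional rule described in the quoted passage (arginine at -3 and leucine at +4 relative to the phosphorylated residue) can be written as a short check. This is a minimal sketch of that rule as stated here, not the motif analysis performed for the manuscript; the function name and the example sequence are hypothetical.

```python
def matches_snf1_consensus(sequence, site):
    """Check the rule stated above: R at position -3 and L at position +4.

    sequence: protein sequence in one-letter code.
    site: 1-based position of the phosphorylated serine or threonine.
    Returns False when the site is not S/T or lies too close to either end.
    """
    i = site - 1  # convert to 0-based index
    if i - 3 < 0 or i + 4 >= len(sequence):
        return False
    if sequence[i] not in "ST":
        return False
    return sequence[i - 3] == "R" and sequence[i + 4] == "L"


# Made-up peptide for illustration: S at position 7, R at 4, L at 11.
toy_peptide = "AAARAASAAALAA"
print(matches_snf1_consensus(toy_peptide, 7))
```

      Applying such a check separately to the sites that respond only at 5 min and to those that respond only at 15 min would give a rough count of how many sites in each group fit the rule, complementing the motif logos in the supplementary figure.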

      2) The data showing that Snf1-dependent phosphorylation of Pib2 plays a key role in triggering inhibition of TORC1 is convincing but is entirely dependent on a rescue of the TORC1 inhibition defect seen in cells where Snf1 is inhibited. That is, TORC1 is normally inactivated during glucose starvation; this does not occur when Snf1 is inhibited by 2NM-PP1 but does occur when Snf1 is inhibited in a strain carrying a phosphomimetic version of Pib2 (Pib2SESE). This indicates that Pib2 phosphorylation is sufficient to replace Snf1 signaling and inhibit TORC1 during glucose starvation. However, in a simple model, a phosphodead version of Pib2 (SASA) should have the opposite effect. That is, TORC1 should remain active during glucose starvation in the Pib2SASA strain, but that is not the case (Fig. 4g). This point is not discussed in the paper; why do the authors think that TORC1 is inhibited normally in the SASA mutant?

      We fully agree with this statement and have highlighted and discussed this issue now in the last paragraph of the results section (where we think this fits best) as follows:

      “In contrast, the separate and combined expression of Sch9S288A and Pib2S268A,S309A showed, as predicted, no significant effect in the same experiment. Unexpectedly, however, the latter combination did not result in transient reactivation of TORC1, like we observed in glucose-starved, Snf1-compromised cells. This may be explained if TORC1 reactivation would rely on specific biophysical properties of the non-phosphorylated serines within Sch9 and Pib2 that may not be mimicked by respective serine-to-alanine substitutions. Alternatively, Snf1 may employ additional parallel mechanisms (perhaps through phosphorylation of Tco89, Kog1, and/or other factors; see above) to prevent TORC1 reactivation even when Pib2 and Sch9 cannot be appropriately phosphorylated. While such models warrant future studies, our current data still suggest that Snf1-mediated phosphorylation of Pib2 and Sch9 may be both additive and together sufficient to appropriately maintain TORC1 inactive in glucose-starved cells.”

      Reviewer #2 (Public Review):

      1) Because PIB2 is a major focus of the manuscript, I was surprised that it was not discussed in the introduction. I think it would be appropriate to discuss prior evidence linking this protein to TORC1.

      We thank the reviewer for this suggestion. Pib2 and its role in TORC1 control is now described in the introduction.

      2) The authors introduce mutations into PIB2 at two sites determined to be phosphorylated by SNF1, at S268 and S309. Somewhat confusing results are obtained, in that the PIB2 null and phosphomimic mutants (S268E and S309E) confer a similar TORC1 phenotype, compared to the S268A S309A mutant. These results require further explanation than simply that "TORC1 inactivation defect in SNF1-compromised cells is due to a defect in PIB2 phosphorylation". This is particularly intriguing given that the opposite results are observed with the SCH9 mutants, where the null and alanine mutants confer a similar phenotype compared to the S to E mutants.

      The finding that both loss of Pib2 and expression of the phosphomimetic allele yield the same phenotype is indeed counterintuitive. Hence, we fully agree with the criticism put forward here. We believe that the underlying reason for our observation is based on the unique property of Pib2 in having both a C-terminal TORC1-activating domain (CAD) and an N-terminal TORC1-inhibitory domain (NID). We have addressed this point briefly in the discussion ("Our current data favor a model according to which Snf1-mediated phosphorylation of the Kog1-binding domain in Pib2 weakens its affinity to Kog1 and thereby reduces the TORC1-activating influence of Pib2 that is mediated by the C-terminal TORC1-activating (CAD) domain via a mechanism that is still largely elusive"), but now also address this issue in the results section as suggested.

      3) The authors conclude, based on the co-IP data in Figure 4H, that interactions between KOG1 and PIB2 are direct. However, it remains possible that interactions between these proteins are mediated by other components of TORC1 or within cells. This should be addressed.

      Please note that the Kog1-Pib2 interaction has previously been demonstrated by different methods. Accordingly, Pib2 has not only been shown to interact with Kog1 (or TORC1) in co-IP studies in vivo (PMID: 30485160, PMID: 29698392), but also by co-IP studies in vitro (PMID: 29698392, PMID: 28483912, PMID: 34535752). In addition, the interaction between Kog1-Pib2 has also been dissected (down to defined domains) by classical two hybrid analyses (PMID: 28481201). All of these studies are cited now in the introduction where Pib2 is discussed.

      4) The authors demonstrate convincingly that the PIB2 and SCH9 SNF1-specific phospho-site mutants have a detectable effect on TORC1, primarily by examining TORC1-dependent phosphorylation of SCH9. What is unclear is whether phosphorylation at these sites has a significant physiological impact on cells. It appears that the rapamycin hyper-sensitivity displayed in Figure 6E is the only data presented to address this question. It would be appropriate for the authors to comment further on the significance of SNF1-dependent phosphorylation of these two substrates.

      To further address the physiological role of the Snf1-dependent phosphorylation of Sch9 and Pib2 combined, we newly assessed the growth rate of the strain that expresses the Sch9SE and Pib2SESE alleles combined. Accordingly, we found the snf1as pib2SESE sch9SE strain to exhibit a significantly higher doubling time than the snf1as strain on both low-nitrogen-containing media and standard synthetic complete media. This is now included in the text (results section).

      Reviewer #3 (Public Review):

      1) Conceptually, the manuscript shows that Snf1 activity is important for the acute inhibition of TORC1 during glucose starvation. However, this is mainly restricted to 10 and 15 minutes of glucose starvation. After 20 minutes, TORC1 is inhibited by some unknown mechanisms independent of Snf1 (Hughes Hallet et al). This raises concern regarding the physiological relevance of Snf1-mediated TORC1 inhibition during acute glucose stress. The authors show that this regulation is important for the survival of cells under TORC1 inhibition. How do the authors envision that the acute role of Snf1 plays an important long-term physiological relevance during rapamycin treatment? Providing more support for the physiological relevance of this regulation will make this study of interest to a broad readership.

      Please see our response to point 4 of reviewer #2.

      2) Another major concern of the manuscript is the inconsistencies between the various representative immunoblots and their quantifications. The effect of AMPK activity on TORC1 signaling under glucose starvation seems very subtle. A few specific concerns are mentioned below:

      a) In figure 1A, the increase in TORC1 activity upon inhibition of analogue sensitive Snf1as by 2NM-PP1 is very marginal. Although quantification shows a significant increase, a representative western blot figure should be shown.

      We have replaced the original immunoblots with more representative ones in Figure 1A.

      b) Does deleting Snf1 itself have any effect on TORC1 activity? Lane 4 of figure 1A shows reduced activity compared to lane 1.

      TORC1 activity is generally assessed as the ratio between phosphorylated Sch9 and total Sch9 (see also below under (e)). Accordingly, based on the quantification of 6 blots (we added two more experiments to address this point; Figure 1B), loss of Snf1 has no significant impact on TORC1 activity in exponentially growing cells, as we expected.

      c) To show the effect of Snf1 on the repression of TORC1, the time-course experiments are run on two separate gels in figure 1C. Hence, it is difficult to compare the effect of Snf1 on unscheduled reactivation of TORC1 under glucose starvation.

      Please note that the data of the two blots were cross-normalized to the sample from exponentially growing cells (labeled “Exp”; i.e. the same sample was loaded on the two blots) in order to compare and quantify the effects of Snf1.
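
      The cross-normalization described above can be illustrated with a small calculation: TORC1 activity is taken as the ratio of phosphorylated to total Sch9, and each blot is then scaled to the shared "Exp" lane so that values from separate gels become comparable. The sketch below only illustrates that arithmetic; the lane labels other than "Exp" and all band intensities are made up, and the function names are hypothetical.

```python
def torc1_activity(phospho, total):
    """TORC1 activity read-out: phosphorylated Sch9 divided by total Sch9."""
    return phospho / total


def normalize_to_reference(blot, reference="Exp"):
    """Express every lane of one blot relative to its shared reference lane.

    blot: dict mapping lane label -> (phospho intensity, total intensity).
    """
    ref = torc1_activity(*blot[reference])
    return {lane: torc1_activity(p, t) / ref for lane, (p, t) in blot.items()}


# Made-up band intensities; "Exp" is the sample loaded on both gels.
blot_a = {"Exp": (100.0, 200.0), "10 min": (10.0, 200.0)}
blot_b = {"Exp": (50.0, 50.0), "10 min": (20.0, 50.0)}

norm_a = normalize_to_reference(blot_a)
norm_b = normalize_to_reference(blot_b)
print(norm_a)
print(norm_b)
```

      After this step the "Exp" lane equals 1 on both blots, so differences in exposure or transfer efficiency between gels cancel out and the remaining lanes can be compared directly.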

      d) In figure 1E, the effect of Reg1 deletion on TORC1 activity seems minor as both phospho- and total levels of Sch9 are reduced.

      As correctly pointed out by this reviewer, we consistently found the total Sch9 levels to be lower in reg1∆ cells when compared to wild-type cells. To assess TORC1 activity, we therefore always determine the ratio between phosphorylated Sch9 and total Sch9, and the respective ratio is significantly different in reg1∆ cells when compared to wild-type cells. We speculate that the reduced Sch9 levels in this mutant are caused by the reduced growth rate (PMID: 22140226) and hence lower protein synthesis rate (to which translation of SCH9 mRNA may be specifically sensitive).

      Since further mechanistic insights are based on these initial findings of figure 1, solidifying these observations is very important.

      3) In figure S1, the analogue sensitive Snf1as shows significant reduction in its activity (reduced S79 phosphorylation of ACC1-GFP). This raises the concern of whether this genetic background is an ideal system to resolve the mechanism of TORC1 suppression.

      The Snf1as allele is indeed hypomorphic, which we acknowledge appropriately in the text. We would like to point out however, that we took great care in each experiment to include the DMSO control that allowed us to unequivocally assign any observed effects to the specific drug-mediated inhibition of Snf1as. Importantly, we think that the hypomorphic nature of the Snf1as allele (which allows normal growth on non-fermentable carbon sources) represents a minor trade-off when compared to the advantages that this allele provides over the use of a snf1∆ strain, which exhibits a fundamentally reprogrammed transcriptome/proteome (PMID: 17981722). Accordingly, this allele allows the assessment of Snf1 inhibition on very short time scales while minimizing confounding large-scale proteome rearrangements that may indirectly affect the studies. Moreover, use of the Snf1as allele also allowed us to compare our results more directly with other phosphoproteome studies that used the same allele (PMID: 25005228, PMID: 28265048). Finally, please also note that our main conclusions (on Snf1-mediated control of TORC1) are corroborated by additional genetic data such as the ones in Figure 1A/E where we use snf1∆ and reg1∆ cells.

      4) In figure 2, during glucose restimulation, there is increased retention of Snf1as-pThr210 in the presence of 2NM-PP1. This suggests that the upstream glucose sensing pathway as well as Snf1 might be more active than in DMSO-treated cells. This also raises concerns regarding the suitability of the genetic background for the study. Can authors comment on why this phosphorylation persists? Does the phosphoproteomic analysis give any hint for this phenotype?

      This is a very good point. In fact, we forgot to mention in the text that the observed effect of the 2NM-PP1 treatment on Snf1-Thr210 phosphorylation has already been studied and mechanistically explained earlier (PMID: 23184934). Accordingly, the entry of the drug into the broader catalytic cleft of the Snf1as mutant causes the catalytic domain to be stabilized in a conformation that prevents dephosphorylation of pThr210 by the dedicated Glc7-Reg1 phosphatase heterodimer. This can be observed each time we compare 2NM-PP1- and DMSO-treated cells and probe for Snf1-Thr210 phosphorylation. This is, in fact, an independent control for proper 2NM-PP1 functioning. We have now added a sentence (including reference) that pinpoints this issue in the text.

      5) In figure 4H, where authors claim reduced binding of Kog1 to Pib2SESE, levels of Kog1 in input are also reduced. Can authors provide further support using colocalization studies? Also, does Pib2SESE have any defect in forming Kog1 bodies?

      We took great care to load equal amounts of IPed Pib2-myc variants and then normalized the co-IPed Kog1-HA on the IPed Pib2-myc variant levels. The Kog1-HA input levels vary a bit between the 4 experiments, but they are on average not significantly lower in Pib2SESE-myc-expressing cells when compared to WT cells. In addition, in our Co-IP experiments, the beads are saturated with Pib2-myc variants and Kog1-HA levels are generally not limiting. We therefore deem it fair to say that the Pib2SESE has a reduced affinity for Kog1. Based on our experience with other co-localization studies of membrane-bound proteins and protein complexes (e.g. TORC1 versus EGOC), we find it extremely difficult to quantify local interactions by fluorescence microscopy (unless they are close to all or nothing). In this case, where we have a partial defect in the interaction between Kog1 and Pib2SESE, we anticipate that such analyses will not allow us to draw additional conclusions.

      Regarding the issue of Kog1/TORC1-body formation: all of our mutations in PIB2 and SCH9 were introduced (by CRISPR-Cas9) in the genome of our snf1as strain, which was used throughout this study. To analyze Kog1/TORC1-bodies, we have therefore first tried to C-terminally tag KOG1 with GFP in the genome of our strain background (as was done in the original description of Kog1 bodies; PMID: 26439012). However, because all our attempts failed to create KOG1-GFP in our strain, we assumed that this construct may be lethal in our strain background. This is not completely unexpected, as it is known that the Kog1-GFP allele is hypomorphic and temperature sensitive (PMID: 19144819). In an alternative approach, we have therefore set out to study TORC1 body formation in our strains by using a GFP-TOR1 allele that can be integrated into the genome and that expresses functional TORC1 (PMID: 25046117). As we have described earlier, the respective GFP-Tor1 construct localized on vacuolar membranes and on foci that we previously have shown to correspond to signaling endosomes (PMID: 30732525, 30527664). Unexpectedly, however, when we starved the respective cells for glucose, the number of GFP-Tor1 foci increased only marginally (20%) in our strain background over a period of up to 1 hour. Given these various unexpected issues, we prefer to not include any of these preliminary data in the current version of our manuscript, but to rather follow up on these observations in a separate study. We deem this particularly justified as the current literature on TORC1-body and TOROID formation also appears controversial and may need further clarification. For instance, while TORC1-body formation has been suggested to represent a Snf1-dependent process that is dispensable for TORC1 inhibition (PMID: 30485160), TOROID formation has been suggested to represent a Snf1-independent process that is mechanistically linked to TORC1 inhibition (PMID: 28976958).

      6) In figure 5F, where the authors claim the Sch9SE mutant has lower TORC1 activity, the difference is very minor. Furthermore, corresponding lanes also show reduced levels of Snf1as expression. Hence, improved blots are required here. Also, an in vitro kinase assay with full-length Sch9 KD with and without the Ser288 mutation could solidify the observation that phosphorylation of Ser288 indeed affects TORC1-mediated phosphorylation.

      We have replaced the blots in Figure 5F with an alternative set that more clearly highlights the (statistically significant) differences, while also exhibiting more equal Snf1as levels. Regarding the in vitro kinase assays: we have repeatedly tried to perform TORC1 kinase assays on full length Sch9KD without success. We currently believe that proper TORC1-mediated phosphorylation of Sch9 may have to occur on membranes to which both TORC1 and Sch9 are tethered through phospholipid interactions (PMID: 29237820). We are trying to set up such a system on liposomes, but we assume that this will be a major effort that cannot be resolved in due time.

      7) In figure 6E, the Sch9SE mutant shows no effect in the presence of rapamycin. Thus, in vivo, phosphorylation at Ser288 may not be perturbing the phosphorylation of Sch9 by TORC1.

      When cells are grown on glucose where TORC1 is highly active (as in Fig. 6E or 6A/B in Exp), expression of Sch9SE has no significant effect indeed. However, in glucose-starved cells, where TORC1 activity is low, expression of the Sch9S288E allele clearly and significantly contributes to inhibition of Sch9-Thr737 phosphorylation by TORC1 (Figure 6A/B and Figure 5F/G).

      8) According to the authors' proposed mechanism, TORC1 activity in Pib2SASA or Pib2SASA/Sch9SA backgrounds should be higher during glucose starvation compared to the control strains. However, glucose starvation shows a similar level of reduction in TORC1 activity in these backgrounds. This raises concern regarding the proposed mechanism. The authors mainly base their conclusions on Ser to Glutamate mutants. The authors should be cautious that Ser to Glutamate changes may also affect the protein structure which can confer similar phenotypes. How do the authors justify this discrepancy?

      Please see our response to point 2 of reviewer #1.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sequence some of the oldest maize macroremains found to date, from lowland Peru. They find evidence that these specimens were already domesticated forms. They also find a lack of introgression from wild maize populations. Finally, they find evidence the Par_N16 sample already carried alleles for lowland adaptation.

      Overall I think this is an interesting topic, the study is well-written and executed for the most part. I have a variety of comments, most important of which revolve around methodological clarity. I will give those comments first.

      1) The authors should say in the Results section how "alleles previously reported to be adaptive to highlands and lowlands, specifically in Mesoamerica or South America" were identified in Takuno et al. 2015. What method was used? I see this partly comes in the Discussion eventually, but it would help to have it in the Results with more detail. The answer to this question would help a skeptical reader decide the appropriateness of the resource, given that many selection scans have been performed on maize genomes, the choice would ideally not be arbitrary.

      This was explained in more detail in the Material and Methods section, to keep the Results and Discussion sections more concise. However, we agree that adding a brief explanation in the Results section would be useful and we have modified the revised version accordingly. Now the relevant part of the section Specific adaptation to lowlands in Mesoamerica and South America reads as follows: “To assess this, we identified in Par_N16 all covered SNPs with alleles previously reported to be adaptive to highlands and lowlands, specifically in Mesoamerica or South America by Takuno and coworkers (Takuno et al., 2015). These authors used genome-wide SNP data from 94 Mesoamerican and South American landraces and identified SNPs with significant FST values to infer which allele was likely adaptive. For example, those SNPs showing significant FST only in Mesoamerica, were characterized as adaptive for lowlands if they were at high frequency in the lowland population and at low frequency in the highland population, and vice versa. The same was applied for South America (Takuno et al., 2015). They identified 668 Mesoamerican and 390 South American previously reported adaptive SNPs, from which 32 and 20 were covered in Par_N16, respectively.”

      2) How were the covered putative adaptive SNPs distributed in the genome? Were any clustered and linked? The random sampled SNPs should be similarly distributed to give an appropriate null.

      The SNPs in Takuno et al. (2015) are in general at a median distance of 353 bp from each other. The 20 adaptive sites covered in Par_N16 for South America (SA) are at a median distance of 8,301,843 bp (approximately 8.3 Mbp), while the 32 for Mesoamerica (MA) are at a median distance of 24,295,968 bp (approximately 24.3 Mbp). SNPs in five pairs from Mesoamerica are closer than 100 bp between them, but each pair is at a considerable distance (beyond 1 cM) from each other and from other SNPs covered in Par_N16. The same happens for only one SNP pair from South America. Thus, in general, the covered adaptive SNPs are not clustered. For our random samples, the range of genomic distances between SNPs is similar to those of adaptive SNPs. This shows that our null distributions are adequate for our statistical purposes. The genomic positions of covered adaptive sites in Par_N16 are now included in a new Table in the revised version (Supplementary File 2). We have included these observations in the main text (section Specific adaptation to lowlands in Mesoamerica and South America), as follows: “In general, adaptive SNPs represented in Par_N16 were not clustered. The 20 South American adaptive SNPs are at a median distance of 8,301,843 bp, while the 32 Mesoamerican SNPs are at a median distance of 24,295,968 bp (Supplementary File 2). SNPs in five pairs from MA are closer than 100 bp between them, but each pair is at a considerable distance (beyond 1 cM) from each other and from other SNPs. The same happens for only one SNP pair from SA. Thus, although at low proportions, the adaptive SNPs in Par_N16 are a bona fide representation of different genomic responses to selection pressures...” and “We analyzed some of these random samples and observed a similar behavior as the adaptive SNPs regarding the range of distances between SNPs (Fig. S18).”
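
      As an illustration of the kind of spacing summary described above, the following is a minimal sketch and not the authors' pipeline. It assumes that "distance" means the gap between adjacent SNPs on the same chromosome, which the response does not state explicitly; the function name and the toy coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import median


def adjacent_distances(snps):
    """Return gaps (in bp) between neighbouring SNPs on the same chromosome.

    snps: iterable of (chromosome, position) tuples.
    Assumption: distance is measured between adjacent SNPs per chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    distances = []
    for positions in by_chrom.values():
        positions.sort()
        # Pair each position with its successor and take the difference.
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances


# Hypothetical toy coordinates, not data from the study.
snps = [("1", 100), ("1", 150), ("1", 1_000_150), ("2", 500), ("2", 5_000_500)]

distances = adjacent_distances(snps)
median_distance = median(distances)           # central spacing between SNPs
close_pairs = sum(d < 100 for d in distances)  # pairs closer than 100 bp
```

      Applying the same summary to each random SNP sample would give the range of distances against which the adaptive SNPs can be compared.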

      3) How is genetic similarity calculated? It should be briefly described in the Results.

      This is formally explained in the Material and Methods section, but now we have included a brief description in the Results section (Specific adaptation to lowlands in Mesoamerica and South America) as follows: “The allelic similarity is the average of the frequencies of the Par_N16 alleles in the intersected sites with each test population (see Material and Methods).”
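
      The definition quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (dictionaries keyed by site), the function name, and the toy values are all hypothetical, and an allele absent from the population table is assumed to have frequency 0.

```python
from statistics import mean


def allelic_similarity(sample_alleles, population_freqs):
    """Average frequency, in one test population, of the alleles in the sample.

    sample_alleles: dict mapping site -> allele observed in the ancient sample.
    population_freqs: dict mapping site -> {allele: frequency} in the population.
    Only sites present in both inputs (the intersected sites) contribute.
    """
    shared = [site for site in sample_alleles if site in population_freqs]
    if not shared:
        raise ValueError("no intersected sites between sample and population")
    # Assumption: an allele missing from the population table has frequency 0.
    return mean(
        population_freqs[site].get(sample_alleles[site], 0.0) for site in shared
    )


# Hypothetical toy data, not values from the study.
sample = {"chr1:100": "A", "chr2:200": "G", "chr3:300": "T"}
lowland = {
    "chr1:100": {"A": 0.9, "C": 0.1},
    "chr2:200": {"G": 0.5, "T": 0.5},
}

similarity = allelic_similarity(sample, lowland)
```

      Here only two sites intersect, so the similarity is the mean of the two corresponding allele frequencies; the third site is ignored because the population has no data for it.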

      4) It would help for the authors to state why they focus on Par_N16, I did not see this in my reading. Presumably, the analyses done are because of the higher quality data, but it would also help to mention why Par_N16 was sequenced in an additional run.

      Indeed, Par_N16 has an endogenous DNA content of 1.1%, while the other two samples presented a very low DNA content (0.2%). Therefore, we decided to invest more in the best sample, as a cost/benefit decision for additional sequencing. We have included brief explanations of this in the revised text. In the Results section Paleogenomic characterization of ancient maize samples, it reads as follows: “Due to its higher endogenous DNA content (one order of magnitude larger), we further sequenced the Par_N16 library, obtaining 459M additional reads, to generate a total of 851M for this sample (Table 2).” and “To determine if the specific elimination of C to T and G to A modifications could bias the results in favor of maize rather than teosinte alleles, an additional database was generated in which all transitions were eliminated (i.e., only transversions were included) in Par_N16 only, because it was the only sample with enough sequencing data to conduct this experiment.” In the section Tests of gene flow from mexicana, it reads as follows: “Par_N16 was the only sample with enough DNA sequence data to perform this analysis. All the samples showed the same phylogenetic position; therefore, Par_N16 was considered to be representative of ancient Paredones maize.”

      5) In the sections on phylogenetic analysis, introgression, and D statistics, the authors could do a better job specifically indicating how the results support their conclusions.

      Precise indications of how our results support our conclusions are given in the Discussion section. Nevertheless, we added relevant sentences in the specified sections. In the section Relationship between ancient maize, extant landraces, and Balsas teosinte, we added the following: “Thus, based on genome-wide relatedness, Paredones maize clusters with extant domesticated Andean landraces, supporting both, a single origin for maize and that these Peruvian samples were already domesticated.” In the section on introgression and D-statistics (Tests of gene flow from mexicana), we improved the last sentence as follows: “These results consistently show the absence of significant gene flow between Par_N16 and mexicana, implying that the lineage that gave rise to Paredones maize left Mesoamerica without relevant introgressions from this teosinte.”

      Reviewer #2 (Public Review):

      In this foundational article, the authors conduct an ancient DNA characterization of maize unearthed in archaeological contexts from Paredones and Huaca Prieta in the Chicama river valley of Peru. These maize specimens were recovered by painstakingly controlled excavation. Their context would appear to be beyond reproach though the individual radiocarbon determinations should be subject to further scrutiny.

      1) Radiocarbon determination for at least one of the maize cobs analyzed for aDNA is not a direct date, but dates associated material. The authors should provide a table of the direct dates on the specimens that were analyzed for ancient DNA. They should also specify the type and quantity of material sent and whether the cob, glumes, pith, or husks were submitted for dates. Include δ13C determinations for each cob with laboratory analysis numbers because there is justifiable concern that at least one of these cob dates has a δ13C value suggesting the material dated is not maize. Generally, the δ13C for maize ranges from -14 to -7. One or more of the specimens subjected to ancient DNA analysis in this paper have δ13C values far outside of this confidence interval.

      The indirect radiocarbon date on a maize cob was derived from a single piece of wood charcoal in a hearth directly associated with the analyzed cob, both embedded in a thin intact floor in Unit 20 at the Paredones site. The assay on the charcoal and the floor are in an undisturbed stratigraphic context and are in agreement with assays on other maize and charcoal remains in floors both above and below the hearth. We have included this information in Table 1 in the revised version. The information sought by Reviewer 2 on the studied cobs was published previously in Grobman et al. 2012 and in Dillehay 2017. Since details of the cobs were published, we decided to submit only what we thought were pertinent data for this manuscript.

      As for the δ13C reading of one cob outside of the confidence interval for maize, the dated specimen with this value is a maize husk fragment. Both the macro- and micro-morphology and the ancient DNA analysis of the husk demonstrated it was maize. We do not understand what affected the δ13C value for this specimen. Similarly, three human skeletons from deeper site levels have δ13C values greater than the expected range for human remains.

      2) From the perspective of future scientists being able to repeat the analyses performed here, I would hope that all details of specimen treatment, extraction methods, read length and quality would need to be assiduously described. Routine analytical results should be reported so that comparisons with earlier and future results are facilitated, and not made difficult to decipher or search for.

      The general procedures for accurate ancient DNA extraction were described in Vallebueno-Estrada et al. 2016 and we do not see the need to repeat this information in this article. Specific aspects of sample treatment and DNA extraction of the samples analyzed here are described in the Material and Methods, section on Extraction and sequencing of ancient samples. Results on quality (percentage of endogenous DNA, quality-filtered reads, mapped reads to either repetitive or unique regions, amount of sequence mapped, mapping Phred scores, estimated error rates, percentage of deamination, fragment median lengths, percentage of sites with signatures of molecular damage, number of unique genomic sites covered and their corresponding average sequencing depth) are described in the Results, section Paleogenomic characterization of ancient maize samples. This section also includes the number of SNPs in relation to the reference and the number of intersected SNPs between our samples and the HapMap3 database. In addition, complementary information to this section is included in Tables 2-4 and supplementary Figures S2-S6, as properly referenced in the last mentioned section.

      3) The aDNA analysis may or may not be affected by the anomalous δ13C values but one would anticipate that standard aDNA extraction and analysis protocols would provide a means by which the preservation of the specimens could be ascertained. For example, perhaps deamination and fragmentation rates could be compared, or average read length evaluated against modern-contemporary materials, so that preservation of the Paredones samples relative to that of maize in the CIMMYT germplasm bank and the San Marcos specimens investigated by the same researchers can be evaluated.

      Average read length from contemporary material depends more on the sequencing platform than sample preservation. For example, Illumina can only read fragments of hundreds of base pairs, while MinION or PacBio can read fragments on the order of kilobases. Also, deamination is not an issue in DNA extracted from modern material (unless bisulfite is used for methylation detection). Comparison with San Marcos samples indicates that Paredones samples are heavily degraded, although this is not a function of time only (humidity, temperature, and pH are among other relevant factors). Therefore, to avoid misleading interpretations, we are not including a comparison with San Marcos samples in the revised version.

      4) The size and shape of the cobs depicted are similar to specimens occurring much later in Mesoamerican assemblages. For example, the approximate rachis diameter of the San Marcos specimens depicted by Valle-Bueno et al. (2016: Fig.1) averages less than 0.5 cm while the specimens depicted in Valle-Bueno et al. (this manuscript) average 1.0 cm. The former - San Marcos - specimens are dated at 5300-4970 BP cal while the larger - Paredones - specimens date roughly 6777 - 5324 BP cal. The considerable disparity among the smaller more recent specimens compared to the very much larger putatively older specimens suggests the Paredones specimens' radiocarbon determinations are equivocal. The authors point this out but repeatedly state these cobs are the most ancient, a conundrum that should be resolved.

      Radiocarbon determinations in Paredones are not equivocal; on the contrary, they are perfectly in agreement with and supported by the unimpeachable stratigraphy of the site and by more than 150 other radiocarbon and OSL dates from Paredones and nearby excavated contexts. The difference in morphology between the more recent samples from Tehuacan and the more ancient samples from Paredones is exactly the paradox we try to address. Our results indicate that the rapid migration and adaptation of maize to the coast of Peru in comparison with a slower migration and adaptation to Tehuacan lands explains this apparent conundrum. This rapid movement and migration allowed the presence of more “modern” maize in Peru than in Tehuacan on the respective dates. This more rapid maize development also coincides with more rapid and advanced socio-cultural transformations in Peru, including proto-urbanism (i.e., first cities), early religious symbolism, long-distance irrigation canals, and other major innovations that far exceed what was happening in Mesoamerica at the time.

      5) I would suggest the authors consider redating these three specimens and if they do, hope that they will prepare the laboratory personnel with depositional environment information. MacNeish was skeptical about late dates on maize at Tehuacan, at first. Adovasio was initially certain about maize's associated dates from Meadowcroft. One would prefer to be reasonably certain the foundation this article creates is solid; the authors' repeated reference to these cobs as the most ancient in the Americas should be reaffirmed so retraction will not be necessary.

      As discussed in Grobman et al. 2012 and in Dillehay 2017, we do not trust C14 dating of unburned corn remains due to the possible intrusion of fungi in the soft cellular structure of cobs. The chrono-stratigraphically acceptable dates on cobs and other maize remains were taken on burned and hard tissue remains, such as husks. See detailed discussion in Supplementary Materials.

      MacNeish and Adovasio were excavating cave and rock shelter sites, which are known to often have areas of stratigraphically disturbed deposits. Paredones, Huaca Prieta, SR-18 and other Preceramic sites excavated in the study area here contain late to early varieties of maize and radiocarbon assays that are in chrono-stratigraphic agreement. As noted in the main text and in prior publications, these sites are open air localities with clear stratigraphy defined by intact floor and fill sequences, with no tree root, animal burrowing, or other major taphonomic disturbances. There were occasional hearths and pits (i.e., human burials) that intruded into deeper floor-fill sequences but none of the assayed and studied maize samples were derived from these contexts. Once again, we encourage readers to examine the stratigraphy shown in the main text and in Grobman et al. (2012) and Dillehay (2017). Moreover, as noted in the text, there is a growing number of Preceramic sites in South America that date between 6800 and 6000 years ago and later that contain micro-maize remains (see Kistler et al., 2018). Not all of these sites are well-dated and present reliable contexts, but several have good chrono-stratigraphic settings and micro-evidence (e.g., phytoliths, starch grains) indicative of a maize presence at or prior to 6000 years ago.

    1. Author Response

      Reviewer #3 (Public Review):

      The only substantial point I raise relates to the sexual selection (mate choice) part of the work. While it has no major effect on the overall conclusion, I think their interpretation needs to be reconsidered.

      When reporting the results of mate choice experiment (L219ff), the authors state that males of wild and Klara type preferred wild-type females, because 75% of laid eggs belonged to wild-type females. However, another possibility is that Klara females had reduced fecundity, and the lower share of eggs had nothing to do with mate choice. In the same way, "90% of eggs were fertilized by wild-type males" (L223) is used to conclude that they were preferred by females (active mate choice). However, male success in N. furzeri is largely driven by male dominance (and not female mate choice) and it is more likely (and more precise to state) that wild-type males were more successful in male-male competition for access to females (and fertilize their eggs). This is especially so because wild-type males were larger (L. 322) and body size plays a major role in establishing dominance between N. furzeri males. This is then also pertaining to interpretation in discussion (L 318).

      Concerning fecundity, we analyzed quantity and quality of eggs obtained from either klara or wild type breeding groups. As shown in Figure 3A we did not observe differences between klara and wild type fish. Thus, we conclude that fecundity is not reduced in klara females. Regarding males, we did not observe a size difference between the klara and wild type animals in this experiment (Fig. 3C), however, weight was different. As noted by the reviewer, this might influence male dominance and breeding success. We have been more explicit on this in the discussion of the revised version.

  3. Jan 2023
    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents the results of two fragment screens of PTP1B using room-temperature (RT) crystallography, and compares these results with a previously published fragment screen of PTP1B using cryo-temperature crystallography. The RT screen identified fewer fragment hits and lower occupancy compared to the cryo screen, consistent with prior publications on other proteins. The authors attempted to identify additional hits by applying two additional layers of data processing, which resulted in a doubling in the number of possible hits in one of the screens. Because I am not an expert in panDDA modeling, however, I am unable to evaluate the reproducibility and potential potency of these fragment hits as protein binders or their potential use as starting points for follow-up chemistry.

      The fragment library used in this study was larger than those used in previously published RT crystallography experiments. Among the cryo hits that bound in RT, most fragments bound in the same manner as they did in cryo, while some bound in altered orientations or conformations, and two bound at different locations in RT compared to cryo. This level of variability is not surprising. However, one fragment was observed to bind covalently to lysines in RT, even though it showed no density in the cryo crystallization attempt. It is unclear from the provided information whether this fragment decayed during storage or if the higher temperatures accelerated the covalent chemistry. The authors also observed temperature-dependent changes in the solvation shell, and modifications to the protein structure upon fragment binding, including a distal modification.

      We thank the reviewer for the thorough summary of our manuscript.

      Regarding reproducibility of fragment hits, cryo structures are more variable than RT structures for proteins themselves (Keedy et al., Structure, 2014). Thus the variability of repeated cryo-temperature crystallography experiments is a relevant consideration when comparing cryo to RT structures for protein-ligand interactions. However, to our knowledge, no published papers have explored this issue. Our previous cryo fragment screen (Keedy, Hill, et al., eLife, 2018), as with many others, was focused on breadth (many fragments), not depth (replicates). Unpublished work by some of the authors of the present study suggests that fragment poses are robust in replicate cryo experiments; however, future studies focused on fragment reproducibility in terms of binding occupancy, pose, and site at cryo temperature would be useful contributions to the field.

      Regarding follow-up chemistry, there is growing evidence from multiple successful fragment-based inhibitor design studies (COVID Moonshot Consortium et al., bioRxiv, 2022; Gahbauer, Correy, et al., PNAS, 2023; etc.) that, although fragments usually bind too weakly to impact function on their own, they offer rich information to seed the design of high-affinity, potent functional modulators of proteins. As our study is the first to report many structures of fragments bound to proteins at RT, we cannot yet comment as to whether they offer unique advantages over cryo fragments in downstream fragment-based drug design efforts, but this is an open area for future study.

      Regarding the covalent lysine binder, we agree with the reviewer on this point; our manuscript includes a note to this effect. Unfortunately we were unable to obtain the original fragment sample for mass spectrometry analysis. Returning to the point above about follow-up chemistry, the path forward for this fragment hit is promising and clear, and includes confirming chemistry using the original nominal compound vs. what is observed in the electron density, fragment linking and/or expansion, functional assays, and structural biology, all hopefully leading to a potent covalent inhibitor of wildtype PTP1B.

      The current version of the paper is somewhat repetitive in its presentation of the results and could be clearer in its presentation of the variations and comparisons of the two different protocols. It would be helpful to have a more concise summary of the differences between the two protocols in the current paper, as well as a discussion of how they compare to the protocol used in the previously published cryo-temperature fragment screen.

      We agree that it would be helpful to cut down on any redundant text and more straightforwardly compare/contrast the different room-temperature screen methods vs. the previous cryo-temperature screen method. To address this suggestion, we deleted the Discussion paragraph about the strengths and weaknesses of the two methods relative to serial approaches, deleted the text in the Introduction that introduces the two screens, and placed new text at the start of the Results section in the subsection titled “Two crystallographic fragment screens at room temperature” to provide a concise summary in one location of the manuscript.

      While I appreciate the speculative nature of the discussion at the end of the paper, the evidence presented by the authors does not instil confidence that these results will correspond to meaningful binders that could be used to train future machine learning models. However, depending on the intended use, it may be acceptable to train ML models to predict expected densities under typical experimental conditions.

      Indeed, this part of the Discussion is speculative, and seeks to place our results into a possible broader context. The definition of “meaningful binders” in the context of fragment screening is a difficult one. As noted above in response to the comment about follow-up chemistry, one important measure of meaningfulness is the ability to successfully seed structure-based design of analogs that have potent functional effects, and many fragments do meet this definition. Regarding potential applications to machine learning, we agree it is not self-evident that structural data for small-molecule fragments will be readily translatable to AI/ML methods aimed at larger compounds. The reviewer’s point about predicting densities is an intriguing one, and is in line with the fragment screening ethos, including existing experimental as well as computational (e.g. Greisman, Willmore, Yeh*, et al., bioRxiv, 2022) approaches to mapping ligandable surface sites and regions. The number of RT structures we report here is high relative to most crystallography studies, but still is likely insufficient to explore questions about AI/ML training, and at any rate would be beyond the scope of the current report. However, it seems equally true that AI/ML methods trained on structures based on data from nonphysiological cryogenic conditions, with associated structural artifacts, may have some (previously unrecognized) limitations, and thus RT crystal structures can play a useful role in AI/ML training sets in the future. We have added new text to the Discussion paragraph in question to convey these points.

      Reviewer #2 (Public Review):

      The authors set out to understand how a room-temperature X-ray crystallography-based chemical-fragment screen against a drug target may differ from a cryo screen. They carried out two room-temperature screens and compared the results with those of a cryo screen they previously performed. With a substantial set of crystallographic evidence they showed that the modes of protein-fragment binding are affected by temperature. The conclusion of the work is compelling. It suggests that temperature provides another dimension in X-ray crystallography-based fragment screening. In a practical sense, it suggests that room-temperature fragment screening is a promising new avenue for hit identification in drug discovery and for obtaining insights into fragment binding. Room-temperature screening carries unique advantages over cryo screening. This work supports the notion, which seems not yet universally considered, that very weak protein-small molecule binding may be inherently fluid structurally, and that crystal structures of such weak binding, especially cryo structures, cannot be taken for granted without cross validation.

      We thank the reviewer for their clear summary and positive comments about our manuscript.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, the authors developed a mouse model to specifically investigate whether GC B cells that present nuclear protein (NucPr) could be specifically suppressed by Tfr cells. Most current mouse models that have been used in investigating Tfr functions are based on the overall readout of autoantibody production in the scenario of loss-of-function of Tfr cells. The proposed model of gain-of-function of Tfr cells is novel and valuable.

      The authors mainly compared two boosting immunizations, with streptavidin (SA) alone or SA conjugated with nuclear proteins (SA-NucPr), and demonstrated that SA-NucPr boosting immunization was able to expand Tfr cells and suppress overall and SA-specific GC/memory/plasma cell responses. The results are mostly convincing.

      One major concern is the conditions and controls used in the study. The control group (SA boosting immunization) would have enhanced T and B cell responses as a result of this boosting. Unfortunately, there was no non-boosting control group, so the baseline level is unclear. It is therefore important to strictly match the boosting conditions in the SA-NucPr group. Notably, both SA and SA-NucPr were used at 10 µg for boosting immunization. Considering that the NucPrs were comparable to or much larger (nucleosome, about 200 kDa) than SA (about 60 kDa), the dose of SA in the SA-NucPr group was far less than that in the SA group. Due to this caveat, it is difficult to judge whether the difference between the two groups was due to less SA boosting immunization or to NucPr-induced Tfr function. This is a fundamental issue that weakens the conclusion.

      The single cell analyses clearly demonstrated the expansion of Tfr clones. It remains unclear why Treg populations other than Tfr cells were not expanded. The Treg cells in the CXCR5intPD-1int population were recently activated and should be able to respond to the boosting immunization. As an alternative explanation, the changes in Tfr cells could be indirectly driven by the changes in Tfh cells. For example, Tfh cells can produce IL-21 and restrict Tfr expansion (Jandl C, et al., 2017). This could explain the reduction in Tfr cells in the SA-OVA group as compared to the SA group.

      Like the reviewer, we were surprised not to detect a significant increase in the levels of CXCR5intPD-1int Tregs in the original experiment after boosting with SA-NucPrs (Fig. 1). Our interpretation of this result was that the fraction of NucPr-specific CXCR5intPD-1int Tregs was small compared to the total CXCR5intPD-1int Tregs, so proliferation of this small fraction of cells would not be detectable by flow cytometry analysis of total CXCR5intPD-1int Treg numbers. Alternatively, the observed rapid accumulation of Tfrs was due to proliferation of NucPr-specific Tfrs, which may be abundant after a standard immunization with foreign antigen.

      In the single-cell analysis we used only presorted CXCR5highPD1high follicular T cells, so the majority of the CXCR5intPD-1int Treg population was excluded from the analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors optimize a live cell imaging method based on the detection of FAD/NAD(P)H adopted from the fast-growing field of live metabolic imaging. They build upon a method described by KreiB et al 2020 that used metabolic ratio and collagen fiber second harmonic generation imaging. They follow by combining metabolic imaging with morphologic measurements to train a machine-learning model that is able to identify cell types accurately. Upon visualization, authors detected structures hypothesized and then proven to resemble the "goblet cell associated antigen passages" previously studied in intestinal epithelia.

      STRENGTHS

      • The manuscript is succinct, well written, and overall done rigorously.

      • The optimization of the method at multiple levels to the point of identifying both common and rare cell types is impressive.

      • Describes the elegant implementation of a sorely needed method in epithelial biology.

      • Provides an approach to studying the cholinergic response in epithelial cells, a poorly understood phenomenon despite broad clinical use for diagnosis and treatment.

      WEAKNESSES

      A) For what is in large part a methods-development paper, the methods are not explained or shared in a manner that facilitates reproducibility. For example:

      A.1.) The training and validation datasets seem to come from the same sample (or the source is not clearly described). Therefore, it is not clear whether the "96% accuracy" refers to accuracy within the sample measured, or whether it can extrapolate to other samples.

      In order to avoid any confusion, we further clarify that the machine learning training and validation data sets come from the same sample. We had split the total data set into 2 separate subsets for this purpose. This has been laid out in the text as follows:

      “In order to assess the performance of machine learning algorithms designed to distinguish cell types, we divided our data set into training and testing subsets. We utilized 75% of the total cells (154 cells) for machine learning training, leaving 25% (52 cells) for subsequent validation.”
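
      The split described in the quoted passage can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `split_cells`, the cell identifiers, and the random seed are hypothetical, and the only figures taken from the text are the 75%/25% proportions and the resulting 154 and 52 cells.

```python
import random


def split_cells(cell_ids, train_fraction=0.75, seed=0):
    """Shuffle cell identifiers and split them into training and validation subsets."""
    shuffled = list(cell_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_fraction)  # rounds down: 206 cells -> 154
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for the 206 cells (154 + 52) mentioned in the text.
cell_ids = [f"cell_{i:03d}" for i in range(206)]
train_ids, test_ids = split_cells(cell_ids)
print(len(train_ids), len(test_ids))
```

      Because both subsets are drawn from the same sample, accuracy measured on the held-out cells reflects within-sample performance only, which is the distinction the reviewer asks the authors to make explicit.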

      A.2.) It is unclear whether the model needs to be re-trained within each new sample measured, or if it's applicable to others. This has implications for method adoption by others. Either way is useful but needs to be clarified.

      This is a very interesting point and one that we further clarify in the Discussion noting that in both disease and non-diseased states the model needs to be re-trained in each particular experimental regime.

      A.3.) Code was only listed in a PDF file, which makes reproducing the analysis very cumbersome.

      We hope that all can utilize the code made for this methodology and have uploaded it to a publicly available GitHub account:

      https://github.com/vss11/Label-free-autofluorescence

      B) Whereas the optimization to improve cell type detection is very well described, the implementability of the approach could benefit from exploration (using the data already obtained) of the minimal set of measurements needed to identify cell types. For example, is the FAD/NAD(P)H ratio necessary? Or could just morphologic measurements achieve the same goal?

      This is an excellent point, and we appreciate the Reviewer’s suggestion for this analysis. We have added Figure 3 Supplement 5 where we perform modeling without autofluorescence data. This analysis reveals a dramatic reduction in accuracy with a Matthews correlation coefficient ranging from 0.66 to 0.78. This provides additional justification for the use of autofluorescence for cell type identification. Morphologic measurements alone are not sufficient for cell type identification.
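
      For readers unfamiliar with the metric cited above, the following is a minimal sketch of how a multi-class Matthews correlation coefficient can be computed from true and predicted labels. It uses the standard multi-class generalisation of the coefficient; the function name and the cell-type labels in the example are hypothetical and are not taken from the authors' analysis code.

```python
from collections import Counter
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from two label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    # By convention the coefficient is set to 0 when the denominator vanishes.
    return cov_tp / denom if denom else 0.0


# Hypothetical labels for illustration only.
truth = ["ciliated", "ciliated", "secretory", "basal", "secretory", "basal"]
guess = ["ciliated", "secretory", "secretory", "basal", "secretory", "ciliated"]
print(round(matthews_corrcoef(truth, guess), 3))
```

      A value of 1 indicates perfect agreement and 0 indicates performance no better than chance, so the reported drop to 0.66 to 0.78 without autofluorescence features is a substantial loss relative to a near-perfect classifier.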

      We have also determined the relative contribution of each characteristic to cell type identification by the XGBoost algorithm in Figure 3 Supplement 4, which shows that autofluorescence signatures are amongst the top contributing characteristics to cell type identification by machine learning.

      C) Whereas the conclusions are overall supported by the data, they need small adjustments in some cases:

      C.1.) For example, P3L80: Claims autofluorescence imaging is more specific than "functional markers", however, this is done in the setting of a very specific intervention that massively affects a protein often used as a secretory cell marker (CCSP aka SCGB1A1), which is known to be secreted (and depleted) in secretory cells upon stimulation.

      We agree with the Reviewer that secretory cell identification is a prime example where autofluorescence imaging may be superior to conventional staining, specifically due to the point the Reviewer makes regarding CCSP secretion. We discuss this concept in the Discussion while giving examples of CCSP staining being reduced in asthma, COPD, and smokers. It could be that these cells are missed due to depletion of CCSP. Indeed, we clarify that our methodological approach may be less affected by the loss of the category of specific markers that change with cell state. There are, of course, caveats with utilizing this approach in disease states, and we elaborate on this further below and add this point to the discussion.

      C.2.) Relatedly, it is unclear how the method's accuracy would be affected in conditions that affect redox/metabolic state; the approach may be highly affected in inflammation and injury, for example.

      As suggested by the Reviewer, we re-analyzed the data after Antimycin A + Rotenone and FCCP to determine if autofluorescence ratio is sufficiently different to identify ciliated and secretory cells and included this data in Figure 2 Supplement 1. This is an example where the redox/metabolic state is indeed altered. Though the autofluorescence ratio is affected, it is still useful for cell type identification after intervention as the ciliated and secretory cells have statistically different ratios.

      However, different disease states, particularly infection and inflammation, may result in a more profound effect on autofluorescence signatures. For instance, previous work by Dilipkumar et al., 2019 found changes in autofluorescence over days in repeated measurements in a mouse model of inflammatory bowel disease. Therefore, it is likely that the cell type identification methodology will need to be re-optimized for different experiments and diseased tissues. We include commentary to this effect in the discussion.

      D) The data used to describe "SAPs" is very cursory.

      To further elaborate on our description of SAPs we have included the following:

      1) SAP formation occurs in secretory cells in both stimulated and unstimulated conditions. We performed additional analysis of Figure 4C and determined that SAP formation does occur at baseline prior to stimulation in 9% of secretory cells. Methacholine addition results in 78% of secretory cells forming SAPs (Figure 4 Supplement 1). We have added Figure 5C to demonstrate that SAP formation occurs in the absence of stimulation and is enhanced after methacholine stimulation.

      2) We demonstrate that SAPs can take up both FITC-dextran and FITC-ovalbumin in Figure 5E and Figure 5 Supplement 2. We also now show that immune cells (CD11c antigen presenting cells) associate with SAPs containing FITC-dextran and FITC-ovalbumin in Figure 5E and Figure 5 Supplement 2. We have expanded the Discussion of SAPs.

      3) We now show 3 video examples and an XZ optical cross section of ALI that demonstrate uptake and secretion of FITC-dextran in Figure 5 Supplemental Videos 1-3 and Figure 5 Supplement 1.

      D.1.) Unclear if FITC dextran uptake occurs in other cells too, or in secretory cells prior to methacholine stimulation, or induced nonspecifically due to epithelia manipulation. Secretory and goblet cells are very sensitive to stimulation and often considered minimal, for example, see the paper by Abdullah et al DOI:10.1007/978-1-61779-513-8_16 in which extreme care had to be applied to prevent any secretion at all.

      Our autofluorescence methodology revealed the formation of “voids” of autofluorescence forming in secretory cells and we focused our experiments on this phenomenon. Based on the reviewer question, we generated Figure 5C to better characterize SAP formation. Figure 5C illustrates that SAP formation occurs in both unstimulated and methacholine stimulated conditions, but is dramatically increased following methacholine stimulation. This is analogous to the behavior of GAPs in the intestine (Knoop et al., 2015). Furthermore, we have reanalyzed Figure 4C to identify SAPs prior to stimulation and found that these structures are present in 9% of secretory cells. After methacholine stimulation this percentage increases to 78%.

      D.2.) A single image is provided for the SAP timeline (Figure 5C), which appears to be the same cell shown in the supplementary video.

      We now provide numerous example videos and optical XZ cross section of ALI demonstrating SAP uptake and secretion in Supplementary Videos 1-3 and Figure 5 Supplement 1.

      IMPACT AND UTILITY

      This is well-done work with high potential for widespread adoption within the epithelial biology community, particularly if the methods and code are shared in better detail.

      We indeed hope that this methodology can be utilized by others. We have posted analysis code, raw data, MATLAB algorithm, and other necessary files onto a publicly available GitHub link. https://github.com/vss11/Label-free-autofluorescence

      Reviewer #2 (Public Review):

      Shah and colleagues tackle a significant impediment to exploiting tissue culture systems that enable prospective ex vivo experimentation in real-time. Namely, the ability to identify and track dynamic and coordinated activities of multiple composite cell types in response to experimental perturbations. They develop a clever label-free approach that collects biologically-encoded autofluorescence of epithelial cells by 2-photon imaging of mouse tracheal explant culture over 2 days. They report the ability to distinguish 7 cell types simultaneously, including rare ones, by developing a machine-learning approach using a combination of fluorescence and cytologic features. Their algorithm demonstrates high accuracy by the Matthews correlation coefficient when applied to a test set. Lastly, they show the ability of their approach to visualize the dynamic uptake and expulsion of fluorescently-tagged dextran by individual secretory cells. Overall, the results are intriguing and may be very useful for specific applications.

      We thank the reviewer for their assessment and indeed hope that the methodology is useful and that the discovery of the dynamics of SAP formation has important implications for airway mucosal immunology.

    1. Author Response

      Reviewer #1 (Public Review):

      Animal colour evolution is hard to study because colour variation is extremely complex. Colours can vary from dark to light, in their level of saturation, in their hue, and on top of that different parts of the body can have different colours as well, as can males and females. The consequence of this is that the colour phenotype of a species is highly dimensional, making statistical analyses challenging.

      Herein the authors explore how colour complexity and island versus mainland dwelling affect the rates of colour evolution in a colourful clade of birds: the kingfishers. Island-dwelling has been shown before to lead to less complex colour patterns and darker coloration in birds across the world, and the authors hypothesise that lower plumage complexity should lead to lower evolutionary rates. In this paper, the authors explore a variety of different and novel statistical approaches in detail to establish the mechanism behind these associations.

      There are three main findings: (1) rates of colour evolution are higher for species that have more complex colour phenotypes (e.g. multiple different colour patches), (2) rates of colour evolution are higher on island kingfishers, but (3) this is not because island kingfishers have a higher level of plumage complexity than their mainland counterparts.

      I think that the application of these multivariate methods to the study of colour evolution and the results could pave the way for new studies on colour evolution.

      We appreciate this positive comment about our manuscript.

      I do, however, have a set of suggestions that should hopefully improve the robustness of results and clarity of the paper as detailed below:

      1) The two main hypotheses tested linking plumage complexity and island-dwelling to rates of colour evolution seem rather disjointed in the introduction. This section should integrate these two aspects better justifying why you are testing them in the same paper. In my opinion, the main topic of the paper is colour evolution, not island-mainland comparisons. I would suggest starting with colours and the challenges associated with the study of colour evolution and then introducing other relevant aspects.

      We implemented this suggestion by reorganizing the introduction to introduce color and the challenges of studying it (para 1), followed by plumage complexity (para 2). We follow this with a paragraph about the importance of islands in testing evolutionary hypotheses (para 3), then move on to kingfishers as a model system (para 4) and our hypothesis/predictions (para 5).

      2) Title: the title refers to both complex plumage and island-dwelling, but the potential effects of complexity should apply regardless of being an island or mainland-dwelling species, am I right? Consider dropping the reference to islands in the title.

      We removed “island” from the title.

      3) The results encompass a large variety of statistical results some closely related to the main hypothesis (eg island/mainland differences) tested and others that seem more tangential (differences between body parts, sexes). Moreover, quite a few different approaches are used. I think that it would be good to be a bit more selective and concentrate the paper on the main hypotheses, in particular, because many results are not mentioned or discussed again outside the Results section.

      We removed analyses that we felt were distracting from our main point (e.g., MCMCglmm) and streamlined our approach to use PGLS methods for both rates (phylolm) and multivariate color patterns (d-PGLS). The relevance of sex differences in coloration is also made more clear, as we added details about how we tested for a relationship between male and female coloration and that we use this strong correlation as a justification for averaging color by species (e.g., see lines 369-375).

      4) Related to the previous section, the variety of analytical approaches used is a bit bewildering and for the reader, it is unclear why different options were used in different sections. Again, streamlining would be highly desirable, and given the novel nature of the analytical approach (as far as I know, many analytical approaches are applied for the first time to study colour evolution) it would be good to properly explain them to the reader, highlighting their strengths and weaknesses.

      We appreciate the suggestion and have now included a workflow diagram, as suggested (see Figure 1). We further added considerable detail to the Methods (old length = 502 words, new length = 1355 words) and mention caveats of the approaches we have taken (e.g., line 308: “We used photosensitivity data for the blue tit (Hart et al., 2000) due to the limited availability of sensitivity data for other avian species”).

      5) The Results section contains quite a bit of discussion (and methods) despite there being a separate Discussion section. I suggest either separating them better or joining them completely.

      We appreciate this. We were following other eLife articles that include more discussion within the Results, therefore we would prefer to leave these aspects in place. However, we did move a considerable amount of information from the Results section to the Methods section. In addition, we also reorganized the Results to better match the logical flow of the Introduction. The end result, we hope, is a Results section that is considerably more streamlined.

      6) The main analyses of colour evolutionary rates only include chromatic aspects of colour variation. Why was achromatic variation (i.e. light to dark variation) not included in the analyses? I think that such variation is an important part of the perceived colour (e.g. depending on their lightness the same spectral shape could be perceived as yellow or green, black or grey or white). I realize that this omission is not uncommon and I have done so myself in the past, but I think that in this case, it is highly relevant to include it in the analyses (also because previous work suggests that island birds are darker than their mainland counterparts). This should be possible, as achromatic variation may be estimated using double cone quantum catches (Siddiqi et al., 2004) and the appropriate noise-to-signal ratios (Olsson et al., 2018). Adding one extra dimension per plumage patch should not pose substantial computational difficulties, I think.

      We incorporated this suggestion and we have now fully integrated achromatic color variation into all of our analyses. These new analyses let us compare results to previous work showing that island birds are darker than mainland counterparts. We further discuss the caveats of chromatic and achromatic channels (e.g., lines 313-317: “Although it is possible, in theory, to combine chromatic and achromatic channels of color variation in a single analysis (Pike, 2012), we opted to analyze them separately, as these different channels are likely under different selection pressures (Osorio and Vorobyev, 2005).”).

      7) The methods need to be much better explained. Currently, some methods are explained in the main text and some in the methods section. All methods should be explained in detail in the methods section and I suggest that it would be better to use a more traditional manuscript structure with Methods before Results (IMRaD), to avoid repetition (provided this is allowed by the journal). Whenever relevant the authors need to explain the choice of alternative approaches. Many functions used have different arguments that affect the outcome of the analyses, these need to be properly explained and justified. In general, most readers will not check the R script, and the methods should be understandable to readers that are not familiar with R. This is particularly important because I think that the methodological approach used will be one of the main attractions of the manuscript, and other researchers should be able to implement it on their own data with ease. Judging from the R script, there are quite a few analyses that were not reported in the manuscript (e.g. multivariate evolutionary rates being higher in forest species). This should be fixed/clarified.

      We clarified several methodological details in the manuscript (e.g., added package versions throughout, mention the permutation option used for compare.evol.rates, cited RPANDA) and modified the Methods section considerably to make logical connections among the sections. We also checked and cleaned up the R markdown file to ensure the analyses were in sync with the manuscript analyses.

      Reviewer #2 (Public Review):

      In "Complex plumages spur rapid color diversification in island kingfishers (Aves: Alcedinidae)", Eliason et al. link intraspecific plumage complexity with interspecific rates of plumage evolution. They demonstrate a correlation here and link this with the distinction between island and mainland taxa to create a compelling manuscript of general interest on drivers of phenotypic divergence and convergence in different settings.

      This will be a fantastic contribution to the literature on the evolution of plumage color and pattern and to our understanding of phenotypic divergence between mainland and island taxa. A few key revisions can help it get there. This paper needs to get, fairly quickly, up to a point where the difference between plumage complexity and color divergence is defined carefully. That should include hammering home that one is an intraspecific measure, while one is an interspecific measure. It took me three reads of the paper to be able to say this with confidence. Leading with that point will greatly improve the paper; if that point gets forgotten, the premise of the paper feels very circular.

      We hope our considerable modifications throughout–including explicitly mentioning that complexity is an intraspecific measure whereas rates are interspecific (e.g., see lines 65, 140, 170, 667)–have made the premise of the paper more clear. We also added a new workflow figure (Figure 1) that includes example species pairs showing cases in which intraspecific plumage complexity and interspecific color divergence could show a negative relationship, rather than a positive one as we predict in the manuscript. We discuss this detail in lines 159-161 (“However, this is not necessarily the case, as there are examples within kingfishers that show simple plumages yet high color divergence, as well as complex plumages with little evolutionary divergence (Figure 1B).”).

      Also importantly, somewhere early on a hypothesized causal pathway by which insularity, plumage complexity, and color divergence interact needs to be laid out. The analyses that currently follow are good ones, and not wrong, but it's challenging to assess whether they are the right ones to run because I'm not following the authors' reasoning very well here. I think it's possible a more holistic analysis could be done here, but I'll refrain from any such suggestions until I better get what the authors are trying to link.

      We overhauled the Introduction. This included adding lines that connect the ideas of complexity and insularity (lines 65-58: “intraspecific plumage complexity (i.e., the degree of variably colored patches across a bird's body) could be a key innovation that drives rates of color evolution in birds and should be considered alongside ecological and geographic hypotheses.”) and insularity and color divergence (lines 69-85). We also rethought the analyses and now include PGLS analyses using tip-based rates that allow us to account for both insularity and complexity in the same analysis.

      We also need something near the top that tells us a bit more about the biogeography of kingfishers. Are kingfisher species always allopatric? I know the answer is no, but not all readers will. What I know less well though is whether your insular species are usually allopatric. I suspect the answer is yes, but I don't actually know.

      Great point. We have added details to the manuscript to clarify this (e.g., line 214: “The number of sympatric lineages ranged from 1–9 on islands, and 6–38 for mainland taxa.”).

      In short, how do the authors think allopatry/sympatry/opportunity for competition link to mainland vs. island link to plumage complexity? And rates of color evolution? Make this clear upfront.

      We believe our revised introduction makes these connections much clearer.

    1. Author Response

      Reviewer #1 (Public Review):

      Causality is important and desired but usually difficult to establish. In this work, Park et al. conducted a comprehensive phenome-wide, two-sample Mendelian randomization analysis to infer the causal effects of plasma triglyceride (TG) levels on 2,600 disease traits. They identified causal associations between plasma TG levels and 19 disease traits, related to both atherosclerotic cardiovascular diseases (ASCVD) and non-ASCVD diseases. They used biobank-scale data in both discovery analysis and replication analysis.

      The conclusions of this work are mostly supported by the data and analysis, but some aspects need to be clarified and extended.

      (1) The datasets used in this study may not be very consistent. For example, UKB participants are aged 40-69 years old at recruitment. In addition, UKB is United Kingdom-based and FinnGen is Finland-based. So the definition of outcomes may not be identical. The authors should discuss the differences between the datasets and their potential effects.

      The reviewer is correct about the differences between UKB and FinnGen and that the definition of clinical outcomes between the two datasets may not be identical due to differences in healthcare systems and population demographics. We now mention this in the discussion section as a potential limitation.

      Manuscript changes:

      Line 520-539: “Third, UKB and FinnGen have innate differences in participant demographics and medical coding systems, due in part to the former being based in the United Kingdom and the latter in Finland. As such, potential misclassification of participants in case-control assignment is a liability to this study. We exercised caution in mapping UKB traits to FinnGen traits, but we were unable to reliably map all “categorical” traits from UKB to corresponding traits in FinnGen, testing for replication only 221 of the 598 associations that were nominally significant in the primary analysis. We note however that, despite geographical differences, both datasets largely involve White European participants of older age, with the mean age in UKB and FinnGen being 56.5 and 59.8, respectively.”

      (2) The discovery analysis and replication analysis are not completely independent because data from UKB have been used in both analyses. Although in discovery, the data were used for association with outcomes; while in replication, the data were used for association with exposure. The authors may want to explain if this may cause problems.

      The reviewer is correct that UKB data were used in both the discovery and replication analyses with the caveat that the discovery analysis used UKB for outcomes while using GLGC for exposures, whereas the replication analysis used UKB for exposures while using FinnGen for outcomes. We believed this would be a creative use of three different datasets and a strength of the study; however, we agree that examining the implications of this study design is needed to acknowledge potential biases. We now expand on this in the discussion section as a potential limitation.

      Manuscript changes:

      Lines 539-545: “Fourth, discovery and replication analyses were not completely independent, since UKB data were used in both analyses. This could potentially exacerbate demographic and measurement biases inherent to UKB; however, we show that taking a traditional replication approach using GLGC instead of UKB for selecting exposure instruments in replication returns comparable Tier 1 results (Supplementary Files 5), while losing statistical power to highlight many of the Tier 2 and 3 results.”

      (3) As stated in the manuscript, there are three assumptions for MR analysis. The validity of the results depends on the validity of the assumptions. The last two assumptions are usually difficult to validate. To the authors' credit, they conducted sensitivity analyses addressing horizontal pleiotropy, which is related to assumption 3. It would be helpful if the authors can discuss those assumptions explicitly.

      We now explicitly state the assumptions of Mendelian randomization in the introduction section and discuss the validity of these assumptions in the discussion section.

      Manuscript changes:

      Lines 501-514: “The study has several limitations. First, MR is a powerful but potentially fallible method that relies on several key assumptions, namely that genetic instruments are (i) associated with the exposure (the relevance assumption); (ii) have no common cause with the outcome (the independence assumption); and (iii) have effects on the outcome solely through the exposure (the exclusion restriction assumption) (Hartwig et al., 2016). In MR, (i) is relatively straightforward to test, while (ii) and (iii) are difficult to establish unequivocally. As a prominent example, horizontal or type I pleiotropy has been shown to be common in genetic variation, which can bias MR estimates (Verbanck et al., 2018) (Jordan et al., 2019). This occurs when a genetic instrument is associated with multiple traits other than the outcome of interest. To detect and correct for this as best as possible, we used various MR tests as sensitivity analyses that each aim to adjust for or account for the presence of horizontal pleiotropy, including MR-PRESSO, as well as MR-Egger and weighted median methods. There is no universally accepted method that is perfectly robust to horizontal pleiotropy, but we take the best current approach by using multiple methods and examining the consistency of results.”
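
      As an illustration of how one of the sensitivity analyses named above probes horizontal pleiotropy, the following is a minimal sketch of an MR-Egger fit from summary statistics. It is not the authors' pipeline: the function name `mr_egger` and the toy numbers are hypothetical, and the sketch assumes per-variant effect estimates on the exposure and outcome plus outcome standard errors. An intercept far from zero is conventionally read as a sign of directional pleiotropy.

```python
import numpy as np

def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Hypothetical helper: MR-Egger intercept and slope from summary statistics.

    Fits a weighted regression of outcome effects on exposure effects
    WITH an intercept, using inverse-variance weights 1 / se_outcome**2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)

    # Orient every variant so that its effect on the exposure is positive.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign

    # Weighted least squares: solve (X' W X) b = X' W y.
    weights = 1.0 / se**2
    design = np.column_stack([np.ones_like(bx), bx])
    xtw = design.T * weights
    intercept, slope = np.linalg.solve(xtw @ design, xtw @ by)
    return intercept, slope

# Toy data (made up): a causal slope of 0.5 plus a constant pleiotropic shift of 0.02.
bx = [0.1, 0.2, -0.3, 0.4]
by = [0.07, 0.12, -0.17, 0.22]
se = [0.01, 0.02, 0.015, 0.01]
intercept, slope = mr_egger(bx, by, se)
```

      In this toy case the non-zero intercept captures the constant shift, while the slope recovers the causal effect. Real analyses would also report standard errors and compare the result with other estimators, as the quoted passage describes.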

      Reviewer #2 (Public Review):

      This work conducted a Mendelian randomization analysis between TG and a large number of disease traits in biobanks. They leverage the publicly available summary statistics from the European samples from the UK Biobank and FinnGen. A solid but routine standard summary-statistics based MR study is conducted. Several significant causal associations from TG to phenotypes are called by setting a p-value cutoff with some Bonferroni correction. Sensitivity statistical analyses are conducted which generate largely consistent results. The research problem is important and relevant for public health as well as drug development. Overall this is a solid execution of current methods over appropriate data sources and yields a convincing result. The interpretation of the results in discussion is also well-balanced.

      While the paper does have strengths in principle, a few technical weaknesses are observed.

      They used UK Biobank as the discovery and FinnGen as the replication. But the two cohorts are rather used symmetrically. Especially for the Tier 3 (NB), it seems to be an attempt of reusing the replication cohort as the discovery. I wonder if that would create additional multiple testing burden as a greater number of hypotheses are considered.

      We thank the reviewer for this thought-provoking comment. As the reviewer is aware, MR studies have generally not accounted for multiple testing in the past since they have usually focused on single exposures and/or single diseases. Ours is among the few MR studies to take a phenome-wide, high-throughput approach, so determining the optimal threshold for balancing true-positive vs. false-positive discovery is an important aspect of the study warranting discussion.

      We agree that Tier 3 results carry the least stringent level of statistical evidence (i.e., nominally significant in discovery using UK Biobank and Bonferroni-significant in replication using FinnGen), and that these results should be interpreted with caution. As a phenome-wide study, a significant aim of this work was to generate hypotheses, and so, we decided to present our results using the three tiers of statistical evidence to highlight as many promising associations as possible for further investigation. Nevertheless, we now express extra caution in the results and discussion sections regarding Tier 2 and 3 results, and we also note as a limitation that these results especially require external replication.

      Manuscript changes:

      Lines 438-444: “Regarding non-ASCVDs, we present suggestive genetic evidence of potentially causal associations between plasma TG levels and uterine leiomyomas (uterine fibroids), diverticular disease of intestine, paroxysmal tachycardia, hemorrhage from respiratory passages (hemoptysis), and calculus of kidney and ureter (kidney stones). Due to the weaker statistical evidence supporting these associations, special caution is encouraged when interpreting these results to infer causality, and further replication and validation studies are essential for all Tier 2 and Tier 3 results.”

      The replication p-value cutoff is a bit statistically lenient. In a typical discovery-replication setting the two stages are conducted sequentially and replication should go through the Bonferroni adjustment on the number of significant signals from discovery that is tested in the replication. For example, in this case, in tier 2, the cutoff should be 0.05/39. This may make the association of leiomyoma of the uterus slightly non-significant though. Similar cutoff should be applied to tier 3 as well.

      We thank the Reviewer for highlighting this important point. We agree that in a standard two-stage discovery and replication study design, the Bonferroni adjustment should be based on the number of significant signals from discovery that is tested in the replication. We had initially considered this approach but chose the current tiered approach based on a number of factors:

      First, we had initially considered performing a standard meta-analysis between UK Biobank and FinnGen datasets and using the Bonferroni adjustment of the total number of tests. However, it was not possible to reliably map the phenotypes between UK Biobank and FinnGen on a large-scale due to different classification schemes.

      Second, we had noticed that if we only focus on the sequential two-stage design, then we would be ignoring strong causal relationships observed in FinnGen that passed Bonferroni adjustment but may only be nominally associated in UK Biobank. Although not as strong as Tier 1 findings, we believe that these findings warranted some consideration. This is particularly relevant since differences in the strength of the causal relationship could be attributed to the different populations studied, sample size, different health systems used to measure disease outcomes, differences in statistical power in the MR tests between the two stages (e.g., number of IVs), amongst others.

      Third, we wanted to point out that the total adjustment for number of phenotypes tested using Bonferroni is a very conservative adjustment because the multiple EHR phenotypes have varying degrees of redundancy and correlation. We believe the appropriate Bonferroni-adjusted P-value cutoff is somewhere in between the Bonferroni adjustment of total number of phenotypes, and the nominal P-value (no adjustment for number of phenotypes).

      Although somewhat unconventional, we came up with this tiered P-value approach to overcome the points mentioned above. We have now included text to further explain our approach and to mention that tier 2 and tier 3 results require further replication and validation.

      Manuscript changes:

      Lines 266-283: “This presentation is somewhat unconventional and partly arises from the study’s use of three different datasets for instrument selection. In a traditional two-stage discovery and replication design, Bonferroni adjustment is based on the number of significant signals from discovery that is tested in replication. Here, we used three tiers of statistical evidence to present results because a standard meta-analysis between UKB and FinnGen was not possible, given it was not possible to reliably map all phenotypes between the two datasets. Additionally, Bonferroni-significant results in the replication analysis would have been ignored in FinnGen in a sequential two-stage design if they were also only nominally associated in UKB. The three tiers are defined below:”

      Lines 441-444: “Due to the weaker statistical evidence supporting these associations, special caution is encouraged when interpreting these results to infer causality, and further replication and validation studies are essential for all Tier 2 and Tier 3 results.”

      Lines 498-500: “However, we reiterate that this Tier 3 association was only nominally significant in discovery, while Bonferroni-significant in replication, and future studies are needed to validate the statistical evidence.”

      Lines 565-567: “However, caution is still warranted in inferring causality, as MR depends on specific assumptions and the validity of those assumptions must be carefully assessed. Thus, diverse study designs remain necessary to triangulate evidence on the causal effects of plasma TG levels.”
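
      To make the tiered thresholds discussed in this response concrete, here is a minimal sketch of how a result could be assigned to a tier. Only the Tier 3 rule (nominal in discovery, Bonferroni-significant in replication) is spelled out in the response; the Tier 1 and Tier 2 rules in the code are assumptions, as is the use of 2,600 and 221 as the Bonferroni denominators for discovery and replication. The function name `classify_tier` is hypothetical.

```python
def classify_tier(p_discovery, p_replication,
                  n_discovery_tests=2600, n_replication_tests=221,
                  alpha=0.05):
    """Hypothetical helper: assign a tier of statistical evidence.

    Assumed rules (only Tier 3 is stated explicitly in the response):
      Tier 1: Bonferroni-significant in discovery and in replication
      Tier 2: Bonferroni-significant in discovery, nominal in replication
      Tier 3: nominal in discovery, Bonferroni-significant in replication
    Returns None when none of the rules is met.
    """
    bonferroni_discovery = alpha / n_discovery_tests
    bonferroni_replication = alpha / n_replication_tests

    discovery_strict = p_discovery < bonferroni_discovery
    discovery_nominal = p_discovery < alpha
    replication_strict = p_replication < bonferroni_replication
    replication_nominal = p_replication < alpha

    if discovery_strict and replication_strict:
        return 1
    if discovery_strict and replication_nominal:
        return 2
    if discovery_nominal and replication_strict:
        return 3
    return None

# The stricter sequential cutoff suggested by the reviewer for Tier 2,
# where 39 is the number of discovery signals carried into replication.
sequential_tier2_cutoff = 0.05 / 39
```

      Under these assumptions, a result with a replication p-value between `sequential_tier2_cutoff` and 0.05 would count as Tier 2 in the tiered scheme but would not pass the conventional sequential design, which is the difference the reviewer raises.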

      The causal effect of TG on leiomyoma of the uterus is weak, as indicated by both the sub-significant result in the replication and the non-significant MR-PRESSO result. Similarly, I would recommend more caution on the weak statistical rigor when interpreting Tier 2 and Tier 3 results.

      We agree with the Reviewer. We have now emphasized more caution in interpreting Tier 2 and Tier 3 results. We have also explicitly restated the weaker statistical evidence underlying these results and noted the need for future validation. Please see our detailed response to the Comment above.

      Manuscript changes:

      Lines 498-500: “However, we reiterate that this Tier 3 association was only nominally significant in discovery, while Bonferroni-significant in replication, and future studies are needed to validate the statistical evidence.”

      Another methodological choice that might need justification is the use of UKB TG GWAS loci (1,248 SNPs) as the instruments for FinnGen. This may create some subtle interference with the use of UKB as outcomes in the discovery analysis. It may be minor but some justification or at least some discussion of potential limitations should be mentioned. What about the alternative of using GLGC as instruments in replication?

      We agree with the reviewer that the use of UKB TG GWAS loci (1,248 SNPs) as instruments for FinnGen outcomes needs additional justification. We now detail this decision in the text as copied below.

      Additionally, we now present new data comparing MR results on FinnGen outcomes when selecting TG instruments from UKB GWAS versus GLGC GWAS. Statistical significance after Bonferroni correction was set to 0.05/221, where 221 was the number of disease traits nominally significant in UKB that were tested in FinnGen. We note that the results were fairly consistent. All Tier 1 results remained Bonferroni significant, whether using TG SNPs from UKB or GLGC. Though statistical significance decreased for the remaining diseases of interest, the direction of causality remained consistent, and three disease traits remained significant (hypertension, aortic aneurysm, and alcoholic liver disease). These results support that instrumenting TG using 1,248 SNPs from UKB might carry more power than the 141 SNPs from GLGC, allowing for the detection of associations in our initial replication analysis using UKB for exposures and FinnGen for outcomes. We now include this analysis in the text and include the figure below, as well as its underlying data, as supplementals (Supplementary File 5).

      Manuscript changes:

      Lines 229-236: “We selected UKB TG GWAS loci as the instruments for replication on FinnGen outcomes, rather than GLGC TG GWAS loci, to diversify the source of TG instruments and mitigate potential biases associated with one TG GWAS. Moreover, UKB GWAS included a larger study population than GLGC GWAS, providing a greater number of genetic instruments that can together explain more of the variance in plasma TG levels, and thus, greater statistical power and precision. Nevertheless, we also performed the replication analyses using TG instruments from GLGC and included these results as supplemental data (Supplementary File 5).”

      For disease outcomes (line 188), UKB European sample size is ~400,000 rather than ~500,000. Can the author clarify the sample size they used?

      We thank the reviewer for catching this detail. We have now clarified the sample size of UKB European participants in the Methods section, and we also included the exact sample size of each disease trait GWAS (cases and controls) in Supplementary Figure 1.

      Manuscript changes:

      Lines 194-201: “Pan-UKB had performed 16,131 GWASs on 7,221 phenotypes in ~420,531 UKB participants of European ancestry using genetic and phenotypic data (PanUKBTeam, 2020). A total of 7,221 phenotypes had been categorized as “biomarker”, “continuous”, “categorical”, “ICD-10 code”, “phecode”, or “prescription” (PanUKBTeam, 2020). We filtered for outcomes to retain categorical, ICD-10, and phecode types; non-null heritability in European ancestry as estimated by Pan-UKB; and relevance to disease, excluding medications. This yielded 2,600 traits for primary analysis. The exact sample size of each GWAS for each of these traits is provided in Supplementary File 1.”

      It would be reassuring to the reader if the TG measurements were measured in a treatment-naïve manner. GLGC accounted for treatment (at least LDL, check paper for TGs; if they didn’t, there must be reason). Maybe not UKB.

      We now provide information about whether the lipid measurements were measured in a treatment-naïve manner in the Methods for GLGC and UKB. We also address this point in the discussion section as a potential limitation.

      Manuscript changes:

      Lines 179-180: “We note that the GLGC GWAS had excluded individuals known to be on lipid-lowering medications.”

      Lines 187-188: “We note that the Pan-UKB GWAS study did not exclude participants based on their use of lipid-lowering medications.”

      Lines 545-546: “Fifth, the GLGC GWAS used to select instruments for plasma TG levels in discovery had accounted for lipid-lowering treatment, while the UKB GWAS used in replication had not.”

      "Phenome-wide MR is a high-throughput extension of MR that, under specific assumptions, estimates the causal effects of an exposure on multiple outcomes simultaneously." - I guess it is more informative to mention the specific assumptions, at least briefly, in the introduction so it is easier for the reader to interpret the results.

      We agree with the reviewer that it would be informative to explicitly state the assumptions of Mendelian randomization. We now explicitly state these assumptions in the introduction.

      Manuscript changes:

      Lines 123-129: “Phenome-wide MR is a high-throughput extension of MR that estimates the causal effects of an exposure on multiple outcomes simultaneously. As in conventional MR, this method uses genetic variants as instrumental variables (IV) to proxy modifiable exposures (Davey Smith & Ebrahim, 2003), and importantly, it relies on three critical assumptions: (1) The genetic variant is directly associated with the exposure; (2) The genetic variant is unrelated to confounders between the exposure and outcome; and (3) The genetic variant has no effect on the outcome other than through the exposure (Davey Smith & Ebrahim, 2003).”

      Reviewer #3 (Public Review):

      Park and Bafna et al. applied a genetics-based epidemiological approach, the Mendelian randomization analysis (MR), to evaluate the potential causal roles of triglycerides across 2,600 disease traits (i.e., the phenome). In a typical two-sample MR framework, they utilized existing genome-wide association study (GWAS) summary statistics from two separate studies. They are Global Lipids Genetics Consortium (GLGC) and UK Biobank in the discovery analysis, and UK Biobank and FinnGen in the replication analysis. This replication design is a great strength of the study, enhancing the robustness and reproducibility of the results. For the candidate pairs of causal associations, the authors further perform multiple sensitivity analyses to evaluate the robustness of the results to possible violations of assumptions in MR. To disentangle the independent effects of triglycerides from other lipid fractions (i.e., LDL-cholesterol and HDL-cholesterol), the authors performed multivariable MR analysis. In the end, possible causal associations were revealed in three tiers, based on statistical significance in the two-stage analysis. The results support the causal effects of triglycerides in increasing the risk of atherosclerotic cardiovascular disease. They also reveal novel conditions, which are either new treatable conditions (e.g., leiomyoma, hypertension, calculus of kidney and ureter) for repurposing of triglycerides-lowering drug, or possible side effects (e.g., alcoholic liver disease) the triglyceride-lowering treatment should pay special attention to.

      The analysis approaches in the paper are standard and solid. The discovery-replication study design is a great strength. Correction for multiple testing was implemented in a conservative way. The sensitivity analyses and MVMR strengthen the robustness of the results. The manuscript is very clearly written and pleasant to read. The limitations were well-presented. The conclusions and interpretations are mostly supported by the data, with one major concern as explained below. But overall, in addition to the specific findings, this study could be an exemplar study for the use of phenome-wide MR in identifying treatable conditions and side effects for most existing drugs.

      1) My major concern is about reverse causation. For example, having atherosclerotic cardiovascular disease increases circulating triglycerides. Reverse causation can induce false positives in MR analysis. With the existing data in this study, the authors can perform a reverse MR to evaluate the effect of the 19 disease traits on triglycerides. Ruling out the presence of reverse causation is important to make sure that the current findings are not false positives.

      We agree with the reviewer that performing reverse MR would be important to rule out reverse causation. We now present new results using reverse MR, selecting instruments for disease from UKB and instruments for TG from GLGC (i.e., reversing the discovery analysis). We provide an interpretation of these new results in the discussion section and present the underlying data, including the number of genetic variants used, in Supplementary File 6. Please note we could only perform reverse MR on 9 of the 19 diseases of interest, due to insufficient genetic data in GLGC to extract the specific exposure instruments. As expected, we observed significant associations (orange) between “disorders of lipoprotein metabolism” and “hyperlipidemia” with plasma TG levels; however, all other estimates were non-significant, suggesting unidirectional associations for the remaining seven disease traits. We now include the figure below and its underlying data as supplements (Supplementary File 6).

      Manuscript changes:

      Lines 258-261: “Finally, we performed bidirectional or reverse MR on significant results to examine the potential presence of reverse causation. We selected instruments for each disease as described above from Pan-UKB and instruments for plasma TG levels from GLGC, essentially reversing the discovery stage design using a fixed-effect IVW method.”

      Lines 368-373: “Finally, we performed reverse MR to estimate the effects of significant disease traits on plasma TG levels, selecting instruments from UKB and GLGC, respectively. Genetic data were sufficiently available to perform this analysis for 9 of the 19 diseases of interest. These results are presented in Supplementary File 6. Expectedly, “disorders of lipoprotein metabolism” and “hyperlipidemia” had positive effects on plasma TG levels; however, no other examined disease trait showed results suggesting reverse causation.”
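
      For readers unfamiliar with the fixed-effect IVW method mentioned in the reverse MR design, the following is a minimal sketch of the estimator from summary statistics. It is an illustration under stated assumptions rather than the authors' code: the function name `ivw_fixed_effect` and the toy numbers are hypothetical, and the sketch uses the common first-order weights that ignore uncertainty in the variant-exposure estimates.

```python
import numpy as np

def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Hypothetical helper: fixed-effect inverse-variance weighted MR estimate.

    Each variant gives a Wald ratio beta_outcome / beta_exposure; the ratios
    are combined with weights beta_exposure**2 / se_outcome**2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)

    wald_ratio = by / bx
    weights = bx**2 / se**2

    estimate = np.sum(weights * wald_ratio) / np.sum(weights)
    std_error = np.sqrt(1.0 / np.sum(weights))
    return estimate, std_error

# Toy data (made up): every variant implies the same causal effect of 0.3.
estimate, std_error = ivw_fixed_effect(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.03, 0.06],
    se_outcome=[0.01, 0.02],
)
```

      In a reverse MR, the disease trait takes the role of the exposure and plasma TG the role of the outcome, so the same function would be called with the two sets of effect estimates swapped accordingly.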

    1. Author Response

      Reviewer #2 (Public Review):

      The molecular characteristics of OCNs in normal or ototoxic conditions were poorly understood before this study. The strength of this study is that it provides the first single-cell RNA-seq database of OCNs as well as surrounding facial branchial motor neurons. By thoroughly analyzing the database, they found high heterogeneity within OCN populations and identified distinct markers that are enriched in different OCN subtypes. Furthermore, a few previously unknown neuropeptides are revealed, including Npy, which is more enriched in the LOC-2 located on the medial side. They also found that neuropeptide expression levels and distributions are subject to hearing experience and noise exposure. On the other hand, the weakness of the study is that the number of single-cell RNA-seq profiles is not sufficient, and may underestimate the MOC heterogeneity (Figure 3A). Moreover, the physiological functions of the LOC-2 are not revealed in this study, and no specific markers in one OCN subtype are identified that can predict the morphological or projecting axon features. Those might be addressed in follow-up studies.

      We agree that this study does not allow us to make conclusions about MOC heterogeneity or LOC2 functions. These are certainly interesting avenues to pursue in the future.

    1. Author Response

      Reviewer #3 (Public Review):

      Although initially discovered as axon guidance molecules in the nervous system, Semaphorins, signaling through their receptors the Neuropilins and Plexins, regulate a variety of cell-cell signaling events in a variety of cell types. In addition, cells often express multiple Semas and receptors. Thus, one important question that has yet to be adequately understood about these important signaling proteins is: how does specificity of function arise from a ubiquitously expressed signaling family?

      This study addresses that important question by investigating the role of cysteine palmitoylation on the localization and function of the Neuropilin-2 (Nrp-2) receptor. It was already known that Sema3F signaling through a complex of Nrp-2 and Plexin-A3 regulates pruning of dendritic spines in cortical neurons while Sema3A signals through Nrp-1/PlexA4 to regulate dendritic arborization. The major finding of this study which is well-supported by the data is that palmitoylation of Nrp-2 regulates its cell surface clustering and dendritic spine pruning activity in cortical neurons. Interestingly, palmitoylation of Nrp-1 at homologous residue does not appear to regulate its localization or known neuronal function.

      A clear strength of this manuscript is the many techniques that are utilized to examine the question: this study represents a tour de force of biochemical, molecular, genetic, pharmacological and cell biological assays performed both in vitro and in vivo. The authors carefully dissect the function of distinct palmitoylated cysteine residues on Nrp-2 localization and function, concluding that palmitoylation of juxtamembrane cysteines predominates over C-terminal palmitoylation for the Nrp-2 dependent processes assayed in this study. The authors also demonstrate that a specific palmitoyl transferase (DHHC15) acts on Nrp-2 but not Nrp-1 and is required for Nrp-2 clustering and dendritic spine pruning. These findings are important because they demonstrate one mechanism by which different signaling pathways, even from a related family of proteins, can achieve signaling specificity in the cell.

      A minor weakness of the paper is that one would like to see a connection between palmitoylation-dependent cell membrane clustering of Nrp-2 on the cell surface and Nrp-2 regulation of dendritic spine pruning. Although the two phenotypes frequently correlate in the data presented, there are a few notable exceptions: e.g. Nrp-2TCS forms larger clusters in cortical neurons while Nrp-2FullCS is diffuse on the cell surface; both mutants affect spine pruning. In the future, it would also be interesting to know if increased clustering of Nrp-2 was observed at spines that were eliminated, for example. Nonetheless this manuscript represents an important advance in our understanding of synaptic pruning and cellular mechanisms that constrain protein surface localization and signaling pathways.

      We agree that the reviewer’s comment on the need to show a direct association between palmitoylation-dependent Nrp-2 clustering on the cell surface and Nrp-2 regulation of dendritic spine pruning is very important. This underscores the need to develop new robust tools that can directly and specifically address the effects of palmitoylation on protein localization and neuronal morphology. For example, an antibody that is specific for palmitoylated Nrp-2, perhaps including site-specific Nrp-2 palmitoylation, would allow for direct visualization of palmitoylated protein localization at subcellular resolution, and if coupled with in vivo imaging, could help address questions related to spine dynamics with respect to Nrp-2 expression and palmitoylation. However, at present we consider this approach an important future direction.

      Regarding the Nrp-2 mutants TCS and Full CS, our experiments suggest the existence of a threshold for protein mislocalization beyond which Nrp-2 loses its function. In other words, the defect in protein localization imparted by the mutation of the three juxtamembrane cysteines (TCS Nrp-2 mutant) seems to be sufficient to cause Nrp-2 dysfunction. In addition, as noted above (Reviewer #1), the protein clustering assay is a useful but more general localization assay; more sophisticated assays need to be developed to investigate palmitoylated proteins when they are mislocalized upon site-specific depalmitoylation, which could provide a more accurate association between a protein’s localization and function.

      The reviewer’s idea to look at the localization of Nrp-2 at dendritic spines and correlate this with the fate of spines during postnatal development, including spine maintenance versus elimination, is an excellent suggestion that could directly link Nrp-2 to spine dynamics. To address this, however, new assays with exogenous Nrp-2 expression will again need to be developed, but with very low levels of protein expression to avoid saturation of spines with exogenous tagged Nrp-2 protein and to preserve functional specificity for spine regulation. Alternatively, robust in vivo tagging of endogenous Nrp-2 protein using CRISPR approaches provides another avenue to achieve this goal. Of note, we are trying this approach but, thus far, we have not been successful in achieving labeling that is robust enough for such experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      The current study melds computational and docking methods with functional measurements in a systematic approach: first, they analyze the mechanism of inhibitor binding to EAAT2; second, they mutate ASCT to resemble EAAT and show that the general binding pocket and inhibition mechanism are conserved; third, they perform an in silico screen to identify compounds that bind to the WT ASCT binding pocket; fourth, they perform electrophysiological assays showing that this novel compound allosterically modulates ASCT function. This is a complete and comprehensive study with extensive experimental support for the major conclusions. The authors identify an allosteric ASCT inhibitor, and although only partial inhibition is achieved, this study serves as proof-of-concept that this site can be targeted in diverse SLC-1 transporters as an allosteric inhibitory site.

      We would like to thank Reviewer #1 for the encouraging comments.

      Reviewer #2 (Public Review):

      This study set out to explore the nature of a previously described non-competitive and selective inhibitor of the human glutamate transporter, EAAT1, and to explore if this mechanism was conserved across the glutamate transporter family. The non-competitive nature of UCPH-101 inhibition of EAAT1 has previously been demonstrated with both functional analysis and structures of EAAT1. Here, the authors use detailed electrophysiology analysis to confirm this mechanism of inhibition and to demonstrate that the inhibitor slows the steps of the transport cycle associated with substrate translocation, rather than substrate or sodium ion binding. These findings agree with previous studies that have shown that the compound binds at the interface of the transport and scaffold domains in EAAT1, two domains that are required to move relative to each other for the transport process to occur. UCPH-101 also prevents the transporter from entering an anion-conducting state, which agrees with a recent structure and MD simulations of EAAT1 that demonstrate movements of the transport domain relative to the scaffold domain are required for EAAT1 to move into the anion-conducting state and support the mechanism of UCPH-101 inhibition confirmed in this study (PMID: 35192345; PMID: 33597752).

      While UCPH-101 has been shown to be selective for EAAT1 over other human glutamate transporter subtypes (notably EAAT2 and EAAT3), Dong et al., show that this inhibitor can also reduce transport by another member of the SLC1A family, a neutral amino acid exchanger, ASCT2. Using MD simulations and functional analysis, they show that UCPH-101 acts as a partial, low-affinity inhibitor of ASCT2 and identify two amino acid residues in the binding site that appear to be responsible for the different affinities for EAAT1 and ASCT2. Indeed, when these two residues are changed to the corresponding residues in EAAT1, UCPH-101 becomes a full inhibitor of ASCT2 with an increased affinity.

      ASCT2 is a neutral amino acid transporter that can transport glutamine and it is known to be upregulated in several cancers. Thus, finding new compounds and novel ways to inhibit ASCT2 is worthy of investigation. In the last section of this study, the authors conduct a virtual screen of 3.8 million compounds to identify other compounds that could bind to this allosteric site in ASCT2. One compound was identified, and while it had relatively low affinity, it provides the basis for further exploration of this site.

      We would like to thank Reviewer #2 for the thoughtful comments.

      Reviewer #3 (Public Review):

      Using whole-cell patch-clamp measurements, the authors nicely elaborate the non-competitive inhibition mechanism of UCPH-101 on EAAT1, concluding that it blocks conformational changes during transmembrane translocation, without inhibiting Na+/glutamate binding. The authors demonstrate that UCPH-101 binds to ASCT2 with strongly reduced affinity. Informed by sequence comparison between EAAT1 and ASCT2, the authors identify a pair of mutations, which makes the putative allosteric-binding pocket (which has been identified by crystallography earlier) in ASCT2 more similar to EAAT1 and restores the inhibitory effect of UCPH-101 in ASCT2. Overall, the electrophysiological experiments appear sound and convincing.

      We appreciate the kind words.

      Furthermore, using virtual screening against the UCPH-101 binding pocket in ASCT2, the authors identified a novel (non-UCPH-101-like) compound #302 that they experimentally demonstrate to also inhibit ASCT2. However, the study lacks a detailed investigation of the inhibition mechanism of this compound and it remains unclear if #302 also mediates allosteric inhibition as the authors propose. Furthermore, the study lacks any experimental verification of the assumed binding site of #302.

      We agree. Therefore, we have now added more detailed experiments on compound #302 inhibition mechanism, confirming allosteric inhibition (new Fig. G and I).

      In addition, the study includes molecular-dynamics (MD) simulations on interactions of UCPH-101 with EAAT1 and ASCT2. These simulations intend to support the interpretations of the electrophysiological experiments, i.e., relatively tight interactions of UCPH-101 with EAAT1 and weaker binding to ASCT2, which can be restored using two point-mutations in ASCT2. Unfortunately, this is a relatively weak part of the study. Due to the lack of any convergence analysis, the statistical significance of the drawn conclusions remains unclear. Furthermore, since it is not reported how UCPH-101 has been parameterized, the chemical accuracy of these models is unclear.

      We now add information on the UCPH-101 parametrization protocol, and we have extended the time of MD simulations. Also, we have created additional trajectories for the atom distances between amino acid substrate and ASCT2 side chain in the substrate binding site, providing another data point on convergence in the substrate binding site, which should be unaffected by UCPH-101 binding, according to the experimental data.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the protein composition of exocytotic sites in dopaminergic neurons is investigated. While extensive data are available for both glutamatergic and GABA-ergic synapses, it is far less clear which of the known proteins (particularly proteins localized to the active zone) are also required for dopamine release, and whether proteins are involved that are not found in "classical" synapses. The approach used here uses proximity ligation to tag proteins close to synaptic release sites by using three presynaptic proteins (ELKS, RIM, and the beta4-subunit of the voltage-gated calcium channel) as "baits". Fusion proteins containing BirA were selectively expressed in striatal dopaminergic neurons, followed by in-vivo biotin labelling, isolation of biotinylated proteins and proteomics, using proteins labelled after expression of a soluble BirA construct in dopaminergic neurons as reference. As controls, the same experiments were performed in KO-mouse lines in which the presynaptic scaffolding protein RIM or the calcium sensor synaptotagmin 1 were selectively deleted in dopaminergic neurons. To control for specificity, the proteomes were compared with those obtained by expressing a soluble BirA construct. The authors found selective enrichments of synaptic and other proteins that were disrupted in RIM but not Syt1 KO animals, with some overlap between the different baits, thus providing a novel and useful dataset to better understand the composition of dopaminergic release sites.

      Technically, the work is clearly state-of-the-art, cutting-edge, and of high quality, and I have no suggestions for experimental improvements.

      We thank the reviewer for this summary and for pointing out the high quality of the work.

      On the other hand, the data also show the limitations of the approach, and I suggest that the authors discuss these limitations in more detail. The problem is that there is very likely to be a lot of non-specific noise (for multiple reasons) and thus the enriched proteins certainly represent candidates for the interactome in the presynaptic network, but without further corroboration it cannot be claimed that as a whole they all belong to the proteome of the release site.

      We fully agree with the reviewer. Most importantly, we have changed the final section from “Conclusions” to “Summary of conclusions and limitations” (lines 501-518) to summarize the limitations with equal weight to the conclusions. In the revised manuscript, we also included many specific additional points in this respect throughout the discussion and the results: many hits could be noise (lines 458, 478-479), thresholding affects the inclusion of proteins in the release site dataset (lines 208-215), the seven-day time window could deliver interactors from the soma to the synapse (lines 493-495), specific oddities (for example histones, lines 482-485), iBioID does not deliver an interactome per se but is simply based on proximity (lines 505-508), and several more. We also clearly state that each specific hit needs follow-up studies (lines 501-503: ” Each protein will require validation through morphological and functional characterization before an unequivocal assignment to dopamine release sites is possible.”), and a similar statement was added on lines 374-375.

      Reviewer #2 (Public Review):

      The Kaiser lab has been on the forefront in understanding the mechanism of dopamine release in central mammalian neurons. Assessing dopamine neuron function has been quite difficult due to the limited experimental access to these neurons. Dopamine neurons possess a number of unique functional roles and participate in several pathophysiological conditions, making them an important target of basic research. This study here has been designed to describe the proteome of the dopamine release apparatus using proximity biotin labeling via active zone protein domains fused to BirA, to test in which ways its proteome composition is similar or different to other central nerve terminals. The control experiments demonstrating proper localization as well as specificity of biotinylation are very solid, yielding a highly enriched and well-characterized proteome database. Several new proteins were identified and the database will very likely be a very useful resource for future analysis of the protein composition of synapses and their function at dopamine and other synapses.

      We thank the reviewer for this positive assessment of our work.

      Major comment:

      The authors find that loss of RIM leads to a major reduction in the number of synaptically enriched proteins, while they did not see this loss of number of enriched proteins in the Syt1-KO's, arguing for an undisrupted synaptome. Maybe I missed this, but which fraction of proteins and synaptic proteins are then co-detected both in the Syt1 and control conditions when comparing the Venn diagrams of Fig2 and Fig 3 Suppl. 2? This analysis may provide an estimate of the reliability of the method across experimental conditions.

      We thank the reviewer for proposing to be clear in the comparison of the control and Syt-1 cKODA data. A direct comparison of hit numbers is included on lines 323-324, with 37% overlap between control and Syt-1 cKODA (vs. 15% between control and RIM cKODA). A direct mapping of this overlap is included in Fig. 4E. We think that this direct comparison is complicated by a number of factors, as outlined below, and did our best to include these complications in the discussion, including the last section (lines 501-518).

      First, to assess overall similarity, the initial comparison should be to assess axonal proteins identified in the BirA-tdTomato samples. These datasets are quite similar, with 671 (control) and 793 (Syt-1 cKODA) proteins detected, and a high overlap of 601 proteins. We think that this indicates that the experiment per se is quite reproducible. The comparison of the release site proteome between control and Syt-1 cKODA is more complicated. We think that the main point of this comparison is that the overall number of hits is quite similar, with 450 hits in the Syt-1 cKODA proteome and 527 hits in the control proteome, and we now show that this similarity holds across multiple thresholds (lines 298-301; ≥ 1.5: Syt-1 cKODA 602 hits, control 991, ≥ 2.0: 450/527, ≥ 2.5: 252/348). Detailed analysis of the overlap reveals that known active zone proteins such as Bassoon, CaV2 channels, RIMs, and ELKS proteins are present in both proteomes, but the overlap is partial and incomplete, with 191 proteins found in both proteomes. As discussed throughout and summarized on lines 501-518, the reasons for this partial overlap may be manifold. Trivially, it could be explained by noise or non-saturation (“incompleteness”) of the proteome. We also think that the Syt-1 proteome is not expected to be identical because there is a strong release deficit in these mice. If Syt-1 has a dopamine vesicle docking function (which it does at conventional synapses [4]), this could influence the proteome. We note that protein functions in the dopamine axon are not well established, but inferred from studies of classical synapses.
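
      The overlap counts quoted above can be expressed in more than one way, depending on the denominator. The following is a minimal sketch of that arithmetic, using only the counts given in this response; the function name `overlap_summary` is hypothetical, and the sketch does not assume which denominator the manuscript itself uses.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Express the overlap of two hit lists in three common ways."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared hits cannot exceed either list")
    union = n_a + n_b - n_shared
    return {
        "fraction_of_a": n_shared / n_a,  # shared hits relative to the first list
        "fraction_of_b": n_shared / n_b,  # shared hits relative to the second list
        "jaccard": n_shared / union,      # shared hits relative to the union
    }


# Counts quoted in the response (control first, Syt-1 cKODA second).
axonal = overlap_summary(671, 793, 601)        # BirA-tdTomato axonal proteomes
release_site = overlap_summary(527, 450, 191)  # release site proteomes, >= 2.0-fold

for name, summary in (("axonal", axonal), ("release site", release_site)):
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

      Whichever denominator is chosen, the axonal proteomes overlap far more than the release site proteomes, which is consistent with the partial overlap described above.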

      We have scrutinized the manuscript to not express that the control and Syt-1 cKODA proteomes are identical; we know they are not and discuss the example of α-synuclein specifically (Fig. 6, lines 347-362). Rather, the striking part is that the extent of the two proteomes is similar, with hit numbers much higher than in RIM cKODA. Specific hits have to be assessed in a detailed way, one hit at a time, in future studies, as expressed unequivocally on lines 501-503.

      Reviewer #3 (Public Review):

      In this study Kershberg et al use three novel in vivo biotin-identification (iBioID) approaches in mice to isolate and identify proteins of axonal dopamine release sites. By dissecting the striatum, where dopamine axons are, from the substantia nigra and VTA, where dopamine somata are, the authors selectively analyzed axonal compartments. Perturbation studies were designed by crossing the iBioID lines with null mutant mice. Combining the data from these three independent iBioID approaches and the fact that axonal compartments are separated from somata provides a precise and valuable description of the protein composition of these release sites, with many new proteins not previously associated with synaptic release sites. These data are a valuable resource for future experiments on dopamine release mechanisms in the CNS and the organization of the release sites. The BirA (BioID) tags are carefully positioned in three target proteins not to affect their localization/function. Data analysis and visualization are excellent. Combining the new iBioID approaches with existing null mutant mice produces powerful perturbation experiments that lead to strong conclusions on the role of RIM1 as a central organizer of dopamine release sites and unexpected (and unexplained) new findings on how RIM1 and synaptotagmin1 are both required for the accumulation of alpha-synuclein at dopamine release sites.

      We thank the reviewer for assessing our paper, for summarizing our main findings, and for expressing genuine enthusiasm for the approach and the outcomes.

      It is not entirely clear how certain decisions made by the authors on data thresholds may affect the overall picture emerging from their analyses. This is a purely hypothesis-generating study. The authors made little effort to define expectations and compare their results to these. Consequently, there is little guidance on how to interpret the data and how decisions made by the authors affect the overall conclusions. For instance, the collection of proteins tagged by all three tagging strategies (Fig 2) is expected to contain all known components of dopamine release sites (not at all the case), and maybe also synaptic vesicles (2 TM components detected, but not the most well-known components like vSNAREs and H+/DA-transporters), and endocytic machinery (only 2 endophilin orthologs detected). Whether or not a more complete collection of the components of release sites, synaptic vesicles or endocytic machinery is observed might depend on two hard thresholds applied in this study: (a) "Hits" (depicted in Fig 2) were defined as proteins enriched ≥ 2-fold (line 178) and (b) peptides not detected in the negative control (soluble BirA) were defined as 0.5 (line 175). How crucial are these two decisions? It would be great to know if the overall conclusions change if these decisions were made differently.

      We agree with the reviewer that the thresholding decisions are important and have now better incorporated the rationale for these decisions in the manuscript.

      Two-fold enrichment threshold. As outlined in the response to point 1 in the editorial decision letter, we now include figure supplements to illustrate the composition of the control proteome if we apply 1.5- or 2.5-fold enrichment thresholds (Fig. 2 – figure supplements 1 and 2) instead of the 2.0-fold threshold used in Fig. 2. This leads to more or fewer hits (991 and 348, respectively) compared to the 2.0-fold threshold (527 hits). It is noteworthy that the SynGO-overlap is the highest with the 2.0 threshold (37% vs. 31% at 1.5 and 33% at 2.5, Fig. 2 – figure supplement 3), justifying this threshold experimentally in addition to what was done in previous work [1,2]. These data are now described on lines 208-215 of the manuscript. When we apply these different thresholds to RIM and Syt-1 cKODA datasets, the finding that RIM ablation disrupts release site assembly persists. The following hit numbers were observed in the mutants at the 1.5, 2.0 and 2.5 enrichment thresholds, respectively: RIM cKODA 268, 198 and 82 hits; Syt-1 cKODA 602, 450 and 252 hits. Hence, the extent of the release site proteome remains much smaller after RIM ablation independent of the enrichment threshold, bolstering the conclusion that RIM is an important scaffold for these release sites. This is included in the revised manuscript on lines 298-301.
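
      To make the threshold comparison easier to follow, the hit numbers quoted above can be expressed relative to the control proteome at each threshold. This is only an illustrative sketch of that arithmetic: the dictionary layout and the function name `relative_to_control` are assumptions, and the counts are the ones stated in this response.

```python
# Hit counts quoted in the response, keyed by fold-enrichment threshold.
HITS = {
    1.5: {"control": 991, "RIM cKODA": 268, "Syt-1 cKODA": 602},
    2.0: {"control": 527, "RIM cKODA": 198, "Syt-1 cKODA": 450},
    2.5: {"control": 348, "RIM cKODA": 82, "Syt-1 cKODA": 252},
}


def relative_to_control(hits):
    """Return each mutant's hit count as a fraction of the control hit count."""
    fractions = {}
    for threshold, counts in hits.items():
        control = counts["control"]
        fractions[threshold] = {
            genotype: count / control
            for genotype, count in counts.items()
            if genotype != "control"
        }
    return fractions


relative = relative_to_control(HITS)
for threshold in sorted(relative):
    row = ", ".join(
        f"{genotype}: {fraction:.0%}"
        for genotype, fraction in relative[threshold].items()
    )
    print(f">= {threshold}-fold -> {row}")
```

      At every threshold the RIM cKODA fraction stays well below the Syt-1 cKODA fraction, which is the pattern the response relies on.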

      Undetected peptides in BirA-tdTomato. We did not express this well enough in the manuscript. The undetected proteins were set to 0.5 such that a protein that was detected with a specific bait but not with BirA-tdTomato could be illustrated with a specific circle size, not to determine inclusion in the analyses. If the average peptide count across repeats with a specific bait was 1, this resulted in inclusion in Fig. 2 and consecutive analyses with the smallest circle size. Hence, this decision was made to define circle size. It did not affect inclusion in Fig. 2 beyond the following two points. If one were to further decrease it, this might result in including peptides that only appeared once as a single peptide for some of the experiments, which we wanted to avoid. If one would set it higher (to 1), this artificial threshold would be equal to proteins that were actually detected experimentally multiple times, which we wanted to avoid as well. We have now clarified this on lines 165-167 and lines 1119-1121.
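
      The role of the 0.5 placeholder can be illustrated with a small sketch. It assumes, for illustration only, that enrichment is the ratio of the average peptide count with a bait to the average count with BirA-tdTomato; the names `fold_enrichment` and `is_hit` are hypothetical and the numbers are toy values, not data from the study.

```python
PLACEHOLDER = 0.5  # value used when a protein is undetected with BirA-tdTomato


def fold_enrichment(bait_count, control_count, placeholder=PLACEHOLDER):
    """Ratio of bait to control counts, substituting a placeholder for zero."""
    denominator = control_count if control_count > 0 else placeholder
    return bait_count / denominator


def is_hit(bait_count, control_count, threshold=2.0, placeholder=PLACEHOLDER):
    """True if the protein meets the fold-enrichment threshold."""
    return fold_enrichment(bait_count, control_count, placeholder) >= threshold


# A protein with an average of 1 peptide with the bait, undetected in control.
print(is_hit(1.0, 0.0))                     # placeholder 0.5: ratio 2.0
print(is_hit(1.0, 0.0, placeholder=1.0))    # placeholder 1.0: ratio 1.0
# A protein seen as a single peptide in only half of the repeats (average 0.5).
print(is_hit(0.5, 0.0))                     # placeholder 0.5: ratio 1.0
print(is_hit(0.5, 0.0, placeholder=0.25))   # placeholder 0.25: ratio 2.0
```

      Under these assumptions, lowering the placeholder admits proteins that appeared only sporadically, while raising it to 1 treats an undetected protein like one that was actually detected, which are the two cases the response wanted to avoid.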

      Expected proteins. In general, interpreting our dataset with a strong prior of expected proteins is difficult. The literature on release site proteins specifically characterized for dopamine is limited. We have found Bassoon, RIM, ELKS and Munc13 to be present using 3D-SIM superresolution microscopy [5,6], and we indeed found these proteins in the data as discussed on lines 227-232 and lines 423-445 in the revised manuscript. The prediction for vesicular and endocytic proteins is complicated. Release sites are sparse [5,7], and vesicle clusters are widespread in the dopamine axon, in some cases filling most of the axon (for example, see extended vesicle clusters filling much of the dopamine axon in Fig. 7E of [5]). Furthermore, docking in dopamine axons has not been characterized, and it is unclear how frequently vesicles are docked. Hence, it is not clear whether vesicular proteins should be concentrated at release sites compared to the rest of the axon (the BirA-tdTomato proteome we use for normalization). Similar points can be made for proteins for endocytosis and recycling of dopamine vesicles. Within the dopamine system, it is unclear whether the recycling pathway is close to the exocytic sites. One consistent finding across functional studies is that depletion after activity is unusually long-lasting in the dopamine system, for tens of seconds, even after only mild stimulation [5,8–13]. Hence, endocytosis and RRP replenishment might be very slow in these axons. It is not certain that endocytic factors are predeployed to the plasma membrane, and if they are, it is unclear how close to release sites they would be. As such, we agree with the reviewer that the proteome we describe is a hypothesis generator. With the limited knowledge on dopamine release, predictions beyond the previously characterized proteins in dopamine axons are difficult to make.

      We thank the reviewer for suggesting to include a better analysis of different thresholds and for giving us the opportunity to clarify the other points that were raised.

      Given the good separation of the axonal compartment from the somata (one of the real experimental strengths of this study), it is completely unexpected to find two histones being enriched with all three tagging strategies (Hist1h1d and 1h4a). This should be mentioned and discussed.

      We agree with the reviewer and have addressed this point in the manuscript. This could either reflect noise, or there could be more specific reasons behind it. The manuscript now states on lines 482-485: “It is surprising that Hist1h1d and Hist1h4a, genes encoding for the histone proteins H1.3 and H4, were robustly enriched (Fig. 2A). These hits might be entirely unspecific, or their co-purification could be due to biotinylation of H1 and H4 proteins (Stanley et al., 2001). It is also possible that there are unidentified synaptic functions of some of the unexpected proteins.” Ultimately, we do not know why these proteins are enriched, and we state clearly in the section “Summary of conclusions and limitations” that each new hit has to be validated in future studies (lines 501-503).

      It would also help to compare the data more systematically to a previous study that attempted to define release sites (albeit not dopamine release sites) using a different methodology (biochemical purification): Boyken et al (only mentioned in relation to Nptn, but other proteins are observed in both studies too, e.g. Cend1).

      We agree with the reviewer that Boyken et al, 2013 [14] is an important resource for our paper and for the assessment of the proteomic composition of release sites. We have now introduced links and citations to this paper multiple times (for example, on lines 231, 241, 430, 443, 481) and have expanded the discussion of overlap between these proteomes, including on Cend1 (lines 479-482).

      We think that a systematic comparison with Boyken et al, 2013 [14] is complicated because (1) so little is known about dopamine release mechanics and (2) because the approach is very different between the two papers. In respect to (1), most prominently, it is not certain how frequently vesicles are docked in the dopamine axon. Only ~25% of the varicosities contain these release sites, and vesicle docking has not been characterized in striatal dopamine axons to the best of our knowledge. Hence, how a docking site at a classical synapse compares to a dopamine release site remains unclear at the outset. For point (2), the key difference is that “within dataset normalizations” are very different in these two studies. In our iBioID dataset, we normalize to soluble proteins defined as proximity to BirA-tdTomato. In ref. [14], the authors express enrichment over “light”, regular synaptic vesicles purified with the same approach. This has a major impact on the proteome that strongly influences a direct comparison of hits, because there are large differences in the normalization. While each normalization makes sense for the respective paper, it complicates direct comparison.

      With these points in mind, we have compared hits across both datasets class-by-class. For some classes, the datasets have reasonable overlap for ≥ 2-fold enriched proteins: for example for active zone proteins (3 of 7 hits in [14] appear in our control proteome) and adhesion and cell surface proteins (8 of 18). For other classes, the overlap is limited: for example for nucleotide metabolism/protein synthesis (0 of 16 hits in [14] appear in our dataset) and cytoskeletal proteins (5 of 29). We hope the reviewer agrees that, given these factors, the analyses and discussion needed for a systematic comparison go beyond the scope of our paper. We have instead added a number of references to Boyken et al., 2013 [14], as outlined above, when direct comparison is meaningful.

    1. Author Response

      Reviewer #2 (Public Review):

      In this paper, Xiao et al. suggest that PASK is a driver for stem cell differentiation by translocating from the cytosol to the nucleus. This phenomenon is dependent on the acetylation of PASK mediated by CBP/EP300, which is driven by glutamine metabolism. Furthermore, this study showed that PASK interferes/weakens the Wdr5-APC/C interaction, where PASK interacts with Wdr5, resulting in repression of Pax7, leading to stem cell differentiation.

      There exists huge interest in maintaining adult stem cells and ES cells in their pluripotent form, and the work painstakingly performs several experiments to show that PASK is a good target to achieve that goal.

      However, the work on the paper relies mostly on data from C2C12 cells as adult muscle stem cell models, in vivo experimental data, and primary myoblasts from mice. Using these models makes the story contextual in muscle stem cells. Authors have not tried to extrapolate similar claims in other adult stem cell models. This severely restricts the claim to muscle stem cells even though PASK is required for the onset of embryonic and adult stem cell differentiation in general. Their work could be much strengthened if it is also tried on mesenchymal stem cells as these cells are also as metabolically active as muscle cells.

      We thank the reviewers for their enthusiasm for our studies using PASKi. We have previously shown that PASKi prevented differentiation of 10T1/2 cells into the adipogenic lineage (Kikani et al, Elife, 2016). We used stem cells of embryonic (ESC) and adult (MuSCs) origin to show the broad application of PASKi in preserving self-renewal independent of stem cell origin. We believe PASK function to be conserved across different stem cell paradigms, and our results in this manuscript would provide a framework to further study PASK in other stem cell paradigms.

      Reviewer #3 (Public Review):

      This manuscript entitled "PASK relays metabolic signals to mitotic Wdr5-APC/C complex to drive exit from self-renewal" by Xiao et al presents an interesting story on the role of PASK in the control of muscle stem cell fate by controlling the decision between self-renewal and differentiation. While the biochemistry presented is fairly compelling, the experiments revolving around the myogenic cells are lacking in quality and data.

      Major concerns:

      1) The isolation method used by this group to isolate muscle stem cells is inappropriate for the experiments used and may contribute to the misinterpretation of some of the results. It is simply a preplating method that results in a very heterogenous cell population in terms of cell type, comprised of numerous fibroblasts. While preplating can be used to isolate muscle stem cells and culture them as myoblasts, it takes days of growth and multiple rounds of passaging that are not used in this paper in order to get a more pure population of myogenic cells. This would also explain the high number of Pax7 negative cells in their primary myoblast experiments (~50% in some conditions) as they are most likely fibroblasts, which the authors could show by staining for fibroblast markers. The increase in Pax7 cells in certain conditions could also simply be due to the loss of contaminating cell types due to the treatment. Every single experiment that was performed on myoblasts must be redone using a more appropriate cell isolation method (i.e. FACS) or by culturing these isolated cells for a much longer period of time to eventually get a more pure cell population. As it stands, none of the data from the primary myoblast experiments are trustworthy.

      We agree – and thus, we have reproduced our results using two different methods of purifying MuSCs from mice, as indicated above. We took care to stain each isolation method with vimentin (a marker for fibroblasts) to ensure the purity of our preparation. Data are included in the Essential revisions section.

      2) The authors possess a genetic mouse model where PASK is knocked out. However, the mouse model is never described and the paper that is referenced also does not describe it. Please detail your mouse model.

      3) The majority of experiments are performed on C2C12 cells. While C2C12s are adequate for biochemistry and proof of concept, when it comes to biological significance primary myoblasts should be used. The authors try to explain this use by claiming that primary myoblasts undergo precocious differentiation, but this can be avoided by using an appropriate growth medium (F10, 20% FBS, 1% P/S, 5 ng/mL of bFGF).

      Kindly see the response for this comment in the Essential revision section.

      4) The authors possess a genetic mouse model, yet performed RNA-Seq on C2C12 myoblasts that were either untreated or treated with a PASK inhibitor. It would be much more informative and valuable to sequence the primary myoblasts from WT and PASK KO mice, thereby providing a more biologically relevant model.

      We used C2C12 for several reasons for initial transcriptome analysis using PASKi and validated the results from that analysis in primary myoblasts. (1) C2C12 cells are an excellent model for performing biochemical pathway characterization, including discovering new substrate targets for PASK, finding PASK interacting partners, and measuring the biochemical activity of PASK under various conditions. Thus, it would form the basis for a longer-term study of the signaling functions of PASK in one cell system (myoblasts), which can be validated and compared with the primary cell system. (2) PASKi treatment can acutely inhibit PASK catalytic activity without the genetic loss of its protein level. For many enzymatic proteins, catalytic inhibition could have a different biological effect compared with genetic loss of protein (Weiss et al.; Nat Chem Biol. 2007 Dec; 3(12): 739–744.). Thus, we chose the PASKi and C2C12 myoblasts system to study the kinase activity-dependent effect on the myoblast transcriptome. However, throughout the manuscript, we used PASKi, PASK siRNA, and PASKKO primary cells to cross-validate all our data. We believe the conditional loss of PASK in a MuSC-specific manner will be a great model to repeat the RNA-seq analysis in the future and compare the data obtained with PASKi in cultured myoblasts.

      5) The KO mouse model is rarely used and the cells isolated from it would be very useful in determining the biological role of PASK in muscle cells. The authors should isolate WT and KO cells and perform basic muscle functional experiments such as EDU incorporation for proliferation, and fusion index for differentiation to see whether the loss of PASK has an effect on these cells.

      We have published the characterization of the myogenesis phenotype of the PASKKO model in our previous manuscript (Kikani et al, 2016). Thus, we erred by not redoing those experiments in the previous version. We have now reproduced those results and markedly extended the characterization of PASKKO cells in vitro, including BrdU incorporation, myogenesis, Pax7 heterogeneity, Myogenin expression and PASK subcellular distribution using WT cells. We have also characterized the regeneration phenotype of PASKKO mice. We thank the reviewer for helping strengthen the biological context of our manuscript.

      6) The authors never look at quiescent muscle stem cells and early activated muscle stem cells in terms of PASK protein expression and dynamics. The authors should isolate EDL myofibers and stain for PASK and PAX7 at 0, 24, 48, and 72-hour post isolation. This would allow the authors to quantify the changes in PASK expression and cell localization, as well as confirm the number of muscle stem cells in WT and KO mice, during quiescence and during the process of muscle stem cell activation, proliferation, and differentiation in a near in vivo context.

      As described in Figure 1-figure supplement 2A, PASK is not expressed in quiescent MuSCs. Therefore, we do not anticipate a functional role of PASK in the initial activation of QSCs. We do not propose that PASK plays a role in the maintenance of the QSC state or the exit and initial activation of MuSCs following muscle injury. PASK is transcriptionally activated in proliferating myoblasts during regeneration (Kikani et al, eLife 2016) and upon isolation of MuSCs (Figure S1D). Therefore, we specifically focus on studying the biochemical functional role of PASK signaling in activated (proliferating) myoblasts isolated from mice or during early regeneration. We have ongoing studies examining the precise temporal kinetics of PASK transcription regulation in Pax7+ MuSCs as they are activated, and to identify its upstream transcriptional regulators. However, we respectfully suggest that these avenues are outside of the purview of this current manuscript, which specifically explores the metabolic pathway that establishes the progenitor population from activated myoblasts.

      7) Contrary to their claim, MyoD is not a stemness/self-renewal gene.

      We agree, and have corrected the text.

      8) The authors state that PASK is necessary for exit from self-renewal and establishment of a progenitor population, but this is a vast overstatement. In the genetic KO mouse model, the mice are able to regenerate their muscle after injury, therefore PASK cannot be a necessary protein for the formation of progenitor cells.

      During muscle regeneration, we observed a significant inhibition of the early regenerative response in PASKKO mice, marked by severely reduced levels of eMHC. Concomitantly, we observed increased numbers of Pax7+ MuSCs at Day 5 of regeneration compared with WT muscles. We have extensively shown the requirement of PASK for myogenin induction in vitro and in vivo (Kikani et al, 2016, Kikani et al, 2019). Based on this evidence, we propose that PASK is necessary for the exit from Pax7+ self-renewing stem cells and generation of Myog+ committed progenitor populations.

      9) In numerous figure panels, the y-axis represents the # of cells, rather than a percentage or ratio. This is uninformative as the number of cells will never be the same between conditions and experiments. These panels need to be replaced with a more appropriate y-axis.

      We have updated the axes to % cells where appropriate.

    1. Author Response

      Reviewer #1 (Public Review):

      […] Overall, the results from these analyses are convincing and valuable, but still do not seem to be a big leap from their Unger 2021 paper […]. The methodology that they established should be described more clearly so that it can be shared with the research community. For example, they say cells how many donors were recruited for this experiment? are there differences in efficiency in B cell differentiation by individual?

      Also, it would be important to assay for antibodies in the culture media. How would you suggest to improve the culture system to be used to model diseases?

      We appreciate the reviewer's queries and the points raised. In response to the first set of comments, the reviewer has correctly observed that the methodology of the assay itself as employed in this paper is not new or superior to our previously published data in (Unger et al., Cells 2021), where we described a minimalistic in vitro system for efficient differentiation of human naive B cells into antibody-secreting cells (ASCs). However, the current study aims to provide a comprehensive evaluation of the phenotype of the cells in the in vitro system and their relationships in potential differentiation pathways. In addition, we aimed to elucidate how the detailed gene expression profiles of the differentiating cells in vitro compare to their counterparts observed in vivo. In this way, we were able to uncover an antibody-secreting cell phenotype in vivo that was not observed before and could only be uncovered due to our full transcriptome knowledge of these cells. In addition, we present novel findings that demonstrate that this culture system not only enables efficient ASC generation but also recapitulates the entire in vivo B cell differentiation pathway, as evidenced by the presence of germinal-center (GC)-like and pre-memory B cells in the culture. These results have not been previously reported in the literature for human B cells in culture and represent a significant contribution to the field of human B cell biology.

      Regarding the reviewer's inquiry about the cell culture protocol, its reproducibility, donor variability, and additional experimental applications, we refer to three additional recent publications from our group that have adopted the same in vitro B cell differentiation system and have provided extensive analysis of the immunoglobulin production, intracellular signaling pathways, as well as comparison with other culture systems in the field (Marsman et al., Cells 2020; Marsman et al., Eur. J. Immunol. 2022; Marsman et al., Front. Immunol. 2022). On top of this, we now realize that the section that describes the culture system (MATERIAL AND METHODS - “In vitro naive B cell differentiation cultures”) was a bit too concise, and we thank the reviewer for mentioning it. We have now expanded it and corrected an inconsistency at lines 125-127: “After six days, activated B cells were collected and co-cultured with 1 × 10⁴ 9:1 wild type (WT) to CD40L-expressing 3T3 cells that were irradiated and seeded one day in advance (as described above), together with IL-4 (100 ng/ml) and IL-21 (50 ng/ml; Invitrogen) for five days.”

      As for the application of our in vitro system in disease modeling, as requested by the reviewer, this would require modifying the culture conditions to mimic the disease-specific biological background (if known), for instance by inhibiting or enhancing specific transcriptional pathways that are known to be associated with the disease in question. However, it would also require the presence of antigen-specific B cells in the pool of naive B cells included in the culture, which can be difficult to achieve due to their low frequency. Alternatively, the system could be used to study antigen-specific recall responses using antigen-specific memory B cells as starting material. Our group has evaluated this approach in a recent publication (Marsman et al., Front. Immunol 2022).

      [..] B cell differentiation may also influence to cell cycle regulation. Rather than normalize its effect, can authors analyze effect of cell cycle in B cell differentiation? [...]

      We very much agree with the reviewer and know that the cell cycle plays a significant role in B cell differentiation output trajectories (Zhou et al, Front Immunol. 2018; Duffy et al., Science 2012). While preparing the manuscript, we in fact performed a parallel analysis in which we compared clustering and marker gene selection with and without cell cycle regression. The dataset without cell cycle regression yielded additional clusters compared to the cell-cycle-regressed dataset (figure below). However, when overlaying the clusters obtained with the cell-cycle-regressed dataset, the extra clusters turned out to be the same cell populations, now split into cycling and non-cycling cells: cluster 2 is now divided into the cycling cluster “c” and the non-cycling cluster “d”, while clusters 4 and 5 are now divided into the cycling clusters “e” and the non-cycling cluster “f”. A comprehensive examination of the expression of the top 50 genes associated with antibody-secreting cells in the (non)cycling clusters 4 and 5 reveals that these genes are expressed at a higher level in (non)cycling cluster 5 as compared to cluster 4. This suggests that the cells within cluster 5 are more advanced in their differentiation, regardless of their cell cycle state. This finding led us to present the cell-cycle-regressed data in the manuscript. Should the reviewer so desire, we are very willing to add supplementary figures to the manuscript that show the un-regressed representation.
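
      As a rough illustration of what "cell cycle regression" means in the comparison above, the sketch below removes the linear contribution of a per-cell cycle score from one gene's expression and keeps the residuals. It is a minimal single-covariate sketch in plain Python and not the pipeline used in the manuscript; the function name `regress_out`, the single score, and the toy values are all assumptions made for illustration.

```python
def regress_out(expression, cycle_score):
    """Return the residuals of `expression` after an ordinary
    least-squares fit on `cycle_score` (slope plus intercept)."""
    n = len(expression)
    mean_x = sum(cycle_score) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in cycle_score)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(cycle_score, expression))
    # If the score is constant there is nothing to regress out.
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(cycle_score, expression)]


# Hypothetical toy data: four cells, one gene, one cell cycle score.
cycle_score = [0.0, 1.0, 2.0, 3.0]
expression = [1.0, 3.1, 4.9, 7.0]
residuals = regress_out(expression, cycle_score)
```

      Clustering on such residuals groups cells by what remains after the cycle-related trend is removed, which is why cycling and non-cycling cells of the same population can fall into one cluster after regression and into separate clusters without it.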

      Figure legend: A-C) UMAP projection of single-cell transcriptomes of in vitro differentiated human naive B cells without cell cycle regression. Each point represents one cell, and colors indicate graph-based cluster assignments identified without cell cycle regression (A), with cell cycle regression (B) or with cell cycle regression and additional subdivision into cycling and non-cycling cells (C). D) Dot plot showing the top 50 differentially expressed genes in cycling and non-cycling cells from clusters 4 and 5. Point size indicates the percentage of cells in the cluster expressing the gene; color indicates average expression.

    1. Author Response

      Reviewer #1 (Public Review):

      Doostani et al. present work in which they use fMRI to explore the role of normalization in V1, LO, PFs, EBA, and PPA. The goal of the manuscript is to provide experimental evidence of divisive normalization of neural responses in the human brain. The manuscript is well written and clear in its intentions; however, it is not comprehensive and limited in its interpretation. The manuscript is limited to two simple figures that support its conclusions. There is no report of behavior, so there is no way to know whether participants followed instructions. This is important as the study focuses on object-based attention and the analysis depends on the task manipulation. The manuscript does not show any clear progression towards the conclusions and this makes it difficult to assess its scientific quality and the claims that it makes.

      Strengths:

      The intentions of the paper are clear and the design of the experiment itself is simple to follow. The paper presents some evidence for normalization in V1, LO, PFs, EBA, and PPA. The presented study has laid the foundation for a piece of work that could have importance for the field once it is fleshed out.

      Weakness:

      The paper claims that it provides compelling evidence for normalization in the human brain. Very broadly, the presented data support this conclusion; for the most part, the normalization model is better than the weighted sum model and a weighted average model. However, the paper is limited in how it works its way up to this conclusion. There is no interpretation of how the data should look based on expectations, just how it does look, and how/why the normalization model is most similar to the data. The paper shows a bias in focusing on visualization of the 'best' data/areas that support the conclusions whereas the data that are not as clear are minimized, yet the conclusions seem to lump all the areas in together and any nuanced differences are not recognized. It is surprising that the manuscript does not present illustrative examples of BOLD series from voxel responses across conditions given that it is stated that it is modeling responses to single voxels; these responses need to be provided for the readers to get some sense of data quality. There are also issues regarding the statistics; the statistics in the paper are not explicitly stated, and from what information is provided (multiple t-tests?), they seem to be incorrect. Last, but not least, there is no report of behavior, so it is not possible to assess the success of the attentional manipulation.

      We appreciate the reviewer’s feedback on providing more information so that the scientific quality of our work can be assessed. We have now added a new figure including BOLD responses in different conditions, as well as how we expected the data to look and the interpretations. To provide extra evidence for data quality and reliability, we have included BOLD responses of different conditions for odd and even runs separately in the supplementary information.

      In order to avoid any bias in presentation, we have now visualized the results from all areas with the same size and in a more logical order. However, we have also modified all results to include only those voxels in each ROI that were active for the stimuli presented in the main task based on the comment of one of the reviewers. According to the current results, there is no difference in the efficiency of the normalization model in different regions, which we have reported in the results section.

      Regarding the statistics, we have corrected the problem. We have performed ANOVA tests, have corrected all results for multiple comparisons, and have added a statistics subsection in the methods section to explicitly explain the statistics.

      Finally, we have added the report of the reaction time and accuracy in the results section and the supplementary information. As stated, average performance was above 86% in all conditions, confirming that the participants correctly followed the instructions and that the attentional manipulation was successful.

      We hope that the reviewer would find the manuscript improved and that the new analyses, figures, and discussions would address the reviewer’s concerns.

      Reviewer #2 (Public Review):

      My main concern regarding the interpretation of these results has to do with the sparseness of data available to fit with the models. The authors pit two linear models against a nonlinear (normalization) model. The weighted average and summed models are both linear models doomed to poorly match the fMRI data, particularly in contrast to the nonlinear model. So, while I appreciate the verification that responses to multiple stimuli don't add up or average each other, the model comparisons seem less interesting in this light. This is a particularly salient issue because the model testing endeavor seems rather unconstrained. A 'true' test of the model would likely need a whole range of contrasts tested for one (or both) of the stimuli. Otherwise, as it stands we simply have a parameter (sigma) that instantly gives more wiggle room than the other models. It would be fairer to pit this normalization model against other nonlinear models. Indeed, this has already been done in previous work by Kendrick Kay, Jon Winawer and Serge Dumoulin's groups. So far, my concern above has only been in regards to the "unattended" data. But the same issue of course extends to the attended conditions. I think the authors need to either acknowledge the limits of this approach to testing the model or introduce some other frameworks.

      We thank the reviewer for their feedback. We have taken two approaches to answer this concern. First, we have included simulations of neural population responses to attended and unattended stimuli. The results demonstrate that with our cross-validation approach, the normalization model is only a better fit if the computation performed at the neural level for multiple-stimulus responses is divisive normalization. Otherwise, the weighted sum or the weighted average models are better fits to the population response when the neurons respectively sum or average responses. These results suggest that the normalization model provides a better fit to the data because the underlying computation performed by the neurons is divisive normalization, not because of the model’s non-linearity.

      In a second approach, we tested a nonlinear model, which was a generalization of the weighted sum and the weighted average models with an extra saturation parameter (with even more parameters than the normalization model). The results demonstrated that this model was also a worse fit than the normalization model.
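
      To make the three competing accounts concrete, the sketch below writes each one as a small Python function. The exact equations are not given in this response, so the forms used here are assumptions: a simplified normalization model in which attention scales a stimulus by a gain `beta`, contrast by `c`, and a semi-saturation constant `sigma` is added to the normalization pool, next to attention-weighted sum and average models. All names, parameter values, and default arguments are hypothetical.

```python
def weighted_sum(r_p, r_n, beta_p=1.0, beta_n=1.0):
    """Paired response as a weighted sum of the isolated responses."""
    return beta_p * r_p + beta_n * r_n


def weighted_average(r_p, r_n, beta_p=1.0, beta_n=1.0):
    """Paired response as a weighted average of the isolated responses."""
    return (beta_p * r_p + beta_n * r_n) / (beta_p + beta_n)


def normalization(l_p, l_n, c_p=1.0, c_n=1.0,
                  beta_p=1.0, beta_n=1.0, sigma=0.5):
    """Simplified divisive normalization (assumed form): the
    attention-weighted drive divided by the pooled drive plus sigma."""
    drive = beta_p * c_p * l_p + beta_n * c_n * l_n
    pool = beta_p * c_p + beta_n * c_n + sigma
    return drive / pool


# Hypothetical drives for a preferred (p) and a null (n) stimulus.
l_p, l_n = 2.0, 1.0
iso_p = normalization(l_p, l_n, c_n=0.0)             # preferred alone
iso_n = normalization(l_p, l_n, c_p=0.0)             # null alone
pair_unattended = normalization(l_p, l_n)            # both, no attention
pair_attend_p = normalization(l_p, l_n, beta_p=2.0)  # attend preferred
pair_attend_n = normalization(l_p, l_n, beta_n=2.0)  # attend null
```

      Under these assumed forms, the paired response of the normalization model lies between the two isolated responses and shifts toward the attended stimulus, whereas the weighted sum of the isolated responses exceeds both of them. A cross-validated comparison would then fit each function's free parameters on part of the data and score its predictions on the held-out part.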

      Regarding the reviewer’s comment on testing for a range of contrasts, as we have emphasized now in the discussion, here, we have used single-, multiple-, attended- and unattended-stimulus conditions to explore the change in response and how the normalization model accounts for the observed changes in different conditions. While testing for a range of contrasts would also be interesting, it would need a multi-session fMRI experiment to test for a range of contrasts with isolated and paired stimulus conditions in the presence and absence of attention. Moreover, the role of contrast in normalization has been investigated in previous studies, and here we added to the existing literature by exploring responses to multiple objects, and investigating the role of attention. Finally, since the design of our experiment includes presenting superimposed stimuli, the range of contrasts we can use is limited. Low-contrast superimposed stimuli cannot be easily distinguished, and high-contrast stimuli block each other.

      We hope that the reviewer would find the manuscript improved and that the new models, simulations, analyses, and discussions would address the reviewer’s concerns.

      Reviewer #3 (Public Review):

      In this paper, the authors model brain responses for visual objects and the effect of attention on these brain responses. The authors compare three models that have been studied in the literature to account for the effect of attention on brain responses to multiple stimuli: a normalization model, a weighted average model, and a weighted sum model.

      The authors presented human volunteers with images of houses and bodies, presented in isolation or together, and measured fMRI brain activity. The authors fit the fMRI data to the predictions of these three models, and argue that the normalization model best accounts for the data.

      The strengths of this study include a relatively large number of participants (N=19), and data collected in a variety of different visual brain regions. The blocked design paradigm and the large number of fMRI runs enhance the quality of the dataset.

      Regarding the interpretation of the findings, there are a few points that should be considered: 1) The different models that are being studied have different numbers of free parameters. The normalization model has the highest number of free parameters, and it turns out to fit the data the best. Thus, the main finding could be due to the larger number of parameters in the model. The more parameters a model has, the higher "capacity" it has to potentially fit a dataset. 2) In the abstract, the authors claim that the normalization model best fits the data. However, on closer inspection, this does not appear to be the case systematically in all conditions, but rather more so in the attended conditions. In some of the other conditions, the weighted average model also appears to provide a reasonable fit, suggesting that the normalization model may be particularly relevant to modeling the effects of attention. 3) In the primary results, the data are collapsed across five different conditions (isolated/attended for preferred and null stimuli), making it difficult to determine how each model fares in each condition. It would be helpful to provide data separately for the different conditions.

      We thank the reviewer for their feedback.

      Regarding the reviewer’s concern about the number of free parameters, we have introduced a simulation approach, demonstrating that with our cross-validation approach, a model with a higher number of parameters is not a good fit when the underlying neural computation does not match the computation performed by the model. Moreover, we have now included another nonlinear model with 5 parameters that performs worse than the normalization model. Besides, we have used the AIC measure in addition to cross-validation for model comparison, and the AIC measure confirms the previous results.
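
      For readers unfamiliar with the AIC, the sketch below shows how the measure penalizes additional free parameters. It uses the common least-squares form, AIC = n ln(RSS/n) + 2k (up to an additive constant), which assumes Gaussian residuals; whether the manuscript uses exactly this form is an assumption, and the function name and toy numbers are hypothetical.

```python
import math


def aic_least_squares(observed, predicted, n_params):
    """AIC for a least-squares fit, up to an additive constant:
    n * ln(RSS / n) + 2 * k. Lower values indicate a better trade-off
    between fit quality and number of parameters.
    Note: undefined when the fit is perfect (RSS == 0)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical responses and predictions from two models that fit
# equally well but differ in their number of free parameters.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
aic_3_params = aic_least_squares(observed, predicted, n_params=3)
aic_5_params = aic_least_squares(observed, predicted, n_params=5)
```

      With identical residuals, the 5-parameter model scores worse by exactly 2 per extra parameter, so a model with more parameters is only preferred when it reduces the residual error enough to offset that penalty.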

      Regarding the difference in the efficiency of the normalization model across conditions, after selecting the voxels that were active during the main task in each ROI (done according to the suggestion of one of the reviewers to compensate for the difference in size of localizer and task stimuli), we observed that the normalization model was a better fit for both attended and unattended conditions. However, since the weighted average model results were also close to the data in unattended conditions, we have discussed the unattended condition separately and have discussed the relevance of our results to previous reports of multiple-stimulus responses in the absence of attention.

      Finally, concerning model comparison for different conditions, we have calculated the models’ goodness of fit across conditions for each voxel. The reason for calculating the goodness of fit in this manner was to evaluate model fits based on their ability in predicting response changes with the addition of a second stimulus and with the shifts of attention. Since correlation is blind to a systematic error in prediction for all voxels in a condition, calculating the goodness of fit across voxels would lead to misinterpretation. We have now included a figure in the supplementary information illustrating the method we used for calculating the goodness of fit.
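
      The point that correlation is blind to a systematic prediction error can be checked with a few lines of Python. In this hypothetical example, a model overpredicts every voxel in one condition by the same amount: the correlation across voxels is still perfect, although every single prediction is wrong. The data values and function names are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(xs, ys):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical responses of four voxels in one condition, and a model
# that overpredicts every voxel by the same constant offset.
measured = [1.0, 2.0, 4.0, 3.0]
predicted = [value + 5.0 for value in measured]

r_across_voxels = pearson(measured, predicted)
mse_across_voxels = mean_squared_error(measured, predicted)
```

      Here `r_across_voxels` is 1 even though the mean squared error is 25, which is why a goodness of fit computed across voxels within a condition could look excellent for a model that misses the condition's overall response level.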

      We hope that the reviewer would find the manuscript improved and that the new analyses, simulations, figures, and discussions would address the reviewer’s concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Braet et al provide a rigorous analysis of SARS-CoV-2 spike protein dynamics using hydrogen/deuterium exchange mass spectrometry. Their findings reveal an interesting increase in the dynamics of the N-terminal domain that progressed with the emergence of new variants. In addition, the authors also observe an increase in the stabilization of the spike trimeric core, which they identify originates from the early D614G mutation.

      Overall this is a timely and interesting exploration of spike protein dynamics, which have so far remained largely unexplored in the literature.

      What I find a bit missing in this manuscript is a link between how the identified changes in protein dynamics lead to increased viral fitness. While there are some possibilities listed in the discussion, I think these should be elaborated upon further. In addition, it should also be discussed how understanding the changes in the spike protein dynamics could have implications for the development of small molecule inhibitors for the virus.

      We have added information to the introduction and conclusion to connect our observations more clearly to the function and viral fitness of the spike protein. We have also connected specific mutations to observed function. We have reorganized the discussion for increased clarity and to better relate our observations to viral fitness.

      Reviewer #2 (Public Review):

      The study systematically looks at dynamic differences across variants longitudinally and the authors appropriately only limit their analyses to peptides that are conserved across the different variants.

      There are some concerns listed below, particularly related to the reported ensemble heterogeneity, that need considerable revision.

      1) The authors explain that cold-temperature treatment of the S trimer ectodomain constructs has been shown to lead to instability and heterogeneity. They also show this with a comparison of untreated vs. 3-hour 37 ℃ treated samples. I'm confused as to why "During automated HDXMS experiments protein samples were stored at 0 ℃". Will this not cause issues in protein heterogeneity, where the longer the protein sits at 0 ℃ the more potential heterogeneity there will be, and thus greatly confound the analysis?

      We thank the reviewer for highlighting this point. We have carefully examined and reevaluated our analysis of both wild-type and variant spike HDXMS. During automated HDXMS experiments, protein samples are indeed maintained at 0 ℃ between runs and replicates for fixed periods of time (4 h per replicate). In the case of WT S, we did observe conformational heterogeneity between replicates (Figure 2- figure supplement 6), as correctly pointed out by the reviewer. We have repeated the analysis of WT S without 0 ℃ incubation in automated HDXMS experiments. In the revised manuscript, Figure 2 shows the more homogeneous conformation of WT S when not incubated at 0 ℃ between replicates. Extension of these analyses to D614G (Figure 2- figure supplement 7) and all subsequent variants, each of which contains D614G, showed almost no conformational heterogeneity.

      We have included a detailed description (lines 237-244 of the revised manuscript) of the effects of 0 ℃ incubation on HDXMS of WT S.

      Our results revealed that WT S was more sensitive to cold denaturation as described previously [Costello et al. 2021] where the reported half-life for conformational transitions after 0 ℃ incubation was 17 hours. We had not anticipated conformational heterogeneity revealed by deuterium exchange when using an automated HDXMS setup. Upon further review, we see a significant ensemble shift in trimer stalk peptides for the second and third replicates which sat at 0 ℃ for 4 and 8 hours respectively. This is only observed in WT but not any of the other variant S samples. We thank the reviewer for pointing this out and strengthening our conclusions.

      2) The authors presume that the bimodal spectra that are observed reflect EX1 kinetics, however, there can be multiple reasons for an apparent bimodal distribution in the spectra. I agree that some of the spectra indicate that more than a single species is present, but what the two populations represent is murky. In Figure 2D, the apparent size of the highly deuterated population gets larger going from the 60 sec to the 600-sec spectra, as expected for an EX1 transition. However, in Figure 3D the WT highly deuterated population gets smaller going from the 60-sec to the 600-sec spectra. Were bimodal examples observed beyond those shown in Figure 2?

      We agree with the reviewer. The appearance of bimodal spectra in deuterium exchange of S protein peptides in WT S is not a result of EX1 kinetics alone. We have revised the explanation for the presence of the bimodal spectra. These are largely a consequence of automated HDXMS workflows that included 0 ℃ incubations for short periods of time between replicates. We report new experiments in which we eliminated 0 ℃ incubations by incubating at 20 ℃ between replicates and observed much lower conformational heterogeneity.

      Consequently, the shifts in bimodal spectra in figure 3D for WT S are also likely a consequence of automated HDX MS experiments with 0 ℃ incubation. We have carried out new experiments without 0 ℃ incubation, and these are shown in a revised figure 3. Even without 0 ℃ incubation, we do see bimodal spectra for certain peptides [figure 2 – S5]. These reflect an ensemble of prefusion and splayed conformations of WT S. Lack of baseline resolution precludes application of HDexaminer to resolve spectral envelopes quantitatively.

      3) How were the spectra that appeared broadened analyzed? There is no description of this in the methods, and the only data shown for this is in table 1. The left/right percentages are reported without any description of how they were obtained. Are these solely from a single spectrum? The most alarming issue is that Table 1B reports 9.4% for the right population of the 988-998 peptide, but the corresponding spectra in Figure 3D doesn't seem to have any highly deuterated population at all.

      We agree with the reviewer. We have removed the HDExaminer analysis of spectral broadening. Some of the spectral broadening was a consequence of 0 ℃ incubation in automated HDXMS analyses. These have been revised in new supplemental figures for wild-type HDXMS. Lack of baseline resolution precludes effective quantitation of spectral envelopes; Figure 2-figure supplement 5 highlights the spectral broadening qualitatively for the reader’s benefit.

      4) The authors state on page 12: "Replicate analysis of stabilized S trimers with incubation at 4C prior to deuterium exchange (see methods) showed a time-dependent reversal of stabilization as reported previously (Costello et al., 2022), most evident at the same peptides." Is this data shown anywhere? If not then it should be included somewhere, possibly in table 1 as I would expect the cold treatment to offset the left/right population sizes.

      We note that this statement was misleading and have revised the text. The time-dependent reversal of stabilization has been described previously (Costello et al., 2022) and is not part of this study.

      5) The authors state that peptide 899-913 'exhibits a slow conformational interconversion (time scale ~ 15-30 min)'. Where did this estimated rate come from? From the data shown and the limited number of time points, I don't think there is sufficient sampling of this conformational transition to really narrow down the exact timescale, especially since the ratio of left/right populations is so dependent on the pre-treatment of the sample prior to deuterium exchange. (See 1st comment)

      We thank the reviewer. The heterogeneity in deuterium exchange is attributable to the variable 0 °C incubation times in our automated HDXMS workflow. We have removed any explanations of conformational interconversion occurring in our experimental timescales.

      6) The woods plots presented in the Supporting information: (Figures 2-S4, 2-S5, 3-S4, 4-S2, 5-S2, 6-S2) are not conventional Woods plots. Normally the plots would indicate a global threshold for what is deemed to be significant based on the overall error in the dataset. From what I gather the authors used error within an individual peptide to establish significance for each specific peptide, which would be okay, but the authors don't describe the number of replicates or how the p-value was calculated. I would strongly recommend that the authors instead rely on a hybrid significance testing approach, as described recently: (PMID 31099554). What's really alarming with the current approach is that several of the Woods plots shown have data points found to be significantly different that are right at zero on the y-axis.

      We thank the reviewer. We have replaced all of the Woods plots with volcano plots. We have now applied a hybrid significance testing approach as recommended by the reviewer.

      7) Table 1: The summary of the peptides with observed bimodal behavior should include data from the replicates, particularly for assessment of how consistent the left/right population sizes are across replicates. Instead of just a percentage, the table should report an average and the standard deviation from the replicate measurements. Furthermore, the table should also include peptides that are overlapping with those presented. Based on Figure 2-figure supplement 1, there are at least two other peptides that cover the 899-913 region. These additional peptides should show a similar trend with bimodal profiles and will be important for showing how reproducible the apparent EX1 kinetics are in the dataset.

      All available replicates and overlapping peptides should be analyzed to ensure that these percentages reported are consistent across the data. It is also odd that the authors choose to use the 3+ charge state of the WT, but the 2+ for the D614G mutant. If both charge states were present, then both of them should be analyzed to ensure the population distributions are consistent within different charge states.

      We thank the reviewers for their suggestion. We have removed Table 1 since bimodal spectra are not resolvable for quantitation as described previously. We instead show spectra of overlapping peptides in these regions for interpretation by the reader.

      We show charge states that provide highest intensity for the peptides (Figure 2-figure supplement 5, Figure 3-figure supplement 3, Figure 4-figure supplement 3, Figure 5-figure supplement 3, Figure 6-figure supplement 3).

      8) The method for calculating p-values used to assess the significance of a difference in observed deuterium uptake is not described. The manuscript mentions technical replicates, but no specific information as to how many replicates were collected for each time point. These details should be included as they are also part of the summary table that is recommended for the publication of HDX data.

      We have utilized hybrid significance testing as suggested by the reviewers to determine significance as outlined by Hageman et al. We have included this in table S3 and in the text.

    1. Author Response

      Reviewer #1 (Public Review):

      Major points:

      1) How STC1 controls changes in MSCs' ability for hampering CAR-T cell-mediated anti-tumor responses is unclear.

      In this study, we demonstrated that the presence of STC1 is critical for MSCs to exert their immunosuppressive role by inhibiting cytotoxic T cell subsets, activating key immune suppressive/escape related molecules such as IDO and PD-L1, and crosstalking with macrophages in the TME. These immunosuppressive functions of MSCs could be significantly hampered when the STC1 gene was knocked down. Considering that stanniocalcin-1 is a glycoprotein hormone that is secreted into the extracellular matrix in a paracrine manner, we conclude that the role of STC-1 is not to alter the function of MSCs intracellularly. Rather, it facilitates the immunosuppressive capabilities of MSCs through extracellular secretion into the TME as a pleiotropic factor, thus impacting the functioning of T cells, cancer cells and other immune cells.

      The reviewer's question is well taken, and we have added the points mentioned above to the Discussion section to ensure a more comprehensive conclusion. Moreover, a recent study published in Cancer Cell, which was suggested by the other reviewer, is consistent with our results. It has provided further mechanistic information on how stanniocalcin-1 impacts immunotherapy efficacy and T cell activation. The reference has been cited and discussed as shown below.

      "In this model, activated macrophages or stress signals during CAR-T therapy may prompt MSCs to secret staniocalcin-1 into the extracellular matrix of TME, serving as a pleiotropic factor to negatively impact the function of T cells and stimulate the expression of molecules that inactivate immune responses, ultimately providing an immunosuppressive effect of MSC." (page 22, highlighted). "In line with our study, it was recently reported that stanniocalcin-1 negatively correlates with immunotherapy efficacy and T cell activation by trapping calreticulin, which abrogates membrane calreticulin-directed antigen presentation function and phagocytosis [50]." (Page 20, highlighted)

      2) Is ROS important? It is not tested directly.

      ROS, which are released by neutrophils and macrophages, play an important role during the immune response. Not only do they act as key mediators of the adaptive immune response, but they also have the ability to modulate the activation of B cells and T cells. In our study, we suggest that ROS may be involved in NLRP3 inflammasome activation and the expression and secretion of STC1. Although we did not pursue this line of inquiry further as it was beyond the scope of our paper, we have included additional relevant research in the Discussion, and a reference is provided.

      "It has been proved that the expression and secretion of STC1 in multiple cell lines can be stimulated by external stimuli, including cytokines and oxidative stress [26]." (Page 21, highlighted)

      3) The changes in CD8 and Treg are not convincing. Moreover, it is not tested how these changes can be elicited by the presence of MSCs.

      We have included additional in vivo data to assess the levels of Treg cells and CD8+ T cells in this revised manuscript. This not only confirms the alterations of CD8+ and Treg cells, but also offers an additional line of evidence to further analyze the influence of MSCs on CAR-T cells in vivo. The findings are presented in Figure 4B, and the corresponding discussion can be found on Page 17 (highlighted).

      Reviewer #2 (Public Review):

      Major points:

      1) STC-1 is expressed and secreted by many human cancer cells. This should be discussed in the introduction or discussion with more inter-related background info on both its regulation in cancer cells and secretion pattern into TME. It is important because you state that the STC-1 secreted by MSC has such strong functions, then how about those produced and secreted by cancer cells? Are those also stimulated by macrophages or other components in TME? Do they have possible functions in helping cancer cell to escape the immune surveillance mechanisms?

      Thanks for the suggestion. We have added more details about the regulation and secretion of STC-1 in cancer cells (see below). The information is added to both the introduction and discussion (highlighted on pages 4 and 21), and all the above questions are addressed.

      "It was proved that STC1 is involved in several oxidative and cancer-related signaling pathways such as NF-κB, ERK, and JNK pathways [26,27]. The expression and secretion of STC1 in cancer tissue can be stimulated by external stimulus including external cytokines and oxidative stress [26]. Under hypoxia conditions, STC1 could be modulated by HIF-1 to facilitate the reprogramming of tumor metabolism from oxidative to glycolytic metabolism [28]. STC1 was also reported to participate in the process of epithelial-to-mesenchymal transition (EMT), which is associated with tumor invasion and the reshape the tumor microenvironment, as well as increasing therapy resistance [29]." (Page 4)

      "It has been proved that the expression and secretion of STC1 in multiple cell lines can be stimulated by external stimuli including cytokines and oxidative stress [26]." (Page 21)

      2) In Figure 4B, using a single marker of IL-1β to show the immune suppressive capability of MSC in vivo is not sufficient, staining for CD4+ and CD8+ should also be included to demonstrate whether MSC could modulate T cell compositions, which can give more direct evidence about MSC's impacts on CAR-T cell.

      The above experiments were done as suggested, and the data are presented in Figure 4B. The results are explained in the Results section (page 17) and the Discussion section (page 21) (highlighted).

      3) One of the major risks associated with CAR-T therapy is an excessive immune response that causes cytokine release syndrome. MSCs have been used in clinics as a way to suppress immune response including post-CAR-T. What does the author think about using MSC with STC-1 knockout? Can it still help reduce toxicity while maintaining CAR-T efficacy? This might be a potential application.

      This is definitely an interesting idea. Based on the data presented in the current study, it is clear that knockdown of STC-1 would abrogate the immune-suppressive impact of MSC, and therefore affect CAR-T efficacy. However, whether the presence of MSC can still help reduce cytokine release syndrome once STC-1 function is lost requires further study. We agree with the reviewer, and we have briefly discussed this possibility at the very end of the discussion as shown below (Page 22, highlighted).

      "… the findings we presented here are no doubt that would have potential clinical applications toward improving the efficiency of CAR-T therapy as well as reducing the excessive toxicity by modulating the level of STC1 in TME".

      4) There was a recent study published in Cancer Cell (Lin et al. Stanniocalcin 1 is a phagocytosis checkpoint driving tumor immune resistance. 2021), and they also reported that STC1 negatively correlates with immunotherapy efficacy and patient survival. It should be cited, and in fact, it provided support to the authors' present study with completely different experimental settings.

      Thanks for providing this important information. It is an excellent study and consistent with our findings. The reference was added and discussed on page 20 (highlighted) as shown below.

      "In line with our study, it was recently reported that stanniocalcin-1 negatively correlates with immunotherapy efficacy and T cell activation by trapping calreticulin, which abrogates membrane calreticulin-directed antigen presentation function and phagocytosis [50]"

    1. Author Response

      Reviewer #1 (Public Review):

      This theoretical (computational modelling) study explores a mechanism that may underlie beta (13-30Hz) oscillations in the primate motor cortex. The authors conjecture that traveling beta oscillation bursts emerge following dephasing of intracortical dynamics by extracortical inputs. This is a well written and illustrated manuscript that addressed issues that are both of fundamental and translational importance.

      We are pleased by the reviewer’s judgement about the importance of the question that we consider and about the presentation of our manuscript.

      Unfortunately, existing work in the field is not well considered and related to the present work. The rationale of the model network follows closely the description in Sherman et al (2016). The relation (difference/advance) to this published and available model needs to be explicitly made clear. Does the Sherman model lack emerging physiological features that the new proposed model exhibits?

      We view the work of Sherman et al (2016) and ours as complementary. Sherman et al propose a model of a single E-I module, using the terminology of our manuscript, that is much more detailed than ours since it approximately accounts for the layered structure of the cortex using two layers of multi-compartment spiking neurons, each comprising 100 excitatory neurons and 35 inhibitory neurons. This allows a detailed comparison of the model with local MEG signals. We used a much simpler description and only describe the population behavior of the local E and I neuron populations in each module. However, unlike Sherman et al.’s model, this allows us to address the spatial aspect of beta oscillations, which is the main target of our work. Our simple description of a local E-I module allows us to consider several hundred E-I modules with a spatially structured connectivity and to analyze the spatio-temporal characteristics of beta activity. We have now described the relation of our work to Sherman et al (2016) in the discussion section (lines 540-547).

      The authors may also note the stability analysis in: Yaqian Chen et al., “Emergence of Beta Oscillations of a Resonance Model for Parkinson’s Disease”, Neural Plasticity, vol. 2020, https://doi.org/10.1155/2020/8824760

      We thank the reviewer for pointing out this paper that had escaped our notice. It presents the stability analysis of a single E-I module with propagation delay (and instantaneous synapses). At the mathematical level, the analysis brings little as compared to the much older article of Geisler et al., J Neurophys (2005) that we cite. However, the model specifically proposes to describe beta oscillations in the motor cortex as arising from the interaction between excitatory and inhibitory neurons, as we do. Therefore, we included this reference as well as a reference to the previous work of Pavlides et al., PLoS Comp Biol (2015) where the model was developed.

      The model-based analysis of the traveling nature of the beta frequency bursts appears to be the most original component of the manuscript. Unfortunately, this is also the least worked out component. The phase velocity analysis is limited by the small number (10 x 10) of modeled (and experimentally recorded) sites and this needs to be acknowledged. How were border effects treated in the model and which are they?

      We thank the reviewer for these points, which gave us the opportunity to clarify them and improve our manuscript. As described in Methods: Simulations (line 847 and seq.) and shown in Fig. S2 (Fig. S10 in the original submission), we actually simulated our model on a 24 × 24 grid and did all our measurements in a central 10×10 grid to take into account that the electrode covers only part of the motor cortex. In addition, to minimize border effects, we added on each side of the 24×24 grid two rows of E-I modules kept at their (non-oscillating) fixed points of stationary activity, as depicted in Fig. S2. In order to address the concern of the reviewer, and to check that border effects indeed had a minimal impact on our results, we have performed a new set of simulations on a 24×24 grid with periodic boundary conditions. The results are shown in the new supplementary Fig. S9 and are indistinguishable from those reported in the main text and figures. In particular, the proportion of the different wave types and the wave speeds are unaffected by this change of boundary conditions. A paragraph has been added in the revised version (lines 371-378) to discuss this point.
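
      The grid geometry described above can be sketched as follows. This is only an illustration and is not taken from the authors' simulation code: the function names are hypothetical, distances are assumed to be measured in units of the grid spacing, and only the periodic-boundary variant is shown (not the two rows of fixed-point modules).

```python
def central_window(grid_size=24, window=10):
    """Return the (row, col) indices of the central window x window block."""
    start = (grid_size - window) // 2
    return [(r, c)
            for r in range(start, start + window)
            for c in range(start, start + window)]


def periodic_distance(a, b, grid_size=24):
    """Euclidean distance between two sites on a grid with periodic boundaries."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # On a torus, the shorter way around is used along each axis.
    dr = min(dr, grid_size - dr)
    dc = min(dc, grid_size - dc)
    return (dr * dr + dc * dc) ** 0.5


sites = central_window()
print(len(sites), sites[0], sites[-1])
print(periodic_distance((0, 0), (23, 0)))
```

      With periodic boundaries, sites on opposite edges of the grid become neighbours, so no module sits at a border; measurements are still restricted to the central block.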

      How much of the phase velocities are due to unsynchronized random fluctuations? At least an analysis of shuffled LFPs needs to be performed.

      The phase velocities are indeed due to unsynchronized random fluctuations (coming from the finite number of neurons in each of our modules as well as, and more importantly, from the uncorrelated local external inputs). In order to check that the spatial structure of connectivity was important, we followed the suggestion of the reviewer and also performed a new set of simulations to provide a further test. As proposed by the reviewer, after performing the simulations we shuffled in space the signals of the different electrodes and also did a parallel analysis where we shuffled the signals from different electrodes in the recording. We then reclassified the shuffled simulations/recordings in exactly the same way as the original ones. As shown in the new additional Fig. S16, this resulted in the full elimination of time frames classified as “planar waves” both in the model and in the experimental recordings. Additionally, it only slightly modified the proportion of “synchronized” or “random” episodes, which is intuitively understandable since shuffling does not change the nature of these states. In order to further assess the impact of connections between modules, we also decided to suppress them, namely to put their range l to zero. In order to avoid modifying the working point of a local module by this manipulation, we focused on the case without propagation delay. Without long-range connections, the local dynamics of each module is only slightly modified. However, as shown in the new Fig. S18a, synchronization between neighboring modules is strongly decreased and the proportion of the different wave types is entirely changed: synchronized states and planar waves disappear and are replaced by random states. These results are described in two new paragraphs (lines 401-414 and lines 431-435).
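
      A minimal sketch of the spatial shuffling control, under assumptions: each electrode's time series is kept intact and only its position on the array is reassigned at random. The function name, the dictionary layout, and the toy numbers are hypothetical and are not the authors' analysis code.

```python
import random


def shuffle_channels(signals, seed=0):
    """Reassign whole time series to random electrode positions.

    signals: dict mapping (row, col) -> list of samples.
    Each time series is left untouched; only its position changes,
    which destroys spatial structure but preserves temporal structure.
    """
    rng = random.Random(seed)
    positions = list(signals)
    series = [signals[p] for p in positions]
    rng.shuffle(series)
    return dict(zip(positions, series))


# Toy 2 x 2 "array" with 3 samples per channel (made-up numbers).
toy = {(0, 0): [0.1, 0.2, 0.3], (0, 1): [1.1, 1.2, 1.3],
       (1, 0): [2.1, 2.2, 2.3], (1, 1): [3.1, 3.2, 3.3]}
shuffled = shuffle_channels(toy, seed=1)
```

      Because the same set of time series is present before and after shuffling, any statistic that ignores electrode positions is unchanged, while position-dependent classifications such as planar waves are expected to be disrupted.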

      Is there a relationship between the localizations of the non-global external input and the starting sites of the traveling waves?

      This is also an interesting question that parallels some asked by the other reviewers and which we did our best to address. As described in the “Essential revisions” point 5) above, we aligned all “planar wave events” in space and time with the help of the spatio-temporal phase maps of the oscillations. We did find that planar waves were preceded by an increase in the global synchronization index σp, both in simulations and in experiments. In simulations this increase also corresponded to a shift of the global inputs away from their mean, as depicted in the new Fig. 4 in the main manuscript. However, no significant average spatio-temporal profile of the local inputs emerged when we used these temporal alignments. This is presumably due to the large variability of local inputs that can give rise to planar waves. We have described these results in the new section “Properties of planar waves and characteristics of their inputs”.

      In summary, this work could benefit from a widening of its scope to eventually inspire new experimental research questions. While the model is constructed well, there is insufficient evidence to conclude that the presented model advances over another published model (e.g. Sherman et al., 2016).

      As described in the “Essential revisions” and the discussion section of the manuscript, our work highlights a number of questions that can (and hopefully will) inspire new experimental research. We also hope that we have clarified above that our model complements Sherman et al.’s model and advances it as far as the spatial aspects of beta oscillations in motor cortex are concerned.

      Reviewer #2 (Public Review):

      Kang et al. model the cortical dynamics, specifically the distributions of beta burst durations and the proportions of different kinds of spatial waves, using a firing rate model with local E-I connections and long-range, distance-dependent excitatory connections. The model also predicts that the observed cortical activity may be a result of non-stationary external input (correlated at short time scales) and a combination of two sources of input, global and local. Overall, the manuscript is very clear, concise and well written. The modeling work is comprehensive and makes interesting and testable predictions about the mechanism of beta bursts and waves in the cortical activity. There are just a few minor typos and curiosities if they can be addressed by the model. Notwithstanding, the study is a valuable contribution towards developing data driven firing rate.

      We really appreciate the positive comments of the reviewer and thank her/him for them. We have done our best to correct the typos and to address the questions raised by the reviewer.

      1) The model beautifully reproduces the proportion of different kind of waves that can be seen in the data (Fig 3), however the manuscript does not comment on when would a planar/random wave appear for a given set of parameters (eg. fixed v ext, tau ext, c) from the mechanistic point of view. If these spatio-temporal activities are functional in nature, their occurrence is unlikely to be just stochastic and a strong computational model like this one would be a perfect substrate to ask this question. Is it possible to characterize what aspects of the global/local input fluctuations or interaction of input fluctuations with the network lead to a specific kind of spatio-temporal activity, even if just empirically ?

      This is an important question that parallels some asked by the other reviewers and which we did our best to address. As described in the “Essential revisions” paragraph above, we aligned all “planar wave events” either in phase or at their starting time points. We did find that planar waves were preceded by an increase in the global synchronization index σp, both in simulations and in experiments. In simulations this increase also corresponded to a shift of the global inputs away from their mean, as depicted in the new Fig. 4 in the main manuscript. When we used the same alignment to average spatio-temporal local inputs, we did not see the emergence of any significant patterns. This presumably reflects the high variability of local inputs able to produce a planar wave.

      Do different waves appear in the same trial simulation or does the same wave type persist over the whole trial? If former, are the transition probabilities between the different wave types uniform, i.e probability of a planar wave to transit into a synchronized wave equal to the probability of a random wave into synchronized wave?

      In the same trial simulation, different types of waves indeed successively appear. The curiosity of the reviewer led us to investigate this interesting point. Since time frames classified as random or synchronized are much more numerous than the planar (and radial) wave ones, it is much more probable that a planar wave transits into a synchronized or a random pattern than the reverse process (i.e., synchronized and random patterns preferentially transit into each other). Nonetheless, we considered questions related to the one of the reviewer. What are the states preceding a planar wave event? Given that a planar wave episode is preceded by a random (or synchronous) episode, is it more likely to be followed by a random or by a synchronous event? We actually find that the entry state is prominently a synchronized state. Furthermore, when the entry state is synchronized, the exit state is also synchronized much more often than would be expected by chance. This shows that most often, planar waves are created from an underlying synchronized persistent state. This has been described in the revised manuscript (lines 443-451).
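
      The entry/exit question above can be illustrated with a small counting sketch. It assumes that every time frame already carries a wave-type label; the function name, the label strings, and the toy sequence are hypothetical and not the authors' code.

```python
from collections import Counter


def planar_entry_exit(labels, target="planar"):
    """Count (entry, exit) state pairs around each run of `target` frames.

    labels: per-frame classification, e.g. "synchronized", "random", "planar".
    Runs touching the start or end of the sequence are skipped because
    one of the two neighbouring states is unknown.
    """
    pairs = Counter()
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != target:
            i += 1
            continue
        j = i
        while j < n and labels[j] == target:
            j += 1
        if i > 0 and j < n:
            pairs[(labels[i - 1], labels[j])] += 1
        i = j
    return pairs


frames = ["synchronized", "planar", "planar", "synchronized",
          "random", "planar", "random", "synchronized", "planar"]
print(planar_entry_exit(frames))
```

      Comparing these counts with those expected from the overall frequencies of synchronized and random frames would indicate whether a given entry/exit combination occurs more often than chance.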

      2) Denker et al 2018, also reports a strong relationship between the spatial wave category, beta burst amplitude, the beta burst duration and the velocity (Fig 6E - Denker et. al), eg synchronized waves are fastest with the highest beta amplitude and duration. Was this also observed in the model ?

      We had long exchanges with Michael Denker about his analysis since there are some differences between his code and what is described in Denker et al. (2017), possibly because of several typos in the Method section of Denker et al (2017). We have checked that the results of our code agree with his, but there are some differences between the results obtained on the available datasets and those reported in Denker et al from other data sets. We have now provided the detailed statistics of the different wave types as obtained by our analysis in the simulation of model SN (Fig. S9) and SN’ (Fig. S11) and in the recordings for monkey L (Fig. S10) and monkey N (Fig. S12). In the recording data, the amplitude and speed of the synchronized and planar waves are comparable and higher than in the radial and random wave types. The duration of synchronized events is longer than that of planar waves and of the other wave types. Comparable results are obtained in the simulations, with nonetheless a few differences: the mean amplitude of planar waves is somewhat larger than that of synchronized states, and the hierarchy of durations in the different states is respected, but the durations themselves are longer in the simulations than in the recordings (about 40 % for the planar waves and almost two times longer for the synchronized states). We attribute these differences to the fact that synchronization is slightly less effective in the recordings than in the model. Long synchronization episodes in the recordings are often cut off by a few time frames where the synchronization index goes below the threshold value for a synchronized pattern. This happens rarely enough not to affect much the global statistics of the different states, but it has a much more visible effect on the measured duration of the synchronized states.

      Reviewer #3 (Public Review):

      In this manuscript, the authors consider a rate model with recurrently connected excitatory-inhibitory (E-I) modules coupled by distance-dependent excitatory connections. The rate-based formulation with adaptive threshold has been previously shown to agree well with simulations of spiking neurons, and simplifies both analytical analysis and simulations of the model. The cycles of beta oscillations are driven by fluctuating external inputs, and traveling waves emerge from the dephasing by external inputs. The authors constrain the parameters of external inputs so that the model reproduces the power spectral density of LFPs, the correlation of LFPs from different channels and the velocity of propagation of traveling waves. They propose that external inputs are a combination of spatially homogeneous inputs and more localized ones. A very interesting finding is that wave propagation speed is on the order of 30 cm/s in their model, which is consistent with the data but does not depend on propagation delays across E-I modules, which may suggest that propagation speed is not a consequence of unmyelinated axons as has been suggested by others. Overall, the analysis looks solid, and we found no inconsistency in their mathematical analysis.

      We thank the reviewer for his comments and for his expert review.

      However, we think that the authors should discuss more thoroughly how their modeling assumptions affect their result, especially because they use a simple rate-based model for both theory and simulations, and a very simplified proxy for the LFPs.

      In the revised manuscript, we have performed additional simulations to test different modeling assumptions as suggested by the reviewer and discussed further below.

      The authors introduce anisotropy in the connectivity to explain the findings of Rubino et al. (2006), showing that motor cortical traveling waves propagate preferentially along a specific axis. They introduce anisotropy in the connectivity by imposing that the long range excitatory connections be twice as long along a given axis, and they observe waves propagating along the orthogonal axis, where the connectivity is shorter range. Referring specifically to the direction of propagation found by Rubino et al, could the authors argue why we should expect longer range connections along the orthogonal axis? In fact, Gatter and Powell (1978, Brain) documented a preponderance of horizontal axons in layers 2/3 and 5 of motor cortex in non-human primates that were more spatially extensive along the rostro-caudal dimension as compared with the medio-lateral dimension, and Rubino et al. (2006) showed the dominant propagation direction was along the rostro-caudal axis. This is inconsistent with the modeling work presented in the current manuscript.

      This is an important comment and we thank the reviewer for pointing out these data in Gatter and Powell (1978). Since the experimental data show that planar wave propagation directions are anisotropically distributed, we have tried to investigate what the underlying mechanism of this anisotropy could be in the framework of our model. Anisotropy in connectivity is an obvious possibility. Given our result, and the data of Gatter and Powell, it appears however that it is not the underlying cause of the observed anisotropy direction in the motor cortex (in the framework of our model). We have thus investigated another possibility, namely that the local external inputs are anisotropically targeting the motor cortex, being more spread out along a given axis (lines 510-529 and new Fig. 5g-l). We find that planar waves propagate preferentially along the orthogonal axis. This leads us to conclude that the observed propagation anisotropy could be a consequence of the external input being more spread out along the medio-lateral axis. Data addressing this issue could be obtained using retroviral tracing techniques.

      The clarity and significance of the work would greatly improve if the authors discussed more thoroughly how their modeling assumptions affect their result. In particular, the prediction that external inputs are a combination of local and global ones relies on fitting the model to the correlation between LFPs at distant channels. The authors note that when the model parameter c=1, LFPs from distant channels are much more correlated than in the data, and thus have to include the presence of local inputs. We wonder whether the strong correlation between distant LFPs would be lower in a more biologically realistic model, for example a spiking model with sparse connectivity and a spiking external population, where all connections are distant dependent. While the analysis of such a model is beyond the scope of the present work, it would be helpful if the authors discussed if their prediction on the structure of external inputs would still hold in a more realistic model.

      This is a legitimate question that we indeed asked ourselves. In a previous work with a simpler chain model, we only considered finite-size fluctuations. We found good agreement between our simplified description of finite-size fluctuations and simulations of a spiking network with fully connected modules and sparse distance-dependent connectivity. This leads us to believe that our description of finite-size fluctuations is reliable in this setting. Assuming that this is the case, we find that with 10^4 neurons or more per module, finite-size noise is not strong enough to replace our local external inputs. Even with 2000 neurons per module, the network is very synchronized (new Fig. S15e-g). With 200 neurons per module, the intrinsic fluctuations are strong enough to replace the fluctuating local inputs (Fig. S15a-d), but this is quite a low number. Our description of local noise would have to underestimate the fluctuations in a more sparsely connected network by a significant amount for agreement with the data to be obtained without local inputs. Moreover, it seems to us quite plausible that different regions of motor cortex receive different inputs but, of course, this can only be settled by further experiments. Together with the new Fig. S15, we have added a paragraph to address this question in the manuscript (lines 379-400).

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses (major)

      1) Adding control groups (sham stimulation) to Experiment 5 and Experiment 8 would be needed to increase confidence that NITESGON's memory-enhancing effects do not depend on sleep but do depend on dopamine receptor activity.

      Thank you for highlighting this major weakness within our research; we will be sure to include control groups in future research if we conduct replication studies. Additionally, upon review of your comment, we have addressed the lack of control/sham groups in Experiment 5 and 8 in the Discussion section when acknowledging the limitations of the research.

      Please see the newly added text from the Discussion section on pages 21-22 below:

      “Moreover, it must also be acknowledged that Experiments 5 and 8 did not include a control-sham stimulation group, thus limiting the interpretation of these two experimental findings. Control-sham stimulation groups would increase our confidence in our findings that NITESGON’s memory-enhancing effects depend not on sleep but on DA receptor activity.”

      2) Task order in the interference study in Experiment 4 was randomized during the first visit for task training as well as during the memory test, however, the word-association and spatial navigation tasks used in Experiments 3 and 4 were not counterbalanced during training or memory testing. Thus, the authors cannot rule out the possibility of order effects.

      Upon reading your comment and reviewing the paper, we have decided to add a limitations paragraph to the paper which highlights the concern of Experiments 3 and 4 not being counterbalanced during training or memory testing. Additionally, the new section provides an explanation of how not counterbalancing Experiments 3 and 4 introduced the possibility of order effects being present in the results.

      Please see the new addition from the Discussion section on page 21 below:

      “When interpreting the current findings, it must be considered that some limitations exist within the research; limitations on experimental design are noted below, followed by a discussion of utilizing indirect proxy measures. The task order for Experiment 4 was randomized during the first visit for training and the recall-only memory test 7-days later; however, the word association and spatial navigation task used in Experiments 2 and 3 were not counterbalanced; therefore, the findings of Experiments 2 and 3 could have been impacted by a potential order effect.”

      3) It is unclear how Experiment 3 and Experiment 4 differ. Percent of words recalled is the measure of memory performance, however, there is not a clear measure of interference in Experiment 4 (i.e., words recalled during Memory task II that were from Memory task I).

      Thank you for highlighting the difficulty in distinguishing the differences between Experiment 3 and Experiment 4. To clarify what the differences are between Experiment 3 and Experiment 4, we explained in Experiment 4’s introductory paragraph that the object-location task used in Experiment 3 was replaced with a Japanese-English verbal associative learning task in Experiment 4.

      Please see the paragraph from the Experiment 4 subsection on page 10 below:

      “Experiments 2 and 3 revealed both retroactive and proactive memory effects 7-days after initial learning of the two tasks. To further explore if NITESGON is linked to behavioral tagging and evaluate if interference impacts NITESGON as the strong stimulus, Experiment 4 removed the object-location task used in Experiments 2 and 3 and replaced it with a Japanese-English verbal associative learning task similar to the Swahili-English verbal associative task. Considering how memory formation and persistence are susceptible to interference occurring pre-and post-encoding(37-39) and are heavily influenced by commonality amongst the learned and intervening stimuli(40); it is believed that conducting two consecutive, like-minded word-association (i.e., Swahili-English and Japanese-English) tasks will result in one’s consolidation process interfering with that of the other(41). Considering how our previous experiments suggest the effect obtained by NITESGON improves the consolidation of information via behavioral tagging, it is possible that NITESGON on the first task might help reduce the overall interference effect on the second task.”

      Additionally, we explained in further detail that comparing the percentage of correctly recalled word pairs on the second task 7-days after learning from the percentage of correctly recalled word pairs on the first task 7-days after learning was done to measure for an interference effect.

      Please see the adapted text from the Experiment 4 subsection on page 11 below:

      “Upon assessment for a potential interference effect, the active group displayed no significant difference in how many words participants were able to recall between the first and the second task (difference: .76 ± 4.93) (F = .29, p = .60), whereas the sham group demonstrated the first task rendered an interference effect on the second task (difference: 5.16 ± 5.99) (F = 14.11, p = .001).”

      Lastly, the text in the Methods section describing how the interference effect was calculated was changed. The newly edited text better explains that the percentages of word pairs recalled on the two tasks were subtracted from one another to measure the interference one task may have had on the other.

      Please see the amended text in the Methods section on page 38 below:

      “In addition, an interference effect was calculated by subtracting the percentage of correctly recalled word pairs on the second task 7-days after learning from the percentage of correctly recalled word pairs on the first task 7-days after learning. This number gave a proxy of interference.”
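
      The calculation quoted above amounts to a simple difference of percentages. The sketch below is illustrative only: the function names and the example counts are hypothetical and are not data from the study.

```python
def percent_recalled(n_correct, n_pairs):
    """Percentage of word pairs recalled correctly."""
    return 100.0 * n_correct / n_pairs


def interference_effect(first_task_pct, second_task_pct):
    """First-task recall minus second-task recall, in percentage points.

    A positive value means the second task was recalled less well
    than the first task at the 7-day test.
    """
    return first_task_pct - second_task_pct


# Made-up example: 30 of 50 pairs recalled on task 1, 27 of 50 on task 2.
first = percent_recalled(30, 50)
second = percent_recalled(27, 50)
print(interference_effect(first, second))
```

      A value near zero, as reported for the active group, indicates little difference between the two tasks, whereas a clearly positive value, as reported for the sham group, is read as interference on the second task.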

      4) In Experiment 5 the learning and test phases for the two sleep groups were conducted at different times of day (sleep group: training at 8pm and testing the next morning at 8am, sleep deprivation group: training at 8am and testing at 8pm) which introduces the possibility of circadian effects between the two groups. Additionally, the memory test occurred at the 12h point for this experiment instead of the 7-day point. Therefore, the authors' conclusions are not addressed by this experiment, and it remains unclear whether the 7-day long-term memory effects of NITESGON are sleep-dependent.

      Upon reading your comment and reviewing the paper, we have decided to add a limitations paragraph to the paper which highlights the two sleep groups being conducted at different times of day and the memory test occurring at the 12-hour point as opposed to 7-days after initial learning. In addition to acknowledging these limitations, we have also provided explanations regarding what potential effects are introduced by having the sleep groups learn and test at different times of day, such as circadian effects between the two groups, and the memory tests occurring at 12-hours rather than 7-days after initial learning.

      Please see the new addition from the Discussion section on page 21 below:

      “Additionally, in Experiment 5, the learning and test phases for the two groups were conducted at different times of day (i.e., sleep group: training at 8 p.m. and testing at 8 a.m., sleep deprivation group: training at 8 a.m. and testing at 8 p.m.), thus introducing the potential for circadian effects between the two groups. Furthermore, the recall-only memory testing occurred at the 12-hour point rather than 7-days later, allowing us to conclude that the observed effect seen 12-hours later was not affected by sleep; however, it remains unclear whether the 7-day long-term memory effects of NITESGON are sleep-dependent.”

      Weaknesses (minor)

      1) Salivary amylase is being used as a proxy of noradrenergic activity; however, salivary amylase levels increase with stress as well, which impacts memory performance. It would be helpful if the authors addressed this and whether they measured other physiological indicators of stress/sympathetic nervous system activation.

      Upon review of your comment, we have edited the paper so that it includes text in the Discussion section that brings attention to the fact that stress can enhance salivary amylase and advises readers that this should be considered when interpreting results. We also added a measure of pupil size, a well-known index of sympathetic activity. In addition, we added a VAS score to ask participants about their stress levels.

      Please see the new addition from page 22 below:

      “Although the use of indirect proxy measures, such as sAA for NA activity and sEBR for DA activity, enabled the tracking of LC-NA activity changes from baseline measurements and demonstrated the potential of an LC-DA relationship, caution must be advised when interpreting results considering these proxy measures are affiliated with limitations, such as being substantially variable, as well as the potential of other brain regions and monoamine neurotransmitters being associated with changes seen in sAA concentration levels(80), an enzyme that is provoked by both central parasympathetic and sympathetic nervous system activation, including acute stress responses(81). Additionally, although sEBR has been increasingly linked to DA, it has been defined as a more viable measure of striatal DA activity(52, 82). At the same time, some evidence suggests that sEBR and DA levels may be unrelated(83, 84), thus requiring further validation as a behavioral proxy measure.”

      2) Insufficient details of how the blinding experiment was conducted make it difficult to determine whether participants had awareness or subjective responses during the NITESGON stimulation. Adding physiological indicators of heart rate, skin conductance, and respiration would provide a better indicator of a sympathetic nervous system response. Additionally, a series of randomized stimulation and sham trials delivered to the participant would provide a more objective measure of the detectability of the stimulation.

      Thank you for your comment regarding the portion of the experiments that were included to determine the efficacy of the measures taken to ensure the experiments were well blinded. After reviewing the comment and reading over the paper, we were concerned that it was not clear enough to the reader that the efficacy of blinding was determined by having each participant of every experiment complete the same single-answer questionnaire after all NITESGON and testing had been experienced. Therefore, we edited the wording below to elucidate that there was not an individual blinding experiment but that there was a questionnaire for every participant in every experiment to help determine the efficacy of blinding for each experiment and the research.

      Please see the text from the Blinding section on pages 17-18 below:

      “Blinding. To determine if the stimulation was well blinded, all participants in Experiments 1-7 were asked to guess if they thought they were placed in the active or control group (i.e., what stimulation participants received compared to what participants expected). Our findings demonstrated that participants could not accurately determine if they were assigned to the active or sham NITESGON group in each experiment, suggesting that our sham protocol is reliable and well-blinded (see fig. 8).”

      Additionally, please see the text in the Methods section that has been reworded to clarify how the questionnaire of blinding was conducted on page 47 below:

      “Blinding: To determine if the stimulation for all experiments was well blinded, all participants who participated in Experiments 1-7 were asked to complete a single-response questionnaire after the conclusion of the NITESGON procedure. Here, participants were asked to guess if they thought they were placed in the active or control group. A χ2 analysis was used to determine if there was a difference between what stimulation participants received compared to what participants expected.”
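      As a rough illustration of the kind of χ2 analysis described above, the sketch below tests whether participants' guesses are associated with the stimulation they actually received. It assumes a 2x2 table (received active/sham by guessed active/sham), a Pearson χ2 test without continuity correction, and made-up counts; none of these details are stated in the paper.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no Yates correction).

    table[i][j] = number of participants who received condition i and guessed condition j.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = received (active, sham), columns = guessed (active, sham).
guesses = [[10, 10], [10, 10]]
chi2, p = chi_square_2x2(guesses)
print(chi2, p)
```

      A large p-value is consistent with guesses that are unrelated to the assigned condition, which is the pattern expected under successful blinding.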

      3) It would be appreciated if the authors could speak to the possible role of the amygdala in the memory-enhancing effects of NITESGON, as this region is a well-known modulator of many types of memory consolidation and is implicated in noradrenergic-related memory enhancement.

      Upon consideration of your comment, we added text providing the reader with insight into how NITESGON has activated the amygdala in previous research, similar to the VTA in the current study, and how the LC and amygdala were shown to be activated during emotionally arousing stimuli in another study. Furthermore, we have acknowledged that the amygdala is understood to have modulatory implications in long term memory and how future investigations are needed to establish the amygdala’s role with NITESGON.

      Please see the text from the Discussion section on page 20 below:

      “Additionally, it is well-known that the amygdala is not the final place of memory storage, but rather has major modulatory influences on the strength of a memory(74). Similar to the VTA in the current study, prior research has shown that the amygdala is activated during NITESGON but ceased post-stimulation; however, NITESGON was not accompanied by a task during the experiment(14). Moreover, a recent fMRI study spotlights the dynamic behavior of the LC during arousal-related memory processing stages whereby emotionally arousing stimuli triggered engagement from the LC and the amygdala during encoding; however, during consolidation and recollection stages, activity shifted to more hippocampal involvement(75). Considering the impact the VTA and amygdala can have on memory, future experimental investigations are needed to establish their role in the memory-enhancing effects of NITESGON.”

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Cover et al. examine the role of thalamic neurons of the rostral intralaminar nuclei (rILN) that project to the dorsal striatum (DS) in mice performing a reinforced action sequence task. Using patch-clamp electrophysiology, they find that neurons from the three rILN (CM, PC, and CL) have similar electrophysiological properties. Using fiber photometry recordings of calcium activity from rILN neurons that project to DS, they show that these neurons increase in activity at the first lever press and reward acquisition in mice performing a lever pressing operant task. They additionally demonstrate that this action initiation and reward-related activity exists more generally in mice performing other movements or rewarded tasks. Building on their lab's previous work, the authors further find that by optogenetically activating or inhibiting these rILN-DS neurons, mice will increase or decrease task performance, respectively. Lastly, the authors show that a variety of cortical and subcortical areas have input to rILN-DS neurons suggesting that these neurons might act as an integrator of signals from such areas during task performance.

      • The authors beautifully show that the electrophysiological properties of CM, PC, and CL neurons are similar and go on to treat the rILN as one homogenous nucleus for functional fiber photometry recordings and optogenetic stimulations. It seems that these recordings and stimulations were only performed in CL, as indicated in the images (Fig. 2A, 4A). Is this the case, or were CM, PC, and CL neurons sampled? It would be helpful to clarify if DS projecting neurons from all rILN nuclei show the reported action initiation and reward acquisition activity or only CL neurons.

      The arrangement of the rILN nuclei presents a technical challenge for experiments attempting to selectively record from or manipulate a single nucleus in this grouping. Based on our findings that the three nuclei do not differ in electrophysiological properties, we approached the in vivo experiments with the intent to target the rILN as a unit. As the reviewer points out, the medial-lateral coordinate for optic fiber placement tended to align above the CL and PC nuclei. However, variability in fiber placement and spread of light within tissue resulted in inclusion of CM activity as well. Given the spread of light through tissue (Shin, et al., 2016; PMID: 27895987), it would be very difficult to confidently determine from histology which photometry recordings were primarily obtained from CL vs PC vs CM neuronal activity. We agree with the reviewer that these three nuclei may signal differently during reward-driven behavior. Our di-synaptic tracing study supports this possibility, as it revealed unique afferent connectivity to rILN-DS projecting neurons. We now mention this limitation of our approach in the discussion (lines 324 - 330).

      • Along similar lines, to what extent of rILN was targeted for optogenetic activation and inhibition? It seems that the authors implanted a total of 4 optic fibers, two on each side (please clarify in methods). What was the reasoning behind this? Please show that only rILN and not PF was activated/inhibited.

      We apologize for the confusion in our description of this method. For our optogenetic experiments, we infused viruses at four locations (bilateral striatum and rILN) and implanted only two fibers (bilateral rILN) to selectively target striatally-projecting rILN neurons. We have added clarification on this detail to the methods section.

      To prevent inadvertent modulation of Pf neurons, we used virus injection coordinates and volumes that prevented viral spread to the Pf and furthermore implanted the optic fibers in the more rostral regions of the rILN. We histologically confirmed viral expression and fiber placement for all mice and excluded any mice with viral spread to the Pf or off-target fiber placement. We include these criteria for post-hoc exclusion in the methods.

      • While AAV1 is becoming a popular tool for transsynaptic labeling, performing confirmatory patch-clamp recordings with optogenetic activation of inputs, would provide better evidence for the synaptic connection between upstream regions, such as ACC and OFC, and rILN neurons.

      We agree that electrophysiological confirmation of these inputs to the rILN would complement our tracing study. As our focus for this experiment was to specifically identify inputs that synapse on striatally-projecting rILN neurons, we interrogated putative afferents that were already established to project to the rILN. There are several studies that demonstrate the physiological circuits from some of these afferent projections to the rILN (without di-synaptic specificity), such as the SNr→rILN projection (Rizzi & Tan, 2019; PMID: 31091455).

      • In addition, the transsynaptic tracing experiments would benefit from showing the cell count quantifications in CM, PC, and CL. It seems that the authors have already performed this quantification for constructing their diagrams on the right. To make any point about the relative strength of afferent innervation to rILN-DS neurons showing such quantification would be necessary.

      Thank you for this suggestion. We now include cell counts for 2 cases per investigated afferent (Supplemental Table S2).

      • Why is the injection site for the retrograde cre-dependent tdTomato AAV (Fig. 5 middle left panels) showing expression? Is the cre coming through transsynaptic AAV1 from direct projections of each AAV1 injection site (AAV1 is not supposed to spread across a second synapse)? The diagrams suggest that not all regions (e.g. SUM or SC) have direct projections to DS.

      We apologize for this confusion. The tdTomato fluorophore expression observed in the striatum may arise from several possible circuit configurations. To survey just a couple: 1) tdTomato expression in the DS arises from direct projections from the afferent bypassing the thalamus (e.g. ipsilateral ACC→Striatum), which would result in labeled striatal somata (ACC pyramidal neurons delivering AAV1-cre to an MSN, and those local MSN collaterals retrogradely picking up rAAV-DIO-tdTomato) and ACC labeled axon terminals in the DS (ACC interneurons delivering AAV1-cre to DS-projecting ACC pyramidal neurons that pick up rAAV-DIO-tdTomato); 2) terminal projections arising from the labeled rILN neurons shown in the middle-right panels (i.e. ACC→rILN→Striatum).

      Reviewer #2 (Public Review):

      This manuscript details the role of the rILN to the DS pathway in the onset of operant behavior that promotes the delivery of a reward and in the ultimate acquisition of that reward. The strengths of the paper are in the detailed fiber photometry study that encompasses several behavioral domains that correlate to the signal observed in the rILN to DS pathway. I am especially interested in how the "encoding" shifts across time as the animals refine their behavior both in a temporal sense and in the magnitude of the signal. Further, the authors demonstrate then that this is dependent on action, as they do not observe signals in a Pavlovian behavioral task, but do observe reward-based signals in a "free consumption" task (the strawberry milk). The examination into devaluation also enhances the understanding of this pathway, even though there were no differences between a valued and devalued task. Finally, the authors examine bi-directional optogenetic manipulation of the pathway, and its impact on how the trials are completed, omitted, or incomplete. They find that manipulation alters the % completed trials and regulates trial omission. This paper really does not have any glaring weaknesses to point out, however, the physiological assessment does seem to have a few strong trends and even though the studies are well powered, and included both sexes, sex as a biological variable was not commented on that I could find. My estimation of the data doesn't suggest strong sex differences in any metric measured. Additionally, the data that included projections to the rILN were very interesting, and future studies looking into the physiology of these neurons, and/or how the physiology of these neurons adapt after operant training may be very interesting to understand plasticity within the adaptation across the training from FR1 to FR5 with time limits.

      Thank you for your review. We analyzed our data for sex differences but did not identify any significant differences between male and female subjects for any of the experiments.

    1. Public Review

      Reviewer #1 (Public Review):

      1) “In fact, it is not surprising that the collagen mutants display a detached cuticle, because the extracellular domains of MUP-4 and MUA-3 (the transmembrane receptors of apical hemidesmosomes that are primarily responsible for tethering the epidermis to the cuticle) both contain vWFA collagen-binding domain (Hong et al., JCB 2001; Bersher et al., JCB 2001). Hence loss of certain collagens in the cuticle directly affects cuticle-epidermis attachment due to defective ligand-receptor interactions is a much more plausible explanation.”

      We agree with the reviewer that a specific molecular interaction likely mediates the attachment of the cuticle to the epidermis, not only in the area above the hemidesmosomes, but also in the area of the meisosomes. The collagens that potentially associate with MUP-4 and/or MUA-3 in the muscle regions have not been identified, nor in the main epidermal region, where the putative receptor is not known. We have modified the text accordingly.

      “Likewise, it is more reasonable to propose that lack of certain collagens in the cuticle directly affects cuticle stiffness, rather than working indirectly through epidermal meisosomes.”

      We agree with the reviewer that the loss of specific structural components of the cuticle could well affect stiffness directly, especially if the furrows are affected; non-furrow collagen mutants do not show this phenotype. An analogy might be the increased stiffness that corrugation provides. We have modified the text accordingly. Our future research aims precisely at modelling these physical aspects.

      2) “VHA-5::GFP does not co-localize with fluorescent markers for MVB, recycling endosomes and autophagolysosomes. By claiming this, the authors made a huge assumption that the overexpressed VHA-5::GFP fusion protein can only possibly associate with four types of organelles (meisosomes, MVB, recycling endosomes and autophagolysosomes) but not any other known or to-be-identified subcellular structures. In addition, a previous study did report that VHA-5 is localized in several other places besides the apical membrane stacks (Liegeois et al., JCB 2006).”

      The reviewer cites the Liegeois paper that we mention above, which, in our opinion, and that of reviewer 2 (“VHA-5 is well known to localise to the apical membrane stacks (Liegeois 2006) and could be served as marker of apical membrane structure”), provides extremely strong support for our position. In Liegeois et al., 2006, there is a quantification of immunogold staining that shows that >85% of VHA-5 is found in meisosomes (Fig S5D). By providing the results of co-localisation analyses with 3 cytoplasmic vesicular markers, we simply wanted to illustrate the specificity of the signal to the non-initiated. Importantly, we now provide strong evidence that the VHA-5::GFP marker co-localises with apical plasma membrane macrodomains revealed by both a PH domain of PLCδ and a CAAX marker. As our ultrastructural analyses demonstrate that meisosomes are composed of apical membrane folds, this again is wholly consistent with VHA-5 being a bona fide marker of meisosomes.

      Reviewer #2 (Public Review):

      The reviewer questioned the need to give another name to the “apical membrane stacks”. We made this proposition after consultation with a broad community of researchers in the field. We believe that this simpler name provides a link to an analogous structure in yeast, the eisosome, also at the interface between the aECM and the cell.

      The reviewer wrote, “The major problem of this paper is that there is not much new information”, that it was known, for example, that “"furrowless" dpy mutants result in complete disorganization of the epidermis”. In addition to demonstrating that the furrowless Dpy mutants have very particular and specific phenotypes, without affecting the presence of hemidesmosomes (PMID: 33033182), nor different vesicular markers (Figure 6S2), we would like to point out that reviewer #1 commented, “the work presented by Aggad et al. is rich in novelty”, and Reviewer #3, “The major strengths of the paper are the novelty”. We have re-written and reorganised the text and hope Reviewer #2 appreciates the novelty more in the revised version.

    1. Author Response

      Reviewer #2 (Public Review):

      Wu Yang et al. investigated how exophers (large vesicles released from neuronal somas) are degraded. They find that the hypodermal skin cells surrounding the neuron break up the exophers into smaller vesicles that are eventually phagocytosed. The neuronal exophers accumulate early phagosomal markers such as F-actin and PIP2, and blocking actin assembly suppressed the formation of smaller vesicles and the clearance of neuronal exophers. They show the smaller vesicles are labeled with various markers for maturing phagosomes, and inhibiting phagosome maturation blocked the breakdown of exophers into smaller vesicles. Interestingly, they discover that GTPase ARF-6, effector SEC-10/Exocyst, and the phagocytic receptor CED-1 in the hypodermis are required for efficient production of exophers by neurons.

      Strength

      The study clearly demonstrates that exophers are eliminated via hypodermal cell-mediated phagocytosis. Exophers are broken down into smaller vesicles that accumulate phagocytic markers, and inhibiting this process shows that exophers are not resolved. The paper does a thorough examination of various markers and mutants to demonstrate this process.

      The hypodermal cells not only engulf these small vesicles, but they also play a role in the formation of exophers. Exopher production is reduced when ARF-6, SEC-10, or CED-1 are knocked down in the hypodermis. This is intriguing because phagocytosis is a critical step in the final elimination of cells, but in this unique situation, it appears that the neuron fails to extrude the exopher without phagocytes.

      Weakness

      Non-professional phagocytes engulfing cell corpses and many other types of cellular debris (e.g. degenerating axons) have been shown in multiple systems and the observations here are not surprising. Many of the markers used in the study are well-established phagocytic markers and do not bring forward a new technological advance.

      What's interesting is that the breakdown of exophers into smaller vesicles and eventual clearance follows a different sequence of events than macrophages. Exophers appear to undergo phagosomal fission before interacting with lysosomes. This would be difficult to appreciate by a general reader.

      While the paper has strengths, it appears that the message is not clear. The title suggests that the reader will learn about how ARF-6 and CED-1 control exopher extrusion. Although this observation is intriguing and maybe the main point of the paper, there does not appear to be a substantial amount of data to support this claim. The only data to back this up is in the final figure and the majority of the paper is focused on how hypodermal cells phagocytose exophers.

      The title has been revised.

      To show exopher secretion is dependent on the hypodermal cells:

      1) Could authors induce exopher production through other means? And test any involvement of CED-1? For example, authors note exopher production increases under stress conditions including expression of mutant Huntingtin protein. It would be intriguing if loss of CED-1 would be sufficient to block or reduce exopher production in that context and would highlight an exciting role for phagocytic cell types.

      We interpreted this question as an inquiry into whether the neuron-intrinsic exopher inducer was relevant to reliance on hypodermal interaction for exophergenesis, given our use of aggregating mCherry as the inducer. Unfortunately, our Huntingtin expressor lines now display high levels of transgene silencing, precluding their use in this experiment. To address this concern, we switched to a low-toxicity GFP-expressing transgene from the Chalfie lab, uIs31[Pmec-17::GFP]. We found that arf-6 mutations suppressed exophers in this background as effectively as they did in previous mCherry experiments, indicating that our results are not dependent upon the particular transgene marking the touch neurons, or the specific protein they express (Fig 6E).

      2) It is not clear if the CED-1 localization to the exopher is due to CED-1 expression during phagocytosis or is it involved in the extrusion. Perhaps the basal level of CED-1 is important for the extrusion but the strong expression is important for recognition of the exopher.

      In the experiments we performed, we used a constitutively expressed hypodermis-specific CED-1::GFP to show localization to exophers, so the recruitment of CED-1::GFP in hypodermal membranes to the site where the neighboring neuron is producing an exopher is not caused by changes in expression, but rather more likely reflects protein recruitment. We now point this out more explicitly in the text. Added text: “Since the hypodermal CED-1DC::GFP we used is constitutively expressed, we attribute the exopher-surrounding CED-1DC::GFP signal to CED-1 recruitment by exopher-surface signals.”

      3) While the data with ttr-52 and anoh-1 alleles is compelling, do we know that exophers actually expose PS? Especially since at a certain point, the exopher is still attached to the neuronal soma. Is PS still exposed by exopher in CED-1 background?

      We are also very interested in this. Unfortunately, we have had difficulty obtaining sufficient MFGE8 PS-biosensor expression in the adult to test this question directly.

      4) What is the fate of a neuron that is unable to produce exophers? Could one look at lifespan of ALMR neuron in CED-1, ARF-6 or Sec-10 allele (potentially with specificity to hypodermis)?

      To address this question, we measured the function of the mechanosensory touch neurons, using the classic gentle touch response assay in mCherry-expressing animals, comparing controls to arf-6 and ced-1 mutants. For both arf-6 and ced-1 alleles, we found reduced response to gentle touch in older adults (Ad10), indicating a deficit in neuronal function. These results are consistent with exopher production maintaining neuronal health into old age, but interpretation is limited since neither ced-1 nor arf-6 acts specifically in exophergenesis and both therefore also affect the animals in additional ways. Currently, there are no known genetic perturbations that act specifically in exophergenesis, so there is no better approach to do the analysis. We had already published similar results in our 2017 Nature paper that first described exophers, showing that gentle touch response is better preserved in a touch neuron HttQ128::CFP strain that produced a touch neuron exopher than in the same mutant background in which the touch neurons had not produced an exopher.

    1. Author Response

      Reviewer 2 (Public Review):

      The authors’ coarse-grained mathematical model is based upon proteome partitioning constraints. Similar models have been developed in the past, although the authors do an excellent job distinguishing their work. The interdependence among growth rate, growth yield, and carbon transport (together with the comparatively few state variables) makes the proposed model an attractive general framework for predictive metabolic engineering and strain optimization in bio-manufacturing.

      Strengths:

      1) The recognition that the constant biomass concentration (1/beta) can be used to recast the growth-rate versus growth yield trade-off in terms of a growth rate versus carbon uptake trade-off (lines 147-155, Eq. 2), and coupling of the growth- and carbon uptake-rates through proteome partitioning, are powerful ideas. They transform the traditional (false) dichotomy of a negative correlation between growth and yield into a feasible space of growth-yield combinations (e.g. Figs 2BC).

      2) The authors calibrate the model for E. coli (BW25113) grown in glycerol/glucose, batch/continuous culture (lines 157-164), then apply the model to an impressive variety of E. coli strains. This is not typically done with semi-mechanistic models and elevates the authors’ approach by implying that their model is sufficiently-general so as to apply across strains, yet sufficiently-constrained so as to provide quantitative predictions.
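
      The recasting mentioned in strength 1 can be illustrated with a short worked relation. This is a sketch under stated assumptions, not necessarily the manuscript's Eq. 2: it assumes that yield Y is defined as carbon incorporated into biomass per carbon taken up, that the total biomass concentration is held constant at 1/β, and it introduces a hypothetical symbol v_biomass for the net biomass synthesis flux. The block uses the amsmath `align` environment.

```latex
% Assumptions (not taken from the manuscript):
%   1/\beta        constant total biomass concentration
%   v_{biomass}    hypothetical symbol for the net flux of carbon into biomass
%   v_{mc}         carbon uptake rate, in the same units as v_{biomass}
\begin{align}
  \mu &= \beta \, v_{\text{biomass}}, \\
  Y   &= \frac{v_{\text{biomass}}}{v_{mc}} = \frac{\mu}{\beta \, v_{mc}}.
\end{align}
```

      Under these assumptions, a strain with a higher yield at a given growth rate is one that achieves that growth rate with a lower carbon uptake rate, so a statement about rate versus yield can be read as a statement about rate versus uptake.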

      Weaknesses:

      1) The tension between generality and constraint leads to some category errors where strain-specific empirical invariants are taken as general strain-independent operating conditions. This happens at least twice: a minor case involving the growth-rate threshold for acetate overflow, and a serious case where the magnitude of the ’housekeeping’ proteome fraction φq is taken to be strain- and condition-independent.

      a) (lines 82-86) The growth-rate threshold for the acetate overflow switch in E. coli was observed in ’studies with a single strain in different conditions’ [i.e. different carbon sources in batch]. The interpretation provided in the references cited (lines 83-84) is that the threshold is a manifestation of a tipping point between carbon uptake rate and the costs of energy generation. The carbon uptake rate is implicitly strain-dependent; there is no reasonable expectation that all strains growing in glucose will be fermenting (or all respiring). The conclusion (line 84) that ’the model predicted no correlation between growth rate and acetate secretion rate in the case of different strains growing in the same environment’ is tautological when the carbon uptake rate (vmc) is used by the authors to distinguish among strains. This error is easily fixed by simply changing the wording, but it serves to illustrate how constraints operating at the strain level can be tacitly (and erroneously) applied at the genus level.

      The emphasis we put on the comparison between batch growth on glucose of different strains vs batch growth in different environments of a single strain may have been misleading. The point we wanted to make was that the occurrence of fermentation (acetate overflow) during fast growth on glucose is not a necessary consequence of intrinsic physical constraints on metabolism, but the consequence of strain-specific regulatory mechanisms. This is demonstrated by the existence of E. coli strains that do not ferment while growing on glucose, but that have essentially the same metabolic capacities as strains that do. When we started this study, we did expect (perhaps naively) that growth on glucose at a high rate necessarily comes with low yield due to the higher relative acetate overflow, that is, the ratio of the acetate secretion and glucose uptake rates (Supplementary Figure 4 in the revised manuscript).

      In the new version of the manuscript, we have modified the analysis of the glucose uptake and acetate secretion data, by plotting them against growth rate and growth yield in separate 2D plots, as suggested by Reviewer 1. This has led to a perspective that is more in line with the comment of this reviewer that the model explores different ways in which a carbon uptake rate can be converted into a growth rate, depending on the selected resource allocation strategy, and that this gives rise to trade-offs between growth rate and growth yield. In the context of this analysis, we do come back to the original point we wanted to make, but phrased differently (and hopefully more clearly this time).

      Changes in manuscript: The comparison between batch growth on glucose of different strains and batch growth on different carbon sources of a single strain is less emphasized. We have rewritten the section and rephrased our claims accordingly throughout the paper (notably in the Abstract, Introduction, and Discussion).

      b) The second example of this strain-genus confusion is more serious, and perhaps is enough to unravel the model. One of the strengths of the current framework is that although there are four degrees of freedom via the proteome allocation parameters, the model is sufficiently-constrained that the behavior can be meaningfully projected onto lower-dimensional observables like growth rate and yield (e.g. Figs 2BC).

      One of the main constraints in the model that allows this meaningful projection is the assumption that the fraction of ’housekeeping’ proteins φq is constant irrespective of strain and growth conditions (line 172) and that these proteins carry flux synthesizing non-protein macromolecules (lines 141-142). Neither of these claims is supported by the references provided.

      The ’housekeeping’ fraction φq was inferred in Scott et al. 2010 (line 172) from a nearly-growth-medium-independent maximum in the RNA/protein ratio under translation limitation of strain MG1655. The magnitude of that intercept is highly strain-dependent and can vary nearly 2-fold, especially in ALE strains. Furthermore, subsequent proteomic data (e.g. Hui et al. 2015 cited by the authors) has clarified that this ’housekeeping’ fraction is, for the most part, composed of growth-rate independent offsets in the metabolic proteins.

      The origin of these offsets is thought to be related to substrate-saturation (Eqs. 1 and 2 of Dourado et al. 2021 cited by the authors) and consequently, these offsets (and by extension most of φq) carry no flux. Substrate saturation is perhaps at the root of the discrepancy in the Fig. 4 fits that necessitates adjustment of the catalytic constants (line 338). It is not correct to say that ’external substrate concentration S is assumed constant’ (bottom p. 25) therefore the catabolic rate vmc is an environment-dependent [i.e. substrate-concentration-independent] parameter. The ’mc’ proteins include carbon uptake and metabolism (e.g. Fig 1, or Table 2) so that intracellular changes in S could arise from strain differences thereby affecting vmc and the magnitude of the ‘housekeeping’ fraction.

      It is not clear to me how the predictive power of the model will be affected by relaxing the constant φq assumption and replacing it with the more justifiable assumption that all metabolic proteins contribute some small fraction to φq based upon substrate saturation.

      The reviewer criticizes two assumptions made in the construction and analysis of the model: (i) the fraction of housekeeping proteins is constant irrespective of strain and growth conditions, and (ii) the housekeeping proteins carry flux because they synthesize macromolecules other than proteins. Below, we summarize how we have tried to clarify these assumptions and which additional work we have performed to build model variants relaxing the assumptions.

      We identified the housekeeping protein category with the Q-sector in the original paper of Scott et al. [13], which was misleading. The Hwa group indeed defines the Q-sector as not carrying flux [7], whereas we do allow this for the housekeeping protein category. Our housekeeping protein category, which we refer to as “other proteins” or “residual proteins” (Mu) in the new version of the manuscript, consists of all proteins not assigned to the categories of ribosomes and translation-affiliated proteins (R), enzymes in central carbon metabolism (Mc), or enzymes in energy metabolism (Mer+Mef). Mu carries flux, because it includes (among other things) the machinery for DNA and RNA synthesis (DNA polymerase, RNA polymerase, ...). When plotting the proteome fraction of this category determined from the data of Schmidt et al. [12], we found that the fraction remains approximately constant over a large range of growth conditions. This motivated the simplifying assumption to keep the proteome fraction for Mu constant in the simulations.

      The reviewer is right, however, that this may not be the case when considering a variety of E. coli strains growing on glucose, especially the strains resulting from laboratory evolution experiments. We have therefore redone the simulations while allowing the Mu category to vary, by a percentage corresponding to experimentally-observed variations of this category over the range of growth conditions considered by Schmidt et al. [12] (Supplementary Figure 1). In comparison with the original results, the relaxation of this condition enlarges the attainable range of growth rates by about 10%, but the overall shape of the cloud of rate-yield phenotypes remains the same. These new simulation results are shown in the main figures of the revised manuscript.

      In parallel, we have developed a model variant that includes a Q category in the sense of Scott et al., defined by the (growth-rate independent) offsets of the linear relations between growth rate and protein fractions [7]. We have retained an Mu category of other proteins in the model, interpreted as consisting of the growth-rate dependent fraction of other proteins, including the molecular machinery responsible for the synthesis of other macromolecules. Whereas the Mu category carries a flux, this is not the case for the Q category. We have calibrated the model variant from the same data as the original model, and predicted the admissible rate-yield phenotypes. While the cloud of predicted rate-yield phenotypes is slightly displaced in comparison with the reference model, the overall qualitative shape is the same. We explain this robustness by the fact that, despite the different interpretation of the protein categories, the models are structurally very similar and calibrated from data for the same reference strain. This gives rise to different values of the catalytic constants, which compensate for the differences in protein concentrations. Note that more data are needed for the calibration of the model with the Q category, because it requires estimation of the growth-rate-independent proteome fraction for all individual protein categories. In particular, in addition to carbon limitation, conditions of nitrogen and sulfur limitation are necessary [7]. In the absence of such data, additional assumptions need to be made, as we have explained in the new version of the manuscript.
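
      The notion of a growth-rate independent offset can be made concrete with a small numerical sketch. The example below fits a straight line to the proteome fraction of a single protein category as a function of growth rate and reads off the intercept as the offset. The data points, the category, and the function name fit_line are hypothetical and chosen only for illustration; they are not taken from the manuscript or from the cited datasets.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical growth rates (1/h) and proteome fractions of one category.
# The fractions are generated from an exact linear relation so that the
# fitted offset is easy to check by eye.
growth_rates = [0.2, 0.4, 0.6, 0.8, 1.0]
protein_fractions = [0.10 + 0.20 * mu for mu in growth_rates]

slope, offset = fit_line(growth_rates, protein_fractions)

# The intercept is the growth-rate independent part of the fraction;
# the remainder (slope * growth rate) is the growth-rate dependent part.
print(f"slope = {slope:.3f}, growth-rate independent offset = {offset:.3f}")
```

      With real proteomics data the points would scatter around the line, and, as noted above, several limitation conditions would be needed to estimate the offset for every category.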

      We could not find a discussion of the relation between substrate saturation and growth-rate independent offsets in proteomics data in the paper by Dourado et al. [2]. In the revised version of the manuscript, however, we have exploited their idea to compare substrate saturation for different predicted and observed rate-yield phenotypes. As a prerequisite, this has required a refinement of the estimation of the half-saturation constants during model calibration, for which we have used the dataset of Km values collected by Dourado et al. [2]. The finding that high-rate, high-yield growth comes with high substrate saturation, indicating an efficient utilization of proteomic resources, has been given more emphasis in the revised manuscript. Note that each resource allocation strategy will give rise to a different concentration of metabolites, and therefore to a different level of substrate saturation of the enzymes.

      The reviewer is right that the phrase “the external substrate concentration S is assumed constant” is not correct for batch growth, although it approximately holds for continuous growth in a chemostat. In the case of balanced growth in batch, the external substrate concentration S is much higher than the half-saturation constant, so that the kinetic equation for the macroreaction can be approximated by vmc = mc es, where es = kmc. In the revised manuscript, we have explicitly distinguished between these two situations (batch and continuous growth). Note that S is not the intracellular, but the extracellular concentration of substrate.
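
      The difference between the two situations can be illustrated numerically. The sketch below assumes a standard Michaelis-Menten form for the macroreaction, vmc = mc · kmc · S / (S + K). The symbol K for the half-saturation constant, the function names, and all numerical values are assumptions made for illustration and are not values from the manuscript.

```python
def saturation(s, k_half):
    """Michaelis-Menten saturation term S / (S + K)."""
    return s / (s + k_half)


def v_mc(m_c, k_mc, s, k_half):
    """Rate of the macroreaction under the assumed Michaelis-Menten form."""
    return m_c * k_mc * saturation(s, k_half)


# Illustrative values only (arbitrary units).
M_C = 0.2      # enzyme concentration
K_MC = 5.0     # catalytic constant
K_HALF = 1.0   # half-saturation constant

# Batch growth: S is far above the half-saturation constant, so the
# saturation term is close to 1 and vmc is close to mc * kmc.
v_batch = v_mc(M_C, K_MC, s=1000.0 * K_HALF, k_half=K_HALF)
v_batch_approx = M_C * K_MC
batch_rel_error = abs(v_batch - v_batch_approx) / v_batch_approx

# Chemostat: S is constant but can be comparable to the half-saturation
# constant, so the saturation term has to be kept explicitly.
chemostat_saturation = saturation(K_HALF, K_HALF)
v_chemostat = v_mc(M_C, K_MC, s=K_HALF, k_half=K_HALF)

print(f"batch: relative error of the approximation = {batch_rel_error:.4f}")
print(f"chemostat: saturation term = {chemostat_saturation:.2f}")
```

      In this sketch the batch approximation is accurate to about 0.1%, whereas at S equal to the half-saturation constant the enzyme works at half its maximal rate.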

      Changes in manuscript: We have better explained the meaning of the residual protein category Mu and corrected the misleading identification of this category with the Q-sector of Scott et al. [13] in the section Coarse-grained model with coupled carbon and energy fluxes and in Appendix 1. In new subsections of Appendix 1 and Appendix 2, we discuss the construction and calibration of a model variant with an additional growth-rate independent protein category corresponding to the Q-sector of Scott et al. In the Discussion, we explain that the rate-yield predictions obtained from this model and the reference model are essentially the same, indicating the robustness of the model predictions.

      We have redone all simulations using a resource allocation parameter for the housekeeping protein fraction Mu that is allowed to vary within experimentally-determined bounds (Coarse-grained model with coupled carbon and energy fluxes and Methods). The bounds are determined from the data of Schmidt et al. [12], as shown in the new Supplementary Figure 1. These simulations also include refined estimates for the half-saturation constants in the metabolic macroreactions.

      In the final Results section, Resource allocation strategies enabling fast and efficient growth of Escherichia coli, we develop the point that higher saturation of enzymes and ribosomes is key to high-rate, high-yield growth of E. coli, in agreement with observations from other recent studies [2, 5, 9]. In Appendix 1, we emphasize that S is the extracellular substrate concentration and we distinguish between simplifications of vmc for batch and continuous growth.

    1. Author Response

      Reviewer #1 (Public Review):

      Castelán-Sánchez et al. analyzed SARS-CoV-2 genomes from Mexico collected between February 2020 and November 2021. This period spans three major spikes in daily COVID-19 cases in Mexico and the rise of three distinct variants of concern (VOCs; B.1.1.7, P.1., and B.1.617.2). The authors perform careful phylogenetic analyses of these three VOCs, as well as two other lineages that rose to substantial frequency in Mexico, focusing on identifying periods of cryptic transmission (before the lineage was first detected) and introductions to and from the neighboring United States. The figures are well presented and described, and the results add to our understanding of SARS-CoV-2 in Mexico. However, I have some concerns and questions about sampling that could affect the results and conclusions. The authors do not provide any details on the distribution of samples across the various Mexican States, making it hard to evaluate several key conclusions. Although this information is provided in Supplementary Data 2, it is not presented in a way that enables the reader to evaluate if lineages were truly predominant in certain regions of the country, or if these results are attributable purely to sampling bias. Specifically, each lineage is said to be dominant in a particular state or region, but it was not clear to me if sampling across states was even at all-time points. For example, the authors state that most B.1.1.7 genome sampling is from the state of Chihuahua, but it is not clear if this was due to more sequenced samples from that region during the time that B.1.1.7 was circulating, or if the effects of B.1.1.7 were truly differential across the country. The authors do mention sequencing biases several times, but need to be more specific about the nature of this bias and how it could affect their conclusions. 

      It is surprising to see in this manuscript that the B.1.1.7 lineage did not rise above 25% prevalence in the data presented, despite its rapid rise in prevalence in many other parts of the world. This calls into question if the presented frequencies of each lineage are truly representative of what was circulating in Mexico at the time, especially since the coordinated sampling and surveillance program across Mexico did not start until May 2021.

      We thank the reviewer for the constructive comments. We recognize the need to better explain how the sequencing efforts in the country were set up and carried out, and this has now been clarified throughout the main text (L43-51, L95-105). A new figure comparing the overall cumulative proportion of genomes generated per state between 2020-2021 is now available as Supplementary Figure 1 c. The cumulative proportions of genomes sampled across states per lineage of interest, corresponding to the period of circulation of the given lineage, were originally provided as maps in Figures 2-4. This has been further clarified in the Results section and in the corresponding figure legends. We also now provide additional maps representing the geographic distribution of the clades identified per lineage, integrating in the figures the information previously available in Supplementary Data 2, Supplementary Figures 4 and 5. As a note, for our analyses, we used the total cumulative genome data available from the country (and not only that generated by CoViGen-Mex, representing one third of the SARS-CoV-2 genomes from Mexico). This is expected to reduce any sampling biases related to the scheme adopted by CoViGen-Mex, and is now clearly stated in the main text.

      However, we believe that there has been a misunderstanding about the genome sampling scheme adopted by CoViGen-Mex, namely that the ‘coordinated sampling and surveillance program across Mexico did not start until May 2021’. Although it is true that further improvements were implemented after this date (enabling genome sampling and sequencing to become more homogenous across the country), the overall virus genome sequencing in Mexico was already sufficient from February 2021. This is shown by the cumulative number of viral genomes sequenced throughout 2020-2021 (both by CoViGen-Mex and other contributing institutions), which correlates with the number of cases officially reported in the country during this time (see Supplementary Figure 1 a). This has now been clarified in the Results section (L94-105). Therefore, we hold that “SARS-CoV-2 sequencing in Mexico has been sufficient to explore the spatial and temporal frequency of viral lineages across national territory, and now to further investigate the number of lineage-specific introduction events, and to characterize the extension and geographic distribution of associated transmission chains, as we present in this study” (L102-105). In this context, “a more homogenous sampling across the country is unlikely to impact our main findings, but could i) help pinpoint additional clades we are currently unable to detect, ii) provide further details on the geographic distribution of clades across other regions of the country, and iii) deliver a higher resolution for the viral spread reconstructions we present” (discussed in L466-470).

      For the B.1.1.7 lineage in Mexico, we have clarified the issue raised as follows: “during its circulation period, most B.1.1.7 genomes from Mexico were generated from the state of Chihuahua, with these representing the earliest B.1.1.7-assigned genomes from the country. However, our phylodynamic analysis revealed that only a small proportion of these grouped within a larger clade denoting an extended transmission chain (C2a), with the rest falling within minor clusters, or representing singleton events. Relative to other states, Chihuahua generated an overall lower proportion of viral genomes throughout 2020-2021. Thus, more viral genomes sequenced from a particular state does not necessarily translate into more well-supported clades denoting extended transmission chains, whilst the geographic distribution of clades is somewhat independent to the genome sampling across the country.” (L202-211). Again, these observations are supported by a sufficient overall genome sampling from Mexico.

      We would further like to make clear that “our results confirm that the B.1.1.7 lineage reached an overall lower sampling frequency of up to 25% (relative to other virus lineages circulating in the country), as was noted prior to this study (for example, see Zárate et al. 2022)” (L189-193). As similar observations were independently made for other Latin American countries such as Brazil, Chile, and Peru (some with better genome representation than others, like Brazil https://www.gisaid.org/), it is possible that “the overall epidemiological dynamics of the B.1.1.7 in Latin America may have substantially differed from what was observed in the USA and UK. Such differences could be partly explained by competition between cocirculating lineages, exemplified in Mexico by the regional co-circulation of B.1.1.7, P.1 and B.1.1.519. Nonetheless, the lack of a representative number of viral genomes for most of these countries prevents exploring such hypothesis at a larger scale, and further highlights the need to strengthen genomic epidemiology-based surveillance across the region” (now discussed in L372-379). We hope the reviewer considers that the issues raised have now been resolved.

      Reviewer #2 (Public Review):

      The authors use a series of subsampling methods based on phylogenetic placement and geographic setting, informed by human movement data to control for differences in sampling of SARS-CoV-2 genomes across countries. Of note, the authors show that 2 variants likely arose in Mexico and spread via multiple introductions globally, while other variant waves were driven by repeat introductions into Mexico from elsewhere. Finally, they use human mobility data to assess the impact of movement on transmission within Mexico. Overall, the study is well done and provides nice data on an under-studied country. The authors take a thoughtful approach to subsampling and provide a very thorough analysis. Because of the care given to subsampling and the great challenge that proper subsampling represents for the field of phylodynamics, the paper would benefit from a more thorough exploration of how their migration-informed subsampling procedure impacts their results. This would not only help strengthen the findings of the paper, but would likely provide a useful reference for others doing similar studies. Additionally, I would suggest the authors provide a bit more discussion of this subsampling approach and how it may be useful to others in the discussion section of the paper.

      We thank the reviewer for the constructive comments, and appreciate the recognition of our sub-sampling scheme as a valuable tool with potential application in other studies. We acknowledge the need for a ‘more thorough exploration and discussion of how a different migration-informed subsampling approach could impact our results’. To address this issue, “we further sought to validate our migration-informed genome subsampling scheme (applied to B.1.617.2+, representing the best sampled lineage in Mexico). For this, an independent dataset was built using a different migration sub-sampling approach, comprising all countries represented by B.1.617.2+ sequences deposited in GISAID (available up to November 30th 2021). In order to compare the number of introduction events, the new dataset was analysed independently under a time-scaled DTA (as described in Methods Section 4).” (L517-524). In the new dataset, <100 genome sequences from the USA were retained for further analysis (Supplementary Figure 2b), compared to approximately 2000 ‘USA’ genome sequences included in the original B.1.617.2+ alignment. Thus, we expected a lower number of inferred introduction events into Mexico, as an undersampling of viral genome sequences from the USA is likely to result in ‘Mexico’ clades not fully segregating (particularly impacting C5d).

      Our original results revealed a minimum number of 142 introduction events into Mexico (95% HPD interval = [125-148]), with 6 clades identified as denoting extended transmission chains. The DTA results derived from the new dataset (subsampling all countries) revealed a minimum number of 84 introduction events into Mexico (95% HPD interval = [81-87]), with again 6 major clades identified. Thus, a significantly lower number of introduction events into Mexico was inferred, as expected. On the other hand, the number of clades identified was consistent between both datasets, supporting the robustness of our phylogenetic methodological approach. However, in the new dataset, we observe that C5d displayed a reduced diversity (represented by the AY.113 and AY.100 genomes from Mexico, but excluding the B.1.617.2 genome sampled from the USA). This highlights the relevance of our genome sub-sampling using migration data as a proxy.

      In further agreement with these observations, publicly available data on global human mobility (https://migration-demography-tools.jrc.ec.europa.eu/data-hub/index.html?state=5d6005b30045242cabd750a2) show that migration into Mexico is mostly represented by movements from the USA, followed by Indonesia, Guatemala, Belize and Colombia. However, the volume of movements from the USA into Mexico is much higher (up to 6 orders of magnitude above the volumes recorded into Mexico from any other country).

      Given time constraints related to performing additional analyses, we decided to exclude the subsampling scheme for ‘top ten countries’ suggested by the reviewer. However, we consider that the results derived from the comparison between the original and the new dataset (top-5 vs all countries) are sufficient to support our migration-informed subsampling approach. A full description of the methodology and the results obtained, as well as a short discussion, is now available as Supplementary Text 2, and Supplementary Figure 2b and 2c. We hope the reviewer considers that the issues raised have been addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to identify the relationship between social touch experiences and the endogenous release of oxytocin and cortisol. Female participants who received a touch from their romantic partner before a stranger exhibited a blunted hormonal response compared to when the stranger was the first toucher, suggesting that social touch history and context influence subsequent touch experiences. Concurrent fMRI recordings identified key brain networks whose activity corresponded to hormonal changes and self-report.

      The strengths of the manuscript are in the power achieved by collecting multi-faceted metrics: plasma hormones across time, BOLD signal, and self-report. The experiment was cleverly designed and nicely counterbalanced. Data analysis was thorough and statistically sophisticated, making the findings and conclusions convincing.

      This work sheds new light on potential mechanisms underlying how humans place social experiences in context, demonstrating how oxytocin and cortisol might interact to modulate higher-level processing and contextualizing of familiar vs. stranger encounters.

      Thank you very much for this generous evaluation of the study.

      Reviewer #2 (Public Review):

      To test how oxytocin impacts the brain and the psychological, neural, and hormonal response to touch, the authors tested human females during two counterbalanced fMRI sessions wherein females were stroked on the arm or the palm, by a real-world romantic partner or a stranger, while blood levels of oxytocin and cortisol were collected at multiple time points.

      This combination of measures, and the number of hypotheses that could be tested with them, is remarkable - virtually unheard of. This impressive, difficult, and more ecological design than is typical for the field is a major strength of the study, which allowed the authors to test many important hypotheses concurrently and to show contextual effects that could not otherwise be observed. The only potential drawback perhaps is that with such a large design, including many measures, the authors produced so many significant interactions and results that it could be hard for the casual reader to appreciate the importance of each.

      The authors supported their hypothesis that oxytocin effects are context-sensitive, as they found a key interaction wherein experiencing the partner first increased oxytocin for the partner, whereas for strangers the OT levels were low when they came first but increased when they were preceded by the partner (excepting one timepoint). Cortisol responses (which reflect hormonal stress) were also higher when the stranger came first than when he was preceded by the partner. In addition, touch was experienced more positively on the arm than on the palm, supporting the role of c-fibers in conveying specifically felt responses to warm, tender touch.

      These data indicate significant context sensitivity with real-world implications. For example, experiencing warm touch on the arm can make us more receptive to other people in subsequent encounters. Conversely, when strangers try to approach and get close to us "out of the blue" people experience this as stressful, which reduces the pleasantness of the interaction and may reduce trust in the moment...perhaps even subsequently.

      This research is critical to the basic science of neurohormonal modulation, given that most of this research occurs in rodents or in simplified studies in humans, usually through intranasal oxytocin administration with unclear impacts on circulating levels in the brain and blood. Oxytocin in particular has suffered from oversimplification as the "love drug" - wherein people assume that it always renders people more loving and trusting. The reality is more complex, as they showed, and these demonstrations are needed to clarify for the field and the public that neurohormones adaptively shift with the context, location, and identity of the social partner in an adaptive way. These results also help us understand the many null effects of oxytocin on trusting strangers in human neuroeconomic studies. In a modern world that is characterized by significant loneliness, interactions with strangers and outsiders, and touch-free digital interactions, our ability to understand the human need for genuine social contact and how it impacts our response to outsiders (welcomed in versus a source of stress) is critical to human health and the wellbeing of individuals and society.

      Thank you very much for this nice summary of the study and its implications.

      As you pointed out, the design was ambitious and involved a broad range of measures and levels of hypothesis-testing. This presented challenges in reporting the results. In this paper we have tried to provide interpretation of the basic results, such as that social encounters (even in the scanner environment) are sufficient to evoke changes in endogenous oxytocin levels over the course of the experimental session, and that various interactions arise due to an influence of contextual factors such as the familiarity of the person and the recent social interaction history. For the more complex results, such as the nature of relationships between BOLD signal change and the degree of change in individuals’ plasma oxytocin levels, we have tried to outline provisional interpretations.

      We hope that the picture will gradually become more filled-in by work from our and others’ labs; maybe these findings and interpretations will look very different in a few years’ time. We consider this study a starting point for future research into the dynamics and function of human endogenous oxytocin.

      Reviewer #3 (Public Review):

      In an ambitious, multimodal effort, Handlin, Novembre et al. investigated how the endogenous release of oxytocin and cortisol as well as functional brain activity are modulated by social touch under different contextual circumstances (e.g. palm vs. arm touch, stranger vs. partner touch) in neurotypical female participants.

      Using serial sampling of plasma hormone levels in blood during concurrent functional MRI neuroimaging, the authors show that the familiarity of the interactant during social touch not only impacts current hormonal levels but also subsequent hormonal responses in a successive touch interaction. Specifically, endogenous oxytocin levels are significantly heightened (and cortisol levels dampened) during touch from a romantic partner compared to touch from an unfamiliar stranger, at least during the first touch interaction. During the second touch interaction, however, oxytocin levels plummeted when being touched by a stranger following partner touch (although a recovery was made), whereas the normally elevated oxytocin responses to partner touch were dampened when following stranger touch. These results are paralleled by similar familiarity- and order-related effects in neural regions involving the hypothalamus, dorsal raphe, and precuneus.

      However, an important distinction to be made is that, although a significant main effect of familiarity was encountered in several brain regions when taking peak plasma oxytocin levels into account, subsequent t-tests showed no activation differences in the BOLD response between partner and stranger touch within the same subjects. Significant interaction maps seem thus mainly driven by between-subject effects at the different time points, which is arguably due to differences between subjects in their initial calibration of neural/hormonal responses, and not session-to-session changes within the same subjects.

      A similar comment can be made for the reported covariance between (changes in) maximal oxytocin levels and (changes in) BOLD activity for the hypothalamus.

      In an effort to delineate the complex cascade of responses induced by afferent tactile stimulation, the authors report an exploratory regression analysis to identify BOLD activation that precedes the pattern of serial plasma changes in oxytocin levels (looking backwards; i.e. implying changes in brain activation drive changes in hormonal plasma levels). Although the authors are appropriately modest about the significance of the encountered effects, additional control analyses could bring further clarifications about the temporal (e.g., can similar covariations also be found when looking forward) and hormonal specificity (e.g. can similar findings be found for cortisol-variations) of the encountered results. Nevertheless, despite the 'dynamically' covarying relationships between BOLD and max plasma oxytocin levels (i.e. dynamic as in the sense across conditions, not across timepoints), claims about the directionality of this effect (i.e. 'hormonal neuromodulation' vs. 'neural modulation of hormonal levels') remain speculative.

      A particular strength of this study is the employment of a "female-first" strategy since experimental data concerning endogenous oxytocin levels in women are sparse. Adequate control analyses are reported to take potential variability due to differences in contraception and phase in the hormonal cycle into account.

      Thank you for your attentive reading of the study, and for raising several very important points.

      You are right that the BOLD activation maps showing interactions between the change in OT levels and other factors (familiarity, order) reflect differences between subjects in the two runs of the experiment. The effect of familiarity emerged from the full model for the whole group (all participants, whether they started with partner or stranger), as an interaction between the partner/stranger factor and the change in OT. As you point out, this reflects interindividual-level covariation between OT changes and BOLD changes. For example, individuals showing greater OT increase were also more likely to show higher BOLD in certain clusters during partner compared to stranger touch. Similarly, the partner vs stranger contrast showing hypothalamus and raphe reflects greater OT-BOLD covariance in the stranger-first compared to the partner-first group: in the stranger-first group, BOLD was greater the lower the mean OT was across individuals.

      The t-tests with OT as covariate further indicate that the interaction was driven by group differences in the second run. As you point out, within groups (partner or stranger first), there was no significant change in the OT-BOLD covariance from the first to the second run, though these relationships were different between groups. We agree with you that this lack of difference in within-group OT-BOLD covariance from the first to the second run is likely because responses in the first run biased responses in the second run—but in different ways depending on whether the partner or the stranger was presented first. Both groups did show a meaningful correlation in mean OT levels between the first and the second run (we have now included this information in the paper).

      In general, we agree that it is very important to make clear that, as in many covariation/correlation effects in fMRI studies, the effects are driven by interindividual differences for a given covariant relationship, rather than the within-subject BOLD response increasing or decreasing.

      We also agree that it is not possible to determine the direction of modulation from these results. The creation of the temporal OT regressor as “backward-looking” was informed by evidence from animal models for central-to-peripheral effects from hypothalamus to pituitary to bloodstream. We assumed this directionality in the analysis. Given the exploratory nature of this regressor, “looking forward” from temporal OT sample patterns to BOLD patterns with different time intervals would be an equally valid approach. It could reveal activation related to any systematic influence of peripheral OT levels on cortical responses. As the premise of the temporal OT regressor analysis in the present study was an assumed central-to-peripheral modulation, we have kept this as the focus but will explore any specific peripheral-to-central covariation in future work.

      We believe that the full causal picture is likely to involve bidirectional modulation: a modulatory loop (or even loops) in which peripheral and central changes influence one another. Unfortunately, it is difficult to address such temporal feedback with the poor time resolution of fMRI.

    1. Author Response

      Reviewer #1 (Public Review):

      This is one of the most careful analyses of sexual dimorphism in dinosaurs, based on a remarkable assemblage of 61 ornithomimosaur fossils from the Early Cretaceous of western France. The dimorphism is expressed in variations in the shaft curvature and the distal epiphysis width, analysed appropriately here and plausible because these are the kinds of morphological features that vary between males and females among birds and crocodilians, among others.

      In the Introduction, it is right to highlight the shortage of convincing cases of demonstrated sexual dimorphism (SD) in dinosaurs. But note the points made by Hone, Saitta and others that SD can exist in many species today without major morphological differences, making it hard to demonstrate in fossils with such types of dimorphism. Also, some proposed statistical tests to ensure that SD has been convincingly demonstrated in fossils are so stringent they would be hard ever to pass (requiring enormous and constant morphological distinctiveness). In other words, we are conditioned not to find SD in dinosaurs, and yet may be massively under-reporting it because of preservation difficulties (of course) but also because of some overly rigorous demands for proof. These issues help argue that the current study is especially valuable because the data set is large (itself a rarity), and 3D bone shape analysis and proper statistical testing have been applied.

      We are grateful that Reviewer 1 raised this point regarding the occurrence of subtle sexual dimorphism in many modern populations, and we have added a sentence in the introduction to further emphasize the importance of a large dataset composed of coeval organisms.

      It's interesting the dinosaur example shows the same two dimorphic traits (femoral obliquity = bicondylar angle; width of distal epiphysis = bicondylar breadth) seen in mammals (MS, lines 117-123), where the femur angle may vary because of the need for broader hips in the female to accommodate the birth canal, and yet dinosaurs laid eggs. These are small dinosaurs, so perhaps their eggs were relatively large in proportion to body size. Perhaps the authors could comment on this. There is some discussion with regard to modern birds at MS lines 187-199.

      We agree with the comments from Reviewer 1. We raise the question of eggs possibly constraining the pelvic and proximal hindlimb morphology in lines 170 to 189, and discuss how this relates to modern archosaurs in lines 189 to 202. We also originally intended to discuss how the Kiwi hindlimb morphology accommodates large eggs, but no significant dimorphism was demonstrated in the pelvic and hindlimb morphology of this bird.

    1. Author Response

      Reviewer #2 (Public review):

      Ansari et al. describe a web-based software for the design of guide RNA (gRNA) sequences and primers for CRISPR-Cas-based identification of single nucleotide variants (SNVs). The use of CRISPR-Cas to rapidly identify specific mutations in both cancer and infection is an evolving field with good potential to play a role in future research and diagnostics.

      The software described by Ansari et al. is easy to use for the design of guide RNAs. The most important question is how good the gRNAs that the software suggests are. As such, the manuscript would benefit from better describing the parameters used for the gRNA design as well as including more validation experiments. Clearly, the scope of the manuscript is not about developing different detection methods, but I would argue that performing more wet lab experiments is needed to support the usability of the software.

      We thank the reviewer for taking interest in this manuscript and raising an important point about increasing the number of targets for our wet lab experiments. To address this, we have tried to include more supporting data in the updated version of the manuscript.

      Reviewer #3 (Public review):

      This manuscript by Ansari and coworkers describes CriSNPr, a tool for designing gRNAs for CRISPR-based diagnostics for SNP detection. CriSNPr allows one to design assays to detect human and SARS-CoV-2 mutations, positioning the mismatches for optimal detection based on results from the literature. Designs can be generated for six different CRISPR effector proteins. The authors test their approach by designing assays to detect a single SNV using three different CRISPR effectors. A strength of the manuscript is that the method does appear to work, at least for the E484K mutation, for multiple CRISPR effector proteins.

      The weaknesses of this manuscript stem from the lack of data demonstrating that the method works. There is only one very small experimental demonstration using a single mutation (Figure 4) and some very high-level analyses using two SNP/SNV databases (Figure 5). The authors do not provide any data to answer any basic questions about how well their designs work, how fast and easy it is to run their method, or which designs are predicted to work better than others. These weaknesses ultimately limit the impact of the work on the field, as it is not clear what the benefits of using the authors' approach are versus simply applying the rules for the individual CRISPR effector proteins outlined in Figure 1 of the manuscript.

      We thank the reviewer for taking interest in this manuscript and appreciate the constructive feedback and suggestions. In the new version of this paper, we've added more data to back up other SNVs with different CRISPR systems and the CriSNPr pipeline for sgRNA design. Even in these datasets, we see that for particular SNVs, the choice of the CRISPR system used might affect the sensitivity of detecting the mutation (Figures 5 and 6). This would be a huge task to do again for multiple targets and targeting systems, which is outside the scope of this study. Importantly, such large datasets are currently missing for the different CRISPRDx systems since we have not come across studies where users have comparatively determined the best methodology for their assay. In our opinion, CriSNPr gives users this opportunity by providing a unified platform, and our validation assays show how this can be done in a relatively fast manner.

      A stand-alone version of the server is made available for download at https://github.com/asgarhussain/CriSNPr to increase its speed and accessibility for the end user.

      Addressing the point of determining which crRNAs work best for a given assay requires a large amount of data on target SNPs for individual Cas systems, which is currently scarce. In the current version of CriSNPr, we have considered prioritizing crRNA mismatch-sensitive positions based on original published studies. For example, for AaCas12b, mismatch positions are ranked as follows: 1&4 > 1&5 > 4&11 > 4&16 > 5&8 > 5&11 > 16&19. Similarly, crRNA mismatch-sensitive positions for individual Cas systems (as shown in Figure 1) have been used to prioritize crRNAs. Improving on these design principles further would require studying the biology of individual Cas:DNA/RNA interactions, which is beyond the scope of this study. However, in the updated version of CriSNPr, we attempted to improve the scoring algorithm by taking into account off-targets for a crRNA design, and priority is given to the combinatorial positions with the fewest off-targets as well as the weightage of their efficacy.
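
      The prioritization described above can be illustrated with a minimal sketch. It is not the CriSNPr implementation: the function name `rank_candidates`, the candidate record layout, and the rule for combining the two criteria (fewest off-targets first, then the published position ranking as a tie-breaker) are all assumptions made for illustration. Only the AaCas12b position ranking is taken from the response.

```python
# Published AaCas12b mismatch-position ranking quoted in the response,
# from most to least preferred: 1&4 > 1&5 > 4&11 > 4&16 > 5&8 > 5&11 > 16&19.
AACAS12B_PRIORITY = [(1, 4), (1, 5), (4, 11), (4, 16), (5, 8), (5, 11), (16, 19)]


def rank_candidates(candidates, priority=AACAS12B_PRIORITY):
    """Order hypothetical crRNA candidates for one SNV.

    Each candidate is a dict with:
      "positions"   -- tuple of the two mismatch positions in the crRNA
      "off_targets" -- number of predicted off-target sites

    Assumed rule (not confirmed by the source): fewest off-targets wins,
    and the published position ranking breaks ties.
    """
    rank = {pair: i for i, pair in enumerate(priority)}
    # Candidates whose position pair is not in the ranking are dropped.
    usable = [c for c in candidates if c["positions"] in rank]
    return sorted(usable, key=lambda c: (c["off_targets"], rank[c["positions"]]))


# Placeholder candidates, for illustration only.
example = [
    {"positions": (4, 11), "off_targets": 0},
    {"positions": (1, 4), "off_targets": 2},
    {"positions": (1, 5), "off_targets": 0},
    {"positions": (2, 3), "off_targets": 0},  # not a ranked pair, so dropped
]
ranked = rank_candidates(example)
print([c["positions"] for c in ranked])
```

      A tuple sort key keeps the two criteria separate and easy to reorder. A weighted score would be an equally plausible reading of "weightage of their efficacy", but the response does not give the weights.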

    1. Author Response:

      We would like to thank both reviewers and editors for their time and effort in reviewing our work, and the thoughtful suggestions made.

      Reviewer #1 (Public Review):

      […] The experiments are well-designed and carefully conducted. The conclusions of this work are in general well supported by the data. There are a couple of points that need to be addressed or tested.

      1) It is unclear how LC phasic stimulation used in this study gates cortical plasticity without altering cellular responses (at least at the calcium imaging level). As the authors mentioned that Polack et al 2013 showed a significant effect of NE blockers in membrane potential and firing rate in V1 layer2/3 neurons during locomotion, it would be useful to test the effect of LC silencing (coupled to mismatch training) on both cellular response and cortical plasticity or applying NE antagonists in V1 in addition to LC optical stimulation. The latter experiment will also address which neuromodulator mediates plasticity, given that LC could co-release other modulators such as dopamine (Takeuchi et al. 2016 and Kempadoo et al. 2016). LC silencing experiment would establish a causal effect more convincingly than the activation experiment.

      Regarding the question of how phasic stimulation could alter plasticity without affecting the response sizes or activity in general, we believe there are possibilities supported by previous literature. It has been shown that catecholamines can gate plasticity by acting on eligibility traces at synapses (He et al., 2015; Hong et al., 2022). In addition, all catecholamine receptors are metabotropic and influence intracellular signaling cascades, e.g., via adenylyl cyclase and phospholipases. Catecholamines can gate LTP and LTD via these signaling pathways in vitro (Seol et al., 2007). Both of these influences on plasticity at the molecular level do not necessitate or predict an effect on calcium activity levels. We will expand on this in the discussion of the revised manuscript.

      While a loss of function experiment could add additional corroborating evidence that LC output is required for the plasticity seen, we did not perform loss-of-function experiments for three reasons:

      1. The effects of artificial activity changes around a physiological set point are likely not linear for increases and decreases. The problem with a loss of function experiment here is that neuromodulators like noradrenaline affect general aspects of neuronal function. This is apparent in Polack et al., 2013: during the pharmacological blocking experiment, the membrane hyperpolarizes, membrane variance becomes very low, and the cells are effectively silenced (Figure 7 of (Polack et al., 2013)), demonstrating an immediate impact on neuronal function when noradrenaline receptor activation is presumably taken below physiological/waking levels. In light of this, if we reduce LC output/noradrenergic receptor activation and find that plasticity is prevented, this could be the result of a direct influence on the plasticity process, or the result of a disruption of another aspect of neuronal function, like synaptic transmission or spiking. We would therefore challenge the reviewer’s statement that a loss-of-function experiment would establish a causal effect more convincingly than the gain-of-function experiment that we performed.

      2. The loss-of-function experiment is technically more difficult both in implementation and interpretation. Control mice show no sign of plasticity in locomotion modulation index (LMI) on the 10-minute timescale (Figure 4J), thus we would not expect to see any effect when blocking plasticity in this experiment. We would need to use dark-rearing and coupled-training of mice in the VR across development to elicit the relevant plasticity ((Attinger et al., 2017); manuscript Figure 5). We would then need to silence LC activity across days of VR experience to prevent the expected physiological levels of plasticity. Applying NE antagonists in V1 over the entire period of development seems very difficult. This would leave optogenetically silencing axons locally, which in addition to the problems of doing this acutely (Mahn et al., 2016; Raimondo et al., 2012), has not been demonstrated to work chronically over the duration of weeks. Thus, a negative result in this experiment will be difficult to interpret, and likely uninformative: We will not be able to distinguish whether the experimental approach did not work, or whether local LC silencing does nothing to plasticity.

        Note that pharmacologically blocking noradrenaline receptors during LC stimulation in the plasticity experiment is also particularly challenging: they would need to be blocked throughout the entire 15 minute duration of the experiment with no changes in concentration of antagonist between the ‘before’ and ‘after’ phases, since the block itself is likely to affect the response size, as seen in Polack et al., 2013, creating a confound for plasticity-related changes in response size. Thus, we make no claim about which particular neuromodulator released by the LC is causing the plasticity.

      3. There are several loss-of-function experiments reported in the literature using different developmental plasticity paradigms alongside pharmacological or genetic knockout approaches. These experiments show that chronic suppression of noradrenergic receptor activity prevents ocular dominance plasticity and auditory plasticity (Kasamatsu and Pettigrew, 1976; Shepard et al., 2015). Almost absent from the literature, however, are convincing gain-of-function plasticity experiments.

      Overall, we feel that loss-of-function experiments may be a possible direction for future work but, given the technical difficulty and – in our opinion – the limited benefit that these experiments would provide in light of the evidence already provided for the claims we make, we have chosen not to perform these experiments at this time. Note that we already discuss some of the problems with loss-of-function experiments in the discussion.

      2) The cortical responses to NE often exhibit an inverted U-curve, with higher or lower doses of NE showing more inhibitory effects. It is unclear how responses induced by optical LC stimulation compare or interact with the physiological activation of the LC during the mismatch. Since the authors only used one frequency stimulation pattern, some discussion or additional tests with a frequency range would be helpful.

      This is correct: we do not know how the artificial activation of LC axons relates to physiological activation, e.g. under mismatch. The stimulation strength is intrinsically consistent in our study in the sense that the stimulation level to test for changes in neuronal activity is similar to that used to probe for plasticity effects. We suspect that the artificial activation results in much stronger LC activity than seen during mismatch responses, given that no sign of the plasticity in LMI seen in high ChrimsonR mice occurs in low ChrimsonR or control mice (Figure 4J). Note that our conclusions do not rely on the assumption that the stimulation is matched to physiological levels of activation during the visuomotor mismatches that we assayed. The hypothesis that we put forward is that increasing levels of activation of the LC (reflecting increasing rates or amplitude of prediction errors across the brain) will result in increased levels of plasticity. We know that LC axons can reach levels of activity far higher than that seen during visuomotor mismatches, for instance during air puff responses, which constitute a form of positive prediction error (unexpected tactile input) (Figures 2C and S1C). The visuomotor mismatches used in this study were only used to demonstrate that LC activity is consistent with prediction error signaling. We will expand on these points in the discussion as suggested.

      Reviewer #2 (Public Review):

      […] The study provides very compelling data on a timely and fascinating topic in neuroscience. The authors carefully designed experiments and corresponding controls to exclude any confounding factors in the interpretation of neuronal activity in LC axons and cortical neurons. The quality of the data and the rigor of the analysis are important strengths of the study. I believe this study will have an important contribution to the field of system neuroscience by shedding new light on the role of a key neuromodulator. The results provide strong support for the claims of the study. However, I also believe that some results could have been strengthened by providing additional analyses and experimental controls. These points are discussed below.

      Calcium signals in LC axons tend to respond with pupil dilation, air puffs, and locomotion as the authors reported. A more quantitative analysis such as a GLM model could help understand the relative contribution (and temporal relationship) of these variables in explaining calcium signals. This could also help compare signals obtained in the sensory and motor cortical domains. Indeed, the comparison in Figure 2 seems a bit incomplete since only "posterior versus anterior" comparisons have been performed and not within-group comparisons. I believe it is hard to properly assess differences or similarities between calcium signal amplitude measured in different mice and cranial windows as they are subject to important variability (caused by different levels of viral expression for instance). The authors should at the very least provide a full statistical comparison between/within groups through a GLM model that would provide a more systematic quantification.

      We will implement an improved analysis in the revised version of the manuscript.

      Previous studies using stimulations of the locus coeruleus or local iontophoresis of norepinephrine in sensory cortices have shown robust response modulations (see McBurney-Lin et al., 2019, https://doi.org/10.1016/j.neubiorev.2019.06.009 for a review). The weak modulations observed in this study seem at odds with these reports. Given that the density of ChrimsonR-expressing axons varies across mice and that there are no direct measurements of their activation (besides pupil dilation), it is difficult to appreciate how they impact the local network. How does the density of ChrimsonR-expressing axons compare to the actual density of LC axons in V1? The authors could further discuss this point.

      In terms of estimating the percentage of cortical axons labelled based on our axon density measurements: we refer to cortical LC axonal immunostaining in the literature to make this comparison. In motor cortex, an average axon density of 0.07 µm/µm² has been reported (Yin et al., 2021), and 0.09 µm/µm² in prefrontal cortex (Sakakibara et al., 2021). Density of LC axons varies by cortical area, with higher density in motor cortex and medial areas than sensory areas (Agster et al., 2013): V1 axon density is roughly 70% of that in cingulate cortex (adjacent to motor and prefrontal cortices) (Nomura et al., 2014). So, we approximate a maximum average axon density in V1 of approximately 0.056 µm/µm². Because these published measurements were made from images taken of tissue volumes with larger z-depth (~ 10 µm) than our reported measurements (~ 1 µm), they appear much larger than the ranges reported in our manuscript (0.002 to 0.007 µm/µm²). We repeated the measurements in our data using images of volumes with 10 µm z-depth, and find that the density of labelled axons in high ChrimsonR-expressing mice ranges from 0.012 to 0.039 µm/µm². This corresponds to between 20% and 70% of the density we would expect based on previous work. Note that this is a potentially significant underestimate, and therefore should be used as a lower bound: analyses in the literature use images from immunostaining, where the signal to background ratio is very high. In contrast, we did not transcardially perfuse our mice, leading to significant background (especially in the pia/L1, where axon density is high - (Agster et al., 2013; Nomura et al., 2014)), and the intensity of the tdTomato is not especially high. We therefore are likely missing some narrow, dim, and superficial fibers in our analysis.
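
      The arithmetic behind this estimate can be retraced in a few lines. This is a sketch under one stated assumption: the response does not say how the motor and prefrontal values were combined, so their mean is taken here as the reference density for the cingulate-adjacent cortex. That assumption reproduces the quoted figure of approximately 0.056. All variable names are illustrative.

```python
# Densities are in µm of axon per µm² of tissue, as quoted in the response.
motor_density = 0.07             # motor cortex (Yin et al., 2021)
prefrontal_density = 0.09        # prefrontal cortex (Sakakibara et al., 2021)
v1_relative_to_cingulate = 0.70  # V1 is roughly 70% of cingulate (Nomura et al., 2014)

# Assumption: the reference density is the mean of the two published values.
reference_density = (motor_density + prefrontal_density) / 2
expected_v1_density = reference_density * v1_relative_to_cingulate  # about 0.056

# Range measured in high ChrimsonR-expressing mice at 10 µm z-depth.
measured_low, measured_high = 0.012, 0.039

# Fraction of the expected V1 density that is labelled.
low_fraction = measured_low / expected_v1_density
high_fraction = measured_high / expected_v1_density

print(f"expected V1 density: {expected_v1_density:.3f} µm/µm²")
print(f"labelled fraction: {low_fraction:.0%} to {high_fraction:.0%}")
```

      The resulting fractions are about 21% and 70%, in line with the 20% to 70% range stated above.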

      We also can quantify how our variance in axonal labelling affects our results: For the dataset in Figure 3, there doesn’t appear to be any correlation between the level of expression and the effect of stimulating the axons on the mismatch or visual flow responses for each animal (Figure R1: https://imgur.com/gallery/Yl60hnT), while there is a significant correlation between the level of expression and the pupil dilation, consistent with the dataset shown in Figure 4. Thus, even in the most highly expressing mice, there is no clear effect on average response size at the level of the population. We will add these correlations to the revised manuscript.

      To our knowledge, there has not yet been any similar experiment reported utilizing local LC axonal optogenetic stimulation while recording cortical responses, so when comparing our results to those in the literature, there are several important methodological differences to keep in mind. The vast majority of the work demonstrating an effect of LC output/noradrenaline on responses in the cortex has been done using unit recordings, and while results are mixed, these have most often demonstrated a suppressive effect on spontaneous and/or evoked activity in the cortex (McBurney-Lin et al., 2019). In contrast to these studies, we do not see a major effect of LC stimulation either on baseline or evoked calcium activity (Figure 3), and, if anything, we see a minor potentiation of transient visual flow onset responses (see also Figure R2). There could be several reasons why our stimulation does not have the same effect as these older studies:

      1. Recording location: Unit recordings are often very biased toward highly active neurons (Margrie et al., 2002) and deeper layers of the cortex, while we are imaging from layer 2/3 – a layer notorious for sparse activity. In one of the few papers to record from superficial layers, it has been demonstrated that deeper layers in V1 are affected differently by LC stimulation methods compared to more superficial ones (Sato et al., 1989), with suppression more common in superficial layers. Thus, some differences between our results and those in the majority of the literature could simply be due to recording depth and the sampling bias of unit recordings.

      2. Stimulation method: Most previous studies have manipulated LC output/noradrenaline levels by either iontophoretically applying noradrenergic receptor agonists, or by electrically stimulating the LC. Arguably, even though our optogenetic stimulation is still artificial, it represents a more physiologically relevant activation compared to iontophoresis, since the LC releases a number of neuromodulators including dopamine, and these will be released in a more physiological manner in the spatial domain and in terms of neuromodulator concentration. Electrical stimulation of the LC as used by previous studies differs from our optogenetic method in that LC axons will be stimulated across much wider regions of the brain (affecting both the cortex and many of its inputs), and it is not clear whether the cause of cortical response changes is in cortex or subcortical. In addition, electrical LC stimulation is not cell type specific.

      3. Temporal features of stimulation: Few previous studies had the same level of temporal control over manipulating LC output that we had using optogenetics. Given that electrical stimulation generates electrical artifacts, coincident stimulation during the stimulus was not used in previous studies. Instead, the LC is often repeatedly or tonically stimulated, sometimes for many seconds, prior to the stimulus being presented. Iontophoresis also does not have the same temporal specificity and will lead to tonically raised receptor activity over a time course determined by washout times.

      4. State specificity: Most previous studies have been performed under anesthesia – which is known to impact noradrenaline levels and LC activity (Müller et al., 2011). Thus, the acute effects of LC stimulation are likely not comparable between anesthesia and in the awake animal.

      Due to these differences, it is hard to infer why our results differ compared to other papers. The study with the most similar methodology to ours is (Vazey et al., 2018), which used optogenetic stimulation directly into the mouse LC while recording spiking in deep layers of the somatosensory cortex with extracellular electrodes. Like us, they found that phasic optogenetic stimulation alone did not alter baseline spiking activity (Figure 2F of Vazey et al., 2018), and they found that in layers 5 and 6, short latency transient responses to foot touch were potentiated and recruited by simultaneous LC stimulation. While this finding appears more overt than the small modulations we see, it is qualitatively not so dissimilar from our finding that transient responses appear to be slightly potentiated when visual flow begins (Figure R2). Differences in the degree of the effect may be due to differences in the layers recorded, the proportion of the LC recruited, or the fact anesthesia was used in Vazey et al., 2018.

      Note that we only used one set of stimulation parameters for optogenetic stimulation, and it is always possible that using different parameters would result in different effects. We will add a discussion on the topic to the revised manuscript.

      In the analysis performed in Figure 3, it seems that red light stimulations used to drive ChrimsonR also have an indirect impact on V1 neurons through the retina. Indeed, figure 3D shows a similar response profile for ChrimsonR and control with calcium signals increasing at laser onset (ON response) and offset (OFF response). With that in mind, it is hard to interpret the results shown in Figure 3E-F without seeing the average calcium time course for Control mice. Are the responses following visual flow caused by LC activation or additional visual inputs? The authors should provide additional information to clarify this result.

      This is a good point. When we plot the average difference between the stimulus response alone and the optogenetic stimulation + stimulus response, we do indeed find that there is a transient increase in response at the visual flow onset (and the offset of mismatch, which is where visual flow resumes), and this is only seen in ChrimsonR-expressing mice (Figure R2: https://imgur.com/gallery/cqN2Khd). We therefore believe that these enhanced transients at visual flow onset could be due to the effect of ChrimsonR stimulation, and indeed previous studies have shown that LC stimulation can reduce the onset latency and latency jitter of afferent-evoked activity (Devilbiss and Waterhouse, 2004; Lecas, 2004), an effect which could mediate the differences we see. We will add this analysis to the revised manuscript.

      Some aspects of the described plasticity process remained unanswered. It is not clear over which time scale the locomotion modulation index changes and how many optogenetic stimulations are necessary or sufficient to saturate this index. Some of these questions could be addressed with the dataset of Figure 3 by measuring this index over different epochs of the imaging session (from early to late) to estimate the dynamics of the ongoing plasticity process (in comparison to control mice). Also, is there any behavioural consequence of plasticity/update of functional representation in V1? If plasticity gated by repeated LC activations reproduced visuomotor responses observed in mice that were exposed to visual stimulation only in the virtual environment, then I would expect to see a change in the locomotion behaviour (such as a change in speed distribution) as a result of the repeated LC stimulation. This would provide more compelling evidence for changes in internal models for visuomotor coupling in relation to its behavioural relevance. An experiment that could confirm the existence of the LC-gated learning process would be to change the gain of the visuomotor coupling and see if mice adapt faster with LC optogenetic activation compared to control mice with no ChrimsonR expression. Authors should discuss how they imagine the behavioural manifestation of this artificially-induced learning process in V1.

      Regarding the question of plasticity time course: Unfortunately, owing to the paradigm used in Figure 3, the time course of the plasticity will not be quantifiable from this experiment. This is because in the first 10 minutes, the mouse is in closed loop visuomotor VR experience, undergoing optogenetic stimulation (this is the time period in which we record mismatches). We then shift to the open loop session to quantify the effect of optogenetic stimulation on visual flow responses. Since the plasticity is presumably happening during the closed loop phase, and we have no read-out of the plasticity during this phase (we do not have uncoupled visual flow onsets to quantify LMI in closed loop), it is not possible to track the plasticity over time.

      Regarding the behavioral relevance of the plasticity: The type of plasticity we describe here is consistent with predictive, visuomotor plasticity in the form of a learned suppression of responses to self-generated visual feedback during movement. Intuitive purposes of this type of plasticity would be 1) to enable better detection of external moving objects by suppressing the predictable (and therefore redundant) self-generated visual motion and 2) to better detect changes in the geometry of the world (near objects have a larger visuomotor gain than far objects). In our paradigm, we have no intuitive read-out of the mouse’s perception of these things, and it is not clear to us that they would be reflected in locomotion speed, which does not differ between groups (manuscript Figure S5). Instead, we would need to turn to other paradigms for a clear behavioral read-out of predictive forms of sensorimotor learning: for instance, sensorimotor learning paradigms in the VR (such as those used in (Heindorf et al., 2018; Leinweber et al., 2017)), or novel paradigms that reinforce the mouse for detecting changes in the gain of the VR, or moving objects in the VR, using LC stimulation during the learning phase to assess if this improves acquisition. This is certainly a direction for future work. In the case of a positive effect, however, the link between the precise form of plasticity we quantify in this manuscript and the effect on the behavior would remain indirect, so we see this as beyond the scope of the manuscript. We will add a discussion on this topic to the revised manuscript.

      Finally, control mice used as a comparison to mice expressing ChrimsonR in Figure 3 were not injected with a control viral vector expressing a fluorescent protein alone. Although it is unlikely that the procedure of injection could cause the results observed, it would have been a better control for the interpretation of the results.

      We agree that this indeed would have been a better control. However, we believe that this is fortunately not a major problem for the interpretation of our results for two reasons:

      1. The control and ChrimsonR expressing mice do not show major differences in the effect of optogenetic LC stimulation at the level of the calcium responses for all results in Figure 3, with the exception of the locomotion modulation indices (Figure 3I). Therefore, in terms of response size, there is no major effect compared to control animals that could be caused by the injection procedure, apart from marginally increased transient responses to visual flow onset – and, as the reviewer notes, it is difficult to see how the injection procedure would cause this effect.

      2. The effect on locomotion modulation index (Figure 3I) was replicated with another set of mice in Figure 4C, for which we did have a form of injected control (‘Low ChrimsonR’), which did not show the same plasticity in locomotion modulation index (Figure 4E). We therefore know that at least the injection itself is not resulting in the plasticity effect seen.

      References:

      • Agster, K.L., Mejias-Aponte, C.A., Clark, B.D., Waterhouse, B.D., 2013. Evidence for a regional specificity in the density and distribution of noradrenergic varicosities in rat cortex. Journal of Comparative Neurology 521, 2195–2207. https://doi.org/10.1002/cne.23270

      • Attinger, A., Wang, B., Keller, G.B., 2017. Visuomotor Coupling Shapes the Functional Development of Mouse Visual Cortex. Cell 169, 1291-1302.e14. https://doi.org/10.1016/j.cell.2017.05.023

      • Devilbiss, D.M., Waterhouse, B.D., 2004. The Effects of Tonic Locus Ceruleus Output on Sensory-Evoked Responses of Ventral Posterior Medial Thalamic and Barrel Field Cortical Neurons in the Awake Rat. J. Neurosci. 24, 10773–10785. https://doi.org/10.1523/JNEUROSCI.1573-04.2004

      • He, K., Huertas, M., Hong, S.Z., Tie, X., Hell, J.W., Shouval, H., Kirkwood, A., 2015. Distinct Eligibility Traces for LTP and LTD in Cortical Synapses. Neuron 88, 528–538. https://doi.org/10.1016/j.neuron.2015.09.037

      • Heindorf, M., Arber, S., Keller, G.B., 2018. Mouse Motor Cortex Coordinates the Behavioral Response to Unpredicted Sensory Feedback. Neuron 0. https://doi.org/10.1016/j.neuron.2018.07.046

      • Hong, S.Z., Mesik, L., Grossman, C.D., Cohen, J.Y., Lee, B., Severin, D., Lee, H.-K., Hell, J.W., Kirkwood, A., 2022. Norepinephrine potentiates and serotonin depresses visual cortical responses by transforming eligibility traces. Nat Commun 13, 3202. https://doi.org/10.1038/s41467-022-30827-1

      • Kasamatsu, T., Pettigrew, J.D., 1976. Depletion of brain catecholamines: failure of ocular dominance shift after monocular occlusion in kittens. Science 194, 206–209. https://doi.org/10.1126/science.959850

      • Lecas, J.-C., 2004. Locus coeruleus activation shortens synaptic drive while decreasing spike latency and jitter in sensorimotor cortex. Implications for neuronal integration. European Journal of Neuroscience 19, 2519–2530. https://doi.org/10.1111/j.0953-816X.2004.03341.x

      • Leinweber, M., Ward, D.R., Sobczak, J.M., Attinger, A., Keller, G.B., 2017. A Sensorimotor Circuit in Mouse Cortex for Visual Flow Predictions. Neuron 95, 1420-1432.e5. https://doi.org/10.1016/j.neuron.2017.08.036

      • Mahn, M., Prigge, M., Ron, S., Levy, R., Yizhar, O., 2016. Biophysical constraints of optogenetic inhibition at presynaptic terminals. Nat Neurosci 19, 554–556. https://doi.org/10.1038/nn.4266

      • Margrie, T.W., Brecht, M., Sakmann, B., 2002. In vivo, low-resistance, whole-cell recordings from neurons in the anaesthetized and awake mammalian brain. Pflugers Arch. 444, 491–498. https://doi.org/10.1007/s00424-002-0831-z

      • McBurney-Lin, J., Lu, J., Zuo, Y., Yang, H., 2019. Locus coeruleus-norepinephrine modulation of sensory processing and perception: A focused review. Neurosci Biobehav Rev 105, 190–199. https://doi.org/10.1016/j.neubiorev.2019.06.009

      • Müller, C.P., Pum, M.E., Amato, D., Schüttler, J., Huston, J.P., De Souza Silva, M.A., 2011. The in vivo neurochemistry of the brain during general anesthesia. Journal of Neurochemistry 119, 419–446. https://doi.org/10.1111/j.1471-4159.2011.07445.x

      • Nomura, S., Bouhadana, M., Morel, C., Faure, P., Cauli, B., Lambolez, B., Hepp, R., 2014. Noradrenalin and dopamine receptors both control cAMP-PKA signaling throughout the cerebral cortex. Front Cell Neurosci 8. https://doi.org/10.3389/fncel.2014.00247

      • Polack, P.-O., Friedman, J., Golshani, P., 2013. Cellular mechanisms of brain-state-dependent gain modulation in visual cortex. Nat Neurosci 16, 1331–1339. https://doi.org/10.1038/nn.3464

      • Raimondo, J.V., Kay, L., Ellender, T.J., Akerman, C.J., 2012. Optogenetic silencing strategies differ in their effects on inhibitory synaptic transmission. Nat Neurosci 15, 1102–1104. https://doi.org/10.1038/nn.3143

      • Sakakibara, Y., Hirota, Y., Ibaraki, K., Takei, K., Chikamatsu, S., Tsubokawa, Y., Saito, T., Saido, T.C., Sekiya, M., Iijima, K.M., n.d. Widespread Reduced Density of Noradrenergic Locus Coeruleus Axons in the App Knock-In Mouse Model of Amyloid-β Amyloidosis. J Alzheimers Dis 82, 1513–1530. https://doi.org/10.3233/JAD-210385

      • Sato, H., Fox, K., Daw, N.W., 1989. Effect of electrical stimulation of locus coeruleus on the activity of neurons in the cat visual cortex. Journal of Neurophysiology. https://doi.org/10.1152/jn.1989.62.4.946

      • Seol, G.H., Ziburkus, J., Huang, S., Song, L., Kim, I.T., Takamiya, K., Huganir, R.L., Lee, H.-K., Kirkwood, A., 2007. Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity. Neuron 55, 919–929. https://doi.org/10.1016/j.neuron.2007.08.013

      • Shepard, K.N., Liles, L.C., Weinshenker, D., Liu, R.C., 2015. Norepinephrine is necessary for experience-dependent plasticity in the developing mouse auditory cortex. J Neurosci 35, 2432–2437. https://doi.org/10.1523/JNEUROSCI.0532-14.2015

      • Vazey, E.M., Moorman, D.E., Aston-Jones, G., 2018. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proceedings of the National Academy of Sciences 115, E9439–E9448. https://doi.org/10.1073/pnas.1803716115

      • Yin, X., Jones, N., Yang, J., Asraoui, N., Mathieu, M.-E., Cai, L., Chen, S.X., 2021. Delayed motor learning in a 16p11.2 deletion mouse model of autism is rescued by locus coeruleus activation. Nat Neurosci 24, 646–657. https://doi.org/10.1038/s41593-021-00815-7

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses: The authors do not make a direct link between TOR and REPTOR2 signalling. This seems important since REPTOR2 is a novel gene that arose from the duplication of REPTOR.

      We have added several experiments to strengthen the connection between TOR and REPTOR2, and determined the effect of co-silencing TOR and REPTOR2 on autophagy and on the proportion of the winged morph. Please see the details below in our response to your point 3.

    1. Author Response

      Reviewer #2 (Public Review):

      This paper has collected an impressive data set of the visual response properties of neurons in the visual layers of the mouse superior colliculus. There are 3 main findings of the study. First, the authors identify 24 functional classes of neurons based on the clustering of each neuron's visual response properties. Second, unlike in the retina where each cell type is regularly spaced, functional classes in the superior colliculus appear to cluster near each other. Third, visual representation has a lower dimensionality in the superior colliculus compared to the retina. The dataset has the potential to support the conclusions of the paper, but further analysis is required to make the claims convincing.

      Strengths:

      The main strength of the paper is its impressive dataset of more than 5000 neurons from the visual layers of the superior colliculus. This data set includes recordings from both an interesting set of genetically labelled classes of cells and from a reasonably large portion of the superior colliculus. This dataset offers the opportunity to support the major claims of the paper. This includes i) the identification of 24 functional classes of neurons, ii) the intriguing possibility that functional classes form local patches within the superior colliculus and iii) that the representation of visual information in the superior colliculus has a lower dimensionality compared to the retina.

      Weaknesses:

      The weakness of the paper is that its main claims are not adequately supported by the presented data or analysis. First, support for the existence of 24 functional classes is not clear enough. Our major concern is that it is not clear that each class of neurons was distributed across different mice. Are certain cell types overrepresented in individual animals, or do you find examples of each cell type in most animals?

      The new Supplementary Figure 7G shows how individual mice contribute to the functional types for all neurons. Further, the new Supplementary Figure 12 shows the receptive field locations derived from recordings in each of the animals.

      In addition, it should be made explicit how the responses of each genetically labeled class of neurons are distributed among the 24 functional clusters.

      We have added a new Figure 5D to show this.

      Second, the analysis of the spatial clustering of functional cell types is not complete. Do the same functional clusters sample the same retinotopic locations in different mice? How are clusters of the functional type distributed in visual space?

      Please see our point-by-point responses below to the concerns.

      Third, the lower dimensionality of representation in the superior colliculus may be the result of selective projections of retinal ganglion cells, since not all retinal ganglion cell types project to the superior colliculus. Please estimate the dimensionality of the visual representation of those retinal ganglion cell types that project to the superior colliculus.

      Certainly part of the dimensionality reduction may come from the incomplete retino-geniculate projection; we have added discussion on this topic.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors describe a one-step genome editing method to replace endogenous EB1 with their previously-developed light-sensitive variant, in order to examine the effect of acute and local optogenetic inactivation of EB1 in human neurons. They then attempt to assess the effects of EB1 inactivation on microtubule growth, F-actin dynamics, and growth cone advance and turning. They also perform these experiments in neurons that are lacking EB3, in order to determine whether EB1 can function in a direct and specific way without possible EB3 redundancy.

      First, the experiments depicting the methodology are rigorous and compelling. Most previous studies of +TIP function use knockout or knockdown studies in which the proteins are inactivated over many hours or days in non-human systems. This is the first study to acutely and locally inactivate a +TIP in human neurons. While this group previously published the effects of replacing endogenous EB1 with the light-sensitive variant, the novelty in this current study is that they use a one-step gene editing replacement method (using CRISPR/Cas9) along with using human neurons derived from iPSCs. After proving their new experimental system works, the authors next seek to test the effect that acutely inactivating EB1 (alongside chronic EB3 knockdown) has on microtubule dynamics, and they observe a marked reduction in MT growth and MT length. They then seek to investigate whether F-actin dynamics are immediately affected by EB1 inactivation.

      While measured F-actin flow rates are not significantly affected, which leads the authors to conclude that EB1 inactivation does not have any immediate effect, the included figures and movies show a different phenotype, which is not discussed. Finally, they examine the effect of EB1 inactivation on growth cone advance and growth cone turning, and find that both are affected. However, the lack of certain controls in these final experiments (specifically for Figures 3, 4, and 5) reduces the strength of their findings.

      Thus, the first part of this paper describing the new methodology is very compelling and should be of interest to a wide readership, while the second part describing the functional analysis is mostly solid, with very high-quality imaging data. However, additional analysis and controls would be needed to increase confidence in their conclusions.

      1) Analysis of F-actin dynamics is not thorough, and their claim is not completely supported by the data. Figure 3 only depicts F-actin dynamics data from growth cones of π-EB1 EB3-/- i3Neurons and does not [include] control growth cones (to compare dark and light conditions). While their conclusion is that F-actin dynamics are not affected, there do appear to be immediate changes in the F-actin images, other than flow rates. For example, the F-actin bundles do not appear to emanate straight out with the light condition, compared to the dark condition. There also appears to be more F-actin intensity in the transition domain of the growth cone, compared to the dark condition. If the reason is due to the effects of four minutes of blue light exposure, this would be made clear by doing this experiment with control growth cones as well.

      In Figure 3, we wanted to specifically test if π-EB1 photoinactivation has an immediate effect on growth cone leading edge actin polymerization (for example because of rapid changes in Rho GTPase activity) by measuring F-actin retrograde flow. Because of photobleaching, these experiments are limited to relatively short time-lapse data sets, and within 4-5 min of blue light exposure, we found no significant difference between the dark and light conditions. As requested by this and another reviewer, we added a few more data points as well as a wild-type control. Statistical analysis by ANOVA shows no difference in retrograde flow between any of the four groups.

      We did not see a consistent difference in overall F-actin organization after a few minutes of blue light, and we now include control and π-EB1 growth cones in Fig. 3 that are more similar to one another with the dark image shown more immediately before blue light exposure. The growth cone that we had in the original figure (and that remains in Video 5 to illustrate retrograde flow and how dynamic these growth cones are) was a poor choice for this figure as it undergoes quite dramatic F-actin reorganization before the blue light is turned on, and the morphology immediately before blue light exposure is much more similar to the growth cone during blue light compared with the -5 min time point that we had originally shown.

      Lastly, the apparent relocalization of F-actin to the growth cone center is seen in both control and experimental conditions and we believe that has to do with photobleaching of the F-actin probe at the relatively high frame rates required to observe retrograde flow. We agree with the reviewer that it is important to know this, and we included a note in the figure legend.

      2) Analysis of the effect of EB1 inactivation on growth cone advance and growth cone turning. Figure 4C, showing the neurite unable to cross the blue light barrier, is potentially quite compelling data, but it would be even more convincing if there were also data showing that the blue light barrier has no effect on a control neurite. Given that a number of previous recent studies have shown a detrimental effect of blue light on neurons, it seems important to include these negative controls in this current study.

      The experiment growing neurites on a micropatterned laminin surface in combination with photoinactivation in (now) Figure 4D is incredibly low throughput but serves to illustrate repeated retraction from blue light over many hours of imaging. To show that blue light barriers do not affect control cells we have instead included a quantification of the retraction response of control and π-EB1 neurites growing randomly on a laminin-coated surface (not micropatterned stripes) in new Fig. 4C. It is also worth noting that the dose of blue light used for π-EB1 photoinactivation is much lower than what is typically used for fluorescence imaging (we analyzed and discussed this in great detail in our original π-EB1 publication), and especially in experiments with a blue light barrier, cells are not exposed to any blue light before they hit the barrier.

      3) This concern also holds true for the final experiment, in which the authors examine whether localized blue light would lead to growth cone turning. The authors report difficulty with performing this technically challenging experiment of accurately targeting the light to only a localized region of the growth cone. Thus, the majority of the growth cones (72%) were completely retracted, and so only a small subset of growth cones showed turning. However, this data would be more compelling if there were also a control condition of blue light with neurons that are not expressing the light-inactivated EB1. Another useful control would be to examine whether precise region-of-interest blue light leads to localized loss of EGFP-Zdk1-EB1C on MT plus-ends within the growth cone, or if the loss extends throughout the growth cone. Either outcome would be helpful to potential readers.

      We modified Fig. 5 to include control i3Neurons in this experiment. We also included a supplement to Fig. 5 showing that π-EB1 photodissociation remains localized to the blue light-exposed region. However, because in our π-EB1 line the C-terminal π-EB1 half is EGFP-tagged, we cannot show before and after images of local π-EB1 photodissociation.

      Reviewer #3 (Public Review):

      The major strength of the study was the approach of using photosensitive protein variants to replace endogenous protein with the 1-step CRISPR-based gene editing, which not only allowed acute manipulation of protein function but also mimicked the endogenous targeted protein. However, the same strategy has been used by the same first author previously in dividing cells, somewhat reducing the novelty of the current study. In addition, the results obtained from the study were the same as those from previous studies using different approaches. In other words, the current study only confirmed the known findings without any novel or unexpected results. As a result, the study did not provide strong evidence regarding the advantage of the new experimental approach in our understanding of the function of EB1. Some specific comments are listed below.

      1) In Figure 1, to show that the photosensitive EB1 variant did not affect stem cell properties and their neuronal differentiation, Oct4 staining and western blot of KIF2C and EB3 were not strong evidence. Some new experiments more specifically related to stem cell properties or iPSC-derived neurons are necessary.

      While we did not attempt to fully characterize stemness in our π-EB1 edited i3N lines, we believe, most importantly, we show that π-EB1 i3N hiPSCs differentiate normally into i3Neurons. We show this morphologically as well as by immunoblotting and RT-qPCR experiments looking at marker proteins also including DCX, a well-established neuronal differentiation marker. Although not directly related to stemness, we included one additional RT-qPCR experiment more carefully analyzing the expression level of π-EB1 in the edited lines compared with EB1 in control i3N hiPSCs (new Fig. 1E).

      In addition, the effect of EB1 inactivation on microtubule growth was quantified in stem cells but not in differentiated neurons, which are supposed to be the focus of the study.

      Quantification of MT dynamics in the hiPSCs parallels our previous experiments in cancer cell lines to demonstrate that π-EB1 photoinactivation had a similar inhibitory effect on MT growth in interphase cells. This serves as an additional control that our new system works as expected. Because of our inability to efficiently transfect i3Neurons, we could not measure MT growth in i3Neurons with the same method (i.e. automated EB1N tracking). However, as further outlined below we have added a quantification of MT growth rates in i3Neuron growth cones by additional manual tracking of SPY555-tubulin-labelled growth cone MTs after at least one minute of blue light exposure.

      In Figure S2D, quantification is needed to show the effect of blue light-induced EB1 inactivation in growth cones.

      Fig. 1 – supplement 2D (together with Video 3, and Fig. 2A) is simply to illustrate that the C-terminal π-EB1 half dissociates in blue light as expected. We previously characterized the kinetics of π-EB1 photodissociation and do not think redoing this would add substantially to the current manuscript. The remainder of the manuscript, however, examines the functional consequences of π-EB1 photoinactivation in i3Neurons.

      2) In Figure 2, the effect of blue light on microtubule retraction in the control cells was examined, showing little effect. However, it is still unclear if the blue light per se would have any effect on microtubule plus end dynamics, a more sensitive behavior than that of retraction. In Figure 2C, the length of individual microtubules in different growth cones was presented, showing microtubule retraction after blue light. Quantification and statistical analysis are necessary to draw a strong conclusion.

      Figure 2 shows that growth cone MTs in π-EB1 lines shorten in response to blue light and we did this by analyzing MTs that were visible in a short time window before and after blue light exposure. In response to another reviewer’s comment, we have redesigned this figure to better illustrate this result. We have now included statistical analysis comparing relative MT length 20 s before and during blue light exposure. In control cells, the difference was not statistically significant. We also report in the text the statistical difference between control and π-EB1 lines at the 20 s time point by ANOVA. Lastly, we also measured MT growth rates after at least one minute of blue light exposure, showing that MT growth is greatly attenuated in π-EB1 lines (new Fig. 2D).

      The results showed that EB3 did not seem to contribute to stabilizing microtubules in growth cones. It was discussed that EB3 might have a different function from that of EB1 in the growth cone, although they are markedly up-regulated in neurons. In the differentiated neuronal growth cones examined in the study, does EB3 actually bind to the microtubule plus ends? In the EB3 knockout cells without the blue light, the microtubules were stable, indicating that EB3 had no microtubule stabilization function in these cells. Is such a result consistent with previous studies? If not, some explanation and discussion are needed.

      Other papers have shown that EB3 localizes to growth cone MT ends; for example, in rat cortical neurons (Poobalasingam et al., 2022). We did not test if endogenous EB3 is present on MT ends in i3Neurons, but transfected EB3 certainly is. Interestingly, it was reported by multiple groups that EB1 and EB3 do not bind to the exact same place near MT ends. EB3 trails behind EB1, which would be consistent with functional differences especially in controlling MT growth. We have expanded the discussion of such differences in the text, and thank Phillip Gordon-Weeks, who reminded us of this in a comment on the bioRxiv preprint.

      3) In Figure 3, for the potential roles of EB1 on actin organization and dynamics, only the rates of retrograde flow were measured for 5 min. and no change was observed. However, based on the images presented, it seemed that there was a reduced number of actin bundles after blue light and the actin structure was somewhat disrupted. Some additional examination and measurement of actin organization are necessary to get a clear result.

      This point was also raised by reviewer #1, and we now include images and quantification of retrograde flow in control growth cones and we increased the number of data points. We still see no difference in retrograde flow between all these groups. The original π-EB1 growth cone in Fig. 3A was a poor example because it underwent large morphological changes before the blue light was even turned on, and its morphology just before light exposure is a lot more like the end point image. We therefore replaced this image with a different growth cone that is more similar to the wild-type growth cone shown, and also show images more immediately before blue light exposure. The bottom line is that we do not see a consistent difference in overall F-actin organization after a few minutes of blue light.

      4) In Figure 4, the effect of blue light and EB1 inactivation on neurite extension needs to be quantified in some way, such as the neurite length changes in a fixed time period, and the % of growth cones passing the blue light barrier compared with growth cones of the control cells.

      We have included a statistical comparison (by ANOVA) at the 15 min time point, and a quantification of neurite retraction of growth cones encountering a blue light barrier.

      5) For the quantification of growth cone turning, a control condition is needed to show that blue light itself has no effect on turning.

      We have also added a control experiment to Fig. 5.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The role of increased temperature on immunity and homeostasis in cold-blooded vertebrates is an understudied yet important field. This work not only examines how immunity is impacted by fever, but also incorporates an infection model and examines resolution of the response. This work can serve as a model for other groups interested in the study of hyperthermia and immunity.

      Thank you very much.

      2) Generally speaking, I agree with the authors' strategy and interpretations of the data.

      • In the Introduction, the authors chose to begin with how fever in endotherms impacts the immune system. Considering that this work exclusively examines the response of a teleost (goldfish), the authors might consider flipping the way they present this work. After all, cold-blooded vertebrates rely on this response because of their basic physiology.

      We chose to begin with a description of fever in endotherms because we know less about those immune mechanisms impacted by fever in ectotherms. The goal was to provide points of comparison based on published datasets. Indeed, we also expect differences between cold- and warm-blooded vertebrates based on their basic physiologies. However, it is interesting that despite different physiologies and thermoregulatory strategies, common biochemical pathways appear to regulate fever across cold- and warm-blooded vertebrates. This is now captured more clearly in the Introduction section (lines 134-136). Added support also comes from the work that we present in this study, including fever inhibition experiments using ketorolac tromethamine (lines 244-253; Figure 3C).

      3) I thought the setup of the work in Figure 1 was innovative and could provide an example of how to study such a problem.

      Thank you. Very much appreciated.

      4) Figure 2 was (to me) unexpected. One would not expect such a tight response to hyperthermia and infection. This experiment in and of itself was quite interesting, and worth following up in future experiments (by the authors and other groups).

      The level of homogeneity in the behavioural responses shown in Figure 2 was a big part of why we pursued this work. It was striking that fish would display such consistency in behaviour during the febrile window, regardless of whether they were evaluated in groups or individually. To us, this suggested that the temperature chosen and the kinetics of this thermal preference are central for modulation of downstream biological processes. Added support for the importance of precise thermal selection comes from "failed" experiments during this study where incoming aquatic facility water temperatures fluctuated due to factors outside of our control. This caused temporary disruption to the temperatures available to these fish in the annular thermal preference tank. In these cases, we noted disruption of both classical behaviours shown in Figure 2 as well as downstream benefits.

      • The other work, on the response to infection and the resolution of infection, was unique to this paper, and (sorry to be repetitive) can be an example of how to devise such studies.

      Thank you.

      • On the other hand, I am not sure this is a study of "fever." That implies how increased temperature impacts immunity and resolution in endotherms. Perhaps the authors could temper the comparisons between cold- and warm-blooded vertebrates regarding the response to hyperthermia.

      We believe that for those mechanisms that are evolutionarily conserved, the teleost system will offer an opportunity for novel insights into the effects of fever induction and disruption. Indeed, this animal model offers multiple advantages. But we agree that much work remains to establish the extent of this conservation and now highlight this issue more clearly (lines 454-455).

      An additional note on hyperthermia versus fever: although both terms are sometimes used interchangeably in the literature, we make a distinction between them. Hyperthermia captures an increase in core body temperature. However, this alone is not sufficient to engage the CNS (representative results shown in Figure 3-figure supplement 1). Consistent with prior descriptions of fever (e.g. Nat Rev Immunol (2015)15:335-49; Arch Intern Med (1998)158:1870-81), we also show that our model results in CNS engagement (Figure 3A), induces systemic pyrogen release (Figure 3B), triggers classical sickness behaviours (Figure 2), and promotes immune function (Figures 4-7).

    1. Author Response

      Reviewer 1 (Public Review):

      The authors in this manuscript investigate the effect of co-substrate cycling on the metabolic flow. The main finding is that this cycling can limit the flux through a pathway. The authors examine implications of this effect in different simple configurations to highlight the potential impact on metabolic pathways. Overall, the manuscript follows logical steps and is accessible. Once the main point (reduction in flux of a pathway with a limited pool of a cycled co-substrate) is established, some of the following steps become expected (e.g. the fraction of the flux in a branched pathway). Nevertheless, it is understandable that the authors have picked a few simple examples of the metabolic network motifs to highlight the implications. The results presented in the manuscript overall support the conclusions. One weakness is that some of the details of the assumptions (e.g. the choices of rates) are not explicitly spelt out in the manuscript. This work is impactful because it brings to light how cycling of some of the intermediates in a pathway can influence metabolic fluxes and dynamics. This is a factor in addition to (and separate from) reaction rates, which are often considered the main driver of metabolic fluxes.

      We thank the reviewer for this accurate summary. Regarding the effect of parameters on the presented results, we note that the first part of the results is based on analytical solutions provided in the Appendix (formerly the SI). These results are given as inequalities comprising parameters, allowing direct evaluation of parameter effects. We have now made this point explicit in the presentation of the results.

      In the second part of the results, we utilise numerical simulations, and in this case the observed results can depend on parameters. We have explored the effects of key parameters, that is k_in and total substrate concentration, through the presented 'phase diagram' style figures (see Figures 2 and 4). For additional parameters, we have now included additional simulations exploring their effects (e.g. see Appendix – Figure 11 and Appendix – Figure 13).
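
      To make the kind of parameter inequality mentioned above concrete, the following is a minimal sketch of a toy motif, not the model analysed in the manuscript. It assumes a single reaction fed by a constant influx `k_in` that consumes one unit of a co-substrate per turnover, and an irreversible Michaelis-Menten recycling reaction that regenerates the co-substrate from a conserved pool `a_tot`. All function and parameter names (`vmax_main`, `vmax_rec`, `km_rec`) are hypothetical and chosen only for illustration.

```python
def max_recycling_flux(vmax_rec: float, km_rec: float, a_tot: float) -> float:
    """Upper bound on the recycling flux in the toy motif.

    Assumption: recycling follows v = vmax_rec * a1 / (km_rec + a1), where a1
    is the spent form of the co-substrate and a1 <= a_tot (conserved pool).
    The bound is the value of v when the whole pool is in the spent form.
    """
    return vmax_rec * a_tot / (km_rec + a_tot)


def steady_state_possible(k_in: float, vmax_main: float, vmax_rec: float,
                          km_rec: float, a_tot: float) -> bool:
    """Return True if the toy motif can carry the influx k_in at steady state.

    At steady state the pathway flux must equal both the influx and the
    recycling flux, so k_in has to stay below the maximal rate of the main
    reaction AND below the pool-limited recycling bound.
    """
    return k_in < min(vmax_main, max_recycling_flux(vmax_rec, km_rec, a_tot))


if __name__ == "__main__":
    # Scan the pool size at a fixed influx (illustrative values only).
    for a_tot in (1.0, 4.0, 19.0):
        ok = steady_state_possible(k_in=0.9, vmax_main=2.0,
                                   vmax_rec=1.0, km_rec=1.0, a_tot=a_tot)
        print(f"a_tot={a_tot}: steady state possible = {ok}")
```

      In this sketch, enlarging the pool or speeding up recycling raises the bound, whereas raising the maximal rate of the main reaction alone does not help once the pool-limited term is the smaller of the two. Scanning `k_in` against `a_tot` in this way gives the simplest version of a 'phase diagram' style picture.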

      Reviewer 2 (Public Review):

      The cycling of "co-substrates" in metabolic reactions is possibly a very important but often overlooked determinant of metabolic fluxes. To better understand how the turnover dynamics of co-substrates affect metabolic fluxes the authors dissect a few metabolic reaction motifs. While these motifs are necessarily much simpler than real metabolic networks with dozens or hundreds of reactions, they still include important characteristics of the full network but allow for a deeper mathematical analysis. I found this mathematical approach of the manuscript convincing and an important contribution to the field as it provides more intuitive insights how co-substrate cycling could affect metabolic fluxes. In the manuscript, the authors stress particularly how the pool sizes of co-substrates and the enzymes involved in the cycling of those can constrain metabolic fluxes but the presented results also go substantially beyond this statement as the authors further illustrate how turnover characteristics of substrates in branches/coupled reactions can affect the ratio of produced substrates.

      The authors further present an analysis of previously published experimental data (around Figure 3). This is a very nice idea as it can in principle add more direct proof that the cycling of co-substrates is indeed an important constraint shaping fluxes in real metabolic networks (instead of being merely a theoretical phenomenon which occurs only in unphysiological parameter regimes). However, as currently presented, it remains unclear to what extent the data analysis adds convincing support that co-cycling substantially constrains metabolic fluxes. Particularly, it remains unclear for which organisms and conditions the used experimental dataset holds, how it has been generated, and with what uncertainty different measured values come. For example, the comparison requires an estimation of v_max. How can these values be determined in vivo? Are (expected) uncertainties sufficiently low to allow for the statement that fluxes are higher than what enzyme kinetics predict? Furthermore, I am wondering to what extent the correlation between co-substrate pool levels and flux supports the idea that co-substrate cycling is important. The positive relation between ATP/AMP/ADP levels, for example, is a nice observation. However, it remains a correlation which might occur due to many other factors beyond the limitations of co-substrate cycling and which might change with provided conditions.

      We thank the reviewer for this accurate summary. We would, however, like to clarify that we neither observe nor analyse any relation between ATP/AMP/ADP levels. Rather, in the analysis presented in Fig. 3B-D, we are looking at the relation between fluxes in co-substrate utilising reactions and the pool size of that co-substrate (e.g. total ATP, AMP, and ADP level for reactions utilising any one of these three co-substrates).

      In their summary, the reviewer raises several valid points about the data analysis and its possible limitations. We address them here point by point:

      How are Vmax values gathered/estimated? We have now added more information regarding how the Vmax values were gathered and from which organisms and conditions. Specifically, we used previously published values of Vmax from (Davidi et al. 2016) where it was estimated by multiplying the in vitro determined kcat by the concentration of the enzyme from proteomic measurement under different conditions - all for model organism Escherichia coli. See also below, reply to recommendation 2.

      Are (expected) uncertainties sufficiently low? It is difficult to have an estimate for the uncertainty since much of the error in the previous analysis probably comes from the fact that the kinetic parameters determined in vitro are used to estimate fluxes under in vivo conditions - the main source of error is expected to be this discrepancy, which is hard to estimate. However, since the plot is in log-scale, we highlight only gaps that are more than 1 order of magnitude (dashed diagonal lines) and hopefully the uncertainty is lower than that. Furthermore, high uncertainty would probably contribute equally to over- and under-estimating the maximal flux, while we can clearly see that the flux rarely exceeds the Vmax. We have now included a statement in the revised text capturing this point.
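      The two steps described above (estimating Vmax as kcat times enzyme concentration, then flagging only gaps larger than one order of magnitude on a log scale) can be sketched as follows. This is a minimal illustration, not the analysis code behind the figure: the function names and all numerical values are hypothetical, and units are assumed to be consistent (kcat in 1/s, concentrations in mM).

```python
import math


def vmax_from_kcat(kcat_per_s, enzyme_conc_mM):
    """Estimate Vmax (mM/s) as in vitro kcat (1/s) times enzyme concentration (mM)."""
    return kcat_per_s * enzyme_conc_mM


def log10_gap(flux, vmax):
    """Return log10(flux / Vmax); negative values mean the flux is below Vmax."""
    return math.log10(flux / vmax)


# Hypothetical values for a single reaction (not taken from any dataset).
kcat = 50.0      # 1/s, in vitro turnover number
enzyme = 0.002   # mM, enzyme concentration from proteomics
flux = 0.004     # mM/s, measured flux

vmax = vmax_from_kcat(kcat, enzyme)
gap = log10_gap(flux, vmax)

# Only gaps of more than one order of magnitude are highlighted.
below_by_order_of_magnitude = gap < -1.0
print(f"Vmax = {vmax:.3f} mM/s, log10(flux/Vmax) = {gap:.2f}, "
      f"flagged = {below_by_order_of_magnitude}")
```

      Working on the log ratio makes the one-order-of-magnitude threshold a simple comparison against -1 (or +1 for fluxes exceeding Vmax).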

      Correlations offer weak evidence. Unfortunately, as we do not have measurements on co-substrate pool sizes and cycling kinetics under all conditions, our analyses of experimental data from cycling-involving reactions are admittedly limited. However, they do show that (1) measured fluxes are lower than those predicted by kinetics of the primary enzyme (i.e. enzyme involved in co-substrate and substrate conversion) alone, and (2) there is - for some cycling-involving reactions - a correlation between flux and co-substrate pool size. Both observations could indicate co-substrate pool sizes and/or co-substrate cycling dynamics being limiting. As the reviewer points out, we cannot state this as a certainty.

      Other possible limitations include thermodynamic effects, i.e. limitation by the concentrations of both substrate and product, or substrate saturation. We already explored the latter possibility and found that there is still a lower flux when taking into account the primary substrate saturation (see Fig. S6). The former effect is very difficult to analyse without more data, as calculating reaction thermodynamics requires knowledge of concentrations for all substrates and products, as well as enzyme Michaelis-Menten constants in both forward and backward directions. This information is currently not available except for a few of the reactions we analysed. Nevertheless, to give as much insight as possible on the thermodynamic effect, we added a new figure (Appendix – Figure 8) where we plot the physiological Gibbs free energy (calculated assuming that all reactants are at 1 mM and pH=7) against the normalized flux. The plot shows that although in a few cases, such as malate dehydrogenase (MDH), the normalised flux seems to be greatly reduced by the thermodynamic barrier, the general picture is that there is little correlation between physiological Gibbs free energy and normalised flux. We have now included the resulting figure and associated discussion in the revised manuscript.
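      For readers unfamiliar with the physiological Gibbs free energy, the calculation can be sketched as below. It uses the standard relation ΔG = ΔG'° + RT ln Q with every reactant set to 1 mM, so that Q depends only on the difference between the number of products and substrates. The temperature of 298.15 K, the function name, and the example ΔG'° value are assumptions for illustration; only the 1 mM and pH 7 conditions are stated above (pH is taken to be already reflected in ΔG'°).

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def physiological_dg(dg0_prime_kj, n_substrates, n_products,
                     temperature_k=298.15, conc_molar=1e-3):
    """Gibbs free energy (kJ/mol) with all reactants at the same concentration.

    With every species at conc_molar, the reaction quotient is
    Q = conc_molar ** (n_products - n_substrates).
    Water is conventionally excluded from the species counts.
    """
    ln_q = (n_products - n_substrates) * math.log(conc_molar)
    return dg0_prime_kj + R_KJ_PER_MOL_K * temperature_k * ln_q


# Hypothetical standard transformed free energy of +30 kJ/mol.
balanced = physiological_dg(30.0, n_substrates=2, n_products=2)
one_to_two = physiological_dg(30.0, n_substrates=1, n_products=2)
print(f"2 -> 2 reaction: {balanced:.2f} kJ/mol")
print(f"1 -> 2 reaction: {one_to_two:.2f} kJ/mol")
```

      For reactions with equal numbers of substrates and products the correction term vanishes, so the physiological value equals ΔG'°; each extra product at 1 mM lowers it by roughly 17 kJ/mol at this temperature.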

      In relation to all these points on data-based support of the theory, we would also like to point out the comments from reviewer 3 and the fact that our theoretical work provides motivation for further future experimental studies of co-substrate cycling dynamics. Our main analysis about co-substrate dynamics becoming limiting is based on analytical solutions. These solutions provide an inequality of system parameters relating pathway influx, co-substrate pool size, and co-substrate related enzymatic parameters. When this inequality is satisfied, there will be flux limitation due to cosubstrate cycling. Future experimental studies can now be devised to explore this inequality under different conditions by measuring the key parameters more explicitly. This key point and aspects of the above replies are incorporated at the relevant points in the main text. In addition, we have included a new paragraph in the Discussion section (see reply to second recommendation of reviewer 3) and the following paragraph at the end of the Results section:

      In summary, these results show that for reactions involving co-substrate cycling (1) measured fluxes are lower than those predicted by kinetics of the primary enzyme (i.e. enzyme involved in substrate conversion) alone, and (2) there is - for some reactions - a correlation between flux and co-substrate pool size. Both observations could indicate co-substrate pool sizes and/or co-substrate cycling dynamics being a main limiting factor for flux. We can not state this as a certainty, however, as there are possibly other factors acting as the extra limitation, including thermodynamic effects. These points call for further experimental analysis of co-substrate cycling within the study of metabolic system dynamics.

      Reviewer 3 (Public Review):

      In the study, the authors present a mathematical framework and data analysis approach that revisits an "old" idea in cell physiology: The role of co-substrate cycling as potential key determinant of reaction flux limits in enzyme-catalyzed reaction systems. The aim of the study is to identify metabolic network properties that indicate potential global flux regulatory capacities of co-substrate cycling.

      The authors approached this aim in two steps. First, a mathematical framework based on ODEs was developed, which reflects small abstract metabolic pathways including kinetic parameters of the involved reactions. While the modeled pathways are abstract, the considered pathway motifs are motivated by structures of known existing pathways such as glycolysis (as example of a linear pathway) and certain amino acid biosynthesis pathways (as example of branched pathways). The developed ODE-based models were used for steady state analysis and symbolic and numerical simulations of flux dynamics. As a main result of the first step, the authors highlight that co-substrate cycling can act as a mechanism which limits specific metabolic fluxes across the metabolic network and that co-substrate cycling can facilitate flux regulation at branching points of the network. Second, the authors re-analyzed data on flux rates (experimental measurements and flux-balance-analysis predictions) from previous publications in order to assess whether the predicted role of co-substrate cycling could explain the observed flux distributions. In this data analysis, the authors provide evidence that the fluxes of specific reactions in central metabolism could be constrained by co-substrate cycling, because their observed fluxes are often lower than expected by the kinetics of the corresponding enzymes.

      A particular strength of the study is that the authors highlight that co-substrates are not limited to ATP and NAD(P)H, but could include a range of other metabolites and which could also be organism-specific. Building on this broad definition of cosubstrates, the authors developed an abstract mathematical framework that can be used to study the general potential 'design principle' of co-substrate cycling in cellular metabolism and to adapt the framework to study different co-substrates in specific organisms in future works.

      Experimental data (i.e. measured fluxes using mass-spectrometry data and labeled substrates) that is available to date is limited and therefore also limits the broad evaluation of the developed mathematical framework across various different organisms and environmental conditions. However, with advances in metabolomics and derived metabolic flux measurements, the mathematical framework will serve as a valuable resource to understand the potential role of co-substrate cycling in more biological systems. The framework might also guide new experiments that generate data for a systematic evaluation of when and to what extent co-substrate cycling governs flux distributions, e.g. depending on growth rates or response to environmental stress.

      We thank the reviewer for this accurate summary. We agree with the reviewer's final comments on limitations of current testing of our theory, due to limitations in existing data, and that this analysis will now motivate further experimental study of co-substrate dynamics. We have already included revisions of the manuscript to further highlight and discuss limitations of the data-based analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      This study investigates the psychological and neurochemical mechanisms of pain relief. To this end, 30 healthy human volunteers participated in an experiment in which tonic heat pain was applied. Three different trial types were applied. In test trials, the volunteers played a wheel of fortune game in which wins and losses resulted in decreases and increases of the stimulation temperature, respectively. In control trials, the same stimuli were applied but the volunteers did not play the game so that stimulation decreases and increases were passively perceived. In neutral trials, no changes of stimulation temperature occurred. The experiment was performed in three conditions in which either a placebo, a dopamine agonist, or an opioid antagonist was applied before stimulations. The results show that controllability, surprise, and novelty-seeking modulate the perception of pain relief. Moreover, these modulations are influenced by the dopaminergic but not the opioidergic manipulation.

      Strengths

      • The mechanisms of pain relief are a timely and relevant basic science topic with potential clinical implications.

      • The experimental paradigm is innovative and well-designed.

      • The analysis includes advanced assessments of reinforcement learning.

      Weaknesses

      • There is no direct evidence that the opioidergic manipulation has been effective. This weakens the negative findings in the opioid condition and should be directly demonstrated or at least critically discussed.

      We agree that we cannot provide direct evidence on the effectiveness of the opioidergic manipulation in our study. However, previous literature strongly suggests that a dose of 50 mg naltrexone (p.o.) is effective in blocking 𝜇-opioid receptors in humans. Using positron emission tomography, Weerts et al. (2013) found a blockage of 𝜇-opioid receptors of more than 90% with 50 mg naltrexone (p.o.), although given repeatedly 4 days in a row. In addition, convincing effects on behavioral functions have been reported with comparable doses that support the efficacy of the opioidergic manipulation. For example, Chelnokova et al. (2014) found attenuating effects of 50 mg naltrexone (p.o.) on wanting as well as liking of social rewards, implicating the involvement of endogenous opioids in the processing of rewarding stimuli. The same dose was also found to attenuate reward directed effort exerted in a value-based decision-making task (Eikemo et al., 2017). Moreover, 50 mg of naltrexone (p.o.) have been shown to reduce endogenous pain inhibition induced by conditioned pain modulation (King et al., 2013) and to reduce the perceived pleasantness of pain relief (Sirucek et al., 2021). Thus, based on the available literature we assume the effectiveness of our opioidergic manipulation. A corresponding reasoning, including a note of caution on the lack of a direct manipulation check, can be found in the manuscript in the Discussion:

      “The doses and methods used here are comparable to those used in other contexts which have identified opioidergic effects. Using positron emission tomography, Weerts et al. (2013) found a blockage of opioid receptors of more than 90% by 50 mg of naltrexone (p.o.) in humans given repeatedly over 4 days. In addition, effects on behavioral functions have been reported with comparable doses that support the efficacy of the opioidergic manipulation. Chelnokova et al. (2014) found attenuating effects of 50 mg naltrexone (p.o.) on wanting as well as liking of social rewards, implicating the involvement of endogenous opioids in the processing of rewarding stimuli. The same dose was also found to attenuate reward directed effort exerted in a value-based decision-making task (Eikemo et al., 2017). Moreover, 50 mg of naltrexone (p.o.) have been shown to reduce endogenous pain inhibition induced by conditioned pain modulation (King et al., 2013). Thus, based on the literature we assume that the opioidergic manipulation was effective in this study, although we do not have a direct manipulation check of this pharmacological manipulation. Despite its effectiveness in blocking endogenous opioid receptors, the effect of naltrexone on reward responses was found to be small (Rabiner et al., 2011). Hence, a lack of power may have limited our chances to find such effects in the present study.”

      • The negative findings are exclusively based on the absence of positive findings using frequentist statistics. Bayesian statistics could strengthen the negative findings which are essential for the key message of the paper.

      We agree with the reviewers that the power may not have been sufficient to detect potentially small effects of the pharmacological manipulations. The power calculation was based on the design and the medium effect size found in a previous study using a comparable experimental procedure for assessing pain-reward interactions (Becker et al., 2015). To acknowledge this weakness, we clarified in the manuscript the description of the a priori sample size calculation as follows:

      “The power estimation was based on the design and the finding of a medium effect size in a previous study using a comparable version of the wheel of fortune game without pharmacological interventions (Becker et al., 2015). The a priori sample size calculation for an 80% chance to detect such an effect at a significance level of 𝛼=0.05 yielded a sample size of 28 participants (estimation performed using GPower version 3.1 (Faul et al., 2007) for a repeated-measures ANOVA with a three-level within-subject factor).”

      Further, we did not aim to claim that endogenous opioids do not affect the perception of pain relief. Our phrasing in describing the results was in several instances too bold. The aim of the pharmacological manipulations was to investigate effects of dopamine and endogenous opioids on endogenous modulation of perceived intensity of pain relief. Here, we expected dopamine to enhance such endogenous modulation and naltrexone to reduce this modulation. The higher average pain modulation under naltrexone compared to placebo found in VAS ratings (naltrexone: -10.09, placebo: -7.31, see Table 1) suggests an increase in pain modulation by naltrexone compared to placebo, although this did not reach statistical significance, which is the opposite of what we had expected (see comment #11). Therefore, we concluded that we have no evidence to support our hypothesis of reduced endogenous modulation of pain relief by naltrexone. We do not want to claim that there are no effects of endogenous opioids on pain modulation. Although Bayesian statistics might be used to support such an interpretation, we think this might be misleading in our context here due to the considerations on the lack of power (which also affects null-hypothesis testing in Bayesian statistics) and the lack of a direct manipulation check mentioned above. Since we expected opposite effects of levodopa and naltrexone on pain modulation, we did not intend to compare these effects directly to avoid a distortion of the results. According to our hypotheses, we expected to see increased modulation of pain relief with enhanced dopamine availability and decreased modulation of pain relief with blocking of opioid receptors (see also comment #11). However, we had no a priori assumptions on potential differences in the absolute changes induced by the drug manipulations. Based on these considerations, we did not include further direct comparisons of the effects of both drugs. Rather, we carefully went through the manuscript to tone down the descriptions and interpretations of our null findings and adjusted the respective section of the discussion to better reflect this interpretation.

      • The effects were found in one (pain intensity ratings) but not the other (behaviorally assessed pain perception) outcome measure. This weakens the findings and should at least be critically discussed.

      We thank the reviewers for highlighting this important aspect. We have considered the two outcome measures as indicative of two different aspects or dimensions of the pain experience, based also on previous results in the literature. Within our procedure, the ratings indicate the momentary perception of the stimulus intensity after phasic changes in nociceptive input (outcomes), while the behavioral measure indicates perceptual within-trial sensitization or habituation in response to the tonic stimulation within each trial. Supporting the assumption of such two different aspects, it has been shown before that pain intensity ratings and behavioral discrimination measures can dissociate (Hölzl et al., 2005). In line with the assumption that both outcome measures assess different aspects of the pain experience, a differential effect of controllability on these two outcome measures is conceivable. Similarly, Becker et al. (2015), using a very similar experimental paradigm, did only find endogenous pain facilitation in the losing condition of the wheel of fortune game in pain ratings but not in the behavioral outcome measure, while they found endogenous inhibition in both measures. Compared to Becker et al. (2015), we implemented here smaller changes in stimulation intensity as outcomes in the wheel of fortune game (-3°C vs -7°C for win trials, +1°C vs +5°C for lose trials), potentially resulting in the differential effects here. Nevertheless, we agree that this reasoning needs a more explicit discussion in the manuscript and we included the following sentences to the Discussion section:

      “Although we did not assess the affective component of the relief experience, we implemented two outcome measures that are assumed to capture independent aspects of the pain experience: VAS ratings indicate perception of phasic changes (outcomes), while the behavioral measure indicates perceptual within-trial sensitization or habituation in response to the tonic stimulation within each trial. We found enhanced endogenous modulation by controllability and unpredictability in the VAS ratings, in line with the view that endogenous modulation enhances behaviorally relevant information. In contrast, the within-trial sensitization did not differ between the active and passive conditions under placebo. In contrast, in a previous study using a similar experimental paradigm Becker et al. (2015) found a reduction of within-trial sensitization after pain relief outcomes by controllability. Compared to this study, we implemented here smaller changes in stimulation intensity as outcomes in the wheel of fortune (-3 °C vs -7 °C for pain relief), potentially explaining the differential results.”

      • The instructions given to the participants should be specified. Moreover, it is essential to demonstrate that the instructions do not yield differences in other factors than controllability (e.g., arousal, distraction) between test and control trials. Otherwise, the main interpretation of a controllability effect is substantially weakened.

      Thanks for pointing out that specific information on instructions given to the participants was missing. We agree that factors other than controllability would confound the interpretation of differences between test and control trials. With our instructions, we aimed to minimize nonspecific effects of arousal and/or distraction while still giving all needed information (see below). In addition, control and test trials were kept as similar as possible. In order to check for unspecific effects of arousal and/or distraction, we also included lose trials in the game as an additional control condition. To clarify the participants’ instructions, we added the following paragraph to the Materials and methods section:

      “The participants were instructed that there were two types of trials: trials in which they could choose a color to bet on the outcome of the wheel of fortune and trials in which they had no choice. Specifically, they were told that in the first type of trials they could use the left and right mouse button, respectively, to choose between the pink and blue section of the wheel of fortune. Participants were further instructed that if the wheel lands on the color they had chosen they will win, i.e. that the stimulation temperature will decrease, while if the wheel lands on the other color, they will lose, i.e. that the stimulation temperature will increase. For the second type of trials, participants were instructed that they could not choose a color, but were to press a black button, and that after the wheel stopped spinning the temperature would by chance either increase, decrease, or remain constant.”

      In general, both arousal and distraction can be assumed to affect pain perception. If the active condition in the wheel of fortune resulted in higher arousal and/or distraction this should result in comparable effects on intensity ratings in both the win and lose outcomes compared to the passive condition. In contrast, controllability is expected to have opposite effects on pain perception in win and lose trials (decreased pain perception after winning and increased pain perception after losing in the active compared to the passive condition). These opposite effects of controllability are tested by the interaction ‘outcome × trial type’ when fitting separate models for each drug condition, which should be zero if unspecific effects of arousal and/or distraction predominated. Instead, we found a significant interaction in these models, confirming opposing effects of controllability in win and lose outcomes and contradicting such unspecific effects. We added this reasoning, marked in red here, to the Results section to better highlight this line of reasoning, as follows:

      “To test whether playing the wheel of fortune induced endogenous pain inhibition by gaining pain relief during active (controllable) decision-making, a test condition in which participants actively engaged in the game and ‘won’ relief of a tonic thermal pain stimulus in the game was compared to a control condition with passive receipt of the same outcomes (Figure 1). As a further comparator the game included an opposite (‘lose’) condition in which participants received increases of the thermal stimulation as punishment. This active loss condition was also matched by a passive condition involving receipt of the same course of nociceptive input. Comparing the effects of active versus passive trials between the pain relief and the pain increase condition (interaction ‘outcome × trial type’) allowed us to test for unspecific effects such as arousal and/or distraction. If effects seen in the active compared to the passive condition were due to such unspecific effects, then actively engaging in the game should affect comparably pain in both win and lose trials. In contrast, if the effects were due to increased controllability, pain inhibition should occur in win trials and pain facilitation in lose trials.”
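      The logic of the ‘outcome × trial type’ interaction can be illustrated with a simple difference-of-differences contrast. This is only a descriptive sketch with invented ratings; it is not the statistical model fitted in the study, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical pain intensity ratings (0-100 VAS), invented for illustration.
ratings = {
    ("win", "active"): [30, 28, 32],
    ("win", "passive"): [36, 35, 37],
    ("lose", "active"): [62, 60, 64],
    ("lose", "passive"): [57, 55, 59],
}


def active_minus_passive(outcome):
    """Mean rating difference between active and passive trials for one outcome."""
    return mean(ratings[(outcome, "active")]) - mean(ratings[(outcome, "passive")])


effect_win = active_minus_passive("win")    # negative: less pain when actively winning
effect_lose = active_minus_passive("lose")  # positive: more pain when actively losing

# Unspecific arousal or distraction would shift both outcomes in the same
# direction, giving an interaction near zero. Controllability predicts
# opposite signs and therefore a clearly non-zero interaction.
interaction = effect_win - effect_lose
print(effect_win, effect_lose, interaction)
```

      With these invented numbers the active condition lowers ratings after wins and raises them after losses, which is the pattern expected under controllability rather than under arousal or distraction alone.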

      • The blinding assessment does not rule out that the volunteers perceived the difference between placebo on the one hand and levodopa/naltrexone on the other hand. It is essential to directly show that the participants were not aware of this difference.

      We based our assessment of blinding on the fact that for none of the drug conditions the frequency of guessing correctly which drug was ingested was above chance (see Results section, page 8, lines 201ff). In addition, the frequency of side effects reported by the participants did not differ between the three drug conditions, supporting this notion indirectly. However, we agree with the reviewer that this does not rule out completely that participants may have perceived a difference between the placebo and the levodopa/naltrexone conditions. We ran additional analyses to test whether participants were more likely to answer correctly that they had ingested an active drug and whether they were more likely to report side effects in the active drug conditions compared to the placebo condition. In 7 out of 28 placebo sessions (25%) the participants assumed incorrectly to have ingested one of the active drugs. In 12 out of 43 drug sessions (21.8%) the participants assumed correctly that they had ingested one of the active drugs. These frequencies did not differ between placebo sessions on the one hand and the levodopa and naltrexone active drug sessions on the other hand (𝜒²(1) = 0.11, p = 0.737). In 9 out of 28 placebo sessions (32.1%) and in 23 out of 55 drug sessions (41.8%) participants reported to be tired at the end of the session. The frequency of reporting tiredness did not significantly differ between placebo sessions on the one hand and drug sessions on the other hand (𝜒²(1) = 1.06, p = 0.304). No other side effects were reported. We added the following information, marked in red here, to the Results section:

      “In 32 out of 83 experimental sessions subjects reported tiredness at the end of the session. However, the frequency did not significantly differ between the three drug conditions (𝜒²(2) = 2.17, p = 0.337) or between the placebo condition compared to the levodopa and naltrexone condition (𝜒²(1) = 1.06, p = 0.304). No other side effects were reported. To ensure that participants were kept blinded throughout the testing, they were asked to report at the end of each testing session whether they thought they received levodopa, naltrexone, placebo, or did not know. In 43 out of 83 sessions that were included in the analysis (52%), participants reported that they did not know which drug they received. In 12 out of 28 sessions (43%), participants were correct in assuming that they had ingested the placebo, in 6 out of 27 sessions (22%) levodopa, and in 2 out of 28 sessions (7%) naltrexone. The amount of correct assumptions differed between the drug conditions (𝜒²(2) = 7.70, p = 0.021). However, posthoc tests revealed that neither in the levodopa nor in the naltrexone condition participants guessed the correct pharmacological manipulation significantly above chance level (p’s > 0.997) and the amount of correct assumptions did not differ significantly between placebo compared to levodopa and naltrexone sessions (𝜒²(1) = 0.11, p = 0.737), suggesting that the blinding was successful.”

      • The effects of novelty seeking have been assessed in the placebo and the levodopa but not in the naltrexone conditions. This should be explained. Assessing novelty seeking effects also in the naltrexone condition might represent a helpful control condition supporting the specificity of the effects in the naltrexone condition.

      We thank the reviewer for this interesting suggestion. Indeed, we did not report the association of pain modulation with novelty seeking in the naltrexone condition, because we did not have an a priori hypothesis for this relationship. We have now included correlations for all three drug conditions, testing whether higher novelty seeking was associated with greater perceptual modulation in the active vs. passive condition. In line with comment 3, we applied a correction for multiple comparisons here (Bonferroni-Holm correction). With this correction, the correlation in the placebo condition is no longer significant, with an adjusted p-value of 0.073 (r = -0.412), while the correlation stays significant in the levodopa condition (r = -0.551, p = 0.013). Because of the reasonable effect size of the correlation under placebo (i.e. r = -0.412), we still report this correlation to highlight the increase under levodopa, while emphasizing that this correlation is not significant. We carefully toned down the interpretation of this correlation to reflect the change in significance with the correction for multiple testing.
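      For reference, the Bonferroni-Holm step-down adjustment mentioned above can be sketched as follows. The raw p-values in the example are hypothetical; the response reports only adjusted values and does not state how many tests formed the corrected family.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three correlations.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print([round(p, 3) for p in adjusted])
```

      The running maximum is what distinguishes Holm from a plain rank-wise multiplication: a larger raw p-value can never end up with a smaller adjusted value than a smaller one.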

      We added the following information, marked in red here, in the Results section:

      “Previous data suggest that endogenous pain inhibition induced by actively winning pain relief is associated with a novelty seeking personality trait: greater individual novelty seeking is associated with greater relief perception (pain inhibition) induced by winning pain relief (Becker et al., 2015). Similar to these results, we found here that endogenous pain modulation, assessed using self-reported pain intensity, induced by winning was associated with participants’ scores on novelty seeking in the NISS questionnaire (Need Inventory of Sensation Seeking; Roth & Hammelstein, 2012; subscale ‘need for stimulation’ (NS)), although this correlation failed to reach statistical significance after correction for multiple comparisons using Bonferroni-Holm method (r = -0.412, p = 0.073). A significant association between novelty seeking and endogenous pain modulation was found in the levodopa condition (r = 0.551, p = 0.013). More importantly, the higher a participants’ novelty seeking score in the NISS questionnaire, the greater the levodopa-related endogenous pain modulation when winning compared to placebo (NISS NS: r = -0.483, p = 0.034 Figure 7). In contrast, higher novelty seeking scores were not correlated with stronger pain modulation induced by winning in the naltrexone condition (r = 0.153, p = 0.381) and the naltrexone induced change in pain modulation showed no significant association with novelty seeking (r = 0.239, p = 0.499). Pain modulation after losing was not associated with novelty seeking in placebo (r = 0.083, p = 0.866), levodopa (r = -0.164, p = 0.783), or naltrexone (r = 0.405, p = 0.133).

      No significant correlations with NISS novelty seeking score were found for behaviorally assessed pain modulation in the placebo, levodopa and naltrexone conditions during pain relief or pain increase (|r|’s < 0.35, p’s > 0.238). Similarly, the difference in pain modulation during pain relief or pain increase between the levodopa and the placebo condition and between the naltrexone and the placebo condition did also not correlate with novelty seeking (|r|’s < 0.22, p’s > 0.576).”

      We also edited the interpretation of the correlation in the Discussion:

      “Overall, all three predictions were largely borne out by the data: relief perception as measured by VAS ratings was enhanced by controllability, unpredictability and showed a medium sized - although not significant - association with the individual novelty-seeking tendency,”
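
      The Bonferroni-Holm adjustment applied to the three correlations above can be sketched as follows. This is a minimal illustration of the standard step-down procedure, not the authors' analysis code; the function name `holm_adjust` and the uncorrected p-values are hypothetical and chosen only to show the mechanics.

```python
def holm_adjust(p_values):
    """Return Holm (step-down Bonferroni) adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three drug conditions (illustrative only,
# not the values underlying the reported results).
raw = {"placebo": 0.03, "levodopa": 0.004, "naltrexone": 0.40}
adjusted = holm_adjust(list(raw.values()))
for condition, p_adj in zip(raw, adjusted):
    print(f"{condition}: adjusted p = {p_adj:.3f}")
```

      Because the smallest p-value is multiplied by the full number of tests and each subsequent one by a smaller factor, a correlation that is nominally significant on its own can lose significance after adjustment, as reported for the placebo condition.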

      • The writing of the manuscript is sometimes difficult to follow and should be simplified for a general readership. Sections on the information-processing account of endogenous modulation in the introduction (lines 78-93), unpredictability and endogenous pain modulation in the results (lines 278-331) are quite extensive and add comparatively little to the main findings. These sections might be shortened and simplified substantially. Moreover, providing a clearer structure for the discussion by adding subheadings might be helpful.

      We have reworked the manuscript to make it easier to follow. Specifically, we reworked the Introduction section to simplify it and to make it more concise. Further, we also shortened the extensive descriptions of modeling procedures that are not central for understanding the main findings. We think that these changes make it easier to follow the manuscript and our line of arguments, and to understand the applied analysis strategies.

      • Effect sizes are generally small. This should be acknowledged and critically discussed. Moreover, effect sizes are given in the figures but not in the text. They should be included to the text or at least explicitly referred to in the text.

      We agree that the effect sizes we report appear generally small. Importantly, the effect sizes were calculated by dividing differences in marginal means by the pooled standard deviation of the residuals and the random effects to obtain an estimate of the effect size of the underlying population rather than only for our sample. This procedure was used for the purpose of achieving more generalizable estimates. Due to considerable variance between subjects in our sample, this procedure resulted in comparatively small effect sizes. Nevertheless, we think this calculation of effect sizes results in more informative values because they can be viewed as estimates of population effects. We added specific information on the calculation of the effect sizes and a brief explanation that this procedure results in comparatively small effect size estimates to the Materials and methods and to the Results section (see below). In addition, we included standardized effect sizes whenever we report the respective post-hoc comparisons in the Results section.

      “Effect sizes were calculated by dividing the difference in marginal means by the pooled standard deviation of the random effects and the residuals, providing an estimate for the underlying population (Hedges, 2007).” (Materials and methods section)

      “We used post-hoc comparisons to test direction and significance of differences in either outcome condition and report standardized effect sizes for these differences. Note that all reported effect sizes account for random variation within the sample, providing an estimate for the underlying population; due to considerable variance between participants in the present study, this results in comparatively small effect sizes.” (Results section)
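
      The effect size calculation described above can be illustrated with a short sketch. It assumes that the "pooled standard deviation of the random effects and the residuals" is the square root of the sum of the corresponding variance components; this reading, the function name `standardized_effect`, and all numbers are assumptions for illustration, not the authors' code or data.

```python
import math


def standardized_effect(mean_difference, random_effect_variances, residual_variance):
    """Divide a difference in marginal means by a pooled SD built from all variance components."""
    # Assumption: pooled SD = sqrt(sum of random-effect variances + residual variance).
    total_variance = sum(random_effect_variances) + residual_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return mean_difference / math.sqrt(total_variance)


# Hypothetical values: a 4-point difference in marginal means on the rating scale.
d_population = standardized_effect(4.0, [300.0], 100.0)  # includes between-subject variance
d_residual_only = standardized_effect(4.0, [], 100.0)    # ignores between-subject variance
print(round(d_population, 3), round(d_residual_only, 3))
```

      With these made-up numbers, including a large between-subject variance component halves the standardized effect, which is the mechanism the response gives for the comparatively small values.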

      • The directions of dopamine and opioid effects on pain relief should be discussed.

      We amended our explanation of the hypothesis on the expected drug effects. As outlined there, we indeed expected opposite effects of levodopa and naltrexone on endogenous pain modulation in the active vs. the passive condition of the wheel of fortune.

      Reviewer #2 (Public Review):

      This study used the tonic heat stimulation combined with the probabilistic relief-seeking paradigm (which is a wheel of fortune gambling task) to manipulate the level of controllability and predictability of pain on 30 healthy participants. The authors focused on the influence of controllability and unpredictability on pain relief using pain reports and computational models and examined the involvement of dopamine and opioids in those effects. For that, the authors conducted the three-day experiments, which involved placebo, levodopa (dopamine precursor), and naltrexone (opioid receptor antagonist) administration on separate days. Lastly, the authors examined the relationship between dopamine-induced pain relief and novelty-seeking traits.

      This is a strong and well-performed study on an important topic. The paper is well-written. I really enjoyed reading the introduction and discussion and learned a lot. Below, I have a few minor comments.

      First, given that the Results section comes before the Methods section, it would be helpful to include some method and experimental design-related information crucial for the understanding of the results in the Results section. For example, how long was the thermal stimulus? What was the baseline temperature? etc. Maybe this information can be included in the caption of Figure 1.

      We thank the reviewer for this helpful suggestion. We agree that due to the order of the manuscript sections, more information on experimental design and the statistical analysis strategies should be included in the results section. Accordingly, we included more detailed information on the analysis strategies in the Results section (please see responses to comments #5 & #9). In addition, we added more detailed information on the experimental design and information such as the duration of the stimuli and the baseline temperature, marked in red below, to the caption of Figure 1 (Results section).

      “Figure 1: Time line of one trial with active decision-making (test trials) of the wheel of fortune game. Experimental pain was implemented using contact heat stimulation on capsaicin sensitized skin on the forearm. In each trial, the temperature increased from a baseline of 30 °C to a predetermined stimulation intensity perceived as moderately painful. In each testing session, one of the two colors (pink and blue) of the wheel was associated with a higher chance to win pain relief (counterbalanced across subjects and drug conditions). Pain relief (win) as outcome of the wheel of fortune game (depicted in green) and pain increase (loss; depicted in red) were implemented as phasic changes in stimulation intensity offsetting from the tonic painful stimulation. Based on a probabilistic reward schedule for these outcomes, participants could learn which color was associated with a better chance to win pain relief. In passive control trials and neutral trials participants did not play the game, but had to press a black button after which the wheel started spinning and landed on a random position with no pointer on the wheel. Trials with active decision-making were matched by passive control trials without decision making but the same nociceptive input (control trials), resulting in the same number of pain increase and pain decrease trials as in the active condition. In neutral trials the temperature did not change during the outcome interval of the wheel. Two outcome measures were implemented in all trial types: i) after the phasic changes during the outcome phase participants rated the perceived momentary intensity of the stimulation on a visual analogue scale (‘VAS intensity’); ii) after this rating, participants had to adjust the temperature to match the sensation they had memorized at the beginning of the trial, i.e. the initial perception of the tonic stimulation intensity (‘self-adjustment of temperature’). This perceptual discrimination task served as a behavioral assessment of pain sensitization and habituation across the course of one trial. One trial lasted approximately 30 s, phasic offsets occurred after approximately 10 s of tonic pain stimulation. Adapted from Becker et al. (2015).”

      Second, it would be helpful if the authors could provide their prior hypotheses on the drug effects. The goal of using these drugs could be a little confusing, given that levodopa is a precursor of dopamine, whereas naltrexone is an opioid antagonist, i.e., the effects on the target neurotransmitters seem to be opposite. I therefore wondered if the authors expected to see opposite effects, e.g., levodopa enhances pain relief, while naltrexone inhibits pain relief, or similar effects, e.g., both enhance pain relief. Clarifying the expected direction of the effects would be helpful for novice readers.

      We thank the reviewer for pointing out that information on the expected drug effects should be explained in more detail. Indeed, we expected opposite effects of levodopa and naltrexone with respect to the effect of controllability on pain relief. Levodopa, as a precursor of dopamine, enhances dopamine availability and thus, phasic release of dopamine in response to events, for example, the reception of reward. Accordingly, we hypothesized that endogenous modulation by relief outcomes is increased in the active (reward) compared to the passive condition. In contrast, naltrexone blocks opioid receptors and as such it has been reported that naltrexone blocks placebo analgesia as a type of endogenous pain inhibition. Correspondingly, we hypothesized that naltrexone decreases endogenous pain modulation induced by actively winning pain relief compared to the passive condition. We expanded the explanation of these hypotheses in the Introduction section as follows:

      “We expected increased dopamine availability to enhance phasic release of dopamine in response to rewards, and hence, to increase the effect of active compared to passive reception of pain relief. In contrast, we expected the inhibition of endogenous opioid signaling to decrease the effect of active controllability on pain relief. The latter is based on the observation that blocking of opioid receptors attenuates other types of endogenous pain inhibition such as placebo analgesia (Benedetti, 1996; Eippert et al., 2009) or conditioned pain modulation (King et al., 2013).”

      Third, on the "Behaviorally assessed pain perception" results in Figs. 2D-F, I wonder why the results for the "pain increase" were still positive. Were the y values on the plots the temperature that participants adjusted (i.e., against the temperature right before the temperature adjustment)? or are the values showing the differences from the baseline (i.e., against the baseline temperature)?

      The behavioral measure was calculated as the difference in temperatures between the memorization interval at the beginning of the trial (i.e. the predetermined temperature perceived as moderately painful) minus the self-adjusted temperature at the end of the trial so that positive values indicate sensitization (i.e. an increase in sensitivity) and negative values indicate habituation (i.e. a decrease in sensitivity) across the stimulation within one trial (i.e. approx. 30 seconds of stimulation). In general, for a stimulation of approximately 30 seconds with intensities perceived as painful, perceptual sensitization is expected to occur (Kleinböhl et al., 1999).

      The outcome of the wheel of fortune game, i.e. the phasic decrease (winning) or increase (losing) in stimulation intensity, should indeed have opposite effects on this sensitization. A decrease in nociceptive input negatively reinforces pain perception, as seen in stronger sensitization in win trials, while an increase in nociceptive input punishes pain perception, as seen in reduced perceptual sensitization in lose trials. Using a very similar task, Becker et al. (2015) found values indicating habituation within trials with temperature increases in lose outcomes. However, in this previous study, increases of +5 °C were used for lose outcomes (as compared to +1 °C in the present study). Thus, in the present study the comparatively small increase in absolute stimulation temperature may not have been sufficient to induce within trial habituation to the tonic heat pain stimulation.

      Nevertheless, independent of the effect of the outcome (increase or decrease of the stimulation intensity) our focus was on the additional effect that controllability (active vs. passive condition) had on the perception of the underlying tonic stimulation within each outcome condition (i.e. on the same nociceptive input). Here we expected to see endogenous inhibition after winning and endogenous facilitation after losing in the active compared to the passive condition.

      We added more detailed information on the calculation of the behavioral measure and the expected perceptual modulation within each trial due to the stimulus duration in the Methods section as well as in the Results section.

      Methods section:

      “After this rating, participants had to adjust the stimulation temperature themselves to match the temperature they had memorized at the beginning of the trial. This self-adjustment operationalizes a behavioral assessment of perceptual sensitization and habituation within one trial (Becker et al., 2011, 2015; Kleinböhl et al., 1999). Participants adjusted the temperature using the left and right button of the mouse to increase and decrease the stimulation temperature. The behavioral measure was calculated as the difference in temperatures in the memorization interval at the beginning of each trial minus this self-adjusted temperature at the end of each trial. Positive values, i.e. self-adjusted temperatures lower than the stimulation intensity at the beginning of the trial, indicate perceptual sensitization, while negative values indicate habituation.”

      Results section:

      “Positive values (i.e. lower self-adjusted temperatures compared to the stimulation intensity at the beginning of the trial) indicate perceptual sensitization across the course of one trial of the game, negative values indicate habituation. For tonic stimulation at intensities that are perceived as painful, perceptual sensitization is expected to occur (Kleinböhl et al., 1999). Differences between the outcome conditions (win, lose) reflect the effect of the phasic changes on the perception of the underlying tonic stimulus. Differences between active and passive trials reflect the effect of controllability on this perceptual sensitization within each outcome condition.”
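
      The sign convention of the behavioral measure described above can be made concrete with a small sketch. The function names and the temperatures are hypothetical and serve only to show how the difference is formed and read; they are not taken from the study.

```python
def behavioral_measure(memorized_temp_c, self_adjusted_temp_c):
    """Temperature at the start of the trial minus the self-adjusted temperature at its end."""
    return memorized_temp_c - self_adjusted_temp_c


def interpret(measure):
    """Map the sign of the measure to the perceptual change it indicates."""
    if measure > 0:
        # Participant set a lower temperature than at trial start: sensitivity increased.
        return "sensitization"
    if measure < 0:
        # Participant set a higher temperature than at trial start: sensitivity decreased.
        return "habituation"
    return "no change"


# Hypothetical trial: stimulation started at 45.0 °C, participant adjusted to 44.4 °C.
delta = behavioral_measure(45.0, 44.4)
print(interpret(delta))
```

      Under this convention, a positive value in a lose trial still indicates sensitization over the trial, only weaker than in win trials, which is the pattern the response explains for Figs. 2D-F.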

      Lastly, I wonder if it is feasible or not, but examining the effects of dopamine antagonists will be helpful for obtaining a more definitive answer to the role of dopamine in information-related pain relief. This could be a good suggestion for future studies.

      We thank the reviewer for this suggestion. We agree that antagonistic manipulation of the dopaminergic system could provide further insights and confirm the role of dopamine in shaping pain related perception and behavior. Moreover, we think that bidirectional manipulations of opioidergic signaling could also provide valuable insights and should be used for future research. We added the following sentences to the Discussion section:

      “Because the mechanisms underlying learning from pain and pain relief and their recursive influence on pain perception may contribute to the development and maintenance of chronic pain, it is crucial to better understand the roles of dopamine and endogenous opioids in these mechanisms. Accordingly, bidirectional manipulations of both transmitter systems should be used in future studies to better characterize their respective roles in shaping behavior and perception.”

    1. Author Response

      Joint Public review:

      1) Line 215: The authors state that pairing TCRseq with RNAseq reflects the magnitude of TCR signaling. This is absolutely not the case. TCR sequencing does not reflect TCR signaling strength.

      Thanks for the comments, and we apologize for this misleading description. In this part, we were in fact trying to quantitatively assess the activation states of CD8 T cells based on the average expression of previously described activation-related gene signatures¹ (also shown in Supplementary file 3). Therefore, TCRseq data were not involved in this analysis, and the magnitude of TCR signaling was not reflected either. We apologize again for this mistake and have corrected the corresponding text and figures as follows (lines 210-217): "Meanwhile, the activation states of CD8 T cell subpopulations were quantitatively assessed based on the average expression of previously described activation-related gene signatures¹ (also shown in Supplementary file 3). Our results showed that the T-Tex cluster was the most activated, followed by the two P-Tex clusters (Fig. 2b left). In addition, CD8 T cells in tumor tissues were more activated than those in adjacent normal tissues (Fig. 2b, right top). And no significant difference in T cell activation states was observed between HPV-positive and HPV-negative samples (Fig. 2b right bottom)."

      2) A lot of discussion around "activation" is presented, but there is no evidence to support which genes or gene programs are associated with "activation".

      Thanks for the comments. The activation states of CD8 T cell subpopulations were quantitatively assessed based on the average expression of previously described activation-related gene signatures¹ (also shown in Supplementary file 3). More specifically, activation-related gene signatures are as follows: "CD69, CCR7, CD27, BTLA, CD40LG, IL2RA, CD3E, CD47, EOMES, GNLY, GZMA, GZMB, PRF1, IFNG, CD8A, CD8B, CD95L, LAMP1, LAG3, CTLA4, HLA-DRA, TNFRSF4, ICOS, TNFRSF9, TNFRSF18".
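
      A score based on the "average expression" of the listed genes can be sketched as below. The response does not state the exact scoring procedure (for example, whether background correction or a dedicated module-scoring tool was used), so this plain per-cell mean is an assumption; the function name, the data layout, and the example expression values are hypothetical.

```python
# Gene list as given in the response.
ACTIVATION_SIGNATURE = [
    "CD69", "CCR7", "CD27", "BTLA", "CD40LG", "IL2RA", "CD3E", "CD47", "EOMES",
    "GNLY", "GZMA", "GZMB", "PRF1", "IFNG", "CD8A", "CD8B", "CD95L", "LAMP1",
    "LAG3", "CTLA4", "HLA-DRA", "TNFRSF4", "ICOS", "TNFRSF9", "TNFRSF18",
]


def activation_score(cell_expression, signature=ACTIVATION_SIGNATURE):
    """Mean normalized expression over the signature genes measured in this cell.

    cell_expression: dict mapping gene symbol -> normalized expression value.
    Signature genes missing from the dict are skipped (an assumption).
    """
    values = [cell_expression[g] for g in signature if g in cell_expression]
    if not values:
        raise ValueError("no signature genes found in this cell")
    return sum(values) / len(values)


def mean_score_by_cluster(cells):
    """Average the per-cell scores within each cluster label.

    cells: iterable of (cluster_label, expression_dict) pairs.
    """
    totals = {}
    for label, expression in cells:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + activation_score(expression), count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


# Hypothetical cells with made-up expression values (illustrative only).
cells = [
    ("cluster_A", {"CD69": 2.0, "GZMB": 4.0, "ACTB": 10.0}),
    ("cluster_A", {"CD69": 1.0, "GZMB": 3.0}),
    ("cluster_B", {"CD69": 0.5, "IFNG": 0.5}),
]
print(mean_score_by_cluster(cells))
```

      Comparing such cluster-level means is one simple way to rank subpopulations by activation; it uses RNAseq expression only and involves no TCRseq data, consistent with the correction made in response to comment 1.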

      3) Line 249: It is unclear why the authors are indicating that TCRseq was used in pseudotime analysis. This type of analysis does not take TCRs into account but rather looks at the proportion of spliced mRNA of individual genes from the DGE data.

      Thanks for the comments, and we apologize for this misleading description. As noted by the reviewer, pseudotime analysis has nothing to do with TCRseq data. In this part, we separately performed clonality analysis of CD8 T cells based on TCRseq data and pseudotime analysis based on RNAseq data. Shared TCRs were identified among certain cell subclusters, which could partially validate the potential lineage relationships simulated by pseudotime analysis. Therefore, we have corrected the text as follows to avoid the misunderstanding that TCRseq was used in pseudotime analysis: "Given the clonal accumulation of CD8 T cells was a result of local T cell proliferation and activation in the tumor environment², we further conducted clonality analysis of CD8 T cells based on TCRseq data." (lines 246-248) and "To further investigate their lineage relationships, we performed pseudotime analysis for CD3+ T cells on the basis of transcriptional similarities (Fig. 3j-l, Figure 3-figure supplementary 2d)." (lines 277-279).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors develop and freely disseminate the THINGS-data collection, a large-scale dataset incorporating MRI, MEG, eye-tracking, and 4.7 million similarity ratings for 1,854 object concepts. Demonstrating the reliability of their data, the authors replicate nearly a dozen previous neuroimaging papers. This "big data" approach significantly advances our ability to link behavioral measures with neuroimaging at scale, with the potential to spark future insights into how the mind represents objects.

      I thought that the article was well-written, with a sound methodological approach, high-quality results, and well-supported conclusions. I am overall enthusiastic about this work, and I think THINGS will provide an important benchmark for future big data approaches in cognitive and computational neuroscience.

      However, I thought it was also important to articulate more directly the potential insights this dataset can offer to the field. Although the authors mentioned that they "provided five examples for potential research directions", it was not clear to me what these new research directions were, given that the authors entirely describe replications in the results.

      We thank Reviewer 1 for their positive evaluation and the enthusiasm for our work! We have revised the manuscript to articulate more clearly and directly some potential research directions for the dataset. There are two aspects to consider: What sets these datasets apart from traditional small-scale research? And what sets them apart from other large-scale research? We elaborate on these two aspects in response to specific comments below.

      Reviewer #2 (Public Review):

      Hebart et al. present a large-scale multi-modal dataset consisting of fMRI, MEG, and behavioral similarity measures towards the study of object representation in the mind and brain. The effort is immense, the methods are rigorous, the data are of reasonable quality, and the demonstrative analyses are extensive and provocative. (One small note regarding one leg of this multi-modal dataset is that the fMRI design consisted of a single image presentation for 0.5s without repetitions for most of the images; this design choice has particular analysis implications, e.g. the dataset will have more power when leveraging a priori grouping of images. However, unlike other datasets of this kind, here the number of images and how they were selected does support this analysis mode, e.g. multiple exemplars per object concept, and rich accompanying meta-data and behavioral data.)

      The manuscript is well-written, and the THINGS website that lets you explore the datasets is easy to navigate, delivering on the promise of making this an integrated, expanding worldwide initiative. Further, the datasets have clear complementary strengths to recent other large-scale datasets, in terms of the ways that the images were sampled (not to mention being multi-modal); thus I suspect that the THINGS dataset will be heavily used by the cognitive/computational/neuroscience research community going forward.

      We would like to thank the reviewer for their positive evaluation of our work! We agree that the dataset has more power when leveraging a priori grouping of images, which is specifically the design choice we made here. We also agree that we can better highlight the strength of our dataset with respect to existing datasets regarding multiple exemplars per object concept and the semantic breadth of the included object categories.

      Reviewer #3 (Public Review):

      This manuscript presents a highly valuable dataset with multimodal functional human brain imaging data (fMRI and MEG) as well as behavioural annotations of the stimuli used (thousands of images from the THINGS collection, systematically covering multiple types of concrete nameable objects).

      The manuscript presents details about the dataset, quality control measures, and a careful description of preprocessing choices. The tools and approaches that were used follow the state of the art of the field in human functional brain imaging and I praise the authors for being transparent in their methodological approaches by also sharing their code along with the data. The manuscript also presents a few analyses with the data: 1) multi-dimensional embedding of perceived similarity judgments 2) decoding of neural representations of objects both with fMRI and MEG 3) A replication of findings related to visual size and animacy of objects 4) representation similarity analysis between functional brain data and behavioural ratings 5) MEG-fMRI fusion.

      We thank the reviewer for their overall positive assessment of our work!

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Polyák et al. report detailed and systematic functional, electrocardiographic, electrophysiologic (both in vivo and in vitro experiments) and histological analysis in a large animal (canine) model of exercise to assess risk of ventricular arrhythmia susceptibility. They find that exercise-trained dogs have a slower heart rate (not accounted for by heightened vagal tone alone and consistent with recent work from Denmark), an increased ventricular mass and fibrosis, APD lengthening due to repolarisation abnormality, enhanced HCN4 expression and decreased outward potassium channel density together with increased ventricular ectopic beats and ventricular fibrillation susceptibility (open-chest burst pacing). The authors suggest these changes as underlying the risk of VA in athletes, and appropriately caution against consigning the beneficial effects of exercise. In general, this study is well done, reasonably well-written, with reasonable conclusions, supported by the data presented and is much needed. There are some methodological issues; however, given the paucity of experimental data in this area, I think it would still be additive to the literature.

      Strengths:

      1. This is an area with very limited experimental data; this is an area of need.

      2. The study, in general, seems to be well-conducted, with two clear groups.

      3. The use of a large animal model is appropriate.

      4. The study findings, in general, support the authors’ conclusions.

      5. The authors have shown some restraint in their conclusions and the limitations section is detailed and well written.

      Weaknesses:

      1. There are some methodological issues:

      a. Authors should explain what the conditioning protocol was and why it was necessary.

      In order to cause as little discomfort as possible to the animals, we selected animals that were naturally cooperative with the researchers and not afraid of the noise of the treadmill. This selection period lasted about three weeks, during which the animals were not exercised in a formal setting, but familiarized with the experimental setting and walked on the treadmills for a few minutes. During the conditioning period, both control and trained animals were equally handled.

      Following your remarks, the corresponding part of the text was extended to explain the training protocol in more detail.

      b. The rationale for the exercise parameters chosen needs to be presented.

      Experimental data on large animal models are very limited. Sled dogs are considered the highest elite of dog exercise. The distances they run are taken as a reference, although this protocol is not exactly the same due to the conditions of training, sledding, and weather. The most widely known races, the Norwegian Finnmarksløp and the Alaskan Iditarod, take place on snow and cover distances ranging from 500 to 1569 km in a continuous competition that can take up to 14 days to complete (Calogiuri & Weydahl, 2017).

      Based on these data, preliminary experiments were conducted to determine the maximum running time and intensity that dogs can sustain without distress, injuries, or severe fatigue. We increased the intensity of exercise in line with the animals' performance. The detailed training protocol and the daily running distances applied are presented in Table 1. Now, a new figure, Figure 1, and a new table, Table 1, illustrate a detailed experimental timeline in the revised manuscript.

      Reference:

      Calogiuri, G., & Weydahl, A. (2017). Health challenges in long-distance dog sled racing: A systematic review of literature. Int J Circumpolar Health, 76(1), 1396147. https://doi.org/10.1080/22423982.2017.1396147

      c. Open chest VF induction was a limitation, and it was unnecessary.

      d. A more refined VT/VF induction protocol was required. This is a major limitation to this work.

      C, D: Thank you for the reviewer’s comment. For a detailed explanation of the VF induction procedures, please see our responses to question 11 of Reviewer #2.

      e. The concept of RV dysfunction has not been considered in the study and its analysis.

      Thank you for the suggestion. The complexity of our study and the capacity of our laboratory limited the work that could be carried out, but we are planning to perform additional studies involving the RV.

      f. The lack of a quantitative measure for fibrosis is a limitation.

      At the Department of Pathology, there was no opportunity to analyze myocardial fibrosis quantitatively. As described by Mustroph et al., quantitative analysis of fibrosis can be based on appropriate software measuring the amount of fibrotic area per total area on digitized slides. Such software was not available during the evaluation. This is a limitation of the study; however, the semi-quantitative assessment in histology reports is widely accepted in human pathology (Mustroph et al., 2021).

      Reference:

      Mustroph, J., Hupf, J., Baier, M. J., Evert, K., Brochhausen, C., Broeker, K., Meindl, C., Seither, B., Jungbauer, C., Evert, M., Maier, L. S., & Wagner, S. (2021). Cardiac Fibrosis Is a Risk Factor for Severe COVID-19. Front Immunol, 12, 740260. https://doi.org/10.3389/fimmu.2021.740260
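
      The quantitative approach mentioned above, measuring fibrotic area per total area on a digitized slide, can be sketched as follows. This is a generic illustration, not the software referred to in the response; it assumes the slide has already been segmented into labeled pixels and that "total area" means tissue area excluding background. The label coding and the tiny example grid are hypothetical.

```python
# Hypothetical pixel labels produced by a prior segmentation step.
BACKGROUND, TISSUE, FIBROSIS = 0, 1, 2


def fibrotic_area_fraction(labels):
    """Fraction of tissue pixels (fibrotic + non-fibrotic) that are labeled fibrotic.

    labels: 2D list of pixel labels, one inner list per image row.
    """
    fibrotic = sum(row.count(FIBROSIS) for row in labels)
    non_fibrotic = sum(row.count(TISSUE) for row in labels)
    tissue_total = fibrotic + non_fibrotic
    if tissue_total == 0:
        raise ValueError("no tissue pixels in the image")
    return fibrotic / tissue_total


# Tiny made-up 2 x 3 "slide" (illustrative only).
slide = [
    [BACKGROUND, TISSUE, TISSUE],
    [FIBROSIS, TISSUE, BACKGROUND],
]
print(fibrotic_area_fraction(slide))
```

      A continuous fraction of this kind is what distinguishes a quantitative readout from the semi-quantitative grading used in the study.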

      2. Statistical analysis requires further detail (checking of normality of the data/appropriate statistical test).

      Thank you for this comment. This question has been answered in response to question 12 of Reviewer #2 and the statistical part of the methodology in the manuscript has been updated.

      3. The use of Volders et al. study as a corollary in the discussion does not seem justified given that this study used AV block induced changes as an acquired TdP model.

      We agree with the reviewer that the two models involve completely different mechanisms. Therefore, in order to avoid misunderstandings, we have deleted the part of the discussion that made the comparison with the studies by Volders et al. (Volders et al., 1998; Volders et al., 1999). Nevertheless, although the exercise-induced compensatory adaptive mechanisms of the athlete's heart have been considered a phenomenon completely distinct from pathological conditions, the electrical remodeling observed in our model shows important similarities with the experimental model of long-term complete AV block. For example, both resulted in profound bradycardia, compensated cardiac hypertrophy, prolonged QTc interval, APD prolongation, and increased spatial and temporal dispersion of repolarization. These changes were attributed to the downregulation of potassium currents and were associated with increased ventricular arrhythmia susceptibility. Therefore, we hypothesized that the mechanisms of increased propensity for ventricular fibrillation in this model may have a similar electrophysiological background to the compensated hypertrophy studies of Volders et al. However, the autonomic changes, the potential impairment of the conduction system of the athlete’s heart, and the electrophysiological background require further, more detailed investigations.

      References:

      Volders, P. G., Sipido, K. R., Vos, M. A., Kulcsar, A., Verduyn, S. C., & Wellens, H. J. (1998). Cellular basis of biventricular hypertrophy and arrhythmogenesis in dogs with chronic complete atrioventricular block and acquired torsade de pointes. Circulation, 98(11), 1136-1147. https://doi.org/10.1161/01.cir.98.11.1136

      Volders, P. G., Sipido, K. R., Vos, M. A., Spatjens, R. L., Leunissen, J. D., Carmeliet, E., & Wellens, H. J. (1999). Downregulation of delayed rectifier K(+) currents in dogs with chronic complete atrioventricular block and acquired torsades de pointes. Circulation, 100(24), 2455-2461. https://doi.org/10.1161/01.cir.100.24.2455

    1. Author Response

      Reviewer #1 (Public Review):

      This article is aimed at constructing a recurrent network model of the population dynamics observed in the monkey primary motor cortex before and during reaching. The authors approach the problem from a representational viewpoint, by (i) focusing on a simple center-out reaching task where each reach is predominantly characterised by its direction, and (ii) using the machinery of continuous attractor models to construct network dynamics capable of holding stable representations of that angle. Importantly, M1 activity in this task exhibits a number of peculiarities that have pushed the authors to develop important methodological innovations which, to me, give the paper most of its appeal. In particular, M1 neurons have dramatically different tuning to reach direction in the movement preparation and execution epochs, and that fact motivated the introduction of a continuous attractor model incorporating (i) two distinct maps of direction selectivity and (ii) distinct degrees of participation of each neuron in each map. I anticipate that such models will become highly relevant as neuroscientists increasingly appreciate the highly heterogeneous, and stable-yet-non-stationary nature of neural representations in the sensory and cognitive domains.

      As far as modelling M1 is concerned, however, the paper could be considerably strengthened by a more thorough comparison between the proposed attractor model and the (few) other existing models of M1 (even if these comparisons are not favourable they will be informative nonetheless). For example, the model of Kao et al (2021) seems to capture all that the present model captures (orthogonality between preparatory and movement-related subspaces, rotational dynamics, tuned thalamic inputs mostly during preparation) but also does well at matching the temporal structure of single-neuron and population responses (shown e.g. through canonical correlation analysis). In particular, it is not clear to me how the symmetric structure of connectivity within each map would enable the production of temporally rich responses as observed in M1. If it doesn't, the model remains interesting, as feedforward connectivity between more than two maps (reflecting the encoding of many more kinematic variables) or other mechanisms (such as proprioceptive feedback) could well explain away the observed temporal complexity of neural responses. Investigating such alternative explanations would of course be beyond the scope of this paper, but it is arguably important for the readers to know where the model stands in the current literature.

      Below is a summary of my view on the main strengths and weaknesses of the paper:

      1) From a theoretical perspective, this is a great paper that makes an interesting use of the multi-map attractor model of Romani & Tsodyks (2010), motivated by the change in angular tuning configuration from the preparatory epoch to the movement execution epoch. Continuous attractor models of angular tuning are often criticised for being implausibly homogeneous/symmetrical; here, the authors address this limitation by incorporating an extra dimension to each map, namely the degree of participation of each neuron (the distribution of which is directly extracted from data). This extension of the classical ring model seems long overdue! Another nice thing is the direct use of data for constraining the model's coupling parameters; specifically, the authors adjust the model's parameters in such a way as to match the temporal evolution of a number of "order parameters" that are explicitly manifested (i.e. observable) in the population recordings.

      I believe the main weakness of this continuous attractor approach is that it perhaps unduly binarises the configuration of angular tuning. Specifically, it assumes that while angular tuning switches at movement onset, it is otherwise constant within each epoch (preparation and execution). I commend the authors for carefully motivating this in Figure 2 (2e in particular), by showing that the circular variance of the distribution of preferred directions is higher across prep & move than within either prep or move. While this justifies a binary "two-map model" to first order, the analysis nevertheless shows that preferred directions do change, especially within the preparatory epoch. Perhaps the authors could do some bootstrapping to assess whether the observed dispersion of PDs within sub-periods of the delay epoch is within the noise floor imposed by the finite number of trials used to estimate tuning curves. If it is, then this considerably strengthens the model; otherwise, the authors should say that the binarisation reflects an approximation made for analytical tractability, and discuss any important implications.

      We thank the reviewer for the suggested analysis. We have included this new analysis in Fig. S1.

      First of all, in Fig 2e of the previous version of the manuscript, we were considering three time windows during preparation and two time windows during movement execution. We are now using a shorter time window of 160ms, so that we can fit three time windows within either epoch. The results do not change qualitatively, and the results of the bootstrap analysis below do not change based on the definition of this time window.

      The bootstrap analysis is described in detail in the second paragraph of the Methods section (“Preparatory and movement-related epochs of motion”). The bootstrap distribution is generated by resampling trials with replacement (keeping the number of trials per condition the same as in the data), while shuffling the temporal windows in time, within epochs. For example, for condition 1 we have 43 trials in the data. In one trial of the bootstrap distribution for condition 1, each of the 3 time windows of the delay period is chosen at random (with replacement) among the 43*3 possible windows from the data. The analysis shows that the median variance of preferred directions from the data is significantly larger than the one from the bootstrap samples.
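
      The resampling scheme described above can be sketched for a single neuron and a single epoch as follows. This is a minimal illustration, not the code used for Fig. S1: the function names, the population-vector estimate of the preferred direction, and the data layout (one array of shape trials x windows per reach condition) are assumptions made for the example.

```python
import numpy as np

def preferred_direction(mean_rates, angles):
    # Preferred direction as the angle of the first Fourier component
    # of the tuning curve (an assumed estimator, used for illustration).
    return np.angle(np.sum(mean_rates * np.exp(1j * angles)))

def circular_variance(directions):
    # 0 when all directions coincide, approaching 1 when they are spread out.
    return 1.0 - np.abs(np.mean(np.exp(1j * directions)))

def pd_variance(rates, angles):
    # rates: list over conditions of arrays with shape (n_trials, n_windows).
    mean_rates = np.stack([r.mean(axis=0) for r in rates])  # (n_cond, n_windows)
    n_windows = mean_rates.shape[1]
    pds = np.array([preferred_direction(mean_rates[:, w], angles)
                    for w in range(n_windows)])
    return circular_variance(pds)

def bootstrap_pd_variance(rates, angles, n_boot=1000, seed=0):
    # Null distribution: for each condition, every (trial, window) entry is
    # redrawn with replacement from the pool of all trials x windows of that
    # condition, which removes any systematic change of tuning across windows.
    rng = np.random.default_rng(seed)
    out = np.empty(n_boot)
    for b in range(n_boot):
        resampled = [rng.choice(r.ravel(), size=r.shape, replace=True)
                     for r in rates]
        out[b] = pd_variance(resampled, angles)
    return out
```

      Comparing `pd_variance` on the data with the distribution returned by `bootstrap_pd_variance` indicates whether the observed dispersion of preferred directions exceeds what finite trial numbers alone would produce.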

      This suggests that neurons do change their preferred direction within epochs, but these changes are smaller in magnitude than the changes that occur between epochs. We explicitly comment on this in the Methods, and in the main text we point out that considering only two epochs is a simplifying assumption; as such, it can be thought of as a first step towards building a more complete model that shows dynamics of tuning within both preparatory and execution epochs. Note, however, that this simple framework is enough for the model to recapitulate neuronal activity to a large extent, both at the level of single units and at the population level.

      2) While it is great to constrain the model parameters using the data, there is a glaring "issue" here which I believe is both a weakness and a strength of the approach. The model has a lot of freedom in the external inputs, which leads to relatively severe parameter degeneracies. The authors are entirely forthright about this: they even dedicate a whole section to explaining that depending on the way the cost function is set up, the fit can land the model in very different regimes, yielding very different conclusions. The problem is that I eventually could not decide what to make of the paper's main results about the inferred external inputs, and indeed what to make of the main claim of the abstract. It would be great if the authors could discuss these issues more thoroughly than they currently do, and in particular, argue more strongly about the reasons that might lead one to favour the solutions of Fig 6d/g over that of Fig 6a. On the other hand, I see the proposed model as an interesting playground that will probably enable a more thorough investigation of input degeneracies in RNN models. Several research groups are currently grappling with this; in particular, the authors of LFADS (Pandarinath et al, 2018) and other follow-up approaches (e.g. Schimel et al, 2022) make a big deal of being able to use data to simultaneously learn the dynamics of a neural circuit and infer any external inputs that drive those dynamics, but everyone knows that this is a generally ill-posed problem (see also discussion in Malonis et al 2021, which the authors cite). As far as I know, it is not yet clear what form of regularisation/prior might best improve identifiability. While Bachschmid-Romano et al. do not go very far in dissecting this problem, the model they propose is low-dimensional and more amenable to analytical calculations, such that it provided a valuable playground for future work on this topic.

      We agree with the reviewer that the problem of disambiguating between feedforward and recurrent connections from observation of the state of the recurrent units alone is a degenerate problem in general.

      By explicitly looking for solutions that minimize the role of external inputs in driving the dynamics, we argued that the solutions of Fig 4d/g are favorable over the one of Fig 4a because they are based on local computations implemented through shorter range connections compared to incoming connections from upstream areas; as such, they likely require less metabolic energy.

      In the new version of the paper, we discuss this issue more explicitly:

      Degeneracy of solutions. We considered the case where parameters are inferred by minimizing a cost function that equals the reconstruction error only (this corresponds to the case of very large values of the parameter α in the cost function). Figure 4—figure supplement 2 shows that after minimizing the reconstruction error, the cost function is flat in a large region of the order parameters. We also added Figure 5—figure supplement 5, to show that the dynamics of the feedforward network look almost indistinguishable from those of the recurrent network (Fig.5), although the average canonical correlation coefficient is slightly lower for the purely feedforward case.

      Breaking the degeneracy of solutions. We added Figure 4—figure supplement 1 to show that for a wide range of the parameter α, all solutions cluster in a small region of parameter space. Solutions are found both above and below the bifurcation line. Note that all solutions are such that the parameters j_A and j_B are close to the bifurcation line that separates the region where tuned network activity requires tuned external input from the region where tuned network activity can be sustained autonomously. Furthermore, the weight of recurrent connections within map B (j_B) is much stronger than the corresponding weight for map A (j_A). Hence, we observe that external inputs play a stronger role in shaping the dynamics during motor preparation than during execution, while recurrent inputs dominate the total inputs during movement execution, for a broad range of values of α. This prediction needs to be tested experimentally, although it is in line with the results of ref. 39, as we explain in the Discussion, section “Interplay between external and recurrent currents”, last paragraph.

      3) As an addition to the motor control literature, this paper's main strengths lie in the model capturing orthogonality between preparatory and movement-related activity subspaces (Elsayed et al 2016), which few models do. However, one might argue that the model is in fact half hand-crafted for this purpose, and half-tuned to neural data, in such a way that it is almost bound to exhibit the phenomenon. Thus, some form of broader model cross-validation would be nice: what else does the model capture about the data that did not explicitly inspire/determine its construction? As a starting point, I would suggest that the authors apply the type of CCA-based analysis originally performed by Sussillo et al (2015), and compare qualitatively to both Sussillo et al. (2015) and Kao et al (2021). Also, as every recorded monkey M1 neuron can be characterized by its coordinates in the 4-dimensional space of angular tuning, it should be straightforward to identify the closest model neuron; it would be very compelling to show side-by-side comparisons of single-neuron response timecourses in model and monkey (i.e., extend the comparison of Fig S6 to the temporal domain).

      We thank the reviewer for these suggestions. We have added the following comparisons:

      ● A CCA-based analysis (Fig 5.a) shows that the performance of our model is qualitatively comparable to that of the models of Sussillo et al. (2015) and Kao et al. (2021) at generating realistic motor cortical activity (average canonical correlation ρ = 0.77 during movement preparation and 0.82 during movement execution).

      ● For each of the 141 neurons in the data, we selected the model neuron that is closest in the space of the eta and theta parameters:

      a) A side-by-side comparison of the time course of responses shows a good qualitative agreement (Fig 5.c).

      b) We successfully trained a linear decoder to read the responses of these 141 neurons from simulations and output trial-averaged EMG activity recorded from a monkey performing the same task (Fig 5.b).

      c) Figure 5—figure supplement 4 shows that simulated data presents sequential activity, as does the recorded data.
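
      The canonical correlations quoted in the first point above compare recorded and simulated population activity. A minimal sketch of how such coefficients can be computed is given below; it assumes two matrices with matching rows (time points and conditions) and full column rank, and it omits any preliminary dimensionality reduction. The function name is hypothetical and the details of the published analysis may differ.

```python
import numpy as np

def canonical_correlations(x, y):
    # x: (n_samples, n_x), y: (n_samples, n_y); rows must correspond.
    # Both matrices are assumed to have full column rank after centering.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of qx^T qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

      Averaging the returned coefficients gives a single summary number of the kind reported above.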

      In our simulations, the temporal variability in single-neuron responses is due to the temporal evolution of the inferred external inputs, and to noise, implemented by an Ornstein-Uhlenbeck (OU) process that is added to the total inputs. Another source of variability could be introduced in the synaptic connectivity: for example, one could add a Gaussian random variable to each synaptic efficacy. We checked that this simple extension of our model is able to reproduce the dynamics of the order parameters seen in the data. A full characterization of this extended model is beyond the scope of our paper.

      4) The paper's clarity could be improved.

      We thank the reviewer for this feedback. We have substantially rewritten most sections of the paper to improve clarity.

      Reviewer #2 (Public Review):

      The authors study M1 cortical recordings in two non-human primates performing straight delayed center-out reaches to one of 8 peripheral targets. They build a model for the data with the goal of investigating the interplay of inferred external inputs and recurrent synaptic connectivity and their contributions to the encoding of preferred movement direction during movement preparation and execution epochs. The model assumes neurons encode movement direction via a cosine tuning that can be different during preparation and execution epochs. As a result, each type of neuron in the model is described with four main properties: their preferred direction in the cosine tuning during preparation (denoted by θ_A) and execution (denoted by θ_B) epochs, and the strength of their encoding of the movement direction during the preparation (denoted by η_A) and execution (denoted by η_B) epochs. The authors assume that a recurrent network that can have different inputs during the preparation and execution epochs has generated the activity in the neurons. In the model, these inputs can both be internal to the network or external. The authors fit the model to real data by optimizing a loss that combines, via a hyperparameter α, the reconstruction of the cosine tunings with a cost to discourage/encourage the use of external inputs to explain the data. They study the solutions that would be obtained for various values of α. The authors conclude that during the preparatory epoch, external inputs seem to be more important for reproducing the neuron's cosine tunings to movement directions, whereas during movement execution external inputs seem to be untuned to movement direction, with the movement direction rather being encoded in the direction-specific recurrent connections in the network.

      Major:

      1) Fundamentally, without actually simultaneously recording the activity of upstream regions, it should not be possible to rule out that the seemingly recurrent connections in the M1 activity are actually due to external inputs to M1. I think it should be acknowledged in the discussion that inferred external inputs here are dependent on assumptions of the model and provide hypotheses to be validated in future experiments that actually record from upstream regions. To convey with an example why I think it is critical to simultaneously record from upstream regions to confirm these conclusions, consider two alternative scenarios: I) The recorded neurons in M1 have some recurrent connections that generate a pattern of activity that, based on the modeling, seems to be recurrent. II) The exact same activity has been recorded from the same M1 neurons, but these neurons have absolutely no recurrent connections themselves, and are rather activated via purely feed-forward connections from some upstream region; that upstream region has recurrent connections and is generating the recurrent-like activity that is later echoed in M1. These two scenarios can produce the exact same M1 data, so they should not be distinguishable purely based on the M1 data. To distinguish them, one would need to simultaneously record from upstream regions to see if the same recurrent-like patterns that are seen in M1 were already generated in an upstream region or not. I think acknowledging this major limitation and discussing the need to eventually confirm the conclusions of this modeling study with actual simultaneous recordings from upstream regions is critical.

      We agree with the reviewer that it is not possible to rule out the hypothesis that motor cortical activity is purely generated by feedforward connectivity.

      In the new version of the paper, we discuss more explicitly the fact that neural activity can be fully explained by feedforward inputs, and we added Figure 5—figure supplement 5 to show that the dynamics of the feedforward network look almost indistinguishable from those of the recurrent network (Fig.5), provided their parameters are appropriately tuned. Notice, however, that a canonical correlation analysis comparing the recorded activity with the simulated activity shows that the average canonical correlation coefficient is slightly lower for the purely feedforward network (Fig.5.a vs Fig.S12.a).

      A summary of our approach is:

      • We observe that both a purely feedforward and a recurrent network can reproduce the temporal course of the recordings equally well (see also our answer to question 5 below);

      • We point out that a solution that would save metabolic energy consumption is one where the activity is generated by recurrent currents (with shorter range local connections) rather than by feedforward inputs from upstream regions (long-range connections).

      • We study the solution that best reproduces the recorded activity and minimizes inputs from upstream regions.

      In the Discussion, we included the Reviewer’s observation that our hypothesis needs to be tested by simultaneous recordings of M1 and upstream regions, as well as measures of synaptic strength between motor cortical neurons. See the second paragraph of page 14: “Our prediction (…) will be necessary to rule out alternative explanations”. Yet, we think that the results of reference [51] are consistent with our results.

      One last point we would like to stress is that external inputs drive the network's dynamics at all times, even in the solution that we argue would save metabolic energy: untuned inputs are present throughout the whole course of the motor action, including movement execution, and they determine the precise temporal pattern of the neurons' firing rates.

      2) The ring network model used in this work implicitly relies on the assumption that cosine-tuning models are good representations of the recorded M1 neuronal activity. However, this assumption is not quantitatively validated in the data. Given that all conclusions depend on this, it would be important to provide some goodness of fit measure for the cosine tuning models to quantify how well the neurons' directional preferences are explained by cosine tunings. For example, reporting a histogram of the cosine tuning fit error over all neurons in Fig 2 would be helpful (currently example fits are shown only for a few neurons in Fig. 2 (a), (b), and Figure S6(b)). This would help quantitatively justify the modeling choice.

      We thank the reviewer for this observation. Fig.S2.e-f shows the R^2 coefficient of the cosine fit; in particular, we show that the R^2 of the cosine fit strongly correlates with the variables \eta, which represent the degree of participation of single units in the recurrent currents. Units with higher \eta (the ones that contribute more to the recurrent currents) are the ones whose tuning curves better resemble a cosine. However, the plot also shows that the R^2 coefficient of the cosine fit is pretty low for many cells. To show that a model with cosine tuning can yield this result, we repeated the same analysis on the units in our simulated network. In our simulations, all neurons receive a stochastic input mimicking the large fluctuations around mean inputs that are expected to occur in vivo. We selected the 141 units whose activity most strongly resembled the activity of the 141 recorded neurons (see figure caption for details). We then looked at the tuning curves of these 141 simulated units, and calculated the R^2 coefficient of the cosine fit. Figure 5—figure supplement 2.c shows that the result agrees well with the data: the R^2 coefficient is pretty low for many neurons, and correlates with the variable \eta. To summarize, a model that assumes cosine tuning, but also incorporates noise in the dynamics, reproduces well the R^2 coefficient of the cosine fit of tuning curves from data. We added the paragraph “Cosine tuning” in the Discussion to comment on this point.
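
      For concreteness, the goodness of fit discussed above can be computed along the following lines. This is a sketch under the assumption that the tuning curve is fitted by ordinary least squares to a constant plus a first harmonic; the function name is hypothetical and the exact fitting procedure in the paper may differ.

```python
import numpy as np

def cosine_fit(angles, rates):
    # Least-squares fit of r(theta) = b0 + b1*cos(theta) + b2*sin(theta),
    # which is equivalent to b0 + A*cos(theta - theta_pref).
    design = np.column_stack([np.ones_like(angles),
                              np.cos(angles),
                              np.sin(angles)])
    coef, *_ = np.linalg.lstsq(design, rates, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((rates - fitted) ** 2)
    ss_tot = np.sum((rates - rates.mean()) ** 2)  # zero for a flat curve
    r2 = 1.0 - ss_res / ss_tot
    theta_pref = np.arctan2(coef[2], coef[1])
    return theta_pref, r2
```

      Applied to each unit's trial-averaged tuning curve over the 8 reach directions, the second return value is the R^2 coefficient of the cosine fit.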

      3) The authors explain that the two-cylinder model that they use has "distinct but correlated" maps A and B during the preparation and movement. This is hard to see in the formulation. It would be helpful if the authors could expand in the Results on what they mean by "correlation" between the maps and which part of the model enforces the correlation.

      We thank the reviewer for this comment. By correlation, we meant the correlation between neural activity during the preparatory and movement-related temporal intervals. In the model, the correlation between the vectors θA and θB induces correlation in the preparatory and movement-related activity patterns. To make the paper easier to read, we are not mentioning this concept in the Results; in the Discussion, we explicitly refer to it in the following two paragraphs:

      “A strong correlation between the selectivity properties of the preparatory and movement-related epochs will produce strongly correlated patterns of activity in these two intervals and a strong overlap between the respective PCA subspaces.” (Discussion, section Orthogonal spaces dedicated to movement preparation and execution)

      “The correlation between the vectors θA and θB” (Discussion, section Interplay between external and recurrent currents)

      4) The authors note that a key innovation in the model formulation here is the addition of participation strengths parameters (η_A, η_B) to prior two-cylinder models to represent the degree of neuron's participation in the encoding of the circular variable in either map. The authors state that this is critical for explaining the cosine tunings well: "We have discussed how the presence of this dimension is key to having tuning curves whose shape resembles the one computed from data, and decreases the level of orthogonality between the subspaces dedicated to the preparatory and movement-related activity". However, I am not sure where this is discussed. To me, it seems like to show that an additional parameter is necessary to explain the data well, one would need to compare fit to data between the model with that parameter and a model without that parameter. I don't think such a comparison was provided in the paper. It is important to show such a comparison to quantitatively show the benefit of the novel element of the model.

      We thank the reviewer for this comment.

      ● The key observation is that without the parameters eta_A, eta_B, the temporal evolution of all neurons in the network is the same (only the noise term added to the dynamics is different). To show this, we compared the temporal evolution of single-neuron firing rates in the model with the data. Fig 5.c compares the time course of single-neuron firing rates from data and simulations (good agreement), while Figure 6—figure supplement 2.a shows the same comparison for a model in which all neurons have the same value of the eta_A, eta_B parameters (worse agreement: the range of firing rates is the same for all neurons). In summary, the parameters eta_A, eta_B introduce the variability in the coupling strengths that is necessary to generate heterogeneity in neuronal responses.

      ● At the end of section “PCA subspaces dedicated to movement preparation and execution”, we refer to Figure 6—figure supplement 2.c, showing that a model with eta_A=1=eta_B for all neurons yields less orthogonal subspaces.

      5) The model parameters are fitted by minimizing a total cost that is a weighted average of two costs as E_tot = α E_rec + E_ext, with the hyperparameter α determining how the two costs are combined. The selection of α is key in determining how much the model relies on external inputs to explain the cosine tunings in the data. As such, the conclusions of the paper rely on a clear justification of the selection of α and a clear discussion of its effect. Otherwise, all conclusions can be arbitrary confounds of this selection and thus unreliable. Most importantly, I think there should be a quantitative fit to data measure that is reported for different scenarios to allow comparison between them (also see comment 2). For example, when arguing that α should be "chosen so that the two terms have equal magnitude after minimization", this would be convincing if somehow that selection results in a better fit to the neural data compared with other values of α. If all such selections of α have a similar fit to neural data, then how can the authors argue that some are more appropriate than others? This is critical since small changes in alpha can lead to completely different conclusions (Fig. 6, see my next two comments).

      All the points raised in questions 5 to 8 are interrelated, and we address them below, after Major issue 8.

      6) The authors seem to select alpha based on the following: "The hyperparameter α was chosen so that the two terms have equal magnitude after minimization (see Fig. S4 for details)". Why is this the appropriate choice? The authors explain that this will lead to the behavior of the model being close to the "bifurcation surface". But why is that the appropriate choice? Does it result in a better fit to neural data compared with other choices of α? It is critical to clarify and justify as again all conclusions hinge on this choice.

      7) Fig 6 shows example solutions for 2 close values of α, and how even slight changes in the selection of α can change the conclusions. In Fig. 6 (d-e-f), α is chosen as the default approach such that the two terms E_rec and E_ext have equal magnitude. Here, as the authors note, during movement execution tuned external inputs are zero. In contrast, in Fig. 6 (g-h-i), α is chosen so that the E_rec term has a "slightly larger weight" than the E_ext term so that there is less penalty for using large external inputs. This leads to a different conclusion whereby "a small input tuned to θ_B is present during movement execution". Is one value of α a better fit to neural data? Otherwise, how do the authors justify key conclusions such as the following, which seems to be based on the first choice of α shown in Fig. 6 (d-e-f): "...observed patterns of covariance are shaped by external inputs that are tuned to neurons' preferred directions during movement preparation, and they are dominated by strong direction-specific recurrent connectivity during movement execution".

      8) It would be informative to see the extreme case of very large and very small α. For example, if α is very large such that external inputs are practically not penalized, would the model rely purely on external inputs (rather than recurrent inputs) to explain the tuning curves? This would be an example of the hypothetical scenario mentioned in my first comment. Would this result in a worse fit to neural data?

      We agree with the reviewer that it is crucial to discuss how the choice of the parameter alpha affects the results, and we have strived to improve this discussion in the revised manuscript.

      I. When we looked for the coupling parameters that best explain the data, without introducing a metabolic cost, we found multiple solutions that were equally good (see Figure 4—figure supplement 2 and our answer to question (1) above). These included the solution with all couplings set to zero (j_s^B = j_s^A = j_a = 0), as well as many solutions with different values of the synaptic coupling parameters. The solution with the strongest couplings is close to the bifurcation line, in the area where j_s^B > j_s^A.

      II. We then introduced a metabolic cost to break the degeneracy between these different solutions. The cost function we minimized contains two terms; their relative strength is modulated by alpha. The case of very small alpha (i.e., only minimizing external input) yields a very poor reconstruction of neural dynamics and is not interesting. The case of very large alpha reduces to case (I) above. We added Figure 4—figure supplement 1 to show the results for intermediate values of alpha, where alpha is large enough to yield a good reconstruction of neural dynamics, yet small enough to ensure that we find a unique solution. For these intermediate values of alpha, the two terms of the cost function have comparable magnitudes. Although slight changes in the selection of alpha do change whether the solutions are above or below the bifurcation surface, Figure 4—figure supplement 1 shows that all solutions are close to the bifurcation surface. In particular, the value of j_s^B is close to its critical value, while we never find solutions where j_s^A is close to its critical value; that is, we never find solutions in the lower-right region of the plot in Figure 4—figure supplement 1. The critical value for j_s^B is the one above which no tuned external inputs are necessary to sustain the observed activity during movement execution. For values of j_s^B close to the bifurcation line but below it (for example, Fig.4g), inferred tuned inputs are still much weaker than the untuned ones during movement execution. Also, the inferred direction-specific couplings are strong and amplify the weak external inputs tuned to map B, therefore still playing a major role in shaping the observed dynamics during movement execution.

      We have rewritten the abstract, introduction and conclusions of the paper accordingly. Instead of focusing on only one solution for a particular value of alpha, we now discuss all solutions and their implications.

      9) The authors argue in the discussion that "the addition of an external input strength minimization constraint breaks the degeneracy of the space of solutions, leading to a solution where synaptic couplings depend on the tuning properties of the pre- and post-synaptic neurons, in such a way that in the absence of a tuned input, neural activity is localized in map B". In other words, the use of the E_ext term apparently reduces "degeneracy" of the solution. This was not clear to me and I'm not sure where it is explained. This is also related to α because if alpha goes toward very large values, it would be like the E_ext term is removed, so it seems like the authors are saying that the solution becomes degenerate if alpha grows very large. This should be clarified.

      We thank the reviewer for pointing this out. By degeneracy of the solution, we mean that the model can explain the data equally well for different choices of the recurrent coupling parameters (j_s^A, j_s^B, j_a). In other words, if we look for the coupling parameters that best explain the data, there are many equivalent solutions. When we introduce the E_ext term in the cost function, we find one unique solution for each choice of alpha. So by “breaking the degeneracy”, we mean going from a scenario where many solutions are equally valid to one with a single solution. We added this explanation to the paper, along with an explanation of how our conclusions depend on the choice of alpha.
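      As a rough illustration of what "breaking the degeneracy" means, the sketch below uses a toy scalar rate model, not the network of the paper. The steady-state relation r = j*r + i_ext, the grid of couplings, and the values of r_obs and alpha are all assumptions chosen for the example. Every coupling on the grid reproduces the observed rate exactly, so the fit alone cannot choose between them; adding an input-strength penalty leaves a single minimizer.

```python
import numpy as np

def fit_error(j, i_ext, r_obs):
    """Squared error between the toy model's steady state and the observed rate."""
    r_model = i_ext / (1.0 - j)  # steady state of r = j*r + i_ext, valid for j < 1
    return (r_model - r_obs) ** 2

def total_cost(j, i_ext, r_obs, alpha):
    """Input-strength penalty plus alpha times the reconstruction error."""
    return i_ext ** 2 + alpha * fit_error(j, i_ext, r_obs)

r_obs = 2.0    # hypothetical observed rate
alpha = 10.0   # hypothetical weight on the reconstruction term

# Candidate recurrent couplings, all below the toy model's critical value of 1.
couplings = np.linspace(0.0, 0.9, 10)
# For each coupling, the external input that reproduces r_obs exactly.
inputs = (1.0 - couplings) * r_obs

errors = np.array([fit_error(j, i, r_obs) for j, i in zip(couplings, inputs)])
costs = np.array([total_cost(j, i, r_obs, alpha) for j, i in zip(couplings, inputs)])

# Degenerate case: every candidate fits the data equally well.
n_equivalent = int(np.sum(errors < 1e-12))
# With the penalty, exactly one candidate has the lowest cost.
best = int(np.argmin(costs))
```

      In this toy, the penalty selects the largest coupling on the grid, the one needing the weakest external input. That outcome is a property of the toy and should not be read as a reproduction of the fitted parameters reported in the paper.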

      10) How do the authors justify setting Φ_A = Φ_B in equation (5)? In other words, how is the last assumption in the following sentence justified: "To model the data, we assumed that the neurons are responding both to recurrent inputs and to fluctuating external inputs that can be either homogeneous or tuned to θ_A; θ_B, with a peak at constant location Φ_A = Φ_B ≡ Φ". Does this mean that the preferred direction for a given neuron is the same during preparation and movement epochs? If so, how is this consistent with the not-so-high correlation between the preferred directions of the two epochs shown in Fig. 2 c, which is reported to have a circular correlation coefficient of 0.4?

      We would like to stress the important distinction between the parameters \theta and the parameters Φ. While the parameters \theta_A and \theta_B represent the preferred direction of single neurons during preparatory and execution epochs, respectively, the parameters Φ_A, Φ_B represent the direction of motion that is encoded at the population level during these two epochs. The mean-field analysis shows that Φ_A = Φ_B, even though single neurons change their preferred direction from one epoch to the next. We added a more extensive explanation of the order parameters in the Results section.

      Reviewer #3 (Public Review):

      In this work, Bachschmid-Romano et al. propose a novel model of the motor cortex, in which the evolution of neural activity throughout movement preparation and execution is determined by the kinematic tuning of individual neurons. Using analytic methods and numerical simulations, the authors find that their networks share some of the features found in empirical neural data (e.g., orthogonal preparatory and execution-related activity). While the possibility of a simple connectivity rule that explains large features of empirical data is intriguing and would be highly relevant to the motor control field, I found it difficult to assess this work because of the modeling choices made by the authors and how the results were presented in the context of prior studies.

      Overall, it was not clear to me why Bachschmid-Romano et al. couched their models within a cosine-tuning framework and whether their results could apply more generally to more realistic models of the motor cortex. Under cosine-tuning models (or kinematic encoding models, more generally), the role of the motor cortex is to represent movement parameters so that they can presumably be read out by downstream structures. Within such a framework, the question of how the motor cortex maintains a stable representation of movement direction throughout movement preparation and execution when the tuning properties of individual neurons change dramatically between epochs is highly relevant. However, prior work has demonstrated that kinematic encoding models provide a poor fit for empirical data. Specifically, simple encoding models (and the more elaborate extensions [e.g., Inoue, et al., 2018]) cannot explain the complexity of single-neuron responses (Churchland and Shenoy, 2007), and do not readily produce the population-level signals observed in the motor cortex (Michaels, Dann, and Scherberger, 2016) and cannot be extended to more complex movements (Russo, et al., 2018).

      In both the Introduction and Discussion, the authors heavily cite an alternative to kinematic encoding models, the dynamical systems framework. Here, the correlations between kinematics and neural activity in the motor cortex are largely epiphenomenal. The motor cortex does not 'represent' anything; its role is to generate patterns of muscle activity. While the authors explicitly acknowledge the shortcomings of encoding models ('Extension to modeling richer movements', Discussion) and claim that their proposed model can be extended to 'more realistic scenarios', they neither demonstrate that their models can produce patterns of muscle activity nor that their model generates realistic patterns of neural activity. The authors should either fully characterize the activity in their networks and make the argument that their models provide a better fit to empirical data than alternative models, or demonstrate that more realistic computations can be explained by the proposed framework.

      Major Comments

      1) In the present manuscript, it is unclear whether the authors are arguing that representing movement direction is a critical computation that the motor cortex performs, and the proposed models are accurate models of the motor cortex, or if directional coding is being used as a 'proof of concept' that demonstrates how specific, population-level computations can be explained by the tuning of individual neurons.

      If the authors are arguing the former, then they need to demonstrate that their models generate activity similar to what is observed in the motor cortex (e.g., realistic PSTHs and population-level signals). Presently, the manuscript only shows tuning curves for six example neurons (Fig. S6) and a single jPC plane (Fig. S8). Regarding the latter, the authors should note that Michaels et al. (2016) demonstrated that representational models can produce rotations that are superficially similar to empirical data, yet are not dependent on maintaining an underlying condition structure (unlike the rotations observed in the motor cortex).

      If the authors are arguing the latter - and they seem to be, based on the final section of the Discussion - then they need to demonstrate that their proposed framework can be extended to what they call 'more realistic scenarios'. For example, could this framework be extended to a network that produces patterns of muscle activity?

      We thank the reviewer for raising these issues.

      Is our model a kinematic encoding model or a dynamical system?

      Our model is a dynamical system, as can be seen by inspecting equations (1,2). The main difference between our model and recently proposed dynamical system models of motor cortex is that the synaptic connectivity matrix in our model is built from the tuning properties of neurons, instead of being trained using supervised learning techniques (we come back to this important difference below). Since the network’s connectivity and external input depend on the neurons’ tuning to the direction of motion (eq 5-6), kinematic parameters emerge from the dynamic interaction between recurrent and feedforward currents, as specified by equations (1-6). Thus, kinematic parameters can be decoded from population activity.

      While in kinematic encoding models neurons’ firing rates are a function of parameters of the movement, we constrained the parameters of our model by requiring the model to reproduce the dynamics of a few order parameters, which are low-dimensional measures of the activity of recorded neurons. Our model is fitted to neural data, not to the parameters of the movement.

      Although we observed that a linear decoder of the network’s activity can reproduce patterns of muscle activity without decoding any kinematic parameter (see below), discussing whether tuning in M1 plays a computational role in controlling muscle activity is outside the scope of our work. Rather, the scope of our paper is to discuss how a specific connectivity structure can generate the observed patterns of neural activity, and which connectivity structure requires minimum external inputs to sustain the dynamics. In our approach, the correlations between kinematics and neural activity in the motor cortex are not merely epiphenomenal, but emerge from a specific structure of the connectivity that has likely been shaped by Hebbian-like learning mechanisms.

      Can the model generate realistic PSTHs and patterns of muscle activity? Yes, it can. As suggested, we have added the following comparisons:

      ● A CCA-based analysis (Fig 5.a) shows that the performance of our model is qualitatively comparable to that of the Sussillo et al. (2015) and Kao et al. (2021) models at generating realistic motor cortical activity (average canonical correlation ρ = 0.77 for motor preparation, 0.82 for motor execution).

      ● For each of the 141 neurons in the data, we selected the most similar unit in the model (the closest neuron in the eta- and theta-parameter space, i.e. the one with the smallest Euclidean distance in the space defined by (\theta_A, \theta_B, \eta_A, \eta_B)). A side-by-side comparison of the time course of responses (Fig 5.c) shows good qualitative agreement.

      ● We successfully trained a linear decoder to read the responses of these 141 units from simulations and output trial-averaged EMG activity recorded from a monkey performing the same task (Fig 5.b).

      ● The model displays sequential activity and rotational dynamics (Fig. S10) without the need to introduce neuron-specific latencies (Michaels, Dann, and Scherberger, 2016).
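      The unit-matching step in the second bullet can be sketched as a nearest-neighbour search. This is a minimal illustration, assuming the tuning parameters are stored as arrays with columns (theta_A, theta_B, eta_A, eta_B); the function name and array layout are hypothetical. It uses the plain Euclidean distance named in the response; whether the angular variables were wrapped before computing distances is not stated, so no wrapping is applied here.

```python
import numpy as np

def match_units(data_params, model_params):
    """For each recorded neuron, return the index of the closest model unit.

    Both inputs have shape (n, 4), with columns (theta_A, theta_B, eta_A, eta_B).
    Distances are plain Euclidean, with no special treatment of angles.
    """
    # Pairwise differences, shape (n_data, n_model, 4), via broadcasting.
    diff = data_params[:, None, :] - model_params[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.argmin(dist, axis=1)

# Tiny made-up example: two recorded neurons, three model units.
data_params = np.array([[0.0, 0.0, 1.0, 1.0],
                        [3.0, 3.0, 2.0, 2.0]])
model_params = np.array([[3.0, 3.0, 2.0, 2.1],
                         [0.1, 0.0, 1.0, 1.0],
                         [1.0, 1.0, 1.0, 1.0]])
matches = match_units(data_params, model_params)
```

      Note that this search allows two recorded neurons to map onto the same model unit; the response does not say whether matches were forced to be distinct.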

      Can our model explain the complexity of single-neuron tuning?

      We have shown that our model captures the heterogeneity of neural responses. Yet, it has been shown that neurons’ tuning properties depend on many features of movement. For example, the current version of the model does not describe the dependence of tuning on speed (Churchland and Shenoy, 2007). However, our model could be extended to incorporate it. Preliminary results suggest that, in a network model in which neurons differ in the degree of symmetry of their synaptic connectivity, the speed of neural trajectories can be modulated by external inputs that preferentially target asymmetrically connected neurons. In our model, all connections are a sum of a symmetric and an asymmetric term. We could extend our model to incorporate variability in the degree of symmetry of the connections, and we speculate that in such a model tuning would depend on the speed of movement, for appropriate forms of external inputs. We leave this study to future work.

      Can our model explain neural activity underlying more complex trajectories? When limb trajectories are more complex than simple reaches (Russo, et al., 2018), a single neuron’s activity displays intricate response patterns. Our work could be extended to model more complex movement in several ways. A simplifying assumption we made is that the task can be clearly separated into a preparatory phase and one movement-related phase. A possible extension is one where the motor action is composed of a sequence of epochs, corresponding to a sequence of maps in our model. It will be interesting to study the role of asymmetric connections for storing a sequence of maps. Such a network model could be used to study the storing of motor motifs in the motor cortex (Logiaco et al, 2021); external inputs could then combine these building blocks to compose complex actions.

      In summary, we proposed a simple model that can explain recordings during a straight-reaching task. It provides a scaffold upon which we can build more sophisticated models to explain the activity underlying more complex tasks. We point out that a similar limitation is present in modeling approaches where a network is trained to perform specific neural or muscle activity. The question of whether/how trained recurrent networks can generalize is not yet solved, although currently under investigation (e.g., Dubreuil et al 2022; Driscoll et al 2022).

      What is the advantage of the present model, compared to an RNN trained to output specific neural/muscle activity?

      Its simplicity. Our model is a low-rank recurrent neural network: the structure of the connectivity matrix is simple enough to allow for analytical tractability of the dynamics. The model can be used to test specific hypotheses on the relationship between network connectivity, external inputs and neural dynamics, and to test hypotheses on the learning mechanisms that may lead to the emergence of a given connectivity structure. The model is also helpful to illustrate the problem of degeneracy of network models. An interesting future direction would be to compare the connectivity matrices of trained RNNs and our model.

      We addressed these points in the Discussion, in sections: “Representational vs dynamical system approaches” and “Extension to modeling activity underlying more complex tasks.”

      2) Related to the above point, the authors claim in the Abstract that their models 'recapitulate the temporal evolution of single-unit activity', yet the only evidence they present is the tuning curves of six example units. Similarly, the authors should more fully characterize the population-level signals in their networks. The inferred inputs (Fig. 6) indeed seem reasonable, yet I'm not sure how surprising this result is. Weren't the authors guaranteed to infer a large, condition-invariant input during movement and condition-specific input during preparation simply because of the shape of the order parameters estimated from the data (Fig. 6c, thin traces)?

      We thank the reviewer for this comment. Regarding the first part of the question: we added new plots with more comparisons between the activity of our model and neural recordings (see the answer above referring to Fig 5).

      Regarding the second part: It is true that the shape of the latent variables that we measure from data constrains the solution that we find. However, a “condition-invariant input during movement and condition-specific input during preparation” is not the only scenario compatible with the data. Let’s take a step back and focus on the parameters that we are inferring from data. We are inferring both the strength of external inputs and the coupling parameters. This is done in a two-step inference procedure: we start from a random guess of the coupling parameters, then we infer the strength of the external inputs, and finally we compute the cost function, which depends on all parameters. This is done iteratively, by moving in the space of the coupling parameters; for each point in the space of the coupling parameters, there is one possible configuration of external inputs. The space of the coupling parameters is shown in Fig 4.a, for example (see also Fig. S4). The solutions that we find do not trivially follow from the shape of the latent variables. For example, one possible solution could be: large parameter j_s^A, small parameter j_s^B, which corresponds to a point in the lower-right region of the parameter space in Fig 4.a (Fig. S4). The resulting external input would be a strong condition-specific external input during movement execution, but a condition-invariant input during movement preparation: the model is such that, for example, exciting for a short time-interval a few neurons whose preferred direction corresponds to the direction of motion would be enough to “set the direction of motion” for the network; the pattern of tuned activity could be sustained during the whole delay period thanks to the strong recurrent connections j_s^A. We could not rule out this solution by simply looking at the shape of the latent variables. However, it is a solution we have never observed. We only found solutions in the region where j_s^B is large and close to its critical value. This implies the presence of condition-specific inputs during the whole delay period, and condition-invariant external inputs that dominate over condition-specific ones during movement execution.

      3) In the Abstract and Discussion (first paragraph), the authors highlight that the preparatory and execution-related spaces in the empirical data and their models are not completely orthogonal, suggesting that this near-orthogonality serves an important mechanistic purpose. However, networks have no problem transferring activity between completely orthogonal subspaces. For example, the generator model in Fig. 8 of Elsayed, et al. (2016) is constrained to use completely orthogonal preparatory and execution-related subspaces. As the authors point out in the Discussion, such a strategy only works because the motor cortex receives a large input just before movement (Kaufman et al., 2016).

      We thank the reviewer for this observation. We would like to stress the fact that we are not claiming that having an overlap between subspaces is necessary to transfer activity. Instead, our model shows that a small overlap between the maps can be exploited by the network to transfer activity between subspaces without requiring direction-specific external inputs right before movement execution. A solution where activity is transferred through feedforward inputs is also possible. Indeed, one of the observations of our work (which we highlight more in the new version of the paper) is that by looking at motor cortical activity only, we are not able to distinguish between the activity generated by a feedforward network, and one generated by a recurrent one. However, we argue that a solution where external inputs are minimized can be favorable from a metabolic point of view, as it requires fewer signals to be transmitted through long-range connections. This informs our cost function, and yields a solution where activity is transferred through recurrent connections, by exploiting the small correlation between subspaces.

    1. Author Response

      Reviewer #1 (Public Review):

      DeRisi and colleagues used a new phage-display peptide platform, with 238,068 tiled 62-amino acid peptides covering all known P falciparum coding regions (and numerous other entities), to survey seroreactivity in 198 Ugandan children and adults from two cohorts. They find that the breadth of responses to repeat-containing peptides was twofold higher in children living in the high versus moderate exposure setting, while no such differences were observed for peptides without repeats. Additionally, short motifs associated with seroreactivity were extensively shared among hundreds of antigens, with much of this driven by motifs shared with PfEMP1 antigens.

      Malaria immunity is complex, and this new platform is a potentially valuable addition to the toolkit for understanding humoral responses. The two cohorts differed in fundamental ways: 1) high versus moderate exposure to infective bites; 2) samples drawn at the time of malaria for most donors in the high zone versus ~100 days after the last malaria episode in the moderate zone. The effect of acute malaria to boost short-term cross-reactive antibodies can confound the ability to draw inferences when comparing the two cohorts, and this should be further explored to understand its role in the patterns of seroreactivity observed.

      We thank the reviewer for this very insightful comment. In endemic areas, this potential confounder is a natural occurrence: in areas of higher transmission, people will on average be more likely to have an active or recent infection. The question is whether the differences seen in repeat-containing peptides are due to cumulative exposure or to recent/active exposure. To address this point, we have added new analyses, as suggested, taking into account infection status in both exposure settings. In the moderate exposure setting, we find that the breadth of response in children to repeat-containing peptides narrows significantly between the most recently exposed subjects and those that have been infection-free for >240 days, indicative of a short-lived response. This difference was not observed for peptides without repeats (New figure: Figure 5, Supplement 4). We also observe an increase in breadth for repeat-containing peptides in high vs. moderate exposure settings, regardless of infection status (New figure: Figure 5, Supplement 3), a difference that was absent in non-repeat-containing peptides. Overall, these data suggest that responses to repeats are not only more exposure-dependent, but also short-lived relative to non-repeats in children. We have included this new analysis (lines 409-435).

      Reviewer #2 (Public Review):

      This work profiles naturally acquired antibodies against Plasmodium falciparum proteins in two Ugandan cohorts, at incredibly high resolution, using a comprehensive library of overlapping peptides. These findings highlight the ubiquity and importance of intra- and inter-protein repeat elements in the humoral immune response to malaria. The authors discuss evidence that repeat elements reside in more seroreactive proteins, and that the breadth of immunity to repeat-containing antigens is associated with transmission intensity in children.

      A key strength and value added to publicly available data are the breadth of proteome coverage and unprecedented resolution from using tiling peptides. The authors point out that a known limitation of PhIP-seq is that conformational and discontinuous-linear epitopes cannot be detected with short linear peptides. In addition, disulfide linkages and post-translational modifications would be absent in the T7 representations.

      Several significant conclusions drawn from the results in this study are based on the humoral response to repeat elements that are present in multiple locations, including different genes. If antibodies to these regions are cross-reactive as described, it is not clear how the assay can differentiate antibodies that were developed against one or many of these loci. This potential confounding could change the conclusions about inter-protein motifs.

      • We thank the reviewer for their comments on the study. We have added a note about post-translational modifications to the text (Line 675-676) as recommended.

      • With regards to interprotein motifs (Figure 6), we only suggest a potential for antibody cross-reactivity across these motifs based on sequence similarity alone. We do not claim direct evidence that they are indeed cross-reactive, especially given the complex polyclonal nature of the response we are measuring. We present this sequence analysis only as a landscape of potential cross-reactivity among linear epitopes in the proteome, derived from the pool of seroreactive peptides enriched in this cohort.

      • Regardless, we have included a new analysis following the suggestion of Reviewer #1 to determine whether reactivity to these shared motifs indeed correlates between peptides from different proteins sharing a motif within the same individual. While this analysis shows apparent cross reactivity within individuals, we point out that the data is derived from complex polyclonal repertoires inherent to each individual, and thus these observations must be taken in that context and do not definitively establish cross reactivity. Along with the new analysis (Line 495-503), we have sought to be clear on these limitations (Line 632-635).

      Reviewer #3 (Public Review):

      This work provides a new tool, a comprehensive PhIP-seq library containing 238,068 individual 62-amino acid peptides, tiled every 25 amino acids and covering all 8,980 known proteins of the deadliest malaria parasite, Plasmodium falciparum, to systematically profile antibody targets at high resolution. This phage display library was screened with plasma samples obtained from 198 Ugandan children and adults in high and moderate malaria transmission settings and from 86 US controls. This work identified that repeat elements were commonly targeted by antibodies. Furthermore, extensive sharing of motifs associated with seroreactivity indicated the potential for extensive cross-reactivity among antigens in P. falciparum. This paper provides a new proteome-wide high-throughput methodology to identify antibody targets, which to date have been investigated with protein arrays and alpha screens. Importantly, only this methodology (PhIP-seq library) is able to investigate repeat-containing antigens and cross-reactive epitopes at high resolution (25-amino acid resolution).
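      The tiling scheme described here (62-residue peptides starting every 25 residues) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the handling of short proteins and of the C-terminal tail (a final tile anchored at the end of the sequence) is a guess for the sake of a complete example, not the library's documented design.

```python
def tile_peptides(protein, length=62, step=25):
    """Return overlapping peptides of `length` residues starting every `step` residues.

    Assumptions (not taken from the paper): proteins shorter than `length`
    are returned whole, and a final tile is anchored at the C-terminus so
    that the tail of the sequence is always covered.
    """
    if len(protein) <= length:
        return [protein]
    last_start = len(protein) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [protein[s:s + length] for s in starts]

# Made-up 100-residue sequence, for illustration only.
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 5
toy_tiles = tile_peptides(toy_protein)
```

      With a 25-residue step and 62-residue tiles, consecutive peptides overlap by 37 residues, so most positions are covered by more than one peptide.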

      Strengths:

      1) Novel technology

      Firstly, the uniqueness of this study is the use of novel technology, the PhIP-seq library. This PhIP-seq library in this study contains >99.5% of the parasite proteome and is the highest coverage among existing proteome-wide tools for P. falciparum. Moreover, this library can identify antibody responses in high resolution (25 amino acids).

      Secondly, the PhIP-seq converts a proteomic assay (ie. protein array and alpha screen) into a genomic assay, leveraging the massive scale and low-cost nature of next-generation short-read sequencing.

      Thirdly, the phage display system makes it possible to sequentially enrich and amplify the signal relative to noise. Finally, a high-quality strategic bioinformatic analysis of PhIP-seq data was applied.

      2) Novel findings

      The major findings of this study were obtained only by using this novel technology because of its full-proteome coverage and high resolution. Repeat elements were the common target of naturally acquired antibodies. Furthermore, extensive sharing of motifs associated with seroreactivity was observed among hundreds of parasite proteins, indicating the potential for extensive cross-reactivity among antigens in P. falciparum.

      3) Usefulness for the future research

      Importantly, plasma samples from longitudinal cohort studies will give the scientific community important insights into protective humoral immunity which will be important for the identification of vaccine and exposure-marker candidates in the near future.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are the insufficient description of the selected parasite proteins and seroreactivity ranking of the selected proteins such as TOP100 proteins.

      We thank the reviewer for their comments, corrections, and suggestions. We have made a number of changes and added new analyses, all of which have improved the work. These changes include the following:

      • Analysis of breadth of seroreactivity to repeat and non-repeat regions taking into account infection status in both exposure settings.

      • Analysis to test whether reactivity to peptides with interprotein motifs correlates within the same individual

      • A table listing top 100 proteins in terms of their seropositivity % in response to the reviewer’s comment (Supplementary table 2b).

    1. Author Response

      Reviewer #1 (Public Review):

      Using data from extracellular recordings in mouse piriform cortex (PCx) by Bolding & Franks (2018), the authors examined the strength, timing, and coherence of gamma oscillations with respiration in awake mice. During "spontaneous" activity (i.e. without odor or light stimulation), they observed a large peak in gamma that was driven by respiration and aligned with the spiking of FBIs. TeLC, which blocks synaptic output from principal cells onto other principal cells and FBIs, abolishes gamma. Beta oscillations are evoked while gamma oscillations are induced. Odors strongly affect beta in PCx but have minimal (duration but not amplitude) effects on gamma. Unlike gamma, strong, odor-evoked beta oscillations are observed in TeLC. Using PCA, the authors found a small subset of neurons that conveyed most of the information about the odor (winner cells). Loser cells were more phase-locked to gamma, which matched the time course of inhibition. Odor decoding accuracy closely follows the time course of gamma power.

      We thank the reviewer for the accurate summary of our work.

      I think this is an interesting study that uses a publicly available dataset to good effect and advances the field elegantly, especially by selectively analyzing activity in identified principal neurons versus inhibitory interneurons, and by making use of defined circuit perturbations to causally test some of their hypotheses.

      We thank the reviewer for the positive appraisal.

      Major:

      • The authors show odor-specificity at the time of the gamma peak and imply that the gamma coupling is important for odor coding. Is this because gamma oscillations are important or because gamma is strongest when activity in PCx is strongest (i.e. both excitatory and inhibitory activity, which would cancel each other in the population PSTH, which peaks earlier)? To make this claim, the authors could show that odor decoding accuracy - with a small (~10 ms sliding window) - oscillates at approx. gamma frequencies. As is, Fig. 5 just shows that cells respond at slightly different times in the sniff cycle. What time window was used for computing the Odor Specificity Index? Put another way, is it meaningful that decoding is most accurate when gamma oscillations are strongest, or is this just a reflection of total population activity, i.e., when activity is greatest there is more gamma power, and odor decoding accuracy is best?

      We thank the reviewer for the critical comment. Please note that the employed decoding strategy (supervised learning with cross-validation) prevents us from quantifying a time series of decoding accuracy. Nevertheless, to overcome this difficulty, we divided the spike data (0-500 ms following the inhalation start) according to the gamma cycle into four non-overlapping gamma phase bins. Then we tested whether odor decoding accuracy varied as a function of the gamma cycle phase. Using this approach, we found that decoding depended on the gamma phase, as shown below:

      (The bottom plot shows the modulation of decoding accuracy within the gamma cycle [Real MI] compared to a surrogate distribution [Surr MI, obtained by circularly shifting the gamma phases by a random amount]).
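      The two ingredients of this control analysis, phase binning and the circular-shift surrogate, can be sketched as below. This is a minimal sketch under assumptions: the function names are hypothetical, phases are taken to be in radians, and the shift is applied to the sampled gamma phase time series. The decoder itself and the exact definition of the modulation index (MI) are not specified in the response and are left out.

```python
import numpy as np

def phase_bins(phases, n_bins=4):
    """Map phases (radians) to integer labels 0..n_bins-1 covering one full cycle."""
    wrapped = np.mod(phases, 2 * np.pi)  # fold every phase into [0, 2*pi)
    labels = np.floor(wrapped / (2 * np.pi / n_bins)).astype(int)
    return np.minimum(labels, n_bins - 1)  # guard against rounding at the upper edge

def circular_shift_surrogate(phase_series, rng):
    """Circularly shift a gamma phase time series by a random, non-zero number of samples."""
    shift = int(rng.integers(1, len(phase_series)))
    return np.roll(phase_series, shift)

# Made-up spike phases: one per quarter of the cycle, plus a negative phase.
spike_phases = np.array([0.1, np.pi / 2 + 0.1, np.pi + 0.1, 3 * np.pi / 2 + 0.1, -0.1])
spike_labels = phase_bins(spike_phases)

rng = np.random.default_rng(0)
phase_series = np.linspace(0.0, 2 * np.pi, 10, endpoint=False)
shifted = circular_shift_surrogate(phase_series, rng)
```

      Decoding would then be run separately on the spikes falling in each bin, and repeated on many shifted phase series to build the surrogate distribution. A circular shift preserves the phase series itself while breaking its alignment with spike times.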

      We interpret this new result as indicative that gamma influences decoding accuracy directly and that our previous result was not only a reflection of total population activity. Moreover, please note that we only use the principal cell activity for computing the odor specificity index (Fig 5E) and decoding accuracy (Fig 7B). Both peak at ~150 ms following inhalation start, at a time window where the net principal cell activity is roughly similar to baseline levels (Fig 5A bottom panel).

      These new panels were added to revised Figure 7 and mentioned in the revised manuscript (page 8); we now also discuss the above considerations about maximal decoding not coinciding with the peak firing rate (page 10).

      Regarding the Odor Specificity Index computation, we apologize for not describing it appropriately in the corresponding Methods subsection. We employed the same sliding time window as in the population vector correlation and the decoding analyses (i.e., 100 ms window, 62.5 % overlap). This information has been added to the revised manuscript (page 15).
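      For concreteness, a 100 ms window with 62.5 % overlap advances by 37.5 ms per step. The sketch below enumerates such windows; the function name and the 0-500 ms analysis range are assumptions made for the example, and only windows that fit entirely inside the range are kept.

```python
def sliding_windows(t_start, t_stop, width=100.0, overlap=0.625):
    """List (start, end) times in ms of overlapping windows that fit inside [t_start, t_stop]."""
    step = width * (1.0 - overlap)  # 100 ms * 0.375 = 37.5 ms between window starts
    windows = []
    k = 0
    while t_start + k * step + width <= t_stop:
        lo = t_start + k * step
        windows.append((lo, lo + width))
        k += 1
    return windows

# Hypothetical analysis range of 0-500 ms after inhalation start.
windows = sliding_windows(0.0, 500.0)
```

      Because consecutive windows share 62.5 ms of data, estimates from neighbouring windows are not independent, which is worth keeping in mind when reading the time course of the index.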

      • The authors say, "assembly recruitment would depend on excitatory-excitatory interactions among winner cells occurring simultaneously during gamma activity." Can the authors test this prediction by examining the TeLC recordings, in which excitatory-excitatory connections are abolished?

      We thank the reviewer for the relevant comment. We followed the reviewer's suggestion and analyzed odor assemblies in TeLC recordings. Interestingly, we found a greater increase in the firing rate of winner cells in TeLC recordings (see figure below), which therefore does not support our previous interpretation that assembly recruitment would depend on excitatory-excitatory local interactions.

      Thus, this new result suggests a much more critical role than we previously considered for the OB projections in determining winner neurons.

      Moreover, we found significant differences in the properties of loser cells. In particular, the TeLC-infected piriform cortex showed a decreased number of losing cells, which were significantly less inhibited than their contralateral counterparts:

      Furthermore, the reduced inhibition of losing cells was associated with an increased correlation of assembly weights across odors for the affected hemisphere:

      Therefore, we believe these results highlight the role of gamma oscillations in segregating cell assemblies and generating a sparse orthogonal odor representation in the piriform cortex. These findings are now included as new panels of Figure 6 and discussed on page 8. Of note, to be consistent with these findings, we modified our speculative sentence (page 9) from “assembly recruitment would depend on excitatory-excitatory interactions among winner cells occurring simultaneously during gamma activity” to “(…) the assembly recruitment would depend on OB projections determining which winner cells “escape” gamma inhibition, highlighting the relevance of the OB-PCx interplay for olfaction (Chae et al., 2022; Otazu et al., 2015).”

      • The authors show that gamma oscillations are abolished in the TeLC condition and use this to claim that gamma arises in the PCx. However, PCx neurons also project back to the OB, where they form excitatory connections onto granule cells. Fukunaga et al (2012) showed that granule cells are essential for generating gamma oscillations in the bulb. Can the authors be sure that gamma is generated in the PCx, per se, rather than generated in the bulb by centrifugal inputs from the PCx, and then inherited from the bulb by the PCx?

      We thank the reviewer for the pertinent comment regarding gamma generation in the PCx. To address this point, we have performed current source density (CSD) analysis, which showed sinks and sources of low-gamma oscillations within the PCx and also a phase reversal:

      This result – shown as panel F in Figure 1 – suggests a local generation of gamma within the PCx. Along with the fact that PCx gamma tightly correlates with piriform FBI firing and that PCx gamma disappears in the TeLC ipsi hemisphere, which has intact OB projections, we deem it more parsimonious to assume that gamma does originate in the piriform circuit during feedback inhibition acting on principal cells and is not directly inherited from OB (though it depends on its drive). We have edited our text to incorporate this panel (page 4). We now also relate our results to those of Fukunaga and colleagues for the OB gamma generation and discuss the alternative interpretation of inherited gamma (page 9).
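
      For readers unfamiliar with CSD, a common estimate is the negative second spatial difference of the field potential across equally spaced recording sites. The sketch below illustrates that standard estimate only; the channel spacing, the conductivity value, and the sign convention (negative = sink) are assumptions and are not taken from the manuscript.

```python
def csd_second_difference(potentials, spacing, conductivity=1.0):
    """Estimate CSD at interior channels from a depth profile of potentials.

    potentials: field potential at equally spaced depths (one time point).
    spacing: distance between adjacent channels (assumed uniform).
    conductivity: assumed homogeneous tissue conductivity (hypothetical value).
    Convention assumed here: negative values are sinks, positive are sources.
    """
    if len(potentials) < 3:
        raise ValueError("need at least three channels")
    csd = []
    for i in range(1, len(potentials) - 1):
        second_diff = potentials[i - 1] - 2.0 * potentials[i] + potentials[i + 1]
        csd.append(-conductivity * second_diff / spacing ** 2)
    return csd


# Toy depth profile with a potential trough at the middle channel.
profile = [0.0, -0.5, -1.0, -0.5, 0.0]
csd = csd_second_difference(profile, spacing=1.0)
```

      In this toy profile the estimate is negative only at the trough, which under the assumed convention marks a local sink; a profile that varies linearly with depth yields zero CSD everywhere.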

      Reviewer #2 (Public Review):

      This is a very interesting paper, in which the authors describe how respiration-driven gamma oscillations in the piriform cortex are generated. Using a published data set, they find evidence for a feedback loop between local principal cells and feedback interneurons (FBIs) as the main driver of respiration-driven gamma. Interestingly, odour-evoked gamma bursts coincide with the emergence of neuronal assemblies that activate when a given odour is presented. The results argue in favour of a winner-take-all mechanism of assembly generation that has previously been suggested on theoretical grounds.

      We thank the reviewer for his/her work and accurate summary of our results.

      The article is well-written and the claims are justified by the data. Overall, the manuscript provides novel key insights into the generation of gamma oscillations and a potential link to the encoding of sensory input by cell assemblies. I have only minor suggestions for additional analyses that could further strengthen the manuscript:

      We thank the reviewer for the positive appraisal.

      1) The authors' analysis of firing rates of FFIs and FBIs combined with TeLC experiments make a compelling case for respiration-driven gamma being generated in a pyramidal cell-FBI feedback mechanism. This conclusion could be further strengthened by analyzing the gamma phase-coupling of the three neuronal populations investigated. One would expect strong coupling for FBIs but not FFIs (assuming that enough spikes of these populations could be sampled during the respiration-triggered gamma bursts). An additional analysis to strengthen this conclusion could be to extract FBI- and FFI spike-triggered gamma-filtered signals. One might expect an increase in gamma amplitude following FBI but not FFI spiking (see e.g., Pubmed ID 26890123).

      We thank the reviewer for the comment. To address this point, we first computed spike-coupling strength (by means of the Mean Vector Length – MVL) for each neuronal subtype. As shown below, we did not find major differences in MVL values across subtypes (if anything, the FBIs actually displayed the lowest MVL, though it should be cautioned that this metric is sensitive to sample size, which differed among subtypes):

      Of note, this result also translated to spike-triggered gamma-filtered signals, with FBIs having the lowest average. We don’t however believe these findings speak against a major role of FBIs in giving rise to field gamma, since it is expected that inhibited neurons will highly phase-lock to gamma (while more active neurons during gamma would show lower phase-locking). Nevertheless, we also computed the spike-triggered gamma amplitude envelope for all three neuronal subtypes. This analysis showed that gamma envelopes closely followed FBI spikes (and not FFIs or EXC cells), and thus this new result reinforces the idea that FBIs trigger gamma oscillations. This plot is now part of an inset of Figure 1G (described on page 5).
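
      As a point of reference for the MVL metric and its dependence on sample size, here is a minimal sketch. It uses the standard definition (length of the mean resultant of unit vectors at the spike phases); the function names and the simulated uniform phases are illustrative assumptions, not the analysis actually run.

```python
import cmath
import math
import random


def mean_vector_length(phases):
    """MVL = |mean(exp(i * phase))| for spike phases given in radians."""
    if not phases:
        raise ValueError("need at least one spike phase")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


def mean_mvl_uniform(n_spikes, repeats=200, seed=0):
    """Average MVL of uniformly drawn phases, i.e. with no true coupling.

    Hypothetical helper used only to illustrate the small-sample bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_spikes)]
        total += mean_vector_length(phases)
    return total / repeats
```

      Comparing `mean_mvl_uniform(5)` with `mean_mvl_uniform(500)` shows that uncoupled units with few spikes yield a larger MVL than uncoupled units with many spikes, which is why differences in spike counts across subtypes call for caution when comparing raw MVL values.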

      2) The authors utilize the neurons' weight in the first PC to assign them to odour-related assemblies. This method convincingly extracts an assembly for each odour (when odours are used individually), and these seem to be virtually non-overlapping. It would be informative to test whether a similar clear separation of the individual assemblies could be achieved by running the analysis on all odours simultaneously, perhaps by employing a procedure of assembly extraction that allows to deal with overlapping assembly membership better than a pure PCA approach (as used for instance in the work cited on page 11, including the authors' previous work)? I do not doubt the validity of the authors' approach here at all, but the suggested additional analysis might allow the authors to increase their confidence that individual neurons contribute mostly to an assembly related to a single odour.

      We thank the reviewer for the pertinent comment. In order to address it, we ran the ICA-based approach to detect cell assemblies (Lopes-dos-Santos et al., 2013) using the spike time series of all odors concatenated. The concatenation included time windows around the gamma peak (100-400 ms after inhalation start). We chose this window to prevent the ICA from picking temporal features of the response as different ICs instead of the spiking variations caused by the different odors. As a reference, we also calculated ICA for each odor independently during the gamma peak.

      We found that the results obtained from ICA computed using concatenated data from all odors show important resemblances to those from the single ICA per odor approach. For instance, we get similar sparsity and cell assembly membership (Figure 6-figure supplement 1A), orthogonality (Figure 6-figure supplement 1B), and odor specificity (Figure 6-figure supplement 1C) in the IC loadings through both approaches. Notably, the average absolute IC correlation between the six odors (computed separately) and the first six ICs (computed from the combined odor responses) was similar across animals and showed no significant differences (Figure 6-figure supplement 1C).

      We also directly tested odor selectivity and separation in the concatenated data approach by computing each odor’s mean assembly activity (i.e., “IC projection”). Regarding the former, we found that most assemblies coded for 1 or 2 odors (Figure 6-figure supplement 1D). Regarding the diversity of representations for the sampled neurons, we assessed odor separation by examining to which odor each IC is activated the most. Under this framework, we get that, on average, the first 6 ICs encode three to five different odors (Figure 6-figure supplement 1E).

      We have included this result as a new Figure 6-figure supplement 1 and mention it on page 8. Of note, we have also performed all of our previous assembly analyses (i.e., Figure 6) using ICA instead of PCA to be consistent throughout the manuscript and allow the reader to compare with the new supplementary figure. This led to a new and enhanced version of Figure 6.

      3) Do the authors observe a slow drift in assembly membership as predicted from previous work showing slowly changing odour responses of principal neurons (Schoonover et al., 2021)? This could perhaps be quantified by looking at the expression strengths of assemblies at individual odour presentations or by running the PCA separately on the first and last third of the odour presentations to test whether the same neurons are still 'winners'.

      We thank the reviewer for calling our attention to this point. We note, however, that the representation drift observed by Schoonover et al. occurred over several days of recordings, i.e., at a much slower time scale than the single-day recordings we analyzed here (of note, Schoonover et al. observed no drift within the same day [their Fig 2a]). But irrespective of this, we believe that the data at hand do not allow for a confident analysis of possible drifts. This is because each odor was only presented ~12 times; so, further subdividing the data into subsets of only 4 trials would not render a reliable analysis, unfortunately.

      4) Does the winner-take-all scenario involve the recruitment of specific sets of FBIs during the activation of the individual odour-selective assemblies? The authors could address this by testing whether the rate of FBIs changes differently with the activation of the extracted assemblies.

      Within each recording session, the number of recorded FBIs is very low, on average 3.6 FBIs per recording session. Thus, unfortunately, such an analysis, although interesting, cannot be confidently performed.

      5) Given the dependence on local gamma oscillations, one might expect that odour-selective assemblies do not emerge in the TeLC-expressing hemisphere. This could be directly tested in the existing data set.

      We are thankful for the comment. We followed the reviewer's suggestion and analyzed odor assemblies in TeLC recordings, comparing the ipsilateral hemisphere (infected) with the contralateral one. Interestingly, we find an increased correlation of assembly weights across odors, suggesting that the formation/segregation of odor-selective assemblies is hindered when the principal cell synapses are abolished. This assembly selectivity reduction co-occurred as the number of losing neurons decreased, and the inhibition of the latter was also reduced. Consequently, decoding accuracy significantly decreased during the 150-250 ms window in the infected TeLC hemisphere compared to the contralateral cortex.

      Therefore, we believe these new results support the role of gamma oscillations in segregating cell assemblies and generating a sparse orthogonal odor representation. These findings are now included as new panels of Figure 6 and Figure 7 and discussed on page 8.

    1. Author Response

      Reviewer #1 (Public Review):

      This well-done platform trial identifies that ivermectin has no impact on SARS-CoV-2 viral clearance rate relative to no study drug while casirivimab led to more rapid clearance at 5 days. The figures are simple and appealing. The study design is appropriate and the analysis is sound. The conclusions are generally well supported by the analysis. Study novelty is somewhat limited by the fact that ivermectin has already been definitively assessed and is known to lack efficacy against SARS-CoV-2. Several issues warrant addressing:

      1) Use of viral load clearance is not unique to this study and was part of multiple key trials studying paxlovid, remdesivir, molnupiravir, and monoclonal antibodies. The authors neglect to describe a substantial literature on viral load surrogate endpoints of therapeutic efficacy which exist for HIV, hepatitis B and C, Ebola, HSV-2, and CMV. For SARS-CoV-2, the story is more complicated as several drugs with proven efficacy were associated with a decrease in nasal viral loads whereas a trial of early remdesivir showed no reduction in viral load despite a 90% reduction in hospitalization. In addition, viral load kinetics have not been formally identified as a true surrogate endpoint. For maximal value, a reduction in viral load would be linked with a reduction in a hard clinical endpoint in the study (reduction in hospitalization and/or death, decreased symptom duration, etc...). This literature should be discussed and data on the secondary outcome, and reduction in hospitalization should be included to see if there is any relationship between viral load reduction and clinical outcomes.

      This is an important point and we thank the reviewer for raising it. We agree that there is a rich literature on the use of viral load kinetics in optimizing treatment of viral infectious diseases, and we are clearly not the first to think of it! We have added the following sentence in the discussion.

      “The method of assessing antiviral activity in early COVID-19 reported here builds on extensive experience of antiviral pharmacodynamic assessments in other viral infections.”

      We agree that more information is needed to link viral clearance measures to clinical outcomes. We have addressed this in the discussion as follows:

      “Using less frequent nasopharyngeal sampling in larger numbers of patients, clinical trials of monoclonal antibodies, molnupiravir and ritonavir-boosted nirmatrelvir, have each shown that accelerated viral clearance is associated with improved clinical outcomes [1,4,5]. These data suggest reduction in viral load could be used as a surrogate of clinical outcome in COVID-19. In contrast the PINETREE study, which showed that remdesivir significantly reduced disease progression in COVID-19, did not find an association between viral clearance and therapeutic benefit. This seemed to refute the usefulness of viral clearance rates as a surrogate for rates of clinical recovery [16]. However, the infrequent sampling in all these studies substantially reduced the precision of the viral clearance estimates (and thus increased the risk of type 2 errors). Using the frequent sampling employed in the PLATCOV study, we have shown recently that remdesivir does accelerate SARS-CoV-2 viral clearance [17], as would be expected from an efficacious antiviral drug. This is consistent with therapeutic responses in other viral infections [18, 19]. Taken together the weight of evidence suggests that accelerated viral clearance does reflect therapeutic efficacy in early COVID-19, although more information will be required to characterize this relationship adequately.”

      2) The statement that oropharyngeal swabs are much better tolerated than nasal swabs is subjective. More detail needs to be paid to the relative yield of these approaches.

      The statement is empirical. We know of other studies in progress where there are high rates of discontinuation because of patient intolerance of repeated nasopharyngeal sampling. Not one of 750 patients enrolled to date in PLATCOV has refused sampling, which we believe is useful information for research involving multiple sampling. This is clearly a critical point for pharmacodynamic studies.

      We agree that the optimal site of swabbing for SARS-CoV-2 and relative yields for the given test requirements (sensitivity vs quantification) need to be considered, although the literature on this is large and sometimes contradictory.

      We have added the following line:

      Oropharyngeal viral loads have been shown to be both more and less sensitive for the detection of SARS-CoV-2 infection. Although rates of clearance are very likely to be similar from the two body sites, this should be established for comparison with other studies.

      3) The stopping rules as they relate to previously modeled serial viral loads are not described in sufficient detail.

      The initial stopping rules were chosen based on previously modelled data (reference 11). We have added details to the text (lines 199-219):

      “Under the linear model, for each intervention, the treatment effect β is encoded as a multiplicative term on the time since randomisation: exp(βT), where T=1 if the patient was assigned the intervention, and zero otherwise. Under this specification β=0 implies no effect (no change in slope), and β>0 implies an increase in slope relative to the population mean slope. Stopping rules are then defined with respect to the posterior distribution of β, with futility defined as Prob[β<λ]>0.9; and success defined as Prob[β>λ]>0.9, where λ≥0. Larger values of λ imply a smaller sample size to stop for futility but a larger sample size to stop for efficacy. λ was chosen so that it would result in reasonable sample size requirements, as was determined using a simulation approach based on previously modelled serial viral load data [11]. This modelling work suggested that a value of λ=log(1.05) [i.e. 5% increase] would require approximately 50 patients to demonstrate increases in the rate of viral clearance of ~50%, with control of both type 1 and type 2 errors at 10%. The first interim analysis (n=50) was prespecified as unblinded in order to review the methodology and the stopping rules (notably the value of λ). Following this, the stopping threshold was increased from 5% to 12.5% [λ=log(1.125)] because the treatment effect of casirivimab/imdevimab against the SARS-CoV-2 Delta variant was larger than expected and the estimated residual error was greater than previously estimated. Thereafter trial investigators were blinded to the virus clearance results. Interim analyses were planned for every additional batch of 25 patients’ PCR data; however, because of delays in setting up the PCR analysis pipeline, the second interim analysis was delayed until April 2022. By that time data from 145 patients were available (29 patients randomised to ivermectin and 26 patients randomised to no study drug).”
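
      The stopping rule described above can be sketched as follows, assuming posterior draws of β are already available from the fitted model. The function names and the toy draws are hypothetical; only the 0.9 probability threshold and the two values of λ come from the text.

```python
import math


def stopping_decision(beta_samples, lam, threshold=0.9):
    """Apply the futility/success rule to posterior draws of beta.

    futility: Prob[beta < lam] > threshold
    success:  Prob[beta > lam] > threshold
    Probabilities are approximated by the fraction of posterior draws.
    """
    n = len(beta_samples)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    p_below = sum(b < lam for b in beta_samples) / n
    p_above = sum(b > lam for b in beta_samples) / n
    if p_below > threshold:
        return "futility"
    if p_above > threshold:
        return "success"
    return "continue"


def percent_change(beta):
    """Convert beta (log scale) to a percent change in clearance slope."""
    return 100.0 * (math.exp(beta) - 1.0)


lam_initial = math.log(1.05)   # 5% increase
lam_revised = math.log(1.125)  # 12.5% increase
```

      With toy draws concentrated near zero the rule returns "futility", with draws concentrated well above λ it returns "success", and otherwise enrolment continues. Raising λ from log(1.05) to log(1.125) makes futility easier to declare and success harder, consistent with the trade-off described in the quoted text.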

      4) The lack of blinding limits any analysis of symptomatic outcomes.

      We added this line to the discussion:

      “Finally, although not primarily a safety study, the lack of blinding compromises safety or tolerability assessments.”

      5) It is unclear whether all 4 swabs from 2 tonsils are aggregated. Are the swabs placed in a single tube and analyzed?

      The data are not aggregated but treated as independent and identically distributed under the linear model. Four swabs were taken at randomization, followed by two at each follow-up visit. We have added line 183:

      “[..] (18 measurements per patient, each swab is treated as independent and identically distributed conditional on the model).”

      Swabs were stored separately and not aggregated.

      6) In supplementary Figure 7, both models do well in most circumstances but fail in the relatively common event of non-monotonic viral kinetics (multiple peaks, rebound events). Given the importance of viral rebound during paxlovid use, an exploratory secondary analysis of this outcome would be welcome.

      Thank you for the suggestion. We agree, although the primary goal is to estimate the mean change in slope. Rebound is a relatively rare event and tends to occur after the first seven days of illness in which we are assessing rate of clearance.

      Nevertheless, we agree that this is an important point. It remains unclear how to model viral rebound. In over 700 profiles now available from the study, only a few have strong evidence of viral rebound.

      Reviewer #2 (Public Review):

      This manuscript details the analytic methods and results of one arm of the PLATCOV study, an adaptive platform designed to evaluate low-cost COVID-19 therapeutics through enrollment of a comparatively smaller number of persons with acute COVID-19, with the goal of evaluating the rate of decrease in SARS-CoV-2 clearance compared to no treatment through frequent swabbing of the oropharynx and a Bayesian linear regression model, rather than clinical outcomes or the more routinely evaluated blunt virologic outcomes employed in larger trials. Presented here, is the in vivo virologic analysis of ivermectin, with a very small sample of participants who received the casirivimab/imdevimab, a drug shown to be highly effective at preventing COVID-19 progression and improving viral clearance (during circulation of variants to which it had activity) included for comparison for model evaluation.

      The manuscript is well-written and clear. It could benefit however from adding a few clarifications on methods and results to further strengthen the discussion of the model and accurately report the results, as detailed below.

      Strengths of this study design and its report include:

      1) Selection of participants with presumptive high viral loads or viral burden by antigen test, as prior studies have shown difficulty in detecting effect in those with a lower viral burden.

      2) Adaptive sample size based on modeling- something that fell short in other studies based on changing actuals compared to assumptions, depending on circulating variant and "risk" of patients (comorbidities, vaccine state, etc) over time. There have been many other negative studies because the a priori outcomes assumptions were different from the study design to the time of enrollment (or during the enrollment period). This highlight of the trial should be emphasized more fully in the discussion.

      3) Higher dose and longer course of ivermectin than the TOGETHER trial and many other global trials: 600 µg/kg/day vs 400 µg/kg/day.

      4) Admission of trial participants for frequent oropharyngeal swabbing vs infrequent sampling and blunter analysis methods used in most reported clinical trials

      5) Linear mixed modeling allows for heterogeneity in participants and study sites, especially taking the number of vaccine doses, variant, age, and serostatus into account- all important variables that are not considered in more basic analyses.

      6) The novel outcome being the change in the rate of viral clearance, rather than time to the undetectable or unquantifiable virus, which is sensitive, despite a smaller sample size

      7) Discussion highlights the importance of frequent oral sampling and use of this modeled outcome for the design of both future COVID-19 studies and other respiratory viral studies, acknowledging that there are no accepted standards for measuring virologic or symptom outcomes, and many studies have failed to demonstrate such effects despite succeeding at preventing progression to severe clinical outcomes such as hospitalization or death. This study design and analyses are highly important for the design of future studies of respiratory viral infections or possibly early-phase hepatitis virus infections.

      Weaknesses or room for improvement:

      1) The methods do not clearly describe allocation to either ivermectin or casirivimab/imdevimab or both or neither. Yes, the full protocol is included, but the platform randomization could be briefly described more clearly in the methods section.

      We have added additional text to the Methods:

      “The no study drug arm comprised a minimum proportion of 20% and uniform randomization ratios were then applied across the treatment arms. For example, for 5 intervention arms and the no study drug arm, 20% of patients would be randomized to no study drug and 16% to each of the 5 interventions. Additional details on the randomization are provided in the Supplementary Materials. All patients received standard symptomatic treatment.”
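
      The allocation arithmetic in the example above can be sketched as below. The function name is hypothetical, and the behaviour when the uniform share already exceeds 20% is an assumption based on reading "minimum proportion" literally; the protocol should be consulted for the exact rule.

```python
def allocation_probabilities(n_interventions, min_control=0.20):
    """Randomisation probabilities for a platform with one control arm.

    Assumption: the no-study-drug arm receives the larger of `min_control`
    and the uniform share 1 / (n_interventions + 1); the remainder is split
    equally across the intervention arms.
    """
    if n_interventions < 1:
        raise ValueError("need at least one intervention arm")
    uniform_share = 1.0 / (n_interventions + 1)
    control = max(min_control, uniform_share)
    per_intervention = (1.0 - control) / n_interventions
    return {"no_study_drug": control, "per_intervention": per_intervention}


five_arms = allocation_probabilities(5)
```

      For five intervention arms this reproduces the figures quoted above: 20% to no study drug and 16% to each intervention.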

      2) The handling of unquantifiable or undetectable viruses in the models is not clear in either the manuscript or supplemental statistical analysis information. Are these values imputed, or is data censored once below the limits of quantification or detection? How does the model handle censored data, if applicable?

      We have added lines 185-186:

      “Viral loads below the lower limit of quantification (CT values ≥40) were treated as left-censored under the model with a known censoring value.”
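
      To illustrate what left-censoring with a known censoring value means for the likelihood, here is a minimal sketch. It assumes Gaussian residuals on the log viral load scale purely for simplicity (the error model actually used may differ), and all names and numbers are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mu, sigma):
    """Log of P(Y <= x) for Y ~ Normal(mu, sigma)."""
    z = (x - mu) / sigma
    # Floor avoids log(0) when the predicted value is far above the limit.
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def censored_loglik(observed, predicted, censored, sigma):
    """Log-likelihood with left-censored observations.

    observed:  measured log viral loads; for censored entries this holds the
               known censoring value (the lower limit of quantification).
    predicted: model-predicted log viral loads at the same time points.
    censored:  True where the measurement was below the limit.
    """
    total = 0.0
    for y, mu, is_censored in zip(observed, predicted, censored):
        if is_censored:
            total += normal_logcdf(y, mu, sigma)  # P(true value <= limit)
        else:
            total += normal_logpdf(y, mu, sigma)
    return total
```

      A censored swab therefore contributes the probability that the true value lies at or below the limit, rather than being dropped or imputed at a fixed number; when the model already predicts a value far below the limit, that contribution is close to zero on the log scale.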

      3) Did the study need to be unblinded prior to the first interim analysis? Could the adaptive design with the first analysis have been done with only one or a subset of statisticians unblinded prior to the decision to stop enrolling in the ivermectin arm?

      The unblinded interim analysis was done on the first 50 patients enrolled in the study. The study at that time was enrolling into five arms including ivermectin, casirivimab-imdevimab, remdesivir, favipiravir, and a no study drug arm (there were exactly 10 per arm as a result of the block randomization).

      The main rationale for making this interim analysis unblinded was to determine the most reasonable value of λ (this defines stopping for futility/success), which is a trade-off between information gain, reasonable sample size expectations, and the balance between quickly identifying interventions which have antiviral activity versus the certainty of stopping for futility.

      Once the value of 12.5% was decided, the trial investigators remained blinded to the results until the stopping rules were met and the unblinded statistician discussed with the independent Data Safety and Management Board who agreed to unblind the ivermectin arm.

      4) Can the authors comment on why the interim analysis occurred prior to the enrollment of 50 persons in each of the ivermectin and comparison arms? Even though the sample sizes were close (41 and 45 persons), the trigger for interim analysis was pre-specified.

      After the first interim analysis, performed once 50 patients had been enrolled into the study, further interim analyses were planned after every additional 25 patients (i.e. very frequently). The trigger for the first interim analysis was not 50 patients in a specific arm but 50 patients in total. In practice there were backlogs in the data pipeline (which we explain), and interim analyses occurred less frequently than planned, the second one being in April 2022.

      5) The reporting of percent change for the intervention arms is overstated. All credible intervals cross zero: the clearance for ivermectin is stated to be 9% slower, but the CI includes + and - %, so it should be reported as "not different." Similarly, and more importantly for casirivimab/imdevimab, it was reported to be 52% faster, although the CI is -7.0 to +115%. This is likely a real difference, but with ten participants underpowered- and this is good to discuss. Instead, please report that the estimate was faster, but that it was not statistically significant. Similarly, the clearance half-life for ivermectin is not different, rather than "slower" as reported (CI was -2 to +6.6 hours). This result was however statistically significant for casirivimab/imdevimab.

      Thank you for your comments. The confidence interval for casirivimab/imdevimab did not cross zero and was +7.0 to +115.1%, and we thank the reviewer for picking up the error in the results section (it was correct in the abstract) where it was written -7.0 to +115.1%. We have made this correction. Elsewhere, we have provided more precise language to discriminate clinical significance from statistical significance, as per the essential revisions.

      6) While the use of oropharyngeal swabs is relatively novel for a clinical trial, and they have been validated for diagnostic purposes, the results of this study should discuss external validity, especially with respect to results from other studies that mainly use nasopharyngeal or nasal swab results. For example, oropharyngeal viral loads have been variably shown to be more sensitive for the detection of infection, or conversely to have 1-log lower viral loads compared to NP swabs. Because these models look for longitudinal change within a single sampling technique, they do not impact internal validity but may impact comparisons to other studies or future study designs.

      We have added the following sentence to the discussion:

      “Oropharyngeal viral loads have been shown to be both more and less sensitive for the detection of SARS-CoV-2 infection. Although rates of viral clearance are very likely to be similar from the two sites, this should be established for comparison with other studies.”

      7) Caution should be used around the term "clinically significant" for viral clearance. There is not an agreed-upon rate of clinically significant clearance, nor is there a log10 threshold that is agreed to be non-transmissible despite moderately strong correlations with the ability to culture virus or with antigen results at particular thresholds.

      We agree. We have addressed this partly in our response to Reviewer 1.

      8) Additional discussion could also clarify that certain drugs, such as remdesivir, have shown in vivo activity in the lungs of animal models and improvement in clinical outcomes in people, but without change in viral endpoints in nasopharyngeal samples (PINETREE study, Gottlieb, NEJM 2022). Therefore, this model must be interpreted as no evidence of antiviral activity in the pharyngeal compartment, rather than a complete lack of in vivo activity of agents given the limitations of accessible and feasible sampling. That said, strongly agree with the authors about the conclusion that ivermectin is also likely to lack activity in humans based on the results of this study and many other clinical studies combined.

      As above this has been addressed in our response to Reviewer 1.

      Reviewer #3 (Public Review):

      This is a well-conducted phase 2 randomized trial testing outpatient therapeutics for Covid-19. In this report of the platform trial, they test ivermectin, demonstrating no virologic effect in humans with Covid-19.

      Overall, the authors' conclusions are supported by the data.

      The major contribution is their implementation of a new model for Phase 2 trial design. Such designs would have been ideal earlier in the pandemic.

      We thank the reviewer for their encouraging comments.

    1. Author Response

      Reviewer #1 (Public Review):

      Bornstein and colleagues address an important question regarding the molecular makeup of the different cellular compartments contributing to the muscle spindle. While work focusing on single components of the spindle in isolation - proprioceptors, gamma-motor neurons, and intrafusal muscle fibres - have been recently published, a comprehensive analysis of the transcriptome and proteome of the spindle was missing and it fills an important gap considering how local translation and protein synthesis can affect the development and function of such a specialised organ.

      The authors combine bulk transcriptome and proteome analysis and identify new markers for neuronal, intrafusal, and capsule compartments that are validated in vivo and are shown to be useful for studying aspects of spindle differentiation during development. The methodology is sound and the conclusions in line with the results.

      We thank the reviewer for highlighting the importance of our study.

      I feel a bit more analysis regarding the specificity and developmental expression profiles of the identified markers would be a great addition. In particular:

      • Are any of the proprioceptive sensory neurons markers specific for fibres innervating the muscle spindles or also found in Golgi tendon organs?

      We thank the reviewer for this important question, which prompted two additional analyses. First, to assess the specificity of the spindle afferent genes we identified, we examined the overlap between our list of 260 potential proprioceptive neuron genes and markers for the three proprioceptive neuron subtypes (Ia, II and Ib) identified by Wu and colleagues (Wu et al. 2021). As shown in the newly added Figure 1-figure supplement 2F, while we found many genes that are common to all subtypes, 69 genes exclusively overlapped with subtype markers (22 genes with type Ia neurons, 45 genes with type II neurons and 2 genes with both; lists are shown in Supplementary File 4). These results suggest that the 69 genes are expressed by muscle spindle afferents and not by GTO afferents.
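      As an illustration only, the overlap analysis described above can be sketched with plain set operations. This is one possible reading of the procedure, not the authors' actual pipeline: the function name, the toy gene identifiers and the rule of discarding anything that is also a type Ib (GTO afferent) marker are all assumptions made for the example.

```python
def classify_spindle_candidates(candidates, ia, ii, ib):
    """Split candidate genes by the spindle-afferent marker lists they overlap.

    Hypothetical helper: genes that are also type Ib (GTO afferent) markers
    are excluded before classification.
    """
    candidates, ia, ii, ib = map(set, (candidates, ia, ii, ib))
    spindle_markers = (ia | ii) - ib      # Ia/II markers that are not Ib markers
    hits = candidates & spindle_markers   # candidates supported by those markers
    return {
        "Ia_only": hits & (ia - ii),
        "II_only": hits & (ii - ia),
        "Ia_and_II": hits & ia & ii,
    }


# Toy identifiers, not real gene names.
groups = classify_spindle_candidates(
    candidates={"g1", "g2", "g3", "g4", "g5"},
    ia={"g1", "g3", "g5"},
    ii={"g2", "g3"},
    ib={"g5", "g9"},
)
total_spindle_specific = sum(len(genes) for genes in groups.values())
```

      With real marker lists, the three group sizes would correspond to the 22, 45 and 2 genes reported above, if the same exclusion rule were used.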

      Second, to study the specificity of our validated markers, we examined the expression in GTOs of ATP1a3, VCAN and GLUT1, which mark proprioceptive neurons, extracellular matrix and outer capsule, respectively. Results showed that all three markers were also detected in the different tissues composing the GTOs (newly added Figure 3-figure supplement 3, below). As ATP1a3 is not in the list of 69 unique markers, this analysis verified that it is expressed by all proprioceptive neurons. The expression of both VCAN and GLUT1 in GTO capsules highlights the similarity between the capsules of the two proprioceptors.

      • On the same line are any of the gamma motor neurons markers found also in alpha?

      We thank the reviewer for raising this issue. Following the reviewer’s question, we conducted a detailed analysis of the expression of potential γ motor neuron genes. To this end, we first generated a list of α motor neuron genes in our data by performing ranked GSEA using published expression profiles of these neurons (Blum et al., 2021). Then, we compared the three lists of neuronal genes, i.e. γ motor neurons, α motor neurons and proprioceptive neurons (newly added Figure 1-figure supplement 2G), and found an overlap between the three lists. Nonetheless, we also identified 40 spindle genes that are specific to γ motor neurons (Figure 1-figure supplement 2G and Supplementary File 4) and, therefore, are potential markers for these neurons.

      • How early expression of ATP1A3 is found in neurons at the spindle or fibres starting to innervating the muscle? A couple of late embryonic timepoints would be great.

      We thank the reviewer for this suggestion. We performed late embryonic (E15.5-E17.5) staining for ATP1a3, which showed its expression as early as E15.5 (new Figure 4 – figure supplement 1).

      • Given that the approach used allows to obtain insights on whether local translation plays a major role into the differentiation of the spindle it would be interesting to assess whether the proprioceptor and gamma motor neuron markers identified are also found in the cell body or exclusively at the spindle.

      The reviewer raises an interesting question about local translation of the neuronal genes. Several lines of evidence in the literature indicate that genes expressed at the neuronal endings are also expressed in the neuron soma. In a study of the retinal ganglion cell translatome, Holt and colleagues found that the axonal translatome is a subset of the significantly larger somal translatome (Shigeoka et al., Cell, 2016). Similarly, a study by Schuman and colleagues that compared the translatomes of neuronal cell bodies, dendrites, and axons of rat hippocampal neurons showed that many common genes are translated, albeit at different levels (Glock et al., PNAS, 2021). Finally, following the reviewer’s suggestion, we studied the expression of ATP1a3 in the DRG, and found it to be expressed there as well (Figure L1). Thus, we predict that the markers we found at the neuronal endings are likely also expressed in the soma. While this issue is very interesting, we believe that further validation of our assumption exceeds the scope of this study.

      Figure L1. ATP1a3 expression in the DRG. Confocal images of DRG sections from adult PValb-Cre;tdTomato mice stained for ATP1a3 (magenta). Scale bars represent 50 μm.

      Altogether, this is a novel and important work that will benefit scientists studying the neuromuscular and musculoskeletal systems by pushing the field toward an holistic understanding of the muscle spindle. These datasets in combination with the previous ones can be used to develop new genetic and viral strategies to study muscle spindle development and function in healthy and pathological states by analysing the roles and relative contributions of different components of this fascinating and still mysterious organ.

      We thank again the reviewer for highlighting the importance of our study.

      Reviewer #2 (Public Review):

      The data presented are of high quality. Through complementary experiments involving the isolation of masseter muscle spindles, the authors perform RNA-seq and proteomic analysis, and identify genes and proteins that are differentially expressed in the muscle spindle versus the adjacent muscle fiber, and proteins that accumulate specifically in capsule cells and nerve endings. These data, while essentially descriptive, provide important information about the developmental framework of the sensory apparatus present in each muscle that accounts for its tension/contraction state. The data presented thus allow for a better characterization of muscle spindles and provide the community with a set of new markers for better identification of these structures. Analysis of the expression pattern of the Tomato reporter in transgenic animals under the control of Piezo2-CRE, Gli1-CRE and Thy1-YFP reporter reinforces the findings and the specificity of the expression pattern of the specific genes and proteins identified by the multi-omics approach and further validated by immunohistochemistry.

      We thank the reviewer for the positive and encouraging feedback.

    1. Author Response

      Reviewer #2 (Public Review):

      1) The manuscript assumes an understanding of both economic terminology and statistical approaches that will not be familiar to most of the audience, if I am a representative example. This begins in the abstract, much of which I found incomprehensible. I still am not sure about the definition of "nominal costs ", and I certainly have no idea what they mean by a "wholly non-parametric machine learning regression". This continues throughout-presenting much of the data as Log10-transformed costs means that many of the graphs become impossible for a normal mortal like me to interpret.

      We agree with the reviewer. We provide definitions of terms in the Introduction (lines 29-41) and explain the regression methods in greater detail in the text (lines 173-182) and appendix (Tables 1 and 2).

      2) The version presented is written like some early outline draft. Rather than using narrative to guide the reader through the data, it reads like a series of Figure legends. For example, I literally thought the text on page 4 were the Figure legends, but they are not. "Figure 2 shows...." "Table 1 shows...". The Discussion is similarly difficult to follow. Given the complexity and importance of the data they present, this is a major missed opportunity.

      We agree with the reviewer. We have extensively rewritten the text as recommended by the reviewer.

      3) What will most interest my own part of the NIH-community is the assertion that "real dollar adjusted" grant funding has not decreased, but has instead remained flat. Few people I know will believe this. The authors address in a less-than-clear fashion some of the reasons for this-solicited versus non-solicited awards, clinical trials, etc, but do not dig into their own data to identify what are likely to be other issues. I doubt any one of the 20+ NIH-funded researchers in my Department (predominantly NIGMS funded) has a grant that reaches the "median level"-I do not after 32 years of continuous NIH-funding. Most new NIGMS-funded researchers, including many in my Department, are coming in funded by MIRA grants, which at $250K are half the median grant size. They do spend a few moments on disparities in Figure 7, but much more could be pulled out of this data set. Digging into issues like this-distributions in different NIH Institutes, at different career levels, etc, would make this work much more impactful.

      We agree with the reviewer. We provide additional data on R01-equivalent awards (as previously noted) and on the $250K and $500K nominal values. See new Tables 2 and 4. We acknowledge that our analysis is based on NIH as an agency, not on individual Institutes and Centers (lines 259-260).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors devised a new mRNA imaging approach, MASS, and showed that it can be applied to investigate the activation of gene expression and the dynamics of endogenous mRNAs in the epidermis of live C. elegans. The approach is potentially useful, but this manuscript will benefit by addressing the following questions:

      We thank the reviewer for spending time reviewing our manuscript and for the insightful comments.

      Major comments:

      1) In Figure 1-figure supplement 1, the authors claimed that MASS could verify the lamellipodia-localization of beta-actin mRNAs. However, the image showed the opposite of the authors' claim as the concentration of beta-actin mRNA was lower in lamellipodia than the rest of the cytosol. This result disagreed with ref. 17 (Katz, Z.B. et al., Genes and Development, 2012). Hence, the authors cannot make the statement that "MASS can be readily used to image RNA molecules in live cells without affecting RNA subcellular localization". To thoroughly test this notion, the authors should image beta-actin mRNA using MASS and the conventional MS2 system side by side and calculate the polarization index in the same way as shown in Katz, Z.B. et al., Genes and Development, 2012.

      We noticed that b-ACTIN mRNAs were less polarized in our image than in that shown in Katz, Z.B. et al. (Genes and Development, 2012). This is likely due to the different cell lines used: the previous study used mouse embryonic fibroblasts (MEFs), whereas our initial experiment used HeLa cells. Our data showed that b-ACTIN mRNAs labeled with MASS could be localized to the lamellipodia.

      As suggested by the reviewer, we performed new experiments to image b-ACTIN mRNAs using MASS and the conventional MS2 system side by side in NIH3T3 cells, a mouse fibroblast cell line (MEF cells are not available in our lab). We did not find cells with extensively polarized b-ACTIN mRNA localization, potentially because of the different cell line. We, therefore, did not calculate the polarization index. However, we found that b-ACTIN mRNAs detected by both methods showed a similar localization pattern. These new data suggest that MASS does not affect RNA subcellular localization. We added the new results and updated Figure 1-figure supplement 3.

      2) The experiments that validate this new RNA imaging method are not sufficient. The authors need to systematically compare MASS and the MS2 system, including their RNA signal intensity, signal-to-background ratio.

      We have systematically compared MASS and the conventional MS2 system, including signal intensity and signal-to-noise ratio, and measured the velocities of mRNA movement. We found that MASS showed a signal-to-noise ratio similar to, and a signal intensity higher than, those of the conventional MS2 system. We have now revised the information in the text on pages 4 and 5, and in Figure 1-figure supplement 4, 5, and 6.

      3) In line with this, does beta-actin mRNA display the same behavior as in (Figure 1C-F) when the mRNA was imaged with the MS2 system? The movies do not indicate the type of motility expected of mRNA. For instance, it seems that almost all of the GFP dots, which are presumably single beta-actin mRNAs, stayed stationary over a time course of tens of seconds (Movie 1). This seems to be very different from what has been observed before. It's not clear that the dots are real mRNAs molecules. This further stresses the importance for them to compare their new imaging system with the conventional MS2 application.

      We noticed that the mobility of b-ACTIN mRNAs varies between cells. It is possible that the mobility of mRNAs is regulated in a cell context-dependent manner.

      To confirm that the GFP foci detected are real mRNA molecules, we performed MASS combined with single-molecule RNA FISH. We found that the number of GFP foci detected by MASS was similar to the number of spots detected by smFISH. In addition, the majority (72%) of GFP foci colocalized with the smFISH spots of b-ACTIN-8xMS2 mRNAs. It has been reported that not all MS2 stem-loops are bound by MCP (Wu et al., Biophysical journal 2012). As only 8xMS2 was used in MASS, it is likely that some mRNAs were not entirely bound by MCP and were not detected. On the other hand, only sixteen probes were used in the smFISH experiment, and it is possible that some mRNAs were not labeled by smFISH. Therefore, 100% colocalization of MASS foci with the smFISH spots was hard to achieve. Taken together, these results suggest that the GFP dots are real mRNA molecules. We have added the new data in Figure 1, Figure 1-figure supplement 1, and the text on page 3.
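      For readers who want to reproduce this kind of comparison, a colocalization fraction can be computed from two lists of spot coordinates. The sketch below is a generic nearest-neighbor approach and not necessarily the analysis used in the study: the function name, the coordinates and the distance threshold are hypothetical.

```python
import math


def colocalized_fraction(foci, spots, max_dist):
    """Fraction of foci that have at least one spot within max_dist.

    foci and spots are lists of (x, y) coordinates in the same units as
    max_dist. The threshold is an assumption chosen by the analyst.
    """
    if not foci:
        return 0.0
    hits = sum(
        1 for f in foci
        if any(math.dist(f, s) <= max_dist for s in spots)
    )
    return hits / len(foci)


# Made-up coordinates: three of the four GFP foci have a nearby smFISH spot.
gfp_foci = [(0, 0), (10, 10), (20, 20), (30, 30)]
smfish_spots = [(0.5, 0), (10, 10.5), (20.2, 20.1), (100, 100)]
fraction = colocalized_fraction(gfp_foci, smfish_spots, max_dist=1.0)
```

      Note that the measure is asymmetric: swapping the two lists gives the fraction of smFISH spots that have a nearby GFP focus, which need not be the same number.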

      We measured the velocity of b-ACTIN mRNA movement tracked by MASS and by the conventional MS2 system. We added this information in Figure 1-figure supplement 5 and to the text on pages 4 and 5. With the conventional MS2 system, we observed behavior similar to that observed with MASS.

      4) The authors claimed that a major advantage of MASS is that it has only 8xMS2 stemloops (350 nt) and overcomes "the previous obstacle of the requirement of inserting a long 1,300 nt 24xMS2". This statement lacks experimental support in this manuscript. The authors need to quantitatively compare the genomic tagging efficiency of 8xMS2 and 24xMS2.

      Several studies have reported that knock-in efficiency decreases dramatically with increasing insert size. For example:

      ~10-fold decrease of knockin frequency with a 1085 bp compared to a 767 bp insertion of DNA fragment (Extended Data Fig.8. Wang, J. et al. Nature methods, 2022).

      ~30-fold decrease of knockin frequency with an 1122 bp compared to a 714 bp insertion of DNA fragment (Figure 3 and Table S1. Paix, A. et al. PNAS, 2017).

      In this study, we did not directly examine the knock-in efficiency of 8xMS2 and 24xMS2. Based on published data from other laboratories, we assumed that the efficiency of the knock-in of 8xMS2 (350 nt) would be higher than that of 24xMS2 (~1300 nt).

      5) MASS has the same strategy as SunRISER (Guo, Y. & Lee, R.E.C., Cell Reports Methods, 2022). Both methods use Suntag to amplify signals of MS2- or PP7-tagged RNA. The authors need to elaborate the discussions and describe the similarities and differences of the two studies. In particular, the Guo paper needs to be properly referenced.

      We have cited the paper and discussed the similarities and differences between our method and SunRISER (page 7). Taken together, the study by Guo and Lee and our study demonstrate that combining the MS2 system with the Suntag system as a signal amplifier is an efficient strategy for long-term imaging of endogenous mRNAs in live cells.

      6) In Guo, Y. & Lee, R.E.C., Cell Reports Methods, 2022, they showed that 8XPP7 with 24XSunTag configuration led to fewer mRNA per cell (Figure 5B of the Cell Reports Methods paper). Does MASS, which has 8xMS2 with 24xSunTag, similarly lead to few mRNAs? The authors should compare the number of mRNAs detected by MASS and the conventional MS2, or by FISH.

      We compared the number of mRNAs detected by MASS and by smFISH. We performed MASS combined with single-molecule RNA FISH and found that MASS detected a similar number of GFP foci compared to the spots detected by smFISH.

      In addition, the majority (72%) of GFP foci colocalized with the smFISH spots of b-ACTIN-8xMS2 mRNAs. It has been reported that not all MS2 stem-loops are bound by MCP. As only 8xMS2 was used in MASS, it is likely that some mRNAs were not entirely bound by MCP and were not detected. On the other hand, only sixteen probes were used in the smFISH experiment, and it is possible that some mRNAs were not labeled by smFISH. Therefore, 100% colocalization of MASS foci with the smFISH spots was hard to achieve. These data indicated that MASS could label the majority of mRNAs from a specific gene in live cells.

      We have added the new data in Figure 1, Figure 1-figure supplement 1, and the text on page 3.

      Reviewer #2 (Public Review):

      Hu et al. developed a new reagent to enhance single mRNA imaging in live cells and animal tissues. They combined an MS2-based RNA imaging technique and a Suntag system to further amplify the signal of single mRNA molecules. They used 8xMS2 stem-loops instead of the widely-used 24xMS2 stem-loops and then amplified the signal by fusing a 24xSuntag array to an MS2 coat protein (MCP). While a typical 24xMS2 approach can label a single mRNA with 48 GFPs, this technique can label a single mRNA with 384 GFPs, providing an 8-fold higher signal. Such high amplification allowed the authors to image endogenous mRNA in the epidermis of live C. elegans. While a similar approach combining PP7 and Suntag or Moontag has been published, this paper demonstrated imaging endogenous mRNA in live animals. Data mostly support the main conclusions of this paper, but some aspects of data analysis and interpretation need to be clarified and extended.
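      The label counts quoted by the reviewer follow from simple multiplication, sketched below. The factor of two MCP molecules per stem-loop is an assumption (MCP is generally described as binding each MS2 stem-loop as a dimer) that is consistent with the 48 and 384 figures above; the function name is hypothetical, and the result is a theoretical maximum that assumes full occupancy of every binding site.

```python
MCP_PER_STEM_LOOP = 2  # assumption: MCP binds each MS2 stem-loop as a dimer


def max_gfp_per_mrna(stem_loops, gfp_per_mcp):
    """Upper bound on GFP molecules per mRNA at full occupancy."""
    return stem_loops * MCP_PER_STEM_LOOP * gfp_per_mcp


conventional = max_gfp_per_mrna(stem_loops=24, gfp_per_mcp=1)  # 24xMS2 with MCP-GFP
mass = max_gfp_per_mrna(stem_loops=8, gfp_per_mcp=24)          # 8xMS2 with MCP-24xSuntag
fold_increase = mass / conventional
```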

      Strengths:

      Because the authors further amplified the signal of single mRNA, this technique can be beneficial for mRNA imaging in live animal tissues where light scattering and absorption significantly reduce the signal. In addition, the size of an MS2 repeat cassette can be reduced to 8, which will make it easier to insert into an endogenous gene. Also, the MCP24xSuntag and scFv-sfGFP constructs can be expressed in previously developed 24xMS2 knock-in animal models to image single mRNAs in live tissues more easily.

      The authors performed control experiments by omitting each one of the four elements of the system: MS2, MCP, 24xSuntag, and scFV. These control data confirm that the observed GFP foci are the labeled mRNAs rather than any artifacts or GFP aggregates. And the constructs were tested in two model systems: HeLa cells and the epidermis of C. elegans. These data demonstrate that the technique may be used across different species.

      We thank the reviewer for spending time reviewing our manuscript and for the insightful comments.

      Weaknesses:

      Although the paper has strength in providing potentially useful reagents, there are some weaknesses in their approach.

      Each MCP-24xSunTag is labeled with 24 GFPs, providing enough signal to be visualized as a single spot. Although the authors showed an image of a control experiment without MS2 in Figure 1B, the authors should at least mention this potential problem and discuss how to distinguish mRNA from MCP tagged with many GFPs. MCP-24xSunTag labeled with 24 GFPs may diffuse more rapidly than the labeled mRNA. Depending on the exposure time, they may appear as single particles or smeared background, but it will certainly increase the background noise. Such trade-offs should be discussed along with the advantage of this method.

      With MCP-24xSuntag, in theory, up to 24 GFP molecules will be tethered to one MCP molecule, which may lead to the formation of GFP puncta. However, under our imaging conditions (100 ms to 500 ms exposure) with a spinning disk confocal microscope, puncta of MCP-24xSuntag were not detected. As the reviewer suggested, this might be because MCP-24xSuntag diffuses too fast to be detected as a spot.

      For the signal-to-noise ratio, we did more experiments and analyses. We imaged overexpressed b-ACTIN mRNAs using the conventional 24xMS2 system or MASS with different repeats of Suntag arrays (MCP-24xSuntag, MCP-12xSuntag, MCP-6xSuntag). For the conventional 24xMS2 system, we followed the previous protocol that added a nuclear localization signal (NLS) to MCP, and b-ACTIN mRNAs were nicely detected with a signal-to-noise ratio of 1.21.

      We found that MASS showed a signal-to-noise ratio comparable to or better than that of the conventional 24xMS2 system (MASS with MCP-24xSuntag: 1.79; MASS with MCP-12xSuntag: 1.48; MASS with MCP-6xSuntag: 1.42). These data indicate that using Suntag as a signal amplifier did not increase background noise.

      Also, more quantitative image analysis would be helpful to improve the manuscript. For instance, the authors can measure the intensity of each GFP foci, show an intensity histogram, and provide some criteria to determine whether it is an MCP-24xSuntag, a single mRNA, or a transcription site. For example, it is unclear if the GFP spots in Figure 2D are transcription sites or mRNA granules.

      Under our imaging conditions, MCP-24xSuntag was not detected as GFP foci.

      We performed MASS combined with single-molecule RNA FISH and found that MASS detected a similar number of GFP foci compared to the spots detected by smFISH.

      In addition, the majority (72%) of GFP foci colocalized with the smFISH spots of b-ACTIN-8xMS2 mRNAs. It has been reported that not all MS2 stem-loops are bound by MCP. As only 8xMS2 was used in MASS, it is likely that some mRNAs were not entirely bound by MCP and were not detected. On the other hand, only sixteen probes were used in the smFISH experiment, and it is possible that some mRNAs were not labeled by smFISH. Therefore, 100% colocalization of MASS foci with the smFISH spots was hard to achieve. These data indicated that MASS could label the majority of mRNAs from a specific gene in live cells.

      We have added the new data in Figure 1, Figure 1-figure supplement 1, and the text on page 3.

      The GFP spots in Figure 2D are not transcription sites, as they were localized in the cytoplasm, not in the nucleus. We imaged exogenous BFP-8xMS2 mRNAs in the epidermis of C. elegans and found that the GFP foci of endogenous C42D4.3-8xMS2 mRNAs are larger than those of BFP-8xMS2 mRNAs. These data suggest that the GFP spots in Figure 2D (C42D4.3-8xMS2 mRNA) are mRNA granules. We added these new data in Figure 2-figure supplement 5 and the text on page 7.

      Another concern is that the heavier labeling with 24xSuntag may alter the dynamics of single mRNA. Therefore, it would be desirable to perform a control experiment to compare the diffusion coefficient of mRNAs when they are labeled with MCP-GFP vs MCP- 24xSuntag+scFv-sfGFP.

      We thank the reviewer for raising this critical issue. We have performed live imaging of b-ACTIN mRNA using the conventional 24xMS2 system or MASS with different lengths of Suntag arrays (MCP-24xSuntag, MCP-12xSuntag, MCP-6xSuntag). We then measured the velocity of mRNA movement in each imaging condition. We found that, compared to the conventional 24xMS2 system, mRNAs labeled with MCP-24xSuntag or MCP-12xSuntag showed a lower velocity, indicating that heavier labeling affected mRNA movement speed. In contrast, we found that mRNAs labeled with MCP-6xSuntag showed a velocity similar to that of mRNAs tagged with the conventional 24xMS2 system. These data indicate that when MASS is used to measure the speed of mRNA movement, a short Suntag array (MCP-6xSuntag) should be used. We added these new data in Figure 1-figure supplement 5 and to the text on pages 4 and 5.
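      A velocity comparison of this kind needs a per-track speed estimate. The sketch below shows one simple choice, the mean frame-to-frame speed, and is not necessarily the tracking analysis used in the study: the function name, units and example track are hypothetical.

```python
import math


def mean_speed(track, frame_interval_s):
    """Mean frame-to-frame speed of one particle track.

    track is a list of (x, y) positions in micrometers, one per frame;
    frame_interval_s is the time between frames in seconds.
    Returns micrometers per second.
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    elapsed = (len(track) - 1) * frame_interval_s
    return path_length / elapsed


# Made-up track: two 5 µm steps, imaged every 0.5 s.
speed = mean_speed([(0, 0), (3, 4), (6, 8)], frame_interval_s=0.5)
```

      Comparing the distribution of such per-track speeds between labeling conditions is one way to check whether a heavier tag slows the mRNA. The reviewer's request for a diffusion coefficient would instead call for a mean squared displacement analysis.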

      The authors could briefly explain about the genes c42d4.3 and mai-1. Why were these specific genes chosen to study gene expression upon wound healing? Did the authors find any difference in the dynamics of gene expression between these two genes?

      The functions of C42D4.3 and mai-1 are currently not known. mRNA deep sequencing has shown that the expression levels of C42D4.3 and mai-1 increase quickly after wounding of the epidermis of C. elegans. We, therefore, chose these two mRNAs for imaging. We added more information about C42D4.3 and mai-1 to the text on page 6.

      We observed similar dynamics of gene expression for C42D4.3 and mai-1 (Videos 7, 8, and 9).

      Reviewer #3 (Public Review):

      It is a brilliant idea to combine the MS2-MCP system with Suntag. As the authors stated, it reduces the copies of the MS2 stem loops, which can create challenges during cloning process. The Suntag system can easily amplify the signal by several to tens of folds to boost the signal for live RNA tagging. One of the best ways to claim that MASS works better than the MS2 system by itself is to compare their signal-to-noise ratios (SNRs) within the same model system, such as HeLa cells or the C. elegans epidermis. Because the authors' main argument is that they made an improvement in live RNA tagging method, it is necessary to compare it with other methods side-by-side. The authors claim that MASS can significantly improves the efficiency of CRISPR by reducing the size of the insert, it still requires knocking in several transgenes, which can be even more challenging in some model systems where there are not many selection markers are available. Another possible issue is that the bulky, heavy tagging (384 scFv-sfGFP along with 24xSuntag) can affect the mobility or stability of the target mRNAs. If it also tags preprocessed RNA in the nucleus, it may affect the RNA processing and nuclear export. A few experiments to address these possibilities will strengthen the authors' arguments. I am proposing some experiments below in detailed comments.

      We thank the reviewer for spending time reviewing our manuscript and for the insightful comments.

      1) For the experiments with HeLa cells, it is not clear whether the authors used one focal plane or the whole z-stack for their assessment of mRNA kinetics, such as fusion, fission, and anchoring. If it was from one z-plane, it was possible that many mRNAs move along the z-axis of the images to assume kinetics. If the kinetics is true, is it expected by the authors? Are beta-actin mRNAs bound to some RNA-binding proteins or clustered in RNP complexes?

      One focal plane was used in the experiments showing the fusion, fission, and anchoring behavior of mRNAs. We have now added this information to the legend of Figure 1. Yes, b-ACTIN mRNAs are bound by specific RNA-binding proteins, for example ZBP1, and it has been reported that ZBP1 forms granules with b-ACTIN mRNAs (Farina, K.L., et al., Journal of cell biology, 2003).

      2) Some quantifications on beta-actin mRNA kinetics, such as a plot of their movement speed or fusion rate, etc., would help readers better understand the behaviors of the mRNAs and assess whether the MASS tagging did not affect them.

      We thank the reviewer for raising this critical issue. We have performed live imaging of b-ACTIN mRNA using the conventional 24xMS2 system or MASS with different lengths of Suntag arrays (MCP-24xSuntag, MCP-12xSuntag, MCP-6xSuntag). We then measured the velocity of mRNA movement in each imaging condition. We found that, compared to the conventional 24xMS2 system, mRNAs labeled with MCP-24xSuntag or MCP-12xSuntag showed a lower velocity, indicating that heavier labeling affected mRNA movement speed. In contrast, we found that mRNAs labeled with MCP-6xSuntag showed a velocity similar to that of mRNAs tagged with the conventional 24xMS2 system. These data indicate that when MASS is used to measure the speed of mRNA movement, a short Suntag array (MCP-6xSuntag) should be used. We added these new data in Figure 1-figure supplement 5 and the text on pages 4 and 5.

      3) Using another target gene for MASS tagging would further confirm the efficacy of the system. Assuming the authors generated a parental strain of HeLa cell, where MCP24xSuntag and scFv-sfGFP are already stably expressed (shown in Fig. 1B), CRISPR-ing in another gene should be relatively easy and fast.

      For exogenous genes, in addition to b-ACTIN, we imaged mRNAs from three more genes, C-MYC, HSPA1A, and KIF18B, with MASS in HeLa cells. For endogenous genes, we imaged C42D4.3 and mai-1 in the epidermis of C. elegans. These data indicated that MASS is able to image both exogenous and endogenous mRNAs in live cells. We have now added those new data in Figure 1-figure supplement 2, Figure 2-figure supplement 2, and to the text on pages 3, 4, and 6.

      4) Adding a complementary approach to the data presented in Fig. 1, such as qRT-PCR for beta-actin, with or without the MASS system would ensure the intense tagging did not affect the mRNA expression or stability.

      To address this question, we performed more experiments to test whether MASS affected mRNA expression and stability. Because b-ACTIN mRNA is very stable, it is not suitable for measuring mRNA stability. We, therefore, tested three genes, C-MYC, HSPA1A, and KIF18B, whose mRNAs have been reported to be of medium stability. We found that MASS did not affect the stability of these three mRNAs in HeLa cells. We also tested the expression level and the stability of endogenous C42D4.3 mRNA in the epidermis of C. elegans and found that neither was affected by MASS. We have now added these new data in Figure 1-figure supplement 2, Figure 2-figure supplement 2, and to the text on pages 3, 4, and 6.

      5) For experiments with the C. elegans epidermis, including at least one more MASS movie clip for C42D4.3 and a movie for mai-1 would be helpful for readers to appreciate the RNA labeling and its dynamics.

      We showed two movies (video 7 and video 8) and the snapshots for C42D4.3 mRNA (Figure 2D and Figure 2-figure supplement 3). We also added a movie (Video 9) for mai-1.

      6) The difference between Fig. 2D and Fig. 2-fig supp. 3 is unclear. The authors should address the different patterns of RNA signal propagation. Is it due to the laser power used too much, resulting in photobleach in Fig. 2D?

      We have noticed the difference between Figure 2D and Figure 2-figure supplement 3. In Figure 2D, GFP foci did not appear within the injury area after wounding. In Figure 2-figure supplement 3, GFP foci quickly appeared within the injury area. Although we kept the laser power setting constant when performing the laser wounding experiment, there are indeed variations in the actual laser power used. As the reviewer suggested, the difference may be due to photobleaching in Figure 2D. Alternatively, it is possible that the location of the injury site or the degree of injury could affect the dynamics of gene expression.

      However, we would like to point out that the dynamics of the gene expression pattern in Figure 2D (Video 7) and Figure 2-figure supplement 3 (Video 8) are similar. GFP foci of C42D4.3 mRNAs were first detected around the injury sites. Then GFP foci gradually appeared from the area around the injury site to distal regions.

      7) Movie 7 is the key data the authors are presenting, but there are a few discrepancies between their arguments and what is seen from the movie. The authors say the RNAs are "gradually spread" (the line 120 in the manuscript). However, it seems that the green foci just appear here and there in the epidermis and the majority of them stay where they were throughout the timelapse. This pattern seems to be different from the montage in Fig. 2-fig supp. 3, which indeed looks like the mRNA spots are formed around the lesion and spread overtime. Additional explanation on this will strengthen the arguments. Given the dramatic increase of c42d4.3 mRNA abundance 1 min. after the laser wounding, there must be a tremendous boost of transcription at the active transcription sites, which should be captured as much bigger and fewer green foci that are located inside the nucleus. Is this simply because those nuclear sites are out of focus or in a similar size as mRNA foci? Regardless, this should be addressed in the discussion.

      We apologize for the confusing description of our original data. We wrote "gradually spread", but we did not mean that mRNAs were transcribed at the wounding site and moved to the distal regions. We meant that GFP foci first appeared close to the wounding site and that more GFP foci then gradually appeared in the distal regions. We have changed our writing to "the appearance of GFP foci gradually spreads from the area around the injury site to distal regions".

      For the difference between Figure 2D and Figure 2-figure supplement 3, please see our discussion for comment 6.

      For transcription, we also expected a boost of transcription after wounding. However, we failed to detect the appearance of bigger GFP foci in the nucleus. We agree with the reviewer that this is probably because the active nuclear sites are out of focus. The epidermis of C. elegans is a syncytium with 139 nuclei located in different regions and focal planes. With our microscopy, we were able to image only one focal plane, in which there are usually only four to ten nuclei. Therefore, it is likely that the nuclei with active transcription were out of focus. We have now discussed this point in the revised manuscript (page 6).

      8) One clear way to confirm that MASS labels mRNAs and does not affect their stability/localization is to compare the imaging data with single-molecule RNA fluorescence in situ hybridization (smFISH) that the Singer lab developed decades ago. The authors can target the endogenous c42d4.3 or mai-1 RNAs using smFISH and compare their abundance and subcellular localization patterns with their data.

      To confirm that the GFP foci detected are real mRNA molecules, we performed MASS combined with single-molecule RNA FISH and found that MASS detected a similar number of GFP foci compared to the spots detected by smFISH. In addition, the majority (72%) of GFP foci colocalized with the smFISH spots of β-ACTIN-8xMS2 mRNAs. It has been reported that not all MS2 stem-loops are bound by MCP. As only 8xMS2 was used in MASS, it is likely that some mRNAs were not fully bound by MCP and were not detected. On the other hand, only sixteen probes were used in the smFISH experiment, and it is possible that some mRNAs were not labeled by smFISH. Therefore, 100% colocalization of MASS foci with the smFISH spots was hard to achieve. These data indicate that MASS can detect single mRNA molecules and label the majority of mRNAs from a specific gene in live cells. We have now added the new data in Figure 1, Figure 1-figure supplement 1, and to the text on page 3.
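
      A colocalization fraction such as the 72% quoted above can be estimated from two lists of spot coordinates. The following is only a minimal sketch under stated assumptions: the function name, the distance threshold, and the coordinates are hypothetical, and it uses a simple nearest-spot criterion rather than the analysis pipeline actually used for the manuscript.

```python
import math


def colocalized_fraction(foci, spots, max_dist):
    """Return the fraction of `foci` that have at least one entry of
    `spots` within `max_dist` (same units as the coordinates)."""
    if not foci:
        return 0.0
    hits = 0
    for fx, fy in foci:
        # A focus counts as colocalized if any reference spot is close enough.
        if any(math.hypot(fx - sx, fy - sy) <= max_dist for sx, sy in spots):
            hits += 1
    return hits / len(foci)


# Hypothetical coordinates in pixels, for illustration only.
gfp_foci = [(10.0, 10.0), (20.0, 5.0), (30.0, 30.0), (40.0, 12.0)]
smfish_spots = [(10.5, 9.8), (20.2, 5.1), (55.0, 55.0)]

fraction = colocalized_fraction(gfp_foci, smfish_spots, max_dist=1.0)
print(f"{fraction:.0%} of GFP foci have an smFISH spot within 1 px")
```

      Because the criterion is one-sided, the fraction of smFISH spots matched by GFP foci can be obtained by swapping the two arguments.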

      We performed more experiments to test whether MASS affected mRNA expression and stability. Because β-ACTIN mRNA is very stable, it is not suitable for measuring mRNA stability. We therefore tested three genes, C-MYC, HSPA1A, and KIF18B, which were reported as medium-stable mRNAs. We found that MASS did not affect the stability of those three mRNAs in HeLa cells. We also tested the expression level and the stability of endogenous C42D4.3 mRNA in the epidermis of C. elegans and found that neither the expression nor the stability was affected by MASS. We have now added those new data in Figure 1-figure supplement 2, Figure 2-figure supplement 2, and to the text on pages 3, 4, and 6.

      To test whether MASS affected mRNA localization, we performed new experiments to image β-ACTIN mRNAs using MASS and the conventional MS2 system side by side in NIH3T3 cells, a mouse fibroblast cell line. We found that β-ACTIN mRNAs showed similar localization with both methods. These new data suggest that MASS does not affect RNA subcellular localization. We have now added the new results in Figure 1-figure supplement 2.

      9) One of the main purposes of live-imaging RNAs is to assess their dynamics. Adding some more analyses, such as the movement speed of the foci, would be helpful to show how effective this system is at assessing those dynamic features.

      We thank the reviewer for raising this critical issue. We have performed live imaging of β-ACTIN mRNA using the conventional 24xMS2 system or MASS with different lengths of Suntag arrays (MCP-24xSuntag, MCP-12xSuntag, MCP-6xSuntag). We then measured the velocity of mRNA movement in each imaging condition. We found that, compared to the conventional 24xMS2 system, mRNA labeled with MCP-24xSuntag or MCP-12xSuntag showed a lower velocity, indicating that heavier labeling affected mRNA movement speed.

      In contrast, we found that mRNAs labeled with MCP-6xSuntag showed a similar velocity to those tagged with the conventional 24xMS2 system. These data indicate that when MASS is used to measure the speed of mRNA movement, a short Suntag array (MCP-6xSuntag) should be used. We added those new data in Figure 1-figure supplement 5 and to the text on pages 4 and 5.
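
      For readers who want to reproduce this kind of comparison, the mean speed of a tracked focus can be computed from its frame-by-frame positions. This is a minimal sketch, assuming positions have already been extracted by a tracking tool; the function name, the example track, and the frame interval are hypothetical and do not describe the analysis used for the manuscript.

```python
import math


def mean_speed(track, frame_interval_s):
    """Mean frame-to-frame speed of one track.

    `track` is a sequence of (x, y) positions sampled every
    `frame_interval_s` seconds; the result is in position units per second.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions to compute a speed")
    # Total path length: sum of the distances between consecutive positions.
    path_length = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    elapsed = frame_interval_s * (len(track) - 1)
    return path_length / elapsed


# Hypothetical track in micrometres, imaged every 0.5 s.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
speed = mean_speed(track, frame_interval_s=0.5)
print(f"mean speed: {speed:.1f} um/s")
```

      Comparing the distribution of such per-track speeds across labeling conditions (24xMS2, MCP-24xSuntag, MCP-12xSuntag, MCP-6xSuntag) is one way to check whether a heavier tag slows the mRNA.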

      Reviewer #4 (Public Review):

      Hu et al introduced the MS2-Suntag system into C. elegans to tag and image the dynamics of individual mRNAs in a live animal. The system involves CRISPR-based integration of 8x MS2 motifs into the target gene, and two transgene constructs (MCP-Suntag; scFv-sfGFP) that can potentially recruit up to 384 GFP molecules to an mRNA to amplify the fluorescent signal. The images show a very high signal-to-background ratio, indicating a large range of optimization to control phototoxicity for live imaging and/or artifacts caused by excessive labeling. The use of epidermal wound repair as a case study provides a simplified temporal context to interpret the results, such as the initiation of transcription upon wounding. The preliminary results also reveal potentially novel biology such as localization of mRNAs and dynamic RNP complexes in wound response and repair. On the other hand, the system recruits a large protein complex to an mRNA molecule, and an immediate question is to what extent it may interfere with in vivo regulation. Phenotypic assays, e.g., in development and wound repair, would have been a powerful argument but are not explored. In all, C. elegans is a powerful system for live imaging, and the genome is rich in RNA binding proteins as well as miRNAs and other small RNAs for rich posttranscriptional regulation. The manuscript provides important technical progress and a valuable resource for the field to study posttranscriptional regulation in vivo.
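
      The figure of 384 GFP molecules can be checked with simple arithmetic. The sketch below assumes one reading of the labeling stoichiometry that is not spelled out in the text above: each MS2 stem-loop binds an MCP dimer, each MCP carries one Suntag array, and each Suntag epitope binds one scFv-sfGFP. The function name is hypothetical.

```python
def max_gfp_per_mrna(stem_loops, coat_proteins_per_loop, epitopes_per_coat_protein):
    """Upper bound on GFP molecules recruited to one fully occupied mRNA."""
    return stem_loops * coat_proteins_per_loop * epitopes_per_coat_protein


# Assumed stoichiometry: 8 stem-loops, 2 MCP per stem-loop (dimer),
# and 24, 12, or 6 Suntag epitopes per MCP.
labels = {
    f"MCP-{n}xSuntag": max_gfp_per_mrna(8, 2, n)
    for n in (24, 12, 6)
}
for name, count in labels.items():
    print(f"{name}: up to {count} GFP per mRNA")
```

      Under these assumptions the 24x array reaches the quoted 384, while the 12x and 6x arrays give 192 and 96. These are upper bounds, since not every stem-loop or epitope is necessarily occupied.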

      We thank the reviewer for spending time reviewing our manuscript and for the insightful comments.

    1. Author Response

      Reviewer #1 (Public Review):

      Auxin-induced degradation is a strong tool to deplete CHK-2 and PLK-2 in the C. elegans germ line. The authors strengthen their conclusions through multiple approaches, including rescuing mutant phenotypes and biochemical analyses of CHK-2 and PLK-2.

      The authors overcame a technical limitation that would hinder in vitro analysis (low quantity of CHK-2) through the clever approach of preventing its degradation via the proteasome. In vitro phosphorylation assays and mass spectrometry analysis that establishes that CHK-2 is a substrate of PLK-2 nicely complement the genetic data.

      The authors argue that the inactivation of CHK-2 by PLK-2 promotes crossover designation; however, the data only indicate that PLK-2 promotes proper timing of crossover designation.

      We thank the reviewer for this point of clarification. While we believe that PLK activity is essential to inactivate CHK-2 and trigger CO designation, we agree that this has not been firmly established with the tools available to us, as elaborated below. We have revised the text to avoid overstating the conclusions.

      It is not clear whether the loss of CHK-2 function with the S116A and T120A mutations is the direct result of the inability to phosphorylate these residues or whether it is caused by the apparent instability of these proteins, as their abundance was reduced in IPs compared to wild-type.

      Agreed. The instability of the mutant proteins was a source of significant frustration during the course of this work, and limits the strength of our conclusions.

      The mechanism of CHK-2 inactivation in the absence of PLK-2 remains unclear, though the authors were able to rule out multiple candidates that could have played this role.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang et al., address the role of Polo-like kinase signaling in restricting the activity of Chk2 kinase and coordinating synapsis among homologous chromosomes with the progression of meiotic prophase in C. elegans. While individual activities of PLK-2 and CHK-2 have been demonstrated to promote chromosome pairing, and double-strand break formation necessary for homologous recombination, in this manuscript the authors attempt to link the function of these two essential kinases to assess the requirement of CHK-2 activity in controlling crossover assurance and thus chromosome segregation. The study reveals that CHK-2 acts at distinct regions of the C. elegans germline in a Polo-like kinase-dependent and independent manner.

      Strengths:

      The study reveals distinct mechanisms through which CHK-2 functions in different spatial regions of meiosis. For example, it appears that CHK-2 activity is not inhibited by PLKs (1 and 2) in the leptotene/zygotene meiotic nuclei where pairing occurs. This suggests that either CHK-2 is not phosphorylated by PLK-2 in the distal nuclei or that it has a kinase-independent function in this spatial region of the germline. These are interesting observations that further our understanding of how the processes of meiosis are orchestrated spatially for coordinated regulation of the temporal process.

      Weaknesses:

      While the possibilities stated above are interesting, they lack direct support from the data. A key missing element in the study is the actual role of PLK-2 signaling in controlling CHK-2 activity and thus function. I expand on this below.

      Throughout the manuscript, the authors test the role of each of the kinases (CHK-2 or PLK-1, or 2) using auxin-induced degradation, which would eliminate both phosphorylated and unphosphorylated pools of proteins. This experiment thus does not test the role of PLK-2 signaling in controlling CHK-2 function or the role of CHK-2 activation. To test the role of signaling from PLK-2 or CHK-2, the authors need to generate appropriate alleles such as phospho-mutants or kinase-dead mutants. The authors do generate unphosphorylatable and phosphomimetic versions of CHK-2; however, they find that the protein level for both these alleles is lower than wild-type CHK-2 (which the authors state is already low). The authors conclude that the lower level of protein in the CHK-2 phospho-mutants is because the mutations cause destabilization of the protein. I am sympathetic with the authors since clearly these results make interpretations of actual signaling activity more challenging. But there needs to be some evidence of this activity, for example through the generation of a phospho-specific antibody to phosphorylated CHK-2. While not functional, at least the phosphorylation status of CHK-2 would provide more information on its spatial pattern of activation and inactivation. In addition, it would still be of interest to the readership to present the data on these phospho-mutant alleles with crossover designation and COSA-1::GFP. Is the phenotype of the WT knock-in, and of each of the phospho-mutant knock-ins, similar to auxin-induced degradation of CHK-2?

      We thank the reviewer for these comments. We have made several attempts over the past decade that have failed to elicit a CHK-2 antibody that works for either immunofluorescence or western blots, likely due to the very low abundance of CHK-2. This has discouraged us from investing yet more resources to try to develop a phospho-specific antibody. Moreover, our evidence suggests that phosphorylation may promote CHK-2 degradation. Since the phosphomutants of CHK-2 are not stable, we do not think knock-in of these phosphomutants will provide new insights.

      Given that the CHK-2 phosphomutants did not pan out for assessing the signaling regulation of PLK-2 on CHK-2, to directly assess whether PLK-2 activity restricts CHK-2 function in mid-pachytene but not leptotene/zygotene, the authors should generate PLK-2 kinase dead alleles. These alleles will help decouple the signaling function of PLK-2 from a structural function.

      Similarly, to assess the potentially distinct roles of CHK-2 in leptotene/zygotene and mid-pachytene it would be important to assess CHK-2 kinase-dead mutant alleles. At this time, all of the analysis is based on removing both active CHK-2 and inactive CHK-2 (i.e. phosphorylated and unphosphorylated pool) using auxin-induced degradation. The kinase-dead alleles will help infer the role of the kinase more directly. The authors can then superimpose the auxin-induced degradation and assess the impact of complete removal of the protein vs only loss of its kinase function. These experiments may help clarify the role of signaling outcomes of these proteins, vs their complete loss. For example, what does kinase-dead PLK-2 recruitment to the synapsed chromosomes look like? Are there distinct activities for active and inactive PLK-2 that are spatially regulated? The same can be tested for CHK-2.

      A kinase-dead allele of plk-2 has been generated in previous work and we have used it for other purposes. However, the fact that CHK-2 and PLK-2 are required for homolog pairing and synapsis, which are prerequisites for crossover designation, precludes their use here.

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting manuscript establishing a role for Ecdysone signaling in the control of sleep. The authors show that the Ecdysone receptor EcR is required primarily in cortex glia for the control of sleep and that its target E75 is also involved in sleep regulation. This is a novel function for both cortex glia and steroid signaling in Drosophila. The authors also present evidence that Ecdysone signaling would be important for response to starvation, and that lipid droplet mobilization would mediate the effect of ecdysone on sleep. This work is certainly innovative. However, the main conclusions need to be strengthened. In particular: variability in sleep amounts in certain strains could complicate interpretation, the idea that ecdysone modulates sleep response to starvation is not sufficiently well supported, and genetic evidence for mobilization of lipid droplets being the mechanism linking steroid signaling to sleep is currently quite weak.

      Major concerns:

      1) I have concerns with the variability observed with the GS drivers (whether nSyb or repo). This is particularly striking in figure S3 when comparing experiments conducted with EcR-c and the Ecl RNAi. Daytime is most affected, but even nighttime looks significantly different. Definitely, nighttime quantification should be shown in addition to total sleep in figure S3. However, I feel that confirming the key results of this study with an additional driver would be reassuring. Could repo-GAL4 combined with GAL80ts be used to drive EcR RNAi, instead of repo-GS? The same combination could help determine whether glia is responsible for the 20E-mediated increase in sleep after starvation (figure S4A).

      We have updated the old Figure S3 source data (now Figure 2 - source data 5) with both daytime and nighttime sleep, and the conclusion is similar; please also see our response to essential revision question 1. Regarding the GAL80ts experiment, as noted in our detailed response to essential revision question 1, we conducted this experiment and confirmed that adult-specific knockdown of EcR in glia affects sleep. We also tried to do this experiment under starvation conditions (Figure 3 – figure supplement 1A), but this is more challenging to conduct and interpret as it requires temperature shifts, ecdysone treatment and starvation. In particular, high temperature coupled with starvation turned out to be an extreme stressor for Repo-Gal4; TubulinGal80ts>EcR RNAi #1 flies, as 8 of 12 flies died after 1 day in our first run; thus, we did not proceed with this experiment.

      2) The idea that ecdysone might suppress the response to starvation is interesting, but the results are not convincing. First, there is an important control missing. It is important to test the effect of Ecdysone on fed flies, to ensure that Ecdysone does not simply make flies sleepy. Second, it is not clear that EcR RNAi has a specific effect on starved flies. Starvation reduces sleep, but is this reduction really greater in flies expressing EcR RNAi than in control flies? It seems to me that starvation reduces sleep by the same amount when comparing results in panels 3D and E. The effect of EcR RNAi and starvation might be simply additive, which would suggest that 20E impacts sleep independently of starvation.

      We now show effects of exogenous ecdysone on fed flies. As expected, and as previously shown, ecdysone promotes sleep in fed and starved flies (Figures 3 and 6). We agree with the reviewer that 20E impacts sleep independently of starvation. The major point we made with this experiment was that robust effects of starvation on sleep are maintained in RepoGS-EcR RNAi flies. The fact that these two manipulations together virtually eliminate sleep suggests that glial ecdysone signaling is required for the sleep that remains during starvation.

      3) The material and method section needs to be improved. In particular, it is not clear to me how the starvation/ecdysone feeding assay was done. There are some additional explanations in the figure legend, but the approach is still not clear to me. Indicate clearly when the flies were starved, and when they were exposed to Ecdysone.

      We rewrote the ecdysone treatment and starvation assay section with more details in Methods. We hope it is now clear.

      4) I am not convinced that the Lsd2 results necessarily support the idea that this gene is required for the effect of 20E on sleep. Sleep is dramatically reduced during the day in the Lsd2 mutant. This is actually an interesting observation, but this strong effect on baseline sleep might be masking the ability of 20E to modulate sleep.

      Thanks so much for this great comment. As noted in our response to essential revision question 4, we now demonstrate that lsd2 mutants respond effectively to GABA, showing that their sleep can be modulated.

    1. Author Response

      Reviewer #2 (Public Review):

      The work proposes a new computational rule for classifying synaptic plasticity outcome based on the geometry of synaptic enzyme dynamics. Specifically, the authors implement a multi-timescale model of hippocampal synaptic plasticity induction that takes into account the dynamics of the membrane potential, calcium concentration as well as CaMKII and calcineurin signalling pathways. They show that the proposed rule could be applied to reproduce the outcomes from nine published experimental studies involving different spike-timing and frequency-dependent plasticity induction protocols, animal ages, and experimental conditions. The model has been also used to generate predictions regarding the effect of spike-timing irregularity on plasticity outcomes. The proposed approach constitutes an interesting and original idea that contributes to the ongoing effort in discovering the rules of synaptic plasticity.

      The conclusions of this paper are mostly well supported by data, but some model assumptions and interpretation of modelling results need to be clarified and extended.

      1) The proposed model captures well the stochastic nature of the dendritic spine ion channels and receptors except for the calcium-sensitive potassium (SK) channel that has been modelled deterministically. Given that the same justification in terms of the small number of channels present in the small dendritic spine compartment applies to the SK channels as well as to the voltage-gated calcium channels and the AMPA and NMDA receptors, it is not clear why the authors have chosen a deterministic representation in the case of SK. The implications of this assumption need to be investigated and discussed.

      There are several stochastic models of AMPA and NMDA receptors based on single-channel recordings. Additionally, we had enough experimental data on single-channel recordings to build a custom Markov chain model of VGCCs. For the SK channel, we could not find enough experimental data (age-dependent activity, temperature sensitivity, etc.) to custom-build a stochastic model. We thus decided to implement a deterministic model. Yet, we understand the reviewer’s comment that, in theory, a stochastic model of SK channels could impact our results. We thus now provide a simulation with a stochastic model of SK, comparing it to the deterministic model implemented in the study.

      We describe a minimal version of a stochastic model of SK compatible with the deterministic version. The deterministic model of the SK channel, fitted at ~35 °C, is described in the Methods section.

      Because of the factor ρ_f^SK in the equation, which multiplies r(Ca) by ~2, the equation cannot be related to a 2-state Markov chain (MC). This might be possible with a 3-state MC, but we used a different strategy. Noting that ρ_f^SK ∼ 2, we introduce a new equation

      As 0 < r(Ca) < 1, it is straightforward to introduce a 2-state MC for which the above equation describes the probability of the open state. We then simulate two such independent (for a given Ca concentration) channels and approximate m_SK as the sum (which belongs to [0, 2N_SK]) of the open states for the 2 channels.
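
      The construction described above can be illustrated with a generic two-state Markov chain. This is a minimal sketch, not the authors' implementation: it assumes that the open probability relaxes toward r(Ca) with a single time constant tau (opening rate r/tau, closing rate (1 - r)/tau), that r(Ca) has a Hill form, and that all numerical values (half-activation, exponent, tau, time step, channel number) are hypothetical placeholders.

```python
import random


def hill(ca, half, exponent):
    """Hill-type activation with values in (0, 1); stands in for r(Ca)."""
    return ca**exponent / (ca**exponent + half**exponent)


def simulate_open_count(ca, n_units, tau, dt, steps, rng):
    """Fixed-step simulation of `n_units` independent two-state channels.

    With opening rate r/tau and closing rate (1 - r)/tau, the mean open
    probability p obeys dp/dt = (r - p)/tau, so it relaxes to r.
    Returns r and the number of open units after each step.
    """
    r = hill(ca, half=0.33, exponent=6.0)  # hypothetical parameters
    p_open = r / tau * dt                  # per-step opening probability
    p_close = (1.0 - r) / tau * dt         # per-step closing probability
    state = [False] * n_units              # all units start closed
    counts = []
    for _ in range(steps):
        for i in range(n_units):
            if state[i]:
                if rng.random() < p_close:
                    state[i] = False
            elif rng.random() < p_open:
                state[i] = True
        counts.append(sum(state))
    return r, counts


# Two two-state units per SK channel, so the count lies in [0, 2 * N_SK].
N_SK = 15                       # hypothetical channel number
n_units = 2 * N_SK
rng = random.Random(0)          # seeded for reproducibility
r, counts = simulate_open_count(
    ca=0.33, n_units=n_units, tau=6.3, dt=0.1, steps=20000, rng=rng
)
tail = counts[len(counts) // 2:]  # discard the initial transient
mean_fraction = sum(tail) / len(tail) / n_units
print(f"target open probability {r:.2f}, simulated {mean_fraction:.2f}")
```

      The fixed-step update is only accurate when the per-step probabilities are small (here below 0.01); an exact event-driven scheme such as the Gillespie algorithm would avoid that restriction.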

      As the reviewer can see in the figure below, we do not find a major difference in the simulations of 3 protocols. Thus, we argue that adding a stochastic version of the SK channels in our current study would not fundamentally alter our main conclusions.

      Figure Legend: a comparison using Tigaret et al. 2016 1Pre2Post10 and 1Pre2Post50 protocols, and 900 at 50 Hz protocol from Dudek and Bear 1992 (100 repetitions) between the model with the deterministic SK channel (original model - blue), and the modified model including the stochastic SK channel (stochastic SK - red). Deterministic vs stochastic SK channel does not significantly modify the model’s behaviour.

      To explain our rationale for using a deterministic version of the SK channel, we provide this sentence in the Methods when describing the SK channel model: “Due to a lack of single-channel recordings of SK channels, and a lack of published stochastic models of SK channels, we modelled SK channels deterministically. In tests we found that this assumption had only a negligible impact on the outcomes of plasticity protocols (data not shown)” (page 40).

      2) Many of the model parameters have been set to values previously estimated from synaptic physiology and biochemistry experiments. However, a significant number of important parameter values have been tuned to reproduce the plasticity experiments targeted in this study. As such, it needs to be explained which of the plasticity outcomes have been reproduced because the parameters are chosen to do so. A clarification would help to substantiate the authors' conclusions.

      Most parameters were set with values previously defined by experimental work. We referred to these publications where necessary throughout the Methods and Tables in our original manuscript. For the few free parameters that were adjusted, we now provide additional information wherever necessary for the Tables concerned.

      ● In the legend of Table 4 (neuron electrical properties), we explain which parameters are different from values obtained from the literature to fit experimental data (Golding et al. 2001; Buchanan et al. 2007).

      ● Parameters for the sodium and potassium conductance (Table 5) are labelled as generic since they are intentionally set to produce the BaP dynamics we have shown in the paper.

      ● Table 6 has no free parameters.

      ● Table 7 caption now includes a description saying ’Note that the buffer concentration, calcium diffusion coefficient, calcium diffusion time constant and calcium permeability were considered free parameters to adjust the calcium dynamics’.

      ● In Table 8 we had originally pointed out how we adapted the GluN2B rates from a published GluN2A model (Popescu et al. 2004; and Iacobucci and Popescu 2018). We now describe this adaptation in the Table 8 legend. In this Table, we now also better explain how we adjusted the NMDAr model to reflect the ratio between GluN2B and GluN2A, fitted from Sinclair et al. 2016; and the NMDAr conductance depending on calcium fitted from Maki and Popescu 2014.

      ● In Table 9 caption we now explain how the GABAr number and conductance were modified to fit GABAr currents as in Figures 15 b and e. The relevant parameters are indicated in the table.

      ● In Table 10 caption we now state the number of VGCCs per subtype that we used as a free parameter to reproduce the calcium dynamics (Figure 12).

      3) Adding experimental testing of model predictions, for example, that firing variability can alter the rules of plasticity, in the sense that it is possible to add noise to cause LTP for protocols that did not otherwise induce plasticity would be needed to increase confidence in the presented modelling results.

      We agree that it would be interesting in the future to test the many model predictions suggested in this work with biological experiments. This would however require a lot of work and will be the subject of further studies.

      Reviewer #3 (Public Review):

      This manuscript presents and analyzes a novel calcium-dependent model of synaptic plasticity combining both presynaptic and postsynaptic mechanisms, with the goal of reproducing a very broad set of available experimental studies of the induction of long-term potentiation (LTP) vs. long-term depression (LTD) in a single excitatory mammalian synapse in the hippocampus. The stated objective is to develop a model that is more comprehensive than the often-used simplified phenomenological models, but at the same time to avoid biochemical modeling of the complex molecular pathways involved in LTP and LTD, retaining only its most critical elements. The key part of this approach is the proposed "geometric readout" principle, which allows to predict the induction of LTP vs. LTD by examining the concentration time course of the two enzymes known to be critical for this process, namely (1) the Ca2+/calmodulin-bound calcineurin phosphatase (CaN), and (2) the Ca2+/calmodulin-bound protein kinase (CaMKII). This "geometric readout" approach bypasses the modeling of downstream pathways, implicitly assuming that no further biochemical information is required to determine whether LTP or LTD (or no synaptic change) will arise from a given stimulation protocol. Therefore, it is assumed that the modeling of downstream biochemical targets of CaN and CaMKII can be avoided without sacrificing the predictive power of the model. Finally, the authors propose a simplified phenomenological Markov chain model to show that such "geometric readout" can be implemented mechanistically and dynamically, at least in principle.

      Importantly, the presented model has fully stochastic elements, including stochastic gating of all channels, stochastic neurotransmitter release and stochastic implementation of all biochemical reactions, which allows to address the important question of the effect of intrinsic and external noise on the induction of LTP and LTD, which is studied in detail in this manuscript.

      Mathematically, this modeling approach resembles a continuous stochastic version of the "liquid computing" / "reservoir computing" approach: in this case the "hidden layer", or the reservoir, consists of the CaMKII and CaM concentration variables. In this approach, the parameters determining the dynamics of these intermediate ("hidden") variables are kept fixed (here, they are constrained by known biophysical studies), while the "readout" parameters are being trained to predict a target set of experimental observations.

      Strengths:

      1) This modeling effort is very ambitious in trying to match an extremely broad array of experimental studies of LTP/LTD induction, including the effect of several different pre- and post-synaptic spike sequence protocols, the effect of stimulation frequency, the sensitivity to extracellular Ca2+ and Mg2+ concentrations and temperature, the dependence of LTP/LTD induction on developmental state and age, and its noise dependence. The model is shown to match this large set of data quite well, in most cases.

      2) The choice of a stochastic implementation for all parts of the model makes it possible to fully explore the effects of intrinsic and extrinsic noise on the induction of LTP/LTD. This is very important and commendable, since regular noise-less spike firing induction protocols are not very realistic, and not very relevant physiologically.

      3) The modeling of the main players in the biochemical pathways involved in LTP/LTD, namely CaMKII and CaN, aims at sufficient biological realism, and as noted above, is fully stochastic, while other elements in the process are modeled phenomenologically to simplify the model and reveal more clearly the main mechanism underlying the LTP/LTD decision switch.

      4) There are several experimentally verifiable predictions that are proposed based on an in-depth analysis of the model behavior.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      1) The stated explicit goal of this work is the construction of a model with an intermediate level of detail, as compared to simplified "one-dimensional" calcium-based phenomenological models on the one hand, and comprehensive biochemical pathway models on the other hand. However, the presented model comes across as extremely detailed nonetheless. Moreover, some of these details appear to be avoidable and not critical to this work. For instance, the treatment of presynaptic neurotransmitter release is both overly detailed and not sufficiently realistic: namely, the extracellular Ca2+ concentration directly affects vesicle release probability but has no effect on the presynaptic calcium concentration. I believe that the number of parameters and the complexity in the presynaptic model could be reduced without affecting the key features and findings of this work.

      This point is largely answered in Essential Revisions point 4, where we justify the choices we made for the presynaptic model. We acknowledge, however, that in the current version we did not incorporate all biophysical components, such as the modulation of presynaptic calcium concentration by external calcium variations and multivesicular release. The calcium dependence of presynaptic release, as currently modeled, is nevertheless fitted in Figure 8e against data from Hardingham et al. 2006 and Tigaret et al. 2016. These limitations could be addressed in a future version of our presynaptic model, in which we also plan to incorporate the influence of age and temperature.

      2) The main hypotheses and assumptions underlying this work need to be stated more explicitly, to clarify the main conclusions and goals of this modeling work. For instance, following much prior work, the presented model assumes that a compartment-based (not spatially-resolved) model of calcium-triggered processes is sufficient to reproduce all known properties of LTP and LTD induction and that neither spatially-resolved elements nor calcium-independent processes are required to predict the observed synaptic change. This could be stated more explicitly. It could also be clarified that the principal assumption underlying the proposed "geometric readout" mechanisms is that all information determining the induction of LTP vs. LTD is contained in the time-dependent spine-averaged Ca2+/calmodulin-bound CaN and CaMKII concentrations, and that no extra elements are required. Further, since both CaN and CaMKII concentrations are uniquely determined by the time course of postsynaptic Ca2+ concentration, the model implicitly assumes that the LTP/LTD induction depends solely on spine-averaged Ca2+ concentration time course, as in many prior simplified models. This should be stated explicitly to clarify the nature of the presented model.

      We thank the reviewer for the suggestions on how to clarify the main hypotheses and assumptions of our work. We slightly modified the sentences provided by the reviewer and added them to the main text (page 2, line 82 and page 19, line 593).

      3) In the Discussion, the authors appear to be very careful in framing their work as a conceptual new approach in modeling STD/STP, rather than a final definitive model: for instance, they explicitly discuss the possibility of extending the "geometric readout" approach to more than two time-dependent variables, and comment on the potential non-uniqueness of key model parameters. However, this makes it hard to judge whether the presented concrete predictions on LTP/LTD induction are simply intended as illustrations of the presented approach, or whether the authors strongly expect these predictions to hold. The level of confidence in the concrete model predictions should be clarified in the Discussion. If this confidence level is low, that would call into question the very goal of such a modeling approach.

      These are very good questions. Let us first comment on parameter uniqueness. We believe, as in E. Marder’s work on ion channel expression in neurons, that the synapse can adapt its internal parameters (protein numbers, transition rates, etc.) to produce a given functional behaviour. As a by-product, the parameters associated with a given behaviour are not unique. Additionally, since our model reproduces 9 published experimental outcomes with a single set of parameters, it is a functioning synapse with adjusted parameters that outputs the expected behaviours. Thus, by extrapolation, our confidence in the further predictions is high. We modified sentences in the Discussion section to argue this point (page 21, line 707).

      Let us now comment on increasing the complexity. We strove, to the best of our ability, to design a plasticity readout that is as simple as possible while still providing a functioning synapse. Given our success in reproducing 9 published experimental outcomes with a single set of parameters, adding more complexity would be akin to overfitting.

      4) The authors presented a simplified mechanistic dynamical Markov chain process to prove that the "geometric readout" step is implementable as a dynamical process, at least in principle. However, a more realistic biochemical implementation of the proposed "region indicator" variables may be complex and not guaranteed to be robust to noise. While the authors acknowledge and touch upon some of these issues in their discussion, it is important that the authors prove in future work that the "geometric readout" is implementable as a biochemical reaction network. Barring such implementation, one must be extra careful when claiming advantages of this approach as compared to modeling work that attempts to reconstruct the entire biochemical pathways of LTP/LTD induction.

      We acknowledge this issue and agree this would be an interesting subject for future work.

    1. Author Response:

      Reviewer #2 (Public Review):

      The manuscript reports on the complex variability of expression, trafficking, assembly/stability, and peptide loading among different MHC I haplotypes. In particular by analyzing two distinct MHC I molecules as representative members of groups of allotypes, that favor canonical or non-canonical assembly modes, the PI reports on preferential cytosolic or endo-lysosomal MHC I loading. Overall, the data shed light on the intersection between MHC I conformation and subcellular sites of peptide loading and help explain MHC I immunosurveillance at a different subcellular location.

      In the first series of experiments the authors report an uneven surface expression of HLA-B vs HLA-A and C on circulating monocytes, with HLA-B being expressed 4 times higher. They also report that, compared to the TAP-dependent allotype B*08:01, the TAP-independent allotype B*35:01 has a lower surface half-life and is often present as an empty molecule. These data set the basis for the authors' hypothesis that B*35:01 could traffic in Rab11+ compartment and be involved in cross-presentation, which indeed is demonstrated in a series of pulse-chase peptide experiments and using cathepsin inhibitors.

      Overall, the experiments could be improved by performing subcellular fractionation and organelle purification to conclusively demonstrate the differential trafficking of B*08:01 vs B*35:01, as well as quantitative mass spectrometry to determine cytosolic vs endosomal processing for one selected epitope presented by the different haplotypes.

      We thank the reviewer for this suggestion, and agree that this would be a powerful method for further validating differential HLA-B trafficking and antigen processing. Unfortunately, we were unable to perform subcellular fractionation experiments for mass spectrometry, as fractionation protocols require upwards of 10 million cells to obtain endosomal fractions. For our donor samples, we typically obtain 1-2 million moDCs after isolation and differentiation, which greatly limits the types of experiments we can perform with primary cells from specific donors. We considered performing these experiments in a cell line but were concerned that ER as well as endosomal trafficking and processing pathways might differ between cell lines and primary cells, which would necessitate a number of additional studies to validate the use of the cell lines.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a carefully-conducted fMRI study looking at how neural representations in the hippocampus, entorhinal cortex, and ventromedial prefrontal cortex change as a function of local and global spatial learning. Collectively, the results from the study provide valuable additional constraints on our understanding of representational change in the medial temporal lobes and spatial learning. The most notable finding is that representational similarity in the hippocampus post-local-learning (but prior to any global navigation trials) predicts the efficiency of subsequent global navigation.

      Strengths:

      The paper has several strengths. It uses a clever two-phase paradigm that makes it possible to track how participants learn local structure as well as how they piece together global structure based on exposure to local environments. Using this paradigm, the authors show that - after local learning - hippocampal representations of landmarks that appeared within the same local environment show differentiation (i.e., neural similarity is higher for more distant landmarks) but landmarks that appeared in different local environments show the opposite pattern of results (i.e., neural similarity is lower for more distant landmarks); after participants have the opportunity to navigate globally, the latter finding goes away (i.e., neural similarity for landmarks that occurred in different local environments is no longer influenced by the distance between landmarks). Lastly, the authors show that the degree of hippocampal sensitivity to global distance after local-only learning (but before participants have the opportunity to navigate globally) negatively predicts subsequent global navigation efficiency. Taken together, these results meaningfully extend the space of data that can be used to constrain theories of MTL contributions to spatial learning.

      We appreciate Dr. Norman’s generous feedback here along with his other insightful comments. Please see below for a point-by-point response. We note that responses to a number of Dr. Norman’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct Dr. Norman to our responses above under the Essential revisions section.

      Weaknesses:

      General comment 1: The study has an exploratory feel, in the sense that - for the most part - the authors do not set forth specific predictions or hypotheses regarding the results they expected to obtain. When hypotheses are listed, they are phrased in a general way (e.g., "We hypothesized that we would find evidence for both integration and differentiation emerging at the same time points across learning, as participants build local and global representations of the virtual environment", and "We hypothesized that there would be a change in EC and hippocampal pattern similarity for items located on the same track vs. items located on different tracks" - this does not specify what the change will be and whether the change is expected to be different for EC vs. hippocampus). I should emphasize that this is not, unto itself, a weakness of the study, and it appears that the authors have corrected for multiple comparisons (encompassing the range of outcomes explored) throughout the paper. However, at times it was unclear what "denominator" was being used for the multiple comparisons corrections (i.e., what was the full space of analysis options that was being corrected for) - it would be helpful if the authors could specify this more concretely, throughout the paper.

      We appreciate this guidance and the importance of these points. We have taken a number of steps to clarify our hypotheses: we now distinguish a priori predictions from exploratory analyses, and we explicitly indicate throughout the manuscript how we corrected for multiple comparisons. For full details, please see above for our response to Essential Revisions General comment #1.

      General comment 2: Some of the analyses featured prominently in the paper (e.g., interactions between context and scan in EC) did not pass multiple comparisons correction. I think it's fine to include these results in the paper, but it should be made clear whenever they are mentioned that the results were not significant after multiple comparisons correction (e.g., in the discussion, the authors say "learning restructures representations in the hippocampus and in the EC", but in that sentence, they don't mention that the EC results fail to pass multiple comparisons correction).

      Thank you for encouraging greater clarity here. As noted directly above, we now explicitly indicate our a priori predictions, we state explicitly which results survive multiple comparisons correction, and we added necessary caveats for effects that should be interpreted with caution.

      General comment 3: The authors describe the "flat" pattern across the distance 2, 3, and 4 conditions in Figure 4c (post-global navigation) and in Figure 5b (in the "more efficient" group) as indicating integration. However, this flat pattern across 2, 3, and 4 (unto itself) could simply indicate that the region is insensitive to location - is there some other evidence that the authors could bring to bear on the claim that this truly reflects integration? Relatedly, in the discussion, the authors say "the data suggest that, prior to Global Navigation, LEs had integrated only the nearest landmarks located on different tracks (link distance 2)" - what is the basis for this claim? Considered on its own, the fact that similarity was high for link distance 2 does not indicate that integration took place. If the authors cannot get more direct evidence for integration, it might be useful for them to hedge a bit more in how they interpret the results (the finding is still very interesting, regardless of its cause).

      Based on the outcomes of additional behavioral and neural analyses that were helpfully suggested by reviewers, we revised discussion of this aspect of the data. Please see our response above under Essential Revisions General comment #4 for full details of the changes made to the manuscript.

      Reviewer #2 (Public Review):

      This paper presents evidence of neural pattern differentiation (using representational similarity analysis) following extensive experience navigating in virtual reality, building up from individual tracks to an overall environment. The question of how neural patterns are reorganized following novel experiences and learning to integrate across them is a timely and interesting one. The task is carefully designed and the analytic setup is well-motivated. The experimental approach provides a characterization of the development of neural representations with learning across time. The behavioral analyses provide helpful insight into the participants' learning. However, there were some aspects of the conceptual setup and the analyses that I found somewhat difficult to follow. It would also be helpful to provide clearer links between specific predictions and theories of hippocampal function.

      We appreciate the Reviewer’s careful read of our manuscript and their thoughtful guidance for improvement, which we believe strengthened the revised product. We note that responses to a number of the Reviewer’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct the Reviewer to our responses above under the Essential revisions section.

      General comment 1: The motivation in the Introduction builds on the assumption that global representations are dependent on local ones. However, I was not completely sure about the specific predictions or assumptions regarding integration vs. differentiation and their time course in the present experimental design. What would pattern similarity consistent with 'early evidence of global map learning' (p. 7) look like? Fig. 1D was somewhat difficult to understand. The 'state space' representation is only shown in Figure 1 while all subsequent analyses are averaged pairwise correlations. It would be helpful to spell out predictions as they relate to the similarity between same-route vs. different-route neural patterns.

      We appreciate this feedback. An increase in pattern similarity across features that span tracks would indicate the linking of those features together. ‘Early evidence’ here describes the point in experience where participants had traversed local (within-track) paths but had yet to traverse across tracks.

      Figure 1D seeks to communicate the high-level conceptual point about how similarity (abstractly represented as state-space distance) may change in one of two directions as a function of experience.

      General comment 2: The shared landmarks could be used by the participants to infer how the three tracks connected even before they were able to cross between them. It is possible that the more efficient navigators used an explicit encoding strategy to help them build a global map of the world. While I understand the authors' reasoning for excluding the shared landmarks (p. 13), it seems like it could be useful to run an analysis including them as well - one possibility is that they act as 'anchors' and drive the similarity between different tracks early on; another is that they act as 'boundaries' and repel the representations across routes. Assuming that participants crossed over at these landmarks, these seem like particularly salient aspects of the environment.

      We agree that these shared landmarks play an important role in learning the global environment and guiding participants’ navigation. However, they also add confounding elements to the analyses; namely, shared landmarks are located near multiple goal locations and associated with multiple tracks, and transition probabilities differ at shared landmarks because they have an increased number of neighboring landmarks and fractals. In the initial submission, shared landmarks were included in all analyses except (a) global distance models and (b) context models (which compare items located on the same vs different tracks).

      With respect to (a) the global distance models, we ran these models while including shared landmarks and the results did not differ (see figure below and compare to Fig. 5 in the revised manuscript):

      Distance representations in the Global Environment, with shared landmarks included. These data can be compared to Figure 5 of the revised manuscript, which does not include shared landmarks (see page 5 of this response letter).

      We continue to report the results from models excluding shared landmarks due to the confounding factors described above, with the following addition to the Results section:

      “We excluded shared landmarks from this model as they are common to multiple tracks; however, the results do not differ if these landmarks are included in the analysis.”

      With respect to (b) the context analyses (which compare items located on the same vs different tracks), we cannot include shared landmarks in these analyses because they are common amongst multiple tracks and thus confound the analyses. Finally, we are unable to conduct additional analyses investigating shared landmarks specifically (for example, examining how similarity between shared landmarks evolves across learning) due to very low trial counts. We share the Reviewer’s perspective that the role of shared landmarks during the building of map representations promises to provide additional insights and believe this is a promising question for future investigation.

      General comment 3: What were the predictions regarding the fractals vs. landmarks (p. 13)? It makes sense to compare like-to-like, but since both were included in the models it would be helpful to provide predictions regarding their similarity patterns.

      We are grateful for the feedback on how to improve the consistency of results reporting. In the revision, we updated the relevant sections of the manuscript to include results from fractals. Please see our above response to Essential Revisions General comment #5 for additions made to the text.

      General comment 4: The median split into less-efficient and more-efficient groups does not seem to be anticipated in the Introduction and results in a small-N group comparison. Instead, as the authors have a wealth of within-individual data, it might be helpful to model single-trial navigation data in relation to pairwise similarity values for each given pair of landmarks in a mixed-effects model. While there won't be a simple one-to-one mapping and fMRI data are noisy, this approach would afford higher statistical power due to more within-individual observations and would avoid splitting the sample into small subgroups.

      We appreciate this very helpful suggestion. Following this guidance, we removed the median-split analysis and ran a mixed-effects model relating trial-wise navigation data (at the beginning of the Global Navigation Task) to pairwise similarity values for each given pair of landmarks and fractals (Post Local Navigation). We also altered our approach to the across-participant analysis examining brain-behavior relationships. Please see our above response to Essential Revisions General comment #3 for additions to the revised manuscript.

      General comment 5: If I understood correctly, comparing Fig. 4B and Fig. 5B suggests that the relationship between higher link distance and lower representational similarity was driven by less efficient navigators. The performance on average improved over time to more or less the same level as within-track (Fig. 2). Were less efficient navigators particularly inefficient on trials with longer distances? In the context of models of hippocampal function, this suggests that good navigators represented all locations as equidistant while poorer navigators showed representations more consistent with a map - locations that were further apart were more distant in their representational patterns. Perhaps more fine-grained analyses linking neural patterns to behavior would be helpful here.

      Following the above guidance, we removed the median-split analyses when exploring across-participant brain-behavior relationships (see Essential Revisions General comment #3), replacing them with a mixed-effects model analysis, and we revised our discussion of the across-track link distance effects (see Essential Revisions General comment #4). For this reason, we decided against conducting the proposed fine-grained analyses on the median-split data.

      General comment 6: I'm not completely sure how to interpret the functional connectivity analysis between the vmPFC and the hippocampus vs. visual cortex (Fig. 6). The analysis shows that the hippocampus and visual cortex are generally more connected than the vmPFC and visual cortex - but this relationship does not show an experience-dependent relationship and is consistent with resting-state data where the hippocampus tends to cluster into the posterior DMN network.

      We expected to see an experience-dependent relationship between vmPFC and hippocampal pattern similarity, and agree that these findings are difficult to interpret. Based on comments from several reviewers, we removed the second-order similarity analysis from the manuscript in favor of an analysis which models the relationship between vmPFC pattern similarity and hippocampal pattern similarity. Moreover, given the exploratory nature of the vmPFC analyses, and following guidance from Reviewer 1 about the visual cortex control analyses, both were moved to the Appendix. Please see our above response to Essential Revisions General comment #7 for further details of the changes made to the manuscript.

      Reviewer #3 (Public Review):

      Fernandez et al. report results from a multi-day fMRI experiment in which participants learned to locate fractal stimuli along three oval-shaped tracks. The results suggest the concurrent emergence of a local, differentiated within-track representation and a global, integrated cross-track representation. More specifically, the authors report decreases in pattern similarity for stimuli encountered on the same track in the entorhinal cortex and hippocampus relative to a pre-task baseline scan. Intriguingly, following navigation on the individual tracks, but prior to global navigation requiring track-switching, pattern similarity in the hippocampus correlated with link distances between landmark stimuli. This effect was only observed in participants who navigated less efficiently in the global navigation task and was absent after global navigation.

      Overall, the study is of high quality in my view and addresses relevant questions regarding the differentiation and integration of memories and the formation of so-called cognitive maps. The results reported by the authors are interesting and are based upon a well-designed experiment and thorough data analysis using appropriate techniques. A more detailed assessment of strengths and weaknesses can be found below.

      Strengths

      1) The authors address an interesting question at the intersection of memory differentiation and integration. The study is further relevant for researchers interested in the question of how we form cognitive maps of space.

      2) The study is well-designed. In particular, the pre-learning baseline scan and the random-order presentation of stimuli during MR scanning allow the authors to track the emergence of representations in a well-controlled fashion. Further, the authors include an adequate control region and report direct comparisons of their effects against the patterns observed in this control region.

      3) The manuscript is well-written. The introduction provides a good overview of the research field and the discussion does a good job of summarizing the findings of the present study and positioning them in the literature.

      We thank Dr. Bellmund for his positive evaluation of the manuscript. We greatly appreciate the insightful feedback, which we believe strengthened the manuscript’s clarity and potential impact. We note that responses to a number of Dr. Bellmund’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct the Reviewer to our responses above under the Essential revisions section.

      Weaknesses

      General comment 1: Despite these distinct strengths, the present study also has some weaknesses. On the behavioral level, I am wondering about the use of path inefficiency as a metric for global navigation performance. Because it is quantified based on the local response, it conflates the contributions of local and global errors.

      We appreciate this point with respect to path inefficiency during global navigation. As noted below, following Dr. Bellmund’s further insightful guidance, we now complement the path inefficiency analyses with additional metrics of across-track (global) navigation performance, which effectively separate local from global errors (please see below response to Author recommendation #1).

      General comment 2: For the distance-based analysis in the hippocampus, the authors choose to only analyze landmark images and do not include fractal stimuli. There seems to be little reason to expect that distances between the fractal stimuli, on which the memory task was based, would be represented differently relative to distances between the landmarks.

      We are grateful for the feedback on how to improve the consistency of results reporting. In the revision, we updated the relevant sections of the manuscript to include results from fractals. Please see our above response to Essential Revisions General comment #5 for full details.

      General comment 3: Related to the aforementioned analysis, I am wondering why the authors chose the link distance between landmarks as their distance metric for the analysis and why they limit their analysis to pairs of stimuli with distance 1 or 2 and do not include pairs separated by the highest possible distance (3).

      We appreciate the request for clarification here. Beginning with the latter question, we note that the highest possible distance varies between within-track vs. across-track paths. If participants navigate in the Local Navigation Task using the shortest or most efficient path, the highest possible within-track link distance between two stimuli is 2. For this reason, the Local Navigation/within-track analysis includes link distances of 1 and 2. For the Global Navigation analysis, we also include pairs of stimuli with link distances of 3 and 4 when examining across-track landmarks.

      Regarding the use of link distance as the distance metric, we note that the path distance (a.u.) varies only slightly between pairs of stimuli with the same link distance. As such, categorical treatment of link distance accounts for the vast majority of the variance in path distance and thus is a suitable approach. Please note that in the new trial-level brain-behavior analysis included in the revised manuscript (which replaces the median-split analysis), we used the length of the optimal path.
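
      As a rough illustration of link distance as used above (the number of links on the shortest path between two locations), the sketch below runs a breadth-first search on a hypothetical track. The four-location ring, the location names, and the function name are assumptions made for illustration only; they are not the study's actual layout or analysis code.

```python
from collections import deque


def link_distances(adjacency, start):
    """Return the number of links on the shortest path from start to each node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical track: four locations arranged in a ring (illustrative only).
track = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}

from_a = link_distances(track, "A")
max_within_track = max(from_a.values())
```

      On a ring of this size, no location is more than two links from any other when the shortest path is taken, which is the sense in which a within-track maximum of 2 arises under this assumed layout.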

      General comment 4: Surprisingly, the authors report that across-track distances can be observed in the hippocampus after local navigation, but that this effect cannot be detected after global, cross-track navigation. Relatedly, the cross-track distance effect was detected only in the half of participants that performed relatively badly in the cross-track navigation task. In the results and discussion, the authors suggest that the effect of cross-track distances cannot be detected because participants formed a "more fully integrated global map". I do not find this a convincing explanation for why the effect the authors are testing would be absent after global navigation and for why the effect was only present in those participants who navigated less efficiently.

      We appreciate Dr. Bellmund’s input here, which was shared by other reviewers. We revised and clarified the Discussion based on reviewer comments. Please see our above response to Essential Revisions General comment #4 for full details.

      General comment 5: The authors report differences in the hippocampal representational similarity between participants who navigated along inefficient vs. efficient paths. These are based on a median split of the sample, resulting in a comparison of groups including 11 and 10 individuals, respectively. The median split (see e.g. MacCallum et al., Psychological Methods, 2002) and the low sample size mandate cautionary interpretation of the resulting findings about interindividual differences.

      We appreciate the feedback we received from multiple reviewers with respect to the median-split brain-behavior analysis. We replaced the median-split analysis with the following: 1) a mixed-effects model predicting neural pattern similarity Post Local Navigation, with a continuous metric of task performance (each participant’s median path inefficiency for across-track trials in the first four test runs of Global Navigation) and link distance as predictors; and 2) a mixed-effects model relating trial-wise navigation data to pairwise similarity values for each given pair of landmarks and fractals (as suggested by Reviewer 2). Please see our above response to Essential Revisions General comment #3 for additions to the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used GWAS and RNAseq data of TCGA to show a link between telomere length and lung cancer. Authors identified novel susceptibility loci that are associated with lung adenocarcinoma risk. They showed that longer telomeres were associated with being a female nonsmoker and early-stage cancer with a signature of cell proliferation, genome stability, and telomerase activity.

      Major comments:

      1) It is not clear how are the signatures captured by PC2 specific for lung adenocarcinoma compared to other lung subtypes. In other words, why is the association between long telomeres specific to lung adenocarcinoma?

      We thank the reviewer for raising this point (similarly mentioned by reviewer #2). Indeed, it is unclear why genetically predicted LTL appears more relevant to lung adenocarcinoma. We have used a LASSO approach to select important features of PC2 in lung adenocarcinoma and inferred PC2 in lung squamous cell carcinoma tumours to better explore the differences between histological subtypes. The new results are presented in Figure 5 and described in the Methods and Results sections. In addition, we have expanded upon this point in the Discussion with the following paragraph (page 11, lines 229-248):

      ‘An explanation for why long LTL was associated with increased risk of lung cancer might be that individuals with longer telomeres have lower rates of telomere attrition compared to individuals with shorter telomeres. Given a very large population of histologically normal cells, even a very small difference in telomere attrition would change the probability that a given cell is able to escape the telomere-mediated cell death pathways (24). Such inter-individual differences could suffice to explain the modest lung cancer risk observed in our MR analyses. However, it is not clear why longer TL would be more relevant to lung adenocarcinoma compared to other lung cancer subtypes. A suggestion may come from our observation that longer LTL is related to genomically stable lung tumours (such as lung adenocarcinomas in never smokers and tumours with lower proliferation rates) but not genomically unstable lung tumours (such as heavy-smoking-related, highly proliferating lung squamous carcinomas). One possible hypothesis is that histologically normal cells exposed to highly genotoxic compounds, such as tobacco smoke, might require an intrinsic activation of telomere length maintenance at early steps of carcinogenesis that would allow them to survive, and therefore genetic differences in telomere length are less relevant in these cells. By contrast, in more genomically stable lung tumours, where the TL attrition rate is more modest, the hypothesis related to differences in TL may be more relevant, potentially explaining the heterogeneity in genetic effects between lung tumours (Figure 2). Alternatively, we note that the cell of origin may also differ: lung adenocarcinoma is postulated to be mostly derived from alveolar type 2 cells, whereas squamous cell carcinoma is derived from bronchiolar epithelium cells (19), possibly suggesting that LTL might be more relevant to the former.’

      2) The manuscript is lacking specific comparisons of gene expression changes across lung cancer subtypes for identified genes such as telomerase etc since all the data is presented as associations embedded within PCs.

      The genes associated with telomere maintenance, such as TERT and TERC, are expressed at very low levels in these tumours (Barthel et al NG 2017). In this context, no sample has more than 5 normalised read counts by RNA-sequencing for TERT within the TCGA lung cohorts (TCGA-LUSC, TCGA-LUAD). As such, we have not explored differences for individual telomere-related genes. Nevertheless, we have explored an inferred telomerase activity gene signature, developed by Barthel et al, in the context of lung adenocarcinoma tumours. We have added a note in the results section to inform the reader why we did not directly test TERT/TERC expression (page 9, lines 184-187).

      3) It is not clear how novel are the findings given that most of these observations have been made previously i.e. the genetic component of the association between telomere length and cancer.

      Others, including ourselves, have studied TL and lung cancer. We have built on that work using the most up-to-date TL genetic instrument and the largest lung cancer study available. In addition, we provide insights into the possible mechanisms by which telomere length might affect lung adenocarcinoma development. Using colocalisation analyses, we report novel shared genetic loci between telomere length and lung adenocarcinoma (MPHOSPH6, PRPF6, and POLI), genes/loci that have not previously been linked to lung adenocarcinoma susceptibility. For the MPHOSPH6 locus, we showed that the risk allele of rs2303262 (a missense variant annotated for the MPHOSPH6 gene) colocalized with increased lung adenocarcinoma risk, lower lung function (FEV1 and FVC), and increased MPHOSPH6 gene expression in lung, as highlighted in the discussion section of the revised manuscript.

      In addition, we have used a PRS analysis to identify a gene expression component associated with genetically predicted telomere length in lung adenocarcinoma but not in the squamous cell carcinoma subtype. The aspects of this gene expression component associated with longer telomere length are also associated with molecular characteristics related to genome stability (lower accumulation of DNA damage, fewer copy number alterations, and lower proliferation rates), being female, early-stage tumours, and never smokers, which is an interesting but not completely understood lung cancer stratum. As far as we are aware, this is the first time an association has been reported between a PRS related to an etiological factor, such as telomere length, and a particular expression component in the tumour.

      We have adjusted the discussion section of the revised manuscript to further highlight the novel aspects.

      Reviewer #2 (Public Review):

      The manuscript of Penha et al performs genetic correlation, Mendelian randomization (MR), and colocalization studies to determine the role of genetically determined leukocyte telomere length (LTL) and susceptibility to lung cancer. They develop an instrument from the most recent published association of LTL (Codd et al), which here is based on n=144 genetic variants, and the largest association study of lung cancer (including ~29K cases and ~56K controls). They observed no significant genetic correlation between LTL and lung cancer, in MR they observed a strong association that persisted after accounting for smoking status. They performed colocalization to identify a subset of loci where LTL and lung cancer risk coincided, mainly around TERT but also other loci. They also utilized RNA-Seq data from TCGA lung cancer adenocarcinoma, noting that a particular gene expression profile (identified by a PC analysis) seemed to correlate with LTL. This expression component was associated with some additional patient characteristics, genome stability, and telomerase activity.

      In general, most of the MR analysis was performed reasonably (with some suggestions and comments below), it seems that most of this has been performed, and the major observations were made in previous work. That said, the instrument is better powered and some sub-analyses are performed, so adds further robustness to this observation. While perhaps beyond the scope here, the mechanism of why longer LTL is associated with (lung) cancer seems like one of the key observations and mechanistically interesting but nothing is added to the discussion on this point to clarify or refute previous speculations listed in the discussion mentioned here (or in other work they cite).

      Some broad comments:

      1) The observations that lung adenocarcinoma carries the lion's share of risk from LTL (relative to other cancer subtypes) could be interesting but is not particularly highlighted. This could potentially be explored or discussed in more detail. Are there specific aspects of the biology of the substrata that could explain this (or lead to testable hypotheses?)

      We thank the reviewer for these comments. A similar point was raised by reviewer #1. Please see our response above, as well as the additional analysis described in Figure 5 that considers the differences by histological subtype.

      2) Given that LTL is genetically correlated (and MR evidence suggests also possibly causal evidence in some cases) across a range of traits (e.g., adiposity) that may also associate with lung cancer, a larger genetic correlation analysis might be in order, followed by a larger set of multivariable MR (MVMR) beyond smoking as a risk factor. Basically, can the observed relationship be explained by another trait (beyond smoking)? For example, there is previous MR literature on adiposity measures, for example (BMI, WHR, or WHRadjBMI) and telomere length, plus literature on adiposity with lung cancer; furthermore, smoking with BMI. A bit more comprehensive set of MVMR analyses within this space would elevate the significance and interpretation compared to previous literature.

      Indeed, there are important effects related to BMI and lung cancer (Zhou et al., 2021. Doi:10.1002/ijc.33292; Mariosa et al., 2022. Doi: 10.1093/jnci/djac061). We have tested the potential for influence on our finding using MVMR, modelling LTL and BMI using a BMI genetic instrument of 755 SNPs obtained from UKBB (feature code: ukb-b-19953). This multivariable approach did not result in any meaningful changes in the associations between LTL and lung cancer risk.
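
      As a rough illustration of what a two-exposure MVMR estimate involves, the sketch below fits an inverse-variance-weighted regression (no intercept) of SNP-outcome effects on SNP-LTL and SNP-BMI effects. It is only a minimal sketch, not the analysis pipeline used in the study: the function name `mvmr_ivw` and all effect sizes are invented toy values, and the weights assume only the outcome standard errors matter.

```python
# Minimal sketch of multivariable IVW Mendelian randomization with two
# exposures. All numbers are made-up toy values, NOT data from the study.

def mvmr_ivw(beta_x1, beta_x2, beta_y, se_y):
    """Weighted least squares (no intercept) of SNP-outcome effects on two
    sets of SNP-exposure effects, weighting each SNP by 1 / se_y**2.
    Returns (direct effect of exposure 1, direct effect of exposure 2)."""
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, beta_x1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_x2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_x1, beta_x2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_x1, beta_y))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_x2, beta_y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure effects are collinear")
    # Solve the 2x2 normal equations with Cramer's rule.
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Toy per-SNP effects on LTL and BMI (hypothetical).
ltl = [0.10, 0.05, -0.08, 0.12, 0.00]
bmi = [0.00, 0.03, 0.02, -0.01, 0.06]
# Toy outcome effects constructed so the true direct effects are 0.5 and 0.2.
outcome = [0.5 * a + 0.2 * b for a, b in zip(ltl, bmi)]
se = [0.02, 0.03, 0.025, 0.02, 0.03]

b_ltl, b_bmi = mvmr_ivw(ltl, bmi, outcome, se)
print(round(b_ltl, 3), round(b_bmi, 3))
```

      If the LTL estimate from such a joint fit stays close to the univariable estimate, the second exposure does not explain the association, which is the pattern described above.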

      3) In the initial LTL paper, the authors constructed an IV for MR analyses, which appears different than what the authors selected here. For example, Codd et al. proposed an n=130 SNP instrument from their n=193 sentinel variants, after filtering for LD (n=193 >>> n=147) and then for multi-trait association (n=147 >> n=130). I don't think this will fundamentally change the author's result, but the authors may want to confirm robustness to slightly different instrument selection procedures or explain why they favor their approach over the previous one.

      We appreciate the reviewer’s suggestion. Our study is designed within a Mendelian Randomization framework, and we chose to be conservative in the construction of our instrumental variable (IV). We therefore applied more stringent filters to the LTL variants relative to Codd et al’s approach. We applied a wider LD window (10MB vs. 1MB) centered around the LTL variants that were significant at the genome-wide level (p<5e-08), and we restricted our analyses to biallelic common SNPs (MAF>1% and r2<0.01 in the European population from 1000 Genomes). Nevertheless, the LTL genetic instrument based on our study (144 LTL variants) is highly correlated with the PRS based on the 130 variants described by Codd et al. (correlation estimate=0.78, p<2.2e-16). The MR analyses based on the 130-variant LTL instrument described by Codd et al showed similar results to our study.
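
      The filters described above can be sketched as a significance and MAF filter followed by greedy LD clumping. This is an illustrative sketch only: the variant records, the pairwise r2 lookup, and the function name `select_instrument` are hypothetical, and treating the 10 Mb window as a maximum distance between variants is an assumption about how the window was applied.

```python
# Sketch of instrument selection: p-value and MAF filters, then greedy
# LD clumping. Variant records and the r2 lookup are hypothetical.

P_THRESHOLD = 5e-8       # genome-wide significance, as stated in the text
MAF_THRESHOLD = 0.01     # common variants only (MAF > 1%)
R2_THRESHOLD = 0.01      # LD threshold, as stated in the text
WINDOW_BP = 10_000_000   # 10 Mb window (assumed to be a maximum distance)


def select_instrument(variants, r2):
    """variants: list of dicts with keys id, chrom, pos, p, maf.
    r2: dict mapping frozenset({id_a, id_b}) -> pairwise r2 (missing = 0).
    Returns the ids of retained variants, most significant first."""
    candidates = sorted(
        (v for v in variants
         if v["p"] < P_THRESHOLD and v["maf"] > MAF_THRESHOLD),
        key=lambda v: v["p"],
    )
    kept = []
    for v in candidates:
        # Drop v if it is in LD with an already retained, stronger variant.
        in_ld = any(
            k["chrom"] == v["chrom"]
            and abs(k["pos"] - v["pos"]) <= WINDOW_BP
            and r2.get(frozenset((k["id"], v["id"])), 0.0) >= R2_THRESHOLD
            for k in kept
        )
        if not in_ld:
            kept.append(v)
    return [v["id"] for v in kept]


toy_variants = [
    {"id": "snp_a", "chrom": 1, "pos": 1_000_000, "p": 1e-20, "maf": 0.30},
    {"id": "snp_b", "chrom": 1, "pos": 1_500_000, "p": 1e-10, "maf": 0.20},
    {"id": "snp_c", "chrom": 1, "pos": 50_000_000, "p": 1e-9, "maf": 0.10},
    {"id": "snp_d", "chrom": 2, "pos": 1_000_000, "p": 1e-6, "maf": 0.25},
    {"id": "snp_e", "chrom": 2, "pos": 2_000_000, "p": 1e-12, "maf": 0.005},
]
toy_r2 = {frozenset(("snp_a", "snp_b")): 0.4}

selected = select_instrument(toy_variants, toy_r2)
print(selected)
```

      In this toy input, snp_b is removed because it is in LD with the stronger snp_a, snp_d fails the significance filter, and snp_e fails the MAF filter.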

      4) Colocalization analysis suggests that a /subset/ of LTL signals map onto lung cancer signals. Does this mean that the MR relationships are driven entirely by this small subset, or is there evidence (polygenic) from other loci? Rather than do a "leave one out" the authors could stratify their instrument into "coloc +ve / coloc -ve" and redo the MR analyses.

      Mainly here, the goal is to interpret if the subset of signals at the top (looks like n=14, the bump of non-trivial PP4 > 0.6, say) which map predominantly to TERT, TERC, and OBFC1 explain the observed effect here. I.e., it is biology around these specific mechanisms or generally LTL (polygenicity) but exemplified by extreme examples (TERT, etc.). I appreciate that statistical power is a consideration to keep in mind with interpretation.

      We appreciate the reviewer’s comment and, indeed, we considered this idea. However, the analytical approach used the lung cancer GWAS to identify variants that colocalise. To validate the hypothesis that a subset of colocalised variants drives all the MR associations, we would need an independent lung cancer case-control study to act as an out-of-sample validation set. This is not available to us at this point. Nevertheless, we have slightly reworded the discussion to highlight that the colocalised loci tend to be near genes related to telomere length biology, and we are also exploring the colocalisation approach to select variants for PRS analysis elsewhere.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to answer the standing mystery of an origin of a unique and complex system that is hagfish slime. They formulated a cogent scenario for the co-option of epidermal thread cells and mucous cells into slime and slime glands. Both histology and EM images back this up. It is a delight to see detailed and careful morphological analysis of both the cells and the secretion. The weakness of the manuscript lies in: a) the absence of an alternative hypothesis (therefore the lacking sense of hypothesis testing); and b) oversimplification and insufficient description of results in transcriptomic and phylogenetic comparison.

      These are both key elements of the narrative. Because all the data "support" the only scenario considered in this paper, it could risk giving the impression of a just-so story. My reading of the results of their transcriptomic and phylogenetic analyses is more nuanced than explained in the paper. For example, the authors didn't explain in sufficient detail how the data summary in Fig. 5 "demonstrate" that the epidermal thread cells are "ancestral", and that the diversity of alpha and gamma thread biopolymer genes is a prerequisite to slime (without a functional analysis), or that the gene duplication events facilitated the origin of hagfish slime.

      Thank you for these thoughtful comments.

      We have made extensive changes to address the two issues raised by the reviewer. For the first one, we added discussion of an alternative hypothesis, namely a cloacal origin of hagfish slime glands (see Line 369). For the second, we added new transcriptomic data from a second species (E. stoutii), and provided more detailed phylogenetic analyses and explanations. Details are provided below and can be seen in the revised manuscript.

      Reviewer #2 (Public Review):

      The study is a careful investigation of the physical properties of hagfish slime and the underlying cellular framework that enables this extraordinary evolutionary innovation. I appreciate the careful and detailed measurements and images that the authors provide. The results presented here will surely be extremely important for researchers working on this particular organism and those interested in understanding the evolution, biomedical relevance, and biochemistry of mucus. However, I had difficulty contextualizing the findings in broader biological questions (e.g., the evolution of functional novelty, the adaptive processes, and the links between genetic and phenotypic evolution). I also think that the conclusions on the evolutionary origins and underlying genetics of hagfish slime based on comparative transcriptomic data may be premature.

      Thank you for the thoughtful comments. In this revision, we have rewritten several sections and reorganized the Introduction for clearer readability. Also, we added discussion of an alternative hypothesis that the slime glands might be derived from cloacal glands (see Discussion, Line 369). Further, we provided more detailed transcriptomic data and phylogenetic analyses, along with enriched interpretations, to address the evolution of thread genes.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript aims to provide a comprehensive insight into the development of the tuberal hypothalamus of the chick by carefully analyzing the expression patterns of a plethora of proteins involved and perturbation of BMP signaling.

      Strengths:

      This manuscript presents the results of an in-depth analysis aimed to unravel the expression of a variety of transcription factors, and the role of signaling molecules, in particular BMP, SHH and Notch, and the role of BMP for the development of the tuberal hypothalamus. For this, the authors applied a variety of methods, including in-situ RNA hybridizations to chick embryos, fate mapping, explant cultures, and loss- and gain-of-function studies in embryos, complemented by carefully mining previously performed scRNA-Seq data. From the data they derive a model, which explains the dynamic changes of expression of signaling molecules and transcription factors from anterior to posterior during chick development. In addition, they show that fate specification and growth occur concomitantly. Overall, the data provide a plethora of information on expression patterns and consequences of BMP signaling perturbation, which will be valuable for scientists interested in the events taking place during the development of the chick tuberal hypothalamus.

      We thank the reviewer for recognising the value of this study for development of the chick tuberal hypothalamus.

      Weaknesses:

      The plethora of data presented makes it very difficult for a reader, who is not familiar with this system, to follow the major conclusions from each of the panels. This difficulty is enhanced by the lack of a concise, simple and focused summary at the end of most chapters, which, from my point of view, still contains too many details. Similarly, the discussion too often refers to details presented in the figures of the Results section, rather than giving a broader and focused summary and pointing out to novel conclusions.

      We have extensively revised the manuscript, to ensure that it is easier to follow and is less detailed. We have tightened and shortened the Introduction, without losing content or context. We have revised the narrative in the Results section, to reflect revisions to figures (detailed below and in response to Reviewer 2 comments), cut back on detail, and summarised each section. We have streamlined the Discussion, so that the broader points and novel conclusions are more prominent.

      Revisions to figures are as follows:

      1. Several main Figures and associated Supplementary Figures have been rearranged so that the text and figures are easier to follow. The rearrangements mean that the reader can follow critical conceptual points without having to jump from main to supplementary figures. Key rearrangements have been made between Figure 1 and Figure 1-figure supplement 1; Figure 2 and Figure 2-figure supplement 1; Figure 2 and Figure 2-figure supplement 2; Figure 6 and Figure 6 supplement 1.

      2. Throughout the manuscript, we have added new images/replaced previous images in cases where key points were not coming across clearly (see Reviewer 2 comments). New data is shown in Figures 1F, G, T-T”; Figures 2G-P’; Figure 2-figure supplement 1 (panels A and E); Figure 2-figure supplement 2 (panels B, E-G; Q-T).

      3. Throughout the manuscript, we have improved the schematics, making it easier to follow key domains and, separately, gene expression patterns.

      4. Finally, in light of the comment on the plethora of data, detail and the overall difficulty in following the manuscript, we have removed in situ data that was not needed for our central arguments (previous panels 1F-J and 1R-T).

      I also suggest that the authors check the Materials and Methods section, which does not always contain the information required. For example, in the chapter on "Chicken HCR": I guess they used the HCR IHC kit from Molecular Instruments? What kind of "modification" of the Molecular Instruments protocol did they introduce?

      We have revised the Materials and Methods section as required. We followed the Molecular Instruments HCRv3-Chicken protocol, but included a methanol dehydration step, which we have now added.

    1. Author Response:

      Reviewer #1 (Public Review):

      There is growing precedent for the utility of GWAS-type analyses in elucidating otherwise cryptic genotypic associations with specific Mtb phenotypes, most commonly drug resistance. This study represents the latest instalment of this type of approach, utilizing a large set of WGS data from clinical Mtb isolates and refining the search for DR-associated alleles by restricting the set to those predicted (or known) to be phenotypically DR. This revealed a number of potential candidate mutations, including some in nucleotide excision repair (uvrA, uvrB), in base excision repair (mutY), and homologous recombination (recF). In validating these leads using functional assays, the authors present evidence supporting the impact of the identified mutations on antibiotic susceptibility in vitro and in macrophage and animal infection models. These results extend the number of candidate mutations associated with Mtb drug resistance; however, the following must be considered:

      (i) The GWAS analysis is the basis of this study, yet the description of the approach used and presentation of results obtained is occasionally obscure; for example, the authors report the use of known drug resistance phenotypes (where available) or inferences of drug-resistance from genotypic data to enhance the potential to identify other mutations that might be implicated in enabling the DR mutations, yet their list of known DR mutations seem to be predominantly rare or unusual mutations, not those commonly associated with clinical DR-TB. In addition, the distribution of the identified resistance-associated mutations across the different lineages need to be explained more clearly.

      In the revised manuscript, we have performed a phylogenetic analysis of the strains used. A phylogenetic tree was generated using Mycobacterium canettii as an outgroup (Figure 1b). The phylogenetic analysis shows that the strains cluster in lineages 1, 2, 3, and 4. Lineages 2, 3, and 4 cluster together, and lineage 1 is monophyletic, as reported previously. The genome sequence data of 2773 clinical strains were downloaded from NCBI. These strains were also part of the GWAS analyses performed by Coll et al. (https://pubmed.ncbi.nlm.nih.gov/29358649/) and Manson et al. (https://pubmed.ncbi.nlm.nih.gov/28092681/). The phenotypes of the strains used for the association analysis were reported in these previous studies; we have not performed other predictions. The supplementary tables provide the lineage origin of each strain used in the study (Supplementary File 1 & 2). The distributions of resistance-associated mutations in different strains are shown (Figure 2-figure supplement 6a-h). As suggested, we have performed an analysis wherein we looked for direct target mutations in the strains that harbor mutations in the DNA repair genes (Figure 2-figure supplement 6i-k).

      We identified mostly rare mutations for the following reasons:

      1. We looked for the mutations that were present only in the multidrug resistant strains as compared to the susceptible strains for association mapping. This strategy exclusively gave most variants associated with multidrug resistant phenotype.

      2. We have used a Mixed Linear Model (MLM) for the association analysis. The MLM removes all the population-specific SNPs based on PCA and kinship corrections. The false discovery rate (FDR) adjusted p-values in the GAPIT software are stringent, as they correct the effects of each marker based on the population structure (Q) as well as kinship (K) values. Therefore, the probability of identifying a false-positive SNP is very low. We combined this with Bonferroni corrections to identify markers associated with the drug-resistant phenotype.
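
      To make the two multiple-testing corrections mentioned above concrete, the sketch below adjusts a list of marker p-values with the Bonferroni method and with a Benjamini-Hochberg FDR procedure. It assumes the FDR adjustment is of the Benjamini-Hochberg type and does not reproduce GAPIT's own code; the function names and p-values are invented for illustration.

```python
# Sketch of Bonferroni and Benjamini-Hochberg adjustments on toy p-values.
# The p-values below are invented and are not results from the study.

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(toy_p)
bh = benjamini_hochberg(toy_p)
print(bonf)
print(bh)
```

      Bonferroni-adjusted values are never smaller than the BH-adjusted ones, so requiring markers to pass both filters is at least as strict as the Bonferroni filter alone.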

      (ii) By combining target gene deletions with different complementation alleles, the authors provide compelling microbiological evidence supporting the inferred role of the mutY and uvrB mutations in enhanced survival under antibiotic treatment. The experimental work, however, is limited to assessments of competitive survival in various models, with/without antibiotic selection, or to mutant frequency analyses; there is no direct evidence provided in support of the proposed mechanism.

      To ascertain whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions before library preparation. Only those SNPs present in >20% of reads were retained for the analysis. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q strains grown in vitro did not show any mutations in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. The data indicated that in the absence of antibiotics, no direct target mutations were present in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY and RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB, of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvΔmutY::mutY, suggesting that perturbed DNA repair aids in acquiring drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).
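
      The variant filtering described above (keep SNPs supported by more than 20% of reads in the pooled sample, then discount SNPs already present in the laboratory parent strain) can be sketched as follows. This is a hedged illustration rather than the pipeline used in the study: the record layout, the function name `filter_snps`, the gene labels, and the read counts are all hypothetical.

```python
# Sketch of read-fraction filtering for pooled-colony WGS variant calls.
# Gene labels and read counts are hypothetical placeholders.

MIN_ALLELE_FRACTION = 0.20  # ">20% of reads", as stated in the text


def filter_snps(calls, baseline):
    """calls: list of dicts with keys gene, change, alt_reads, total_reads.
    baseline: set of (gene, change) pairs already present in the lab parent.
    Returns (gene, change) pairs above the read-fraction cut-off that are
    not part of the baseline."""
    kept = []
    for c in calls:
        if c["total_reads"] == 0:
            continue  # no coverage, nothing to evaluate
        fraction = c["alt_reads"] / c["total_reads"]
        key = (c["gene"], c["change"])
        if fraction > MIN_ALLELE_FRACTION and key not in baseline:
            kept.append(key)
    return kept


toy_calls = [
    {"gene": "geneA", "change": "A10G", "alt_reads": 45, "total_reads": 100},
    {"gene": "geneB", "change": "C20T", "alt_reads": 10, "total_reads": 100},
    {"gene": "geneC", "change": "G30A", "alt_reads": 90, "total_reads": 100},
]
# Variants already carried by the (hypothetical) laboratory parent strain.
toy_baseline = {("geneC", "G30A")}

retained = filter_snps(toy_calls, toy_baseline)
print(retained)
```

      With ten colonies pooled in equal proportions, a 20% cut-off corresponds roughly to a variant carried by at least two or three of the colonies, assuming even coverage across the pool.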

      To determine whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv and RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct target mutations; this may be because the guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

      (iii) The low drug concentrations used (especially of rifampicin against M. smegmatis) suggest the identified mutations confer low-level resistance to multiple antimycobacterial agents - in turn implying tolerance rather than resistance. If correct, it would be interesting to know how broadly tolerant strains containing these mutations are; that is, whether susceptibility is decreased to a broad range of antibiotics with different mechanisms of action (including both cidal and static agents), and whether the extent of the decrease be determined quantitatively (for example, as change in MIC value).

      To evaluate the effect of different drugs on the survival of RvΔmutY or RvΔmutY::mutY-R262Q, we performed killing kinetics in the presence and absence of isoniazid, rifampicin, ciprofloxacin, and ethambutol (Figure 4a). In the absence of antibiotics, the growth kinetics of Rv, RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q were similar (Figure 4b). In the presence of isoniazid, a ~2 log-fold decrease in bacterial survival was observed on day 3 in Rv and RvΔmutY::mutY; however, in RvΔmutY and RvΔmutY::mutY-R262Q, the decrease was limited to ~1.5 log-fold (Figure 4c). A similar trend was apparent on days 6 and 9, suggesting a ~5-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4c). Interestingly, in the presence of ethambutol, we did not observe any significant difference (Figure 4d). In the presence of rifampicin and ciprofloxacin, we observed a ~10-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4e-f). Thus, the results suggest that the absence of mutY or the presence of the mutY variant aids in subverting antibiotic stress.

      Reviewer #2 (Public Review):

      This interesting manuscript uses a collection of whole genome sequences of TB isolates to associate specific sequence polymorphisms with MDR/XDR strains, and having found certain mutations in DNA repair pathways, does a detailed analysis of several mutations. The evaluation of the MutY polymorphism reveals it is loss of function and TB strains carrying this mutation have a higher mutation frequency and enhanced survival in serial passage in macrophages. The strengths of the manuscript are the leveraging of a large sequence dataset to derive interesting candidate mutations in DNA repair pathway and the demonstration that at least one of these mutations has a detectable effect on mutagenicity and pathogenesis. The weaknesses of the manuscript are a lack of experimental exploration of the mechanism by which loss of a DNA repair pathway would enhance survival in vivo. The model presented is that these phenotypes are due to hypermutagenicity and thereby evolution of enhanced pathogenesis, but this is not actually directly tested or investigated. There are also some technical concerns for some of the experimental data which can be strengthened.

      This paper presents the following data:

      • Analyzed whole-genome sequences 2773 clinical strains: 160 000 SNPs identified
      • 1815 drug-susceptible/422 MDR/XDR strains: 188 mutations correlated with Drug resistance.
      • Novel mutations associated with the drug resistance have been found in base excision repair (BER), nucleotide excision repair (NER), and homologous recombination (HR) pathway genes (mutY, uvrA, uvrB, and recF).
      • Specific mutations mutY-R262Q and uvrB-A524V were studied.
      • mutY-R262Q and uvrB-A524V mutations behave as loss of function alleles in vivo, as measured by non-complementation of the increased mutation frequency measured by resistance to Rif and INH.
      • The mutY deletion and the mutY-R262Q mutation increase Mtb survival over WT in macrophages when Mtb has not been submitted to previous rounds of macrophage infection.
      • This advantage is exacerbated in presence of antibiotic (Rif and Cipro but not INH).
      • The MutY deletion and the MutY-R262Q mutation result in an enhanced survival of Mtb during guinea pig infection.

      Major issues:

      The finding that mutations in MutY confers an advantage during macrophage infection is convincing based on the macrophage experiments, but it is premature to conclude that the mechanism of this effect is due to hypermutagenesis and selection of fitter bacterial clones. It is described in E. coli (Foti et al., 2012) and recently in mycobacteria (Dupuy et al., 2020) that the MutY/MutM excision pathways can increase the lethality of antibiotic treatment because of double-strand breaks caused by Adenine/oxoG excisions. The higher survival of the mutY mutant during antibiotic treatment could more be due to lower Adenine/oxoG excision in the mutant rather than acquisition of advantageous mutations, or some other mechanism. The same hypothesis cannot be excluded for the Guinea pig experiments (no antibiotics, but oxidative stress mediated by host defenses could also increase oxoG) and should at least be discussed. Experiments that would support the idea that the in vivo advantage is due to hypermutagenesis would be whole genome sequencing of the output vs input populations to directly document increased mutagenesis. Similarly, is the ΔmutY survival advantage after rounds of macrophage infections dependent on macrophage environment? What happens if the ΔmutY strain is cultivated in vitro in 7H9 (same number of generations) before infecting macrophages?

      We thank the reviewer for the insightful comments. To ascertain whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions prior to library preparation. For the analysis, only those SNPs that were present in >20% of reads were retained. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q strains grown in vitro did not show any mutations in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. The data indicated that in the absence of antibiotics, no direct target mutations were present in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY and RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB, of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvΔmutY::mutY, suggesting that perturbed DNA repair aids in acquiring drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).

      To determine whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv and RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct target mutations; this may be because the guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

      • It would be useful to present more data about the strain relatedness and genome characteristics of the DNA repair mutant strains in the GWAS. For example, the model would suggest that strains carrying DNA repair mutations should have higher SNP load than control strains. Additionally, it would be helpful to know whether the identified DNA repair pathway mutations are from epidemiologically linked strains in the collection to deduce whether these events are arising repeatedly or are a founder effect of a single mutant since for each mutation, the number of strains is small.

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine the additional polymorphisms. The number of SNPs in the strains harboring DNA repair mutations and in the drug-susceptible strains appears to be similar. The marginal differences, if any, were not statistically significant.

      We agree with the reviewer that these strains might be epidemiologically linked. In the present study, all the strains harboring a mutation in mutY belong to lineage 4. We observed that all the mutY mutation-containing strains were either MDR or pre-XDR, compared with drug-susceptible strains of the same clade.

      • Some of the mutation frequency, survival and competition data could be strengthened by more experimental replicates. Data Lines 370-372 (mutation frequency), lines 387-388 (Survival of strains ex vivo), line 394 (competition experiment) : "Two biologically independent experiments were performed. Each experiment was performed in technical triplicates. Data represent one of the two biological experiments." Two biological replicates is insufficient for the phenotypes presented and all replicates should be included in the analysis. In addition, the definition of "technical triplicates" should be given, does this mean the same culture sampled in triplicate?

      We thank the reviewer for the comment. We performed at least two independent experiments with biological triplicates (not technical triplicates). We apologize for writing this incorrectly. We have reported data from one independent experiment consisting of at least biological triplicates. For mutation rate analysis, we performed the experiment using six independent colonies. These points are mentioned in the methods and legends of the revised manuscript.

      • MutY phenotypes. One caveat to the conclusion that the MutY R262Q mutant is nonfunctional is the lack of examination of the expression of the complementing protein. It would be informative to comment on the location of this mutation in relation to the known structures of MutY proteins. Similarly, for the UvrB polymorphism, this null strain has a clear UV sensitivity phenotype in the literature, so a fuller interrogation for UV killing would be informative re: the A524V mutation.

      We have now included the western blot data on both complementation strains (Figure 3-figure supplement 1). We agree with the reviewer that the uvrB null mutant may have a UV sensitivity phenotype, but we have not performed this experiment in the present study.

      Reviewer #3 (Public Review):

      STRENGTHS

      • This ambitious study is broad in scope, beginning with a bacterial GWAS study and extending all the way to in vivo guinea pig infection models.

      • Numerous reports have attempted to identify Mtb strains with elevated mutation rates, and the results are conflicting. The present study sets out to thoroughly evaluate one such mutation that may produce a mutator phenotype, mutY-Arg262Gln.

      WEAKNESSES

      • While the authors' follow-up experiments with the mutY-Arg262Gln allele are all consistent with the conclusion that this mutation elevates the mutation rate in Mtb and thus could promote the evolution of drug resistance, further work is needed to unambiguously demonstrate this link.

      • The authors highlight five mutations in genes associated with DNA replication and/or repair from their GWAS analysis:

      o dnaA-Arg233Gln: as the authors note in the Discussion, Hicks et al. associate SNPs in dnaA with low-level isoniazid resistance, as a result of lowered katG expression. Since this is unrelated to their focus on DNA repair genes whose mutation could elevate mutation rates, I would consider removing this allele from the Table.

      As suggested, we have removed the dnaA from Table 3.

      o mutY-Arg262Gln: querying publicly available whole genome sequences of clinical Mtb isolates, this SNP appears to be restricted to lineage 4.3 (L4.3). All of these L4.3 strains appear to be drug-resistant. How many times did the mutY-Arg262Gln mutation evolve in the authors' dataset? If there is evidence of homoplastic evolution, this would strengthen their case. If not, it doesn't mean the authors' findings are incorrect, but it does elevate the risk that this mutation could be a passenger (i.e. not driver) mutation. To address this, the authors could attempt to date when the mutY-Arg262Gln arose. If it was before the evolution of drug-resistance conferring alleles in these L4.3 strains, that is consistent with (but not proof of) a driver mutation. If mutY-Arg262Gln arose after, this is much more consistent with a passenger mutation.

      As pointed out by the reviewer, the mutY-Arg262Gln mutation is restricted to lineage 4. We have checked the mutY gene sequence from the strains harboring the mutY-Arg262Gln mutation and from sensitive strains of the same clade. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis. To ascertain whether it is a passenger or a driver mutation, we have performed multiple experiments, which suggest that the identified mutation aids in the acquisition of drug resistance.

      o uvrB-Ala524Val: curiously we don't see this SNP in our dataset of publicly available whole genome sequences of clinical Mtb isolates (~45,000 genomes).

      We have rechecked this SNP in our dataset. This SNP was present in 87 drug-resistant strains that belong to lineage 2.

      o uvrA-Gln135Lys: this SNP also appears to be restricted to lineage 4.3. Same question as for mutY-Arg262Gln.

      As pointed out by the reviewer, the uvrA-Gln135Lys mutation is restricted to lineage 4. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis.

      o recF-Gly269Gly: this is a very common mutation, is it unique to lineage 2.2.1? Same question as for mutY-Arg262Gln.

      The recF-Gly269Gly mutation was present in the lineage 2 strains. Here also, we identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis.

      • The CRYPTIC consortium recently published a number of preprints on biorxiv detailing very large GWAS studies in Mtb. Did any of these reports also associate drug resistance with mutY? If yes, this should be stated. If not, the potential reasons for this discrepancy should be discussed.

      We have checked the recently published CRYPTIC consortium article (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001721#sec012) for mutY-Arg262Gln. We did not find the mutY-Arg262Gln mutation in their analysis; this is due to the different strains used in the study. However, we identified the recF-Gly269Gly mutation in their dataset.

      • Based on the authors' follow-up studies in vivo, MutY-Arg262Gln is presumed to be a loss-of-function allele. If the authors could convincingly demonstrate this biochemically with recombinant proteins, this would significantly strengthen their case.

      Experiments performed in Msm and Mtb mutant strains suggest that the MutY variant is a loss-of-function allele. We have not performed in vitro assays to confirm this.

      • If the authors are correct and mutY-Arg262Gln strains have elevated mutation rates, presumably there would be evidence of this in the clinical strain sequencing data. Do mutY-Arg262Gln containing strains have elevated C→G or C→A mutations in their genomes? Presumably such strains would also have a higher number of SNPs than closely related strains WT for mutY- is this the case?

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine the additional polymorphisms. The number of SNPs appears to be higher in the strains harboring DNA repair mutations than in the drug-susceptible strains. We have also looked for C→T and C→G mutations in the same strains. C→T mutations are higher in the strains harboring the mutY variant compared with the susceptible strains (Figure 2-figure supplement 6 l). However, we could not perform statistical analysis as the number of strains that harbor the mutY variant is limited to 8. Thus, the data suggest that, empirically, the strains harboring the mutY variant show more SNPs elsewhere and more C→T mutations. We are not stating these conclusions strongly in the manuscript as the data are not statistically significant.
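
      As an illustration of how substitution classes such as C→T or C→G can be tallied from a list of SNP calls, a minimal sketch follows. It is not the analysis performed in the study; the function names are hypothetical, and folding purine-reference changes onto their pyrimidine complement (so that G→A is counted with C→T) is a common convention that is assumed here.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the strand-collapsed substitution class, e.g. 'C>T'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Report every change relative to a pyrimidine (C or T) reference base.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(snps):
    """Count substitution classes over an iterable of (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snps)
```

      Comparing such counts between mutY-variant strains and closely related susceptible strains gives a per-class view of the spectrum, although with only eight variant strains any difference remains descriptive.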

      • While more work, mutation rates as measured by Luria-Delbruck fluctuation analysis are more accurate than mutation frequencies. I would recommend repeating key experiments by Luria-Delbruck fluctuation analysis. It is also important to report both drug-resistant colony counts and total CFU in these sorts of experiments. Given the clumpy nature of mycobacteria, mutation rates can appear to be artificially elevated due to low total CFU and not an increase in the number of drug-resistant colonies.

      As suggested, we determined the mutation rate in the presence of isoniazid, rifampicin, and ciprofloxacin (Figure 3g-j). The fold increase in the mutation rate relative to Rv for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q was 2.90, 0.76, and 3.0 in the presence of isoniazid; 5.62, 1.13, and 5.10 in the presence of rifampicin; and 9.14, 1.57, and 8.71 in the presence of ciprofloxacin, respectively (Figure 3).
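
      For readers unfamiliar with fluctuation analysis, the sketch below shows one simple estimator, the p0 method, together with a fold-change calculation relative to a reference strain. Which estimator was used in the study is not stated in this response, so the choice of the p0 method, the function names, and the convention of dividing by the final cell number per culture are all assumptions made for illustration.

```python
import math


def p0_mutation_rate(mutant_counts, cells_per_culture):
    """Estimate a mutation rate from parallel cultures by the p0 method.

    mutant_counts: resistant colony counts, one per independent culture.
    cells_per_culture: total viable cells (CFU) per culture at plating.
    """
    n = len(mutant_counts)
    if n == 0:
        raise ValueError("at least one culture is required")
    zero = sum(1 for count in mutant_counts if count == 0)
    if zero == 0 or zero == n:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    p0 = zero / n              # fraction of cultures with no resistant colonies
    m = -math.log(p0)          # expected number of mutation events per culture
    return m / cells_per_culture


def fold_change(rate, reference_rate):
    """Fold increase of a strain's mutation rate relative to a reference."""
    return rate / reference_rate
```

      Because the estimate is divided by total CFU, reporting both resistant colony counts and total CFU, as the reviewer requests, makes it possible to check that an apparent increase is not driven by a low denominator.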

      • Figure 4 would appear to be measuring drug tolerance, not resistance? Are the elevated CFU in the presence of drugs in the mutY-Arg262Gln strain due to an increase in the number of drug-resistant strains or drug-sensitive strains? This could be assessed by quantifying resulting CFU in the presence or absence of the indicated drugs.

      To ascertain whether the better survival is due to the acquisition of mutations in the direct targets of antibiotics or to drug tolerance, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportion prior to library preparation. Only those SNPs present in >20% of reads were retained for the analysis. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q strains grown in vitro did not show the presence of a mutation in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. Data indicated that in the absence of antibiotics, no direct target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY and RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvΔmutY::mutY, suggesting that the perturbed DNA repair aids in acquiring the drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).

      To determine if the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. Analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv and RvΔmutY::mutY. Besides, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 2-figure supplement 6). However, we did not observe any direct target mutations; this may be because guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an interesting article that uses the power of Drosophila to explore how organisms work with their symbionts to adapt to a changing environment. The authors show that reducing some nonessential amino acids that cannot be produced by the "symbiont" Lactobacillus can nevertheless be rescued by the presence of this bacterium. They suggest it is not through provisioning from the bacteria. Using genetic screens in the bacteria, they find four bacterial strains that have a reduced ability to restore the delay. They then show that the mutants have transposon insertions in r/tRNA loci and reduced rRNA levels. These mutants and a newly generated deletion allele show similar phenotypes due to imbalance, although these are very modest (~1 day change). Experiments next demonstrate that colonization with Lp leads to induction of an ATF4 reporter independent of diet, but that colonization with the mutant Lp leads to reduced activation on a balanced diet but not on an imbalanced diet. This was also the case for a mutant identified in the screen. Next the authors explore the role of enterocyte GCN2. They show that there are selective requirements for GCN2 depending on the diet and aa imbalance. This is very complicated, as the depletion of GCN2 by one allele does not impact GF pupation on an imbalanced diet, whereas it does for other alleles. And they find that this activity is independent of ATF4 and 4EBP, two known members of the pathway.

      Major strengths include the screen for bacterial mutants and demonstration that depletion of specific amino acids have specific dependencies (both bacterial and host). However, there is a disconnect between the bacterial mutants and the host physiology. How do the mutants impact host biology? Is it through an RNA signal? If so how does this get sensed? Is GCN2 involved, and if so by what mechanism?

      We thank the reviewer for his/her evaluation. The connection between the L. plantarum (Lp) mutants and host physiology is mostly established by the following observations:

      1) bacterial mutants for r/tRNAs failed to activate GCN2 to the same extent as WT bacteria. Although the difference on imbalanced diet is not significant (p-value=0.069, new Fig. 5A-B), there is a trend towards a decreased activation with the r/tRNA deletion mutant. We also observed this trend with the r/tRNA insertion mutant (new Fig. S4A-B). This decrease reached statistical significance when we performed short-term association (new Fig. S4E-F) or on balanced diet (new Fig. 5C-D and new Fig. S4C-D).

      2) providing tRNAs to larvae supports activation of GCN2 in enterocytes (new Fig. 5E-F).

      3) knock-down of GCN2 in enterocytes using RNAi triggers a growth delay in larvae (new Fig. 6A, new Fig. S5A-B).

      4) when we knocked down GCN2 using RNAi, we did not observe any difference between the growth of larvae associated with Lp WT and the r/tRNA mutant (new Fig. 6H-I).

      We believe these results strongly indicate that the phenotype of delayed growth upon association with r/tRNA mutant relies at least partly on a decreased GCN2 activation in enterocytes. Given the mechanism of activation of GCN2 (GCN2 is activated by structured RNA such as tRNAs or rRNAs) we propose that GCN2 is a sensor of bacterial r/tRNAs. This is supported by our new finding that Lp produces extracellular vesicles containing r/tRNAs (new Fig. 3). However, we agree that this point remains speculative. We amended our Abstract and Discussion accordingly (L30, L924-929) to clarify that direct activation of GCN2 by Lp’s r/tRNAs remains speculative.

      Reviewer #2 (Public Review):

      This manuscript investigates an intriguing observation, the data are strong, and the manuscript is clearly written. The authors very convincingly demonstrate that regions of the chromosome that encode L. plantarum tRNAs are also necessary for activation of D. melanogaster GCN2 and accelerated development in the setting of AA imbalance and that this effect on development is dependent on GCN2. They further provide transcriptomic data that broaden our understanding of the host intestinal response to L. plantarum in the setting of AA imbalance. In other host-microbe interactions such as the squid-Vibrio fischeri symbiosis, the bacterial RNA has been visualized in host cells, suggesting transport. Here, experimental data demonstrating bacterial RNA in host cells are lacking, and direct interaction of GCN2 with prokaryotic tRNAs is hypothesized but not proven. As a result, the basis of the observed effect of bacterial tRNAs remains vague. Open questions such as how/if the bacterial tRNA enters the host enterocytes, whether these interact with GCN2, and whether other bacterial products are required for the response remain to be answered.

      We thank the reviewer for his/her interest in our work. Association with LpΔopr/tRNA leads to reduced activation of GCN2 in enterocytes, and tRNA feeding activates GCN2. Given the mechanism of activation of GCN2, we speculate that tRNAs produced by Lp directly interact with GCN2 in enterocytes. We have added new data showing that Lp produces extracellular vesicles, and that these vesicles contain r/tRNAs (new Fig. 3). Since extracellular vesicles can transport molecules from bacteria to hosts (Brown et al. 2015), this observation supports our model: enterocytes may acquire Lp’s r/tRNAs from extracellular vesicles.

      Reviewer #3 (Public Review):

      The strength of this study relies on the use of a chemically well-defined diet of the host and of the identification of Lp mutants that fail to rescue the noxious effects of an imbalanced amino-acid regimen. Thus, the genetic approach in both host and symbiont is a major asset of this study. The results are surprising as an imbalance of one essential amino-acid in the diet, valine, can nevertheless be compensated by Lp, even though it is itself unable to synthesize this amino-acid. The experiments are well-conducted and conclusions are appropriate.

      We thank the reviewer for his/her kind words and for his/her interest in our work.

      This study however does not identify how GCN2 promotes growth in this context. There is just a descriptive transcriptomics approach that is however not validated at the functional level (and also not by RTqPCR experiments) as it does not provide obvious leads beyond a Gene Ontology exploitation of the data.

      To answer the reviewer’s questions, we have further characterized one hit from our RNAseq analysis: Lp association causes down-regulation of the growth repressor fezzik. We show that fezzik knock-down in enterocytes improves larval growth, which suggests that Lp improves growth partly through GCN2-dependent, r/tRNA-dependent repression of fezzik expression (new Fig. 8 and new Fig. S8).

      The authors propose that Lp promotes a more thorough absorption of valine, a possibility that makes sense but is not backed up by any data.

      We now provide new data showing that association with Lp increases the amount of valine in the larval hemolymph (new Fig. 1E). Since Lp cannot produce valine, this supports our model of increased nutrient absorption by the gut of Lp-associated larvae.

      Also, how Lp releases r/tRNAs is not addressed experimentally.

      We now provide new data showing that Lp produces extracellular vesicles that contain r/tRNAs (new Fig. 3).

      A minor logical flaw is the use of GCN2 pathway activation read-outs that are actually not required to mediate Lp's beneficial action.

      Our hypothesis is that GCN2 activation leads to both activation of ATF4, which is not required to mediate Lp’s beneficial action, and induction of other targets (e.g. fezzik repression, EGFR activation) that are required to mediate Lp’s beneficial action. We showed that ATF4 activation is a good readout of GCN2 activation (GCN2 knock-down completely suppresses the reporter’s expression in the anterior midgut, new Fig. 4C-F).

      The authors claim that GCN2 action is not mediated through ATF4 or Thor based on RNA interference experiments. However, in contrast to the GCN2 case, they have not validated the RNAi lines and tested also only one for each.

      To address the reviewer’s concerns, we have used two lines carrying 4E-BP loss-of-function alleles. These lines do not show a growth delay on imbalanced diet (new Fig. S5I). Regarding ATF4, we used the RNAseq data to validate the ATF4-RNAi: the Mex>ATF4RNAi-Lp condition shows a statistically significant ~8-fold reduction in ATF4 expression compared to the control-Lp condition (N.B. ATF4 is annotated as crc in our dataset).

    1. Author Response

      Reviewer #1 (Public Review):

      Strength: The study is summarizing a large cohort of human samples of blood, nasal swabs and nasopharyngeal aspirates. This is very uncommon as most of the time studies focus on the blood and serum of patients. Within the study, 3 monocyte and 3 DC subsets have been followed in healthy and Influenza A virus-infected persons. The study also includes functional data on the responsiveness of Influenza A virus-infected DC and monocyte populations. The authors achieved their aims in that they were able to show that the tissue microenvironment is important to understand subset specific migration and activation behavior in Influenza A virus infection and in addition that it matters with which kind of agent a person is infected. Thus, this study also impacts a better understanding of vaccine design for respiratory viruses.

      We thank Reviewer 1 for highlighting what we believe to be the greatest strengths of our study. The key feature of this study was to generate a comprehensive description of monocytes and dendritic cells (DC) in the human nasopharynx during influenza A virus infection, and to provide a comparison with healthy and convalescent individuals. Further, we wished to emphasize the value of studying the nasopharynx during respiratory viral infections, particularly in light of the ongoing COVID-19 pandemic. We describe a non-invasive method to (longitudinally) sample this anatomical compartment that allows retrieval of intact immune cells as well as mucosal fluid for soluble marker analysis. We also believe that the addition of proteomic profiles in the different compartments (new Figure 7) further highlights the importance of the tissue microenvironment.

      Weakness: In the described study, the authors used a different nomenclature to introduce the DC subsets. This is confusing and the authors should stick to the nomenclature introduced by Guilliams et al., 2014 (doi.org/10.1038/nri3712) and commented in Ginhoux et al., 2022 (DOI: 10.1038/s41577-022-00675-7 ) or at least should introduce the alternative names (cDC1, cDC2, expression markers XCR1, CD172a/Sirpa). Further, Segura et al., 2013 (doi: 10.1084/jem.20121103) showed that all three DC subpopulations were able to perform cross-presentation when directly isolated. Overall, a more up-to-date introduction would be useful.

      Reviewer 1 commented on the DC nomenclature used in the manuscript. We agree that our manuscript would benefit from appropriately updating the DC nomenclature. We therefore revised the text, and we now refer to the subsets previously described as CD1c+ and CD141+ myeloid DCs (MDC) as the cDC2 and cDC1 subsets, respectively. We have also modified the text in the Introduction of the revised manuscript to reflect this and to give a more up-to-date introduction of DC subsets (marked-up version lines 75-81).

      As the data of this was already obtained in 2016-2018 it is clear that the FACS panel was not developed to study DC3. If possible, the authors might be able to speculate about the role of this subset in their data set. Moreover, there were other studies on SARS-CoV-2 infection and DC subset analyses in blood (line 87, and line 489) e.g. Winheim et al., (DOI: 10.1371/journal.ppat.1009742 ), which the authors should introduce and discuss in regard to their own data.

      As reviewer 1 accurately pointed out, the flow cytometry panel used in this study was indeed not developed to study the DC3 subset. The data were obtained in 2016-2018, and lack the typical markers used to identify the DC3 subset, such as CD163, BTLA and CD5 (Cytlak et al, https://doi.org/10.1016/j.immuni.2020.07.003, Villani et al, https://doi.org/10.1126/science.aah4573). Due to the constraints of the panel, we would not be able to accurately identify DC3s. However, in an attempt to dig deeper into the data that are available, we re-analyzed the data to identify CD14+CD1c+ cells among the lineage–HLADR+CD16–CD14+ cells, here collectively called “mo-DC”. This population is likely a combination of monocytes upregulating CD1c and bona fide DC3 expressing CD14. Accordingly, the gating strategy was updated in Supplementary figure 1 (marked-up version lines 192-194), and a new data plot in Figure 2H (marked-up version lines 208-220) summarizes the changes observed in mo-DC numbers in IAV patients between blood and the nasopharynx. Parallel to the pattern seen in other DC subsets, mo-DC frequencies are reduced in blood, and we observed an increase (not significant) in the nasopharynx.

      As CD88 was not included in the original panel, it was not possible to discriminate between bona fide monocytes and DC3s. We performed a staining of PBMCs (buffy coat) with CD88 (FITC) added to the original flow panel used in the study, to assess if CD88 can be helpful for future studies (Reviewer figure 1). The staining showed that some cells in the mo-DC population are CD88 positive, indicating a bona fide monocyte origin, whereas some are negative, indicating that they are bona fide DC3 expressing CD14 (Bourdely et al, https://doi.org/10.1016/j.immuni.2020.06.002).

      Reviewer figure 1. Expression of CD88 in the “mo-DC” population. Cells from a buffy coat were stained with the flow cytometry panel used in the manuscript, with the addition of CD88 (FITC). Within the CD14+CD1c+ population, the “mo-DC” population, we identified both CD88+ and CD88- cells.

      Reviewer 1 also suggested citing Winheim et al (https://doi.org/10.1371/journal.ppat.1009742), and we thank them for their suggestion. We have now cited Winheim et al, and two additional reports (Kvedaraite et al, https://doi.org/10.1073/pnas.2018587118 and Affandi et al, https://doi.org/10.3389/fimmu.2021.697840) describing a depletion of DC3s (and other DC subsets) from circulation, and functional impairment of DCs following SARS-CoV-2 infection. Further, Winheim et al observed an increased frequency of a CD163+CD14+ subpopulation within the DC3s, which correlated with systemic inflammatory responses in SARS-CoV-2 infection. We speculate that perhaps in IAV infection too, DC3s may follow the trend of other DC subsets and be found in increased numbers in the nasopharynx (marked-up version lines 75-81 and 543-552).

      Taken together, although the data are very important and very interesting, my overall impression of the manuscript is that in the era of RNA seq and scRNA seq analyses the study lacks a bit of comprehensiveness.

      The final comment from reviewer 1 is well taken, in that our study does not include RNA-seq analyses. Again, we ask Reviewer 1 to take into consideration the challenging material we worked with in our study, in combination with the COVID-19 pandemic, which subsequently precluded recruitment of new influenza patients to the study. The cell numbers and viability in the nasopharyngeal aspirates limit which experimental approaches can be used simultaneously, and flow cytometry seemed to be the best approach for the study. However, we agree that future studies, both our own and those of others in the field, will greatly benefit from single cell analysis of nasopharyngeal immune cells, and from generating transcriptomic or epigenetic profiles of these cells. Unfortunately, it is a limitation that we are currently unable to overcome within the scope of this revision. Despite this weakness, we agree with Reviewer 1 that the methods we developed and the data we generated are important and interesting.

      Moreover, we have added proteomics data from both NPA and plasma from influenza and COVID-19 patients, using the SomaScan platform (new Figure 7) (marked-up version lines 472-511, 738-755 and 768-792). We also included a supplementary table listing enriched pathway data from gProfiler. Briefly, our data showed sizeable changes within the blood and nasopharyngeal proteome during respiratory virus infection (IAV or SARS-CoV-2), as compared to healthy controls. Importantly, we found several differentially expressed proteins unique to the nasopharynx that were not seen in blood, and pathway analysis highlighted “host immune responses” and “innate immunity” pathways, containing TNF, IL-6, ISG15, IL-18R, CCL7, CXCL10 (IP-10), CXCL11, GZMB, SEMA4A, S100A8, S100A9. These findings are in line with our flow cytometry data, and support our hypothesis that the immunological response to viral infection in the upper airways differs from that in matching plasma samples. One of the main messages in this manuscript is the importance of looking at the site of infection, and not only at systemic immune responses, to better understand respiratory viral infections in humans. We believe that the addition of the proteomics data serves to further highlight this point.

      Reviewer #2 (Public Review):

      This study aims to describe the distribution and functional status of monocytes and dendritic cells in the blood and nasopharyngeal aspirate (NPA) after respiratory viral infection in more than 50 patients affected by influenza A, B, RSV and SARS-CoV2. The authors use flow cytometry to define HLA-DR+ lineage negative cells, and within this gate, classical, intermediate and non-classical monocytes and CD1c+, CD141+, and CD123+ dendritic cells (DC). They show a large increase in classical monocytes in NPA and an increase in intermediate monocytes in blood and NPA, with more subtle changes in non-classical monocytes. Changes in intermediate monocytes were age-dependent and resolution was seen with convalescence. While blood monocytes tended to increase in blood and NPA, DC frequency was reduced in blood but also increased in NPA. There were signs of maturation in monocytes and DC in NPA compared with blood as judged by expression of HLA-DR and CD86. Cytokine levels in NPA were increased in infection in association with enrichment of cytokine-producing cells. Various patterns were observed in different viral infections suggesting some specificity of pathogen response. The work did not fully document the diversity of human myeloid cells that have arisen from single-cell transcriptomics over the last 5 years, notably the classification of monocytes which shows only two distinct subsets (intermediate cannot be distinguished from classical), distinct populations of DC1, DC2 and DC3 (DC2 and 3 both having CD1c, but different levels of monocyte antigens), and the lack of distinction provided by CD123 which also includes a precursor population of AXL+SIGLEC6+ myeloid cells in addition to plasmacytoid DC. Furthermore, some greater precision of the gating could have been achieved for the subsets presented. Specifically, CD34+ cells were not excluded from the HLA-DR+ lineage- gate, and the threshold of CD11c may have excluded some DC1 owing to the low expression of this antigen. Overall, the work shows that interesting results can be obtained by comparing myeloid populations of blood and NPA during viral infection and that lineage, viral and age-specific patterns are observed. However, the mechanistic insights for host defense provided by these observations remain relatively modest.

      We thank Reviewer 2 for their assessment of our manuscript and for summarizing our key findings in their public review. As Reviewer 2 noted, our study describes changes in frequencies of monocytes and DCs during acute IAV infection, in blood and in the nasopharynx. Additionally, we demonstrate pathogen-specific changes in both compartments. Reviewer 2 also highlighted a drawback of our study: the approach did not fully capture the breadth of monocyte and DC diversity as it currently stands. Despite this, the findings we presented here laid the groundwork for continued research and led to significant progress, including mechanistic insights (Falck-Jones et al., https://doi.org/10.1172/JCI144734; Cagigi et al., https://doi.org/10.1172/jci.insight.151463; Havervall et al., https://doi.org/10.1056/nejmc2209651; and Marking et al., Lancet Infectious Diseases, in press), in understanding the role of myeloid cells in the human airways during viral infections.

    1. Author Response

      Reviewer #1 (Public Review):

      The data presented throughout are solid; however, some of the structures drawn of the oxysterols in Figure 1 are not chemically correct. 24(S)HC is drawn as 24(R)HC and vice versa; also, the oxysterol sulfate should have a bond between C-3 and the O of OSO3H. It would also help the reader if the vehicle for oxysterol additions was clarified.

      We thank the reviewer for pointing out these embarrassing errors! All structures have been corrected. The vehicle for oxysterol (ethanol) is indicated in the Methods.

      The data presented in Figures 2 and 3 show that inhibition of SREBP processing by 25HC is important for the long-term maintenance of depletion of plasma membrane accessible cholesterol, but I wonder if activation of LXR may also be important here. I appreciate that the data in Figure 2 points against LXR being involved in the rapid depletion of accessible cholesterol in HEK293 cells, but perhaps it is important for the long-term depletion of accessible cholesterol. Could there be some cell type specificity here?

      We agree with the reviewer that 25HC’s effects on multiple signaling pathways complicate mechanistic interpretations. Our studies suggest that ACAT activity is absolutely required for the rapid depletion of accessible PM cholesterol and that LXRs play a minor role at this stage. The long-term contributions could very well arise from any of the other 25HC targets, including LXRs, and the relative contributions of ACAT, SREBPs, and LXRs could vary between cell types.

      Something that always concerns me when the antimicrobial activity of 25HC is discussed is the fact that 25HC is usually a minor side-chain oxysterol compared to 24(S)HC and 27HC (and 22(R)HC in steroidogenic tissue), except for a short time after infection. Perhaps any long-term antimicrobial activity, and diminishment of accessible cholesterol, results from these other side-chain oxysterols. This may be worthy of some additional discussion.

      We agree with the reviewer that we cannot rule out the contribution of other oxysterols to long-term antimicrobial activity. While we have kept our focus on 25HC in this study, we point out in the Discussion that other ACAT-activating oxysterols such as 20(R)HC, 24(R)HC, 24(S)HC, and 27HC, all of which diminish accessible cholesterol, could also have long-term immunological effects.

      Reviewer #2 (Public Review):

      The paper describes a fairly complete set of experiments describing a mechanism by which 4-hour treatment with 25HC can provide reductions in plasma membrane cholesterol for up to 22 hours. The basic finding is that 25HC depletes the ER of cholesterol by stimulating esterification and that SREBP activation is also inhibited. This effect is associated with the slow loss of 25HC from the cells.

      The paper describes detailed studies of the long-lasting effects of a 4-hour exposure to 25HC on the loss of plasma membrane cholesterol. The paper characterizes the effects on SREBP processing to account for this. The possible long-lasting effects of ACAT stimulation were not investigated but may play an equal role.

      The paper presents data that the effects on plasma membrane cholesterol can account for the inhibitory effects on some bacterial toxins and viruses.

      We thank the reviewer for their positive comments.

      Reviewer #3 (Public Review):

      The paper uses multiple approaches in cultured cells to show that the rapid depletion of accessible plasma membrane cholesterol by 25-hydroxycholesterol is mediated by the activation of the cholesterol-esterifying enzyme acylCoA:cholesterol acyltransferase (ACAT). They carefully consider and exclude other potential mechanisms that could explain the effects of 25-OH cholesterol on the plasma membrane cholesterol pool, such as decreased cholesterol biosynthesis or activation of LXR transcription factors. Cell lines with mutations in ACAT and in cholesterol homeostatic factors are used in an ingenious fashion to support the role of ACAT and exclude these other mechanisms. The in vivo relevance of accessible membrane cholesterol and ACAT is then demonstrated for toxic cytolysin binding to cells, Listeria infection in vivo, and Zika and Coronavirus infections of cultured liver cells. Overall, the evidence is exceptional that ACAT modulates the plasma membrane accessible cholesterol pool as a strategy of the host to protect against various infectious agents. The discussion of the paper could be broadened to include other mechanisms that are known concerning the role of 25-OH cholesterol in infectious processes and the body's responses.

      We thank the reviewer for their positive assessment.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors, Rem et al., examine the mechanism of action of APP, a protein implicated in Alzheimer's disease pathology, on GABAB receptor function. It has been reported earlier that soluble APP (sAPP) binds to the Sushi domain 1 of the GABAB1a subunit. In the current manuscript, the authors examine this issue in detail and report that sAPP or APP17 interacts with GABABR with nanomolar affinity. However, binding of APP to the GABAB receptor does not influence any of the canonical effects such as receptor function, K+ channel currents, spontaneous release of glutamate, or EPSC in vivo. The experimental evidence provided to support the conclusions is thorough and statistically sound. The range of techniques used to address each of the aims has been carefully curated to draw meaningful conclusions.

      The authors use HEK293T heterologous cell line to confirm the affinity of APP17 for the receptor, ligand displacement, and receptor activation. They also use this method to study PKA activation downstream of the GPCR. They use slice electrophysiology to measure changes in glutamatergic transmission EPSC and then in vivo 2-photon microscopy to measure functional changes in vivo.

      The work is significant for the field of Alzheimer's and also GABAB receptor biology, as it has been assumed that sAPP acts via GABAB receptors to influence neurotransmission in the brain. The results presented here open up the question yet again: what is the physiological function of sAPP in the brain?

      The manuscript is clearly written and easy to follow. The main criticism would be that the manuscript fails to identify the mechanism downstream of APP17 interaction with GB1a SD1.

      Our results show that APP17 does not influence GABAB receptor signaling in heterologous expression systems, neuronal cultures and anesthetized mice. Thus, our data do not support the existence of a “mechanism downstream of APP17 interaction with GB1a SD1”. As discussed in our manuscript, full-length APP controls GABAB receptor trafficking and surface stability in axons (Dinamarca et al., 2019), thus already providing a biological function for binding of APP to GB1a.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors studied Eurasian perch in an experimental setup facilitated by a nuclear cooling plant to provide a natural laboratory. The heated area of the ecosystem rose in temperature by 8 degrees centigrade, while a reference area remained unheated. The authors provide a thorough and convincing description that the two areas are segregated such that individuals could not escape from one area to another prior to 2004, and as such use data only until 2003 to test their hypotheses. The authors used both length-at-catch and age-increment data in a series of Bayesian mixed effects models to estimate the growth rate and length-at-age. They find that in the warmed area, both younger, smaller fish and older adults grew faster, contrary to the prediction of the temperature-size rule as well as many predictions and observations from other systems that fish reach smaller terminal body sizes in warmer environments due to increased metabolic demands. The authors furthermore combine the estimated body sizes with a mortality rate to determine the size-spectrum slope for both areas and determine that the increased growth and increased mortality combine to essentially leave the size-spectrum slope observed in the ecosystem unchanged.

      This is a thorough and interesting paper presented clearly and succinctly. These authors present a strong and thorough analysis of how temperature affects growth when all other ecosystem factors remain unchanged in a population. The dataset is a powerful one to support this type of analysis, and the statistical analysis methods the authors used appear to be robust and thorough. The diagnostics and visualizations are complete and inspire confidence in the convergence and accuracy of the modeling approach. The use of the size spectrum exponent to roll up individual-level changes across the population into a single metric was useful and interesting.

      The estimates of the von Bertalanffy growth parameters in the results and discussion are less convincing than the growth increment and length-at-age estimates which seem much more robust. The presentation of estimates of the von Bertalanffy growth parameters in Figure S6 exhibit the high negative correlation between the k and L infinity parameters that are typical whenever multiple VBGF models are fit to subsets of data. It is difficult to determine which changes in parameters correspond to actual differences in early vs late life stage growth when, in any given year, if k is estimated low, L infinity will skew high simply due to the model structure. An example of this can be seen in 1995-1997 where L infinity is quite high but k is estimated quite low concurrently - in this case, it seems more reasonable to conclude the likelihood surface is quite flat between different parameter values than that fish suddenly reached a larger asymptotic size in these three years than all of the rest. The data in this case so strongly show larger growth in the heated area even without the VBGF results, and it would be more credible to base the discussion and results of this paper on the growth rate or observed length-at-age (e.g. Figure S4) estimates which are so clear.

      We agree with the limitations of the von Bertalanffy growth equation (VBGE), and we agree with you and with Reviewer #2 that the estimated parameters for cohorts 1995–1997 are different, in particular for the L_infinity parameter in the heated area (see also our reply to Reviewer #2 for a longer response on that issue). The main reason for the size-at-age analysis in addition to growth-at-size is that the growth rates could in theory become similar between the areas for a given size, but if the initial growth rates were higher, there would still be a difference in the size-at-age, and size-at-age is an important trait in the context of the temperature-size rule (TSR). We could overcome the issues with the 3-parameter VBGE model by fitting multiple linear models to size-at-age for one age at a time. However, such models would not account for the possibility that cohorts share similar growth trajectories. Therefore, we suggest instead to still use the VBGE, but put less emphasis on the specific parameter estimates, and instead present the results of the predictions of length-at-age only in that figure. We also wish to clarify that the size-at-age figure referred to here (Figure 2-figure supplement 4) is the predicted size-at-age from the VBGE model, rather than just the data or predictions from some other model.
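
      The parameter trade-off discussed above can be illustrated with a minimal sketch, assuming the standard VBGE form L(t) = L_infinity * (1 - exp(-k * (t - t0))). This is not the Bayesian mixed-effects model fitted in the study; the function name and both parameter sets are hypothetical values chosen only to show that a lower k paired with a higher L_infinity can predict similar length-at-age over the ages typically observed.

```python
import math


def vbge_length(age, l_inf, k, t0=0.0):
    """Standard von Bertalanffy length-at-age: L_inf * (1 - exp(-k * (age - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Two hypothetical parameter sets (illustrative only, not estimates from the study):
# set B has a lower growth coefficient k but a higher asymptotic length L_inf.
params_a = {"l_inf": 40.0, "k": 0.20}
params_b = {"l_inf": 50.0, "k": 0.15}

# Compare predicted length-at-age across a range of ages.
for age in range(1, 9):
    length_a = vbge_length(age, **params_a)
    length_b = vbge_length(age, **params_b)
    print(f"age {age}: A = {length_a:5.2f}, B = {length_b:5.2f}, "
          f"difference = {length_b - length_a:+.2f}")
```

      Under these assumed values the two curves stay close at young and intermediate ages and diverge only as the fish approach their asymptotic size, which is one way to see why predicted length-at-age can be better constrained than k or L_infinity individually.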

      In summary, we have downplayed the role of the specific parameter estimates and instead focused on the predicted size-at-age. Part of Figure 2 has been made a supporting figure (Figure 2-figure supplement 8). We have also conducted a sensitivity analysis with respect to cohorts 1995–1997. This extra analysis shows that omitting these cohorts still results in a clear difference in size-at-age between the areas but reduces the predicted difference in size-at-age by a few percentage points. See the first paragraph of the results, and lines 373–378.

    1. Author Response

      Reviewer #1 (Public Review):

      Caetano and colleagues describe the changes caused by periodontal inflammation in terms of tissue structure and provide additional evidence to understand the involvement of fibroblasts in altering the immune microenvironment.

      While interesting and a concise study, the authors should improve their work on two major points:

      1) To improve the resolution, the authors introduced a method that addresses improving the resolution by combining more information from the neighbour structure and the existing database. This raises the question of whether the lack of previous gingival tissue spatial transcriptome sequencing results weakens the reliability of this method. Does it miss the identification of some gingival tissue-specific cells? Is the failure to match two populations of fibroblasts between single-cell sequencing and spatial transcriptome sequencing of gingival tissue fibroblasts related to this?

      Thank you for raising these concerns. We don’t think that the lack of previous spatial transcriptome data of oral mucosa tissue affects the reliability of this method; however, as the technology matures, our limitations will be overcome, particularly regarding resolution. Understanding the exact cellular and molecular mechanisms of oral mucosa cellular remodelling processes in disease in their spatial context will be key to improving our current understanding of oral mucosa physiology. In contrast to single-cell RNA sequencing methods, we are not treating or digesting the tissue with enzymes or extracting cells from their local environment; therefore, the impact on gene expression is substantially smaller than in single-cell RNA sequencing. Because of this key difference, we expect differences between single-cell RNA sequencing and spatial data, which can preclude successful data integration. We were not successful in mapping all fibroblasts using one strategy (anchor-based integration) because this integration is performed on low-resolution Visium datasets, which are unable to uncover fine cell subtypes, such as fibroblasts. When we performed integration using a higher spatial resolution method, we could map these cells. In our initial single-cell RNA sequencing datasets, some gingiva cells were indeed missing due to technical limitations; for example, neutrophils were not captured given their fragile nature and low RNA content. With the spatial data, we could detect these and other immune cell types that were originally undetected. In conclusion, for a robust and unbiased molecular characterisation of human oral mucosa, spatial transcriptome data is essential.

      2) Although the authors did the identification of the captured tissues, the results seem to require more analysis. Take Figure 5A as an example, there is a clear overlap between endothelial cells and basal cells. In addition, it is suggested that the authors indicate the specific location of the 10 clusters of cells in Figures 1D and 2C.

      Thank you for your comment. Endothelial cells in Figure 5A have a predominantly subepithelial location as shown; however, these also localise in interpapillary regions which can be confounded with basal areas given the current resolution. We highlight that these analyses are not single-cell resolution. We applied a deconvolution method to increase the original spatial data resolution (55 µm), but it is still not true single-cell resolution.

      In Figures 1D and 2C we are not showing clusters of cells, but spatial/anatomical cluster regions; for example, epithelial and stromal regions. These regions, especially stromal areas, contain information on multiple cell types. We can map epithelial regions as these are generally well defined (Figure 2F), but validating stromal regions is more difficult. To address this, we mapped individual cell types (Figures 5 and 6) and focused on locating and validating our cell type of interest (Fibroblast 5).

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, Kim et al. use a deep generative model (a Variational Auto Encoder previously applied to adult data) to characterize neonatal-fetal functional brain development. The authors suggest that this approach is suitable given the rapid non-linear development taking place in the human brain across this period. Using two large neonatal and one fetal datasets, they describe that the resultant latent variables can lead to improved characterization of prenatal-neonatal development patterns, stable age prediction and that the decoder can reveal resting state networks. The study uses already accessible public datasets and the methods have been also made available.

      The manuscript is clearly written, the figures excellent and the application in this group novel. The methods are generally appropriate although there are some methodological concerns which I think would be important to address. Although the authors demonstrate that the methods are broadly generalisable across study populations - however, I am unsure about the general interest of the work beyond application of their previously described VAE approach to a new population and what new insight this offers to understanding how the human brain develops. This is a particular consideration given that the major results are age prediction (which is easily done with various imaging measures including something as simple as whole brain volume) and recapitulation of known patterns of functional activity in neonates. As such, the work will be of interest to researchers working in fMRI analysis methods and deep learning, but perhaps less so to a wider neuroscience/clinical readership.

      Specific comments:

      1) (M1) If I understand correctly, the method takes the functional data after volume registration into template space and then projects this data onto the surface. Given the complexities of the changing morphology of the developing brain, would it not be preferable to have the data in surface space for standard space alignment (rather than this being done later)? This would certainly help with one of the concerns expressed by the authors of "smoothing" in the youngest fetuses leading to a negative relationship between age and performance.

      While projecting onto the cortical surface has its advantages, as suggested here [18], several studies have also shown that with careful registration, such as in the current study, volumetric registration can yield comparable performance [19]. Regardless, we did attempt to directly generate cortical surfaces for our fetuses. We refer the reviewer to our response to the RE-M2 [page 9].

      Regarding the “smoothing” effect in the youngest fetuses, we want to clarify that the smoothing effect in the scans of young fetuses is not unique to the choice of registration method. In other words, the same smoothing effect must be seen with cortical registration as well. Regarding this perspective, we kindly refer the reviewer to our response to RE-M1 [page 7]. Regarding the specific change made in the revised manuscript, we kindly refer to our response to R1-m5 [p21] or [page 9 line 191-213] in the main manuscript.

      2) (M2) A key limitation which I feel is important to consider if the method is aiming to be used for fetuses is the effects of the analysis being limited only to the cortical surface - and therefore the role of subcortical tissue (such as developmental layers in the immature white matter and key structures like the thalami) cannot be included. This is important, as in the fetal (and preterm neonatal) brain, the cortex is still developing and so not only might there be not the same kind of organisation to the activity, but also there is likely an evolving relationship with activity in the transient developmental layers (like the subplate) and inputs from the thalamus.

      The reviewer raises an important point. We agree with the reviewer that the subcortical region plays a critical role in fetal and newborn neurodevelopment. Unfortunately, our current VAE model cannot utilize such information without a major change in the model structure. We added this as a limitation of our study and discussed why our VAE model, in its current form, did not include subcortical areas. Please see our detailed response to RE-M1 [page 4] or [page 25 line 558-570] in the main manuscript.

      3) (M3) As the authors correctly describe, brain development and specifically functional relationships are likely evolving across the study time window. Beyond predicting age and a different way of estimating resting state networks using the decoding step, it is not clear to me what new insight the work is adding to the existing literature - or how the method has been specifically adapted for working with this kind of data. Whilst I agree that these developmental processes are indeed likely non-linear, to put the work in context, I think the manuscript would benefit from explaining how (or if) the method has been adapted and explicitly mentioning what additional neuroscientific/biological gains there are from this method.

      We appreciate the reviewer’s critical insights. In the revised paper, we included additional results that, we hope, can address the reviewer’s concerns. We believe that the strength of the VAE model is that, relative to linear models, it can be more generalizable across different datasets and ages (adult vs. full-term babies vs. preterm babies vs. fetuses). In the original manuscript, this was supported by the superior age prediction performance of the VAE over linear models when applied to different datasets covering the fetal to neonatal periods. Age prediction could also be done using other imaging modalities, as the reviewer pointed out. However, we do not think this undermines the potential impact of having the ability to accurately estimate age based on functional connectivity patterns. Brain function-structure relationships may not be exactly one-to-one [20]. It is entirely possible that for a given disease, brain functional connectivity alterations precede structural changes such that delayed growth trajectories will first manifest in the functional space. There are also certain aspects of brain function that cannot be mapped directly to its structural characteristics (i.e., structural connectivity patterns). For example, the brain changes its functional connectivity patterns dynamically across brain states (resting vs. task-engaging) [21], mental disorders (depression [22], anxiety [23], schizophrenia [24]), cognitive traits [25, 26], and individual uniqueness [25]. Therefore, we believe that estimating the functional age of fetuses and neonates given their functional connectivity profiles may provide a biomarker for tracking neurodevelopmental trajectories, allowing clinicians to identify deviations early and intervene in a timely manner if necessary. For these reasons, we believe that the superior age prediction performance of the VAE model compared to linear models is scientifically significant.

      The value of the VAE lies in its ability to capture FC features that are otherwise not modeled by linear strategies. For example, here, we showed that only the VAE model can extract latent variables representing brain networks that are similar across different datasets. In contrast, linear models showed higher network pattern similarity between full-term and preterm infants within the dHCP dataset. This suggests that the VAE model can be a very useful tool for capturing common brain networks in datasets acquired using different recording parameters and preprocessing steps. Moreover, the VAE representations predicted age with higher accuracy compared to linear representations. Together, these findings show that the methodology is effective in extracting functionally relevant features of the brain. Please see RE-M1 [page 3] and R1-m13 regarding the specific changes made in the revised manuscript.

      4) (M4) The unavoidable smoothing effect of VAE is very noticeable in the figures - does this suggest that the method will be relatively insensitive to the fine granularity which is important to understand brain development and the establishment of networks (such as the evolving boundaries between functional regions with age) - reducing inference to only the large primary sensory and associative networks? This will also be important to consider for the individual "reconstruction degree" - (which it would likely then overstate - and would need careful intersubject comparison also) if it was to be used as a biomarker or predictor of cognition as suggested by the authors.

      Regarding the first concern, yes. Greater smoothing will tend to yield less granular network patterns; this is true for all representational models (not only VAE, but also models like ICA or PCA). This effect becomes ever more pronounced when representations consist of fewer components (e.g., IC50); the smoothing effect becomes stronger, leading to coarser brain patterns (see Fig. 3 in the revised manuscript). In this regard, a higher number of components is desirable, but on the flip side, IC maps with more components are generally less interpretable. In short, there will always be trade-offs between interpretability and spatial resolution. Also, a higher number of components tends to cause over-fitting, as shown in our age prediction performance across different datasets (worse performance in the IC300 vs. IC50). In this sense, what matters for the representations is how informative each latent variable (or component) is. In the revised Fig. 2, we showed that latent variables from the VAE model were more informative in representing rsfMRI than linear representations. It is also noteworthy that the smoothing effect of the VAE is comparable to IC300 (similar effect to manual smoothing at the level of FWHM=5mm; revised Fig. 3). Given the above results, we believe the VAE model may be more suitable than linear models for investigating brain networks at a finer scale. The above perspective was updated in the revised manuscript as [page 23 line 506-511]:

      "Another interesting observation was that the smoothing effect of the VAE is comparable to IC300 (similar effect to manual smoothing at the level of FWHM=5mm; Fig. 3). Given the above, we believe the VAE model may be more suitable for investigating finer scale of brain networks, than linear models. Perhaps, the VAE model with a greater number of latent variables (e.g., 512 or 1024 instead of 256 in the current VAE) can be utilized to find brain networks at finer scale."

      On top of the points raised above, network mapping with linear models is limited, due to their linear nature, when it comes to mapping the spatial evolution of brain networks with age. This limitation can be observed in the ICA study with the dHCP dataset (Fig. 4 in [7]). On the other hand, thanks to its nonlinear nature, the VAE model may have the potential to capture the spatial gradient of brain networks with age, although this expectation needs confirmation. To that end, we revised our discussion to reflect our perspective. For the full change made in the revised manuscript, we refer to our response to R1-m13.

    1. Author Response

      We thank the reviewers for their positive feedback and thoughtful suggestions that will improve our manuscript. Here we summarise our plan for immediate action. We will resubmit our manuscript once additional experiments have been performed to clarify all the major and minor concerns of the reviewers and the manuscript has been revised. At that point, we will respond to all the reviewers’ points and highlight the changes made in the text.

      Reviewer #1 (Public Review):

      The authors have tried to correlate changes in the cellular environment by means of altering temperature, the expression of key cellular factors involved in the viral replication cycle, and small molecules known to affect key viral protein-protein interactions with some physical properties of the liquid condensates of viral origin. The ideas and experiments are extremely interesting as they provide a framework to study viral replication and assembly from a thermodynamic point of view in live cells.

      The major strengths of this article are the extremely thoughtful and detailed experimental approach; although this data collection and analysis are most likely extremely time-consuming, the techniques used here are so simple that the main goal and idea of the article become elegant. A second major strength is that in order to understand some of the physicochemical properties of the viral liquid inclusion, they used stimuli that have been very well studied, and thus one can really focus on a relatively easy interpretation of most of the data presented here.

      There are three major weaknesses in this article. The way it is written, especially at the beginning, is extremely confusing. First, I would suggest the authors check and review extensively for improvements to the use of English. In particular, the abstract and introduction are extremely hard to understand. Second, in the abstract and introduction, the authors use terms such as "hardening", "perturbing the type/strength of interactions", "stabilization", and "material properties", to cite just a few. It is clear that the authors do know exactly what they are referring to, but the definitions come so late in the text that it all becomes confusing. The second major weakness is that there is a lack of deep discussion of the physical meaning of some of the measured parameters like "C dense vs inclusion", and "nuclear density and supersaturation". There is a need to explain further the physical consequences of all the graphs. Most of them are discussed in a very superficial manner. The third major weakness is a lack of analysis of phase separations. Some of their data suggest phase transition and/or phase separation; thus, a more in-depth analysis is required. For example, could they calculate the change of entropy and enthalpy of some of these processes? Could they find some boundaries for these transitions between the "hard" (whatever that means) and the liquid?

      The authors have achieved almost all their goals, with the caveat of the third weakness I mentioned before. Their work presented in this article is of significant interest and can become extremely important if a more detailed analysis of the thermodynamics parameters is assessed and a better description of the physical phenomenon is provided.

      We thank reviewer 1 for the comments and, in particular, for being so positive regarding the strengths of our manuscript and for raising concerns that will surely improve the manuscript. At this point, we propose the following actions to address the concerns of Reviewer 1:

      1) We will extensively revise the use of English, particularly, in the abstract and introduction, defining key terms as they come along in the text to make the argument clearer.

      2) We acknowledge the importance of discussing our data in more detail and we propose the following. We will discuss the graphs and what they mean as exemplified in the paragraph below.

      Regarding Figure 3 - As the concentration of vRNPs increases, we observe an increase in supersaturation until 12 hpi. This means that, contrary to what is observed in a binary mixture, in which the Cdilute is constant (Klosin et al., 2020), the Cdilute in our system increases with concentration. It has been reported that Cdilute increases in a multi-component system with bulk concentration (Riback et al., 2020). Our findings have important implications for how we think about the condensates formed during influenza infection. As the 8 different genomic vRNPs have a similar overall structure, they could, in theory, behave as a binary system between units of vRNPs and Rab11a. However, a change in Cdilute with concentration shows that our system behaves as a multi-component system. This means that the differences in length, RNA sequence and valency that each vRNP has are key for the integrity of condensates.
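      The distinction drawn above (a constant Cdilute in a binary mixture versus a Cdilute that rises with bulk concentration in a multi-component system) can be illustrated with a minimal sketch. It is not the analysis used in the manuscript: the function names, the tolerance value and the example numbers are all hypothetical, and the only idea taken from the text is that a clearly positive slope of Cdilute against total concentration points away from binary-like behaviour.

```python
# Illustrative sketch only: function names, tolerance and numbers are
# hypothetical and are not taken from the study.

def least_squares_slope(x, y):
    """Return the ordinary least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def classify_dilute_phase(c_total, c_dilute, tolerance=0.05):
    """Label the trend of C_dilute against total (bulk) concentration.

    `tolerance` is an arbitrary cut-off chosen for this example; a real
    analysis would use a statistical test on replicate measurements.
    """
    slope = least_squares_slope(c_total, c_dilute)
    if slope > tolerance:
        return "rising C_dilute (multi-component-like)"
    if slope < -tolerance:
        return "falling C_dilute"
    return "fixed C_dilute (binary-like)"


if __name__ == "__main__":
    bulk = [1.0, 2.0, 3.0, 4.0]          # hypothetical total concentrations
    flat = [2.0, 2.0, 2.0, 2.0]          # hypothetical binary-like C_dilute
    rising = [1.0, 1.5, 2.0, 2.5]        # hypothetical multi-component-like C_dilute
    print(classify_dilute_phase(bulk, flat))
    print(classify_dilute_phase(bulk, rising))
```

      In practice the measured intensities would be noisy, so the hard tolerance used here would need to be replaced by a proper significance test.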

      3) The reviewer calls our attention to the lack of analysis of phase separations. We think that phase separation (or percolation coupled to phase separation) governs the formation of influenza A virus condensates. However, we think we ought to exercise caution at this point, as the condensates we are working with are very complex and the physics of our system in cells may not be sufficient to claim phase separation without an in vitro reconstitution system. In fact, IAV inclusions contain cellular membranes, different vRNPs and Rab11a. So far, we can only speculate that the liquid character of IAV inclusions may arise from a network of interacting vRNPs that bridge several cognate vRNP-Rab11 units on flexible membranes, similarly to what happens in phase separated vesicles in neurological synapses. However, the speculative model for our system, although supported by correlative light and electron microscopy, currently lacks formal experimental validation.

      For this reason, we thought of developing the current work as an alternative to explore the importance of the liquid material properties of IAV inclusions. By finding an efficient method to alter the material properties of IAV inclusions, we provide proof of principle that it is possible to impose controlled phase transitions that reduce the dynamics of vRNPs in cells and negatively impact progeny virion production. Despite having discussed these issues in the limitations of the study, we will make our point clearer.

      We are currently establishing an in vitro reconstitution system to formally demonstrate, in an independent publication, that IAV inclusions are formed by phase separation. For this future work, we teamed up with Pablo Sartori, a theoretical physicist, to derive an in-depth analysis of the thermodynamics of the viral liquid condensates. Collectively, we think that cells have too many variables to derive meaningful physical parameters (such as entropy and enthalpy) as well as models, and that they need to be complemented by in vitro systems. For example, increasing the concentration inside a cell is not a simple endeavour, as it relies on cellular pathways to deliver material to a specific place. At the same time, the 8 vRNPs, as mentioned above, differ in size, valency and RNA sequence and can behave very differently in the formation of condensates and maintenance of their material properties. Ideally, they should be analysed individually or in selected combinations. For the future, we will combine data from in vitro reconstitution systems and cells to address this very important point raised by the reviewer.

      From the paper on the section Limitations of the study: “Understanding condensate biology in living cells is physiologically relevant but complex because the systems are heterotypic and away from equilibria. This is especially challenging for influenza A liquid inclusions that are formed by 8 different vRNP complexes, which although sharing the same structure, vary in length, valency, and RNA sequence. In addition, liquid inclusions result from an incompletely understood interactome where vRNPs engage in multiple and distinct intersegment interactions bridging cognate vRNP-Rab11 units on flexible membranes (Chou et al., 2013; Gavazzi et al., 2013; Haralampiev et al., 2020; Le Sage et al., 2020; Shafiuddin & Boon, 2019; Sugita, Sagara, Noda, & Kawaoka, 2013). At present, we lack an in vitro reconstitution system to understand the underlying mechanism governing demixing of vRNP-Rab11a-host membranes from the cytosol. This in vitro system would be useful to explore how the different segments independently modulate the material properties of inclusions, explore if condensates are sites of IAV genome assembly, determine thermodynamic values, thresholds accurately, perform rheological measurements for viscosity and elasticity and validate our findings”.

      Reviewer #2 (Public Review):

      During Influenza virus infection, newly synthesized viral ribonucleoproteins (vRNPs) form cytosolic condensates, postulated as viral genome assembly sites and having liquid properties. vRNP accumulation in liquid viral inclusions requires its association with the cellular protein Rab11a directly via the viral polymerase subunit PB2. Etibor et al. investigate and compare the contributions of entropy, concentration, and valency/strength/type of interactions, on the properties of the vRNP condensates. For this, they subjected infected cells to the following perturbations: temperature variation (4, 37, and 42 °C), the concentration of viral inclusion drivers (vRNPs and Rab11a), and the number or strength of interactions between vRNPs using nucleozin, a well-characterized vRNP sticker. Lowering the temperature (i.e. decreasing the entropic contribution) leads to a mild growth of condensates that does not significantly impact their stability. Altering the concentration of drivers of IAV inclusions impacts their size but not their material properties. The most spectacular effect on condensates was observed using nucleozin. The drug dramatically stabilizes vRNP inclusions, acting as a condensate hardener. Using a mouse model of influenza infection, the authors provide evidence that the activity of nucleozin is retained in vivo. Finally, using a mass spectrometry approach, they show that the drug affects vRNP solubility in a Rab11a-dependent manner without altering the host proteome profile.

      The data are compelling and support the idea that drugs that affect the material properties of viral condensates could constitute a new family of antiviral molecules as already described for the respiratory syncytial virus (Risso Ballester et al. Nature. 2021).

      Nevertheless, there are some limitations in the study. Several of them are mentioned in a dedicated paragraph at the end of a discussion. This includes the heterogeneity of the system (vRNP of different sizes, interactions between viral and cellular partners far from being understood), which is far from equilibrium, and the absence of minimal in vitro systems that would be useful to further characterize the thermodynamic and the material properties of the condensates.

      We thank reviewer 2 for highlighting specific details that need improving and raising such interesting questions to validate our findings. We will address all the minor comments of Reviewer 2. To address the comments of Reviewer 2, we propose the actions described in blue below each point raised that is written in italics.

      1) The concentrations are mostly evaluated using antibodies. This may be correct for Cdilute. However, measurement of Cdense should be viewed with caution as the antibodies may have some difficulty accessing the inner of the condensates (as already shown in other systems), and this access may depend on some condensate properties (which may evolve along the infection). This might induce artifactual trends in some graphs (as seen in panel 2c), which could, in turn, affect the calculation of some thermodynamic parameters.

      The concern of using antibodies to calculate Cdense is valid. We will address this concern by validating our results using a fluorescent tagged virus that has mNeon Green fused to the viral polymerase PA (PA-mNeonGreen PR8 virus). Like NP, PA is a component of vRNPs and labels viral inclusions, colocalising with Rab11 when vRNPs are in the cytosol without the need of using antibodies.

      This virus would be the best to evaluate inclusion thermodynamics, were it not an attenuated virus (Figure 1A below) with a delayed infection, as demonstrated by the reduced levels of viral proteins (Figure 1B below). Consistently, it shows differences in the accumulation of vRNPs in the cytosol, and viral inclusions form later in infection. After their emergence, inclusions behave as in the wild-type virus (PR8-WT), fusing and dividing (Figure 1C below) and displaying liquid properties. The differences in concentration may shift or alter thermodynamic parameters such as time of nucleation, nucleation density, inclusion maturation rate, Cdense and Cdilute. This is the reason why we performed the thermodynamics profiling using antibodies upon PR8-WT infection. For validating our results, and taking into account possible delayed kinetics and differences that may occur because of reduced vRNP accumulation in the cytosol, this virus will be useful, and we will therefore repeat the thermodynamics profiling using it.

      As a side note, vRNPs are composed of viral RNA coated with several molecules of NP and each vRNP also contains 1 copy of the trimeric RNA dependent RNA polymerase formed by PA, PB1 and PB2. It is well documented that in the cytosol the vast majority of PA (and other components of the polymerase) is in the form of vRNPs (Avilov, Moisy, Munier, et al., 2012; Avilov, Moisy, Naffakh, & Cusack, 2012; Bhagwat et al., 2020; Lakdawala et al., 2014), and thus we can use this virus to label vRNPs on condensates to corroborate our studies using antibodies.

      Figure 1 – The PA-mNeonGreen virus is attenuated in comparison to the WT virus. A. Cells (A549) were infected or mock-infected with PR8 WT or PA-mNeonGreen (PA-mNG) viruses, at a multiplicity of infection (MOI) of 3, for the indicated times. Viral production was determined by plaque assay and plotted as plaque forming units (PFU) per milliliter (mL) ± standard error of the mean (SEM). Data are a pool from 2 independent experiments. B. The levels of viral PA, NP and M2 proteins and actin in cell lysates at the indicated time points were determined by western blotting. C. Cells (A549) were transfected with a plasmid encoding mCherry-NP and co-infected with PA-mNeonGreen virus for 16 h, at an MOI of 10. Cells were imaged under time-lapse conditions starting at 16 hpi. White boxes highlight vRNPs/viral inclusions in the cytoplasm in the individual frames. The dashed white and yellow lines mark the cell nucleus and the cell periphery, respectively. The yellow arrows indicate the fission/fusion events and movement of vRNPs/viral inclusions. Bar = 10 µm. Bar in insets = 2 µm.

      2) Although the authors have demonstrated that vRNP condensates exhibit several key characteristics of liquid condensates (they fuse and divide, they dissolve upon hypotonic shock or upon incubation with 1,6-hexanediol, FRAP experiments are consistent with a liquid nature), their aspect ratio (with a median above 1.4) is much higher than the aspect ratio observed for other cellular or viral liquid compartments. This is intriguing and might be discussed.

      IAV inclusions have been shown to interact with microtubules and the endoplasmic reticulum, which confer movement, and also to undergo fusion and fission events. We propose that these interactions and movements exert forces that deform inclusions, making them less spherical. To validate this assumption, we compared the aspect ratio of viral inclusions in the absence and presence of nocodazole (which abrogates microtubule-based movement). The data in Figure 2 show that in the presence of nocodazole, the aspect ratio decreases from 1.42 ± 0.36 to 1.26 ± 0.17, supporting our assumption.

      Figure 2 – Treatment with nocodazole reduces the aspect ratio of influenza A virus inclusions. Cells (A549) were infected with PR8 WT and treated with nocodazole (10 µg/mL) for 2 h, after which the movement of influenza A virus inclusions was captured by live cell imaging. Viral inclusions were segmented, and the aspect ratio was measured in ImageJ, then analysed and plotted in R.
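      For readers unfamiliar with the metric, the aspect ratio of a segmented inclusion is commonly taken as the major axis of its fitted ellipse divided by the minor axis, so a perfect circle scores 1 and elongated shapes score higher. The sketch below shows this calculation and a simple summary. It is an illustration under stated assumptions rather than the authors' ImageJ/R pipeline: the axis lengths are invented, and summarising as mean ± sample standard deviation is an assumption, since the response does not state which statistic the reported values represent.

```python
# Illustrative sketch only: axis lengths are hypothetical, and the choice of
# mean +/- sample standard deviation as the summary is an assumption.
from statistics import mean, stdev


def aspect_ratio(major_axis, minor_axis):
    """Aspect ratio of a fitted ellipse: major axis divided by minor axis."""
    if minor_axis <= 0:
        raise ValueError("minor axis must be positive")
    if major_axis < minor_axis:
        raise ValueError("major axis must be at least as long as the minor axis")
    return major_axis / minor_axis


def summarise(ellipses):
    """Return (mean, sample SD) of aspect ratios for (major, minor) pairs."""
    ratios = [aspect_ratio(major, minor) for major, minor in ellipses]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Hypothetical (major, minor) axis lengths in micrometres.
    untreated = [(1.4, 1.0), (2.0, 1.0), (1.2, 1.0), (0.9, 0.6)]
    nocodazole = [(1.2, 1.0), (1.3, 1.0), (1.1, 1.0), (0.7, 0.6)]
    for label, data in (("untreated", untreated), ("nocodazole", nocodazole)):
        avg, sd = summarise(data)
        print(f"{label}: {avg:.2f} +/- {sd:.2f}")
```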

      3) Similarly, the fusion event presented at the bottom of figure 3I is dubious. It might as well be an aggregation of condensates without fusion.

      We will change this, thank you for the suggestion.

      4) The authors could have more systematically performed FRAP/FLAPh experiments on cells expressing fluorescent versions of both NP and Rab11a to investigate the influence of condensate size, time after infection, or global concentrations of Rab11a in the cell (using the total fluorescence of overexpressed GFP-Rab11a as a proxy) on condensate properties.

      We will try our best to be able to comply with this suggestion as we think it is important.

      Reviewer #3 (Public Review):

      This study aims to define the factors that regulate the material properties of the viral inclusion bodies of influenza A virus (IAV). In a cellular model, it shows that the material properties were not affected by lowering the temperature nor by altering the concentration of the factors that drive their formation. Impressively, the study shows that IAV inclusions may be hardened by targeting vRNP interactions via the known pharmacological modulator (also an IAV antiviral), nucleozin, both in vitro and in vivo. The study employs current state-of-the-art methodology in both influenza virology and condensate biology, and the conclusions are well-supported by data and proper data analysis. This study is an important starting point for understanding how to pharmacologically modulate the material properties of IAV viral inclusion bodies.

      We thank this reviewer for all the positive comments. We will fully address the minor issues brought to our attention, including changing the title of the manuscript, and we will investigate the formation and material properties of IAV inclusions in the presence and absence of nucleozin for the nucleozin escape mutant NP-Y289H.

      References

      Avilov, S. V., Moisy, D., Munier, S., Schraidt, O., Naffakh, N., & Cusack, S. (2012). Replication-competent influenza A virus that encodes a split-green fluorescent protein-tagged PB2 polymerase subunit allows live-cell imaging of the virus life cycle. J Virol, 86(3), 1433-1448. doi:10.1128/JVI.05820-11

      Avilov, S. V., Moisy, D., Naffakh, N., & Cusack, S. (2012). Influenza A virus progeny vRNP trafficking in live infected cells studied with the virus-encoded fluorescently tagged PB2 protein. Vaccine, 30(51), 7411-7417. doi:10.1016/j.vaccine.2012.09.077

      Bhagwat, A. R., Le Sage, V., Nturibi, E., Kulej, K., Jones, J., Guo, M., . . . Lakdawala, S. S. (2020). Quantitative live cell imaging reveals influenza virus manipulation of Rab11A transport through reduced dynein association. Nat Commun, 11(1), 23. doi:10.1038/s41467-019-13838-3

      Chou, Y. Y., Heaton, N. S., Gao, Q., Palese, P., Singer, R. H., & Lionnet, T. (2013). Colocalization of different influenza viral RNA segments in the cytoplasm before viral budding as shown by single-molecule sensitivity FISH analysis. PLoS Pathog, 9(5), e1003358. doi:10.1371/journal.ppat.1003358

      Gavazzi, C., Yver, M., Isel, C., Smyth, R. P., Rosa-Calatrava, M., Lina, B., . . . Marquet, R. (2013). A functional sequence-specific interaction between influenza A virus genomic RNA segments. Proc Natl Acad Sci U S A, 110(41), 16604-16609. doi:10.1073/pnas.1314419110

      Haralampiev, I., Prisner, S., Nitzan, M., Schade, M., Jolmes, F., Schreiber, M., . . . Herrmann, A. (2020). Selective flexible packaging pathways of the segmented genome of influenza A virus. Nat Commun, 11(1), 4355. doi:10.1038/s41467-020-18108-1

      Klosin, A., Oltsch, F., Harmon, T., Honigmann, A., Julicher, F., Hyman, A. A., & Zechner, C. (2020). Phase separation provides a mechanism to reduce noise in cells. Science, 367(6476), 464-468. doi:10.1126/science.aav6691

      Lakdawala, S. S., Wu, Y., Wawrzusin, P., Kabat, J., Broadbent, A. J., Lamirande, E. W., . . . Subbarao, K. (2014). Influenza a virus assembly intermediates fuse in the cytoplasm. PLoS Pathog, 10(3), e1003971. doi:10.1371/journal.ppat.1003971

      Le Sage, V., Kanarek, J. P., Snyder, D. J., Cooper, V. S., Lakdawala, S. S., & Lee, N. (2020). Mapping of Influenza Virus RNA-RNA Interactions Reveals a Flexible Network. Cell Rep, 31(13), 107823. doi:10.1016/j.celrep.2020.107823

      Riback, J. A., Zhu, L., Ferrolino, M. C., Tolbert, M., Mitrea, D. M., Sanders, D. W., . . . Brangwynne, C. P. (2020). Composition-dependent thermodynamics of intracellular phase separation. Nature, 581(7807), 209-214. doi:10.1038/s41586-020-2256-2

      Shafiuddin, M., & Boon, A. C. M. (2019). RNA Sequence Features Are at the Core of Influenza a Virus Genome Packaging. J Mol Biol. doi:10.1016/j.jmb.2019.03.018

      Sugita, Y., Sagara, H., Noda, T., & Kawaoka, Y. (2013). Configuration of viral ribonucleoprotein complexes within the influenza A virion. J Virol, 87(23), 12879-12884. doi:10.1128/JVI.02096-13

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Shaikh and Sunagar addresses the question of the origin of spider venom proteins. It has been known for many years that an important component of spider venoms is a diverse group of small proteins known as disulfide-rich peptides (DRPs). However, it has not been clear whether this group of proteins has a common origin or evolved convergently in different lineages. The authors collected sequences of the genes encoding these proteins from publicly available genomes of spiders from a range of families. They aligned the sequences using the structural cysteines as guides and carried out a phylogenetic analysis of the different sequences, ultimately classifying the different proteins into over 50 super-families. One thing that is not clear from the text or from the references cited (I am not an expert on spider venom) is how many of these superfamilies were known before and how many are novel. There is also no clear indication of what criteria were used to define a subset of sequences as a superfamily. Nonetheless, the authors show that all these superfamilies have a single common ancestor, predating the divergence of araneomorphs and mygalomorphs and that the DRPs underwent independent diversification in each of these two lineages.

      We have identified 78 novel superfamilies in this study and 33 were previously identified (Pineda et al. 2020 PNAS). We had previously described information in lines 90, 101 and 106 regarding the description of novel superfamilies from previous studies and the ones described in this study.

      Line 90 “Recently, using a similar approach, 33 novel spider toxin superfamilies have been identified from the venom of the Australian funnel-web spider, Hadronyche infensa (9).”

      Line 101 “This approach enabled the identification of 33 novel toxin superfamilies along the breadth of Mygalomorphae (Figures S1 and S2).”

      Line 106 “Moreover, analyses of Araneomorphae toxin sequences using the strategy above resulted in the identification of 45 novel toxin superfamilies from Araneomorphae, all of which but one (SF109) belonged to the DRP class of toxins (Figures S3 and S4).”

      Spider toxin superfamilies have been named after gods/deities of death, destruction and the underworld based on nomenclature introduced by Pineda et al. (2014 BMC genomics). We have now included this explanation in the manuscript under the methods and results sections. We have also provided additional details pertaining to this nomenclature in Table S1.

      The authors also looked at selective forces acting on the sequences using dN/dS analyses. They reach the conclusion that there are different modes of selection acting on different sequences based on their role - defensive or predatory venoms - building on previous work by the lead author on venom sequence evolution in diverse animals.

      All in all, this is an admirable piece of molecular evolution work, providing new data on the evolution of spider venom proteins. There are some confusions in terminology that need to be cleared up, and somewhat more context needs to be given for non-specialists as detailed in the points below:

      We thank the reviewer for their constructive and critical suggestions, as well as the kind words of encouragement. Their suggestions have helped us in significantly improving the quality of our work.

      Suggestion 1) Common names of the main spider infraorders should be given.

      We thank the reviewer for their helpful input. We have now introduced spider infraorders with well-known spiders and their common names under the introduction section. Furthermore, we have also included a schematic representation of the spider phylogeny, and highlighted lineages under investigation as Figure 1.

      Suggestion 2) Opisthothelae is not the common ancestor of Mygalomorphae and Araneomorphae, but the clade that encompasses those two clades. This incorrect statement appears in several places. Further on, it is stated that Opisthothelae is the common ancestor of all extant spiders. This is wrong both from a terminological point of view (a clade cannot be ancestral to another clade) and from a factual point of view, since there are extant spiders not included in Opisthothelae.

      We thank the reviewer for pointing out this oversight. We have now corrected it to suborder Opisthothelae as the clade encompassing Mygalomorphae and Araneomorphae spiders.

      Suggestion 3) Several proteins and protein families are mentioned without being introduced, e.g. knottin. Please provide short descriptions.

      We have now provided a short introduction to terms such as Knottin.

      Reviewer #2 (Public Review):

      This interesting study looks into the evolution of putative spider venom toxins, specifically disulfide-rich peptides (DRPs). The authors use published sequence data to gain new insights into the evolution of DRPs, which are the major component of most spider venoms. Through a series of sequence comparisons and phylogenetic analyses they identify a substantial number of new spider toxin superfamilies with distinct cysteine scaffolds, and they trace these back to a primitive scaffold that must have been present in the last common ancestor of mygalomorph and araneomorph spiders. Looking at the taxonomic distribution of these putative venom DRPs, they conclude that mygalomorph and araneomorph DRPs have evolved in different ways, with the former being recruited into venom at the level of genera, and the latter at the level of families. In addition, they perform selection analyses on the DRP superfamilies to uncover the surprising result that mygalomorph and araneomorph DRPs have evolved under different selective regimes, with the evolution of the former being characterised by positive selection, and the latter by purifying (negative) selection.

      However, I don't think that in the current state of the manuscript these conclusions are robustly supported for several reasons. First, it seems that not all previously published data were included in the phylogenetic analyses that were used to identify new superfamilies of DRPs.

      We have, indeed, analysed all spider toxin sequences available to date. We have relied on the signal and propeptide regions for identifying novel superfamilies, which is an accepted convention: Pineda et al. (2014 BMC Genomics); Pineda et al. (2020 PNAS).

      Although many additional superfamilies can be identified, we have only retained those sequences for which there were at least 5 representatives for the identification of toxin superfamilies, and 15 representatives for selection analyses to ensure robustness. This filtering step ensured that the generated alignments, phylogenetic trees, and evolutionary assessments were robust and devoid of noise that stems from single-representative groups. Adding in those sequences would have enabled us to identify many more superfamilies, solely based on the signal and propeptide examination, but it wouldn’t have been possible to support them with other lines of evidence that were provided for all other superfamilies in this study, jeopardising the overall quality of the manuscript. Nonetheless, there is strong evidence that the left-out sequences are also related to the ones analysed in this study (Figure S10). In future, when more transcriptomes are sequenced, it would be possible to designate these newer toxin superfamilies with much stronger support.
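      The two-tier filter described above (at least 5 representatives to name a superfamily, at least 15 to include it in selection analyses) can be sketched as follows. This is only an illustration of the stated thresholds, not the authors' actual pipeline: the function name, the dictionary layout and the superfamily labels are hypothetical.

```python
# Illustrative sketch only: the thresholds come from the response above, but
# the function name, data layout and superfamily labels are hypothetical.

MIN_FOR_SUPERFAMILY = 5   # minimum sequences to designate a toxin superfamily
MIN_FOR_SELECTION = 15    # minimum sequences to run selection (dN/dS) analyses


def partition_superfamilies(groups):
    """Split candidate groups by the two size thresholds.

    `groups` maps a candidate superfamily name to its list of sequences.
    Returns two sorted lists of names: those large enough to be designated
    as superfamilies, and the subset large enough for selection analyses.
    """
    designated = sorted(
        name for name, seqs in groups.items() if len(seqs) >= MIN_FOR_SUPERFAMILY
    )
    for_selection = sorted(
        name for name, seqs in groups.items() if len(seqs) >= MIN_FOR_SELECTION
    )
    return designated, for_selection


if __name__ == "__main__":
    # Placeholder sequences; only the group sizes matter for this example.
    candidates = {
        "SF_A": ["seq"] * 20,
        "SF_B": ["seq"] * 7,
        "SF_C": ["seq"] * 2,
    }
    designated, for_selection = partition_superfamilies(candidates)
    print("designated:", designated)
    print("selection analyses:", for_selection)
```

      Groups below the lower threshold are simply set aside here, mirroring the statement that such sequences were left out of the superfamily designation rather than discarded as unrelated.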

      Second, much of the data were obtained from whole-body transcriptome data, which leaves a degree of uncertainty that these data indeed derive from the venom glands that produce the toxins.

      We respectfully disagree with the reviewer that ‘much of the data’ are from the whole-body transcriptomes. Nearly all sequences in our study are sourced from Pineda et al. (2014 BMC Genomics and 2020 PNAS), Sunagar et al (2013 Toxins), Cole and Brewer (2020 bioRxiv) and transcriptome sequence assembly data from established online repositories NCBI (NR and TSA) and ENA. All the above-mentioned studies (KS is a part of many of these) under their methods section clearly state that the transcriptomes were generated using mRNA isolated from venom gland tissue (BioProject accessions: PRJEB14734; PRJEB6062; PRJNA189679, PRJNA587301 and PRJNA189679, where source tissue type is designated as venom gland).

      We would like to direct the reviewer’s attention to the following excerpts from reference papers from which data for this study has been sourced:

      1. Pineda S et al. (2020 PNAS): “Three days later, they were anesthetized, and their venom glands were dissected and placed in TRIzol reagent (Life Technologies). Total RNA from pooled venom glands was extracted following the standard TRIzol protocol.”
      2. Sunagar et al (2013 Toxins): “Paired venom glands were dissected out and pooled from nine mature females on the fourth day after venom depletion by electrostimulation. Total RNA was extracted using the standard TRIzol Plus method ...”
      3. Cole and Brewer (2020 bioRxiv): “... the venom glands of each ctenid were dissected out, whole RNA was isolated from the venom glands …”

      We would also like to point out that hexatoxins are widely studied and are some of the most well-understood spider venom toxins. Many representatives have been functionally characterised and shown to be potent in affecting prey and predatory species [Sunagar et al (2013 Toxins); Pineda et al. (2014 BMC Genomics and 2020 PNAS); Volker, et al. (2020 PNAS) - KS is a part of most of these studies as well]. However, the current technologies do not permit the high-throughput screening of the enormous diversity of toxins in spiders, which is why not every toxin sequence identified from the venom gland is functionally characterised. Nonetheless, venom researchers will not contest the role of these highly expressed venom gland proteins in envenoming, especially given that they share significant sequence identities with toxins that are functionally well-characterised.

      The only exception to the above is non-ctenid araneomorph toxin superfamily sequences, which are retrieved from whole-body transcriptomes (Cole and Brewer; 2020 bioRxiv). The authors of the paper indicated these as putative toxins. As explained above, homologs of these peptides are well-characterised to be venom toxins. Additionally, in our phylogenetic trees (Figures 3, 4, S6 and S9), they are nested within the toxin clades, reaffirming their identity.

      Third, the taxonomic representation of mygalomorph and araneomorph diversity in this study is so sparse that it becomes impossible to distinguish whether toxin recruitments have happened at the level of genera, families, or even higher-level taxa.

      We respectfully disagree with this suggestion. The taxonomic breadth investigated in this study isn’t sparse. Analysed sequences belong to groups across the breadth of the spider phylogeny. To address this criticism, we are now including a schematic representation of spider phylogeny, where lineages under investigation are highlighted (Figure 1A). Given this broader taxonomic breadth, all of our interpretations are parsimoniously extendable to their common ancestors. For instance, we establish the common origin of all DRPs in the members of these widespread spider families. Therefore, not including sequences from other sister groups will not invalidate this hypothesis, and the most parsimonious explanation will be that the missing members too are likely to have DRPs in their venom (which is also a common understanding of the spider venom research). Whether DRPs dominate the venoms of these missing groups will only come to light upon investigation, but their presence in the venom is highly likely. Moreover, please do note that we have analysed nearly all sequences available in the literature to date.

      As for the recruitment of the toxin superfamily at the taxon level, we would like to point out the phylogenies in Figures 2 and 3, which clearly show the differential recruitment events. We would also like to point out that lines 120 and 136 state that this may not only be a result of recruitment and could arise from differential rates of diversification (also evident in other analyses presented in Figure 5 and Tables S2 and S3).

      Line 120 “Interestingly, the plesiotypic DRP scaffold seems to have undergone lineage-specific diversification in Mygalomorphae, where the selective diversification of the scaffold has led to the origination of novel toxin superfamilies corresponding to each genus (Figure 2).”

      Line 136 “However, we also documented a large number of DRP toxins (n=32) that were found to have diversified in a family-specific manner, wherein, a toxin scaffold seems to be recruited at the level of the spider family, rather than the genus. As a result, and in contrast to mygalomorph DRPs, araneomorph toxin superfamilies were found to be scattered across spider lineages (Figure 3; Figure S6; node support: ML: >90/100; BI: >0.95).”

      Adding any number of missing lineages will change neither the fact that mygalomorphs ‘appear’ to have recruited these superfamilies at the genus level, nor the family-level recruitment of toxin superfamilies in a large number of examined araneomorphs.

      We have now introduced a new figure (Figure 7) that highlights the different scenarios that explain the observed differences in the evolution of mygalomorph and araneomorph spider toxins. We have also included additional text in the manuscript to explain this better.

      Fourth, only a selection of DRP superfamilies was used for natural selection analyses, without the authors explaining how this selection was made. Yet, they attempted to draw general conclusions about toxin evolution in mygalomorphs and araneomorphs, even though most of the striking differences they found were restricted to just two mygalomorph genera, and one family of araneomorphs.

      From our experience and previous reports [Sunagar and Moran (2015, PLoS genetics); Sunagar, et al. (2012, MBE); Yang, Z. (2007, MBE)], the unavailability of enough sequences from datasets results in inaccurate estimation of omega values. For instance, if there are only a couple of sequences in a superfamily, both of which are slightly different from one another, then even these minor differences in them would be exaggerated. Hence, we have resorted to performing selection analysis on datasets for which there are at least 15 sequences. No doubt that this conservative approach reduces the number of datasets analysed, but it also ensures that our findings are well-supported. We have now clarified this in our manuscript under the methods section.
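
      The filtering rule described above can be sketched as follows. This is only an illustration under stated assumptions: the superfamily names and sequences are hypothetical toy data, and `eligible_superfamilies` is a made-up helper, not part of the authors' pipeline. Only the threshold of 15 sequences comes from the response.

```python
# Minimum number of sequences per dataset, as stated in the response.
MIN_SEQUENCES = 15


def eligible_superfamilies(alignments, min_sequences=MIN_SEQUENCES):
    """Return the names of superfamilies large enough for selection analysis.

    `alignments` maps a superfamily name to a list of sequences
    (hypothetical structure, used here for illustration only).
    """
    return sorted(
        name for name, seqs in alignments.items() if len(seqs) >= min_sequences
    )


# Hypothetical toy data: three superfamilies of different sizes.
toy_alignments = {
    "SF_A": ["ATG"] * 20,
    "SF_B": ["ATG"] * 2,   # too few sequences, omega would be unreliable
    "SF_C": ["ATG"] * 15,  # exactly at the threshold, so it is kept
}

kept = eligible_superfamilies(toy_alignments)
```

      Datasets below the threshold are dropped from the selection analysis only; as the response notes, they can still be used for alignment-based comparisons.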

      However, we did previously include sequences from all toxin superfamilies described to date in our alignment figure (Fig S10) and analysed their signal and propeptide regions. They were only excluded from selection analyses. It can be seen that they too are DRPs, but they belong to distinct superfamilies from the ones being described here.

      If these concerns are addressed this study can shed important new light on venom toxin evolution in one of the most diverse venomous taxa on Earth.

      We thank the reviewer for their constructive inputs and suggestions which have enabled us to make this manuscript more accessible to a wider audience.

      Reviewer #3 (Public Review):

      This work aims to elucidate the evolutionary origins of disulfide-rich spider toxin superfamilies and to determine the modes of natural selection and associated ecological pressures acting upon them. The authors provide a compelling line of evidence for a single evolutionary origin and differing factors (e.g., prey capture strategies and methods of anti-predator defense) that have shaped the evolution of these toxins. Additionally, the two major spider infraorders are claimed to have experienced differing selective pressures regarding these toxins.

      The results presented here are novel and generally well-presented. The evidence for a single origin of DRP toxins in spiders is exciting and changes the paradigm of spider venom evolution.

      The data are well analyzed, but the methods lack enough detail to reproduce the results. More information regarding the parameters passed to each software package, version numbers of all software employed, and models of molecular evolution employed in phylogenetic analyses are among the necessary missing information.

      We thank the reviewer for their kind words and constructive and critical suggestions. Their suggestions have contributed towards improving the quality of our work. Upon their suggestion, we have now expanded the methods section to include more details.

      The differences in the evolutionary pressures between mygalomorphs and RTA-clade spider DRP toxins are clear, but expanding RTA results to all araneomorphs may be overreaching. Additional araneomorph sequence data is available, despite the claims within this manuscript (e.g., see Jiang et al. 2013 Toxins; He et al. 2013 PLoS ONE; and Zobel-Thropp et al. 2017 PeerJ). These papers include cDNA sequences of spider venom glands and contain representatives of inhibitory cysteine knot toxins, which are DRP toxins. These data would greatly enhance the strengths of the results presented herein.

      In response to the expansion of RTA results to araneomorphs, we would like to point out that RTA comprises about 50% of the diversity recorded in Araneomorphae. The araneomorph data analysed in our study cover a range of araneomorph family divergence times: Agelenidae (<70 MYA), Pisauridae (<50 MYA) and Theridiidae (~200 MYA; Magalhaes 2020, Biological Reviews 95.1). We report a strong signature of purifying selection influencing the evolution of araneomorph toxin SFs, despite the long evolutionary time separating them (50 - 200 MYA). We firmly believe that further addition of toxin sequence data from other groups will not deviate from the general trend of molecular evolution observed in both these lineages across such a large period of time, barring certain exceptions (such as SF13, a defensive toxin identified from Hadronyche experiencing purifying selection; Volker, et al. 2020 PNAS).

      We had initially excluded non-ctenid datasets from our analyses on account of poor sequence annotation and lack of representative sequence data. However, we have now incorporated the Dolomedes mizhoanus (DRP) (Jiang et al. 2013 Toxins) and Latrodectus tredecimguttatus (non-DRP) (He et al. 2013 PLoS ONE) toxin datasets into our analyses, following the reviewer’s suggestion. This has led to the identification of 5 novel superfamilies, providing additional support to our spider venom evolution hypothesis.

    1. Author Response

      Reviewer #1 (Public Review):

      Lin et al. characterise cellular pathologies in PLA2G6 mutant patient-derived neuronal cells (neuronal progenitor cells, NPCs, and iPSC-derived dopaminergic neurones) and a novel compound heterozygous PLA2G6 mutant mouse model. They build on their previous findings in an INAD fly model (lacking PLA2G6) to show that lysosomal and mitochondrial defects are evolutionarily conserved in PLA2G6 deficiency. The authors proceed to use their INAD fly model to screen a number of compounds that are predicted to modulate endo-lysosomal function using a bang sensitivity assay. They then show that the drugs that can rescue this fly behavioural phenotype also reduce LAMP2 expression in patient-derived NPCs on Western blot analysis. Lastly, the manuscript reports the creation of new genetic constructs that express human PLA2G6 and studies their expression levels in a human kidney cell line as well as in patient-derived NPCs. In the latter neuronal model, they show that expression of human PLA2G6 can rescue mitochondrial fragmentation associated with PLA2G6 loss-of-function. Lin et al then show that ICV (intracerebroventricular) and IV (intravenous) injection of a human PLA2G6-containing construct is able to partially rescue the rotarod phenotype in transheterozygous PLA2G6 mutant mice between ~110 and 150 days. There is also an associated improvement in lifespan and body weight.

      The strengths of this work are that the authors use a number of different model organism systems, including patient-derived neuronal cells, Drosophila models (INAD flies) and mouse models to study PLA2G6-associated neurodegeneration (PLAN) at the cellular level. They also screen drug compounds that are predicted to target endo-lysosomal trafficking and sphingolipid metabolic pathways to ameliorate PLAN, thus identifying potential new therapeutic strategies. The work in mice, showing that gene therapy with human PLA2G6 can rescue a behavioural phenotype and lifespan is the first proof-of-concept of such an advancement. This work will hopefully lead to further studies for optimisation toward clinical advancement.

      We thank the reviewer and editor for the positive comments about our manuscript.

      The major weaknesses are that the pathogenic mechanisms shown in the patient-derived neuronal cells and mice do not extend as far as those previously shown in the fly model published by the authors. Of note, ceramide levels and retromer function are not studied, both key pathologies described in the previous fly models. In addition, the drug screening is limited by its testing in one fly behavioural assay and LAMP2 Western blot analysis on patient-derived NPCs.

      The results, in general, support the conclusions of the authors and represent well-performed work. However, the significance of elevated glucosylceramide levels is not clear in the present study. Although this was previously found to be elevated in INAD flies, it was ceramide levels that were thought to be the main toxic insult, with drugs aimed at reducing ceramide levels being shown to rescue INAD flies.

      We addressed these concerns. Please refer to our response to each of the specific points listed below.

      This work will no doubt be of significant interest to the field, confirming several previous findings in the Drosophila model of PLA2G6 (iPLA2-VIA) knockout. It also extends upon the fly work by identifying compounds that can be further studied for potential drug-re-purposing for the treatment of PLA2G6-associated disease. The gene therapy studies are also very interesting and a first proof-of-principle in PLAN using ICV and IV delivery in a mouse model.

      We thank the reviewers and editor as addressing all these concerns really improved the manuscript.

      Reviewer #2 (Public Review):

      This article aims to extend human disease-related studies of PLA2G6 from fly models to iPS-neurons, mouse models, to look for drugs that suppress phenotypes and test them, and to attempt AAV whole body rescue. Generally, each of these questions/aims/experiments is excellent, but as presented, it's a bit of an underdeveloped hodgepodge of results, with each experiment somewhat underdeveloped or analyzed for the respective phenotype, in my opinion. I think the general thrust of the experiments is excellent. But the data are relatively cursory in many instances. Further development and characterization of the phenotypes would require quite a bit of work but vastly improve the paper.

      We thank the reviewer for the positive comments about our manuscript. We have addressed most of the concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      Like other sensory organs, the inner ear has a rich population of pericytes, essential for sensory hair cell health and normal hearing. In this study, using an inducible and conditional pericyte depletion mouse (PdgfrbCreERT2/iDTR) model, the authors demonstrate that the pericytes play critical roles in maintaining vascular volume and integrity of spiral ganglion neurons (SGNs) in the cochlea. Moreover, using the coculture models, they show vigorous vascular and neuronal growth in neonatal SGN explants in the presence of exogenous pericytes. Mechanistically, this study demonstrates that these roles are achieved mainly through the interactions between pericyte-released exosomes containing VEGF-A and VEGFR2-expressing vessels and SGNs.

      Overall, the data are analyzed thoroughly, and the conclusions are novel and convincing. It is mechanistically solid. The study is somewhat translationally limited. Nevertheless, understanding the roles of organ-specific pericytes is paramount, making this study timely and significant.

      We thank Reviewer #1 for the positive comment. We agree the pericyte depletion model is not a translational disease model. However, pericyte pathologies, including the decline in pericyte number, pericyte migration, and pericyte trans-differentiation, are frequently seen in aging and noise-induced hearing loss animal models. Moreover, hearing dysfunction due to pericyte pathology has been demonstrated in recent studies (Hou et al., 2020; Hou et al., 2018; Neng et al., 2015).

      Reviewer #2 (Public Review):

      The present study from Xiaorui Shi's lab investigated the effect of pericyte depletion on spiral ganglion neurons and auditory function. Results in vitro culture system proposed that pericyte-derived exosomes contain VEGF, and promote not just vascular stability but neuronal survival through Flk1. This study is an extension of their previous study showing pericyte depletion causes auditory dysfunction, which is ameliorated by VEGF gene therapy (Zhang et al., JCI insight 2021). Overall, the data are clear and sophisticated and promote our understanding of the biological roles of pericytes in neuronal function. Several points should be thoroughly discussed or supported by definitive experiments like analysis of neuron-specific Flk1 KO mice.

      We thank Reviewer #2 for the encouraging positive comments on our study. We especially appreciated the reviewer’s view that there would be value in using neuron-specific Flk1 KO mice to consolidate the results. However, since our in vitro adult SGN neuron cell culture model clearly demonstrates the direct role of exosome-VEGF-A signaling on adult SGN health, as shown in Figs. 5D & E and Figs. 9C & E, we are confident our conclusion is valid. A recent study used neuron-specific Flk1 conditional KO mice to demonstrate neuronal atrophy and dysfunction in memory impairment (Deyama et al., 2020). We do presume disruption of neuronal VEGF/FLK1 signaling in a specific neuronal Flk1 deletion animal model would cause similar spiral ganglion death and subsequent hearing loss. To test this possibility, we are seeking a Cre-SGN driver animal model from the auditory community and Flk1 floxed mice from the larger research community. Of course, obtaining these models and setting up for a future study will require some time. Nevertheless, Reviewer #2’s suggestion is excellent, and we have added discussion of the suggestion to the Discussion section.

      Reviewer #3 (Public Review):

      Zhang et al focus on investigating the role of pericytes in the vasculature of the inner ear. They propose that pericyte-derived VEGF is required for vessels and SGN survival. Functionally, they show that pericyte ablation leads to hearing loss.

      This work is interesting to the scientific community. It describes a very specific organ vasculature and its potential crosstalk with the neuronal compartment in the peripheral nervous system.

      Major strengths and weaknesses:

      • The study is well explained, written, and discussed;

      • The design of the experiments is adequate;

      • The study is performed in vivo, in vitro, and with functional readouts;

      • Results are convincing.

      We thank the reviewer for the positive comments on our study. We especially appreciate the reviewer’s suggestions for improving the soundness and quality of the study. We address Reviewer #3’s specific concerns below.

      The main conclusion of the study is that pericyte-derived VEGF acts on inner ear vessels and SGNs to maintain their functionality and survival. While all presented data supports this model, there could be other potential interpretations that should be tested and validated with further evidence:

      The in vitro experiments are performed with SGN explants. Using this system the authors see that pericyte-derived conditioned medium or exosomes lead to increased vessel branching and SGN neurite outgrowth. As explants contain vessels and neurons, there is the possibility that VEGF is primarily acting on endothelial cells, which then in turn signal to neurons (independent of VEGF, even when neurons express VEGFR2). This should be tested. Perhaps by targeting VEGFR2 specifically in neurons, or by culturing isolated SGN neurons and testing the effect of pericyte-derived exosomes.

      This is a great point. To confirm the effect of exosome VEGF-A on SGN neurite outgrowth, we treated isolated adult SGNs with exosomes. As shown in Figs.9C & E, we found much greater SGN dendrite and branch growth in the treated than in the untreated groups.

      • Pericyte ablation via DTA might result in the activation of the immune system, which could also influence vessel and neuronal survival. It should be checked whether there is immune activation upon pericyte ablation.

      Excellent point. We checked for macrophage activation at two weeks after pericyte depletion. We didn’t see any obvious signs of macrophage activation, but we did notice a decrease in macrophage number. We presume the reduction in macrophage number results from insufficient blood flow and nutrient availability.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors seek to determine how various species combine their effects on the growth of a species of interest when part of the same community.

      To this end, the authors carry out an impressive experiment containing what I believe must be one of the largest pairwise + third-order co-culture experiments done to date, using a high-throughput co-culture system they had co-developed in previous work. The unprecedented nature of this data is a major strength of the paper. The authors also discover that species combine their effect through "dominance", i.e. the strongest effect masks the others. This is important as it calls into question the common assumption of additivity that is implicit in the choice of using Lotka-Volterra models.

      A stronger claim (i.e. in the abstract) is that the joint effect of multiple species on the growth of another can be derived from the effect of individual species. Unless I am misunderstanding something, this statement may have to be qualified a little, as the authors show that a model based on pairwise dominance (i.e. the strongest pairwise) does a somewhat better job (lower RMSD, though granted, not by much, 0.57 vs 0.63) than a model based on single species dominance. That is, the effect of the strongest pair predicts the effect of a trio better than the effect of the strongest single species does.

      This issue makes one wonder whether, had the authors included higher-order combinations of species (i.e. five-member consortia or higher), the strongest-effect trio would have predicted better than the strongest-effect pair, which in turn is a better predictor than the strongest-effect species. This is important, as it would help one determine to what extent the strongest-effect model would work in more diverse communities, such as those one typically finds in nature. Indeed, the authors find that the predictive ability of the strongest effect species is much stronger for pairs than it is for trios (RMSD of 0.28 vs 0.63). Does the predictive ability of the single species model decline faster and faster as diversity grows beyond 4-member consortia?

      Thank you for raising this important point. It is true that in our study we see that single species predict pairs better than trios, and that pairs predict trios better than single species. As we did not perform experiments on more diverse communities (n>4), we are not sure if or how these rules will scale up. We explicitly address these caveats in our revised discussion.

      Reviewer #3 (Public Review):

      A problem in synthetic ecology is that one can't brute-force complex community design because combinatorics make it basically impossible to screen all possible communities from a bank of possible species. Therefore, we need a way to predict phenomena in complex communities from phenomena in simple communities. This paper aims to improve this predictive ability by comparing a few different simple models applied to a large dataset obtained with the use of the authors' "kchip" microfluidics device. The main question they ask is whether the effect of two species on a focal species is predicted from the mean, the sum, or the max of the effect of each single "affecting" species on the focal species. They find that the max effect is often the best predictor, in the sense of minimizing the difference between predicted effect and measured effect. They also measure single-species trait data for their library of strains, including resource niche and antibiotic resistance, and then find that Pearson correlations between distance calculations generated from these metrics and the effect of added species are weak and unpredictive. This work is largely well-done, timely and likely to be of high interest to the field, as predicting ecosystem traits from species traits is a major research aim.
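
      The three candidate models named above (sum, mean, and max) can be compared with a short sketch. It rests on assumptions that are not taken from the paper: effects are treated as signed numbers on an arbitrary scale, the "strongest" model is taken to mean the single-species effect with the largest absolute value, and the toy effect values are invented purely for illustration.

```python
import math

MODELS = ("additive", "mean", "strongest")


def predict_joint_effect(effect_a, effect_b, model):
    """Predict the effect of a species pair from the two single-species effects."""
    if model == "additive":
        return effect_a + effect_b
    if model == "mean":
        return (effect_a + effect_b) / 2
    if model == "strongest":
        # Assumption: "strongest" means largest magnitude, keeping its sign.
        return effect_a if abs(effect_a) >= abs(effect_b) else effect_b
    raise ValueError(f"unknown model: {model}")


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed effects."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: single-species effects and the measured pair effect.
single_effects = [(-1.0, -0.5), (0.4, -0.2), (-0.8, 0.3)]
observed_pairs = [-1.1, 0.3, -0.7]

errors = {
    model: rmse(
        [predict_joint_effect(a, b, model) for a, b in single_effects],
        observed_pairs,
    )
    for model in MODELS
}
best_model = min(errors, key=errors.get)
```

      With real data the same comparison could be repeated per focal species, which is the breakdown the reviewer's next point turns on.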

      My main criticism is that the main take-home from the paper (fig 3B)-that the strongest effect is the best predictor-is oversold. While it is true that, averaged over their six focal species, the "strongest effect" was the best overall predictor, when one looks at the species-specific data (S9), we see that it is not the best predictor for 1/3 of their focal species, and this fraction grows to 1/2 if one considers a difference in nRMSE of 0.01 to be negligible.

      As suggested, we have softened our language regarding the take-home message. This matter is addressed in detail above in response to 'Essential Revisions'. Briefly, we see that the strongest model works best when both single species have qualitatively similar effects, but is slightly less accurate when effects are mixed. We also see overall less accurate predictions for positive effects. In light of these findings, we propose that, where the strongest model is not the most accurate for a focal species, this is due to the interaction types involved rather than to anything specific to that focal species.

      We made substantial changes to the manuscript, including the first paragraph of the discussion which more accurately describes these findings and emphasizes the relevant caveats:

      "By measuring thousands of simplified microbial communities, we quantified the effects of single species, pairs, and trios on multiple focal species. The most accurate model, overall and specifically when both single species effects were negative, was the strongest effect model. This is in stark contrast to models often used in antibiotic compound combinations, despite most effects being negative, where additivity is often the default model (Bollenbach 2015). The additive model performed well for mixed effects (i.e. one negative and one positive), but only slightly better than the strongest model, and poorly when both species had effects of the same sign. When both single species’ effects were positive, the strongest model was also the best, though the difference was less pronounced and all models performed worse for these interactions. This may be due to the small effect size seen with positive effects, as when we limited negative and mixed effects to a similar range of effects strength, their accuracy dropped to similar values (Figure 3–Figure supplement 5). We posit that the difference in accuracy across species is affected mainly by the effect type dominating different focal species' interactions, rather than by inherent species traits (Figure 3–Figure supplement 6)." (Lines 288-304)

      The same criticism applies to the result from figure 2-that pairs of affecting species have more negative effects than single species. Considered across all focal species this is true (though minor in effect size, Fig 2A). But there is only a significant effect within two individual species. Again, this points to the effects being focal-species-specific, and perhaps not as generalizable as is currently being claimed.

      Upon more rigorous analysis, and with regard to changes in the dataset after filtering, we see that the more accurate statement is that effects become stronger, not necessarily more negative (in line with the accuracy of the strongest model). The overall trend is towards more negative interactions, due to the majority of interactions being negative, but as stated this is not true for each individual focal. As such the following sentence in the manuscript has been changed:

      "The median effect on each focal was more negative by 0.28 on average, though the difference was not significant in all cases; additionally, focals with mostly positive single species interactions showed a small increase in median effect (Fig. 2D)" (Lines 151-154)

      As well as the title of this section: "Joint effects of species pairs tend to be stronger than those of individual affecting species" (Lines 127-128)

      Another thing that points to a focal-species-specific response is Fig 2D, which shows the distributions of responses of each focal species to pairs. Two of these distributions are unimodal, one appears bimodal, and three appear tri-modal. This suggests to me that the focal species respond in categorically different ways to species addition.

      We believe this distribution of pair effects is related to the distribution of single species effects, and not to the way in which different focal species respond to the addition of second species. Though this may be difficult to see from the swarm plots shown in the paper, below is a split violin plot that emphasizes this point.

      Fig R1: Distribution of single species and pair effects. Distribution of the effect of single and pairs of affecting species for each focal species individually. Dashed lines represent the median, while dotted lines the interquartile range.

      These differences occur even though the focal bacteria are all from the same family. This suggests to me that the generalizability may be even less when a more phylogenetically dispersed set of focal species are used.

      We have added the following sentence to the discussion explicitly emphasizing the phylogenetic limitations of our study:

      "Lastly, it is important to note that our focal species are all from the same order (Enterobacterales), which may also limit the purview of our findings." (Lines 364-366)

      Considering these points together, I argue that the conclusion should be shifted from "strongest effect is the best" to "in 3 of our focal species, strongest effect was the best, but this was not universal, and with only 6 focal species, we can't know if it will always be the best across a set of focal species".

      As mentioned above, we have softened our language regarding the take-home message in response to these evaluations.

      My second main criticism is that it is hard to understand exactly how the trait data were used to predict effects. It seems like it was just Pearson correlation coefficients between interspecies niche distances (or antibiotic distances) and the effect. I'm not very surprised these correlations were unpredictive, because the underlying measurements don't seem to be relevant to the environment tested. What if, rather than using niche data across 20 nutrients, only the growth data on glucose (the carbon source in the experiments) was used? I understand that in a field experiment, for example, one might not know what resources are available, and so measuring niche across 20 resources may be the best thing to do. Here though it seems imperative to test using the most relevant data.

      It is true that much of the profiling data is not directly related to the experimental conditions (different carbon sources and antibiotics), but in addition to these we do use measurements from experiments carried out in the same environment as the interaction assays (i.e. growth rate and carrying capacity when growing on glucose), which also showed poor correlation with the effects on focals. Additionally, we believe that these profiles contain relevant information regarding metabolic similarity between species (similar to metabolic models often constructed computationally). To improve clarity, we added the following sentence to the figure legend of Figure 3–Figure supplement 1:

      "The growth rate, and maximum OD shown in panel A were measured only in M9 glucose, similar to conditions used in the interaction assays." (Lines 591-592)

      Additionally and relatedly, it would be valuable to show the scatterplots leading to the conclusion that trait data were uninformative. Pearson's r only works on an assumption of linearity. But there could be strong relationships between the trait data and effect that are monotonic but not linear, or even that are non-monotonic yet still strong (e.g. U-shaped). For the first case, I recommend switching to Spearman's rho over Pearson's r, because it only assumes monotonicity, not linearity. If there are observable relationships that are not monotonic, a different test should be used.

      Per your suggestion, we have changed the measurement of correlation in this analysis from Pearson's r, to Spearman's rho. As we observed similar, and still mostly weak correlations, we did not investigate these relationships further. See Figure 3–Figure supplement 1.
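
      The distinction between the two coefficients can be illustrated with a minimal, dependency-free sketch. The data are synthetic and chosen only to show a monotonic but non-linear relationship; the helper names are hypothetical and this is not the authors' analysis code.

```python
import math


def pearson(xs, ys):
    """Pearson's r: measures the strength of a linear relationship."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on ranks, so it detects monotonicity."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Synthetic example: y grows exponentially with x (monotonic, not linear).
trait_distance = [1, 2, 3, 4, 5, 6]
effect = [math.exp(x) for x in trait_distance]

r_linear = pearson(trait_distance, effect)
rho_monotonic = spearman(trait_distance, effect)
```

      Here Spearman's rho reaches 1 because the ordering is perfectly preserved, while Pearson's r is noticeably lower. Neither coefficient would capture a strong U-shaped relationship, which is the case the reviewer suggests handling with a different test.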

      Additionally, we generated heat maps including scatterplots mapping the data leading to these correlations. We found no notable dependency in these plots, and visually they were quite crowded and difficult to interpret. As this is not the central point of our study, we ultimately decided against adding this information to the plots.

      In general, I think the analyses using the trait data were too simplistic to conclude that the trait data are not predictive.

      We agree that more sophisticated analyses may help connect species traits to their effects on focal species. In fact, other members of our research group have recently used machine learning to accomplish similar predictions (https://doi.org/10.1101/2022.08.02.502471). As such, we have changed the wording to reflect that this correlation is difficult to find using simple analyses:

      "These results indicate that it may be challenging to connect the effects of single and pairs of species on a focal strain to a specific trait of the involved strains, using simple analysis." (Lines 157-159)

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the impact of pre-gravid obesity in human mothers on the monocytes of newborns by collecting umbilical cord blood. Additionally, the authors also used a non-human primate (NHP) model of diet-induced obesity to isolate fetal macrophages and assess the impact of maternal obesity on fetal macrophage function. The comprehensive analysis of the human umbilical cord blood monocytes by studying cytokine release, bulk RNA-seq and bulk ATAC-seq, single cell RNA-seq and single cell ATAC-seq, responses to pathogen stimulation as well as metabolic studies such as glucose uptake is a major strength of the work. They present convincing evidence that the monocytes of offspring with obese mothers have epigenetic and transcriptomic profiles consistent with impaired immune responses, both during baseline conditions and upon stimulation.

      We thank the reviewer for these positive remarks.

      However, it is not clear from the data how the epigenetic data and the transcriptomic data are related to each other. The implication that the epigenetic changes drive the downstream transcriptional differences is not clearly demonstrated. Furthermore, it is not clear which of the observed attenuations of monocyte transcriptional responses overlap with chromatin accessibility differences. Such an overlap would make a stronger case for the mechanistic link.

      We thank the reviewer for this suggestion. We have included an integration section - with overlap of baseline ATAC-Seq (data from this study) with gene expression responses (from a previous study; https://doi.org/10.4049/jimmunol.1700434) following LPS stimulation in lean and obese groups - Figure 4E. Additionally, we report overlap of LPS induced chromatin changes with gene expression changes following LPS, E.coli and RSV stimulation in Figure 5I. Collectively, these changes provide the reader with a better link between chromatin accessibility and gene expression differences and their discordance with maternal obesity.

      The increased phagocytosis of E.coli in umbilical cord monocytes of newborns with obese mothers appears counter-intuitive because it implies greater host defense capacity.

      E.coli uptake assay is a standard way of measuring cellular phagocytosis by flow cytometry. We would like to clarify that despite impaired ex vivo cytokine responses and poor migration, UCB monocytes demonstrate higher ability to phagocytize pathogens. This is counterintuitive but not surprising, given that enhanced phagocytosis is a hallmark of regulatory monocytes/macrophages.

      One of the most remarkable aspects of the manuscript is the analysis of the fetal macrophages in a non-human primate (NHP) model of diet induced obesity because of the challenge of studying fetal macrophages in humans. The cytokine assays nicely show that the fetal macrophages in the obesity model show impaired cytokine production, consistent with what was seen in the umbilical cord blood monocytes of human newborns. This is especially important because circulating monocytes or monocyte progenitors seed the fetal tissues and give rise to fetal macrophages, thus elegantly linking the human work on circulating umbilical cord blood monocytes to the tissue macrophages in the NHP model. However, the NHP studies do not show any additional macrophage characterization beyond the cytokine assays. Flow cytometry analysis of the macrophage phenotype and functional assays would strengthen the conclusions regarding macrophage dysregulation.

      We have now included phenotyping data for ileal and splenic macrophages in Figure 6C-6E, which were collected during cell sorting. We unfortunately are not able to carry out additional functional assays since we don’t have any additional cells from these animals.

      Reviewer #2 (Public Review):

      This paper will be of interest to scientists studying the molecular effects of maternal obesity on offspring health. The paper represents an extension to earlier findings that have linked epigenomic alterations of monocyte population to aberrant immune responses in offspring of obese mothers. Bulk and single cell technologies have been implemented to characterize monocytic responses to bacterial and viral pathogens at the transcriptional and epigenetic level. A macaque model of western-style diet induced obesity is also described to provide in vivo evidence in support of monocyte/immune cell reprogramming by western diet/obesity. However, enthusiasm for the paper is significantly dampened by a lack of clarity in data presentation and robustness of the analysis.

      We thank the reviewer for this comprehensive summary and thoughtful assessment.

      Reviewer #3 (Public Review):

      The manuscript by Sureshchandra et al is a very extensive analysis of monocyte function and their molecular landscape in cord bloods from lean and obese mothers. They aimed to analyze the effects of pre-pregnancy BMI on the functioning of the innate immune system in newborns in a very extensive way. The combination of functional and molecular analyses strengthens their observations and shows many different sides of monocyte activation. I think this approach needs to be praised and should be an inspiration to many others who study monocyte function. This allows for a broad view on the matter and also shows where potential targeting will be necessary in the future. Overall, the manuscript and particularly the methods section is very well written and extensive, making it easy to study how robust the data are.

      We thank the reviewer for their comprehensive and positive assessment of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides further detailed analysis of recently published Fly Atlas datasets supplemented with newly generated single cell RNA-seq data obtained from 6,000 testis cells. Using these data, the authors define 43 germline cell clusters and 22 somatic cell clusters. This work confirms and extends previous observations regarding changing gene expression programs through the course of germ cell and somatic cell differentiation.

      This study makes several interesting observations that will be of interest to the field. For example, the authors find that spermatocytes exhibit sex chromosome specific changes in gene expression. In addition, comparisons between the single nucleus and single cell data reveal differences in active transcription versus global mRNA levels. For example, previous results showed that (1) several mRNAs remain high in spermatids long after they are actively transcribed in spermatocytes and (2) defined a set of post-meiotic transcripts. The analysis presented here shows that these patterns of mRNA expression are shared by hundreds of genes in the developing germline. Moreover, variable patterns between the sn- and sc-RNAseq datasets reveal considerable complexity in the post-transcriptional regulation of gene expression.

      Overall, this paper represents a significant contribution to the field. These findings will be of broad interest to developmental biologists and will establish an important foundation for future studies. However, several points should be addressed.

      In figure 1, I am struck by the widespread expression of vasa outside of the germ cell lineage. Do the authors have a technical or biological explanation for this observation? This point should be addressed in the paper with new experiments or further explanation in the text.

      Thank you for pointing this out. We found that our single cell dataset shows a similar (low) level of vasa expression outside the germline, suggesting that this is not due to single nucleus versus single cell RNA-seq (cluster 1, red in the lefthand umap).

      Analyzing the single nucleus RNA-seq in more detail revealed that, compared to the germline, both the fraction of cells in a cluster expressing vasa and the level at which they express it are very low. This analysis is included in a new Figure 1 – figure supplement 1. It is likely that much of this is due to a technical artifact, such as ambient RNA. Finally, we note in the resubmission that vasa is in fact expressed in embryonic somatic cells, and thus some of the vasa expression we observe may be real (Renault. Biol Open 2012; https://doi.org/10.1242/bio.20121909).
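
      The two quantities compared above (the fraction of cells in a cluster with detectable vasa, and the level among the cells that express it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used for the figure supplement: the function name `expression_summary`, the cluster labels, and the counts are all hypothetical toy values.

```python
from collections import defaultdict

def expression_summary(counts, clusters):
    """For each cluster, return (fraction of cells with count > 0,
    mean count among the expressing cells)."""
    grouped = defaultdict(list)
    for count, cluster in zip(counts, clusters):
        grouped[cluster].append(count)

    summary = {}
    for cluster, values in grouped.items():
        expressing = [v for v in values if v > 0]
        fraction = len(expressing) / len(values)
        # Avoid dividing by zero when no cell in the cluster expresses the gene.
        mean_level = sum(expressing) / len(expressing) if expressing else 0.0
        summary[cluster] = (fraction, mean_level)
    return summary

# Hypothetical toy counts for one gene across eight cells (not real data).
counts = [5, 7, 6, 8, 0, 0, 1, 0]
clusters = ["germline"] * 4 + ["muscle"] * 4
summary = expression_summary(counts, clusters)
print(summary)
```

      In this toy input every germline cell expresses the gene at a high level, whereas only one of four somatic cells shows a single count, which is the kind of contrast that a dot plot of fraction and level is meant to expose.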

      Plots in the original submission drew undue attention to the few somatic cells that exhibited vasa signal, due to the fact that expressing cell points were forced to the front of the plot. Given our new analysis reporting the low levels and fraction of cells exhibiting vasa expression (Figure 1 – figure supplement 1), we have modified the panels of Figure 1, changing point size to more faithfully reflect the small proportion of somatic cells with some vasa expression.

      The proposed bifurcation of the cyst cells into head and tail populations is interesting and worth further exploration/validation. While the presented in situ hybridization for Nep4, geko, and shg hint at differences between these populations, double fluorescent in situs or the use of additional markers would help make this point clearer. Higher magnification images would also help in this regard.

      We thank the reviewer for their suggestions on clarifying the differences between HCC and TCC populations. As suggested, we have repeated the FISH experiments of Nep4 and geko with higher resolution, and included the additional marker Coracle that demarcates the junction between HCC and TCC (Figure 6O,Q,S,T). These panels replaced previous Nep4 and geko FISH images (see previous Figure 6Q,U,U’). FISH for Nep4 validated the split, and the enrichment of geko strongly suggests that this arm represents one cell type (HCCs). We have not yet identified a gene reciprocally enriched to the other arm. Therefore, in the revised submission, we call the assignment of TCC identity, and to a lesser extent, HCC identity ‘tentative’, but point out that genes predicted to be enriched to one or the other arm represent fertile candidates for the field to test.

      Reviewer #2 (Public Review):

      In this manuscript the authors explain in greater detail a recent testis snRNAseq dataset that many of these authors published earlier this year as part of the Fly Cell Atlas (FCA) Li et al. Science 2022. As part of the current effort additional collaborators were recruited and about 6,000 whole cell scRNAseq cells were added to the previous 42,000 nuclei dataset. The authors now describe 65 snRNAseq clusters, each representing potential cell types or cell states, including 43 germline clusters and 22 somatic clusters. The authors state that this analysis confirms and extends previous knowledge of the testis in several important areas.

      “However, in areas where testis biology is well studied, such as the development of germ cells from GSC to the onset of spermatocyte differentiation, the resolution seems less than current knowledge by considerable margins. No clusters correspond to GSCs, or specific mitotic spermatogonia, and even the major stages of meiotic prophase are not resolved. Instead, the transitions between one state and the next are broad and almost continuous, which could be an intrinsic characteristic of the testis compared to other tissues, of snRNAseq compared to scRNAseq, or of the particular experimental and software analysis choices that were used in this study.”

      Note that the referee raises the same issue later in their review also. To respond succinctly, we placed the relevant sentence from a later portion of this referee’s comment here.

      “Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).”

      Respectfully, we have a different interpretation of other work as cited by this referee. Our data, as well as that from others, supports the notion that transitions are generally broad and continuous and are indeed a feature of testis biology. As we report here, data from both single cell and single nucleus RNAseq exhibit transitions from one cluster to the next. Thus, this feature cannot be due to the choice of method (single cell versus single nucleus).

      In fact, prior scRNA-seq results on systems containing a continuously renewing cell population, such as is the case in the testis, do indeed exhibit a contiguous trajectory rather than discrete, well-separated cell states in gene expression space (that is, in a UMAP presentation). For example, this is the case from single-cell or single-nucleus sequencing from spermatogenesis in mouse (Cao et al 2021), human (Sohni et al 2019), and zebrafish (Qian et al 2022).

      Along differentiation trajectories in these tissues, successive clusters are defined by their aggregate transcript repertoire. Indeed, differentially-expressed genes can be identified for clusters, with expression enriched in a given cluster. However, expression is rarely restricted to a cluster. For instance, Cao et al. subcluster spermatogonia into four subgroups, termed SPG1-4. They state clearly that these SPG1-4 “follow a continuous differentiation trajectory,” as can be inferred by marker expression across cells in this lineage. Similar to our findings, while the spermatogonia can fall into discrete clusters, gene expression patterns are contiguous. For example, the “undifferentiated” marker used in Cao et al, Crabp1, clearly shows expression in SPG1-3, annotated as spermatogonial stem cells, undifferentiated spermatogonia, and early differentiated spermatogonia, respectively. Likewise, markers for the “SPG3” state spermatogonia have detectable expression in SPG2 and SPG4, and likewise for markers of the “SPG4” state (with expression found also in SPG3).

      An analogous study of human spermatogenesis arrives at a similar conclusion. In that work, although clusters are named as “spermatogonial stem cell (SSC)”, the authors are careful to specifically point out that, “…while we refer to the SSC-1 and SSC-2 cell clusters as ‘SSCs,’ scRNA-seq is not a functional assay and thus we do not know the percentage of cells in these clusters with SSC activity. These subsets almost certainly contain other A-SPG cells [A type spermatogonia], including SPG progenitors that have committed to differentiate.” (Sohni et al 2019)

      Thus, the work in several disparate systems, all involving renewing lineages, finds that discrete clusters, such as a “stem cell cluster” are not identified. In the Drosophila testis, germline differentiation flows in a continuous-like manner similar to spermatogenesis in several other organisms studied by scRNA-seq, and our finding is not a function of the methodology, but rather a facet of the biology of the organ.

      Operating in parallel with continuous differentiation, we did find evidence of, and extensively discussed in concert with Figure 4, huge and dramatic shifts in transcriptional state in spermatocytes compared to spermatogonia, in early spermatids compared to spermatocytes, and in late spermatid elongation. Lastly, as we describe further below, new data in this resubmission identify four distinct genes with stage-selective expression as predicted by our analysis (new Figure 2 - figure supplement 1), illustrating the utility of our study for the field to find new markers and new genes to test for function.

      A goal of the study was to identify new rare cell types, and the hub, a small apical somatic cell region, was mentioned as a target region, since it regulates both stem cell populations, GSCs and CySCs, is capable of regeneration, and other fascinating properties. However the analysis of the hub cluster revealed more problems of specificity. 41 of the 120 cells in the cluster were discordant with the remaining 79 which did express markers consistent with previous studies. Why these cells co-clustered was not explained and one can only presume that similar problems may be found in other clusters.

      Our writing seems not to have been clear enough on this point and we thank the reviewer. We have revised the section. In addition, we have added new data (Figure 7 - figure supplement 2). We had already stated that only 79 of these 120 nuclei were near to each other in 2D UMAP space, while other members of original cluster 90 were dispersed. Thus the 79 hub nuclei in fact clustered together on the UMAP. Other nuclei that mapped at dispersed positions were initially ‘called’ as part of this cluster in the original Fly Cell Atlas (FCA) paper (Li et al., 2022), making it obvious that a correction to that assignment was necessary, which we carried out. To our eye, no other called cluster was represented by such dispersed groupings. For the hub, we definitively established the 79 nuclei to represent hub cells by marker gene analysis, including the identification of a new marker, tup, that was included in the 79 annotated hub nuclei but excluded from the 41 other nuclei (Figure 7). In this resubmission, to independently verify the relationship of the 79 nuclei to each other, we subjected the 120 nuclei from the original cluster 90 defined by the FCA study to hierarchical clustering using only genes that are highly expressed and variable in these nuclei (Figure 7 - figure supplement 2). This computationally distinct approach strongly supported our identification of the 79 definitive hub nuclei.

      Indeed, many other indications of specificity issues were described, including contamination of fat body with spermatocytes, the expression of germline genes such as Vasa in many somatic cell clusters like muscle, hemocytes, and male gonad epithelium, and the promiscuous expression of many genes, including 25% of somatic-specific transcription factors, in mid to late spermatocytes. The expression of only one such gene, Hml, was documented in tissue, and the authors for reasons not explained did not attempt to decisively address whether this phenomenon is biologically meaningful.

      We discussed the question of vasa expression in somatic clusters in some detail above, in response to referee #1, and included new analysis in the resubmission.

      With respect to the observation of ‘somatic gene’ expression in spermatocytes, we are also intrigued. We do not believe this is due to “contamination,” but rather a spermatocyte expression program that includes expression of somatic genes. First, these somatic markers were not observed in other germline clusters, which would be expected if this was due to general transcript contamination. Second, we observed expression of somatic markers in spermatocytes independently in the single-cell and single-nucleus data, making it unlikely to be an artifact of preparation of isolated nuclei. Finally, in the resubmission, in addition to Hml, we validated ‘somatic’ marker expression in spermatocytes by FISH of a somatic, tail cyst cell marker, Vsx1. Vsx1 is predicted to be expressed at low levels in spermatocytes in our dataset and is clearly visible in germline cells by FISH (Figure 3 – figure supplement 2G,H). We also refer the referee to Figure 6K, where the mRNA for the somatic cyst cell marker eya was observed by FISH at low levels in spermatocytes.

      A truly interesting question mentioned by the authors is why the testis consistently ranks near the top of all tissues in the complexity of its gene expression. In the Li et al. (2022) paper it was suggested that this is due to an inherently greater biological complexity of spermiogenesis than other tissues. It seems difficult to independently and rationally determine "biological complexity," but if a conserved characteristic of testis was to promiscuously express a wide range of (random?) genes, something not out of the question, this would be highly relevant and important.

      We agree that the massive transcriptional program found in spermatocytes is, indeed, truly interesting. There are many speculations as to why spermatocytes are so highly transcriptional, including the possibility of “transcriptional scanning” (e.g., Xia et al. 2020) regulating the evolution of new genes. Testing such models is beyond the scope of this paper. However, one must also keep in mind that spermatogenesis involves one of the most dramatic cellular transformations in biology, where cellular components spanning from nuclei to chromatin to Golgi, cell cycle, extensive membrane addition, changes in cell shape, and building of a complex swimming organelle all must occur and be temporally coordinated. Small wonder that many genes must be expressed to accomplish these tasks.

      Unfortunately, the most likely problems are simply technical. Drosophila cells are small and difficult to separate as intact cells. The use of nuclei was meant to overcome this inherent problem, but the effectiveness of this new approach is not yet well-documented. Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).

      We respectfully disagree with the referee about this collection of statements. First, the use of snRNASeq has been extensively characterized and compared to scRNA-seq in brain tissue by McLaughlin et al., 2021 (cited in the original submission) and was shown to be effective (McLaughlin, et al. eLife 2021;10:e63856. DOI: https://doi.org/10.7554/eLife.63856). snRNA-seq has a distinct advantage when dealing with long, thin cells, such as neurons or cyst cells (as featured in this work), where cytoplasm can easily be sheared off during cell isolation. Second, in a previous portion of our response to this referee, we discussed how our interpretation of Cao et al., 2021 differs from that expressed by this referee. Lastly, as requested in ‘Essential revision’ 2, we adjusted clustering methods and selected four genes, two predicted to be markers for early stage germline cells, and two for mid-spermatocyte stage development. FISH analysis demonstrates that expression for each of these maps to the appropriate stages (new Figure 2 - figure supplement 1). This confirms that the datasets we present in this manuscript can be mined to identify unique, diagnostic markers for various stages.

      The conclusions that were made by the authors seem to either be facts that are already well known, such as the problem that transcriptional changes in spermatocytes will be obscured by the large stored mRNA pool, or promises of future utility. For example, "mining the snRNA-seq data for changes in gene expression as one cluster advances to the next should identify new sub-stage-specific markers." If worthwhile new markers could be identified from these data, surely this could have been accomplished and presented in a supplemental Table. As it currently stands, the manuscript presents the dataset including a fair description of its current limitations, but very little else of novel biological interest is to be found.

      “In sum, this project represents an extremely worthwhile undertaking that will eventually pay off. However, some currently unappreciated technical issues, in cell/nuclear isolation, and certainly in the bioinformatic programs and procedures used that mis-clustered many different cells, has created the current difficulties.

      Most scRNAseq software is written to meet the needs of mammalian researchers working with cultured cells, cellular giants compared to Drosophila and of generally similar size. Such software may not be ideal for much smaller cells, but which also include the much wider variation in cell size, properties and biological mechanisms that exist in the world of tissues.”

      We appreciate the referee’s acknowledgement that this ‘undertaking will eventually pay off’. It was not our intention to address ‘function’ for this study, but rather to make the system accessible to the broadest community possible. We are uncertain if there is any remaining reservation held by this referee. A brief summary of what we covered in the manuscript may help allay any residual concern. Obviously, study of the Drosophila testis and spermatogenesis benefits from the knowledge of a large number of established cell-type and stage-selective markers. Thus, we extensively used the community’s accepted markers to assign identity to clusters in both the sn- and sc-RNA-seq UMAPs. We believe that effort well establishes the validity and reliability of the dataset. Furthermore, we identified upwards of a dozen new markers out of the cluster analysis, and verified their expression by FISH or reporter line in various figures throughout (tup, amph, piwi, geko, Nep4, CG3902, Akr1B, loqs, Vsx1, Drep2, Pxt, CG43317, Vha16-5, l(2)41Ab). To our mind, these contributions, coupled with annotation of the datasets, suggest strongly that they will serve the community well. This is especially true as we provide users with objects that they can feed into commonly used software algorithms such as Seurat and Monocle to explore the datasets to their purposes. Rather than simply relying on default settings within some of the applications, we also adjusted parameters for various clusterings as called for; some of which were in response to astute comments from referees, and included in the resubmission. Of course, it is possible that rare issues may arise in the datasets as these are further studied, but that is the case with all scRNA-seq data, and is not specific to work on this model organism.

      Reviewer #3 (Public Review):

      In this study, the authors use recently published single nucleus RNA sequencing data and a newly generated single cell RNA sequencing dataset to determine the transcriptional profiles of the different cell types in the Drosophila testis. Their analysis of the data and experimental validation of key findings provide new insight into testis biology and create a resource for the community. The manuscript is clearly written, the data provide strong support for the conclusions, and the analysis is rigorous. Indeed, this manuscript serves as a case study demonstrating best practices in the analysis of this type of genomics data and the many types of predictions that can be made from a deep dive into the data. Researchers who are studying the testis will find many starting points for new projects suggested by this work, and the insightful comparison of methods, such as between slingshot and Monocle3 and single cell vs single nucleus sequencing will be of interest beyond the study of the Drosophila testis.

      We greatly appreciate the reviewer’s comments.

      Reviewer #4 (Public Review):

      This is an extraordinary study that will serve as a key resource for all researchers in the field of Drosophila testis development. The lineages that derive from the germline stem cells and somatic stem cells are described in a detail that has not been previously achieved. The RNAseq approaches have permitted the description of cell states that have not been inferred from morphological analyses, although it is the combination of RNAseq and morphological studies that makes this study exceptional. The field will now have a good understanding of interactions between specific cell states in the somatic lineage with specific states in the germ cell lineage. This resource will permit future studies on precise mechanisms of communication between these lineages during the differentiation process, and will serve as a model for studies of co-differentiation in other stem cell systems. The combination of snRNAseq and scRNAseq has conclusively shown differences in transcriptional activation and RNA storage at specific stages of germ cell differentiation and is a unique study that will inform other studies of cell differentiation.

      Could the authors please describe whether genes on the Y chromosome are expressed outside of the male germline. For example, what is represented by the spots of expression within the seminal vesicle observed in Figure 3D?

      Prior work demonstrated that proteins encoded by Y-linked genes are not expressed outside of the germline (Zhang et al. Genetics 2020. https://doi.org/10.1534/genetics.120.303324). In our snRNAseq dataset, we find that genes on the Y chromosome are not highly expressed outside of the male germline (on the order of ~100-fold lower in other tissues). In fact, we observe Y chromosome transcripts at this level in many nuclei across tissues collected for the Fly Cell Atlas project, including the ovary. Since we have not followed up on the Fly Cell Atlas observations directly using FISH to examine Y chromosome transcript expression outside the germline, we cannot rule out the possibility that such low level expression is real. However, the detection across several tissues argues that this is likely a technical artifact. With regard to ‘spots of expression within the seminal vesicle’ (Figure 3D), a spot is colored red if the average expression level of genes on the Y chromosome is greater in that cell than in an average cell on our plot. These red spots are likely due to ambient RNA being carried over.
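
      The coloring rule described for Figure 3D can be written out as a short sketch. This is an illustration of the stated rule only, under assumptions: the function name `color_spots`, the use of "grey" for non-red spots, and the expression values are hypothetical and are not taken from the plotting code behind the figure.

```python
def color_spots(y_expression_per_cell):
    """Color a cell red when its mean Y-linked gene expression exceeds
    the mean of that quantity over all cells on the plot."""
    # Mean expression over the Y-linked genes, computed per cell.
    cell_means = [sum(genes) / len(genes) for genes in y_expression_per_cell]
    # The "average cell" threshold: mean of the per-cell means.
    plot_average = sum(cell_means) / len(cell_means)
    # "grey" for below-threshold cells is an assumed placeholder color.
    return ["red" if m > plot_average else "grey" for m in cell_means]

# Hypothetical toy values: four cells, two Y-linked genes each.
cells = [[4.0, 6.0], [0.0, 0.0], [0.0, 0.2], [0.0, 0.0]]
colors = color_spots(cells)
print(colors)
```

      Because the threshold is relative to the cells shown rather than an absolute expression cutoff, a red spot under this rule indicates only above-average signal for that plot.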

      I would appreciate some discussion of the "somatic factors" that are observed to be upregulated in spermatocytes (e.g. Mhc, Hml, grh, Syt1). Is there any indication of functional significance of any of these factors in spermatocytes?

      This is an excellent question. Although we validated expression for several (Hml, Vsx1 and eya), we did not test for their function here and this issue remains to be studied. This is now directly stated in the main text.

      In the discussion of cyst cell lineage differentiation following cluster 74 the authors state that neither the HCC nor TCC lineages were enriched for eya (Figure 6V). It seems in this panel that cluster 57 shows some enrichment for eya - is this regarded as too low expression to be considered enriched?

      We thank the reviewer for their insightful comment and we agree with their conclusions. We have modified the text to reflect the low, but present, expression of eya in the HCC and TCC lineages. The text now reads as follows: “Enrichment of eya was dramatically reduced in the clusters along either late cyst cell branch compared to those of earlier lineage nuclei (Figure 6J,U).”

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting study investigating the effects of sensory conflict on rhythmic behaviour and gene expression in the sea anemone Nematostella vectensis. Sensory conflict can arise when two environmental inputs (Zeitgeber) that usually act cooperatively to synchronize circadian clocks and behaviour, are presented out of phase. The clock system then needs to somehow cope with this challenge, for example by prioritising one cue and ignoring the other. While the daily light dark cycle is usually considered the more reliable and potent Zeitgeber, under some conditions, daily temperature cycles appear to be more prominent, and a certain offset between light and temperature cycles can even lead to a breakdown of the circadian clock and normal daily behavioural rhythms. Understanding the weighting and integration of different environmental cues is important for proper synchronization to daily environmental cycles, because organisms need to distinguish between 'environmental noise' (e.g., cloudy weather and/or sudden, within day/night temperature changes) and regular daily changes of light and temperature. In this study, a systematic analysis of different offsets between light and temperature cycles on behavioural activity was conducted. The results indicated that several degrees of chronic offset result in the disruption of rhythmic behaviour. In the 2nd part of the study the authors determine the effect of sensory conflict (12 hr offset that leads to robust disruption of rhythmic behaviour) on overall gene expression rhythms. They observe substantial differences between aligned and offset conditions and conclude a major role for temperature cycles in setting transcriptional phase. While the study is thoroughly conducted and represents an impressive amount of experimental and analytical work, there are several issues, which I think question the main conclusions. The main issue is that temperature cycles by themselves do not seem to fulfil the criteria for being considered a true Zeitgeber for the circadian clock of Nematostella.

      Major points:

      Line 53: 'However, many of these studies did not compare more than two possible phase relationships.....'. Harper et al. (2016) did perform a comprehensive comparison of different phase relationships between light and temperature Zeitgebers (1 hr steps between 2 and 10 hr offsets), similar to the one conducted here. I think this previous study is highly relevant for the current manuscript and -- although cited -- should be discussed in more detail. For example, Harper et al. show that during smaller offsets temperature is the dominant Zeitgeber, and during larger sensory conflict light becomes the dominant Zeitgeber for behavioural synchronization. Only during a small offset window (5-7 hr) behavioural synchronization becomes highly aberrant, presumably because of a near breakdown of the molecular clock, caused by sensory conflict. Do the authors see something similar in Nematostella? Figure 3 suggests otherwise, at least under entrainment conditions, where behaviour becomes desynchronized only at 10 and 12 hr offset conditions. But in free-run conditions behaviour appears largely AR already at 6 hr offset, but not so much at 4 and 8 hr offsets (Table 2). So there seems to be at least some similarity to the situation in Drosophila during sensory conflict, which I think is worth mentioning and discussing.

      We have added a more detailed discussion of our results in the context of Harper et al. 2016 (L468-476).

      Line 111: The authors state that 14-26C temperature cycle is 'well within the daily temperature range experienced by the source population'. To me this is surprising, as I was not expecting that water temperature changes that much on a daily basis. Is this because Nematostella live near the water surface, and/or do they show vertical daily migration? Also, I do not understand what is meant by '...range of in situ diel variation (of temperature)'. I think a few explanatory words would be helpful here for the reader not familiar with this organism.

      In fact, one of our motivations for studying temperature is that Nematostella naturally experience extreme temperature variation. The data we cite (Tarrant et al. 2019) are from in-situ water measurements. Nematostella live in extremely shallow water (in salt marshes), and the local population in Massachusetts experience wide swings in temperature due to the temperate latitude.

      We have added this information to the Introduction (L88-90), and we also added a discussion of Nematostella’s ecology in the Discussion section (L591-654).

      Lines 114-117: I was surprised that clock genes can basically not be synchronized by temperature cycles alone. Only cry2 cycled during temperature cycles but not in free-run, so the cry2 cycling during temperature cycles could just be masking (response to temperature). Later the authors show robust molecular cycling during combined LD and temperature cycles (both aligned and out of phase), indicating that LD cycles are required to synchronize the molecular clock. Moreover, a previous study has demonstrated that LD cycles alone (i.e., at constant temperature) are able to induce rhythmic molecular clock gene expression (Oren et al. 2015). Similarly, the free running behaviour after temperature cycles does not look rhythmic to me. In Figure 2A, 14-26C there is at best one peak visible on the first day of DD, and even that shows a ~6 hr phase delay compared to the entrained condition. After the larger amplitude temperature cycle (8:32C) behaviour looks completely AR and peak activity phases in free-run appear desynchronized as well (Fig. 2B). Overall, I think the authors present data demonstrating that temperature cycles alone are not sufficient to synchronize the circadian clock of Nematostella. One way to test whether the clock can be entrained is to perform T-cycle experiments, i.e., changing the thermoperiod away from 24 hr (e.g., 10 h warm : 10 h cold). If in a series of different T-cycles the peak activity always matches the transition from warm to cold (as in 12:12 T-cycles shown in Fig. 1A) this would speak against entrainment and vice versa.

      Thank you for these thoughtful comments and constructive suggestions. We have conducted an additional experiment, which provides further evidence that temperature cycles can, in fact, synchronize the circadian clock. To do this, we measured the behavior of animals entrained in cycles with a short (12h) period, half the length of a circadian period. This takes advantage of a phenomenon called “frequency demultiplication”, in which organisms in 12h environmental cycles display both 12h and 24h components--essentially, the clock perceives every other cycle as a “day” (Bruce, 1960; Merrow et al., 1999). The important thing is that the 24h behavioral component can only occur if the signal is entraining a circadian clock—otherwise, we would only observe a directly-driven 12h behavior pattern.

      We first show that this phenomenon occurs with 6:6 LD cycles—which we expected, because we know light is a zeitgeber. We then show that animals entrained to a temperature cycle with a 12h period also display 24h behavioral rhythms—and in fact the 24h component is stronger than the 12h component. We believe this is strong evidence that temperature is a bona fide zeitgeber in this system. This experiment is now explained in the Results (L127-154) and in Figure 2–Figure supplement 1.
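
      The logic of the frequency demultiplication test can be illustrated with a minimal sketch. It is not the analysis used in the manuscript: the simulated activity traces, amplitudes, noise level, and the helper name `component_amplitude` are all assumptions made for illustration. The sketch fits a sinusoid of a chosen period to an activity series by least squares and compares the 12 h and 24 h amplitudes for a purely driven response and for a response that also carries a 24 h component.

```python
import numpy as np


def component_amplitude(activity, dt_hours, period_hours):
    """Least-squares amplitude of a sinusoid with the given period."""
    activity = np.asarray(activity, dtype=float)
    t = np.arange(activity.size) * dt_hours
    omega = 2.0 * np.pi / period_hours
    # Columns: cosine, sine, and a constant offset.
    design = np.column_stack([np.cos(omega * t), np.sin(omega * t), np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return float(np.hypot(coef[0], coef[1]))


# Simulated hourly activity over four days in a 12 h environmental cycle
# (all amplitudes and the noise level are made-up illustration values).
rng = np.random.default_rng(0)
t = np.arange(96.0)
driven_only = np.cos(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)
entrained = driven_only + 2.0 * np.cos(2 * np.pi * t / 24)

for name, series in [("driven only", driven_only), ("entrained", entrained)]:
    a12 = component_amplitude(series, 1.0, 12.0)
    a24 = component_amplitude(series, 1.0, 24.0)
    print(f"{name}: 12 h amplitude = {a12:.2f}, 24 h amplitude = {a24:.2f}")
```

      In this toy setting a directly driven response shows essentially no 24 h amplitude, whereas a 24 h component that exceeds the 12 h component is the signature described above. Real behavioral data would normally be analyzed with a dedicated periodogram method and per-animal statistics.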

      In terms of our original data, the reviewer is correct that the statistically-detectable free-running rhythms were weak and not visually obvious. Our confidence in thermal entrainment came from the fact that some individual animals had 24h rhythmicity in free-run, even if the signal was weak in the mean time series; this suggested that temperature must be at least capable of synchronizing internal clocks. It is also important to note that even light-entrained rhythms are “noisy” in cnidarians, which is why we were not surprised that the signal was weak. We have added a discussion of this observation in L601-612.

      Lines 210-226: As mentioned above, I think it is not clear that temperature alone can synchronize the Nematostella clock and it is therefore problematic to call it a Zeitgeber. Nevertheless, Figure 3A, B, D show that certain offsets of the temperature cycle relative to the LD cycle do influence rhythmicity and phase in constant conditions. This is most likely due to a direct effect of temperature cycles on the endogenous circadian clock, which only becomes visible (measurable) when the animals are also exposed to certain offset LD cycles. My interpretation of the combined results would be that temperature cycles play only a very minor role in synchronizing the Nematostella clock (after all, LD and temperature cycles are not offset in nature), perhaps mainly supporting entrainment by the prominent LD cycles.

      With our new data (see previous point), we believe we can safely say that temperature is a zeitgeber. We are not totally clear on what is meant by “a direct effect of temperature cycles on the endogenous circadian clock.” We argue that, because we see changes in free-running behavior during certain offsets, the timing of temperature cycles must affect the internal clock in a way that persists during constant conditions—it can’t just be a direct (clock-independent) effect of temperature.

      Gene expression part: The authors performed an extensive temporal transcriptomic analysis and comparison of gene expression between animals kept in aligned LD and temperature cycles and those maintained in a 12 hr offset. While this was a tremendous amount of experimental work that was followed by sophisticated mathematical analysis, I think that the conclusions that can be drawn from the data are rather limited. First of all, it is known from other organisms that temperature cycles alone have drastic effects on overall gene expression and importantly in a clock independent manner (e.g., Boothroyd et al. 2007). Temperature therefore seems to have a substantially larger effect on gene expression levels compared to light (Boothroyd et al. 2007). In the current study, except for a few clock gene candidates (Figure 2C), the effects of temperature cycles alone on overall gene expression have not been determined. Instead the authors analysed gene expression during aligned and 12 h offset conditions making it difficult to judge which of the observed differences are due to clock independent and clock dependent temperature effects on gene expression. This is further complicated by the lack of expression data in constant conditions. I think the authors need to address these limitations of their study and tone down their interpretations of 'temperature being the most important driver of rhythmic gene expression' (e.g., line 401). At least they need to acknowledge that they cannot distinguish between clock independent, driven gene expression and potential influences of temperature on clock-dependent gene expression rhythms. Moreover, in their comparison between their own data and LD data obtained at constant temperature (taken from Oren et al. 2015), they show that temperature has only a very limited effect (if any) on core clock gene expression, further questioning the role of temperature cycles in synchronising the Nematostella clock. 
      Nevertheless, I noted in Table 3 that there is a 1.5 to 3 hr delay when comparing the phase of eight potential key clock genes between the current study (temperature and LD cycles aligned) and LD constant temperature (determined by Oren et al.). To me, this is the strongest argument that temperature cycles at least affect the phase of clock gene expression, but the authors do not comment on this phase difference.

      We agree with these points about the limitations of our study, and have revised the manuscript to phrase our conclusions more carefully. We still think it is reasonable to observe that temperature was a stronger driver of gene expression than light in our study, but this may not be true in other contexts.

      In terms of the comparison with Oren et al. 2015, we didn’t want to over-interpret these results because there are other differences between the studies (L1181-1185), including the use of a different source population. In addition, we would prefer denser sampling (2h time points rather than 4h) and larger sample sizes to make claims about phase differences.

      Network analysis: This last section of the results was very difficult to read and follow (at least for me). For example, do the colours in Figure 6A correspond to those in Figure 6B, C? A legend for each colour, i.e., which GO terms are included in each colour would perhaps be helpful. As mentioned above, I also do not think we can learn a lot from this analysis, since we do not know the effects of temperature cycles alone and we have no free-run data to judge potential influence on clock controlled gene expression. Under aligned conditions genes are expressed at a certain phase during the daily cycle (either morning to midday, or evening to midnight), which interestingly, is very similar to temperature cycle-only driven genes in Drosophila (Boothroyd et al. 2007). Inverting the temperature cycle has drastic effects on the peak phases of gene expression, but not so much on overall rhythmicity. But since no free-run data are available, we do not know to what extent these (expected) phase changes reflect temperature-driven responses, or are a result of alterations in the endogenous circadian clock.

      We have revised and streamlined this section and Fig. 6, including removing panel 6C. The colors do correspond across panels in the figure. For space, GO terms of select modules are included in Fig. 6, and GO results for all modules are included in the Supplemental Data and discussed in the Results.

      It is true that we can’t distinguish temperature-driven versus clock effects here, and it does seem like many modules simply follow the temperature cycle (which we say in this section). The most interesting finding from this section is probably that the co-expression structure (correlations between rhythmic genes) is substantially weakened during SC, and we do discuss certain modules of genes that lose or gain rhythmicity. We have revised this section to focus on the main points and have cut several of the less pertinent results.

      Reviewer #3 (Public Review):

      This article reflects a significant effort by the authors and the results are interesting.

      For the third set of experiments, are temperature and light really out of synch? While peak in temperature no longer occurs along with lights on, we do still have two 24 hour cycles where changes in the environmental cues still occur simultaneously (lights on with peak in temperature, lights off with min in temperature). I wonder what would happen if light remained at a 24 hour cycle and temperature became either sporadic (randomly changing cycles) or was placed on a longer cycle altogether (temperature taking 20 hours to increase from min to max, and then another 20 hours to go from max to min).

      Thank you for your interesting suggestions for future experiments. This point is addressed in our revisions responding to Reviewer #1, who requested a discussion of the phrase “sensory conflict.” We agree that the binary “in-sync vs. out-of-sync” may be too simplistic. Our original conception of sensory conflict was a situation in which light and temperature provide different phase information, as informed by experiments with only light (prior literature) or only temperature (this work).

      In our revised manuscript, we discuss the idea that “sensory conflict” is not always a useful framework because there are many possible relationships between light and temperature. Although our 12h offset is certainly less “natural” than our aligned time series, it may be useful to think of them simply as 2 different possible light and temperature regimes in which the two signals interact, rather than abstract ideals of “aligned” or “misaligned.”

      An area that could significantly benefit a broader readership would be to improve overall clarity of figures and rethink if all the results are necessary to convey the key findings of the paper. As written, the Results section is somewhat confusing.

      We have revised Figs. 1 and 6 for clarity, and we have also shortened the network analysis portion of the Results.

    1. Author Response

      Reviewer #1 (Public Review):

      Here the authors sought to understand how BPGM/2,3-BPG levels are involved in adaptive responses to hypoxia and whether they are involved in fetal growth restriction. In the current state, I find the data to be confusing and lacking in mechanistic data to justify that increased BPGM is an adaptive response to hypoxia. While the authors find increased staining for the enzyme BPGM in SpA-TGCs after hypoxia, they did not assess 2,3-BPG in cord blood. This would show that increased enzymatic levels have a downstream impact. MRI experiments assessing placental and fetal haemoglobin-oxygenation, showed no differences. Human FGR samples, however, showed reduced 2,3-BPG in cord blood. Further evidence is required to show hypoxia increases BPGM as a compensatory mechanism to permit adequate 2,3-BPG and placental-fetal oxygenation levels as the authors claim.

      Additional experiments that demonstrate that BPGM is advantageous in the context of hypoxia would strengthen the authors arguments, and would provide a novel mechanism for adaptive responses to hypoxia in the placenta which is highly interesting.

      Obtaining cord blood from mouse embryos and analyzing its 2,3-BPG content is technically not feasible; thus we concentrated on the human data only. However, note that the dominant physiological effect would be on maternal blood in the placenta, where local elevation of 2,3-BPG can aid in oxygen release.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript will be of interest for investigators in the field of development and the biology of pregnancy. The major strengths of the data are the detailed description of a hypoxia-induced mouse model of fetal growth restriction, where phenotypes, tissue histology, MRI images and metabolic analysis combine to characterize the experimental system. The data seem descriptive and preliminary, and the comparison to human pregnancy is neither supportive nor rigorous.

      Strengths

      • The mouse pregnancy has been used by the authors and by others as a model for placental insufficiency. The manuscript provides incremental data to characterize hypoxia-induced fetal growth restriction.

      • The 15.2T MR imaging technology is high quality and informative, even if the results did not reveal marked changes.

      • The detailed characterization of BPGM expression in the apical mouse placental surfaces is valuable.

      • The provided model may be useful for future studies by the authors.

      Weaknesses

      • The metabolic analysis was restricted to one enzyme and metabolite. Placental analysis of 2,3-BPG and BPGM were already published (ref 29-30). At best, if the 2,3-BPG is related to the phenotype, it might be interpreted as a part of the injury in human cases, and an adaptive response in the mouse models (as the authors suggested in lines 286-288 and 332-336). However, these assumptions are not tested.

      In the paper of Pritlove et al. (ref. 29) the authors demonstrated the expression of BPGM in a normal human cohort. However, they did not test BPGM expression or 2,3-BPG levels in FGR placentae. In the paper of Gu et al. (ref. 30) the authors analyzed murine placental BPGM expression secondary to Igf2 deletion. Our study is the first to demonstrate the impact of maternal hypoxia on placental BPGM levels in murine gestational hypoxia models.

      • The human cases are not very informative. The causes of FGR were not known, but clearly (Table 1) not analogous to that of the mouse model. Systemic hypoxia in humans might have been more informative. In its absence, the value of cross-species comparison is low.

      • While the provided experiments are of good quality, the approach is very descriptive and not advancing mechanistic understanding of FGR-related placental insufficiency.

      The human placentae were specifically selected to exclude known causes of FGR such as heavy smoking or iron deficiency. We will work to expand the diversity of cases to test the potential role of BPGM in those cases as well.

    1. Author Response

      Reviewer #1 (Public Review)

      This manuscript describes a new method to perform online movement correction and extraction of calcium signals from a miniscope. The efficiency of the algorithm is tested by quantifying the accuracy of animal location decoding from hippocampal place cells. The online decoding happens with virtually no delay which is promising for closed-loop methods. It seems to be superior to online decoding without motion correction, which was the state of the art.

      The strength of this technique is therefore that it achieves real-time processing.

      The weakness of the study is the lack of comparison of the decoding accuracy with what can be obtained with electrophysiological state of the art, which prevents really estimating how precise the technique is.

      In revision, we present data showing that when our system is used to decode contour-based calcium traces from N≈50 neurons, the decoder achieves a mean distance error of ~30 cm, which is worse than the mean error of ~20 cm achieved using maximum likelihood decoding of single unit spike trains from electrophysiological recordings (Fig. 7E). However, when decoding N=900 contour-free calcium traces from the same image frames in the same rats, the mean decoding error goes down to ~15 cm, which is better than the mean for electrophysiological recordings. From this we conclude that real-time decoding of position from calcium traces achieves accuracies similar to those achievable with electrophysiology.

      Although less critical, there is no demonstration of a closed-loop application.

      It is true that we have not yet demonstrated a real-time closed loop application, but by demonstrating short latency generation of TTL outputs triggered by the decoder, we demonstrate the capability for closed-loop applications.

      Real-time position decoding is technically nice, but the position can be obtained from tracking the animal so it is practically useless.

      We offer two points in reply to this comment. First, decoding position from neural activity could offer useful (though not yet demonstrated) capabilities that would not be achievable with simple position tracking; for example, the position decoder could be trained on CA1 signals obtained during waking and then used to read out position trajectories generated during REM sleep.

      Second, and more importantly, position decoding was selected as a benchmark for performance testing mainly because it allows highly precise comparisons between decoder predictions and ground truth, which is important for establishing that the fidelity of calcium signals imaged in real time is adequate for accurate decoding of behavior at short latencies.

      It is also clear that decoding position on a linear track is easier than on a 2D arena, therefore it is difficult to estimate how much the efficiency of the method can be challenged in harder settings.

      It is true that decoding in a 2D arena would be a greater challenge than a 1D linear track, but in pursuit of our goal to rapidly disseminate a system with capabilities for short latency decoding of behavior from calcium signals, optimizing system performance for one specific application (e.g., position decoding) is not our main priority. A higher priority is to offer versatility for a wide range of experimental applications. To better demonstrate such versatility, the revised manuscript includes a new section in the Results that demonstrates categorical classification of behaviors during an instrumental touchscreen task.

      Reviewer #2 (Public Review):

      In this paper, the authors developed a new device for online decoding of position based on calcium imaging in freely moving rodents. This device could be used in the brain-computer interface to investigate neurofeedback-based therapies for neurological disorders. The technical part is properly done and gives convincing results that can be truly helpful for the scientific community using the miniscope. Nevertheless, as a methodological article, there should be more details regarding the accuracy of the decoding and of the different steps to follow if someone wants to use their methodology. Moreover, a true online real-time experiment should be performed to validate the device.

      Please find below my comments:

      • From what I read the authors did not perform a true real-time experiment. I think this step is crucial to ensure the quality of their device.

      It is unclear from this comment where to draw the bar for a “true real-time experiment.” Some previous publications of real-time approaches (such as refs #6,#11,#26) have proposed causal algorithms without performance tests in hardware at all, whereas others (such as ref #14) have performance tested their system in hardware by carrying out full experiments using closed-loop feedback (albeit with much smaller numbers of calcium trace predictors than we demonstrate here) without comparing different algorithmic approaches. Here we use an intermediate strategy of feeding raw offline video from a virtual sensor through the hardware processing pipeline (verifying that calcium trace outputs were identical for the real and virtual sensors). We adopted this intermediate approach to achieve the dual objectives of testing a true hardware implementation on real-time performance measures (e.g., microsecond processing latencies) while also benchmarking different algorithms (such as CB versus CF trace extraction as in Fig. 3, or raw calcium traces versus deconvolved spikes as in panel A of the Supplement to Fig. 3) against one another on the same datasets.

      • There should be a validation against a classical offline Bayesian decoding.

      We have presented an accuracy comparison for decoding linear track position from calcium traces with DeCalciOn versus decoding from single-unit spikes with electrophysiological recording data (Fig. 7E); decoding from single-unit spikes utilized a classical Bayesian maximum likelihood approach (see Methods), so Fig. 7E not only offers a comparison between calcium imaging versus electrophysiology, but between online linear classifier versus classical offline Bayesian approaches as well. In addition, we compared the performance of the linear classifier to a naïve Bayes decoder in panel B of the Supplement to Fig 3, showing that performance is better for the linear classifier than naïve Bayes.
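
      For readers unfamiliar with the classical approach, the following is a minimal sketch of Bayesian maximum likelihood position decoding from spike counts, assuming independent Poisson firing and a flat prior over position. It is an illustration only, not the code used for Fig. 7E: the Gaussian place fields, firing rates, the 0.5 s window, and the function names `make_tuning_curves` and `ml_decode` are all assumptions.

```python
import numpy as np


def make_tuning_curves(n_cells, n_bins, peak_rate=20.0, width=2.0, baseline=0.5):
    """Gaussian place fields (Hz); rows are cells, columns are position bins."""
    centers = np.linspace(0, n_bins - 1, n_cells)
    bins = np.arange(n_bins)
    bumps = np.exp(-0.5 * ((bins[None, :] - centers[:, None]) / width) ** 2)
    return baseline + peak_rate * bumps


def ml_decode(counts, tuning, dt):
    """Return the position bin that maximizes the Poisson log-likelihood."""
    expected = tuning * dt  # expected spike count per cell and bin
    # log P(counts | bin), dropping the term that does not depend on the bin
    log_like = counts @ np.log(expected) - expected.sum(axis=0)
    return int(np.argmax(log_like))


rng = np.random.default_rng(1)
tuning = make_tuning_curves(n_cells=30, n_bins=25)
dt = 0.5  # decoding window in seconds (assumed)

# Draw Poisson spike counts at random true positions and decode each window.
true_bins = rng.integers(0, 25, size=500)
decoded = np.array([ml_decode(rng.poisson(tuning[:, b] * dt), tuning, dt) for b in true_bins])
mean_error_bins = float(np.mean(np.abs(decoded - true_bins)))
print(f"mean decoding error: {mean_error_bins:.2f} bins")
```

      The mean absolute error in bins, multiplied by the bin width, gives a mean distance error of the kind reported for the comparison between decoders.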

      • "To mimic these steps using the virtual sensor in our performance tests, one session of image data was collected and stored from each of the 13 rats, yielding ~7 min (8K-9K frames) of sensor and position tracking data per rat. The linear classifier was then trained on data from the first half of each session and tested on data from the second half." This sentence is not clear enough. The authors should clearly describe the exact time needed for each experimental step. What is the time needed for instance for the experimental step 2, during which the linear classifier is trained to decode behavior from the initial dataset? This is crucial information if someone wants to use this device.

      In response to this comment, the Results section of the revised manuscript includes an extensive subsection (‘Steps of a real-time imaging session’) that describes each experimental step in detail (pages 4-6), including the time required for each step. In addition, this information is now more thoroughly summarized in the diagram of Fig. 1B.

      How does the accuracy vary with the duration (or the quality) of the initial dataset? It is important that the authors provide an investigation of this to validate their device.

      This issue is now discussed in the Results near the bottom of page 5. In addition, Fig. 3G now plots how position decoding improves as a function of the size of the training dataset.
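
      The split-half procedure and its dependence on training set size can be sketched as follows. This is a toy illustration under stated assumptions, not the DeCalciOn implementation: the classifier here is a ridge-regularized least-squares fit to one-hot position labels, and the synthetic traces, track layout, noise level, and the names `fit_linear_classifier`, `predict`, and `split_half_error` are hypothetical.

```python
import numpy as np


def fit_linear_classifier(traces, labels, n_classes, ridge=1.0):
    """Ridge-regularized least-squares fit to one-hot position labels."""
    X = np.column_stack([traces, np.ones(len(traces))])  # add bias column
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)


def predict(weights, traces):
    """Predicted position bin for each frame (class with the largest score)."""
    X = np.column_stack([traces, np.ones(len(traces))])
    return np.argmax(X @ weights, axis=1)


# Synthetic session: the animal shuttles along a track split into 12 bins.
rng = np.random.default_rng(2)
n_bins, n_cells, n_frames = 12, 40, 8000
phase = (np.arange(n_frames) // 20) % (2 * (n_bins - 1))
position = np.where(phase < n_bins, phase, 2 * (n_bins - 1) - phase)
centers = np.linspace(0, n_bins - 1, n_cells)
fields = np.exp(-0.5 * ((position[:, None] - centers[None, :]) / 0.5) ** 2)
traces = fields + rng.normal(0, 0.2, fields.shape)

half = n_frames // 2
test_x, test_y = traces[half:], position[half:]


def split_half_error(n_train):
    """Train on the first n_train frames, test on the second half of the session."""
    w = fit_linear_classifier(traces[:n_train], position[:n_train], n_bins)
    return float(np.mean(np.abs(predict(w, test_x) - test_y)))


for n_train in (100, 500, 2000, half):
    print(f"{n_train:5d} training frames -> mean error {split_half_error(n_train):.2f} bins")
```

      In this toy example a very short training set does not sample every position bin, so the error on held-out data is large; the error falls as more of the first half is used for training.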

      • For instance, what is the decrease in decoding accuracy 1) with fewer place cells?

      The scatterplots in the right panels of Fig. 3D show that decoding accuracy improves as a function of the number of neurons imaged in a given rat.

      What is the approximate number of place cells to obtain reliable decoding?

      This question is addressed by showing how decoding accuracy improves with the number of imaged neurons (Fig. 3D scatterplots). We also address this issue in our performance comparison of CB versus CF and CF+ traces since differing numbers of calcium trace predictors appear to be an important factor in accounting for the observed performance differences, as discussed in the main text (page 16, last paragraph).

      2) With the duration of the initial recording session. Here it seems to be of the order of 3-4 min. What if the recording session is shorter? Is there some constraint about this recording session (in terms of speed, stops, etc...) to obtain good decoding?

      The revised Fig. 3G plots how position decoding improves as a function of the size of the training dataset.

      3) Is there a link between the decoding accuracy and the number of place cells nearby?

      We did not select calcium traces that met a spatial criterion (i.e., “place cells”) to be included in the decoding analysis. Instead, all detected CA1 calcium traces provided input to the decoder, regardless of their spatial tuning properties (Fig. 3D and panels D,E of the Supplement to Fig. 3 show that many cells were indeed spatially tuned). Also note that when contour-free (CF) trace extraction methods were used, each calcium trace could detect fluorescence from multiple neurons. Under this methodology it is not straightforward to analyze how decoding accuracy at a given position varies with the “number of place cells nearby” and we are not convinced that presenting such an analysis would advance our main goal of demonstrating DeCalciOn’s capabilities to researchers.

      • The authors specified the time delay of 2.5ms for their device. Yet, it is pointless regarding the purpose of the decoding. The important information is the precise position of the animal when the device is used to trigger a stimulation at a given location. Again, a true online experiment should be done to validate that a TTL can be triggered by the device at a precise location (with a quantification of the error made).

      We agree that this is an important issue, and it has been thoroughly addressed in the revised manuscript.

      • There is no information on the accuracy of the decoding with respect to the location in the linear track. It is likely that the extremities of the linear track will be better identified. Figure 4C does not provide a clear description of the error made. The choice of D=2 (which seems to represent the spatial bin) is not justified. Two spatial bins seem to represent +/-40 cm which is quite large.

      Polar plots in Fig. 3F of the revised manuscript show mean accuracy in each position bin for decoders trained on offline, CB, CF, and CF+ calcium traces.

      • The movement artefacts are not equally observed in the maze. The way they are corrected might be captured by the linear decoder. These artefacts might have a strong influence on the decoding. Please provide a quantification of the correction made during steps 1 and 2 in relation to the position of the animal on the linear track. The authors should provide a correlation between the presence of these corrections with the decoding accuracy.

      Regardless of whether analysis is done offline or online, any calcium imaging and decoding experiment is vulnerable to two potential problems arising from motion artifact:

      PROBLEM #1. Image motion can generate noise in calcium signals that disrupts the accuracy of decoding.

      PROBLEM #2. Image motion that is correlated with behavior can convey uncontrolled information that allows the decoder to learn predictions from image motion rather than calcium signals.

      Very few published in-vivo calcium imaging experiments provide adequate controls for these two possible sources of artifact (again, such controls are just as necessary for offline as for online experiments). In response to the referee comments, we have provided controls for these confounds in our performance tests of DeCalciOn’s online decoding capabilities.

      Fig. 4B of the revised paper shows that without online motion correction, several rats in the linear track experiment show a significant correlation between position error and motion artifact (indicated by positive values on the y-axis); hence, motion artifact impairs decoding of position on the linear track in these rats (problem #1 above). This correlation between motion artifact and decoding error is reduced or eliminated by online motion correction (as indicated by values near zero on the x-axis), demonstrating that online motion correction helps to prevent motion artifact from impairing the accuracy of decoding.

      Fig. 6 of the revised paper shows that during an operant touchscreen experiment, motion artifact occurs preferentially during specific behaviors such as visiting the food magazine (reward retrieval, Fig. 6A) or touching the screen to make a response (correct choice, Fig. 6B). When motion correction is not used (top graphs in Figs. 6C-F), the average motion artifact is higher during frames when the decoder accurately predicts behavior than during frames when the decoder fails to predict behavior; hence, motion artifact appears to improve the accuracy of predicting these behaviors (problem #2 above). When motion correction is used, the average motion artifact no longer differs for correctly versus incorrectly decoded frames (except in one case, bottom right graph of Fig. 6E), indicating that motion correction helps to prevent the decoder from learning to predict behavior from motion artifact.
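
      A control of the first kind (relating decoding error to motion artifact) can be sketched in a few lines. The example below is a hedged illustration on synthetic per-frame data, not the analysis behind Fig. 4B or Fig. 6: the motion magnitudes, error values, and the helper names `pearson_r` and `permutation_p_value` are assumptions. It also shuffles individual frames, which ignores temporal autocorrelation and can therefore overstate significance on real recordings.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two per-frame series."""
    return float(np.corrcoef(x, y)[0, 1])


def permutation_p_value(motion, decode_error, n_perm=500, seed=0):
    """One-sided p-value for a positive motion/error correlation (frame shuffle)."""
    rng = np.random.default_rng(seed)
    observed = pearson_r(motion, decode_error)
    null = np.array([pearson_r(rng.permutation(motion), decode_error)
                     for _ in range(n_perm)])
    return float((np.sum(null >= observed) + 1) / (n_perm + 1))


# Synthetic per-frame data (illustration values only).
rng = np.random.default_rng(3)
n_frames = 5000
motion = np.abs(rng.normal(0, 1, n_frames))  # frame displacement magnitude
# Without motion correction, decoding error grows with motion.
error_uncorrected = 5 + 3 * motion + rng.normal(0, 1, n_frames)
# With motion correction, decoding error is unrelated to motion.
error_corrected = 5 + rng.normal(0, 1, n_frames)

r_uncorrected = pearson_r(motion, error_uncorrected)
r_corrected = pearson_r(motion, error_corrected)
p_uncorrected = permutation_p_value(motion, error_uncorrected)
print(f"uncorrected: r = {r_uncorrected:.2f}, p = {p_uncorrected:.3f}")
print(f"corrected:   r = {r_corrected:.2f}")
```

      A correlation near zero after correction is consistent with motion no longer impairing decoding; testing whether motion alone predicts behavior would require a separate decoder trained on the displacement traces.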

      • Besides the methodological part, I have some physiological questions. It is quite common in linear tracks to have bi-directional and unidirectional place cells. Is it the case here? How many? It is difficult to see this in figure C. Is there an error due to the online decoding of the position in the two directions of the linear track?

      Again, since we did not select calcium traces that met a spatial criterion (i.e., “place cells”) to be included in the decoding analysis, and since CF traces could detect fluorescence from multiple neurons, we are not convinced that presenting a detailed analysis of this issue would advance our primary goal of demonstrating DeCalciOn’s capabilities to researchers.

      Reviewer #3 (Public Review):

      DeCalciOn is an innovative contribution to the toolbox of real-time processing of calcium imaging data. It provides calcium traces from hippocampal CA1 neurons with a roughly two-millisecond latency and uses them to decode the position of rats running along a linear track - setting the stage for closed-loop experiments requiring fast interpretation of neural activity. The manuscript would be strengthened by a more systematic, empirical comparison to other, currently available alternative approaches. In addition, the decoding analysis does not fully account for the possibility of artifactual motion in the imaging video being informative of position.

      We suggest strengthening this manuscript by addressing the following four points:

      1) In the discussion of other platforms, the authors state that "Any system that lacks motion stabilization would also be vulnerable to artifactually decoding behavior from brain motion (which can be correlated with behavior) rather than neural activity." It follows that the same problem might also occur with incomplete motion correction. While the motion-corrected video shown in Supplementary Video 1 has reduced motion compared to the raw video, motion is still visible, including outside of the marked jitter. It remains possible that the linear decoders for the position in the linear track are utilizing brain motion-induced, as opposed to calcium fluorescence-induced, signal changes. A critical first step to assess this issue is to ask whether the motion in the video is related to the rat's behavior. One could test whether the 2D motion displacement traces can be used to predict rat position using linear classifiers.

      Briefly, we show that motion correction helps to prevent the decoder from learning to predict behavior from motion artifact.

      2) The manuscript would benefit from repeating the experiment in a more complex environment, such as a 2D arena. This would increase the generalizability of the findings. In addition, increasing the complexity of the environment would reduce the possibility that particular types of brain motion are closely linked with positions in the environment.

      We have diversified our performance testing by presenting results for decoding calcium activity from a different brain region (OFC rather than CA1) during a different kind of behavior (an instrumental touchscreen task rather than a linear track).

      3) The authors present an interesting comparison between "contour-free" and traditional contour-based source extraction. A more comprehensive discussion on the history or novelty of "contour-free" calcium imaging processing would contextualize this result.

      The revised Discussion section contains a new subsection titled “Source identification” to contextualize this issue.

      4) In the discussion, the authors compare DeCalciOn to two previous online calcium imaging algorithms. The technical innovations of this work would be better highlighted by directly testing all three of these algorithms, ideally on similar datasets.

      Briefly, one of the two cited systems is designed for compatibility with benchtop 2P microscopes and does not interface with miniscopes; public resources are not available for the other cited online algorithm.

    1. Author Response

      Reviewer #3 (Public Review):

      This is an interesting study to examine how alveolar bone responds to oral infection using unbiased scRNA-seq. The manuscript is well-written and the results are convincing.

      1) The authors should revise the abstract. The study did nothing with the understanding of healing. The whole conditions were performed under infection and inflammation which actually induce bone loss, but not healing.

      Thank you for raising this point. We have revised the manuscript accordingly.

      2) Since periapical inflammation causes progressive bone loss, how MSC with increasing osteogenic potentials contributes to bone loss? The authors should discuss it.

      We would like to thank the reviewer for this important comment. Although AP is an inflammatory disease with periapical bone loss, its progression is usually self-limiting: a new equilibrium is established between root canal pathogens and anti-infective defense mechanisms (Wang, Zhang, Xiong, & Peng, 2011). Animal experiments revealed that the bone lesion size stabilized 21 days after AP was established, which resulted from a balance of bone remodeling (Márton & Kiss, 2014; Wang et al., 2011). Previous studies have shown that human apical granulation tissues contain osteogenic cells (Maeda, Wada, Nakamuta, & Akamine, 2004). A population of MSCs was isolated from human periapical cysts and tended to differentiate toward the osteogenic lineage (Marrelli, Paduano, & Tatullo, 2013, 2015; Tatullo et al., 2015). Activated by inflammatory bone destruction, these MSCs with increased osteogenic potential may counteract the bone resorption process until bone formation and resorption reach equilibrium, driving AP into a stable state (Márton & Kiss, 2014). Because the pathologic stimuli persist, the protective actions can alleviate the bone loss to some extent. In clinical practice, root canal therapy (RCT) aims to disinfect and remove the pathogenic factors, which makes the protective activities outweigh the destructive ones (L. M. Lin, Ricucci, Lin, & Rosenberg, 2009). The bone lesions of AP patients receiving RCT usually recover fully, with resolution of radiolucency once the inflammation in the apical area is controlled (Soares, Santos, Silveira, & Nunes, 2006). The healing of AP lesions is highly correlated with the osteogenic potential of inflamed MSCs (L. M. Lin et al., 2009).

      We have added this content to the Discussion section.

      3) Did the authors detect osteoclasts by scRNA-seq? If not, are there any precursors of osteoclasts identified in inflammatory alveolar bones? 1) I suggest that the authors provide a more detailed analysis of inflammation since this is a unique model to study oral bone inflammation.

      Thank you for this valuable point. Bone destruction is a major pathological factor in chronic inflammatory diseases such as AP. Various cytokines, including TNF-α, IL-1α, and IL-6, are released by immunocytes to recruit osteoclast precursors and induce the maturation of osteoclasts. We detected the osteoclast markers Ctsk, Acp5, Mmp9 and Nfatc1 by scRNA-seq. Moreover, Csf1r, Cx3cr1, Itgam, and Tnfrsf11a were used to identify osteoclast precursors. The expression pattern of these osteoclast-related markers across all clusters is presented in Figure 3A. Markers of osteoclasts and osteoclast precursors were highly expressed in the monocyte and macrophage clusters. The expression levels of these markers were analyzed in all clusters (Figure 3B). GO analysis showed that inflammation-related immune reactions and bone resorption activity were significantly enriched in the macrophage cluster (Figure 3C). Moreover, pseudotime analysis was performed for the macrophage and monocyte clusters. Two independent branch points were determined, and five monocyte/macrophage subclusters were scattered across different branches of the developmental tree (Figure 3D, G). The results showed that the monocyte cluster differentiated into the macrophage cluster (Figure 3E). Along this trajectory, the gene expression pattern across pseudotime showed that osteoclastic genes such as Ctsk, Acp5, Mmp9, Atp6v0d2, and Dcstamp were progressively elevated (Figure 3F). Of note, we observed a branch that was highly positive for Ctsk and Acp5 (Figure 3H), indicating that mature osteoclasts differentiated from the monocyte/macrophage lineage and contributed to inflammatory bone resorption during AP. We also analyzed the expression of osteoclast-related genes using the bulk RNA-seq library built from mandibular samples of mice with AP. Markers of osteoclasts and osteoclast precursors were significantly upregulated, confirming osteoclast activity in the inflammation-related bone lesion (Figure 3I). Please see page 9 and Figure 3.

      4) It is known that macrophages can be classified into M1 and M2. Based on scRNA-seq, did the authors observe these two types?

      We appreciate this point raised by the reviewer. We used CD86, CD80, IL1β, and TNF as markers of M1-like macrophages. CD163, CD206, MSR1 and IL-10 were used as markers to detect the M2 subset in the macrophage cluster. Analysis of the macrophage cluster showed that M1-like macrophages accounted for the vast majority of macrophages in AP lesions. The expression pattern of M2 markers in the macrophage cluster is also presented (Figure 3-figure supplement 1A, B).

    1. Author Response

      Reviewer #1 (Public Review):

      This study intended to identify the metabolic at-risk profile within PLWH on ART, by integrating and analyzing the multiomics data from multi-omics including untargeted plasma metabolomic, lipidomic, and fecal 16s microbiome. The overall strength of the study is the long-term treatment (~15 years) of the study subjects with well-recovered CD4 cell count and viral suppression. The integration and analysis of multi-omics data using similarity network fusion and factor analysis, etc. to group or differentiate HIV patients are informative and useful. The weakness of the study is the lack of presentation of comparability between patients and healthy controls and the use of multiple regression analysis for controlling potential confounders.

      We are thankful to the reviewer for the critical reading of our manuscript. The primary aim of our study was to identify the molecular data-driven phenotypic patient stratification in a cohort of PLWHART with prolonged suppressive therapy to identify the at-risk metabolic profile following long-term successful therapy. We and others have reported in several studies (e.g., Ref#9 and 10) that there were distinct systemic patterns in multi-omics data. However, as suggested, we have now provided Table 1-source data 1. We have kept HC in the analysis to define which group is presenting an HC-like profile among HIV, but we are not using them to perform statistics and draw conclusions.

      Reviewer #2 (Public Review):

      This study systematically integrates multi-omics (plasma lipidomic and metabolomic, and fecal 16s microbiome) data to identify the metabolic at-risk profiles within people living with HIV on antiretroviral therapy (PLWHART). As a result, three groups of PLWHART (SNF-1 to 3) were identified, which showed distinct phenotypes. Such insights cannot be obtained by a single type of omics data or clinical data, and have implications in personalized medicine and lifestyle intervention. Connecting the findings in this study with specific medical/clinical insights is the next challenge.

      We are thankful to the reviewer for the suggestion. The application of systems biology to identifying the biological mechanisms of a disease state in HIV-infected individuals is a relatively new field. We agree with the reviewer that connecting the findings in this study with specific medical/clinical insights is the next challenge. However, the first proof-of-concept study on 108 patients showed that multi-omics studies could generate a correlation network of communities of related analytes associated with physiology and disease. More importantly, the behavioral coaching informed by personal data helped participants to improve clinical biomarkers [PMID: 28714965]. The applications of multi-omics data are increasingly valuable in non-communicable diseases [PMID: 35528975, PMID: 36503356 etc.]. As suggested by the reviewer, we have now elaborated on the medical/clinical value of identifying metabolic at-risk profiles, in particular the potential to improve individual risk stratification and to personalize lifestyle interventions. Still, as our study is an association study, the data should be regarded as exploratory and not sufficient to suggest any changes in clinical practice.

      We have concluded the manuscript as follows:

      “However, alterations in the metabolomics profile and higher CD4 T-cell count at the time of sample collection indicate a complex systemic interplay between host immunity and metabolic health. It can lead to an aggravated higher inflammation profile leading to a cardiometabolic risk profile among the MSM that might affect healthy aging in this population. Integrative analytical approaches that reflect the overall systemic health profile of PLWH may improve patient stratification and individual therapeutic and preventive strategies. Given the complex interplay between the clinical and molecular metabolic profile, the application of the multi-omics data for much larger cohorts of PLWH might facilitate a better identification of network perturbations and molecular network connections to detect early disease transition toward metabolic complications at an earlier stage. Developing a more personalized model or targeting the interaction networks rather than individual clinical or omics features may provide novel treatment strategies in countering dysregulated metabolic traits, aiming to achieve healthier aging.”

    1. Author Response

      eLife assessment:

      This study addresses whether the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated Bacteroides thetaiotaomicron, a resident micro-organism of the colon. This is an important question because factors determining the colonization of gut bacteria remain a critical barrier in translating microbiome research into new bacterial cell-based therapies. To answer the question, the authors develop an innovative method to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Their main finding that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota is valuable and this observation suggests that interactions between gut bacteria explains why the mutant has a colonization defect. The evidence supporting this claim is currently insufficient. Additionally, some of the analyses and claims are compromised because the authors do not fully explain their data and the number of animals is sometimes very small.

      Thank you for this frank evaluation. Based on the Reviewers’ comments, the points raised have been addressed by improving the writing (apologies for insufficient clarity), and by the addition of data that to a large extent already existed or could be rapidly generated. In particular, the following data have been added:

      1. Increase to n>=7 for all fecal time-course experiments

      2. Microbiota composition analysis for all mouse lines used

      3. Data elucidating how the SPF microbiome and host immune mechanisms restrict acapsular B. theta

      4. Short- versus long-term recolonization of germ-free mice with a complete SPF microbiota and assessment of the effect on B. theta colonization probability.

      5. Challenge of B. theta monocolonized mice with avirulent Salmonella to disentangle effects of the host inflammatory response from other potential explanations of the observations.

      6. Details of all inocula used

      7. Resequencing of all barcoded strains

      Additionally, we have improved the clarity of the text, particularly the methods section describing mathematical modeling in the main text. Major changes in the text, particularly those replying to the reviewers’ comments, have been highlighted here and in the manuscript.

      Reviewer #1 (Public Review):

      The study addresses an important question - how the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated B. theta, an important commensal organism. To answer the question, the authors develop a refurbished WITS with extended mathematical modeling to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Interestingly, they show that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota, suggesting (but not proving) that interactions between gut bacteria, rather than with host immune mechanisms, explains why the mutant has a colonization defect. However, it is fairly difficult to evaluate some of the claims because experimental details are not easy to find and the number of animals is very small. Furthermore, some of the analyses and claims are compromised because the authors do not fully explain their data; for example, leaving out the zero values in Fig. 3 and not integrating the effect of bottlenecks into the resulting model, undermines the claim that the acapsular mutant has a longer in vivo lag phase.

      We thank the reviewer for taking the time to give this detailed critique of our work, and we apologize that the experimental details were insufficiently explained. This criticism is well taken. Exact inoculum details for each experiment are now present in each figure (or as a supplement when multiple inocula are included). Exact microbiome composition analysis for OligoMM12, LCM and SPF microbiota is now included in Figure 2 – Figure supplement 1.

      Of course, the models could be expanded to include more factors, but we think this comment stems from our insufficiently clear explanation of the data. There are no “zero values missing” from Fig. 3. This is visible in the submitted raw data table (Excel file Source Data 1), but the points fully overlap in the graph shown and are therefore not easily discernible from one another. Time points where no CFU were recovered were plotted at the detection limit (50 CFU/g) and are included in the curve fitting. However, on re-examination we noticed that the curve fit was carried out on the raw data and not the log-normalized data, which resulted in over-weighting of the higher values. Re-fitting these data does not change the conclusions but provides a better fit. These experiments have now been repeated such that we now have >=7 animals in each group. These new data are presented in Fig. 3C and D and Fig. 3 Supplement 2.
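
      The difference between fitting raw and log-normalized values can be illustrated with a minimal sketch. It assumes a straight line in log10 space as a stand-in for the growth model (the model actually fitted is not specified in this response), and the function names are hypothetical. Only the 50 CFU/g detection limit is taken from the text above.

```python
import math

DETECTION_LIMIT = 50.0  # CFU/g, the detection limit quoted in the response


def floor_at_detection_limit(cfu_values, limit=DETECTION_LIMIT):
    """Replace values below the detection limit (including zero) with the limit."""
    return [max(value, limit) for value in cfu_values]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_growth(hours, cfu_values, limit=DETECTION_LIMIT):
    """Fit a line to log10(CFU/g) versus time (hypothetical stand-in model).

    Floored points stay in the fit, and every time point carries equal
    weight instead of the largest raw values dominating the residuals.
    """
    log_cfu = [math.log10(v) for v in floor_at_detection_limit(cfu_values, limit)]
    return fit_line(hours, log_cfu)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    hours = [0, 4, 8, 12]
    cfu = [0.0, 2.0e3, 9.0e5, 3.0e8]
    slope, intercept = fit_log_growth(hours, cfu)
    print(f"log10 growth per hour: {slope:.3f}, intercept: {intercept:.3f}")
```

      A least-squares fit on raw CFU values is dominated by the largest measurements, whereas the log-space fit treats a ten-fold change the same way at every time point.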

      Limitations:

      1) The experiments do not allow clear separation of effects derived from the microbiota composition and those that occur secondary to host development without a microbiota or with a different microbiota. Furthermore, the measured bottlenecks are very similar in LCM and Oligo mice, even though these microbiotas differ in complexity. Oligo-MM12 was originally developed and described to confer resistance to Salmonella colonization, suggesting that it should tighten the bottleneck. Overall, an add-back experiment demonstrating that conventionalizing germ-free mice imparts a similar bottleneck to SPF would strengthen the conclusions.

      These are excellent suggestions and have been followed. Additional data are now presented in Figure 2 – figure supplement 8, showing short- versus long-term recolonization of germ-free mice with an SPF microbiota, which recovers values of beta very similar to those in our standard SPF mouse colony. These data demonstrate a larger total niche size for B. theta at 2 days post-colonization, which normalizes by 2 weeks post-colonization. Independent of this, the colonization probability is already equivalent to that observed in our SPF colony at day 2 post-colonization. Therefore, the mechanisms causing early clonal loss are very rapidly established on colonization of a germ-free mouse with an SPF microbiota. We have additionally demonstrated that SPF mice do not have detectable intestinal antibody titers specific for acapsular B. theta (Figure 2 – figure supplement 7), such that this is unlikely to be part of the reason why acapsular B. theta struggles to colonize at all in the context of an SPF microbiota. Experiments were also carried out to detect bacteriophage capable of inducing lysis of B. theta and acapsular B. theta from SPF mouse cecal content (Figure 2 – figure supplement 7). No lytic phage plaques were observed. However, plaque assays are not sensitive for detection of weakly lytic phage, or phage that may require expression of surface structures that are not induced in vitro. We can therefore conclude that the restrictive activity of the SPF microbiota a) is reconstituted very fast in germ-free mice, b) is very likely not related to the activity of intestinal IgA and c) cannot be attributed to a high abundance of strongly lytic bacteriophage. The simplest explanation is that a large fraction of the restriction is due to metabolic competition with a complex microbiota, but we cannot formally exclude other factors such as antimicrobial peptides or changes in intestinal physiology.

      2) It is often difficult to evaluate results because important parameters are not always given. Dose is a critical variable in bottleneck experiments, but it is not clear if total dose changes in Figure 2 or just the WITS dose? Total dose as well as n0 should be depicted in all figures.

      We apologize for the lack of clarity in the figures. We have added panels depicting the exact inoculum to each figure (or to a supplementary figure where many inocula were used). Additionally, the methods section describing how barcoded CFU were calculated has been rewritten and is hopefully now clearer.

      3) This is in part a methods paper but the method is not described clearly in the results, with important bits only found in a very difficult supplement. Is there a difference between colonization probability (beta) and inoculum size at which tags start to disappear? Can there be some culture-based validation of "colonization probability" as explained in the mathematics? Can the authors contrast the advantages/disadvantages of this system with other methods (e.g. sequencing-based approaches)? It seems like the numerator in the colonization probability equation has a very limited range (from 0.18-1.8), potentially limiting the sensitivity of this approach.

      We apologize for the lack of clarity in the methods. This criticism is well taken, and we have re-written large sections of the methods in the main text to include all relevant detail currently buried in the extensive supplement.

      On the question of the colonization probability and the inoculum size, we kept the inoculum size at 10^7 CFU/mouse in all experiments (except those in Fig. 4, where this is explicitly stated), changing only the fraction of spiked barcoded strains. We verified the accuracy of our barcode recovery rate by serial dilution over 5 logs (new figure added: Figure 1 – figure supplement 1). “The CFU of barcoded strains in the inoculum at which tags start to disappear” is by definition closely related to the colonization probability, as this value (n0) appears in the calculation. Note that this is not the total inoculum size, which is kept constant at 10^7 CFU (unless otherwise stated in Fig. 4) by diluting the barcoded B. theta with untagged B. theta. Again, this is now better explained in all figure legends and the main text.

      We have added an experiment using peak-to-trough ratios in metagenomic sequencing to estimate the B. theta growth rate. This could be usefully employed for wildtype B. theta at a relatively early timepoint post-colonization where growth was rapid. However, this is a metagenomics-based technique that requires the examined strain to be present at an abundance of over 0.1-1% for accurate quantification such that we could not analyze the acapsular B. theta strain in cecum content at the same timepoint. These data have been added (Figure 3 – figure supplement 3). Note that the information gleaned from these techniques is different. PTR reveals relative growth rates at a specific time (if your strain is abundant enough), whereas neutral tagging reveals average population values over quite large time-windows. We believe that both approaches are valuable. A few sentences comparing the approaches have been added to the discussion.

      The actual numerator is the fraction of lost tags, which is obtained by dividing the total number of tags lost across the experiment (summed over all mice) by the total number of tags used (number of mice times the number of tags per mouse). Very low tag recovery (less than one per mouse) starts to stray into very noisy data, while close to zero loss is also associated with a low information-to-noise ratio. Therefore, the size of this numerator is necessarily constrained by our setting up the experiments to have close to optimal information recovery from the WITS abundance. Robustness of these analyses is provided by the high “n” of between 10 and 17 mice per group.
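
      The tag-loss bookkeeping described above can be sketched as follows. The lost-tag fraction follows the description in the text. The conversion to a colonization probability is an assumption made for illustration: it supposes that each of the n0 CFU of a tag establishes independently with probability beta, so that a tag is lost with probability of roughly exp(-beta * n0). The authors' exact equation is not given in this response, and the function names are hypothetical.

```python
import math


def lost_tag_fraction(tags_lost_per_mouse, tags_per_mouse):
    """Fraction of tags lost, pooled over all mice in a group."""
    total_lost = sum(tags_lost_per_mouse)
    total_used = len(tags_lost_per_mouse) * tags_per_mouse
    return total_lost / total_used


def colonization_probability(lost_fraction, n0):
    """Estimate beta from the lost-tag fraction.

    ASSUMPTION (not confirmed by the source): independent establishment of
    each CFU, giving P(tag lost) ~ exp(-beta * n0), so
    beta ~ -ln(lost_fraction) / n0.
    """
    if not 0.0 < lost_fraction < 1.0:
        # With no tags lost, or all tags lost, the estimate is undefined
        # or uninformative.
        raise ValueError("lost_fraction must lie strictly between 0 and 1")
    return -math.log(lost_fraction) / n0


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    lost = lost_tag_fraction([2, 3, 1, 2], tags_per_mouse=4)
    print(f"lost fraction: {lost}")
    print(f"beta estimate for n0 = 10 CFU per tag: {colonization_probability(lost, 10):.4f}")
```

      Under this assumed model, the estimate is most informative when an intermediate fraction of tags is lost, which is consistent with the design constraint described above.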

      4) Figure 3 and the associated model is confusing and does not support the idea that a longer lag-phase contributes to the fitness defect of acapsular B.theta in competitive colonization. Figure 3B clearly indicates that in competition acapsular B. theta experiences a restrictive bottleneck, i.e., in competition, less of the initial B. theta population is contributed by the acapsular inoculum. There is no need to appeal to lag-phase defects to explain the role of the capsule in vivo. The model in Figure 3D should depict the acapsular population with less cells after the bottleneck. In fact, the data in Figure 3E-F can be explained by the tighter bottleneck experienced by the acapsular mutant resulting in a smaller acapsular founding population. This idea can be seen in the data: the acapsular mutant shedding actually dips in the first 12-hours. This cannot be discerned in Figure 3E because mice with zero shedding were excluded from the analysis, leaving the data (and conclusion) of this experiment to be extrapolated from a single mouse.

      We of course completely agree that this would be a correct conclusion if only the competitive colonization data is taken into account. However, we are also trying to understand the mechanisms at play generating this bottleneck and have investigated a range of hypotheses to explain the results, taking into account all of our data.

      Hypothesis 1) Competition is due to increased killing prior to reaching the cecum and commencing growth: Note that the probability of colonization for single B. theta clones is very similar for the wildtype and acapsular strains in single-colonization of OligoMM12 mice. For this hypothesis to explain the outcompetition of the acapsular strain, the presence of wildtype would need to increase the killing of acapsular B. theta in the stomach or small intestine. The bacteria are at low density at this stage, and stomach acid/small intestinal secretions should be similar in all animals. Therefore, this explanation seems highly unlikely.

      Hypothesis 2) Competition between wildtype and acapsular B. theta occurs at the point of niche competition before commencing growth in the cecum (similar to the proposal of the reviewer). It is possible that the wildtype strain has a competitive advantage in colonizing physical niches (for example proximity to bacteria producing colicins). On the basis of the data, we cannot exclude this hypothesis completely and it is challenging to measure directly. However, from our in vivo growth-curve data we observe a similar delay in CFU arrival in the feces for acapsular B. theta on single colonization as in competition, suggesting that the presence of wildtype (i.e., initial niche competition) is not the cause of this delay. Rather, it is an intrinsic property of the acapsular strain in vivo.

      Hypothesis 3) Competition between wildtype and acapsular B. theta is mainly attributable to differences in growth kinetics in the gut lumen. To investigate growth kinetics, we carried out time-courses of fecal collection from OligoMM12 mice single-colonized with wildtype or acapsular B. theta, i.e., in a situation where we observe identical colonization probabilities for the two strains. These data, now shown in Figure 3C and D and Figure 3 – figure supplement 2, show that even without competition, the CFU of acapsular B. theta appear later and with a lower net growth rate than the wildtype. As these single-colonizations do not show a measurable difference between the colonization probabilities of the two strains, it is not likely that the delayed appearance of acapsular B. theta in feces is due to increased killing (this would be clearly visible in the barcode loss for the single-colonizations). Rather, the simplest explanation for this observation is a bona fide lag phase before growth commences in the cecum. Interestingly, using only the lower net growth rate (assumed to be a similar growth rate but increased clearance rate) produces a good fit for our data on both competitive index and colonization probability in competition (Figure 3 – figure supplement 5). This is slightly improved by adding in the observed lag phase (Figure 3). It is very difficult to experimentally manipulate the lag phase in order to directly test how much of an effect this has on our hypothesis, and the contribution is therefore carefully described in the new text.

      Please note that all data were plotted and used in fitting in Fig. 3E, but “zero-shedding” is plotted at the detection limit and overlaid, making it look like only one point was present when in fact several were used. This was clear in the submitted raw data tables. To shore up these observations, we have repeated all time-courses and now have n>=7 mice per group.

      5) The conclusions from Figure 4 rely on assumptions not well-supported by the data. In the high fat diet experiment, a lower dose of WITS is required to conclude that the diet has no effect. Furthermore, the authors conclude that Salmonella restricts the B. theta population by causing inflammation, but do not demonstrate inflammation at their timepoint or disprove that the Salmonella population could cause the same effect in the absence of inflammation (through non-inflammatory direct or indirect interactions).

      We of course agree that we would expect to see some loss of B. theta in HFD. However, for these experiments the inoculum was ~10^9 CFU of untagged strain per 100 μL dose, spiked with approximately 30 CFU of each tagged strain. Decreasing the number of each WITS below 30 CFU leads to very high variation in the starting inocula from mouse to mouse, which massively complicates the analysis. To clarify this point, we have added a detection-limit calculation showing that the neutral tagging technique is not very sensitive to population contractions of less than 10-fold, which is likely in line with what would be expected for high-fat diet feeding in monocolonized mice over a short time-span.
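
      A rough sense of why contractions of less than 10-fold are hard to detect can be obtained from a back-of-the-envelope calculation. The sketch below assumes that each CFU of a tag survives the contraction independently, so that the number of survivors per tag is approximately Poisson distributed. This model and the function name are assumptions for illustration, not the detection-limit calculation added to the manuscript. Only the figure of approximately 30 CFU per tag comes from the text.

```python
import math

CFU_PER_TAG = 30  # approximate inoculum of each tagged strain, from the text


def expected_lost_fraction(cfu_per_tag, fold_contraction):
    """Expected fraction of tags lost after a population contraction.

    ASSUMPTION: each CFU survives independently with probability
    1 / fold_contraction, so survivors per tag are ~Poisson distributed
    and a tag is lost when it has zero survivors.
    """
    if fold_contraction < 1:
        raise ValueError("fold_contraction must be >= 1")
    mean_survivors = cfu_per_tag / fold_contraction
    return math.exp(-mean_survivors)


if __name__ == "__main__":
    for fold in (1, 2, 10, 100):
        lost = expected_lost_fraction(CFU_PER_TAG, fold)
        print(f"{fold:>4}-fold contraction: expected lost fraction = {lost:.3g}")
```

      Under these assumptions a 10-fold contraction leaves about three survivors per tag on average, so only about 5% of tags would be expected to disappear, which is difficult to distinguish from no loss in a small group of mice.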

      This is a very good observation regarding our Salmonella infection data. We have now added the fecal lipocalin 2 values, as well as a group infected with a ssaV/invG double mutant of S. Typhimurium that does not cause clinical grade inflammation (“avirulent”). This shows 1) that the attenuated S. Typhimurium is causing intestinal inflammation in B. theta colonized mice and 2) that a major fraction of the population bottleneck can be attributed to inflammation. Interestingly, we do observe a slight bottleneck in the group infected with avirulent Salmonella which could be attributable either to direct toxicity/competition of Salmonella with B. theta or to mildly increased intestinal inflammation caused by this strain. As we cannot distinguish these effects, this is carefully discussed in the manuscript.

      6) Several of the experiments rely on very few mice/groups.

      We have increased the n to over 5 per group in all experiments (most critically those shown in Fig 3, Supplement 5). See figure legends for specific number of mice per experiment.

      Reviewer #2 (Public Review):

      The goal of this study was to understand population bottlenecks during colonization in the context of different microbial communities. Capsular polysaccharide mutants, diet, and enteric infection were also used paired to short-term monitoring of overall colonization and the levels of specific strains. The major strength of this study is the innovative approach and the significance of the overall research area.

      The first major limitation is the lack of clear and novel insight into the biology of B. theta or other gut bacterial species. The title is provocative, but the experiments as is do not definitively show that the microbiota controls the relative fitness of acapsular and wild-type strains or provide any mechanistic insights into why that would be the case. The data on diet and infection seem preliminary. Furthermore, many of the experiments conflict with prior literature (i.e., lack of fitness difference between acapsular and wild-type strain and lack of impact of diet) but satisfying explanations are not provided for the lack of reproducibility.

      In line with suggestions from Reviewer 1, the paper has undergone quite extensive re-writing to better explain the data presented and its consequences. Additionally, we now explicitly comment on apparent discrepancies between our reported data and the literature – for example the colonization defect of acapsular B. theta is only published for competitive colonizations, where we also observe a fitness defect so there is no actual conflict. Additionally, we have calculated detection limits for the effect of high-fat diet and demonstrate that a 10-fold reduction in the effective population size would not be robustly detected with the neutral tagging technique such that we are probably just underpowered to detect small effects, and we believe it is important to point out the numerical limits of the technique we present here. Additionally for the Figure 4 experiments, we have added data on colonization/competition with an avirulent Salmonella challenge giving some mechanistic data on the role of inflammation in the B. theta bottleneck.

      Another major limitation is the lack of data on the various background gut microbiotas used. eLife is a journal for a broad readership. As such, describing what microbes are in LCM, OligoMM, or SPF groups is important. The authors seem to assume that the gut microbiota will reflect prior studies without measuring it themselves.

      All gnotobiotic lines are bred as gnotobiotic colonies in our isolator facility. This is now better explained in the methods section. Additionally, 16S sequencing of all microbiotas used in the paper has been added as Figure 2 – figure supplement 1.

      I also did not follow the logic of concluding that any differences between SPF and the two other groups are due to microbial diversity, which is presumably just one of many differences. For example, the authors acknowledge that host immunity may be distinct. It is essential to profile the gut microbiota by 16S rRNA amplicon sequencing in all these experiments and to design experiments that more explicitly test the diversity hypotheses vs. alternatives like differences in the membership of each community or other host phenotypes.

      This is an important point. We have carried out a number of experiments to potentially address some issues here.

      1) We carried out B. theta colonization experiments in germ-free mice that had been colonized by gavage of SPF feces either 1 day or 2 weeks prior to B. theta colonization. While the shorter pre-colonization allowed B. theta to colonize to a higher population density in the cecum, the colonization probability was already reduced to levels observed in our SPF colony in the short pre-colonization. Therefore, the factors limiting B. theta establishment in the cecum are already established 1-2 days post-colonization with an SPF microbiota (Figure 2 - figure supplement 8). 2) We checked for the presence of secretory IgA capable of binding to the surface of live B. theta, compared to a positive control of a mouse orally vaccinated against B. theta (Fig. 2, Supplement 7), and could find no evidence of specific IgA targeting B. theta in the intestinal lavages of our SPF mouse colony. 3) We isolated bacteriophage from the intestine of SPF mice and used this to infect lawns of B. theta wildtype and acapsular in vitro. We could not detect any plaque-forming phage coming from the intestine of SPF mice (Figure 2 – figure supplement 7).

      We can therefore exclude strongly lytic phage and host IgA as dominant driving mechanisms restricting B. theta colonization. It remains possible that rapidly upregulated host factors such as antimicrobial peptide secretion could play a role, but metabolic competition from the microbiota is also a very strong candidate hypothesis. The text regarding these experiments has been slightly rewritten to point out that colonization probability inversely correlates with microbiota complexity, and the mechanisms involved may involve both direct microbe-microbe interactions as well as host factors.

      Given the prior work on the importance of capsule for phage, I was surprised that no efforts are taken to monitor phage levels in these experiments. Could B. theta phage be present in SPF mice, explaining the results? Alternatively, is the mucus layer distinct? Both could be readily monitored using established molecular/imaging methods.

      See above: no plaque-forming phage could be recovered from the SPF mouse cecum content. The main replicative site that we have studied here, in mice, is the cecum, which does not have true mucus layers in the same way as the distal colon and is upstream of the colon, so it is unlikely to be affected by colon geography. Rather, mucus is well mixed with the cecum content and may behave as a dispersed nutrient source. There is certainly a higher availability of mucus in the gnotobiotic mice due to less competition for mucus degradation by other strains. However, this would be challenging to directly link to the B. theta colonization phenotype as Muc2-deficient mice develop intestinal inflammation.

      The conclusion that the acapsular strain loses out due to a difference of lag phase seems highly speculative. More work would be needed to ensure that there is no difference in the initial bottleneck; for example, by monitoring the level of this strain in the proximal gut immediately after oral gavage.

      This is an excellent suggestion and has been carried out. At 8h post-colonization with a high inoculum (allowing easy detection) there were identical low levels of B. theta in the upper and lower small intestine, but more B. theta wildtype than B. theta acapsular in the cecum and colon, consistent with commencement of growth for B. theta wildtype but not the acapsular strain at this timepoint. We have additionally repeated the single-colonization time-courses using our standard inoculum and can clearly see the delayed detection of acapsular B. theta in feces even in the single-colonization state when no increased bottleneck is observed. This can only be reasonably explained by a bona fide lag-phase extension for acapsular B. theta in vivo. These data also reveal a decreased net growth rate of acapsular B. theta. Interestingly, our model can be quite well-fitted to the data obtained both for competitive index and for colonization probability using only the difference in net growth rate. Adding the (clearly observed) extended lag-phase generates a model that is still consistent with our observations.
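
      The interplay between lag phase, net growth rate, and competitive index can be sketched with a deliberately simple model. The code below is an illustration, not the model fitted in the manuscript: it assumes unbounded exponential growth after a strain-specific lag (no carrying capacity, no clearance term), and all parameter values in the usage note are hypothetical.

```python
import math


def population(t_hours, n0, net_growth_rate, lag_hours):
    """Population size at time t for exponential growth that starts after a lag."""
    growing_time = max(0.0, t_hours - lag_hours)
    return n0 * math.exp(net_growth_rate * growing_time)


def competitive_index(t_hours, wildtype, mutant):
    """Mutant/wild-type ratio at time t, normalized to the ratio in the inoculum.

    `wildtype` and `mutant` are dicts with keys n0, net_growth_rate, lag_hours.
    """
    ratio_t = population(t_hours, **mutant) / population(t_hours, **wildtype)
    ratio_0 = mutant["n0"] / wildtype["n0"]
    return ratio_t / ratio_0
```

      For example, calling `competitive_index(8, dict(n0=1e3, net_growth_rate=0.5, lag_hours=2.0), dict(n0=1e3, net_growth_rate=0.4, lag_hours=6.0))` gives a value below 1, and in this toy model either a longer lag or a lower net growth rate alone is enough to push the index below 1.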

      Another major limitation of this paper is the reliance on short timepoints (2-3 days post colonization). Data for B. theta levels over 2 weeks or longer is essential to put these values in context. For example, I was surprised that B. theta could invade the gut microbiota of SPF mice at all and wonder if the early time points reflect transient colonization.

      It should be noted that “SPF” defines microbiota only on missing pathogens and not on absolute composition. Therefore, the rather efficient B. theta colonization in our SPF colony is likely due to a permissive composition, and this is likely not to be reproducible at all between different SPF colonies (a major confounder in the reproducibility of mouse experiments between institutions; in contrast, the gnotobiotic colonies are highly reproducible). We do consistently see colonization of our SPF colony by wildtype B. theta out to at least 10 days post-inoculation (latest time-point tested) at similar loads to the ones observed in this work, indicating that this is not just transient “flow-through” colonization. Data included below:

      For this paper we were very specifically quantifying the early stages of colonization, also because the longer we run the experiments for, the more confounding features of our “neutrality” assumptions appear (e.g., host immunity selecting for evolved/phase-varied clones, within-host evolution of individual clones etc.). For this reason, we have used timepoints of a maximum of 2-3 days.

      Finally, the number of mice/group is very low, especially given the novelty of these types of studies and uncertainty about reproducibility. Key experiments should be replicated at least once, ideally with more than n=3/group.

      For all barcode quantification experiments we have between 10 and 17 mice per group. Experiments for the in vivo time-courses of colonization have been expanded to an “n” of at least 7 per group.

    1. Author Response

      Reviewer #2 (Public Review):

      This is a highly interesting paper that provides important insights into the understanding of how HC-derived osteoblasts contribute to trabecular bone formation. Using single-cell transcriptomics, the authors found that HC descendent cells activate MMP14 and the PTH pathway as they transition to osteoblasts in neonatal and adult mice. They further demonstrate that HC lineage-specific Mmp14 null mutants (Mmp14ΔHC) produce more bone. By performing a panel of elegant in vitro studies, the authors show that MMP14 cleaves the extracellular domain of PTH1R, dampening PTH signaling. The authors provide more in vivo evidence showing that HC-derived osteogenic cells respond to PTH which is enhanced in Mmp14ΔHC. Generally, this is a very well-performed study that may contribute important novel aspects to the field.

      I have the following issues for the authors to address:

      1) The novel mechanism identified in this study (i.e. MMP14-induced PTH1R cleavage) is intriguing. It is unclear how specific this pathway is in the transition of HCs to osteoblasts. Are other MMPs besides MMP14 involved in the PTH1R cleavage? Is PTH1R the only substrate of MMP14?

      Thank you for your interest in our findings. ADAMs are known to cleave various transmembrane proteins such as RANKL. As described in Supplementary Figure 4A, we tested A Disintegrin And Metalloproteinases (ADAMs) for their potential ability to cleave PTH1R. We did not find that ADAM10, 15, or 17 could cleave PTH1R. The lack of the cleaved PTH1R peptide in extracts from osteoblasts isolated from MMP14-null bones (New Fig. 3E) suggests that there is not another major MMP that cleaves PTH1R. Regarding other substrates that are cleaved by MMP14, we do review these in the manuscript, along with the possibility that deficiency in other substrates contributes to the phenotype.

      2) Would it be possible for the authors to detect the truncated PTH1R fragment(s) from the conditioned medium prepared from either 293T or osteoblast culture?

      We tried to detect a cleaved PTH1R fragment in culture medium by western blot of PCA precipitates of the medium. We could not detect any free peptide using anti-Flag or anti-HA antibody. It has been reported that the ligand-binding domain is linked by disulphide bonds in vivo; therefore, cleavage of PTH1R at the unstructured loop domain does not necessarily imply release of the cleaved fragment.

      3) The finding that HC-descendants persist and contribute to the anabolic response to PTH in aged mice is interesting. Have the authors examined the changes in MMP14 expression in bone with age and in response to PTH treatment?

      Thank you for your question. We have added data showing induction of MMP14 expression upon PTH treatment in Figure 7—figure supplement 1. It has also been published that PTH stimulation increases MMP14 expression in osteocytes (1).

    1. Author Response

      Reviewer #2 (Public Review):

      Susswein et al. analyze a fine-scale, novel data stream of human mobility, openly available from Safegraph, based on the usage of mobile apps with GPS and sampled from over 45 million smartphone devices. They define a metric $\sigma_{it}$, properly normalized, that quantifies the propensity for visits to indoor locations relative to outdoor locations in a given county $i$ at week $t$. For each pair of counties $i$ and $j$, they compute the Pearson correlation coefficient $\rho_{ij}$ between the corresponding $\sigma$ metrics. This generates a correlation matrix that can be interpreted as the adjacency matrix of a network. They then perform community detection on this network/matrix, effectively clustering together time series that are correlated. This identifies three main clusters of counties, characterized geographically as either in the north of the country, in the south of the country, and possibly in tourism active areas. They then show, via a simple model, how including over-simplified models of seasonality may affect infectious disease models.

      This work is very interesting for the infectious disease modeling community, as it addresses a complex problem introducing a new data stream.

      This work builds on several strengths, among which:

      It is the first analysis of the Safegraph dataset to capture seasonality in indoor behavior.

      It provides a simple metric to quantify indoor activity, that thanks to the dataset can be computed with a high level of spatial detail.

      It aims at characterizing clusters of counties with a similar pattern of indoor activity.

      It aims at quantifying the impact of neglecting finer-scale patterns of seasonality, for example considering seasonality to be homogeneous at the US level.

      We thank the reviewer for the positive review of our work.

      At the same time, it presents several weaknesses that should be addressed to improve the methodology, its results, and the implication:

      There is no quantitative comparison of the newly introduced metric for indoor activity with other proxies of seasonality (e.g. temperature or relative humidity). The (dis)similarity with other proxies may help in assessing the importance of this metric, showing why it can not be exchanged with other data sources (like temperature data) that are widely available and are not affected by sampling issues (more on that later).

      We have now added supplementary figures (Figure S3) to illustrate how indoor activity seasonality compares with temperature and humidity. We have also added text to the Results and the Discussion to discuss this point.

      A major flaw of the analysis is to perform community detection on a network defined by the correlation between time series with an algorithm that is based on modularity optimization. As explained in MacMahon et al.[1], all modularity optimization methods rely on null assumptions that in the case of correlation between time series are violated. Therefore, there is a very strong potential bias in their results that is not accounted for. Possible solutions could be to proceed via the methodology presented in [1] or via a different type of algorithm (e.g. Infomap [2]). In both cases, as the network is thresholded (considering only a correlation larger than 0.9), a more quantitative assessment of the impact of the threshold value should be included.

      References

      [1] Mel MacMahon and Diego Garlaschelli Phys. Rev. X 5, 021006 (2015).

      [2] Martin Rosvall and Carl T. Bergstrom PNAS 105, 1118 (2008).

      We thank the reviewer for making this excellent point. We have now added Supplementary Figures S13 and S14. In Figure S13, we demonstrate the robustness of our clustering results with different correlation thresholds. (We have also corrected a typo in our original Methods section which mistakenly stated our correlation threshold as 0.9 rather than the 90th percentile which is what we used.) In Figure S14, we show the clustering results using a different clustering algorithm. In an effort to test a non-network-based clustering approach, we use a hierarchical clustering approach and find a consistent partition of the US to our main results.
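
      The difference between a fixed cutoff of 0.9 and a 90th-percentile cutoff can be illustrated with a short sketch. This is not the analysis code from the manuscript: the function names, the nearest-rank percentile rule, and the dictionary input format are assumptions made for illustration, and each time series is assumed to be non-constant.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_network(series, percentile=90):
    """Keep county pairs whose correlation reaches a percentile cutoff.

    `series` maps a county identifier to its weekly indoor-activity values.
    The cutoff is a percentile of all pairwise correlations (nearest-rank
    rule), not a fixed correlation value.
    Returns (edges, cutoff), where edges maps (county_i, county_j) to rho_ij.
    """
    pairs = list(combinations(sorted(series), 2))
    rho = {pair: pearson(series[pair[0]], series[pair[1]]) for pair in pairs}
    ranked = sorted(rho.values())
    rank = max(0, math.ceil(percentile / 100 * len(ranked)) - 1)
    cutoff = ranked[rank]
    edges = {pair: r for pair, r in rho.items() if r >= cutoff}
    return edges, cutoff
```

      The returned edge dictionary can be passed to any clustering or community-detection routine; re-running it with different `percentile` values is one simple way to probe sensitivity to the threshold.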

      It is not clear what is the added value of the data on indoor activity, as no fitting to real data is performed. Although this may be considered beyond the scope of this paper, I think it would be crucial to quantify how much a data-informed model would better describe real epidemic data (for example in the case of COVID-19). For now, only the impact of neglecting heterogeneity in indoor activity is shown, comparing a model with region-average parameters vs a model with county-level average parameters. Given that the dataset comes with potential bias in sampling (more on this later) it would be good to assess its goodness in predicting real epidemic spread. When showing results from different models, no visible errors are shown on the plot. How have the errors been estimated?

      We appreciate this point by the reviewer, and agree that future work will have to consider how indoor activity seasonality affects our ability to capture observed transmission trends. However, such work would additionally need careful characterization of other seasonal factors hypothesized to drive transmission (including environmental and other behavioral factors), and is beyond the scope of our work. Instead, in Figure 4 we aim to (a) provide the infectious disease modeling community with empirically-inferred parameters for a simple sinusoidal model which is commonly used in infectious disease models to capture transmission seasonality; and (b) demonstrate the implications of ignoring geographic heterogeneity in transmission seasonality in theoretical models of disease dynamics, which are commonly used for scenario analysis and model-based intervention design. As we demonstrate, transmission seasonality described by such sinusoidal models, even when they are empirically characterized as in our case, can lead to meaningfully different epidemic dynamics when transmission seasonality varies from the assumptions.

      Additionally, there is no uncertainty included in Figure 4B because transmission seasonality is based either on a single empirical data point per time step, or on the fitted sinusoidal model (where the estimated parameters have negligible standard errors).
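
      A simple sinusoidal description of seasonality can be estimated in closed form, as the sketch below shows. The functional form (mean + amplitude * sin(2*pi*t/period + phase)) and the assumption of evenly spaced weekly values covering exactly one full period are illustrative choices and may differ from the parameterization used in the manuscript.

```python
import math


def fit_sinusoid(values):
    """Least-squares fit of y_t = mean + amplitude * sin(2*pi*t/N + phase).

    Assumes `values` holds N > 2 evenly spaced observations (t = 0..N-1)
    spanning exactly one period, so the sine and cosine regressors are
    orthogonal and the coefficients reduce to simple projections.
    Returns (mean, amplitude, phase).
    """
    n = len(values)
    omega = 2 * math.pi / n
    mean = sum(values) / n
    sin_coef = 2 / n * sum(y * math.sin(omega * t) for t, y in enumerate(values))
    cos_coef = 2 / n * sum(y * math.cos(omega * t) for t, y in enumerate(values))
    amplitude = math.hypot(sin_coef, cos_coef)
    phase = math.atan2(cos_coef, sin_coef)
    return mean, amplitude, phase
```

      For data that do not span a whole number of periods, the projections are no longer exact and a general least-squares solver would be needed instead.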

      The dataset is presented as representative of the US population. However, this has not been assessed over time. As adherence to social distancing is influenced by several socio-economic determinants the lack of representativity in certain strata of the population at a given time may introduce an important bias in the dataset. Although this is an inherent limitation of the dataset, it should be discussed in the paper more thoroughly.

      We agree with the reviewer that this is a limitation. However, we do not have any way of assessing demographic representation in the dataset over time. We have instead included an additional sentence into the Discussion section acknowledging this point.

      In conclusion, I think that the methodology should be revised to account for the fact that the analysis is performed on a correlation matrix. Capturing seasonal patterns of indoor activity can help in tackling the crucial problem of seasonality in human behavior. This could help in identifying effective strategies of disease containment able to curb disease spread at a lower societal cost than fully-fledged lockdowns.

      We thank the reviewer again for their helpful suggestions.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors characterized the expression of DDR2 in the developing craniofacial skeleton. The authors showed that Ddr2-deficient mice exhibited defects in craniofacial bones including impaired calvarial growth and frontal suture formation, cranial base hypoplasia due to aberrant chondrogenesis, and delayed ossification at growth plate synchondroses. The histological studies are well done. However, the studies as shown in this manuscript do not provide cellular and molecular mechanisms beyond what is already known, particularly beyond what the authors have already published in a similar study in Bone Research (Mohamed et al., 2022 Feb 9;10(1):11). With the same Cre lines and analytic approaches, the authors already showed in the Bone Research paper that Ddr2 in the Gli1+ cells is required for chondrocyte proliferation and polarity in growth plate development and osteoblast differentiation. Cartilage development and bone formation occur in both long bones and craniofacial skeleton, and the authors showed similar functions of Ddr2 in similar skeletal tissues, although the location is different. One new point in this manuscript might be that the authors indicated that loss of Ddr2 led to ectopic chondrocyte hypertrophy (Fig. 7I). But what the data actually showed was delayed chondrocyte hypertrophy and abnormal location of the delayed hypertrophic chondrocytes, which could well be caused by abnormal chondrocyte polarity. This interesting defect was superficially described with no mechanistic investigation at cellular or molecular level.

      New data is now provided showing that Ddr2 deficiency is associated with abnormal collagen organization and orientation as measured by second harmonic generation (SHG) (Fig 3-figure supplement 1). Specifically, collagen orientation as reflected by SHG anisotropy measurements was disrupted in Ddr2-deficient synchondroses. This result complements data showing that the distribution of type II collagen as measured by immunofluorescence changes with Ddr2 deficiency such that no collagen is seen in the interterritorial matrix between chondrocyte bundles (Fig 3a). This loss of collagen organization provides a potential mechanism to explain the disruption of chondrocyte polarity and altered localization of hypertrophic cells in synchondroses. In further support of this concept, other recently published studies described in the Discussion have shown that Ddr2 deficiency is associated with disruption of collagen fibril orientation in other experimental systems such as in CAF cells surrounding breast tumors as well as at sites of heterotopic ossification and that these abnormalities are associated with defective integrin signaling. Additional studies beyond the scope of the present communication will be required to determine if these matrix changes can explain the observed phenotypes. However, we believe this proposed mechanism is the most likely explanation for DDR2 effects based on current data.

      Reviewer #2 (Public Review):

      DDR2 is a collagen-binding receptor that is required for proper skull development. Ddr2 loss-of-function in humans is associated with the developmental disease spondylo-meta-epiphyseal dysplasia (SMED). Here, the authors aim to elucidate the role of DDR2 in skull development. In this work, the role of DDR2 in skull and face development is studied in mice, which exhibit SMED-like symptoms in the absence of Ddr2. Histological studies showed that Ddr2 knockout disrupts organization and proper differentiation within progenitor-rich regions of the skull from which bone growth occurs. Histology and lineage tracing studies revealed that DDR-expressing cells in/around these zones 1) generally also express the proliferation regulator Gli1, and 2) eventually contribute to osteogenic and chondrogenic lineages. Cell-type specific knockout studies were used to show that DDR2 has a development-specific role: knockout of Ddr2 in Gli+ cells re-capitulated the developmental abnormalities observed in global Ddr2 knockout mice; knockout in chondrocytes partially recapitulated developmental abnormalities, and osteoblast-specific knockout mice were indistinguishable from their wild-type littermates. This work also catalogues the locations of Ddr2 positive cells and their lineages at various stages of development. Additionally, the anatomical effects of loss of DDR2 function on skull and face development are thoroughly described in global and cell-type specific knockouts.

      This work is a vital and stimulating contribution to the scientific literature. The authors' claims and conclusions are well supported by the evidence they present.

      The scientific approach is sound and the conclusions important. However, a limitation of the work's discussion is a lack of attention paid to the specific biophysical mechanism that DDR2 is playing during development. The discussion of the positioning of the golgi is nice, but a lack of golgi polarity is likely a downstream effect of processes occurring within the cell adhesion and mechanotransduction machinery. Perhaps, like integrins, DDR2 is a mechanosensor that the cell needs to properly sense local collagen orientation, polarize, and secrete properly-organized COL2. It would be beneficial to put up some guideposts that will facilitate engagement from the molecular biophysics/mechanobiology community.

      Thank you for this suggestion. In response, we added new studies showing that DDR2 is necessary for ECM organization (please see reviewer 1 comments and additions to the Discussion section). In addition, the Discussion has been revised to include speculation on the relationship between DDR2-dependent ECM organization, mechanical properties of the matrix and cell differentiation. Because very little is known about DDR2 from a mechanistic perspective, much of what we propose is currently conjecture, but hopefully can guide future study.

      Reviewer #3 (Public Review):

      From this work, the authors investigated a number of parameters in order to profoundly understand and demonstrate the vital role of ongoing interaction between components of extracellular matrix and particular stem cells to induce normal Craniofacial development. Thus, there was a focus on the genetic manipulation (knockout) impact of molecules behind the above-mentioned interaction, and on determining how such modification would be reflected on skull bone morphogenesis.

      Strengths and Weaknesses

      • Using different animals' backgrounds in the same experiment might impact work outcomes.

      • Better to have (ethical approval) at the beginning of the material and methods in separate paragraphs.

      • It is great that the authors precisely explain all the measurements.

      • Supplementary file to have details of used antibodies might be required.

      • All methods have been written in academic and clear ways.

      • It is nice that there is a conclusion sentence by end of the results paragraph, which made it easy for readers to fully remember and understand.

      • Is it possible to see a reduction in proliferative chondrocytes with no change in apoptosis rate?

      Reductions in proliferation are certainly seen in many systems. Proliferation and apoptosis are not necessarily coupled.

      • Results are supposed to be compatible.

      • Very nice and representative images from the immunofluorescence protocol.

      • Using different techniques to confirm observations is clearly manifested in methods and results.

      It is clear that the author has used different methods and techniques in order to meet his work's objectives. Importantly, there was more than one procedure to confirm observations that are related to one or more than one aim.

      Although determining to what extent the outcomes of this work could be applied to community need might require a subspecialist physician's opinion, it seems that observations of the present study are likely to require a series of further investigations in order to take it to the level of human users. Notably, identification of molecules and pathways behind skull development abnormalities would open a door to early diagnosis reasons for such deformities, thus mitigating future abnormalities either by developing new prevention methods or discovering unique medications.

      Thank you for these comments. Additional commentary has been added to the Discussion to provide a more mechanistic interpretation of our results, however speculative they may be at this time. Ln 555-605

    1. Author Response

      Reviewer #1 (Public Review):

      King et al. provide an interesting reanalysis of existing fMRI data with a novel functional connectivity modeling approach. Three connectivity models accounting for the relationship between cortical and cerebellar regions are compared, each representing a hypothesis. Evidence is presented that - contrary to a prominent theoretical account in the literature - cortical connectivity converges on cerebellar regions, such that the cerebellum likely integrates information from the cortex (rather than forming parallel loops with the cortex). If true, this would have large implications for understanding the likely computational role of the cerebellum in influencing cortical functions. Further, this paper provides a unique and potentially groundbreaking set of methods for testing alternate connectivity hypotheses in the human brain. However, it appears that insufficient details were provided to properly evaluate these methods and their implications, as described below.

      Strengths:

      • Use of a large task battery performed by every participant, increasing confidence in the generality of the results across a variety of cognitive functions.

      • Multiple regression was used to reduce the chance of confounding (false connections driven by a third region) in the functional connectivity estimates.

      • A focus on the function and connectivity of the cerebellum is important, given that it is clearly essential for a wide variety of cognitive processes but is studied much less often than the cortex.

      • The focus on clear connectivity-based hypotheses and clear descriptions of what would be expected in the results if different hypotheses were true.

      • Generalization of models to a completely held-out dataset further increases confidence in the generalizability of the models.

      Concerns:

      1) The main conclusion of the paper (including in the title) involves a directional inference, and yet it is notoriously difficult to make directional inferences with fMRI. The term "input" into the cerebellum is repeatedly used to describe the prediction of cerebellar activity based on cortical activity, and yet the cerebellum is known to form loops with the cortex. With the slow temporal resolution of fMRI it is typically unclear what is the "input" versus the "output" in the kinds of predictions used in the present study. Critically, this may mean that a cerebellar region could receive input from a single cortical region (i.e., the alternate hypothesis supposedly ruled out by the present study), then output to multiple cortical regions, likely resulting (using the fMRI-based approach used here) in a faulty inference that convergent signals from cortex drove the results. On pg. 4 it is stated: "We chose this direction of prediction, as the cerebellar BOLD signal overwhelmingly reflects mossy-fiber input, with minimal contribution from cerebellar output neurons, the Purkinje cells (Mathiesen et al., 2000; Thomsen et al., 2004)." First, it would be good to know how certain this is in 2022, given the older references and ongoing progress in understanding the relationship between neuronal activity and the BOLD signal (e.g., Drew 2019). Second, given that it's likely that activity in the mossy-fiber inputs has an impact on Purkinje cell outputs, and that some cortical activity supposedly reflects cerebellar output, it is possible that FC could also reflect the opposite direction (cerebellum→cortex). It would seem important to consider these possibilities in the interpretation of the results.

      We agree that making directional inferences with fMRI BOLD signals is difficult. We also note that because of the low temporal resolution of fMRI BOLD signals, we have not tried to extract directional information based on temporal lags. Rather, we emphasize that the relationship between neural activity and BOLD differs between the neocortex and cerebellum. In the cerebellum, mossy fiber activity releases glutamate, which activates granule cells and the release of nitric oxide (NO). NO is mostly released by granule cells and stellate cells. The release of NO increases the diameter of capillaries, which in turn causes changes in blood flow and blood volume, two major contributors to BOLD signal changes (Alahmadi et al. 2016; Alahmadi et al. 2015; Drew 2019; Mapelli et al. 2017; Gagliano et al. 2022). Importantly, there is a negligible contribution of NO from the Purkinje cells. Taken together, these data make a strong case that the BOLD signal in the cerebellar cortex reflects activity at the input stage. We acknowledge that the references cited in our initial submission were somewhat dated. We have now provided additional references (which are in agreement with the findings from the earlier papers). Based on this evidence, we chose to predict cerebellar activity from cortical activity.

      References: Alahmadi, A. A., Samson, R. S., Gasston, D., Pardini, M., Friston, K. J., D’Angelo, E., ... & Wheeler-Kingshott, C. A. (2016). Complex motor task associated with non-linear BOLD responses in cerebro-cortical areas and cerebellum. Brain Structure and Function, 221(5), 2443-2458.

      Alahmadi, A. A., Pardini, M., Samson, R. S., D'Angelo, E., Friston, K. J., Toosy, A. T., & Gandini Wheeler‐Kingshott, C. A. (2015). Differential involvement of cortical and cerebellar areas using dominant and nondominant hands: an FMRI study. Human brain mapping, 36(12), 5079-5100.

      Mapelli, L., Gagliano, G., Soda, T., Laforenza, U., Moccia, F., & D'Angelo, E. U. (2017). Granular layer neurons control cerebellar neurovascular coupling through an NMDA receptor/NO-dependent system. Journal of Neuroscience, 37(5), 1340-1351.

      Gagliano, G., Monteverdi, A., Casali, S., Laforenza, U., Gandini Wheeler-Kingshott, C. A., D’Angelo, E., & Mapelli, L. (2022). Non-Linear Frequency Dependence of Neurovascular Coupling in the Cerebellar Cortex Implies Vasodilation–Vasoconstriction Competition. Cells, 11(6), 1047.

      Drew, P. J. (2019). Vascular and neural basis of the BOLD signal. Current Opinion in Neurobiology, 58, 61–69.

      2) It would be helpful to have more details included in the "Connectivity Models" sub-section of the Methods section. The GLM-based connectivity approach is highly non-standard, such that more details on the logic behind it and any validation of the approach would be helpful. More specifically, it would be helpful to have clarity on how this form of functional connectivity relates to more standard forms, such as Pearson correlation and perhaps less standard multiple regression (or partial correlation) approaches. If I understand this approach correctly, each cortical parcel's time series is modulated (up or down) using that parcel's task-evoked beta weights, then "normalized" by the standard deviation of that parcel's time series, with the resulting time series then used in a multiple regression model to explain variance in a given cerebellar voxel's time series. It would be helpful if each of these steps were better explained and justified. For example, it is unclear what modulation of the cortical parcel time series by task-related beta weights does to the functional connectivity estimates, and thus how they should be interpreted.

      All of the models are multiple regression models. The independent variables (X) are the fitted (task-evoked) time series of the cortical parcels and the dependent variables (Y) are the fitted time series of each cerebellar voxel. Coefficients from multiple regression are identical to partial correlation coefficients if the cortical and cerebellar time series are z-standardized (SD=1). Here we only standardized the cortical time series. This only retains the weighting of the different cerebellar voxels (a cerebellar voxel that has a strong task-related signal should contribute more to the overall evaluation than a voxel where the task-related signal is weak); beyond this, the conclusions will be the same as that obtained with a partial correlation analysis.

      Because the number of predictors (#cortical parcels) approaches or outstrips the number of available observations (#task-related regressors), the ordinary-least-squares (OLS) solution to the multiple regression problem is not unique. We thus compared 3 common ways of regularizing a multiple regression problem: (a) picking only the most important regressor (a form of feature selection or optimal subspace selection), (b) Ridge regression (L2 regularization), or (c) Lasso regression (L1 regularization). Each method biases the solution in a particular way: the winner-take-all solution is obviously very sparse, the Lasso solution somewhat less sparse, and the Ridge solution quite dispersed. Here we exploited these differences in inductive bias, reasoning that the method with the bias that best matches the structure of the data-generating process will lead to better prediction performance on independent data.

      The results clearly favored a distributed input to each cerebellar voxel from the cortical parcels. We have rewritten the method section on connectivity models to better communicate the main idea.
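
      The logic of this model comparison can be illustrated with a minimal sketch on synthetic data. This is not the analysis code used in the paper: the array sizes, the noise level, the convergent ground truth (five parcels per voxel), the ridge penalty `lam`, and all function names are assumptions made for illustration, and Lasso is omitted for brevity. The sketch fits a winner-take-all model and a Ridge model on one set of simulated conditions and scores both on independent conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed sizes: more cortical parcels (predictors) than task conditions (observations).
n_train, n_test, n_parcels, n_voxels = 30, 30, 40, 50

# Assumed ground truth: every cerebellar voxel receives input from 5 cortical parcels.
W_true = np.zeros((n_parcels, n_voxels))
for v in range(n_voxels):
    W_true[rng.choice(n_parcels, size=5, replace=False), v] = 1.0

def simulate(n_conditions):
    """Simulate cortical (X) and cerebellar (Y) activity profiles."""
    X = rng.standard_normal((n_conditions, n_parcels))
    X = X / X.std(axis=0, keepdims=True)  # standardize cortical parcels only
    Y = X @ W_true + 0.5 * rng.standard_normal((n_conditions, n_voxels))
    return X, Y

def fit_wta(X, Y):
    """Winner-take-all: each voxel keeps only its single best-matching parcel."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    score = np.abs(X.T @ Y) / np.linalg.norm(X, axis=0)[:, None]
    best = score.argmax(axis=0)  # index of the winning parcel for each voxel
    for v, p in enumerate(best):
        W[p, v] = (X[:, p] @ Y[:, v]) / (X[:, p] @ X[:, p])  # simple regression slope
    return W

def fit_ridge(X, Y, lam):
    """Ridge regression (L2 penalty), closed-form solution."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def predictive_r(Y_true, Y_pred):
    """Correlation between observed and predicted activity, pooled over voxels."""
    return np.corrcoef(Y_true.ravel(), Y_pred.ravel())[0, 1]

X_train, Y_train = simulate(n_train)
X_test, Y_test = simulate(n_test)  # independent data for evaluation

W_wta = fit_wta(X_train, Y_train)
W_ridge = fit_ridge(X_train, Y_train, lam=1.0)

r_wta = predictive_r(Y_test, X_test @ W_wta)
r_ridge = predictive_r(Y_test, X_test @ W_ridge)
print(f"WTA: r = {r_wta:.2f}, Ridge: r = {r_ridge:.2f}")
```

      Under this assumed convergent ground truth, the dispersed Ridge solution should predict held-out data better than the single-parcel solution; if the simulated truth were one-to-one instead, the ordering would be expected to reverse, which is the sense in which prediction performance is informative about the underlying structure.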

      3) It appears that task-related functional connectivity is used in the present study, and yet the potential for task-evoked activations to distort such connectivity estimates does not appear to be accounted for (Norman-Haignere et al. 2012; Cole et al. 2019). For example, voxel A may respond to just the left hemifield of visual space while voxel B may respond to just the right hemifield of visual space, yet their correlation will be inflated due to task-evoked activity for any centrally presented visual stimuli. There are multiple methods for accounting for the confounding effect of task-evoked activations, none of which appear to be applied here. For example, the following publications include some options for reducing this confounding bias: (Cole et al. 2019; Norman-Haignere et al. 2012; Ito et al. 2020; Rissman, Gazzaley, and D'Esposito 2004; Al-Aidroos, Said, and Turk-Browne 2012). If this concern does not apply in the current context it would be important to explain/show why.

      The papers cited by the reviewer focus on the problem of how to remove task-evoked activity to estimate the correlation of spontaneous (task-independent) fluctuations. Here we are doing the opposite. We removed almost all spontaneous fluctuations and noise by averaging across trials and runs in order to fit the task-evoked activity. Additionally, we used a crossed approach as a way to control for the influence of task-independent fluctuations on the regression models: Within each task set, cerebellar activity from one half of the runs was predicted from cortical activity from the other half of the runs. Returning to the papers cited by the reviewer, these are designed to look at connectivity not related to task-evoked activity. We briefly summarize each below:

      ● Cole et al. (2019): Demonstrates that the removal of mean task-evoked activations while preserving task-evoked response shape is an important preprocessing step for validating task-based FC.

      ● Ito et al. (2020): Addressed the issue of shared variability between brain regions during task-evoked activity by estimating time series variance. They removed task-evoked activity from the time series in order to get a direct measure of neural-to-neural correlations (e.g., “background connectivity”) rather than task-to-neural associations.

      ● Al-Aidroos et al. (2012): Confronted with a similar problem of interpreting intrinsic correlations related to a goal (e.g., attending to scenes) from correlations related to synchronized stimulus-evoked responses. To mitigate this confound, they removed stimulus-evoked responses from the data resulting in “background connectivity” which was then used to assess inter-region coupling.

      ● Rissman et al. (2004): Introduced a new approach to characterize inter-region correlations during event-related activity by allowing inter-regional interactions to be assessed independent of activity at individual stages of a task.

      ● Norman-Haignere et al. (2012): To assess inter-region interactions (between fusiform gyrus and parahippocampal cortex), the authors removed the mean stimulus-evoked response and examined the correlations that occurred in the background of stimulus-locked changes (e.g., background connectivity).

      4) It is stated (pg. 21): "To reduce the influence of these noise correlations, we used a "crossed" approach to train the models: The cerebellar time series for the first session was predicted by the cortical time series from the second session, and vice-versa (see Figure 1). This procedure effectively negates the influence of noise processes, given that noise processes are uncorrelated across sessions." However, this does not appear to be strictly true, given that the task design (parts of which repeat across sessions) could interact with sources of noise. For example, task instruction cues (regardless of the specific task) likely increase arousal, which likely increases breathing and heart rates known to impact global fMRI BOLD signals. The current approach likely reduces the impact of noise relative to other approaches, but such strong certainty that noise processes are uncorrelated across sessions appears to be unwarranted.

      We completely agree. What we meant to say is that the procedure “negates the influence of any noise process that is uncorrelated with the tasks.” If we can predict the cerebellar activity patterns in session 2 by the cortical activity patterns measured in session 1, we can conclude that this prediction must be based on task-related signal changes given that the sequence of tasks is randomized. However, we do not know whether these task-related signals are caused directly by neural processes or indirectly by physiological processes (for example increased heart-rate in some conditions). The procedure only removes the influence of noise processes that are unrelated to the tasks. In our experience, these noise correlations can be quite strong and methods to remove them can introduce biases. For task-related noise processes we relied on high-pass filtering, a standard approach in task-based GLM approaches (see Methods).
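
      A minimal simulation, under stated assumptions, illustrates what the crossed approach does and does not remove. The signal and noise amplitudes, the number of samples, and all variable names below are hypothetical and are not taken from the paper. Each simulated session contains a noise process shared by cortex and cerebellum that is unrelated to the tasks, and the cerebellar voxel carries no true task-related signal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # assumed number of samples per session

# Task-evoked cortical profile, identical in both sessions.
task = rng.standard_normal(n)

def simulate_session():
    """One session: task signal in cortex plus session-specific shared noise."""
    shared_noise = rng.standard_normal(n)  # unrelated to the tasks, hits both structures
    cortex = task + shared_noise + 0.3 * rng.standard_normal(n)
    cerebellum = shared_noise + 0.3 * rng.standard_normal(n)  # no true task coupling
    return cortex, cerebellum

cortex_1, cerebellum_1 = simulate_session()
cortex_2, cerebellum_2 = simulate_session()

# Same-session estimate: inflated by the shared noise process.
r_same = np.corrcoef(cortex_1, cerebellum_1)[0, 1]

# Crossed estimate: cortex from one session, cerebellum from the other.
r_crossed = 0.5 * (
    np.corrcoef(cortex_2, cerebellum_1)[0, 1]
    + np.corrcoef(cortex_1, cerebellum_2)[0, 1]
)
print(f"same-session r = {r_same:.2f}, crossed r = {r_crossed:.2f}")
```

      In this sketch the same-session correlation is substantial even though there is no task-related coupling, whereas the crossed correlation stays near zero. A noise process that was locked to the tasks (for example, a heart-rate change tied to specific conditions) would repeat across sessions and would therefore survive the crossing, which is the caveat acknowledged above.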

      5) It appears possible that the sparse cerebellar model does worse simply because there are fewer predictors than the alternate models. It would be helpful to verify that the methods used, such as cross-validation, rule out (or at least reduce the chance) that this result is a trivial consequence of just having a different number of predictors across the tested models. It appears that the "model recovery" simulations may rule this out, but it is unclear how these simulations were conducted. Additional details in the Methods section would be important for evaluating this portion of the study.

      Our methods ensure full correction for model complexity (see response to major comment #2). Note that the sparse methods select regressors from all available cortical parcels; as such, “model complexity” is not well summarized by the number of non-zero regressors. We have now clarified these issues in the Methods section and have also revised the paper to better describe our model recovery simulations designed to address the issue of possible biases caused by different degrees of collinearity between cortical regressors.

      Reviewer #2 (Public Review):

      The human cerebellum likely has a significant but understudied contribution to cognition and behavior beyond the motor domain. Clarifying its functional relationship with the cerebral cortex is a critical detail necessary for understanding cerebellar functions. This paper addresses this challenge by testing three simple but intuitive models: winner-take-all, one-to-one model versus two converging input models. Results showed that the convergence model outperformed the one-to-one mapping model, indicating that cerebellar regions received multiple converging inputs from the different cortical regions. Overall the paper is well-written, and the results are clean and interesting. The methodological rigor of using cross-validation and generalization is also a strength of this paper.

      1) The authors concluded that some cerebellar regions receive converging inputs from multiple cortical regions because the Ridge and Lasso models outperformed the WTA model. The WTA model has a fixed diagonal pattern, in contrast, Ridge/Lasso models included more weights in the connectivity matrix. Considering what's being estimated in this matrix, then perhaps the findings are not surprising because even after penalizing and regularization, the ridge regression models are still more complex than the WTA model (more elements are allowed to vary). In other words, Lasso/Ridge models allow more variables from the X side to explain variances in Y, similar to how throwing in more regressors can always improve the R square. I am unsure if cross-validation mitigates this issue. It would be more straightforward for the authors to compare model performance in a way that controls for the number of variables in the Ridge/Lasso models.

      We now recognize that we could have done a better job in explaining our approach on this issue in the original submission. The models (including connectivity weights and regularization parameter) are trained solely on data from Task set A. They are tested on 2 independent datasets: 1) Data from the same participants performing novel tasks; 2) Data from new participants performing novel tasks. This allows us to compare models of different structure and complexity.

      2) The authors did an excellent job reviewing the anatomical relationship between the cerebral cortex and the cerebellum. There are several issues that the authors should address in the introduction or discussion. First, if the anatomical relationship between the cerebellum and the cortex is closed-loop as suggested in the intro, then how convergence can arise from multiple cortical inputs given there is no physical cross-talk? Second, there are multiple synapses connecting a cerebellar region and the cortex, and therefore could integration occur at other sites but not the cerebellum? For example, the caudate, the thalamus, or even the cortex (integrating inputs before sending to the cerebellum)?

      We agree that the correlation structure of BOLD signals in the neocortex and cerebellum is shaped by the closed-loop (bi-directional) interactions between the two structures. As such, some of the observed convergence could be caused by divergence of cerebellar output. We have added a new section to the discussion on the directionality of the model (Page 18).

      That said, there are strong reasons to believe that our results are mainly determined by how the neocortex sends signals to the cerebellum, and not vice versa. An increasing body of physiological studies (and this includes newer papers, see response to reviewer #1, comment #1 for details) shows that cerebellar blood flow is determined by signal transmission from mossy fibers to granule cells and parallel fibers, followed by nitric oxide signaling from molecular layer interneurons. Importantly, the activity of Purkinje cells, the only output cells of the cerebellar cortex, is not reflected in the BOLD signal from the cerebellar cortex. (We also note that increases in the firing rate of inhibitory Purkinje cells mean less activation of the neocortex.) Thus, while we acknowledge that cerebellar-cortical connectivity likely plays a role in the correlations we observed, we cannot use fMRI observations from the cerebellar cortex and neocortex to draw conclusions about cerebellar-cortical connectivity. To do so we would need to measure activity in the deep cerebellar nuclei (and likely thalamus).

      The situation is different when considering the other direction (cortico-cerebellar connections). Here we have the advantage that the cerebellar BOLD signal is mostly determined by the mossy fiber input which, at least for the human cerebellum, comes overwhelmingly from cortical sources. On the neocortical side, the story is admittedly less clear: The cortical BOLD signal is likely determined by a mixture of incoming signals from the thalamus (which mixes inputs from the basal ganglia and cerebellum), subcortex, other cortical areas, and local cortical inputs (e.g., across layers). While the cortical BOLD signal (in contrast to the cerebellum) also reflects the firing rate of output cells, not all output cells will send collaterals to the pontine nuclei. These caveats are now clearly expressed in the discussion section.

      On balance, there is an asymmetry: Cerebellar BOLD signal is dominated by neocortical input without contribution from the output (Purkinje) cells. Neocortical BOLD signal reflects a mixture of many inputs (with the cerebellar input making a small contribution) and cortical output firing. This asymmetry means that the observed correlation structure between cortical and cerebellar BOLD activity (the determinant of the estimated connectivity weights) will be determined more directly by cortico-cerebellar connections than by cerebellar-cortical connections. Given this, we have left the title and abstract largely the same, but have tempered the strength of the claim by discussing the influence of connectivity in the opposite direction.

      3) The dispersion metric quantifying the spread level in cortical inputs is interesting. Could the authors expand this finding and show anatomically what the physical spread is like in cortical space? The metric is novel but hard to interpret. A figure demonstrating the physical spread in the cortex should help readers interpret this result.

      Figure 3 (previously Figure 4) was included to provide examples of differences in the spatial spread of cortical inputs. For example, regions 1 and 2 are explained by a more restricted and spatially contiguous set of cortical inputs (e.g., primary motor cortices) whereas regions 7 & 8 are explained by a set of spatially disparate regions (e.g., angular gyrus, superior and middle frontal cortices, and superior temporal gyrus). Prompted by this comment, we have opted to reverse the order of Figures 3 and 4 to give the reader a chance to visualize differences in physical spread of cortical regions before we walk through the quantitative analysis.

      4) At the end of the discussion section, the authors discussed how results are more likely driven by cortical inputs to the cerebellum but not the other way around. This interpretation is likely overstated given the hemodynamic blurring and low temporal resolution of BOLD. Without a faster imaging sequence and accurate models that account for differences in hemodynamic properties, the more parsimonious interpretation is results are driven by bidirectional cortico-cerebellar interactions. The results are still very interesting without this added nuisance.

      Our analyses do not rely on the exact time course or delays between neocortical and cerebellar activation, but only on the activity profiles across a wide range of tasks. In terms of bidirectionality, please see our response above. We have added a dedicated section in the revised Discussion on this issue.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to define the molecular mechanism of activation of the thrombopoietin receptor (TpoR), a very important cytokine receptor that regulates megakaryocyte differentiation and platelet production. They conducted a thorough series of experiments combining mutagenesis experiments with sophisticated biological assays and solid-state NMR structural measurements. This work builds on a body of previous studies of TpoR from this group and from others. They focused both on (1) the role and impact of W515 located in the juxtamembrane cytosolic domain and (2) the impact of introducing either Asn at sites in the transmembrane domain to induce various dimerization modes, or insertion of pairs of Ala residues to induce helical rotation to the TM domain. There is a lot of nice data in this paper, which is fairly intricate - a tough read, but that's because it's a complicated system. The writing is excellent.

      This paper presents a model for receptor activation in which the inactive receptor is the monomeric form of the receptor in which the juxtamembrane domain, including W515, maintains a helical structure. Activation of the receptor triggers dimerization of the transmembrane domain and loss of helicity of the juxtamembrane segment, which facilitates optimal interactions of the kinase domains with their JAK2 domain phosphorylation substrates.

      There is a lot to like in this careful work and the resulting manuscript. There is one major shortcoming in this manuscript, which concerns W515. It is known that mutation of W515 to any of 17 of the canonical amino acids, including Phe, is sufficient to trigger homodimerization and receptor activation. The authors present some evidence that the phenomenon behind this is that mutation of W515 to almost any other residues disrupts the helical secondary structure of the critical juxtamembrane segment, which promotes dimerization and receptor activation. What I find puzzling is why a Trp at site 515 promotes helix formation, but nearly all other amino acids at this site disrupt helix formation. This strongly suggests the side chain of W515 must be interacting with another domain of the protein in the inactive state, in a manner that is responsible for how Trp stabilizes the juxtamembrane helix, which is a central feature that helps define that state. I think that for this paper, this dangling missing piece of their mechanistic model should be resolved.

      We agree with the reviewers that the mechanism by which Trp515 stabilizes the TM helix is central to the mechanism of activation. More broadly, our studies over the past decade have sought to address the importance of the entire RWQFP insert in the TM domain. Our working model for this sequence has been that cation-π interactions are central to the role of the Trp and the accompanying amino acids.

      Arginine and tryptophan both are over-represented at the cytoplasmic TM-JM boundaries of membrane proteins. Arginine is positively charged and part of the “positive-inside” rule for membrane protein insertion. Arginine and lysine define the cytoplasmic ends of TM helices and prefer to be accessible to the water-exposed membrane surface. In contrast, tryptophan residues prefer hydrophobic head-group or membrane interior locations. A revealing aspect of the RWQFP motif is that the arginine and tryptophan are located at the membrane to cytosolic border. As a result, in order to accommodate arginine in a more water-inaccessible membrane environment, it interacts with the surface of the tryptophan indole ring. Partitioning of the RWQF sequence in a more water-inaccessible environment also drives the formation of helical secondary structure as an unpaired backbone C=O...NH in a hydrophobic environment is estimated to cost 3-6 kcal/mol of energy.

      We have taken two approaches in response to this essential criticism of the reviewers: one structural and one computational. Additional NMR data (structural approach) have been included in the supporting information (see response to point 2 below). Computational approaches provide a second way to address whether a cation–π interaction between Trp515 and the positively charged Arg514 is responsible for stabilizing the C-terminal TM helix. We have included a new supporting figure using AlphaFold 2.0 that probes the structural changes upon mutation of Trp515. In the wild-type receptor, Arg514 is predicted to form a cation–π interaction with Trp515. In the W515K mutant, the helical secondary structure in the RKQFP sequence is disrupted and Arg514 forms a new cation–π interaction with Trp529. Similar changes occur in other Trp515 mutants (e.g. W515A), highlighting the ability of AlphaFold to predict such interactions and the consequences of mutation. Overall, 15 out of 19 W515X mutants are predicted to be unfolded. Experimentally, 17 out of 19 mutations lead to activation. Importantly, W515C and W515P are the only two amino acid substitutions that do not cause constitutive activity experimentally (Defour, Chachoua, Pecquet, & Constantinescu, 2016). Computationally, these two substitutions are not predicted to cause helix unraveling. In short, the predictions of AlphaFold agree with the unique nature of tryptophan at position 515.

      In addition, we have expanded the arguments supporting the potential role of cation–π interactions by adding a new section entitled “Unfolding of the RWQF α-helical motif is a common mechanism of receptor activation”.

      These modifications are now in the revised manuscript starting with line 213:

      Our working model for the mechanism of activation in the wild-type or mutant receptors is that the RWQF motif is stabilized in the inactive state as an α-helix as a result of a cation-π interaction between R514 and W515. This interaction allows the RWQF sequence to partition into the more hydrophobic head-group region of the bilayer. Both Arg and Trp are over-represented at the cytoplasmic ends of TM helices (von Heijne, 1992), but whereas Arg prefers a water-accessible environment, Trp prefers to be buried in a more hydrophobic environment (Yau, Wimley, Gawrisch, & White, 1998). Since Arg and Trp are located at the border between membrane and cytosolic domains and Arg precedes Trp in the sequence, partitioning into the membrane head-group region results in a favorable interaction of the positive charge associated with the guanidinium group of the R514 side chain with the partial negative charge associated with the aromatic surface of the W515 side chain. Partitioning of the RWQF sequence into the more water-inaccessible environment drives the formation of helical secondary structure as an unpaired backbone C=O...NH in a hydrophobic environment is estimated to cost 6 kcal/mol of energy (Engelman, Steitz, & Goldman, 1986). In this model, activation of the receptor results in or is caused by disruption of the R514-W515 cation-π interaction. In the W515 mutants, R514 is no longer stabilized in a membrane environment and the helix containing the RWQFP sequence unravels to allow the positively charged side chain to reach outside of the membrane. In the case of the Asn mutants and in the wild-type receptor with bound Tpo, dimerization of hTpoR (or rotation of the TM helices in mTpoR dimer) places W515 in the center of the helix-helix interface. The data suggest that a steric clash of the W515 side chains results in unraveling of the cytoplasmic end of the TM helix.

      Computational and additional NMR data are provided in the supplementary figures to support the model of helix unraveling suggested by the solid-state NMR studies. Computationally, we used AlphaFold 2.0 (Jumper et al., 2021) calculations of hTpoR TM-JM peptides to predict the influence of all possible mutations at position 515 on the TM-JM helix structure. Remarkably, α-helix unraveling was predicted for 15 out of 20 possible amino acids at 515 (supplement 2 to Figure 3). Importantly, two of the mutations that are not predicted to cause helix unraveling are W515C and W515P. Experimentally, these two amino acid substitutions are the only ones that do not induce constitutive activity among all possible amino acid substitutions at W515 (Defour et al., 2016). Introducing a Trp at the preceding position 514 instead of R/K in W515K/R mutants reverses helix unfolding in AlphaFold simulations (supplement 3 to Figure 3). This result agrees with our previous data that the WRQFP mutant is inactive and is essentially monomeric (J. P. Defour et al., 2013). Structurally, we have undertaken solution-NMR studies of the wild-type hTpoR TM-JM peptide and its W515K mutant. Relaxation measurements of the backbone 15N resonances show that W515K mutation leads to association of the TM helices, and that it induces upfield chemical shift changes in the RWQF sequence consistent with helix unraveling (supplement 1 to Figure 3).

      Reviewer #2 (Public Review):

      The thrombopoietin receptor (TpoR) regulates stem cell proliferation, platelet production, and megakaryocyte differentiation. Past cell biology and biophysical studies have established that ligand-induced dimerization constitutes the mechanism of activation of TpoR. Specifically, ligands bind to the extracellular domain of TpoR and generate an allosteric response that is transmitted to the transmembrane domain, activating downstream signaling. However, up to now the molecular details of how the allosteric signals are transmitted to the intramembrane domains have been elusive. In this manuscript, Constantinescu and co-workers combined NMR, in vitro, and in vivo assays to investigate the activation and oncogenicity of TpoR. The authors concluded that the unwinding of the juxtamembrane domain is the main structural event that determines TpoR activation and regulates oncogenicity. The solid-state NMR studies were carried out in lipid membranes with polypeptides spanning the juxtamembrane and transmembrane residues. The authors show a series of spectra of 13CO resonances that encompass the juxtamembrane domain that is diagnostic of a structural transition from a helical conformation to a partially disordered state. The unwinding of the helical juxtamembrane domain was confirmed by site-specific mutations in this region. The chemical shift changes clearly indicate the transition from order to disorder (and vice versa) for selected sites. These conclusions are compounded by INEPT-type experiments that detect the most dynamic region of polypeptides. To rationalize the molecular mechanism for activation, the authors also used Ala-Ala insertions at strategic positions along the transmembrane domain. These experiments showed that the specific orientation of the transmembrane residues is central for TpoR activation, and a slight rotation of the helix is critical for activation of the receptor. 
Transcriptional activity assays confirm the importance of the proper orientation of the transmembrane domain for receptor activation.

      Overall, I believe the data are solid, and both biophysical and cell biology studies support the conclusions of the authors. These new findings represent a significant advancement in understanding cytokine receptor activation.

      We thank the reviewer for these comments.

      Reviewer #3 (Public Review):

      The authors sought to propose a mechanism by which cancer-causing mutations in the thrombopoietin receptor (TpoR) activate the receptor. To do so, they used a systematic approach of introducing non-native and naturally occurring mutations into the receptor and used a combination of in-vivo and cell-based assays and solid-state NMR spectroscopy. They propose that the proximity of the asparagine mutations to the cytosolic boundary influences the secondary structure of the receptor and suggest that this structural change induces receptor activation.

      The strengths of this work are the importance of the system being studied and tackling a problem that is not yet fully resolved. The authors acquired a large and convincing set of biological data, including in vivo experiments that support the gain-of-function/activating role of the mutations studied. The solid-state NMR data are of high quality as well. In particular, the INEPT data in figure 6a display very clear differences within one region of the wild-type compared to the mutants.

      One significant weakness is the validity of the conclusions given the limited atomistic measurements presented. Namely, the authors make rather specific conclusions about protein folding based on a single set of 13C alanine carbonyl chemical shifts in the wild-type and mutant TM peptides. Essentially, the authors observe chemical shift perturbations at this carbonyl carbon when mutations are introduced into a protein and use this information to make conclusions about secondary structure. I am not convinced that the authors have presented sufficient evidence to justify the conclusion that the helix unwinds and that this is responsible for the mechanism of activation. While the other cell-based experiments in mutations are interesting, deciphering such a specific folding mechanism with limited atomistic data is not justified.

      We added both computational data and solution NMR to support our conclusion.

    1. Author Response

      Reviewer #1 (Public Review):

      Proton pumps are required to set up the gradients necessary for myriad biological processes. The malaria-causing parasite Plasmodium falciparum uses two main pathways to achieve this, the vacuolar ATPase (V-type ATPase) and a more ancient vacuolar pyrophosphatase (PfVP1). The proton motive force set up across the parasite plasma membrane holds particular significance since it is necessary for transport of nutrients and waste products into and out of the cell. Motivated by the observation that the V-type ATPase is not expressed until several hours after the parasite has entered host cells, the present study examines the function of PfVP1. The authors demonstrate that PfVP1 depletion blocks the early development of Plasmodium (specifically, the transition from the ring to the trophozoite stage) and that this is associated with changes to cellular pH and pyrophosphate levels, consistent with predicted functions. Complementation of the conditional knockdown suggests that pyrophosphatase activity alone is not sufficient to overcome the loss of PfVP1. Overall, data supporting a critical role for PfVP1 in parasite energetics is compelling. However, the lack of several key controls somewhat weakens the conclusions of the paper when it comes to complementation of the mutants and description of which activities are needed for parasite survival. Because the proximal activities of the enzyme (ATP generation and the proton motive force) are incompletely examined, some of the major conclusions from the study remain speculative.

      We thank the reviewer for these constructive comments and for recognizing the significance of our study. The major discovery of this manuscript is PfVP1's essential role in the early-stage development of the 48h asexual lifecycle in P. falciparum. Our data suggest that PPi is an energy source when ATP levels are likely low, in the ring-stage malaria parasite and during its transition to the trophozoite stage. We have performed additional experiments and done our best to address each of the reviewer's comments.

      Reviewer #2 (Public Review):

      In this work, the authors characterize a proton pump from the parasite Plasmodium falciparum that uses pyrophosphate as an energy source (PfVP1).

      They looked at the expression and localization of the pump in different stages of the parasite and determined that it localizes to the plasma membrane and is highly expressed in the ring stage. They studied the biochemical function by expressing the gene in Saccharomyces followed by isolation of vesicles and measurements of proton transport and PPi hydrolysis. They also characterized the biological role of PfVP1 in the parasites by creating conditional mutants that express PfVP1 when cultured in the presence of anhydrotetracycline (ATC). Upon removal of ATC the expression of PfVP1 is downregulated, which impacted growth and transition to the trophozoite stage. Mutant parasites struggled to progress through the ring stage and failed to become trophozoites in the second intraerythrocytic cycle. They complemented the mutants with the yeast inorganic pyrophosphatase gene and the Arabidopsis vacuolar pyrophosphatase.

      We thank the reviewer for the positive and constructive comments. We have carefully considered every comment raised by the reviewer and have done our best to perform additional experiments.

      Reviewer #3 (Public Review):

      Solebo and coworkers investigated the energy requirements of blood-stage malaria parasites (the stage of infection that causes symptoms). Traditionally, parasites were thought to be somewhat quiescent during the first half of their life cycle in red blood cells and become metabolically active as they prepare for replication. Consequently, antimalarial drugs are more active against parasites during the second half of their life cycle. In this report, the authors show that the metabolic by-product pyrophosphate is an essential energy source for the development of early-stage malaria parasites and that it is consumed by a vacuolar pyrophosphatase (PfVP1). Knock down studies showed that PfVP1 is required for the development of early-stage parasites and localization studies established that it is located in the parasite plasma membrane. Characterization of PfVP1 heterologously expressed in yeast confirmed that it is a pyrophosphate hydrolyzing proton pump. Consequently, loss of PfVP1 in early-stage parasites results in reduced pyrophosphate consumption and a reduction in pH (accumulation of protons). The authors further show that a similar vacuolar pyrophosphatase from Arabidopsis thaliana can complement the loss of the parasite ortholog, but a general pyrophosphatase enzyme cannot. Consistent with this result, mutations designed to inactivate either the pyrophosphatase activity or the proton-pumping activity demonstrated that both activities are essential for the development and survival of early-stage parasites.

      The conclusions of this paper are firmly supported by data, often from more than one type of experimental approach. The conclusions provide fundamental information about the stage of parasite development that has been hard to target with antimalarial drugs. The most energy-consuming process in a cell is the maintenance of membrane potential and in malaria parasites, it is known that proton pumps (rather than sodium pumps) are responsible for this process. Although PfVP1 was previously reported to be located internally in an organelle of the parasite, the data presented in this report clearly define its location on the plasma membrane and its essential role in maintaining the membrane potential. PfVP1 inhibitors could preferentially target early stage malaria parasites and the current results support efforts to find these inhibitors. Perhaps the most exciting aspect of this work is the potential to act synergistically and enhance the effect of current antimalarial drugs on early stage parasites. In this vein, the authors tested four antimalarial compounds in conjunction with knockdown of PfVP1 to determine whether there was enhanced activity. These experiments were not conducted in a systematic way and this is perhaps the only weakness of the paper.

      We thank the reviewer for positive, constructive, and encouraging comments. We really appreciate that. We are also very excited about our discovery that a non-ATP driven proton pump plays essential roles in the early-stage development of the asexual lifecycle. Our data suggest PPi is an energy source in the malaria parasite P. falciparum.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Xu et al. does a very thorough characterization and molecular dissection of the role of SSH2 in spermatogenesis. Loss of Ssh2 in germ cells results in germ cell arrest in step 2-3 spermatids and eventually leads to germ cell loss by apoptosis. Molecular characterization of the mutant mice shows that the loss of SSH2 prevents the fusion of proacrosomal vesicles, leading to the formation of a fragmented acrosome. The fragmentation of the acrosome is due to impaired actin bundling and dephosphorylation of COFILIN. In short, this is a comprehensive body of work.

      We thank the referee for these insightful comments.

      Reviewer #2 (Public Review):

      The acrosome is a unique sperm-specific subcellular organelle required for the fertilization process, and it is also an organelle undergoing extensive morphological and structural transformation during sperm development. The mechanism underlying the extensive acrosome morphogenesis and biogenesis remains incompletely understood. Xu et al., in their manuscript entitled "The Slingshot phosphatase 2 is required for acrosome biogenesis during spermatogenesis in mice", report that Slingshot phosphatase 2 is essential for acrosome biogenesis and male fertility, based on their characterization of spermatogenic and acrosomal defects in the Ssh2 knockout mice they generated. Specifically, the authors provide molecular, genetic, and subcellular evidence that Ssh2 mutation impairs the dephosphorylation of an actin-binding protein, COFILIN, during spermiogenesis and, accordingly, the actin cytoskeleton remodeling that is crucial for proacrosomal vesicle trafficking and acrosome biogenesis.

      We appreciate and thank Referee #2 for the positive feedback and insightful comments.

      Strengths:

      This nicely written manuscript addresses an important mechanistic question about the roles of cytoskeleton remodeling in acrosome biogenesis and provides genetic, subcellular, and molecular evidence supporting the hypothesis that Ssh2 regulates actin cytoskeleton remodeling, a process essential for proacrosomal vesicle trafficking and acrosome biogenesis, through dephosphorylation of an actin-binding protein during spermiogenesis.

      We again thank Referee #2 for the appreciation and encouragement of our current research.

      Weaknesses:

      For the body weight and testis weight of the mutants, the authors concluded that there is no significant difference between mutant and wild type (Fig 1E-1G), but they appear to have used mice between 6 and 8 weeks old. Both the testes and the body of males are still growing at 6-8 weeks, and with only six mice analyzed, a significant difference in testis size and/or body weight could easily be missed given such a varied age and small sample size.

      We thank the referee for prompting this important discussion point, which we now cover in our revised manuscript. In our originally submitted manuscript, we presented the data for body weight, testis weight, and T/B ratio only for mice between the ages of 6 and 8 weeks. In the revised manuscript, we have added data from mice older than 8 weeks in a new Figure 1E-1G, with a sample size of 12 for each genotype. We have also updated the relevant content in the figure caption. The revised figure caption for Figure 1 panels E–G reads as follows: “(E-G) Body weights (26.3609 ± 0.4914 for WT; 25.1741 ± 0.5189 for Ssh2 KO), weights of the testes (0.0862 ± 0.0036 for WT; 0.0788 ± 0.0023 for Ssh2 KO), and the testis-to-body weight ratio (0.3281 ± 0.0153 for WT; 0.3154 ± 0.0135 for Ssh2 KO) of adult WT and Ssh2 KO males (n = 12). Data are presented as the mean ± SEM; p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.”
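      As a rough consistency check on the summary statistics quoted in the caption above, the t statistic can be recomputed from the reported means and SEMs. This is only a sketch under stated assumptions: it assumes an unpaired two-sample Student's t-test with n = 12 per genotype (so 22 degrees of freedom), it uses the standard two-tailed 5% critical value of about 2.074 from a t-table, and the function and variable names are illustrative rather than taken from the manuscript. With equal group sizes, the pooled standard error reduces to the square root of the sum of the squared SEMs.

```python
import math

# Assumption: unpaired two-sample Student's t-test, equal n in both groups.
N_PER_GROUP = 12
DEGREES_OF_FREEDOM = 2 * N_PER_GROUP - 2  # 22
T_CRITICAL_TWO_TAILED_05 = 2.074          # from a standard t-table for 22 df


def t_from_summary(mean_a, sem_a, mean_b, sem_b):
    """t statistic from means and SEMs of two equally sized groups.

    With equal n, the pooled-variance standard error of the difference
    equals sqrt(sem_a**2 + sem_b**2).
    """
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Values copied from the revised caption: (WT mean, WT SEM, KO mean, KO SEM).
reported = {
    "body weight": (26.3609, 0.4914, 25.1741, 0.5189),
    "testis weight": (0.0862, 0.0036, 0.0788, 0.0023),
    "testis-to-body ratio": (0.3281, 0.0153, 0.3154, 0.0135),
}

for name, values in reported.items():
    t = t_from_summary(*values)
    below = abs(t) < T_CRITICAL_TWO_TAILED_05
    print(f"{name}: t = {t:.2f}, df = {DEGREES_OF_FREEDOM}, "
          f"below 5% critical value: {below}")
```

      Under these assumptions, all three t statistics fall below the critical value, which agrees with the reported p > 0.05; the check does not replace the analysis of the raw data.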

      Other points:

      Comments: 1) Could the uniform cytoplasmic distribution of diminutive actin filaments in the wild type and disrupted actin filament remodeling be examined at the EM level on the round spermatids?

      We apologize for the confusion. We previously conducted a transmission electron microscopy (TEM) analysis of testis samples to examine the distribution and ultrastructural organization of F-actin in WT and Ssh2 KO round spermatids. Unfortunately, even at high magnification (30,000x, right panel of Figure R1, Response Figure 1), TEM of testicular sections revealed no diminutive actin filaments in the cytoplasm of round spermatids. The only actin-related structures we observed were the acroplaxome (an actin-rich specialized structure that anchors the acrosome) in WT spermatids and some thick bundle-like structures located at the acrosomal region of Ssh2 KO spermatids (Fig. R1). Based on their characteristic appearance, we interpreted these electron-dense bundles as aberrantly aggregated actin filaments whose lengths are consistent with those of COFILIN-saturated F-actin fragments (Bamburg et al., 2021), suggesting that Ssh2 KO disrupts actin filament remodeling during acrosome biogenesis. However, owing to the technological limitations of TEM and the complexity of the intracellular environment of round spermatids, we recognized only a few aggregated actin bundles that had lost their filamentous appearance in Ssh2 KO spermatids, and we detected no typical diminutive actin filaments of the kind previously imaged by high-resolution cryo-TEM (Haviv et al., 2008) or live-cell total internal reflection fluorescence microscopy (Johnson et al., 2015) on purified actin bundles and cultured cells. Given the lack of effective approaches to culture murine round spermatids in vitro, confocal microscopy of fluorescence-labelled F-actin (e.g., IF staining with FITC-phalloidin) is a more accessible method than EM for visualizing the disruption of actin remodeling in murine spermatids, as several other studies of actin have demonstrated (Djuzenova et al., 2015; Meenderink et al., 2019).

      Comments: 2) Any other defects are seen besides acrosome in the mutant testis given the important roles of actin cytoskeleton network and high expression of Ssh2 in spermatocytes, were chromatoid bodies or mitochondria affected in any way? Any other defects in the mice overall including female fertility and other organs, given the previously reported roles in the nervous system. It could be helpful information for others interested in Ssh 2 protein and actin cytoskeleton's roles in general.

      The referee has raised an interesting point. Firstly, besides the acrosome-related defects in Ssh2 KO spermatids, we identified increased germ cell apoptosis and aberrant activation of the apoptotic Bcl-2/Caspase-3 pathway in the testes of Ssh2 KO mice. We speculate that these are triggered by the disordered COFILIN-mediated F-actin remodeling, and we intend to elucidate the underlying mechanisms in the future. Secondly, given the high expression of SSH2 in spermatocytes demonstrated by the IF staining shown in Figure 4B and 4C, we performed surface chromosome spreading on spermatocytes to observe whether the morphology of chromatid bodies and meiotic progression were affected by Ssh2 KO; no obvious defects were observed, as shown in supplementary Figure S3 of the originally submitted manuscript. Thirdly, no obvious morphological abnormality in chromatin or mitochondrial structure was detected under TEM in Ssh2 KO germ cells such as spermatocytes and round spermatids, so we did not pursue this further. Fourthly, we examined the potential effect(s) of Ssh2 KO on female fertility using Ssh2 KO female mice and did not find any obvious fertility defect in Ssh2 KO females compared to their WT littermates, as demonstrated by the data on body weight, ovary weight, ovary-to-body weight ratio, ovary size, and fertility tests, as well as the images of ovarian HE staining (Fig. R1). Moreover, during our investigation period, Ssh2 KO males and females did not manifest any defective physical development, aberrant physiological status, or mental disorder, notwithstanding the reported roles of SSH2 in neurite extension (Endo, Ohashi, & Mizuno, 2007). We therefore did not conduct experiments to observe the effect(s) of SSH2 in organs other than those related to female fertility.

      Fig. R1 No reproductive defects were found in Ssh2 KO females. (A-C) Body weights, weights of the ovaries, and the ovary-to-body weight ratio of adult WT and Ssh2 KO females aged 8-10 weeks (n = 5); p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data. (D) The size of ovaries from Ssh2 KO mice was indistinguishable from that of ovaries from WT mice aged 8 weeks, n = 4. (E) Histology of the ovaries from WT and Ssh2 KO mice. Sections were stained with hematoxylin and eosin. Scale bars: 200 μm. Images are representative of ovaries extracted from 8-week-old adult female mice per genotype. (F) Number of pups per litter from WT and Ssh2 KO female mice (8 weeks old) after crossing with WT adult male mice (n = 3); p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.

      Comments: 3) Providing detailed information on the number of animals used and cells analyzed in the legend is nice, but it might be even better for the readers to include sample size and the number of cells examined in the figure/graph if possible.

      We appreciate the reviewer's suggestions. We have integrated sample size information into the figures where appropriate. Firstly, we added the sample size to Figure 1C, 1E, 1F, 1G and 1I. Secondly, we included the sample size and the number of seminiferous tubules/epididymal ducts evaluated for TUNEL (+) cell counting in Figure 2C and Figure 2D. Thirdly, we included the sample size and the number of spermatids used for co-localization in Figure 6B and Figure 6D.

      Comments: 4) Nice discussion and comparison with GOPC and GM130, how about comparison and discussion with other acrosome defective mutants like PICK1, and ATG to provide some insights into acrosome biogenesis and proacrosomal vesicle trafficking?

      We greatly appreciate the referee's positive appraisal of our work and the constructive suggestions. Unfortunately, we are unable to address these defective mutants with certainty due to the lack of suitable samples (only three 16-month-old Ssh2 KO mice are currently available). We compared the cytological staining of GM130 and GOPC in WT and Ssh2 KO spermatids using tubule squash sections, as described in the originally submitted manuscript; these were prepared from fresh testes of 8-week-old mice, and we now have only aged Ssh2 KO mice, which prevents us from performing PICK1 and ATG staining. PICK1 was previously reported to facilitate vesicle trafficking from the Golgi apparatus to the acrosome and to co-localize with GOPC in the proacrosomal granules (Xiao et al., 2009), and the phenotypes of Pick1 KO mice share many characteristics with those of Ssh2 KO mice, such as fragmentation of the acrosome and increased germ cell apoptosis. Both autophagy-related ATG5 (Huang et al., 2021) and ATG7 (Wang et al., 2014) were reported to participate in acrosome biogenesis, and ATG7 is required for proacrosomal vesicle transportation/fusion by conjugating LC3 to the membrane of proacrosomal vesicles. Although the spermatids in these KO mouse models can still develop into spermatozoa with defective acrosomes, which differs from the situation in Ssh2 KO mice, it would be meaningful to determine the effects of Ssh2 KO on the localization of these regulators of acrosome biogenesis in spermatids and their potential interactions with SSH2.

      Indeed, in future work, we plan to pursue these issues, and content related to PICK1 has been added to the discussion in the revised manuscript as follows: “Moreover, it is intriguing to note that the phenotypes of Ssh2 KO mice share a lot of similarities with that of Pick1 KO model (Xiao et al., 2009) such as acrosome fragmentation and enhanced germ cell apoptosis, suggesting the possibility that SSH2 and PICK1 work together in a same trafficking machinery functioning in acrosome biogenesis which needs to be clarified further.”

      Comments: 5) Given the literature on Cofilin's requirement for male fertility and the increased p-Cofilin in Ssh2 mutant testis by Western and IF, the authors have a strong case for their hypothesis. But given the general role of phosphatase, it might be prudent to discuss alternative possibilities.

      We thank the reviewer for these valuable suggestions. Given that p-COFILIN is the only known substrate of SSH2 based on previous reports, we focused principally on this cascade in our investigation. As a phosphatase, SSH2 is very likely to interact with many proteins other than actin-binding proteins, functioning in various cellular processes that remain elusive. As directed, we have now added content addressing this concern to the discussion section of the revised manuscript as follows: “Given the diverse physiological roles reported for Slingshot family proteins, the possibility of the alternative mechanism underlying involvement of SSH2 in cellular events beyond the COFILIN-mediated actin remodeling should be noted. According to some publicly accessible databases as the indicators of potential protein–protein interactions such as BioGRID (Oughtred et al., 2019) and IntAct (Del Toro et al., 2022), SSH2 might interact with a set of actin-based molecular motors covering MYH9, MYO19 and MYO18A, which have been implicated in the maintenance of Golgi morphology and Golgi anterograde vesicular trafficking via the PI4P/GOLPH3/MYO18A/F-actin pathway (Rahajeng et al., 2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      Voltage-clamp fluorometry combines electrophysiology, reporting on channel opening, with a fluorescence signal reporting on local conformational changes. Classically, fluorescence changes are reported by an organic fluorophore tethered to the receptor via cysteine chemistry. However, this classical approach does not allow fluorescent labeling of solvent-inaccessible regions or cytoplasmic regions. Incorporation of the fluorescent unnatural amino acid ANAP directly into the sequence of the protein overcomes these limitations. However, expression of ANAP-containing receptors is usually weak, leading to very small ANAP-related fluorescence changes (ΔFs).

      In this paper, the authors developed an improved method for expression of full-length, ANAP-mutated proteins in Xenopus oocytes. In particular, they managed to increase the ratio of full-length over truncated proteins for C-terminal ANAP incorporation sites. Since C-terminally truncated P2X receptors are usually functional, it is important to maximize the full-length over truncated protein ratio to have a good correspondence between the observed current and fluorescence. Using their improved strategy, they screened for ANAP incorporation sites and ATP-mediated ANAP ΔFs along the whole structure of the P2X7 receptor: extracellular ligand binding domain (head domain), M2 transmembrane segment (gate), as well as a large intracellular domain specific for the P2X7 subtype, the "ballast" domain. The functional role of this domain and its motions following ATP application are indeed unknown. Monitoring ANAP fluorescence changes in this region following ATP binding provides a unique way to study those questions. By analyzing ATP-induced ΔFs from different parts of the receptors, the authors conclude that the ATP-binding domain mainly follows gating, while intracellular "ballast" motions are largely decoupled from ATP binding.

      Strengths of the paper:

      This paper provides an improved method for efficient unnatural amino acid incorporation in Xenopus oocytes. Thanks to this technique, the authors managed to enhance membrane expression of ANAP-mutated P2X7 receptors and observed strong fluorescence changes upon ATP application. The paper furthermore describes an impressive screen of ANAP-incorporation sites along the whole protein sequence, which allows them to monitor conformational changes of solvent-inaccessible regions (transmembrane domains) and cytoplasmic regions that were not accessible to cysteine-reactive fluorophores. This screen was performed in a very thorough manner, each ANAP mutant being characterized biochemically for membrane expression, as well as in terms of fluorescence changes. The limitations of the approach (small ΔF upon ATP application on wt receptors, problem of baseline fluorescence variations in the presence of calcium) are well explained. Overall, this study should thus not only serve as a guide to anyone willing to perform VCF on P2X7 receptors but should also be useful to the whole community of researchers using unnatural amino acids. Thanks to orthogonal labeling with TMRM and ANAP, the authors managed to simultaneously monitor the motions of the extracellular and intracellular domains of P2X7. Finally, they propose methods to simultaneously monitor intracellular domain motion and downstream signaling.

      Weaknesses:

      Although the fluorescence screen is impressive and well conducted, the biological conclusions remain superficial at this stage. The paper furthermore lacks quantitative analysis. Finally, the title only reflects a minor part of the paper and is therefore not representative of the paper content.

      Quantitative analyses (DRCs and current rise times) were now added for the key mutations. In addition, we performed a variety of experiments to address the challenging question of mechanistic insight (mutants that track facilitation) and effects of intracellular factors (mutation of calmodulin binding site, FRET experiments with calmodulin). These data confirmed that deletion of a cysteine-rich intracellular region eliminates current facilitation (Roger et al., 2010) and that some of our mutants indeed track facilitation. However, mutation of the CaM binding site and FRET experiments did not support an effect of calmodulin or were inconclusive. As pointed out above, we think that VCF has limited capacity to identify novel biologically relevant consequences of receptor activation but is more suited to determine the sites and dynamics of already defined interactions.

      The title was changed to: "Improved ANAP incorporation and VCF analysis reveals details of P2X7 current facilitation and a limited conformational interplay between ATP binding and the intracellular ballast domain"

      Reviewer #2 (Public Review):

      The authors aimed to elucidate the structural rearrangements and activation mechanisms of P2X7 upon ATP application by voltage clamp fluorometry (VCF) using a fluorescent unnatural amino acid (fUAA) and other fluorophores. They improved the fUAA methodology and detected changes evoked by ATP binding in the ATP binding region and other regions. They also observed facilitation of fluorescence (F) changes by repeated application of ATP, associated with gating. The F change in the cytoplasmic ballast region was minor, and based on their experimental data, they discuss the possibility that this region is involved in activation by other cytoplasmic factors, such as Ca2+.

      The strengths of the study are as follows.

      (1) The fUAA methodology was improved to enable experiments with a single injection into oocytes (Figs. 1 and Suppl).

      (2) They performed an intensive mutagenesis study of as many as 61 mutants (Figs. 3, 4, 5).

      (3) A careful evaluation of the successful Anap incorporation and formation of full length proteins was performed by western blot analysis (Fig. 2).

      (4) By recording F at three wavelengths, they obtained better information, i.e. they classified the F changes as quenching, dequenching, increase in polarity, or decrease in polarity (Fig. 3E).

      (5) They detected F changes upon ATP application in various regions of P2X7, but few in the ballast region, showing that the ballast region is not strongly involved in ATP-evoked gating.

      (6) They analyzed the kinetics of F and current and their changes upon repeated ATP application to approach the known facilitation mechanisms. The data are very interesting. They concluded that it is intrinsic to the P2X7 molecule and that it is associated not with the ATP binding but with the gating process (Figs. 3F, 4D, 6A).

      (7) They performed interesting analysis to clarify the mechanisms of activation by cytoplasmic factors, especially Ca2+ entered via P2X7 (Fig. 6).

      The weaknesses of the study are as follows.

      (1) Structures of P2X in both the open and closed states are already solved, and the structural rearrangements evoked by ATP binding, from the ATP binding site to the gate, are already known in detail. The structural rearrangements detected in the extracellular region (Fig. 3) and TM region (Fig. 4) upon ATP application are just as expected. The impact and scientific merits of this part are rather limited.

      We generally agree that the cryo-EM structures clarified basic principles of receptor function. However, considering the specific features of the P2X7 receptor, its likely regulation/modulation by membrane components and environment, and the fact that the actual states of the receptor structures (e.g. facilitated or not?) are not known, we think that VCF analysis of its dynamics in a more native cellular environment is still required to confirm the predicted motions and also has the potential to identify details of "P2X7 fine tuning".

      (2) The facilitation mechanism is of high interest. The authors showed that it is intrinsic to P2X7 and associated with gating rather than ATP binding. However, this reviewer did not gain a better understanding of the actual mechanism. (a) What is the mechanistic trigger of facilitation? Possibilities are discussed, but there appears to be no clear answer supported by experimental evidence yet. (b) How is the memory of the 1st ATP application stored in the molecule, i.e. how does the P2X7 structure just before the 1st application differ from that just before the 2nd application of ATP?

      These are indeed fundamental questions but based on the available information we do not see a rational approach to address this issue any further. Additional extensive "screening" for ideal fluorophore positions would probably be required and is beyond our possibilities in the present study.

      (3) The structural rearrangement of the CaM-M13 region (Fig. 6B, C) attached at the C-terminus, caused by Ca2+ influx through P2X7 upon ATP application, is a natural consequence and not very surprising. Also, it cannot be accepted as evidence that Ca2+ is the mediator of facilitation.

      We apologize; this is a misunderstanding. We only provided protocols for parallel recordings of ANAP with other fluorophores for further analysis of downstream signaling pathways, but we did not show or propose any functional consequences of the Ca2+ influx (see also point 7 above).

      (4) As to the ballast region, the data showed its limited involvement in the ATP-induced structural rearrangements. The function of the ballast region is not clear yet. A possible involvement in GDP binding and/or metabolism is discussed, but there is no clear experimental evidence.

      We are aware of these limitations. In the absence of a clear fluorescence change around the GTP/GDP-binding site or information about its role, it is difficult to investigate its molecular function by VCF. The fact that (un-)binding of the guanosine nucleotide does not seem to be related to channel opening (McCarthy et al., 2019) further limits our options to study its function, and currently it is not even known whether GDP/GTP has just a structural role. However, we identified A564* as a potential reporter for yet undefined processes that might affect GTP/GDP binding and/or metabolism.

      Reviewer #3 (Public Review):

      This research contributes to optimizing the amber stop-codon suppression protocol for voltage-clamp fluorometry (VCF) experiments using the Xenopus oocyte heterologous expression system. By synthesizing the RNAs for the tRNA and tRNA synthetase in vitro and combining them with the dominant-negative release factor initially developed by Jason Chin's lab, L-Anap can be site-specifically incorporated into proteins by a single microinjection of a mixture of molecular components into the cytoplasm of oocytes. Although this avoids nuclear microinjection into oocytes, it adds more RNA synthesis steps. This strategy of using a dominant-negative eRF variant (eRF1-E55D) was previously applied to the Anap incorporation system using mammalian cell lines and model proteins (Gordon et al, eLife, 2018). In that 2018 paper, the percentage of full-length protein expression increased substantially with eRF1-E55D. Using oocytes in this paper, this percentage apparently did not increase significantly, as shown in Fig. 1D, in contrast to the previous paper. Nevertheless, the overall expression level was successfully increased by this method, which could facilitate macroscopic fluorescence measurements, especially considering that L-Anap is relatively dim as a fluorophore.

      Anap fluorescence change was measured mostly using its environmental sensitivity, which has limited information in interpreting structural changes. The structural mechanisms proposed could be potentially strengthened and the conclusions could be further validated by combining FRET or other distance ruler experiments with the VCF method. The engineered CaM-M13 FRET experiments mostly report the calcium entry, not measuring the rearrangements of P2X7 directly.

      We tried FRET analyses with ANAP-labeled P2X7 and mNeonGreen-labeled CaM but unfortunately, results were inconclusive.

      In addition, results of ATP dose-response relationship for channel activation correlated with ATP dose-dependent Anap fluorescence change, especially for sites showing a large percentage of ATP-induced change in fluorescence, would provide more insights regarding the allosteric mechanism of the channel.

      We agree, but unfortunately, bleaching of ANAP and the variation of background fluorescence in individual oocytes prevented such analyses.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The relevance of the LPS-induced calvarial osteolysis model is not clear. Calvaria is mostly composed of cortical bone-like structures lacking marrow space, though small marrow space exists near the suture. Osteolysis appears to occur in areas apart from where marrow is located. The authors did not show in the manuscript which cells Adipoq-Cre marks in the calvaria.

      We have shown in a recent publication that MALPs exist in the calvarial bone marrow (2). As shown in Fig. R1A, Td+ cells are present in the calvarial bone marrow, which is surrounded by a thin layer of cortical bone (Fig. R1B, blue arrows). In WT mice, after LPS injection, the normal bone structure, including suture and cortical bone, was mostly eroded and filled with inflammatory cells (green arrows). Thus, osteolysis does occur in the area where bone marrow is originally located. In contrast, calvarial bone structure was preserved in the CKO mice, demonstrating that Csf1 deficiency in MALPs suppresses LPS-induced osteolysis. We included the H&E staining data in the revised manuscript:

      "H&E staining showed that calvarial bone marrow is surrounded by a thin layer of cortical bone (Fig. 5C). After the LPS injection, normal calvarial structure, including suture and cortical bone, were mostly eroded and filled with inflammatory cells in WT mice, but unaltered in CKO mice."

      Figure R1. Calvarial bone marrow structure. (A) Representative coronal section of 1.5-month-old Adipoq/Td mouse calvaria. Bone surfaces are outlined by dashed lines. Boxed areas in the low magnification image (top) are enlarged to show periosteum (bottom left), suture (bottom middle), and bone marrow (BM, bottom right) regions. Red: Td; Blue: DAPI. Adapted from our previous publication (2). (B) H&E staining of coronal sections of WT and Csf1 CKOAdipoq mice after LPS injection. Blue arrows point to bone marrow space close to suture (indicated by *). Green arrows point to the osteolytic lesion where cortical bone was eroded and the space was filled with inflammatory cells.

      2) Although the contrast between the two Csf1 conditional deletion models (Adipoq-Cre and Prx1-Cre) is very interesting, the relationship between these two cell populations is not well described. The authors did not clarify whether MALPs are also targeted by Prx1-Cre, or whether these two cell types are from different cell lineages. "Other mesenchymal lineage cells" in the subtitle is not extremely helpful to place this finding in context.

      We thank the Reviewer for this comment. The original article constructing the Prx1-Cre mouse line demonstrates that Prx1-Cre targets all mesenchymal cells in the limb bud as early as 10.5 dpc (10). This early expression pattern ensures that all bone marrow mesenchymal lineage cells, including MALPs, are targeted by Prx1-Cre. In addition, based on our scRNA-seq data (1), Adipoq is mainly expressed in MALPs, while Prrx1 (Prx1) is highly expressed not only in MALPs but also in EMPs, IMPs, LMPs, LCPs, and OBs (Fig. R2). Thus, the fact that Prx1-Cre driven CKO mice have much more severe bone phenotypes than Adipoq-Cre driven CKO mice indicates that mesenchymal lineage cells other than MALPs also contribute Csf1 to regulate bone resorption. To avoid confusion, we changed the title and the first sentence in the Results section about Prx1 mice to the following:

      "Csf1 from mesenchymal lineage cells other than MALPs regulates bone structure.

      To explore whether Csf1 from MALPs plays a dominant role in regulating bone structure, we generated Prx1-Cre Csf1flox/flox (Csf1 CKOPrx1) mice to knock out Csf1 in all mesenchymal lineage cells in bone (10), including MALPs."

      Figure R2. Dotplot of Prrx1 and Adipoq expression in bone marrow mesenchymal lineage cells based on our scRNA-seq analysis of 1-month-old mice.

      3) The data supporting defective bone marrow hematopoiesis in Csf1 CKO mice are not particularly strong. They observed a reduction in bone marrow cellularity, but this was only associated with an expected reduction in macrophages and a mild reduction in overall HSPC populations. More in-depth analyses might be required to define mechanisms underlying reduced bone marrow cellularity in CKO mice.

      We thank the Reviewer for this constructive comment. Accordingly, we performed a thorough analysis of bone marrow hematopoietic compartments and observed significant decreases of monocytes and erythroid progenitors in CKO mice compared to WT mice. These results are now included as Fig. 6E.

      4) Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (Adipoq-Cre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not report whether Csf1 mRNA or M-Csf protein is indeed expressed by MALPs, with current evidence solely reliant on scRNAseq and qPCR data of bulk-isolated cells. More specific histological methods will be helpful to support the premise of the study.

      A pilot microCT study revealed the same femoral trabecular bone structure in WT and Adipoq-Cre Csf1flox/+ (Csf1 Het) mice at 3 months of age (Fig. R3). While the sample number for Het is low, we are confident about this conclusion.

      Figure R3. MicroCT measurement of trabecular bone structural parameters from WT and Csf1 Het mice. BV/TV: bone volume fraction; BMD: bone mineral density; Tb.N: trabecular number; Tb.Th: trabecular thickness; Tb.Sp: trabecular separation; SMI: structural model index. n=3-8 mice/group.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides evidence for a previously unknown relationship between oncogenic protein kinase A (PKA) signaling and MYC family members. Specifically, the authors have employed a combination of systems biology and biochemical assays to capture mediators of oncogenic PKA signaling in a fibrolamellar carcinoma and a melanoma cell line. This led to the identification of Aurora A and PIM kinases as potential effectors of constitutively active PKA. Aurora A and PIM kinases have been previously shown to stabilize MYC proteins. Accordingly, evidence is provided that the effects of the PKA/Aurora A and PKA/PIM axes are mediated via MYC. Collectively, these findings suggest a model whereby the effects of aberrant PKA signaling are mediated via Aurora A and PIM kinases and related feedback mechanisms that ultimately result in stabilization of MYC proteins. Importantly, PKA-driven cancer cell lines exhibited high sensitivity to Aurora A kinase inhibitors in cell culture-based assays. These findings not only provide pioneering insights into oncogenic PKA signaling, but may also have implications for developing therapeutic approaches for neoplasia that harbor constitutively active PKA.

      Strengths:

      This study addresses the role of aberrant PKA signaling in cancer, which represents a major gap in knowledge in cancer biology. Systems biology approaches and dissection of signaling networks downstream of constitutively active PKA are found to be exciting in the context of this study and likely to provide a wealth of information for future studies. Results from samples obtained from fibrolamellar carcinoma patients partially confirmed correlations observed in cell lines, which was seen as an advantage. Although orthogonal genetic validation may in some cases be warranted, pharmacological approaches using, e.g., Aurora A inhibitors hold promise for accelerated translation of the observed findings into the clinic.

      We appreciate this positive assessment of our work and are hopeful that we have solidified the significance and potential impact of our findings through additional analysis.

      Weaknesses:

      The major drawback of the study is the lack of in vivo models to validate observations garnered from the cell lines. This is particularly important considering that experiments carried out in samples from fibrolamellar carcinoma patients suggested additional Aurora A and PIM kinase-independent mechanisms of PKA-driven increase in MYC levels and likely in neoplastic growth may be implicated in vivo. In addition, it was thought that more mechanistic evidence is required for linking PKA to PIM kinase, especially because different PIM kinases were implicated in stabilization of MYC in fibrolamellar carcinoma vs. melanoma cell lines. Finally, although pharmacological approaches were appreciated, due to potential issues with the specificity of the inhibitors, it was thought that orthogonal genetic approaches are warranted to further corroborate the proposed model.

      We acknowledge the lack of in vivo treatment modeling in this manuscript. The work presented here provides motivation for these important experiments, but they remain outside the scope of this manuscript. The expansion of the manuscript in revision with new investigations into protein translation and several additional data sets creates a more complete systems biology analysis of PKA signaling and PKA-induced signaling dependencies. This expanded scope makes in vivo validation of specific treatments and treatment combinations an even larger undertaking. The text has been modified to emphasize this point. We further acknowledge the accuracy of the reviewer’s assessment of our findings on PIM2. The limited reagents to study PIM kinases made this relatively difficult to expand. We shifted the focus of the work to include assessment of PKA effects on mRNA translation as a mechanism of c-MYC regulation. We have strengthened our assessments with loss- and gain-of-function genetic and pharmacological models, which we believe will more completely answer the reviewer’s concerns.

      Reviewer #2 (Public Review):

      Protein kinase A (PKA) is often stimulated and contributes to cancer growth, yet the downstream kinase signaling cascades remain unclear. Here the authors use global phosphoproteomics and kinome activity profiling to show that not only is the RAS/MAPK pathway activated, as expected, but the authors also suggest that Aurora kinase A (AURKA) and PIM kinases are activated to stabilize the expression of MYC, a potent oncoprotein associated with poor prognosis and aggressive disease. The authors use a number of different cell lines in this study, but focus on fibrolamellar carcinoma as PKA is known to contribute to this disease.

      Strengths: It has been notoriously difficult to map kinases and their substrates as these protein-protein interactions are not always amenable to traditional biochemical techniques due to their labile nature, and kinase substrate consensus sites are often overlapping and not highly specific. Thus, the authors' pipeline to delineate such kinase cascades is quite novel and useful. They apply it here to determine PKA signaling in cancer using sophisticated computational strategies and then validate with classic molecular techniques.

      We appreciate this positive assessment of our analytical tools and the importance of understanding oncogenic PKA signaling.

      Weaknesses: The lack of mechanistic evidence linking aberrant PKA activation with regulation of MYC family members was considered to be a major weakness of the study. As it stands, it is hard to delineate whether observed changes in the levels of MYC family members are indeed a consequence of aberrant PKA signaling. It also remains unclear which MYC phosphorylation sites are implicated in the context of neoplastic PKA function and whether MYC family members are regulated at the level of protein stability or mRNA translation. Moreover, some methodological issues (e.g. using single siRNAs) were also observed. Collectively it was thought that these weaknesses should be addressed to corroborate the authors' conclusions.

      We acknowledge these concerns about our initially submitted manuscript and present extensive data that advances the manuscript in answering the key questions posed by the reviewer. We note that with the development of data showing PKA-induced phosphorylation of translation initiation components and sensitivity of c-MYC levels to eIF4A inhibition, some detailed evaluations of c-MYC phosphorylation were not undertaken, although key c-MYC mutants were tested in the course of our study and are included for reviewer interest.

    1. Author Response

      Reviewer #1 (Public Review):

      In the current study, the authors reanalyze a prior dataset testing effects of D2 antagonism on choices in a delay discounting task. While the prior report, using standard analyses, showed no effects, the current study used a DDM to examine more carefully possible effects on different subcomponents of the decision process. This approach revealed contrasting effects of D2 blockade on the effect of reward size differences and bias. Effects were uncorrelated, perhaps suggesting separate mechanisms. The authors speculate that these opposing effects explain the variability in effects across studies, since they mean that effects would depend on which of these factors is more important in a particular design. Overall the study is novel and well-executed, and the explanation offers interesting insight into neural processes.

      We thank the reviewer for judging our study as interesting and well-executed.

      Reviewer #2 (Public Review):

      The authors aim to test the hypothesis that dopamine mediates the evaluation of temporal costs in intertemporal choice in humans, with a specific goal of synthesizing the competing accounts and previous results regarding whether dopamine increases or decreases evaluation of delays in comparing differently delayed future rewards. To do this, they computationally dissect the impact of the drug amisulpride, a D2R antagonist, using a variant of a sequential sampling model, the drift-diffusion model (DDM), that is well established in the decision-making literature as a cognitive process model of choice. This model allows the dissociation of starting bias from the rate at which decision evidence is integrated ('drift'), which the authors map to different accounts of the role of dopamine: the temporal proximity of an outcome is proposed to impact bias, while the cost of a delay is proposed to impact the drift rate of evidence evaluation/accumulation. Consistent with previous results, and perhaps integrating conflicting findings, the authors find that D2R blockade impacts both bias and drift rate in a cohort of 50 participants, demonstrating that dopaminergic action at this receptor is implicated in dissociable components of intertemporal choice, with D2R block reducing the bias towards sooner, more temporally proximate rewards as well as enhancing the contrast between reward magnitudes irrespective of delay, effectively diminishing the effect of delay in the drug condition. These effects are consistent across a small subset of alternative models, confirming that the multiple cognitive mechanisms through which D2R block impacts intertemporal choice are a robust feature of decisions on this task.

      Overall, this study is a detailed dissection of the specific effects of amisulpride on a type of future-oriented, hypothetical intertemporal choice, and provides consistent evidence integrating conflicting accounts that implicate dopaminergic signaling in the evaluation of cognitive costs, such as a delay, in choice. However, the specificity of the empirical intervention and the task design limit the interpretation of the broader dopaminergic mechanisms at play in intertemporal choice, especially given the complexity of receptor specificity of this drug, dopamine precursor availability and individual differences and the specifics of the intertemporal choice in this task. As it stands, the results contribute an interesting, synthesized account of how D2R manipulation can impact evaluation of delays in multiple ways, which will likely be useful for motivating future studies and more detailed computational assessments of the cognitive process-level components of intertemporal choice more generally.

      We thank the reviewer for the positive overall evaluation of our study. We revised the manuscript according to the reviewer’s comments, addressing also the receptor specificity of amisulpride and the specifics of the administered intertemporal choice task, which further improved the quality of the manuscript.

      The focus of this study is important, and delineating the role of DA in intertemporal choice is of high relevance given DA dysfunction is prevalent in many psychiatric disorders and a key target of pharmacological treatment. While the hypotheses of the current study are framed with respect to "costs", the task used by the authors reduces these to evaluation of a hypothetical delay, one which the participants do not necessarily experience in the context of the task. In some respects this is reasonable, given the prevalence of this task paradigm in testing temporal aspects of choice in humans in an economic sense. However, humans are also notoriously subject to framing effects and the impact of instructions in cognitive tasks like these, which can limit the generality of the conclusions, and in particular the specific ways in which a delay can be interpreted as costly (e.g., cost as loss of potential earnings, cost as effortful waiting, cost as computational/simulation cost in future evaluation). Given the hypothesis recruits the idea of cost in assessing the role of dopamine, testing for generality in the effects of amisulpride in related but differently framed tasks seems critical for making this link in a general sense, and in connecting it to the previous studies in the literature the authors point to as demonstrating conflicting effects.

      We agree that it is important to discuss whether our findings for delay costs can be generalized to other cost types as well, such as risk, social costs, effort, or opportunity costs. Based on a recent literature review (Soutschek, Jetter, & Tobler, 2022), we speculate that dopamine may moderate proximity effects also for risk and social costs but not for effortful rewards, though we emphasize that these hypotheses still require more direct empirical evidence. We also discuss the issue that delays can be perceived as costly in different ways. While in some tasks participants actually experience the waiting time until reward delivery, such that delayed rewards are associated with opportunity costs, in our current task paradigm delayed rewards were virtually free of opportunity costs as participants could engage in other reward-related behaviors during the waiting time. Previous studies suggest that lower tonic dopamine levels reduce the sensitivity to opportunity costs (Niv et al., 2007), which seems in line with our finding that amisulpride decreases the influence of delays on the starting bias parameter. Nevertheless, we emphasize that further evidence is needed to decide whether dopamine shows similar effects for experienced and non-experienced waiting costs. In the revised manuscript, we discuss the cost specificity of our findings on p.22:

      “An important question refers to whether our findings for delay costs can be generalized to other types of costs as well, including risk, social costs (i.e., inequity), effort, and opportunity costs. In a recent review, we proposed that dopamine might also moderate proximity effects for reward options differing in risk and social costs, whereas the existing literature provides no evidence for a proximity advantage for effort-free over effortful rewards (Soutschek et al., 2022). However, these hypotheses need to be tested more explicitly by future investigations. Dopamine has also been ascribed a role for moderating opportunity costs, with lower tonic dopamine reducing the sensitivity to opportunity costs (Niv et al., 2007). While this appears consistent with our finding that amisulpride (under the assumption of postsynaptic effects) reduced the impact of delay on the starting bias, it is important to note that choosing delayed rewards did not involve any opportunity costs in our paradigm, given that participants could pursue other rewards during the waiting time. Thus, it needs to be clarified whether our findings for delayed rewards without experienced waiting time can be generalized to choice situations involving experienced opportunity costs.”

      Further, while the study aims to test the actions of dopamine broadly, the empirical manipulation is limited to the action of amisulpride, a D2R antagonist. There is little to no discussion of, or control for, the relationship between dopaminergic action at D2 receptors (the site of amisulpride effects) and wider mechanisms of dopaminergic action at other sites, e.g., D1-like receptors, and the interplay between activation at these two receptor types alongside baseline levels of dopamine concentration. This is necessary for a comprehensive account of dopamine effects on intertemporal choice as the authors aim to test, as opposed to a specific test of the role of the D2 receptor, which is what the study achieves. On a related note, in some preparations at least, amisulpride also acts at some of the 5-HT receptors, raising the possibility of a non-dopaminergic mechanism by which this drug might impact intertemporal decisions. This possibility, while it would not be expected to act without dopaminergic effects as well, is consistent with established effects of serotonin on waiting behaviors and patience. Granted, the limits of pharmacology in humans mean this cannot necessarily be controlled for, but it should be kept in mind with a systemic manipulation such as this.

      We agree with the reviewer that it is important to distinguish between the contributions of D1 and D2 receptors to decision making, given that these receptor families are hypothesized to have dissociable functional roles. We therefore also re-analyzed data on the impact of a D1 agonist on intertemporal decision making (previous findings for this data set were published in Soutschek et al., 2020, Biological Psychiatry). This analysis provided no evidence for significant effects of D1R stimulation on parameters from a drift diffusion model. This suggests that D2R, rather than D1R, activation mediates the impact of proximity on intertemporal choices.

      In the revised manuscript, we report the findings for the D1 agonist study on p.16:

      “To assess the receptor specificity of our findings, we conducted the same analyses on the data from a study (published previously in Soutschek et al. (2020)) testing the impact of three doses of a D1 agonist (6 mg, 15 mg, 30 mg) relative to placebo on intertemporal choices (between-subject design). In the intertemporal choice task used in this experiment, the SS reward was always immediately available (delay = 0), contrary to the task in the D2 experiment where the delay of the SS reward varied from 0-30 days. Again, the data in the D1 experiment were best explained by DDM-1 (DIC_DDM-1 = 19,657) compared with all other DDMs (DIC_DDM-2 = 20,934; DIC_DDM-3 = 21,710; DIC_DDM-5 = 21,982; DIC_DDM-6 = 19,660; note that DDM-4 was identical with DDM-1 for the D1 agonist study because the delay of the SS reward was 0). Neither the best-fitting nor any other model yielded significant drug effects on any drift diffusion parameter (see Table 4 for the best-fitting model). Also model-free analyses conducted in the same way as for the D2 antagonist study revealed no significant drug effects (all HDI_95% included zero). There was thus no evidence for any influence of D1R stimulation on intertemporal decisions.”

      We discuss the specificity of D2 receptors for moderating the proximity bias on p.17: “This finding represents first evidence for the hypothesis that tonic dopamine moderates the impact of proximity (e.g., more concrete versus more abstract rewards) on cost-benefit decision making (Soutschek et al., 2022; Westbrook & Frank, 2018). Pharmacological manipulation of D1R activation, in contrast, showed no significant effects on the decision process. This provides evidence for the receptor specificity of dopamine’s role in intertemporal decision making (though as caveat it is worth keeping the differences between the tasks administered in the D1 and the D2 studies in mind).”

      We also agree that amisulpride acts on 5-HT7 receptors as well, such that it remains unclear whether such effects also contribute to the observed result pattern. We discuss this limitation in the revised manuscript on p.21:

      “Lastly, while the actions of amisulpride on D2/D3 receptors are relatively selective, it also affects serotonergic 5-HT7 receptors (Abbas et al., 2009). Because serotonin was related to impulsive behavior (Mori, Tsutsui-Kimura, Mimura, & Tanaka, 2018), it is worth keeping in mind that amisulpride effects on serotonergic, in addition to dopaminergic, activity might contribute to the observed result pattern.”

      Overall the modeling methods are robust and appropriate for the specific test of decision impacts of D2R blockade, and include several prima facie viable alternative models for comparison. Some caution is warranted, since there are not many trials per subject, and some trials are discarded as well as outliers, which raises the question of power. Given the models are fit hierarchically, which gives both group-level and individual-level parameter estimates, the elements are there to probe more deeply into individual differences, and to test how reliably this approach can dissociate the dual effects of bias and drift rate at the individual level, and perhaps correlate it with other informative subject measures of either dopamine activity/capacity or other dopamine-dependent behaviors. Alternative DDMs might also capture some of this individual variation, with meaningful differences potentially in model comparison at the individual level. It should be noted that the scope of these models does not exhaust the ways in which proximity (here, temporal) of rewards and contrast between choice options might be incorporated into a cognitive process model account of choice; all alternatives here rest on the same implicit 2-alternative forced choice assumption of the DDM, and the assumptions of this model are not here tested against other accounts of choice, for example the linear ballistic accumulator (LBA) and its derivatives. Further, the concept of proximity as a global feature of a trial (on average, how soon are these options overall?) is never tested on my read of the alternative models.

      We thank the reviewer for these interesting suggestions. First, to explore whether measures of dopaminergic activity correlate with individual differences in drug effects on DDM parameters, we now report correlations between DDM parameters and performance in the digit span backward task as proxy for dopamine synthesis capacity (Cools et al., 2008). None of these correlation analyses showed significant results. In the revised manuscript, we report these analyses on p.13:

      “However, we observed no evidence that individual random coefficients for the drug effects on the drift rate or on the starting bias correlated with body weight, all r < 0.22, all p > 0.10. There were also no significant correlations between DDM parameters and performance in the digit span backward task as proxy for baseline dopamine synthesis capacity (Cools, Gibbs, Miyakawa, Jagust, & D'Esposito, 2008), all r < 0.17, all p > 0.22. There was thus no evidence that pharmacological effects on intertemporal choices depended on body weight as proxy of effective dose or working memory performance as proxy for baseline dopaminergic activity.”

      Regarding model comparisons on the individual level, we note that the hierarchical Bayesian modelling approach allows (to the best of our knowledge) computing indices of model fit like DIC only on the group, not the individual level (while accounting for individual differences). However, we agree with the reviewer that theoretically different models might work best in different individuals (depending, for example, on the individual sensitivity to proximity). While such fine-grained model comparisons on the individual level are beyond the scope of the current study (and might not yield robust results given the limited number of trials for each participant), we now discuss this limitation in the revised manuscript (p.17-18):

      “We note that the hierarchical modelling approach allowed us to compare models on the group level only, such that in some individuals behavior might better be explained by a different model than DDM-1. Such model comparisons on the individual level, however, were beyond the scope of the current study and might not yield robust results given the limited number of trials per individual.”

      Likewise, linear ballistic accumulator (LBA) models represent a further class of process models with different assumptions on the mechanisms underlying the choice process than DDMs. In LBAs, evidence is accumulated separately for each choice alternative, whereas DDMs assume only one accumulation process which integrates attributes from two choice options, limiting the use of DDMs to two-alternative forced-choice scenarios. Nevertheless, proximity effects might be incorporated also in LBA models via modulating the starting point of the option-specific accumulators as a function of proximity. To the best of our knowledge, there is no built-in function in JAGS that allows estimating LBA models in a hierarchical Bayesian fashion (in contrast to, e.g., Stan), such that in the context of the current study it is difficult to directly compare our DDM-based approach with LBA models. It is important to emphasize, however, that similar to other studies we do not make any claims about whether the choice process per se is best explained by DDMs or LBA models; instead, we focus on how rewards and delay costs affect different components of the decision process within a class of decision models. Nevertheless, we discuss such alternative modelling approaches in the revised manuscript on p.18:

      “We also emphasize that alternative process models like the linear ballistic accumulator (LBA) model make different assumptions than DDMs, for example by positing the existence of separate option-specific accumulators rather than only one as assumed by DDMs. However, proximity effects as investigated in the current study might be incorporated in LBA models as well by varying the starting points of the accumulators as function of proximity.”

      Lastly, we thank the reviewer for the interesting suggestion to assess whether the starting bias parameter is affected by the overall proximity of offers (sum of delays) instead of by the difference in proximity between the options. We ran a further DDM to test this hypothesis, but this model explained the data worse (DIC = 9,492) than the original DDM (DIC = 9,478). Nevertheless, also the overall proximity DDM yielded a significant amisulpride effect on the impact of reward magnitude on the drift rate, HDI_mean = 0.83, HDI_95% = [0.04; 1.75], underlining the robustness of this effect. In the revised manuscript, we report this analysis on p.12:

      “In a further model (DDM-4), we explored whether the starting bias is affected by the overall proximity of the options (sum of delays, Delay_sum) rather than the difference in proximity (Delay_diff; see Table 3 for an overview over the parameters included in the various models). Importantly, our original DDM-1 (DIC = 9,478) explained the data better than DDM-2 (DIC = 9,481), DDM-3 (DIC = 10,224), or DDM-4 (DIC = 9,492). Nevertheless, amisulpride moderated the impact of Magnitude_diff on the drift rate also in DDM-2, HDI_mean = 0.86, HDI_95% = [0.18; 1.64], and DDM-4, HDI_mean = 0.83, HDI_95% = [0.04; 1.75], and amisulpride also lowered the impact of Delay_diff on the starting bias in DDM-3, HDI_mean = -0.02, HDI_95% = [-0.04; -0.001]. Thus, the dopaminergic effects on these subcomponents of the choice process are robust to the exact specification of the DDM.”
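
      The model comparison quoted above can be made explicit with a small sketch. It only ranks the DIC values reported in this response (lower DIC indicates better fit) and expresses each as a difference from the best model; the function name `rank_by_dic` is hypothetical, and nothing here reproduces the authors' JAGS fitting pipeline.

```python
# DIC values for the D2 antagonist study, copied from the response above.
# Lower DIC indicates a better trade-off between fit and complexity.
dic = {"DDM-1": 9478, "DDM-2": 9481, "DDM-3": 10224, "DDM-4": 9492}


def rank_by_dic(dic_values):
    """Return (model, delta DIC) pairs sorted from best to worst.

    Delta DIC is the difference from the lowest DIC in the set.
    (Hypothetical helper for illustration only.)
    """
    best = min(dic_values.values())
    deltas = ((name, value - best) for name, value in dic_values.items())
    return sorted(deltas, key=lambda pair: pair[1])


ranking = rank_by_dic(dic)
for name, delta in ranking:
    print(f"{name}: delta DIC = {delta}")
```

      Read this way, DDM-1 leads DDM-2 by only 3 DIC points and DDM-4 by 14, whereas DDM-3 trails by 746.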

      Reviewer #3 (Public Review):

      Soutschek and Tobler provide an intriguing re-analysis of inter-temporal choice data on amisulpride versus placebo which provides evidence for an as-yet untested hypothesis that dopamine interacts with proximity to bias choices.

      The modeling methods are sound with a robust and reasonably exhaustive set of models for comparison, with good posterior predictive checks at the single subject level, and decent evidence of parameter recoverability. Importantly, they show that while there is no main effect of drug on the proportion of larger, later (LL) versus smaller, sooner (SS) choices, this obscures conflicting-directional effects on drift rate versus starting point bias which are under-the-hood, yet anticipated by the hypothesis of interest.

      We thank the reviewer for judging our findings as intriguing and the modelling approach as robust and convincing.

      While I have no major concerns about methodology, I think the Authors should consider an alternative interpretation - albeit an interpretation which would actually support the hypothesis in question more directly than their current interpretation. Namely, the Authors should re-consider the possibility that amisulpride's effects are mediated primarily by acting at pre-synaptic receptors. If the D2R antagonist were to act pre-synaptically, it would drive more versus less post-synaptic dopamine signaling.

      There are multiple reasons for this inference. First, the Authors observe that the drug increases sensitivity to differences in the relative offer amounts (in terms of effects on the drift rate). With respect to the canonical model of dopamine signaling in the direct versus indirect pathway, greater post-synaptic signaling should amplify sensitivity to reward benefits - which is what the Authors observe.

      Second, the Authors also observe an effect on the starting bias which may also be consistent with an increase in post-synaptic dopamine signaling. Note that according to the Westbrook & Frank hypothesis, a proximity bias in delay discounting should favor the SS over the LL reward, yet the Authors primarily observe a starting bias in the direction of the LL reward. This contradiction can be resolved with the ancillary assumption that, independent of any choice attribute, participants are on average predisposed to select the LL option. Indeed, the Authors observe a reliable non-zero intercept in their logistic regression model indicating that participants selected the LL more often, on average. As such, the estimated starting point may reflect a combination of a heightened predisposition to select the LL option, opposed by a proximity bias towards the sooner option. Perhaps the estimated DDM starting point is positive because the predisposition to select the LL option has a larger effect on choices than the proximity bias towards sooner rewards does in this data set. To the extent that amisulpride increases post-synaptic dopamine signaling (by antagonizing pre-synaptic D2Rs) it should amplify the proximity bias arising from the differences in delay, shifting the starting bias towards the SS option. Indeed, this is also what the Authors observe.

      Note that it remains unclear why an increase in post-synaptic dopamine signaling would amplify one kind of proximity bias (towards sooner over later rewards) without amplifying the other (towards a predisposition to select the LL option). Perhaps the cognitive / psychological nature of the sooner bias is more amenable to interacting with dopamine signaling than the latter. Or maybe proximity bias effects are most sensitive to dopamine signaling when they are smaller, and the LL predisposition bias is already at ceiling in the context of this task. These assumptions would help explain why a potential increase in post-synaptic dopamine signaling both amplified the proximity effect of delay when it was smallest (when the differences in delay were smaller), and also failed to amplify the predisposition to select the LL option (which may already be maxed out). More importantly, the assumption that there are opposing proximity biases would also help explain why there is a negative effect of delay magnitude on the estimated starting point on placebo. Namely - as the delay gets larger, the psychological proximity of sooner over later rewards grows, counteracting the proximity bias arising from choice predisposition / repetition.

      We thank the reviewer for suggesting this alternative interpretation of our data. We agree that the administered dose of 400 mg amisulpride can show both postsynaptic (reducing D2R activation) and presynaptic effects (enhancing D2R activation), which in many studies makes it difficult to decide whether the observed behavioral effects are caused by presynaptic or postsynaptic mechanisms.

      The reviewer suggests that the observed stronger influence of reward magnitudes on drift rates under amisulpride compared with placebo speaks in favor of presynaptic effects, because according to theoretical accounts higher dopamine levels should increase reward seeking (e.g., Frank & O’Reilly, 2006). On the other hand, Figure 2C suggests that amisulpride (compared with placebo) increased the preference only for relatively high, above-average rewards. If the difference between reward magnitudes was below average, amisulpride reduced rather than increased the preference for the larger reward. In our view, this is consistent with the hypothesis that D2R activation implements a cost control, with higher D2R activation increasing the attractiveness of costly rewards and lower D2R activation reducing it. In other words, under low dopamine levels individuals should decide for the costlier reward only if the magnitude of the costlier reward is sufficiently large compared with the lower, less costly reward. In fact, this is exactly what we find in our data according to Figure 2C. In our view, the amisulpride effect on drift rates is thus compatible with both presynaptic and postsynaptic mechanisms of action, depending on the underlying conceptual account of dopamine, as we now discuss in the revised manuscript.

      According to the reviewer, the observed influence of amisulpride on the starting bias also speaks in favor of increased rather than reduced dopamine levels. We agree with the reviewer that the result pattern for the starting bias is somewhat complex and seems to combine the effects of two different biases: a general tendency to choose LL over SS rewards (intercept of starting bias where the difference in delays is close to zero), and a shift towards the SS option under placebo if one option has a strong (temporal) proximity advantage over the other. Amisulpride shows opposite effects on the two different biases, as it shifts the intercept of the starting bias further away from the LL option but also reduces the proximity advantage of the SS over the LL reward for larger differences in delay. The reviewer writes that “To the extent that amisulpride increases post-synaptic dopamine signaling (by antagonizing pre-synaptic D2Rs) it should amplify the proximity bias arising from the differences in delay, shifting the starting bias towards the SS option. Indeed, this is also what the Authors observe.” In contrast to that statement, in our study amisulpride reduced rather than increased the starting bias arising from delay (as in Figure 2K the regression line is flatter under amisulpride compared with placebo, despite the differences regarding the intercept). We believe that the amisulpride effects on both the intercept and the delay-dependent slope can be explained via postsynaptic effects: First, the shift of the intercept of the starting bias (small differences in proximity) from the LL towards the SS option under amisulpride is consistent with the assumption that lower dopamine reduces the preference for larger rewards (e.g., Beeler & Mourra, 2018; Salamone & Correa, 2012). Second, the finding that amisulpride weakens the proximity advantage of SS over LL rewards (delay-dependent slope) is consistent with the proximity account by Westbrook & Frank (2018) according to which lower tonic dopamine should reduce proximity effects. Thus, if we assume that the result pattern for the starting bias parameter is driven by dopaminergic effects on two separate decision biases (as suggested by the reviewer), we believe that both effects can better be explained by pharmacologically reduced rather than increased dopamine levels.

      In the revised manuscript, we extensively discuss the question as to whether the observed drug effects are caused by postsynaptic versus presynaptic effects. We clarify that the amisulpride effect on drift rates seems consistent with both presynaptic and postsynaptic effects (depending on the underlying conceptual account). We moreover discuss that the starting bias effects may reflect the interaction between two different bias types, and the drug effects on both bias types can more easily be reconciled with postsynaptic than presynaptic effects. On balance, we believe that the observed effects are more likely to reflect lower as compared to higher dopamine levels, but the extended discussion of this issue gives all readers the opportunity to weigh the arguments for and against these alternatives. If the reviewer should not agree with some aspects of our argumentation as outlined above, we would of course be happy to modify the discussion according to the reviewer’s advice.

      In the revised manuscript, we modified the discussion of presynaptic versus postsynaptic effects as follows (p.20-21):

      “While higher doses of amisulpride (as administered in the current study) antagonize post-synaptic D2Rs, lower doses (50-300 mg) were found to primarily block pre-synaptic dopamine receptors (Schoemaker et al., 1997), which may result in amplified phasic dopamine release and thus increased sensitivity to benefits (Frank & O'Reilly, 2006). At first glance, the stronger influence of differences in reward magnitude on drift rates under amisulpride compared with placebo might therefore speak in favor of presynaptic (higher dopamine levels) rather than postsynaptic mechanisms of action in the current study. On the other hand, one could argue that amisulpride reduced the preference for the LL reward if the gain from the costlier LL option compared with the SS option was small (as suggested by Figure 2C), which is consistent with the cost control hypothesis of dopamine (Beeler & Mourra, 2018). The impact of amisulpride on the drift rate thus appears ambiguous regarding the question of pre- versus postsynaptic effects. The result pattern for the starting bias parameter, in turn, suggests the presence of two distinct response biases, reflected by the intercept and the delay-dependent slope of the bias parameter (see Figure 2K), which are both under dopaminergic control but in opposite directions. First, participants seem to have a general bias towards the LL option in the current task (intercept), which is reduced under amisulpride compared with placebo, consistent with the assumption that dopamine strengthens the preference for larger rewards (Beeler & Mourra, 2018; Salamone & Correa, 2012; Schultz, 2015). Second, amisulpride reduced the proximity advantage of SS over LL rewards with increasing differences in delay, as predicted by the proximity account of tonic dopamine (Westbrook & Frank, 2018). On balance, the current results thus appear more likely under the assumption of postsynaptic rather than presynaptic effects. Unfortunately, the lack of a significant amisulpride effect on decision times (which should be reduced or increased as consequence of presynaptic or postsynaptic effects, respectively) sheds no additional light on the issue.”

      Regardless of the final interpretation, showing that pharmacological intervention into striatal dopamine signaling can simultaneously modify a starting point bias and drift rate (in opposite directions - thus having systematic effects on choice biases without altering the average proportion of LL choices) provides crucial first evidence for the hypothesis that dopamine and proximity interact to influence decision-making. These results thereby enrich our understanding of the neuromodulatory mechanisms influencing inter-temporal choice, and take an important step towards resolving prior contradictions in this literature. They also have implications for how striatal dopamine might impact decision-making in diverse domains of impulsivity beyond inter-temporal choice, ranging from cognitive neuroscience (e.g. in numerous cognitive control tasks) to psychiatry (treating diverse disorders of impulse control).

      We thank the reviewer for highlighting the importance of the current findings for understanding dopamine’s role in decision making.

    1. Author Response

      Reviewer #1 (Public Review):

      Liau and colleagues have previously reported an approach that uses PAM-saturating CRISPR screens to identify mechanisms of resistance to active site enzyme inhibitors, allosteric inhibitors, and molecular glue degraders. Here, Ngan et al report a PAM-saturating CRISPR screen for resistance to the hypomethylating agent, decitabine, and focus on putatively allosteric regulatory sites. Integrating multiple computational approaches, they validate known - and discover new - mechanisms that increase DNMT1 activity. The work described is of the typical high quality expected from this outstanding group of scientists, but I find several claims to be slightly overreaching.

      Major points:

      The paper is presented as a new method - activity-based CRISPR scanning - to identify allosteric regulatory sites using DNMT1 as a proof-of-concept. Methodologically, the key differentiating feature from past work is that the inhibitor being used is an activity-based substrate analog inhibitor that forms a covalent adduct with the enzyme. I find the argument that this represents a new method for identifying allosteric sites to be relatively unconvincing and I would have preferred more follow-up of the compelling screening hits instead. The basic biology of DNMT1 and the translational relevance of decitabine resistance are undoubtedly of interest to researchers in diverse fields. In contrast, I am unconvinced that there is any qualitative or quantitative difference in the insights that can be derived from "activity-based CRISPR scanning" (using an activity-based inhibitor) compared to their standard "CRISPR suppressor scanning" (not using an activity-based inhibitor). Key to their argument, which is expanded upon at length in the manuscript, is that decitabine - being an activity-based inhibitor that only differs from the substrate by 2 atoms - will enrich for mutations in allosteric sites versus orthosteric sites because it will be more difficult to find mutations that selectively impact analog binding than it is for other active-site inhibitors. However, other work from this group clearly shows that non-activity-based allosteric and orthosteric inhibitors can just as easily identify resistance mutations in allosteric sites distal from the active site of an enzyme (https://www.biorxiv.org/content/10.1101/2022.04.04.486977v1). If the authors had compared their decitabine screen to a reversible DNMT1 inhibitor, such as GSK3685032, and found that decitabine was uniquely able to identify resistance mutations in allosteric sites, then I would be convinced. But with the data currently available, I see no reason to conclude that "activity-based CRISPR scanning" biases for different functional outcomes compared to the "CRISPR suppressor scanning" approach.

      We appreciate the reviewer’s comments and thank them for their constructive feedback. We agree with the reviewer that our claims regarding the utility of activity-based CRISPR scanning would be more strongly supported with a head-to-head comparison against a non-covalent, reversible inhibitor. To address this point, we conducted a CRISPR scanning experiment on DNMT1 and UHRF1 using GSK3484862 (GSKi), which is shown in Fig. 1e–h. We observed that the top enriched sgRNA under GSKi treatment targets H1507, which directly interacts with the drug and contributes to compound binding (Fig. 1e,h, Supplementary Fig. 1e). Our results are consistent with previous structural and biochemical studies of these inhibitors (reported in Pappalardi, M.B. et al., Nat. Cancer 2021), which demonstrate that the H1507Y mutation reduces GSK3685032 (a derivative of GSK3484862) inhibition of DNMT1 by >350-fold compared to wild-type DNMT1. By contrast, the top enriched sgRNA under decitabine (DAC) treatment targets D702 in the autoinhibitory linker region (Fig. 1c). Furthermore, comparison of sgRNA resistance scores across DAC and GSKi treatment conditions reveals highly distinct sgRNA enrichment profiles (Fig. 1g). Taken together, our data suggest that these two mechanistic classes of inhibitors may exert differential selective pressures that lead to unique enrichment profiles.

      While we consider these data to strengthen our claim that activity-based CRISPR scanning can preferentially enrich for mutations in allosteric sites versus orthosteric sites, we also recognize that allosteric site mutations can be identified without the use of activity-based inhibitors, as the reviewer points out. To address this point, we have modified the text to suggest that the use of activity-based inhibitors may exert a greater bias for the enrichment of allosteric site mutations, while clarifying that the enrichment of such mutations is not exclusive to the use of activity-based inhibitors.

      How can LOF mutations from cluster 2 be leading to drug resistance? It is speculated in the paper that a change in gene dosage decreases the DNA crosslinks that cause toxicity. However, the immediate question then would be why do the resistance mutations cluster around the catalytic site? If it's just gene dosage from LOF editing outcomes, would you not expect the effect to occur more or less equally across the entire CDS?

      This is an excellent point. As outlined above, we recognize that our gene dosage hypothesis regarding the mechanism of cluster 2 sgRNAs may lack sufficient explanation to convey our reasoning clearly, and we have added more text and data to clarify and support our claim.

      Mutations that are highly likely to lead to a nonfunctional protein product (i.e., frameshift, nonsense, splice site disrupting) are annotated as “loss-of-function” (LOF) in the text, with all other protein coding mutations designated as “in-frame.” The key insight underlying our gene dosage hypothesis is that sgRNAs targeting essential protein regions and functional domains generate greater proportions of null (i.e., knockout) mutations and undergo stronger negative selection compared to sgRNAs targeting non-essential protein regions (see Shi, J. et al., Nat. Biotechnol. 2015). This is because in-frame coding mutations in protein regions that are functionally important (e.g., DNMT1 catalytic domain) are more likely to disrupt protein function than those in non-essential protein regions. As a result, sgRNAs targeting functional protein regions are more likely to generate in-frame mutations resulting in a null allele and are thus “effectively LOF.” Importantly, the observation that sgRNAs targeting specific protein regions are more likely to lead to null mutations also implies that 1. not all CDS-targeting sgRNAs are equivalent at inducing LOF effects and 2. sgRNAs that are more effective at generating null mutations may exhibit preferential clustering within functionally important protein regions.
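      The allele annotation described above can be sketched as a simple classification rule. The snippet below is an illustrative simplification, not the authors' genotyping pipeline: the function names are hypothetical, frameshifts are detected only from the net indel length, nonsense and splice-site effects are assumed to be supplied as precomputed flags, and the read counts in the usage note are made up.

```python
def classify_allele(net_length_change, introduces_stop=False,
                    disrupts_splice_site=False):
    """Label an edited allele as 'LOF' or 'in-frame'.

    net_length_change: inserted minus deleted bases within the coding sequence.
    Frameshift, nonsense, and splice-site-disrupting alleles are 'LOF';
    all other coding mutations are 'in-frame'.
    """
    if introduces_stop or disrupts_splice_site:
        return "LOF"
    if net_length_change % 3 != 0:
        return "LOF"  # a net change not divisible by 3 shifts the reading frame
    return "in-frame"


def in_frame_fraction(alleles):
    """Fraction of reads carrying in-frame alleles.

    alleles: iterable of (net_length_change, read_count) pairs for mutant
    alleles without nonsense or splice-site effects.
    """
    total = sum(count for _, count in alleles)
    in_frame = sum(count for change, count in alleles
                   if classify_allele(change) == "in-frame")
    return in_frame / total
```

      For example, with hypothetical read counts, `in_frame_fraction([(-3, 60), (-1, 30), (2, 10)])` evaluates to 0.6; tracking this fraction over time for each sgRNA is one way to express the depletion or enrichment of in-frame alleles discussed here.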

      In this context, we reasoned that cluster 2 sgRNAs, which target the essential catalytic domain, may be more effective at reducing DNMT1 gene dosage than other DNMT1-targeting sgRNAs because in-frame mutations generated by these sgRNAs are more likely to lead to nonfunctional DNMT1 protein. That is, cluster 2 sgRNAs may generate greater proportions of “effectively LOF” in-frame mutations that disrupt DNMT1’s essential function. Consequently, we posited that the observed clustering of these sgRNAs in the catalytic domain is likely a reflection of its functional importance. To test this idea, we transduced WT K562 cells with 6 individual sgRNAs targeting the N-terminus, RFTS domain, and catalytic domain of DNMT1, and performed genotyping on the cellular pools over 28 days (Fig. 4f). We observed that sgRNAs targeting outside of the catalytic domain exhibited increasing frequencies of in-frame mutations over time, consistent with the idea that these sgRNAs generate functional in-frame mutations that are not under strong negative selection. By contrast, catalytic-targeting sgRNAs exhibited significant depletion of in-frame mutations over time, supporting the notion that in-frame mutations in essential regions are functional knockouts and thus negatively selected under normal growth conditions. The ability of catalytic-targeting sgRNAs to generate greater proportions of null mutations would therefore make them more effective at conferring resistance through gene dosage reduction than other DNMT1-targeting sgRNAs.

      Our hypothesis implies that a large proportion of in-frame mutations generated by cluster 2 sgRNAs are functionally equivalent to LOF mutations (i.e., frameshift, nonsense, splice site disruption), and therefore neither in-frame nor LOF mutations should be preferentially selected for under DAC treatment, in contrast to the positive selection of gain-of-function (GOF) in-frame mutations in cluster 1 sgRNAs. Consistent with this idea, our data indicate that the relative proportions of in-frame and LOF mutations in cluster 2 sgRNAs remain comparable across vehicle and DAC treatments (Fig. 4b). Furthermore, since the selective pressure on in-frame and LOF mutations should be similar if they are functionally equivalent, the relative proportions of in-frame versus LOF mutations in cluster 2 sgRNAs should be primarily dictated by their frequencies as editing outcomes. Consistent with this idea, the observed proportions of in-frame versus LOF mutations in cluster 2 sgRNAs under DAC treatment do not deviate significantly from their expected proportions as predicted by inDelphi (Supplementary Fig. 4c). Conversely, cluster 1 sgRNAs exhibit greater ratios of in-frame versus LOF mutations under DAC treatment than their predicted ratios from inDelphi (Supplementary Fig. 4c,d). Altogether, these data are consistent with the notion that cluster 2 sgRNAs may operate through a gene dosage reduction effect.

      In general, I found the screens, and integrative analyses, highly compelling. But the follow-up was rather narrow. For example, how much do these mutations shift the IC50 curves for DAC?

      To address this point, we derived two clonal cell lines from the screen harboring endogenous DNMT1 mutations in either the autoinhibitory linker or the RFTS domain (Supplementary Fig. 3g). We treated these cell lines, in addition to WT K562 cells, with varying concentrations of DAC and observed a partial growth rescue in the mutant cell lines relative to WT K562 cells (Fig. 3i). We also show that these mutant cell lines exhibit DAC-mediated degradation of DNMT1, consistent with our fluorescent reporter results (Supplementary Fig. 3h). To further validate whether these endogenous DNMT1 mutations confer partial resistance to DAC, we transduced WT K562 cells with vectors encoding an shRNA targeting the 3' UTR of the endogenous DNMT1 transcript and a DNMT1 overexpression vector encoding WT and mutant DNMT1 constructs (Supplementary Fig. 3i). Upon treating these knockdown and overexpression cells with varying concentrations of DAC, we again observed a partial growth rescue in the presence of mutant versus WT DNMT1 (Fig. 3j).

      What kinetic parameters have changed to increase catalytic activity?

      We performed enzyme activity assays at various temperatures with recombinant DNMT1 protein for WT and mutant DNMT1 constructs, observing that mutant DNMT1 constructs exhibit varying degrees of overactivity relative to WT DNMT1 at different temperatures (Fig. 3h, Supplementary Fig. 4f). Whereas the autoinhibitory linker mutations display consistently higher levels of activity relative to WT DNMT1 at all temperatures tested, we observed that RFTS and CXXC mutants exhibited decreasing levels of overactivity with increasing temperature (Fig. 3h). Previous studies (see Berkyurek, A.C. et al., J. Biol. Chem. 2014) have observed similar behavior with RFTS mutations, suggesting that these mutations may disrupt critical hydrogen bonds at the autoinhibitory interface that reduce the activation energy required to release DNMT1 from an autoinhibited to active conformation. Our RFTS and CXXC mutations exhibit behavior that is consistent with this hypothesis, which may explain the decreasing levels of overactivity with increasing temperature.

      Do the mutants with increased catalytic activity alter the abundance of methylated DNA (naively or in response to the drug)? It is speculated that several UHRF1 sgRNAs disrupt PPIs and not DNA binding, but this is never tested.

      While we derived clonal cell lines containing DNMT1 mutations, as noted above, it proved too difficult to compare these drug-resistant cells to naïve cells because they were cultured in the presence of DAC for 2 months, leading to large changes in DNA methylation that may confound any conclusions about the effects of the mutations alone. Additionally, the reviewer brings up valid limitations regarding our studies on UHRF1, which proved very difficult to purify biochemically and was beyond our expertise. After some initial studies, we chose not to pursue these additional experiments further but instead prioritized the GSKi CRISPR-suppressor scan and cluster 2 studies, as suggested by the reviewers. We acknowledge these limitations in the text.

      Reviewer #2 (Public Review):

      In this manuscript, Ngan and coworkers described a CRISPR-based screening approach to identify potential variants of DNMT1 and UHRF1 that can suppress the anti-proliferation role of decitabine. In theory, such an effect can be achieved by at least two types of gain-of-activity DNMT1/UHRF1 mutants by directly boosting the enzymatic activity or by indirectly abolishing the intrinsic inhibitory activity of the DNMT1-UHRF1 axis. Through systematically targeting the DNMT1-UHRF1 reading frames with a rationally designed sgRNA library, the authors identified and characterized a few potential hotspots within multiple autoinhibitory motifs. While the approach has its merits in regard to the unbiased screening of the target proteins in living cells, there are the following serious concerns in terms of how the data were interpreted and the limitation of the approach itself as detailed below.

      (1) Although the authors identified multiple hotspots in the DNMT1-UHRF1 complex with their alterations associated with the resistance to decitabine, it is risky to argue these mutations increase DNMT1 activity simply because they are clustered within known auto-inhibitory regions. There are many alternative explanations for this observation. For instance, some mutants may allosterically alter how DNMT1 recognizes decitabine-containing vs native CpG motifs; others may recruit other proteins as modulators. The key gap here is to associate the decitabine-resistance phenotype to the loss of auto-inhibitory functions because multiple hotspots were in the auto-inhibitory regions.

      In our original manuscript, we supported our claim that gain-of-function DNMT1 mutations enhance DNMT1 activity with experimental data using purified DNMT1 protein constructs in enzyme activity assays (Fig. 3g, Fig. 4g), so our conclusion was not solely inferred from sgRNA clustering at the autoinhibitory interface, but also experimentally validated. In our revised manuscript, we provide additional experimental biochemical characterization to further support the claim that autoinhibition is weakened in the DNMT1 mutants we identified (Fig. 3h, Supplementary Fig. 4f). Moreover, we provide cellular data using clonal cell lines harboring endogenous DNMT1 mutations in addition to knockdown/overexpression experiments, demonstrating that RFTS and autoinhibitory linker mutations confer partial growth rescue to DAC treatment (Fig. 3i,j). We agree that we cannot rule out the possibility that these mutations may exert other effects that independently contribute to the observed resistance phenotype (e.g., altered CpG recognition), and we have added a statement acknowledging this limitation.

      (2) Lack of general biological relevance of the corresponding findings. Through this work, the author identified multiple DNMT1-UHRF1 variants that alter the anti-proliferation role of decitabine. However, the observation that the multiple mutants were clustered in a hotspot doesn't mean that these mutants have to act via the same mechanism. The authors seem to underestimate the complexity of how these mutants can render the same biological readouts and even haven't considered the possibility of transcriptional modulation of antagonists or agonists in the DNMT1-UHRF1. Therefore, the biological relevance of these findings remains unclear.

      We agree that although the cluster 1 mutations share a common property of increased DNMT1 activity, this does not preclude alternative mechanisms. Indeed, it is likely that these mutations have complex and nuanced mechanistic differences in the biochemical alterations underlying their observed increases in DNMT1 activity. For example, we have included enzyme activity data suggesting that autoinhibitory linker mutations may exhibit a different biochemical basis for increased DNMT1 activity than RFTS and CXXC mutations. That said, we did not intend to make broader claims regarding biological relevance and were instead focused on conveying that this activity-based methodology can identify gain-of-function mutations, which we directly support with experimental data. To clarify these points, we have adapted the text to more precisely convey our intended claims and have acknowledged that other complex mechanisms may also be involved.

      (3) Collectively for reasons (1) and (2), the mechanistic analysis seems only to associate the current findings with known regulatory pathways. Without detailed in vitro and in-cell characterization of the DNMT1-UHRF1 mutants, the novel regulatory mechanisms, which may exist, could be largely missed.

      We have added some additional characterization of these mutations in the revised manuscript, as detailed above, and we would like to note that we identified new sites in DNMT1 and UHRF1 that may be functional based on our allele analysis. However, since this manuscript is intended primarily as a methodological report, we believe that extensively exploring novel regulatory mechanisms is beyond its scope.

      (4) The current CRISPR-based screening approach has the technical limitation of mainly screening deletions, with some exceptions for point mutations. As a result, the majority of loss/gain-of-function point mutations will be missed by the CRISPR-based screening method.

      We acknowledge that a technical limitation of this Cas nuclease-based mutational scan is that it is biased toward insertion/deletion mutations versus point mutations. However, we disagree with the reviewer’s claim that this means that the majority of the loss-/gain-of-function mutations will be missed, since insertion/deletions are often larger perturbations than point mutations and thus have stronger effect sizes in many cases. In principle, the selection modalities (e.g., activity-based inhibitors) used here — which are the primary focus of the study — can also be combined with alternative genomic editing approaches to assess distinct mutational perturbations, such as base editing for point mutations (see Lue, N.Z. et al., Nat. Chem. Biol. 2022). To acknowledge the reviewer’s concern, however, we have added additional text explicitly stating that the screen is biased against point mutations and that future integration with base editing and other mutational modalities may be useful to complement our nuclease-based approach.

      (5) The current CRISPR-based screening approach can work only in the context of living cells. As a result, robust cellular readouts are needed. DNMT1-UHRF1 in combination with decitabine is among the few suitable targets for such an application.

      While running CRISPR-based screens requires robust cellular assays, the main advantage of CRISPR-based mutational scanning is the ability to mutagenize the endogenous protein target in situ and assess the effect of the perturbation in the native cellular and genomic context. Resistance to activity-based probes, and to small molecules more broadly, provides a robust phenotypic readout that has been extensively used by our group and many others. Alternatively, other types of phenotypic readouts that do not rely on cell viability can also be employed with these screens, including those used to assess DNA methylation (see Lue, N.Z. et al., Nat. Chem. Biol. 2022). Given the increasingly large body of literature applying CRISPR-based screens to a multitude of biological pathways and diverse targets, we disagree with the reviewer’s claim that only a few targets can be evaluated in such a manner.

      (6) Although the authors claim that their mutants are "gain-of-function" for DNMT1/UHRF1, they were indeed due to the loss of inhibitory regulation. It is a little disappointing because the screening outcomes still fall into the conventional expectation of the loss-of-function variants.

      We agree that the mutations are not truly neomorphic, but instead likely hypermorphic due to loss of an autoinhibitory mechanism, resulting in gain-of-function increase in catalytic activity. While discovering neomorphic mutations would be extraordinary, we do not believe that our results are disappointing since the identification of autoinhibitory mechanisms is nevertheless impactful.

      Collectively, the current status of the manuscript is short of merits in terms of the impacts of technology and biological findings.

      We respectfully disagree with the reviewer’s comment as we believe that the experimental and computational methodology may be broadly useful for the field. Indeed, we have already implemented many of the tools developed here in our current ongoing work.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript presents a rather technical modelling analysis of the impact of local lockdowns on Covid-19 hospitalisations in the Netherlands. The major strength of the study is that the authors attempt to calibrate their model to a novel data source, a commercial database of mobility patterns between municipalities. The major weakness is that the model seems overly complicated, many parameters seem to have been 'guessed' without a formal uncertainty analysis, e.g. within a Bayesian framework, so that it is impossible to judge how robust the results and therefore the conclusions are.

      Major points:

      1) In some aspects the structure of the model presented seems overly complicated: It is not clear why the authors chose the 1:100 population scale and why they didn't go directly for modelling the full population. Artificially reducing the population size has important stochastic effects at the early phase of the epidemic. Also it is not clear what it means when 1:100 of one municipality mixes with 1:100 of another municipality? The authors should at least attempt to see what impact this has on output, i.e. conduct a sensitivity analysis.

      The reason for choosing a 1:100 population scale instead of the full population is computational speed. Indeed, this (and its consequences) is not mentioned explicitly and will be added. Moreover, to identify the sensitivity of the results to population scale, we add runs on different population scales.

      • Added reasoning and consequences associated with the 1:100 population scale in SI C.1.

      • The sensitivity of the results to population scale is now discussed in SI C.1 using runs with other population resolutions.
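
      The reviewer's concern about early-phase stochastic effects can be illustrated with a minimal sketch. It assumes a textbook Galton-Watson branching process with Poisson-distributed secondary cases, which is a simplification and not the agent-based model of the manuscript. The function name `extinction_probability`, the value R0 = 2 and the seed counts are hypothetical choices made only for illustration. The sketch compares the chance that an outbreak dies out by itself when 100 real seed infections are represented by a single agent at a 1:100 scale.

```python
import math


def extinction_probability(r0: float, seeds: int = 1, iterations: int = 500) -> float:
    """Chance that an outbreak started by `seeds` cases dies out by itself.

    Assumption: Galton-Watson branching process with Poisson(r0) offspring.
    This is an illustrative simplification, not the manuscript's model.
    """
    q = 0.0  # starting at 0 converges to the smallest root of q = exp(r0 * (q - 1))
    for _ in range(iterations):
        q = math.exp(r0 * (q - 1.0))
    # Independent seed cases must all fail to start a sustained chain.
    return q ** seeds


# Hypothetical comparison: 100 seed infections at full scale versus
# the single agent that represents them at a 1:100 population scale.
full_scale = extinction_probability(2.0, seeds=100)
scaled_down = extinction_probability(2.0, seeds=1)
print(f"full scale: {full_scale:.3g}, 1:100 scale: {scaled_down:.3g}")
```

      Under these assumptions the scaled-down run has a far higher chance of early extinction than the full-scale one, which is the kind of effect that runs at several population resolutions are meant to quantify.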

      2) On the other hand the model goes into (too) much detail regarding mixing behaviour and attempts to model processes during each hour of the day. This does not seem to be informed by actual data, but the data seems to be made up e.g. as in A.6. As an ex-student and a father of a teenager I can tell you that the susceptibility profile guessed in Table 3 does not seem to be very realistic. As it is stated in the appendix, the Mezuro data set only provides daily averages of travelling between communities, so it is not clear why the hourly resolution is actually needed in the model.

      Indeed, several aspects in the model are informed by “secondary statistics” which unfortunately add uncertainty. An example would be the normalization of the mobility matrices by means of data on how people spend their time (see SI A.3). Note that the example of the susceptibility profile that the reviewer mentions, however, does not involve such secondary statistics and happens to be exactly reported by the Dutch health agency (cited in SI A.5).

      We agree that the model includes much detail, which potentially has weaknesses as the reviewer rightfully mentions. However, one of the main points of this paper is that in order to address the questions of local interventions, geographical spread and associated hospital admissions, we simply need this level of detail, or even higher. In other words, assessments of such mechanics would be even more uncertain if this level of detail is not included.

      We agree that the motivation for hourly resolution is not well described in the manuscript – this will be added. The reasoning is that mixing of the population is highly heterogeneous throughout the day: as seen in Fig. S5 (SI A.7), mixing at work is entirely different from mixing at school or at home.

      Moreover, people meet at work in different municipalities and then return home, where they can potentially spread the disease further. It is exactly such mechanics that we are after in our analysis.

      • Added a more in-depth discussion of the mobility data in SI C.2.

      • Added the motivation for hourly resolution in SI A.1-A.3.

      3) It is not clear why the authors rely on only one short period of the Mezuro data set in March 2019 and do not investigate the same data source during the actual lockdown in 2020, or even for the full year, as travelling is likely to be very season dependent. This would provide much better estimates of the effects of lockdown on travel patterns. The analysis presented and categorisation into frequent, regular and incidental also need further explanation. It is not clear how international travel is accounted for in the mobility data.

      The reviewer is correct that using a longer mobility dataset or one that is exactly addressing the period of the actual lockdown would be beneficial. The reason we are not doing so is simply that this data is not available.

      The model accounts for international travel by means of its initialization, but not further. In practice, international travel was severely reduced throughout this period. Hence, we deem the uncertainties associated with not accounting for international travel limited.

      • Added a discussion on the effect of using this mobility dataset in SI C.2.

      • Added a further explanation of categorizing the movements in SI C.2.

      • Added a discussion on international travel in SI C.2.

      4) Beyond the technical points on the modelling, the main hypothesis of whether local lockdowns may work has also not been sufficiently discussed outside of the Dutch context. The authors fail to mention that this was the approach chosen in Northern Italy at the start of the epidemic (https://en.wikipedia.org/wiki/COVID-19_lockdowns_in_Italy) where it didn't work, as we all know. On the other hand, more recent local lockdowns in China appeared to be successful, albeit at a great societal cost in terms of restrictions to freedom (https://en.wikipedia.org/wiki/COVID19_lockdown_in_China).

      The reviewer is correct that we only show this in the Dutch context. We can reason about other situations, but clearly these situations differ vastly from country to country.

      Reviewer #3 (Public Review):

      This work uses an agent-based model of SARS-CoV-2 transmission (calibrated to the first wave in the Netherlands) to examine how the societal impact of interventions could have been reduced - while maintaining epidemiological impact - if they were implemented at a subnational (eg, municipality) rather than a national level. After more than two years of lockdowns and mobility restrictions, the societal cost of such measures is becoming better understood, and it is important to evaluate the effectiveness of such measures and reflect upon how they can be deployed in a minimally disruptive fashion. Mathematical and computational models are a natural choice for such investigations as they enable researchers to explore counter-factual scenarios ("what might have happened had we acted differently?")

      The authors conclude that subnational interventions, triggered via prevalence in a particular municipality, could have controlled the first wave of SARS-CoV-2 in the Netherlands with minimal health cost but less societal disruption than national interventions. This claim is supported by reference to Figure 4 showing the impact on (a) hospital admissions and (b) municipalities without interventions through different phases of the outbreak. For more remote/rural municipalities, the use of interventions is delayed by ~1 week, although some (6%) of municipalities avoid interventions altogether.

      Strengths:

      As noted above, the general objective of this study is important and of potentially broad interest. The agent-based model is complex, but not unreasonably so, and makes good use of rich demographic, mobility, epidemiological/clinical, etc. data for calibration. The simulations conducted using the model support the specific conclusions of the manuscript.

      Weaknesses:

      While the motivation and approach are strong points of this work, the analysis and interpretation would benefit from further development. The robustness of model behaviour to the threshold used to trigger subnational interventions is explored; however, there are other aspects of the model that are not subjected to sensitivity analysis, including:

      1) The impact of imperfect surveillance (eg, due to asymptomatic transmission, reporting delays, etc);

      2) The impact of non-compliance, which could potentially differ for subnational versus national interventions;

      3) The impact of pathogens/variants with transmission/severity characteristics different from the original SARS-CoV-2 strain.

      In the absence of such analyses, it is difficult to generalise the findings beyond "this is how subnational interventions could have been used to control the first wave of SARS-CoV-2 in the Netherlands" to "this is how subnational interventions could be used effectively in the event of future outbreaks" (of a SARS-CoV-2 variant or other pathogen).

      The discussion focuses on limitations associated with the model but does not consider other potential implications of subnational interventions. For example:

      1) Subnational interventions may produce unintended consequences if populations respond by relocating from regions with interventions (high prevalence) to regions without interventions (low prevalence).

      2) Subnational interventions would require extremely effective public health messaging to avoid confusing populations. Particularly in densely populated regions where municipalities may be small and tightly connected, the feasibility of communicating (and enforcing compliance with) interventions may be challenging.

      3) A proposal to implement subnational interventions - following the results of this work - may raise ethical questions about cost-benefit trade-offs (eg, whether 355 additional hospital admissions is an acceptable price to pay for 36 million person-days without interventions; ie, two days per citizen, on average). The fact that such decisions would (in the event of a future outbreak) need to be made rapidly, in the face of potential uncertainty about pathogen characteristics, heightens the need for clear understanding of how situational factors may affect the likely effectiveness of interventions (at any scale).

      Impact and broader utility:

      As noted, the question addressed - how we can reduce the disruption caused by interventions for transmission control - is important. Thus, the work presented in this manuscript has the potential for broad utility. Currently, this is limited by the focus on a specific outbreak instance.

      In general terms, we agree with the reviewer. That said, the “possibility space” of policymaking is infinite dimensional, in the sense that the intervention measures can take infinitely many forms, starting times and durations. The framework that we have built by combining data sources such as demography, mobility, interactions and disease parameters now makes it possible to explore these possibilities. These will be explored in future work.

    1. Author Response

      Reviewer #1 (Public Review):

      The data that is presented is quite clear, and expected given the prior in vitro work, as well as prior work in vivo with helminth infection and BCG vaccination. Overall, it is important to demonstrate that observations from in vitro experiments are relevant in vivo, however, there are concerns with the design of this study which limits its impact. In addition, the study confirms what is expected from prior work, but falls short of adding any new mechanistic insight.

      We thank the Reviewer for evaluation of the manuscript and for the comments. Indeed, published studies have shown that helminth infection can impair the response to the BCG vaccine. However, this manuscript shows for the first time that IL-4 and helminth infection impair MINCLE expression in vivo. In addition, it is the first report demonstrating a negative effect of helminth infections on the antigen-specific Th1/Th17 response after vaccination with a MINCLE-dependent adjuvant.

      Regarding mechanistic insight, we have employed mice deficient in IL-4/IL-13 to determine whether the thwarted Th1/Th17 response is caused by these Th2 cytokines in helminth-infected mice. New Figure 6 in the revised manuscript indeed demonstrates recovery of antigen-specific IFNγ and IL-17 production in the absence of IL-4/IL-13.

      In terms of the in vivo experimental design, it is unclear why the authors chose to administer BCG IP, when the vaccine is given SC (and then based on more recent data, IV could be arguably interesting and relevant). The focus on the peritoneum limits the potential application of these findings to address the important question of the effects of helminth infection on BCG vaccine responses. The ultimate in vivo experiment to be able to demonstrate a physiological relevance of the mechanisms explored here would be to see what the effect was on Mtb infection in the lung.

      BCG was injected i.p. to induce upregulation of MINCLE on peritoneal cells and to be able to ask whether IL-4 and/or helminth infection will lead to a down-regulation of MINCLE expression on myeloid cells in vivo. Thus, we were not interested in this context in the adaptive immune response to BCG. Instead, the peritoneal BCG injection provided access to myeloid cells exposed to Th2 immune condition in vivo for analysis of MINCLE protein levels on the surface. As stated in the Discussion section (lines 400-405 in the revised manuscript), detection of MINCLE by flow cytometry from tissue cells is complicated by the loss of cell surface protein during enzymatic organ digestion.

      We agree that it would be interesting to study the impact of helminth infection on BCG-induced protection to Mtb challenge infection in the lung. As we have described here the impairment of Th1/Th17 immune responses after immunization with H1/CAF01 that induces protection (Werninghaus et al. 2009 J Exp Med), it would make most sense to perform such challenge infections first in this setting. However, Mtb infection requires a dedicated BSL3 animal facility; we therefore consider such challenge experiments beyond the scope of this manuscript.

      The authors do report different responses in the spleen and lymph node, which is interesting, but lines 336-337 accurately point out that compartmentalized overexpression of IL-10 in the spleens but not the lymph nodes has been described in mice with chronic schistosomiasis. Mechanistic insight into this phenomenon was lacking, and the relevance to Mtb infection is still unknown.

      We agree that the mechanism for the compartmentalized regulation of adaptive immune differentiation in helminth-infected mice is not clear.

      Reviewer #2 (Public Review):

      The manuscript entitled "IL-4 and helminth infection downregulate Mincle-dependent macrophage response to mycobacteria and Th17 adjuvanticity" by Schick et al. demonstrates the inhibitory activity of IL-4 and helminth infection on mycobacteria-mediated Th17 immunity. Overall, the authors reported interesting findings with solid data that advance our understanding of CLR function in fungal-bacterial co-infection.

      We thank the Reviewer for the appreciation of our study.

      Reviewer #3 (Public Review):

      The authors first demonstrated in bone marrow-derived macrophages (BMMs) that IL-4 treatment of BMMs led to a significant reduction of BCG- and TDB-induced MINCLE expression (Fig. 1). While IL-4 treatment did not impact BCG phagocytosis by BMMs, it led to a reduced production of the cytokines G-CSF and TNF by BMMs (Fig. 2). In an elegant model using hydrodynamic injection of mini-circle DNA encoding IL-4, the authors show that IL-4 overexpression abrogated the increased MINCLE expression in monocytes upon BCG infection in vivo. Similar findings were observed in a co-infection model with the hookworm Nippostrongylus brasiliensis, where MINCLE expression on inflammatory monocytes from BCG-infected mice was reduced compared to control mice infected only with BCG (Fig. 3). The key findings of the manuscript include the two murine helminth infection models, S. mansoni as a chronic infection, and N. brasiliensis as a transient infection, in both of which the authors showed an organ-specific inhibition of the Th17 response in a vaccination setting with a MINCLE-dependent adjuvant (Fig. 4 and 5).

      Data shown in the manuscript represents a major advance over previous studies because for the first time a relation between IL-4 and MINCLE expression and function is demonstrated in vivo in relevant co-infection models. All experiments have been done with care. Appropriate controls have been included and conclusions are largely supported by the data. Future studies in human patients will be needed to determine the clinical relevance of the findings observed in the murine helminth infection models.

      We thank the Reviewer for the positive comments and agree that it will be interesting to study the impact of helminth infection on CLR expression and function in human infection and vaccination settings.

    1. Author Response

      Reviewer #1 (Public Review):

      COVID-19 severity has been previously linked to a genetic region on chromosome 3 introgressed from Neandertals. The authors use several computational methods to, within this region, identify specific regions that putatively regulate gene expression, and to identify genes within these regions associated with COVID-19 severity. The use of several complementary computational approaches is a major strength of the paper as it bolsters confidence in the findings and narrows the search for significant genomic regions down to most likely candidates. They find 14 genes that exhibit expression regulated by the identified introgressed genomic regions. Among these are several chemokine receptors including two - CCR1 and CCR5 - whose upregulation is associated with severe COVID-19. The authors then use functional genomics to determine whether the identified regions do regulate gene expression.

      We thank this Reviewer for highlighting these strengths.

      In contrast to the robustness of the computational findings, the authors' MPRA results are less robust with respect to the significance of the paper to clinical severity of COVID-19. The MPRA shows that the computational methods were reasonably effective at identifying regulatory elements within the introgressed region (53%). The authors then focus on emVars where the H.n. allele differentially regulates expression and identify 4 putative emVars that may regulate expression of CCR1 and CCR5. However, the authors found in their MPRA that these emVars downregulate reporter gene expression, whereas the genes of interest CCR1 and CCR5 are upregulated during severe COVID.

      This result highlights the principal weakness of using the MPRA in this context, as it assumes that reporter gene expression using a minimal promoter has identical regulatory determinants as expression of the gene of interest. Its strength is the high-throughput nature of the assay, but its weakness is the lack of specificity with respect to the question at hand. This lack of specificity mitigates the impact of the functional aspect of the work. The authors' computational findings certainly bolster previous work that H.n. introgressed alleles are associated with COVID-19 severity and that this association may be at least partly dependent on gene expression differences between the archaic and modern alleles. However, the specific question at hand, whether chemokine receptor expression is linked to the clinical phenotype, remains unaddressed.

      Ultimately the authors' results support the conclusions that the 4 emVars identified do regulate gene expression. However, the hypothesis that these specific regions are linked to COVID-19 severity is not supported. The authors' speculation as to why their results may differ from the observed upregulation during disease is intriguing, but lacks support.

      We thank the Reviewer for raising these important points, and we hope that our new experimental approach has helped to strengthen our findings. We have also modified the manuscript to be more critical of our findings in the context of the issues the Reviewer has brought up. This is shown in our updated Discussion, parts of which are provided above in the section addressed to the Editor, as well as in the newly revised manuscript.

      Reviewer #2 (Public Review):

      Previous research using GWAS and population genetics approach identified a genetic haplotype on chromosome 3 derived from Neanderthals as the major risk factor for severe COVID-19. However, the specific variants that are causative of the severe COVID-19 phenotype remain unknown. Here, Jagoda et al. aim to identify the causative variants for the severe COVID-19 by leveraging eQTL analysis followed by Massively parallel reporter assays (MPRA). Their datasets and results are unique and novel. Their research is well designed, and will serve as a model strategy for future studies of functional annotation of disease-associated variants.

      We thank Reviewer #2 for these compliments.

      However, the following critical weaknesses in this manuscript reduce the impact of this work: (1) The quantitativity of the MPRA output is questionable because of their incomplete definition of MPRA activity, which is based on absolute barcode counts without comparison to negative controls. (2) Molecular mechanisms (binding transcription factors, etc.) of causative variants that underlie the regulation of CCR1/5 expression and COVID-19 severity are not analyzed and validated.

      We hope that below we have addressed these comments through our analyses and new experiments.

      Reviewer #3 (Public Review):

      This manuscript by Jagoda et al. addresses the genetic mechanism of the haplotype on chromosome 3 that was introgressed from Neanderthals and shows a strong association with COVID-19 severity in Europeans. They re-evaluate the adaptively introgressed segment using the Sprime, U, and Q95 methods and analyze cis- and trans-eQTLs based on the whole blood dataset. All 361 Sprime-identified introgressed variants act as eQTLs in whole blood and alter the expression of 14 genes, including seven chemokine receptor genes. They then tested 613 variants using a Massively Parallel Reporter Assay (MPRA) in K562 cells and narrowed these down to 20 emVars. In the end, they selected four variants based on four criteria regarding the association with COVID-19 severity, eQTL data, chromosomal interaction, and epigenetic marks in immune cells. They highlighted variants rs35454877 (CCR5 regulation) and rs71327024, rs71327057, and rs34041956 (CCR1 regulation).

      Narrowing the roughly 800 kb introgressed region down to four critical variants is impressive work. However, MPRA and eQTL data are not consistent, and these data don't support clinical gene expression data (increased expression of CCR1 in severe COVID-19 patients).

      We thank this Reviewer for the positive assessment of our work. We have now addressed these concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an interesting and timely paper investigating the impact on participation in cancer screening programs across Italy during the COVID-19 pandemic, when there were massive disruptions to health services. What is of particular interest in this analysis is the investigation of social, educational and cultural factors that might have impacted access and participation to screening.

      • In the present study, the authors analyzed data collected by PASSI between 2017 and 2021 from interviews of more than 106,000 people. A representative sample of the Italian population aged 25-69 was selected, but it's not clear how representative it was by region, gender, age and educational attainment. Also, what is the total population (so I don't have to look it up)? I am wondering if participation differed by characteristics and what approach was taken to achieve the representative sample (e.g. replacement of individuals or oversampling certain strata where participation was lower).

      PASSI is one of the two routinely collected Italian National Health Interviews. It has been described in several papers and there is a website reporting in detail methods, percentage of refusals, and numbers of interviews. Nevertheless, we agree with the reviewer that a brief summary of the methods is needed, and we added some details on data collection. Furthermore, details on the number of interviews according to the selected period, age, and sex strata cannot be found in the general description of the survey. Therefore, we gave more details also on the sample used for this study in supplementary table 1.

      • For figures 5-8 what is the N for the different groups not just the %?

      We agree with the reviewer that giving the actual numbers on which the percentages are computed is necessary. Nevertheless, as with any stratified sample, estimates from PASSI are computed using weights, therefore percentages cannot be computed directly from the observed numbers. We decided to add supplementary table 1, which reports the number of valid interviews on which percentages are estimated.
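
      Why a weighted percentage cannot be read off the raw counts can be shown with a minimal sketch. The function name `weighted_percentage`, the four respondents and their weights are hypothetical and are not taken from PASSI; the sketch only assumes that each interview carries a sampling weight.

```python
def weighted_percentage(outcomes, weights):
    """Percentage of respondents with outcome == 1, using sampling weights.

    `outcomes` holds 0/1 answers and `weights` the per-interview sampling
    weights. Both are hypothetical inputs used only for illustration.
    """
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_yes = sum(w for y, w in zip(outcomes, weights) if y)
    return 100.0 * weighted_yes / total_weight


# Four hypothetical interviews: two screened (1) and two not screened (0).
outcomes = [1, 1, 0, 0]
weights = [1.0, 1.0, 3.0, 3.0]  # unscreened respondents stand for more people

raw = 100.0 * sum(outcomes) / len(outcomes)
weighted = weighted_percentage(outcomes, weights)
print(raw, weighted)
```

      In this toy case the raw share of screened interviews differs from the weighted estimate, so reporting the number of valid interviews next to the weighted percentages conveys both pieces of information.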

      • Table 2 to me is a key piece of information and very interesting. Can the authors formally test whether there are significant differences between the time periods?

      We thank the reviewer for this suggestion. Firstly, we added a table in which we analyzed all the data together and included the calendar period, categorized as before and after the pandemic, among the covariates. Secondly, we checked whether any of the differences between the prevalence ratios observed in the two periods were significant at a 0.05 alpha error threshold, and we added a comment in the text: “Nevertheless, the differences could be due to random fluctuations”. We did not add p-values for the interaction of all the variables in each cancer screening because the table is already very complex, and three more columns would make it difficult to read.
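
      One simple way to compare a prevalence ratio between two periods is sketched below. It is an unadjusted, unweighted approximation and not the Poisson regression used in the manuscript: it ignores survey weights and covariates and assumes the two periods are independent samples. The function names and all counts are hypothetical.

```python
import math


def log_prevalence_ratio(exposed_yes, exposed_n, reference_yes, reference_n):
    """Log prevalence ratio and its large-sample standard error.

    Assumption: simple unweighted counts (hypothetical inputs).
    """
    ratio = (exposed_yes / exposed_n) / (reference_yes / reference_n)
    se = math.sqrt(1 / exposed_yes - 1 / exposed_n
                   + 1 / reference_yes - 1 / reference_n)
    return math.log(ratio), se


def compare_periods(before, after):
    """z statistic and two-sided p-value for a change in the prevalence ratio.

    `before` and `after` are tuples
    (exposed_yes, exposed_n, reference_yes, reference_n).
    """
    log_before, se_before = log_prevalence_ratio(*before)
    log_after, se_after = log_prevalence_ratio(*after)
    z = (log_after - log_before) / math.sqrt(se_before ** 2 + se_after ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Hypothetical counts: screened / interviewed in a low-education group
# versus a reference group, before and after the pandemic.
z, p = compare_periods(before=(60, 100, 80, 100), after=(30, 100, 80, 100))
print(f"z = {z:.2f}, p = {p:.4f}")
```

      A regression model with a period-by-covariate interaction term answers the same question while also accounting for weights and the other covariates.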

      Reviewer #3 (Public Review):

      This study is primarily a descriptive analysis that provides a clear and accessible account of how screening activity varied across Italy and between groups. While primarily a simple descriptive account such work is important to document what were the impacts of the pandemic on preventative health services and to understand how they differed across groups. The combination of survey responses from regional screening programmes and individuals is a useful use of two data sources. The study is very clearly written and does not over-interpret the presented data.

      The methods description states that the analysis presents the "standard months" required for the programmes to recover from the service delays. The subsequent reporting of these delays in the results section did not use the same terminology and I see scope for clarification by using common language regarding this assessment throughout the paper. I see scope for further disaggregation of the regional results within the study but equally I understand why the authors might not wish to report outcomes for specific regions. I see scope for improvement in the figures within the manuscript but this is a relatively presentational matter. I would like to see some further description of the Poisson regression analysis as what is included within the manuscript appears rather brief. There is also one section of the methods that seems as if it would better belong in the introduction, but overall the manuscript was very clearly structured.

      We thank the reviewer for his encouraging comments. We checked the whole manuscript and tried to always use the same name for each concept. We expanded the methods section, giving more details on the models and statistical analysis. We decided not to report data at the regional level, but rather the variability within macro areas.

      The analysis presented achieves the authors' stated aims in my view. I see a useful contribution in documenting the impact of the COVID-19 pandemic on screening in Italy. This may inform further work on assessing the eventual health impact of delays as well as work considering how best to make screening programmes more resilient to such shocks. Ultimately it will take time to observe just how significant the impacts of service interruptions were on cancer prevention. Readers should remember that many screening services may still provide good protection against cancer as long as the interruptions are limited simply to delays in coverage rather than the longer-term loss of participation, especially for those with incomplete screening histories or of otherwise elevated risk of disease.

      Further work may wish to consider how programmes prioritised capacity or what efforts have been made to restart screening. Similarly, there is scope for more detailed disaggregated assessment of who received screening as programs restarted. Both these issues are beyond the scope of the present study however. The present submission provides a good basis for any further such exploration.

      We thank the reviewer for these comments. We tried to capture some of the concepts in our discussion.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors explore the use of SRT as a host-directed therapy for use in combination with other first-line TB antibiotics. This manuscript is of substantial importance since TB is a major world health concern, and there is growing interest in the development of host-directed therapies to augment existing therapies for TB. Demonstrating the effectiveness of adding an FDA-approved drug to existing cocktails of anti-TB drugs has potentially exciting implications.

      The manuscript is bolstered by their use of multiple in vitro and in vivo models of infection, as well as a clinically relevant strain of TB. While their findings generally support the use of SRT as an effective HDT/treatment, the mechanistic details underlying the effectiveness of SRT remain somewhat obscure, and as presented, the in vitro experiments support more limited conclusions.

      Major concerns:

      In vitro studies (i.e. bacterial culture) were only performed with SRT up to 6 uM while the cultured cell experiments used a range up to 20 uM. 5 uM had almost no effect on the viability/growth of Mtb in macrophages. The authors should use the same concentrations in vitro as their macrophage studies to test whether SRT directly impacts Mtb viability to be able to rule in/out that SRT does not impact Mtb viability when cultured.

      We did not see any appreciable decrease in the growth of Mtb at up to 20 µM in in vitro experiments (nearly 30-40% restriction after 8 days of culture). In combination with HR, we used a lower dose of 6 µM to offset the minimal inhibitory effect of SRT, so that only the effect of SRT is understood.

      The mechanism of action of SRT during TB infection and the conclusions drawn by the authors are not supported by the limited experimentation. SRT is presented as an antagonist of polyI:C-induced type I IFNs, but during TB infection, cytosolic DNA sensing via the cGAS/STING axis constitutes the major pathway through which type I IFNs are induced in macrophages.

      To offer more support that SRT inhibits type I IFN, the authors should consider measuring the actual amount of type I IFN using an IFNb ELISA. Additionally, the authors should use human/mouse primary macrophages (not just THP1 reporter cells) and measure transcript levels (at key time points post infection) and protein levels of type I IFN and other proinflammatory mediators (e.g. TNFa, IL-1, IL-6) +/- SRT to determine if SRT is specific to the type I IFN response. If this is indeed the case, other NFkB genes/cytokines should not be impacted.

      Moreover, to draw the conclusion that "augmentation property of SRT is due to its ability to inhibit IFN signalling" a set of experiments using an IFN blocking antibody would enhance Figure 2, as both cGAS and STING KO macs have significant differences in basal gene expression and their ability to respond to innate immune stimuli.

      Because the first half of the paper focuses on type I IFNs during macrophage infection to explain the mechanism of action for SRT, additional analysis of the mouse infections to examine levels of type I IFNs, as well as IL-1B and IFN-g (in serum/tissues?), is important for connecting the two halves of the manuscript. The in vivo data would also be strengthened by quantitative analysis of histological changes by, for example, blinded pathology scoring. This type of quantitation would also permit statistical analyses of this important pathology readout.

      We have analysed tissue cytokine levels and did not see stark differences between HRZE and HRZES at two time points, 4 and 8 weeks post treatment (Figure below). We feel that such studies would need a more comprehensive analysis of the immunological response induced in the host by the treatment at multiple time points. Such studies would be part of a more focussed plan in future proposals and manuscripts. We have also conducted a manual scoring of the lesions between the groups and have recorded these data in the manuscript (Fig.4-figure supplement 1).

      The authors conclude that SRT functions through an inflammasome-related function, but this conclusion requires further support of actual inflammasome activation, such as IL-1B secretion by ELISA or IL-1B processing by western blot analysis, rather than Il1b gene expression alone. Additional functional readouts of inflammasome activation like cell death assays would also strengthen this conclusion.

      We thank the reviewer for these suggestions. These studies are currently underway and will be part of a future manuscript detailing the mechanism of the SRT-mediated increase in antibiotic efficacy.

      What strain of TB was used in these studies? The results and methods do not indicate the strain used, which is critical to know since different strains have varying pathogenesis phenotypes.

      We have used Mtb Erdman for routine drug-sensitive studies and N73 for the drug-tolerant studies. This has been added in the text.

      Minor concerns:

      It might be worth consistently using the more common INH and RIF abbreviations to increase the clarity/readability of the MS and figures.

      We have used the conventional clinical abbreviations used for INH and Rifampicin.

      What is the physiological concentration of SRT when taken for depression and how does that compare to the concentrations used in vitro? Are the in vitro concentrations feasible to achieve in patients?

      In Figure 3B, why is there a spike in TNF-a in the HRS treated cells only at 42h?

      The authors wish to thank the reviewer for this query. We have reanalysed the data and have depicted the modified figures in the current text version. The spike at 42h for TNF was an oversight, due to an erroneous representation of the values in the figure.

      Was statistical analysis performed on the data in Figure 3B and D?

      Yes, we have incorporated this information in the modified figure.

      A description/discussion of the different mouse strains used in infection - what benefits each has as a model and why several were used - would help convey the impact of the in vivo studies.

      These have been incorporated in the text. A discussion of the mouse strains and their immunopathology in infection has been included in the text.

      Since antibiotics and SRT were administered ad libitum, how did the authors ensure that mice took enough of the antibiotics and especially SRT? Is it known whether these drugs affect the water taste enough to affect a mouse's willingness to drink them?

      We preferred ad libitum delivery of TB drugs in drinking water, as used in the previous study by Vilchèze et al., 2018 (Antimicrob Agents Chemother 23;62(3):e02165-17). To avoid non-drinking, we used 5% glucose in the water of all animals, including the non-antibiotic-treated groups. We also followed the uptake of water during the treatment and found comparable levels of usage between the groups.

      Was statistical analysis performed on time-to-death experiments?

      Because of the inherent differences in susceptibility and response between male and female C3HEBFEJ mice, we did not perform statistical analyses between the groups.

      Were CFUs measured in mice from Figure 4 to determine empirically how effective the antibiotic treatments were? And if SRT impacted their effectiveness?

      We have not tested the effect of SRT on bacterial burdens following treatment with HR alone, as these studies were aimed at deciphering chronic pathology. We have tested the effect on bacterial loads in the C3HEBFEJ model with the four-drug therapy and in the C57BL6 and Balbc models of infection.

      The H&E images could use some additional labels to more easily discern what groups they belong to.

      These have been incorporated in the figure.

    1. Author Response

      eLife assessment

      The purpose of this study was to determine whether heme oxygenase-2 deficiency translates to deficiencies in motor neuron function. This paper presents a plausible mechanism by which heme oxygenase-2 deficiency can lead to obstructive apneas. Indeed, this is among the first papers to comprehensively describe a signaling pathway in motor neurons and the consequences of its deficiency. Furthermore, the work completed here may be relevant to other diseases in which motor neuron signal transmission is a key contributor.

      We thank the reviewers for their assessment and constructive comments. Based on their input below, we performed additional analyses focused on the impact of HO-2 dysregulation on rhythmogenesis from the preBötC.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript discussed the combination use of pyrotinib, tamoxifen, and dalpiciclib against HER2+/HR+ breast cancer cells. Through a series of in vitro drug sensitivity studies and in vivo drug susceptibility studies, the authors revealed that pyrotinib combined with dalpiciclib exhibits better therapeutic efficacy than the combination use of pyrotinib with tamoxifen. Moreover, the authors found that CALML5 may serve as a biomarker in the treatment of HER2+/HR+ breast cancer.

      The authors provide solid evidence for the following:

      1) The combination use of pyrotinib with dalpiciclib exhibits better therapeutic efficacy than the combination use of pyrotinib with tamoxifen.

      2) Nuclear ER distribution is increased upon anti-HER2 therapy and could be partially abrogated by the treatment of dalpiciclib.

      3) CALML5 may serve as a putative risk biomarker in the treatment of HER2+/HR+ breast cancer.

      The manuscript has significant strengths and several weaknesses. The strengths include the identification of the novel role of dalpiciclib in the treatment of HER2+/HR+ breast cancer. Moreover, the authors provide solid evidence that the combined use of dalpiciclib with pyrotinib significantly decreased the total and nuclear expression of ER. The main weakness of the manuscript is that the manuscript is difficult to read due to language inconsistency. In addition, some figure captions and figure legends should be carefully amended.

      Thank you for your comments on our manuscript. We sincerely apologize for the inconsistencies in the manuscript caused by poor language. We have improved the manuscript as well as the figures according to your valuable suggestions.

      Reviewer #2 (Public Review):

      The authors performed preclinical studies to investigate the underlying mechanism of how the combination of pyrotinib, letrozole and dalpiciclib achieved satisfactory clinical outcomes in the MUKDEN 01 clinical trial (NCT04486911). Mechanistically, using anti-HER2 drugs such as pyrotinib and trastuzumab could degrade HER2 and facilitate the nuclear transportation of ER in HER2+HR+ breast cancer, which enhanced the function of ER signaling pathway. The introduction of dalpiciclib partially abrogated the nuclear transportation of ER and exerted its canonical function as cell cycle blockers, which led to the optimal cytotoxicity effect in treating HER2+HR+ breast cancer. Furthermore, using mRNA-seq analysis and in vivo drug susceptibility test, the authors succeeded in identifying CALML5 as a novel risk factor in the treatment of HER2+HR+ breast cancer.

      Thank you for your comments and valuable suggestions. We have improved our manuscript accordingly.

      Reviewer #3 (Public Review):

      In this research, the authors explore a novel mechanism of the CDK4/6 inhibitor dalpiciclib in HER2+HR+ breast cancers, in which dalpiciclib could reverse the process of ER intra-nuclear transportation upon HER2 degradation. The conclusions offer significant insight into the biological behavior of TPBC and provide a conceptual basis for the ideal efficacy in the published clinical trial. The findings are supported by supplemented in vivo assays and transcriptomic analysis.

      Thank you for your comments and valuable suggestions, which helped us improve this manuscript.

  4. Dec 2022
    1. Author Response

      Reviewer #2 (Public Review):

      The majority of genetic effects discovered in genome-wide association studies (GWAS) of common human diseases point to non-coding variants with putative gene regulatory effects. In principle, studying genetic effects on gene expression phenotypes, as mediators between genotype and disease, can help understand the underlying function of GWAS variants.

      Lafferty et al. set out to study the regulation of microRNA (miRNA) levels in mid-gestation human neocortical tissues as a potential contributor to brain-related phenotypes. To this end they performed miRNA expression profiling via small-RNA sequencing, followed by assaying expression quantitative trait loci (eQTLs) that locally regulate miRNA genes.

      In addition to reporting some properties of miRNA-eQTLs, e.g., their tissue-specificity, the authors searched for potential overlap or "colocalization" between these eQTL loci and GWAS loci for several putatively brain-related phenotypes. They reported colocalization at the locus containing the SNP rs4981455, which is an eQTL for miR-4707-3p and is also associated with global cortical surface area (GSA) and educational attainment phenotypes in GWAS. They further showed that exogenously increased expression of miR-4707-3p in primary human neural progenitor cells (as a model to study neurogenesis) drives an increased rate of proliferation.

      The reported results are interesting and important, particularly for the understanding of miRNA biology. That said, as I detail below, the claim that miR-4707-3p expression modulates brain size and thus cognitive ability, although potentially consistent with the data, is not unequivocally supported by the analyses. As such, considering the potential social impact of the misinterpretations of these results, I believe the authors should explicitly discuss caveats, alternative explanations consistent with the data, and broader implications:

      We thank the reviewer for their positive evaluation of our work and detailed comments. We agree that misinterpretation of our results could have negative social impacts, and now have added caveats and alternative explanations to our discussion section.

      1) The colocalization analysis used effectively tests whether miRNA-eQTL and GWAS variants are in linkage disequilibrium (LD), and does not formally test whether the miRNA-eQTL and GWAS signals are explained by the same genetic variant which is necessary for establishing causality. In this study, a formal test of colocalization is challenging given that the LD patterns in the eQTL data (from mixed ancestries) are dissimilar to the GWAS data (from European-descent samples). Furthermore, even if GWAS and miRNA-eQTL signals are explained by the same variant, this could be due to confounding (a confounder affecting both), or pleiotropy (genotype independently affecting both), and not necessarily that the miRNA-eQTL signal mediates the GWAS signal. These are also true for colocalization analyses of miRNA-eQTLs with mRNA-eQTLs or splicing-QTLs. One practical suggestion is whether authors can perform the colocalization analysis better, e.g., with methods such as SMR (https://yanglab.westlake.edu.cn/software/smr/#Overview).

      As the reviewer mentioned, testing colocalized genetic signals using the eQTL dataset presented in this study remains challenging given the mixed ancestry of the samples. We believe our primary test for colocalization, conditioning the miRNA-eQTL association using a secondary signal index variant, is sufficient evidence for a shared genetic signal (Nica et al., 2010). This is particularly true when looking for colocalizations between the miRNA-eQTLs and mRNA-e/sQTLs because both datasets used largely the same samples for expression quantification. However, the colocalization between the miRNA-eQTL for miR-4707-3p expression and the GWAS signal for educational attainment warrants greater scrutiny because the GWAS signal was discovered in European-descent samples.

      To address this concern, we have conducted an additional colocalization test using the SMR and HEIDI methods as suggested by the reviewer (Zhu et al., 2016). We have updated the results section, “Colocalization of miR-4707-3p miRNA-eQTL with brain size and cognitive ability GWAS”:

      "In addition to the HAUS4 mRNA-eQTL colocalization, the miRNA-eQTL for miR-4707-3p expression is also co-localized with a locus associated with educational attainment (Figure 5A)(2). Conditioning the miR-4707-3p associations with the educational attainment index SNP at this locus (rs1043209) shows a decrease in association significance, which is a hallmark of colocalized genetic signals (Figure 5-figure supplement 2A)(58,59). Additionally, the significance of the variants at this locus associated with miR-4707-3p expression are correlated to the significance for their association with educational attainment (Pearson correlation=0.898, p=5.1x10^-7, Figure 5-figure supplement 2B). To further test this colocalization, we ran Summary-data-based Mendelian Randomization (SMR) at this locus which found a single causal variant to be associated with both miR-4707-3p expression and educational attainment (p=7.26x10^-7)(60). Finally, the heterogeneity in dependent instruments test (HEIDI), as implemented in the SMR package to test for two causal variants linked by LD, failed to reject the null hypothesis that there is a single causal variant affecting both gene expression and educational attainment when using the mixed-ancestry samples in this study as the reference population (p=0.159). The HEIDI test yielded similar results when estimating LD with 1000 Genomes European samples (p=0.120). All this evidence points to a robust colocalization between variants associated with both miR-4707-3p expression and educational attainment despite the different populations from which each study discovered the genetic associations."

      To strengthen the argument for colocalization, we added Figure 5-figure supplement 2.
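
      For readers unfamiliar with SMR, the core test statistic can be sketched in a few lines. This is only an illustration of the approximate chi-square statistic described by Zhu et al., 2016, not the pipeline used in the study; the function name `smr_test` and the example z-scores are hypothetical, and the real SMR software additionally handles instrument selection, LD, and the HEIDI test.

```python
import math


def smr_test(z_eqtl, z_gwas):
    """Approximate SMR test from two z-scores for the same top cis-eQTL SNP.

    z_eqtl: z-score of the SNP's association with expression (hypothetical input)
    z_gwas: z-score of the SNP's association with the trait (hypothetical input)
    Returns the approximate chi-square statistic (1 df) and its p-value.
    """
    denom = z_eqtl ** 2 + z_gwas ** 2
    if denom == 0.0:
        # No signal in either study: statistic is zero, p-value is one.
        return 0.0, 1.0
    t_smr = (z_eqtl ** 2) * (z_gwas ** 2) / denom
    # Survival function of a chi-square with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return t_smr, p_value


# Illustrative call with made-up z-scores (not values from the study).
t_stat, p_val = smr_test(6.0, 5.0)
print(f"T_SMR = {t_stat:.3f}, p = {p_val:.3g}")
```

      Because the statistic can never exceed the smaller of the two squared z-scores, a significant SMR result requires a strong signal in both the eQTL and the GWAS data.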

      Given the unique problem of colocalizing genetic signals from datasets with different LD patterns, we also attempted to colocalize the miRNA-eQTL for miR-4707-3p and educational attainment GWAS using eCAVIAR and coloc (Hormozdiari et al., 2016; Wang et al., 2020). Neither of these methods produced a significant colocalization between these two genetic signals at this locus. However, neither of these methods was designed or tested using mixed-ancestry reference populations, and therefore we are still confident in declaring a shared genetic signal at this locus.

      2) Although possible, there is no direct evidence that the GWAS signals at rs4981455 for educational attainment and GSA are driven by variation in miRNA levels in the studied tissue. As the authors noted, rs4981455 is also an eQTL for the gene HAUS4. Furthermore, rs4981455 is a significant e/sQTL across almost all adult tissues in GTEx, and so likely has regulatory activity across wide ranges of cell or tissue types. Therefore, pinpointing the causal contexts mediating the effect in GWAS is impossible with the current data.

      We agree that fully understanding the causal relationship, or mechanism, between rs4981455 and educational attainment is impossible with the current data. However, we believe the miRNA-eQTL at rs4981455, discovered in developing brain tissue, provides clues as to the causal context of this locus on educational attainment. We have updated the language throughout the manuscript to temper the notion that expression differences in miR-4707-3p are causal for changes in educational attainment (discussed below), yet we maintain that the evidence provided is consistent with miR-4707-3p playing a role in brain development ultimately leading to changes in adult educational attainment. The updated hypothesized causal relationship is shown in Figure 6H, and the expanded discussion of the caveats of this study, addressed in the next section, also serves to mitigate this concern.

      3) Orthogonal to the issues above, the genotype-to-phenotype pathway as hypothesized, i.e., genotype → miRNA levels → brain structure → educational attainment, is oversimplistic and rests on an implicit prior belief that genetic associations with educational attainment can be trivially mapped to fundamental brain features that determine cognitive ability. To illustrate the problem with this prior I refer to an old example by Christopher Jencks: in a society that prevents red-haired kids from going to school, genetic effects on hair color would be associated with educational attainment, despite having no intrinsic biological relationship with cognition. I give two scenarios consistent with the specific case of rs4981455 that are fundamentally different from what is implied in the paper: (i) The case of indirect genetic effects (see Kong et al., Science 2018). In this case, rs4981455 affects the nurturing behavior of an individual's parents, which in turn influences the individual's educational achievements and consequently brain structure features. (ii) The case of confounding. In this case, the genetic effects on brain structure are shared with another feature, such as facial shape (see Naqvi et al., Nature Genetics 2021). Variation in facial shape in a discriminatory educational environment can covary with educational attainment.

      The causal pathway presented in the original version of this manuscript was indeed too simplistic and inferred a causal pathway between rs4981455 and educational attainment that was not fully backed by our data nor could be fully proved experimentally. The point we had hoped to make, and which is better represented by the updated version of Figure 6H, is that if there is a causal relationship between rs4981455 and educational attainment mediated by miR-4707-3p expression, we may be able to detect the influence of miR-4707-3p on a cellular phenotype that would explain the association of rs4981455 with cortical surface area, intracranial volume, and head size.

      An updated discussion summarizes how we were not able to find evidence for a molecular mechanism consistent with the radial unit hypothesis, but that a biological link between the miRNA-eQTL and GWAS phenotypes may yet be uncovered:

      "We did find one colocalization between a miRNA-eQTL for miR-4707-3p expression and GWAS signals for brain size phenotypes and educational attainment. This revealed a possible molecular mechanism by which genetic variation causing expression differences in this miRNA during fetal cortical development may influence adult brain size and cognition (Figure 6H). Experimental overexpression of miR-4707-3p in proliferating phNPCs showed an increase in both proliferative and neurogenic gene markers with an overall increase in proliferation rate. At two weeks in differentiating phNPCs, we observed an overall increase in the number of cells upon miR-4707-3p overexpression, but we did not detect a difference in the number of neurons at this time point. Based on the radial unit hypothesis (26,73), we expected to find an overall decrease in proliferation or increase in neurogenesis upon miR-4707-3p overexpression which would explain decreased cortical surface area. However, our in vitro observations with phNPCs do not point to a mechanism consistent with the radial unit hypothesis by which increased miR-4707-3p expression during cortical development leads to decreased brain size. This has also been seen in similar studies using stem cells to model brain size differences linked with genetic variation (74). Nevertheless, the transcriptomic differences associated with overexpression of miR-4707-3p in differentiating phNPCs suggest this miRNA may influence synaptogenesis or neuronal maturation, but these phenotypes may be better interrogated at later differentiation time points, by jointly expressing HAUS4 and mir-4707, or with assays to directly measure neuronal migration, maturation, or synaptic activity."

      We believe the two cases raised by the reviewer, indirect genetic effects and confounding, which may indeed explain the association between rs4981455 and educational attainment, are less likely to influence the expression of miR-4707-3p measured in developing cortical tissue. This point is addressed, together with a discussion of the caveats of our findings, in the next section.

      4) The paper lacks a discussion on caveats to protect against potential misinterpretation of findings, especially considering the troubled history of linking facial shape and head morphology to human behavior and intelligence. I refer to the last paragraph of Naqvi et al., Nature Genetics 2021, as an example of such discussion. This is particularly crucial given that the frequency of rs4981455 varies across human populations. For example, it is important to state that the GSA and education attainment GWAS findings are in individuals of European descent, and may not necessarily point to an effect in other ancestries or even in European-descent individuals that differ from the GWAS samples in various ways, e.g., socioeconomic status (see Mostafavi et al., eLife 2020). In other words, these findings pertain to variation within the studied samples. On this note, it is important to state the amount of variation in multiple phenotypes explained by rs4981455 (which is likely tiny), and that it by no means determines the phenotype.

      We have added a paragraph to the discussion highlighting the caveats of our analysis and protecting from overinterpretation of our findings:

      "Here we have proposed a biological mechanism linking genetic variation to inter-individual differences in educational attainment. Given the important societal implications education plays on health, mortality, and social stratification, a proposed causal mechanism between genes and education warrants greater scrutiny (75,76). Any given locus associated with educational attainment may be driven by a direct effect on brain development, structure, and function, an indirect genetic effect such as parental nurturing behavior, or confounding caused by discriminatory practices or societal biases (77,78). Given that expression was measured in prenatal cortical tissue, where confounding societal biases are less likely to drive genetic associations and that experimental overexpression of miR-4707 affected molecular and cellular processes in human neural progenitors, the evidence at this locus is consistent with a direct effect of genetic variation on brain development, structure, and function rather than being driven by confounding or indirect effects. However, there are some important caveats to this statement. First, our study only provides evidence for the direct effect on the brain at this one educational attainment locus. Our study does not provide evidence for the direct brain effects of any other locus identified in the educational attainment GWAS. Second, common variation at this locus explains a mere 0.00802% of the variation in educational attainment in a population, so this locus is clearly not predictive or the sole determinant of this phenotype. Third, the GWAS for educational attainment and brain structure were conducted in populations of European ancestry, and allele frequency differences at these loci cannot be used to predict differences in educational attainment or brain size across populations. 
Finally, though both experimental and association evidence suggests a causal link between this locus and educational attainment mediated through brain development, we are unable to directly test the influence of miR-4707-3p expression during fetal cortical development on adult brain structure and function phenotypes. Therefore, we cannot rule out the possibility that the causal mechanism between rs4981455 and adult cognition may be a result of genetic pleiotropy rather than mediation at this locus. Despite these caveats, identifying the mechanisms leading from genetic variation to inter-individual differences in educational attainment will likely be useful for understanding the basis of psychiatric disorders because educational attainment is genetically correlated with many psychiatric disorders and brain-related traits (2,79)."

      We hope that this paragraph contextualizes our results sufficiently to emphasize the high bar that must be surpassed to propose a biological link between a miRNA-eQTL and a risk locus for brain-related traits, while maintaining that we cannot completely rule out the possibility of genetic pleiotropy.

      5) The main colocalization signal is tentatively shown for GSA. However, the authors casually refer to links with "brain size" or "head size" throughout the paper.

      In addition to the locus showing a sub-genome wide significant association to global cortical surface area (GSA) presented in Figure 5, a GWAS for head size (Knol et al., 2020) and a GWAS for intracranial volume (Nawaz et al., 2022) (recently published since submitting the original manuscript) both show genomic associations at this locus for miR-4707-3p expression. The index variants for both traits colocalize with the miRNA-eQTL for miR-4707-3p and their effect directions match: alleles increasing expression of miR-4707-3p show association to decreased head size and decreased intracranial volume. For both of these studies, the summary data is not yet publicly available, preventing us from constructing plots at this locus (similar to those shown in Figure 5) or conducting additional colocalization analyses. To be more consistent throughout the paper, we have replaced many “head size” references with “brain size” when talking about this locus.

    1. Author Response

      Reviewer #2 (Public Review):

      I am not a specialist in cryo-EM, so cannot comment on the technicalities of the structure reconstruction or methods used. I thus focus on the conclusions and observations that the authors provide in the manuscript and their relevance to functional photosynthesis.

      The authors attempt to resolve the structure of PSII from Dunaliella and noticed that three types of PSII could be identified: two conformational states, and a stacked configuration. There is no doubt that these structures add to our current knowledge of PSII and that they exist in abundance upon solubilisation of the sample. My main issue however is the relevance to in vivo conditions, and the efforts to exclude the possibility that pigment loss and conformational states and stacking are a reflection of ex-vivo manipulations.

      Our compact model contains 202 Chl molecules, while the stretched conformation contains 206 Chls. All of the differences in Chl binding are attributed to CP29. We have compiled a table enumerating the different CP29 structures currently available from plants and green algae at a resolution similar to that of our work (Supplementary table 2). In the larger plant complexes (C2S2M2), CP29 contains 14 chls, while CP29 in smaller C2S2 complexes contains 10-13 chls, so it appears that some chl loss from CP29 is associated with the release of LHCIIM. In the green algal structures, CP29 contains fewer chls in general and shows a similar trend. The currently published structure most relevant to our work contains 8 chls (6KAC), a somewhat lower number than both the compact and stretched models (9 and 11 chls, respectively). The stretched orientation, which is the closest match to the known PSII core arrangement, therefore contains more chls than comparable models. While the in vivo configuration is not known, in the sense that it could contain more chls, the current structure is apparently the closest representation of it.

      The presence of CP29 with lower chl content in the Chlamydomonas C2S2 (6KAC, which is in a stretched orientation) supports the conclusion that pigment loss from CP29 alone is not sufficient to trigger the stretched-to-compact transition, although it is associated with it. In general, the precise orientation of CP29 is variable and seems to depend on the binding of additional LHCII; it is possible that some chl loss accompanies these changes in vivo.

      I see a number of questions pertaining to this work. Starting from the two conformations of PSII, compact and stretched, the authors say that both are highly active based on oxygen measurements at a saturating light intensity. In the meantime, they report large variations in the chl content and positions of the chlorophyll molecules in these structures (also compared to other known PSIIs). This gives the impression that one can lose two chlorophylls, and freely modify the distance between others without losing efficiency, certainly a risky conclusion. Are the samples highly active also in light-limiting conditions? It is thought that even tiny movements and alterations in chl-chl distances alter their coupling and spectral properties, how come the variations in this report are so huge? In other words, the assay tests the charge separation activity of the PSII RC in the preps, but not the light-harvesting efficiency.

      The chl content differences reported in this work amount to about 2%. In our opinion this represents quite a low variation in pigment content, of the kind that exists in virtually any experiment involving large complexes. We agree that measurements of activity in limiting light conditions are interesting; however, this goes beyond the scope of the current work. Light harvesting efficiency in PSII is known to vary substantially as a result of additional mechanisms (NPQ in some of its forms) that are not associated with chl loss or gain. While the formation of quenching centers is attributed to small structural changes within specific pigment-protein complexes, what we are showing in this work are structural changes between pigment-protein complexes. These can affect transfer rates between the different complexes but are distinct from the structural changes thought to accompany the formation of quenching centers within specific pigment-protein complexes.
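
      The figure of about 2% follows from the chlorophyll counts quoted in this response (202 in the compact model, 206 in the stretched one). A minimal Python sketch of that arithmetic is given here; the helper name is ours and purely illustrative.

```python
def relative_difference(smaller: int, larger: int) -> float:
    """Fractional difference between two counts, relative to the larger one."""
    return (larger - smaller) / larger


# Chl counts quoted in the response for the two PSII models.
compact_chls = 202
stretched_chls = 206

diff = relative_difference(compact_chls, stretched_chls)
print(f"Chl difference between models: {diff:.1%}")
```

      The choice of denominator (larger or smaller count) changes the result only in the second decimal place, so the "about 2%" statement holds either way.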

      How does one ascertain that the lost chlorophyll molecules in CP29 are not a preparation error? Does slightly increasing the detergent concentration impact the proportion of stretched:compact forms?

      The effect of detergent concentration on the proportion of the different forms was not tested directly. However, we do not detect many differences in the content of lipids or bound detergent molecules between the two conformations, suggesting that for these “ligands” the differences are not substantial. We can only distinguish these two forms at the very last stages of data processing. Given the present cost and availability of cryoEM time, mapping the effect of detergent concentration on the different orientations is outside our reach.

      On a similar note, how do the authors exclude that a certain interaction with this type of grid impacts the distribution of these complexes? Is it identical to a biologically separate preparation of algae? In case of discoveries of this type, it is of high importance to exclude as many possibilities of non-native conditions or influences on the structure.

      It’s hard to completely exclude grid and sample preparation issues. However, we employed relatively standard grids and vitrification conditions. The observed complexes are embedded in vitrified ice and do not interact with the grid directly. The differences we observed are mainly in the orientations of the PSII cores; all the interactions between PSII subunits within each core are preserved and agree with previously published structures. Since the interactions within the core and between cores involve the same physical principles, we think it is a fairly conservative position that the observed core orientations are not an artefact of sample preparation.

      I would further like to encourage the authors to elaborate on the CP29 phosphorylation. What is the proportion of PSIIcomp that are phosphorylated? I assume it is not 100%, as in this case, the authors would propose that this is the effect that modulates between compact and stretched architectures.

      It is difficult to estimate the proportion of observed phosphorylation/sulfinylation. To be detected in the maps, most of the residues (above 50%) are probably modified. We attempted to estimate this by refining the atom occupancies of the Pi molecule on Ser84 and the oxygens attached to Cys218; both values suggested that about 70% of the complexes are modified. With regard to the possibility that these modifications promote the formation of the compact state, we think that this is certainly possible, since these modifications were detected in this state and are in close proximity to each other. However, this observation can also result from the resolution differences between the maps, and the structural implications of both modifications are hard to predict. At this point we prefer to note their existence without further interpretation.

      In line 290, the authors highlight the structural heterogeneity within the two groups of PSII conformations. I would like to see what the distribution looks like for all the structures together: are the two (stretched and compact) specifically forming two heterogeneous distributions? Or is it possible that the distribution between the two is quasi-continuous? In other words, if the structures are not perfectly defined, how do the authors decide that two, and not more or fewer, subtypes exist?

      We went back and refined the initial particle group (containing both compact and stretched orientations) using multibody refinement with masks defining the two PSII monomers. This analysis showed the expected two peaks only in the first principal component (PC), which accounted for ~38% of the variance in the dataset.

      Multibody refinement carried out on the combined particle dataset shows one very large PC accounting for about 38% of the variance and the presence of two distinct peaks in the particle distribution of the first PC.

      From this analysis it’s clear that there are two distinct classes in this particle set (as expected). As none of the other PCs shows any sign of multiple peaks, this analysis suggests that two distinct models are the best representation of this eukaryotic PSII. Whether these are quasi-continuous or distinct is more complex. There is continuity in this representation (particle distributions along a PC). A different picture may appear if features such as the state of CP29 are considered, but the size of CP29 and the remaining heterogeneity do not provide enough signal to carry out this classification at the moment.
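
      One way to put a number on "two peaks along a PC" is Sarle's bimodality coefficient. The sketch below is only an illustration under stated assumptions: it is not the analysis performed for the manuscript, the function name is ours, and the input values are synthetic, deterministic stand-ins for per-particle amplitudes along one principal component.

```python
from statistics import NormalDist, fmean


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient from population moments.

    BC = (skewness**2 + 1) / kurtosis, using non-excess kurtosis.
    A uniform distribution gives 5/9; larger values hint at two modes.
    """
    mean = fmean(values)
    centred = [v - mean for v in values]
    m2 = fmean([c ** 2 for c in centred])
    m3 = fmean([c ** 3 for c in centred])
    m4 = fmean([c ** 4 for c in centred])
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis


# Synthetic stand-ins (assumption): evenly spaced normal quantiles,
# either as a single peak or as two well-separated peaks.
n = 1000
unit_normal = NormalDist(0.0, 1.0)
one_peak = [unit_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
two_peaks = [x - 3.0 for x in one_peak] + [x + 3.0 for x in one_peak]

UNIFORM_REFERENCE = 5 / 9
for label, data in (("one peak", one_peak), ("two peaks", two_peaks)):
    bc = bimodality_coefficient(data)
    verdict = "bimodal-like" if bc > UNIFORM_REFERENCE else "unimodal-like"
    print(f"{label}: BC = {bc:.2f} ({verdict})")
```

      A coefficient of this kind only flags departure from a single peak; it cannot by itself separate two discrete states from a continuum with two preferred positions, which is the distinction discussed above.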

      Considering the stacked PSII, I also have a few concerns. Contrary to previous studies the authors do not assign a functional role to the stacking beyond the structural aspect. This could be better backed by a discussion about the closest chlorophyll a molecules across the stacked PSII, which given the rather large distance shown in fig. 4L seems to be too large for any EET across the stromal gap.

      The closest chl-chl distance that we can measure in the stacked PSII dimer is ~54 Å, with most distances in the ~70 Å range, making EET between stacked complexes very slow. We have added a statement clarifying this to our manuscript. In our opinion a structural role for the stacked PSII dimer is more likely.
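
      The steep distance dependence behind this statement can be sketched with the Förster relation, in which the transfer rate scales as 1/R^6. The Python sketch below compares the two distances quoted above against a reference distance of 10 Å; that reference is an illustrative assumption of ours for a closely packed pigment pair, not a value from the manuscript, and the sketch ignores orientation factors and spectral overlap.

```python
def relative_forster_rate(distance: float, reference: float) -> float:
    """Förster rate at `distance` relative to the rate at `reference`.

    Uses only the 1/R**6 distance dependence; all other factors are
    assumed identical for the two pigment pairs being compared.
    """
    return (reference / distance) ** 6


# Illustrative assumption, not taken from the manuscript (in Å).
ASSUMED_REFERENCE_DISTANCE = 10.0

# Distances across the stacked PSII dimer quoted in the response (in Å).
closest_distance = 54.0
typical_distance = 70.0

for d in (closest_distance, typical_distance):
    ratio = relative_forster_rate(d, ASSUMED_REFERENCE_DISTANCE)
    print(f"{d:.0f} Å: rate is {ratio:.1e} of the rate at the reference distance")
```

      Under these assumptions, transfer across the stromal gap would be several orders of magnitude slower than transfer between closely packed pigments, in line with the structural rather than energetic role proposed for the stack.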

      There is a report that suggests the presence of some density between the stacked PSII - could the authors comment on the differences between it and their work? Are the angles and positions conserved between these types of stacks? https://doi.org/10.1038/s41598-017-10700-8

      We referred to Albanese et al. in our manuscript. We isolated the C2S2 complex from a green alga, whereas the analysis in Albanese et al. was done on C2S2M1 complexes from pea, and this can account for some of the differences. In any case, our conclusion that we find no evidence for protein linkers in the stacked complex is stated clearly. The angles described in Albanese et al. are consistent with our analysis.

      Line 387, the authors state that due to the transient nature of the interactions across the stromal gap, the stacks could be "under-detected" in cryo-ET data. This statement is in my opinion misformulated. For once, the transient interaction argument would apply the same (if not more due to changing conditions induced by the purification process) to the single particle analysis performed in this paper. Second, tomographic volumes detect hundreds of PSII in a suspended state. Any transient interaction that adds up to 25% of particle population in a steady state cell should be clearly visible, while the in situ data suggests not more than random cross-stromal-gap orientations. Of course, this can be a specificity of Chlamydomonas or a particular growth condition. The statement used by the authors could be indeed converted into: the PSII stacks are over-detected in vitro, and it is certainly a simpler explanation for their presence. It is also important to mention that PSII stacking alone is not the only reason for grana architecture - stacking with the antenna of larger complexes, absent in the authors' preparation could also contribute to grana maintenance; and auxiliary proteins such as CURT help with this issue as well. Here a recent demonstration of the importance of minor antenna should probably be also cited: https://doi.org/10.1101/2021.12.31.474624

      We used the term “flexible” rather than “transient” to describe the interactions within the stacked PSII dimer. Our data (and tomographic data) do not contain any temporal component. By the term “under-detected” we refer to the fact that PSII is mainly detected by its luminal extrinsic subunits. The flexibility detected in our analysis may affect the concurrent visibility of these features in the PSII complexes making up an individual PSII stack. Specifically, Wietrzynski et al. mainly analyze C2S2M2L2 complexes, while our analysis only contained C2S2 complexes. It is likely that the different amount of bound LHCII affects PSII stacking as well. For example, Wietrzynski et al. show some overlap between LHCII complexes and little overlap between cores in the larger complexes they analyzed. We observe mainly core-to-core overlap with little LHCII overlap in the smaller C2S2, although we did not observe any states where LHCs were not included in what appears to be the binding interface. We agree with the reviewer on the relevance of the Lhcb and CURT contributions to stacking but prefer to focus on what was directly demonstrated in our data. We clearly note that we are discussing in vitro results.

      Taking these last thoughts, I would like to finish by mentioning one more thing - almost philosophical. The authors are certainly at the forefront of the booming cryoEM revolution in biology which is profoundly changing the way we understand the living. There is absolutely zero doubt that this powerful technique is of the highest interest. But a growing number of structures of photosynthetic complexes remain puzzling, in particular with regard to their abundance in vivo (such as the PSII stacks) and functional relevance. How do we ascertain that these interactions are not due to in vitro preparation (isolation from cells, solubilisation)? Which ways can we use to try to exclude this (simple) hypothesis? I suggest that at least a small extent of biological replicas - experiments performed on separate batches, in different technical conditions, with slightly altered solubilization conditions, and so on - could shed light on the nature of these structures and their occurrence in vivo. Technical reps of the freezing+analysis pipeline could also be tried to see the variability. This would strongly reinforce this manuscript and its conclusions, and while not completely unequivocal (the stacked PSII, for example, could form upon each purification), a quantification of the effects would be of high interest.

      We certainly share the reviewer’s hope of being able to conduct cause-and-effect cryoEM experiments covering a complete set of experimental parameters. This is still beyond reach in terms of time and cost. Within each cryoEM experiment, however, all the analysis is consistent and, more importantly, transparent with regard to image analysis, which is the most important factor in our opinion. Preparation artefacts are always a possibility but, in our opinion, cryoEM is not affected by them more than other techniques are. As we mentioned above, the particles are observed suspended in vitreous ice; this is no different from, and arguably better than, numerous low-temperature spectroscopic observations on samples suspended in a glass state, or crystals obtained in the presence of high concentrations of various agents. One thing that validates structural studies is the chemical detail (bond lengths, angles, etc.) underlying every model, which is consistent with known values to close tolerances.

      Reviewer #3 (Public Review):

      In this manuscript, Caspy et al. present a detailed structural analysis of eukaryotic photosystem II (PSII) isolated from the green alga Dunaliella salina. By combining single-particle cryo-EM with multibody refinement, the authors not only reveal a high-resolution (2.4Å) structure of the eukaryotic PSII, but also demonstrate alternate conformations and intrinsic flexibility of the overall complex. Stretched and compact conformations of the PSII dimer were readily identified within the single-particle dataset. From this structural analysis, the authors propose that excitation energy transfer properties may be modulated by changes in transfer distance between key chlorophyll molecules observed in different conformational states of the PSII dimer. Due to the high resolution of the maps obtained, the authors identify post-translational modifications and a sodium binding site based on the observed cryo-EM maps. Additionally, the authors analyze PSII complexes in stacked and unstacked configurations, and find that compact and stretched states also exist within the stacked PSII complexes. From their cryo-EM maps, the authors demonstrate that there is no direct protein-protein interaction between stacked PSII complexes, and rather propose a model wherein long-range electrostatic interactions mediated by divalent cations such as magnesium, can facilitate PSII stacking.

      The conclusions and models presented in the manuscript are mostly well justified by the data. The cryo-EM maps are high quality and the models appear generally well refined. However, some aspects of data processing and analysis, as well as the resultant conclusions need to be clarified.

      1) In general, it is not clear from the cryo-EM processing workflow (suppl. Fig 1) or the methods section when exactly symmetry was applied during 3D classification and refinement. In the case of C2S2 unstacked particles, when was symmetry first applied in the overall processing workflow? To identify the compact and stretched configurations of C2S2, did the 3D classification without alignment (and/or the refinement preceding this classification) have C2 symmetry applied? If so, have you considered the possibility that some particles may actually be asymmetric in some regions?

      We modified figure S1 to clearly indicate the use of symmetry and particle expansion. In general, we refined most of the particle sets without symmetry (C1). At the final processing stage of the unstacked PSII sets, after we separated both conformations, we used C2 symmetry to expand the data, and this was followed by multibody refinement. No symmetry or symmetry expansion was used for the stacked PSII particle sets.

      2) Following multibody refinement in Relion individual maps and half-maps for each body will be generated. There is no mention in the methods of how these individual maps for each C2S2 "monomer" were combined to produce an overall map of the dimer following multibody refinement. There are several methods currently used to combine such maps, including taking the maximum or average of the two maps or using a model-based approach in phenix. The authors should be explicit about the method they used, any potential artifacts that may develop from this map combination process, and/or the interface between masks used in multibody refinement.

      We used phenix.combine_focused_maps to combine the maps. This is now indicated in the Methods section.

      3) In addition to the point raised above, following multibody refinement there will be an individual FSC curve and resolution for each body. However, in supplemental figure 2 and supplemental table 1, only a single FSC curve and resolution are reported. Are these FSC curves/resolutions only reported for the better of the two bodies? If not, how was a single resolution calculated for the overall map of combined bodies?

      Both FSC curves were calculated and were highly similar, as expected following C2 expansion. This can also be evaluated from the local resolution maps which are highly similar between the two bodies. The reported resolutions are all taken from the displayed FSC curves generated through relion PostProcess.

      4) One of the major conclusions from the 3D classification and multibody refinement is that conformational changes and inherent flexibility of the PSII dimers have the potential to change distances between cofactors in the complex, ultimately leading to altered excitation energy transfer. However, it is unclear whether or not the authors believe one conformation over another may more readily support the evolution of oxygen. It would be nice if the authors could elaborate slightly upon this topic in the discussion.

      As discussed above, the structural changes associated with the formation of quenching centers are not expected to be detected in the current work. The changes we observe can, however, affect the transfer to such centers and by doing so can play an important part in PSII biology. We do not detect any changes around the OEC, and we find no reason to think the two conformations differ with respect to their ETC.

      5) Along the lines of point 4 above, on line 95 the authors claim that "the high specific activity of 816 umol O2/ (mg Chl * hr) suggest that" both the C2S2 compact and stretched conformation are highly active. However, it is not clear to me why this measure of specific activity would suggest that both PSII conformations should have "high" activity. Maybe a reference here would help guide readers to previous measures of specific activity?

      Looking at specific activities from previously published structural studies of eukaryotic PSII, we find that Sheng et al., 2019 reported a specific activity of 272 µmol O2/(mg Chl * hr). This difference can stem partially from the presence of larger complexes in their preparation, and the value is comparable to the activity that we measured in our As fraction (276 µmol O2/(mg Chl * hr), Figure 1-figure supplement 9). Reported specific activity values from plants (Pisum sativum) are also similar: Su et al. reported a maximal value of 288 µmol O2/(mg Chl * hr), again for larger complexes, which can explain some of the difference. However, the specific activity measured for the C2S2 PSII isolated in the current study is 2.8 times higher than this value, more than the differences in chl content, which range between 1.5 and 2 times in favor of the larger complexes. If either one of the conformations were less active, the other conformation would have to display an even higher specific activity, which seems less likely. In addition, we find no difference around the oxygen evolution center or in the peripheral luminal subunits in either shape or map strength, so both orientations show highly similar structures around the regions that determine the oxygen evolution activity.
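
      The fold differences referred to in this response can be checked directly from the quoted activities. A minimal Python sketch follows; the dictionary labels are ours, and since all values share the same units (O2 per mg Chl per hour) the ratios are unit-free.

```python
# Specific activities as quoted in the response (same units throughout).
this_study_c2s2 = 816.0
published = {
    "Sheng et al., 2019": 272.0,
    "Su et al. (Pisum sativum)": 288.0,
    "As fraction (this study)": 276.0,
}

# Fold difference of the C2S2 preparation over each comparison value.
fold_over = {name: this_study_c2s2 / value for name, value in published.items()}

for name, fold in fold_over.items():
    print(f"{name}: {fold:.2f}-fold lower than the C2S2 preparation")
```

      All three ratios fall close to 3, above the 1.5 to 2 times difference in chl content cited for the larger complexes.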

      6) It is claimed that "more than 2100 water molecules were detected in the C2S2 compressed model", and the water distribution is shown in Figure 3. Obtaining resolutions capable of visualizing waters with cryo-EM is still a significant challenge. Upon visual inspection of the map supplied, it appears that several of the waters that were built into the atomic model simply do not have supporting peaks in the coulomb potential map above the level of noise. While some of the modeled waters are certainly supported by the map, in my opinion, there are many waters that simply are not, or at best are questionable. What method or tool was originally used to build waters into the model, and how were these waters subsequently validated during structure refinement?

      We followed standard methods for water placement and refinement in the preparation of the model, in addition to manually curating the water structure. However, in light of the reviewer’s comment we undertook additional rounds of refinement and inspection of the water molecules in the model. We removed a few hundred water molecules, so that the total number of water molecules is now around 1700. All the water molecules in the present model should be well supported at map values higher than 2.5 σ. In our opinion the current water model should be regarded as conservative, and it underestimates the number of bound water molecules. This also led to some improvements in additional validation statistics of the model, which are listed in Table 1. The new model has been deposited in the PDB and the new PDB validation report is included in our resubmission.
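
      The 2.5 σ criterion can be pictured as a simple filter on the map value at each water position. The Python sketch below is hypothetical: it does not reproduce the refinement software actually used, the function names are ours, and the map statistics and water values are made-up illustrative numbers.

```python
def to_sigma(value: float, map_mean: float, map_sd: float) -> float:
    """Express a raw map value in units of the map's standard deviation."""
    return (value - map_mean) / map_sd


def keep_supported_waters(water_values, map_mean, map_sd, threshold=2.5):
    """Return only the waters whose map value exceeds `threshold` sigma."""
    return {
        name: value
        for name, value in water_values.items()
        if to_sigma(value, map_mean, map_sd) > threshold
    }


# Made-up numbers for illustration only (assumption, not manuscript data).
example_map_mean = 0.0
example_map_sd = 0.02
example_waters = {"HOH 1": 0.080, "HOH 2": 0.030, "HOH 3": 0.051}

kept = keep_supported_waters(example_waters, example_map_mean, example_map_sd)
print(sorted(kept))
```

      Raising the threshold trades completeness for confidence, which is why a model filtered in this way tends to undercount rather than overcount bound waters.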

      7) The authors claim to identify several unique map densities during model building. One of these is a sodium ion close to the OEC, which is coordinated by D1-His337, several backbone carbonyls, and a water molecule. When looking closely at the cryo-EM map supplied, it appears that the coulomb potential map is quite weak for this sodium, and is only visible at quite low contour levels. In fact, the features for the coordinating water, and chloride ions located ~7-9A away are much stronger than the sodium. Do the authors have any explanation for why the cryo-EM map is significantly weaker for the sodium compared to the coordinating water or chloride ions in the same general vicinity? Similar to what they did for the other post-translational modifications, the authors should consider showing the actual cryo-EM map for the bound sodium in supplemental Figure 10 a,b.

      Our main support for the placement of a Na+ ion in this location stems from the analysis of Wang et al. Our maps show a density that is discernible at 4 σ, with an elongated shape suggesting the presence of multiple atoms/waters. Although in principle positive ions should have very strong densities in cryoEM maps due to their interactions with electrons, other factors such as occupancy, coordination and B-factor also play a role, making the distinction between water and sodium complicated and case specific. The sodium peak is not observed in unsharpened maps (as is the case for most of the water molecules that occupy conserved positions).

        We collected a few examples from comparable cases (cryo-EM maps of similar resolution ranges) where the presence of sodium ions is highly probable based on additional evidence. These map densities highlight the factors we discussed above. In cases ‘a’ (dual oxidase 1 prepared in high sodium conditions) and ‘b’ (human voltage-gated sodium channel), Na+ is observed in highly coordinated states and, especially in ‘a’, shows the expected increase in density values compared to water molecules. However, cases ‘d’ (human Na+/K+ P-type ATPase) and ‘e’ (voltage-gated sodium channel) appear very similar to the proposed Na+ assignment in PSII. We conclude that map density alone is not enough to distinguish between Na+ and water molecules, and we rely on the additional experiments described by Wang et al., which show increased PSII activity at elevated Na+ levels in basic conditions.

      8) The cryo-EM maps showing CP29-Ser84 phosphorylation and CP47-Cys218 sulfinylation are quite convincing. However, it is interesting that these modifications are only observed in the compact conformation, and not in the stretched conformation. Can the authors elaborate on whether or not they believe the compact and stretched conformations could be a result of these posttranslational modifications, or vice versa?

      This is an interesting suggestion. In our opinion it is less likely that the modifications themselves trigger the transition between compact and stretched states. It is not clear how these modifications would stabilize the compact over the stretched state. It is equally likely that these modifications are somehow triggered by the structural change. We cannot be certain that these modifications are not present in the stretched orientation as well but remain unobserved due to resolution differences. The correlation between the states and the post-translational modifications should be verified before their possible roles in the transitions are discussed.

      9) Do the authors believe that PSII dimers in the solution can readily interconvert between compact and stretched conformations? Or is the relative ratio of these conformations fixed at the time of membrane solubilization with decyl-maltoside?

      We think it is more probable that the transition between these states occurs in the membrane phase. The main reason is that pigment loss and structural transitions in CP29 are more likely to occur in the membrane than in aqueous/micelle environments.

      10) The model proposed for divalent cation-mediated stacking of PSII dimers is compelling, and seems to be in agreement with previous investigations that observed a lack of stacked dimers in cryo-EM preparations lacking calcium/magnesium. However, my understanding from reading the methods section is that the observed lack of density between the stacked PSII dimers was inferred from maps obtained after multibody refinement. Based on the way the masks to define bodies were created for multibody refinement (Fig. 4A), the region between stacked dimers would be highly prone to map artifacts following multibody refinement. Have the authors looked closely at the interfacial region between stacked dimers following conventional 3D classification/refinement to ensure that there are indeed no features observed in the interfacial region even at low contour levels?

      We’ve made several attempts to resolve differences in the space between the stacked PSII dimers. These include focused classification with masks containing selected volumes from this region and masks that include only one of the stacked PSII dimers, to avoid signal subtraction in this region. None of these revealed any discernible features in this region. In addition, any stable binding of a bridging protein across the stacked dimer would probably be at least partially visible as additional density over the unstacked PSII. We searched for such features and found none.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses

      The author's approach, as with traditional approaches to molecular identification of vector species, relies on expert entomologists capable of identifying mosquitoes in the field which is rare in most places. The authors do not provide citations for the taxonomic keys used for morphological identification, which in many places are outdated or unavailable for specific locations.

      We have added references for taxonomic identification keys in lines 677–679.

      The authors give no explanation as to why they chose rRNA-seq as their method of next-generation sequencing, which is most commonly used for transcriptomics, instead of traditional DNA-based metagenomics which is more commonly used to define community relationships as would be more appropriate for this study.

      We have added a sentence in the Introduction (lines 65–66) to explain why RNA-seq is a frequent choice for surveillance and virus discovery in mosquitoes.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper shows that nuclear pore complex components are required for Kras/p53 driven liver tumors in zebrafish. The authors previously found that nonsense mutation in ahctf1 disrupted nuclear pore formation and caused cell death in highly proliferative cells in vivo. In the absence of this gene, there are multiple mitotic functions involving the nuclear pore that are defective, leading to p53 dependent cell death. Heterozygous fish are viable but have reduced kras/p53 liver tumor growth, and this is associated with multiple nuclear and mitotic defects that lead to cancer cell death/lack of growth. This therapeutic window suggests targetability of this pathway in cancer. I think the data are robust, rigorous, and clearly presented. I believe this in vivo work will encourage therapeutic targeting of NPCs in cancer.

      We are pleased that this reviewer believes that our data are robust, rigorous, and clearly presented and that our in vivo work will encourage therapeutic targeting of NPCs in cancer.

      Reviewer #2 (Public Review):

      Overall this is a very interesting and important paper that demonstrates a novel synthetic interaction between nucleoporin inhibition and oncogene-driven hyperproliferation. This work is especially significant because of the paucity of effective treatments for hepatocellular carcinoma (HCC). The authors' demonstration that the Nup inhibitor Selinexor decreases larval liver size in KRAS-overexpressing zebrafish but does not cause toxicity in wild-type animals lays the groundwork for exploiting this class of drugs in HCC treatment. This paper represents an elegant demonstration of the utility of zebrafish models in cancer studies. The relevance of this work to human cancer is supported by the authors' studies using TCGA data, wherein they demonstrate that decreased NUP expression is associated with increased survival in HCC.

      Other major strengths of the paper include beautiful pictures demonstrating that ahctf1+/- decreases the density and volume of nuclear pores in TO(kras) larvae and increases the rate of multipolar spindle formation, misaligned chromosomes, and anaphase bridges. The experiments are very well-controlled, including detailed analysis of the effects of ahctf1 heterozygosity and Selinexor on wild-type animals. The inclusion of distinct methods for disrupting nucleoporins (ranbp2 heterozygosity and drug treatment) bolsters the authors' conclusion that this represents a viable drug target in HCC.

      My major concerns are as follows:

      1) The authors state that "the beneficial effect of ahctf1 heterozygosity to reduce tumour burden persists in the absence of functional Tp53, due to compensatory increases in the levels of tp63 and tp73". However, tp63 and tp73 appear similarly upregulated in ahctf1 heterozygotes regardless of tp53 status. The authors do not provide enough evidence that tp63 and tp73 are compensating for tp53 loss. An alternative possibility based on the data presented is that the effects of ahctf1+/- are independent of tp53 family members, and the effects on apoptosis go through a different pathway.

      We agree with this reviewer that we did not provide enough evidence that tp63 and tp73 are compensating for tp53 loss. Accordingly, we have addressed this issue comprehensively.

      2) The authors state in multiple locations that nucleoporin inhibition decreases tumor burden. In my opinion, this is not strictly correct. The TO(kras) model clearly results in HCC in adults, but it's a little unclear whether the larval liver overgrowth is truly HCC or not based on the original paper by Nguyen et al. (2012 Dis Model Mech).

      We agree with these comments and accordingly, we performed several new experiments in adult fish.

      Reviewer #3 (Public Review):

      The nuclear transport machinery is aberrantly regulated in many cancers in a context-dependent fashion, and mounting evidence with cultured cell and animal models indicates that reducing the activity or expression of certain nuclear transport proteins can selectively kill cancer cells while sparing nontransformed cells. Here the authors further explore this concept using a zebrafish model for hepatocellular carcinoma (HCC) induced by liver-specific transgenic expression of oncogenic krasG12V. The transgene causes greatly increased liver size by day 7 in larvae, associated with a gene expression profile that resembles early-stage human HCC. This study focuses on Ahctf1, a nuclear pore complex (NPC) protein known to be essential for postmitotic NPC assembly. Using the krasG12V background, the authors analyze animals that are heterozygous for a recessive mutation in the ahctf1 gene that leads to ~50% reduction in ahctf1 mRNA (and likely the encoded protein). The authors show that the ~4-fold increase in liver volume of krasG12V animals is reduced by ~1/3 in the ahctf1 heterozygous mutants. This is associated with increased apoptosis, decreased DNA replication, up-regulation of pro-apoptotic and cdk-inhibitor genes, and down-regulation of an anti-apoptotic gene. These effects were found to be substantially Tp53-dependent. Consistent with previous Ahctf1 depletion studies, hepatocytes of ahctf1 heterozygotes show decreased NPC density at the nuclear surface, elevated levels of aberrant mitoses and increased DNA damage/double-stranded breaks. Finally, the authors show that combining the ahctf1 heterozygous mutant with a heterozygous mutation in another NPC protein (RanBP2) or treating animals with a chemical inhibitor of exportin-1 (Selinexor) can further reduce liver volume. Overall they suggest that combinatorial targeting of the nuclear transport machinery can provide a therapeutic approach for targeting HCC.

      This is an interesting study that bolsters the notion that reduction in the levels of discrete nucleoporins (and/or inhibiting specific nuclear transport pathways) can result in cancer cell-selective killing. Moreover, the work extends previous studies involving cultured cell and mouse xenografts to a new cancer model (HCC) and nucleoporin (Ahctf1). Whereas the authors describe multiple aberrant cellular phenotypes associated with the dosage reduction in ahctf1, the exact causes for reduction in liver size in the krasG12V model remain unclear. Although it would be desirable to parse effects of Ahctf1 related to NPC number, aberrant mitoses, licensing of DNA replication and chromatin regulation, this is a tall order at present, given the limited understanding of Ahctf1. However, useful insight on these and related questions could be gained with further analysis of the system as outlined below.

      We are pleased this reviewer thinks this is an interesting study that bolsters the notion that reduction in the levels of discrete nucleoporins (and/or inhibiting specific nuclear transport pathways) can result in cancer cell-selective killing. This reviewer also suggests that useful insight on these and related questions could be gained with further analysis of the system as outlined below:

      1) In the krasG12V model, it would be helpful to distinguish the contribution of increased cell death vs decreased cell proliferation to the change in liver size seen with heterozygous ahctf1. Is this predominantly due to decreased proliferation?

      We think this question is difficult to address, because the relative contributions of the two processes may vary with time. Our data show definitively that by 7 dpf, the impact of ahctf1 heterozygous mutation has disrupted multiple cellular processes, leading to a 40% increase in the number of hepatocytes expressing Annexin 5 (dying cells), and a 40% decrease in the number of hepatocytes incorporating EdU over a 2 h incubation (fewer cells in S-phase). Both responses are likely to contribute to the reduction in liver volume observed in response to ahctf1 heterozygosity. It is worth stating that in our experiments, we captured snapshots of apoptosis and DNA replication in the livers of larvae at 7 days post-fertilisation after 5d of dox treatment/KrasG12V expression. To answer the Reviewer’s question properly, we would need to monitor the behaviour of individual cells over time. If such experiments were technically possible, we think that some cells that undergo growth arrest in response to dox treatment might ultimately succumb to apoptosis (unless dox treatment is withdrawn) while other cells might enter into a state of prolonged senescence. However, given the technical challenges, we did not attempt to test this in the current manuscript.

      2) It would be good to know whether the heterozygous ahctf1 state blunts the overall level of Ras activity in krasG12V animals.

      We have addressed this interesting question thoroughly in new Fig. 1g, h. To do this, we used a commercial RAS-RBD pulldown kit followed by western blot analysis to determine the levels of activated GTP-bound Kras protein. Our results demonstrate that the levels of GTP-bound Kras protein, expressed as a proportion of total Kras protein, do not change in response to ahctf1 heterozygosity. We conclude from these data that the potentially therapeutic value of reduced ahctf1 expression in a cancer setting is not caused by inhibiting Kras activity.

      3) Notwithstanding the analysis of Tp53 target genes presented in this study, it would be helpful to see detailed transcriptional profiling of hepatocytes in the krasG12V model with the heterozygous ahctf1 mutation, and to assess the effects of Selinexor. GSEA type analysis offers a way to start untangling the effects of these pathways. Moreover this analysis could provide insight on the relevance of this model to human HCC.

      We used RNAseq to address the relevance of our larval model to human HCC. Specifically, we performed differential gene expression analysis to identify up- and downregulated genes in cohorts of ahctf1+/+ (WT) larvae versus dox-treated ahctf1+/+(WT);krasG12V larvae. We used gene set enrichment analysis to compare these differentially regulated transcripts with the gene expression signature of 369 patient samples in the Liver hepatocellular carcinoma (LIHC) dataset versus healthy liver samples in the TCGA. These analyses revealed a significant association between the patterns of gene expression in our larval model of zebrafish HCC and those of human HCC (Fig. 1-figure supplement 1c, d).

      The genetic experiments we report in Figures 4, 5, 6 show that WT Tp53 is required for the reductions in liver enlargement (Fig. 4), apoptosis (Fig. 5) and DNA replication (Fig. 6) that occurs in response to ahctf1 heterozygosity in dox-treated krasG12V larvae. We also used RT-qPCR to show that a Tp53-mediated transcriptional program was activated in these ahctf1 heterozygous livers (Fig. 5). Similarly, in adult livers, ahctf1 heterozygosity triggered the upregulation of Tp53 target genes, including pro-apoptotic genes (pmaip1, bbc3, bim and bax) and cell cycle arrest genes (cdkn1a and ccng1) (new Fig. 6-figure supplement 1). These results show that to obtain the full potential of ahctf1 heterozygosity in reducing growth and survival of KrasG12V-expressing hyperplastic hepatocytes requires activation of WT Tp53. This is an important conclusion from our paper that is likely to be relevant in a clinical setting, for instance in patient selection, if ELYS inhibitors are developed for the treatment of HCC in which the KRAS/MAPK pathway is activated.

      Also, one reviewer mentions performing genome-wide transcriptional profiling of hepatocytes in the krasG12V model in response to ahctf1 heterozygosity and in the presence and absence of Selinexor treatment. While these are potentially interesting experiments, they are substantial in nature and not crucial for the main messages of our paper. Therefore, we respectfully contend that they are beyond the scope of the current manuscript.

      4) Functions of Ahctf1 in regard to chromatin regulation could be compromised in this model. Scholz et al (Nat Gen 2019) report that Ahctf1 is involved in increasing Myc expression via a gene gating mechanism. It would be good to know what the effects are in this system.

      The Scholz, 2019 and Gondor, 2022 papers from the same group, are very interesting in that they demonstrate a novel role for the ELYS protein in addition to the ones we pursued in our paper. The authors showed that in HCT116 cells, a human colorectal cancer cell line in which proliferation is driven by aberrant WNT/CTNNB1 signalling, the longevity of nascent MYC mRNA was increased by accelerating its movement from the nucleus to the cytoplasm, thereby preventing its degradation by nuclear surveillance mechanisms. The authors showed that siRNA knockdown of AHCTF1 in HCT-116 cells reduced the rate of nuclear export of MYC transcripts without changing the transcriptional rate of the MYC gene. They proposed a mechanism that depended on the formation of a complex chromatin architecture comprising transcriptionally active MYC and CCAT1 alleles plus proteins including β-Catenin, CTCF and ELYS. Together these interacting components guided nascent MYC mRNA molecules to nuclear pores, enhanced their export to the cytoplasm to be translated, resulting in activation of a MYC transcriptional program that induced expression of pro-proliferation genes.

      In theory, this role of ELYS in protecting MYC from nuclear degradation might extrapolate to other cancer settings where MYC expression is elevated. While interplay between MYC and mutant KRAS to enhance cancer growth has been previously reported, most emphasis to date has been on the role of mutant KRAS in increasing the stability of the MYC protein, for example via RAS effector protein kinases (ERK1/2 and ERK5) that stabilise MYC by phosphorylation at S62 (Farrell and Sears, 2014: https://doi.org/10.1101/cshperspect.a014365) (Vaseva and Blake, 2018: https://doi.org/10.1016/j.ccell.2018.10.001). While we appreciate the novelty of the recent papers, the current findings are limited to β-Catenin-activated HCT-116 cells and may not be relevant to our zebrafish model of mutant Kras-driven HCC. Accordingly, we have not allocated a high priority to following this up in our current manuscript.

      6) The synthetic lethality argument pressed in this manuscript seems exaggerated. Standard anti-cancer treatments typically target several cellular pathways, and nucleoporins directly affect a multiplicity of pathways besides nuclear transport.

      While we do not disagree that standard anti-cancer treatments may target several cellular pathways, we believe our data are consistent with the accepted definition of a synthetic lethal interaction whereby single mutations in two separate genes (kras and ahctf1) cooperate to cause cell death, whereas cells harbouring just one of these mutations are spared.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Context and definitions for stochasticity and heritability: The authors provide well-referenced introductions and explanations throughout the manuscript. However, key understanding of concepts for their central hypothesis on transient heritability are not shared until well into the results sections (Lines 215-227), leaving the introduction somewhat unclear on the authors thinking and motivation. The manuscript would benefit by including clear definitions of "stochastic", "transiently heritable", and "heritable" and their relationships to "intrinsic" and "deterministic" in the introduction.

      Regarding the first point, we agree that it is important to provide clear definitions early. Therefore, we added much more detail to the introduction (see tracked changes), including the following definitions and additional explanations:

      Multilayered stochasticity: “stochasticity originating from different levels over the course of an infection.“

      “Importantly, although the terms stochasticity and determinism seem highly dichotomous, deterministic features (e.g., epigenetic regulation) are often, if not always, stochastically regulated (Zernicka-Goetz and Huang, 2010). However, in cellular decision-making, the major difference between a stochastic process and a deterministic process boils down to the effects of (varying) inputs on dictating (varying) outputs. In fact, a stochastic process is characterized by the exact same stimulus leading to varying response outcomes, often as a result of varying host-intrinsic factors (Symmons and Raj, 2016). In contrast, a deterministic process is characterized by an outcome (e.g., IFN-I production) that is fixed, or at least to a large degree, while the input can be variable. How cells are epigenetically predisposed, in turn, can again be a stochastic process, similar to the fundamentals of developmental biology in which cells are randomly pushed towards deterministic outcomes (Zernicka-Goetz and Huang, 2010).”

      “Transient heritability refers to heritable epigenetic profiles [e.g., profiles encoding cellular fates for the production of IFN-Is] that only transfer over a couple of generations, as observed across cell types and systems including cancer drug resistance (Shaffer et al., 2020), cancer fitness (Fennell et al., 2022; Oren et al., 2021), NK cell memory (Rückert et al., 2022), HIV reactivation in T cells (Lu et al., 2021), epithelial immunity (Clark et al., 2021), and trained immunity (Katzmarski et al., 2021).”

      “Besides a growing body of evidence on the role of transient heritable fates dictating cellular behaviors, the effects of population density, often referred to as quorum sensing, are getting more established for immune (signaling) systems (Antonioli et al., 2019; Polonsky et al., 2018; Van Eyndhoven and Tel, 2022). On top of the intrinsic features characterized by stochasticity and determinism, individual immune cells can communicate in various ways to elicit appropriate systemic immune responses. Typically, cytokine-mediated communication is categorized into two types: autocrine and paracrine signaling. Autocrine signaling is defined by cells secreting signaling molecules while simultaneously expressing the cognate receptor. Paracrine signaling is defined by cells either secreting signaling molecules without expressing the cognate receptor, or cells expressing the receptor without secreting the molecule. In essence, quorum sensing can be considered a phenomenon in which autocrine cells determine their population density based on cells engaging in neighbor communication, but without self-communication (Doğaner et al., 2016; Van Eyndhoven and Tel, 2022). Especially in the presence of other competitive decision makers [i.e., cytokine consumers and producers], it is critical for individual cells to assess cellular density, and act accordingly (Oyler-Yaniv et al., 2017).”

      2) Generalizability of findings to other cell types, systems, and triggers: The cell line and Poly(I:C) delivery method used by the authors lacks sufficient characterization to extend the conclusions derived from its use. Notably, the NIH3T3-IRF7-CFP cell line expresses IRF7 constitutively and thus may only be a good model for cells with similar expression levels; many primary cells only express IRF7 at low levels or not at all until stimulated (PMID: 2140621). The conclusions would be greatly strengthened by demonstrating similar first responder dynamics/heritability in other cell types. The experiments measuring the efficiency of Poly(I:C) delivery by transfection lack sufficient resolution to determine if the Poly(I:C) is intracellular or membrane bound. IFN-I response kinetics, and potentially quality, would likely be distinct between cytosolic and endosomal sensing and may impact the likelihood of becoming a first responder.

      Regarding the generalizability of findings to other cell types, systems, and triggers, we thank reviewer 1 for bringing up this crucial point. IRF7 is expressed at a low level in most cells and is strongly induced by type I IFN-mediated signaling (Marie et al., 1998; Sato et al., 1998b; Honda et al., 2006). Our use of the word “constitutively” referred to the IRF7 molecules always being fluorescent, not to IRF7 always being highly expressed in these cells. In this respect, NIH3T3 cells are similar to all other cells except plasmacytoid dendritic cells, which are known for their high background levels of IRF7. We changed the revised manuscript accordingly:

      “Accordingly, we used a NIH3T3:IRF7-CFP reporter cell line, expressing low, physiological background levels of IRF7-CFP fusion proteins, to monitor signaling dynamics during early phase IFN-I response dynamics (Figure 1b).”

      Regarding the comparison with other cell types, we emphasized the similar numbers of responders observed in plasmacytoid dendritic cells (an argument that intrinsic differences in IRF7 background levels do not determine which cells become responders). We changed the revised manuscript accordingly:

      “In short, IFN-I responses are elicited by fractions of so-called first responding cells, also referred to as ‘precocious cells’ or ‘early responding cells’, which start the initial IFN-I production upon viral detection, both validated in vitro, in vivo, and across cell types (Bauer et al., 2016; Hjorton et al., 2020; Patil et al., 2015; Shalek et al., 2014; Van Eyndhoven et al., 2021a; Wimmers et al., 2018).”

      “This percentage is in line with what has been found across literature, species [i.e., human and mice] and cell types [i.e., fibroblasts, monocyte derived dendritic cells, plasmacytoid dendritic cells], which ranges from 0.8 to 10% of early responders, emphasizing the elegant yet robust feature of only a fraction of first responding cells driving the population-wide IFN-I system (Bauer et al., 2016; Drayman et al., 2019; Patil et al., 2015; Shalek et al., 2014; Van Eyndhoven et al., 2021a; Wimmers et al., 2018).”

      Regarding the hypothesis brought up by the reviewer on the role of cytosolic versus endosomal sensing impacting IFN-I response kinetics, and potentially quality, we hypothesize otherwise. Shalek and colleagues tested LPS (TLR4 ligand), PIC (TLR3 ligand, endosomal), and PAM (TLR2 ligand), all eliciting similar early responding cells, which they called precocious cells. This argues that the phenomenon of first responders is independent of the type of stimulation. Besides, for plasmacytoid dendritic cells, both R848 (TLR7/8 ligand), and CpG-C (TLR9 ligand) elicit very similar early IFN-I responses. In contrast, R848 and CpG-C elicit very different late IFN-I response dynamics, reflected by the fraction and activation dynamic of second responders (yet unpublished). We clarified accordingly:

      “Moreover, various stimuli (live and synthetic) targeted membrane, cytosolic, and endosomal receptors, arguing that the mode of activation is not driving the discrepancies in responder fates.”

      3) Epigenetic regulation of transient heritability: To test the contribution of epigenetic regulation on first responder fate, the authors treat their cells with DNMTi. While treatment with this drug does increase the proportion of first responder cells, the authors don't provide evidence that the mechanism of action is mediated by inhibiting DNA methylation. This is further confounded by the reduced responder frequencies in DNMTi treated cells transduced with Poly(I:C) (Fig 4g). The authors offer an explanation for this observation, but their reported data (Fig 4h) doesn't measure whether DNMTi, leads to latent retrovirus activation, broader demethylation, or a combination of the two.

      We are well aware that the hypothesis on retrovirus activation is inconclusive. Unfortunately, we currently do not have the tools to properly assess this hypothesis, so we can only speculate. However, we were able to assess the effects of a different epigenetic drug [i.e., HDACi], as suggested later by the reviewer. Therefore, to strengthen our data interpretation, we added the following additional information and experimental data to the revised manuscript:

      “Also the treatment with varying dosages and durations of Trichostatin A, a histone deacetylase inhibitor (HDACi), increased the number of responding cells (Supplementary Figure 5).”

      “The rather long timescales of switching from responders to non-responders, and the other way around, imply epigenetic mechanisms at play, and indeed, prior work has indicated an important role for epigenetics dictating IFN-I response dynamics (reviewed in (Barrat et al., 2019)).”

      “Both methylation and histone acetylation have been suggested in dictating transient heritable cellular fates (Clark et al., 2021; Lu et al., 2021; Shaffer et al., 2020).”

      4) Temporal experimental data to validate and extend transient heritability and quorum sensing: Developing a model for cellular-decision making during early IFN-I responses, the authors formalize and test the hypothesis of transient heritability. While the data largely fit the model proposed (Fig 6D-F), the reported data points lack sufficient temporal resolution to validate the model during the earlier and more variable generations. Given that by generation 9 variability in first responder frequency has almost stabilized, there is only one data point (generation 6) to evaluate the fit of the ODE described. More densely sampled data points below generation 10 are necessary to validate the model. Moreover, a discussion of Kon calculation/observation, meaning, and validation is missing. To partially test their claim that Kon is a function of density (i.e., quorum sensing), the authors plate cells at different densities and measure the responder frequency at generation 6. This analysis lacks contextualization of other autocrine and paracrine signals potentially impacting IFN-I response. Moreover, these signals will be diverse in different cell types and could impact Kon and/or the overall model.

      We agree that our first model validation was suboptimal because it lacked sufficient temporal resolution. Therefore, we performed additional experiments on clones of generations 1, 2, 3, 4, and 5, and the results turned out to be remarkably robust. We changed the revised manuscript accordingly:

      “Surprisingly, the data obtained from clones of generation one through nine resulted in a mean higher than 2.134% (Figure 6d; Supplementary Figure 9), and a fluctuating CV (Figure 6e). From generation 13 onwards, both the mean and the CV start to meet the data obtained from the regular cultures again, which are similar to the theoretical outcomes of a stochastic process. Accordingly, we modeled first responders as a binary switch, where individual cells are either responding (ON) or nonresponding (OFF), similar to the transient heritable fates characterized and modeled before (Shaffer et al., 2020). Details on the ODE model are provided in the Materials and Methods section. We could fit the transient heritability model to the data when starting from 100% responders at generation zero [i.e., a single cell isolated from the regular culture]. Cells switch OFF after 5 generations on average, with a constant kon rate throughout. Interestingly, in generation zero we observed (nearly) only IFN-I responders, which we believe might be caused by single cells being deprived from any paracrine cues, which could include inhibitory factors that normally limited responsiveness. However, single IFN-I-producing cells [i.e., plasmacytoid dendritic cells and monocyte derived dendritic cells] encapsulated in picoliter droplets or captured in small microfluidic chambers did not display this behavior (Shalek et al., 2014; Wimmers et al., 2018). Instead, one could argue that single cells establish a different microenvironment, compared to a situation in which cells are close to neighboring cells, which elicits behavioral changes accordingly. The dimensions of microfluidic droplets and chambers are in the same range of cell-to-cell contacts in vitro, while single cells seeded for cloning are surrounded by rather massive areas and volumes without other cells present. Therefore, we hypothesize that these single cells lack biochemical, and perhaps biomechanical cues provided by dense cell populations, which result in behavioral changes in these cells, in our case, making them more responsive. Similarly, in quorum sensing, cells secrete soluble signaling molecules (called autoinducers), which enables cells to get a sense of their cell density (Postat and Bousso, 2019; Waters and Bassler, 2005). Without signaling of these molecules, cells perceive being isolated from the rest. In microfluidic droplets and chambers, these molecules accumulate, given the relatively small volumes.”
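
      The binary ON/OFF switch described in the quoted passage can be illustrated with a minimal two-state rate model. The sketch below is not the authors' ODE (which is given in their Materials and Methods); it assumes first-order switching, treats generations as continuous time, and derives hypothetical rate constants from two numbers stated in the text: the 2.134% mean in regular cultures and the average ON duration of 5 generations. All names (K_ON, K_OFF, responder_fraction) are illustrative.

```python
import math

# Values taken from the quoted passage.
STEADY_STATE_FRACTION = 0.02134  # mean responder fraction in regular cultures
MEAN_ON_GENERATIONS = 5.0        # cells switch OFF after ~5 generations on average

# Assumption: first-order switching, so k_off is the inverse of the mean ON lifetime.
K_OFF = 1.0 / MEAN_ON_GENERATIONS
# Assumption: k_on is set so that the steady state k_on / (k_on + k_off)
# equals the responder fraction observed in regular cultures.
K_ON = K_OFF * STEADY_STATE_FRACTION / (1.0 - STEADY_STATE_FRACTION)


def responder_fraction(generation, initial_fraction=1.0):
    """Closed-form solution of dR/dg = K_ON * (1 - R) - K_OFF * R.

    R is the fraction of ON (responding) cells; the default initial
    condition is a clone founded by a single responder (R = 1).
    """
    relaxation_rate = K_ON + K_OFF
    decay = math.exp(-relaxation_rate * generation)
    return STEADY_STATE_FRACTION + (initial_fraction - STEADY_STATE_FRACTION) * decay


if __name__ == "__main__":
    for generation in (0, 1, 2, 3, 4, 5, 6, 9, 13, 20):
        percent = 100 * responder_fraction(generation)
        print(f"generation {generation:2d}: {percent:6.2f}% responders")
```

      Under these assumptions the responder fraction relaxes exponentially from 100% towards the culture-wide mean; a density-dependent kon, as proposed for quorum sensing, would require replacing the constant K_ON with a function of cell density.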

      Regarding the contextualization of autocrine and paracrine signaling impacting IFN-I response dynamics in these studies, we added the following additional information:

      “On top of the intrinsic features characterized by stochasticity and determinism, individual immune cells can communicate in various ways to elicit appropriate systemic immune responses. Typically, cytokine-mediated communication is categorized into two types: autocrine and paracrine signaling. Autocrine signaling is defined by cells secreting signaling molecules while simultaneously expressing the cognate receptor. Paracrine signaling is defined by cells either secreting signaling molecules without expressing the cognate receptor, or cells expressing the receptor without secreting the molecule. In essence, quorum sensing can be considered a phenomenon in which autocrine cells determine their population density based on cells engaging in neighbor communication, but without self-communication (Doğaner et al., 2016; Van Eyndhoven and Tel, 2022).”

      Regarding the point that signals will be diverse in different cell types and could impact Kon and/or the overall model, we agree, but we expect the effect to be minor. Moreover, the model can easily be adjusted to the parameters of each cell type (see Saint-Antoine et al., 2022).

      Reviewer #3 (Public Review):

      1) For the small fraction of cells that respond in the absence of Poly(I:C), are these cells just showing IRF7 translocation or are they fully responding with IFNB production? Has this been observed in other experimental systems or contexts? Do you also observe secondary responders in the unstimulated samples (as shown in the stimulated in Fig. 2G-I)?

      Regarding the first point on the unstimulated translocated cells, excellent point. Although we have not experimentally validated it, we hypothesize that cells are able to produce constitutive levels of IFN-Is, as thoroughly described in literature, so we assume that these translocated cells produce IFN-Is. We provided additional speculation in the revised manuscript:

      “Besides, the background numbers of translocated cells possibly reflect the intrinsic feature of the IFN-I system to ensure basal IFN-I expression and IFNAR signaling to equip immune cells to rapidly mobilize effective antiviral immune responses, and homeostatic balance through tonic signaling (Gough et al., 2012; Ivashkiv and Donlin, 2014).”

      2) Do the second responders only arise through direct IFN-I production by first responders? Is it possible that this response has any relationship with the initial transfection with Poly(I:C)?

      From the droplet-based experiments with plasmacytoid dendritic cells performed before (Wimmers et al., 2018; Van Eyndhoven et al., 2021), we could conclude that the second responders indeed required the activation and subsequent early IFN-I production of first responders. Because droplet-based microfluidics is a very stable and controlled method that produces thousands of homogeneous droplets, we concluded that the difference between first and second responders is not caused by variations in activation (e.g., transfection discrepancies).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use their expertise in live-cell imaging and mathematical modeling to further explore the relationship between chromatin structure, gene positioning and transcriptional coregulation. One of the strengths of the manuscript arises from the authors analysis of two publicly available datasets encompassing chromatin tracing and transcriptional activity. Using spatial analysis and modeling, the authors have impressively extended the findings of Su et. al, Cell 2020, who generated the analyzed dataset. A number of important concepts were explored including 1.) do genes re-position upon activation and 2.) can spatial proximity be correlated with transcriptional co-regulation. In general the authors conclusions are supported by their findings and should provide a blueprint for analysis of additional related big imaging datasets in the future.

      However there are a number of weaknesses including lack of statistical analysis or incomplete description (e.g. bootstrapping parameters, statistical methods, number of genes/cells/measurements, etc.) on some figures that make it difficult to interpret the significance of the trends. In addition, the modeling using live-cell studies is generalized based on a behavior (e.g. diffusion) of a single gene. The manuscript is densely written in a way that may be inaccessible for non-specialists. A final schematic model that summarizes biological findings would help alleviate this weakness.

      We are glad that the reviewer considers the observed phenomenon important and that our conclusions are supported by our findings. We implemented changes in response to each of the above requests:

      1) we added additional explanation of test statistics;

      2) we analyzed diffusion of additional genes;

      3) we tried to simplify the text;

      4) we added a final schematic.

      Reviewer #2 (Public Review):

      In their manuscript, Bohrer and Larson reanalyse previously published imaging datasets in order to tackle a long-standing question in modern genome biology: does the physical proximity of transcribed genes correlate with their co-expression?

      The authors start off by reanalysing fixed-cell data, in which they find that active genes (i.e., any gene with RNA FISH signal) often reposition towards the centroid of the imaged chromatin environment once transcriptionally active. The analysis is straightforward, but the notion of "closer to the centroid" remains a bit vague to me, and is not well defined as regards its functional significance. There is no doubt of the clear trend in the analysed data -- but the interpretation could be strengthened.

      We tried to clarify this part of the text and also added a schematic illustration to the discussion to help clarify this important point (Fig. 5).

      Then, using the same dataset, the question on physical gene proximity is addressed. This is not only an important and timely question, but also one which the authors address very nicely. They deduce that when a pair of loci are brought within sufficiently low physical 3D proximity (unrelated to their genomic distance) they are more likely than expected to be co-expressed. In cis, this distance can be defined to approx. <2.5 Mb of genomic separation. They also looked in trans, via a complex transfer of knowledge from live-cell imaging to the fixed-cell dataset, to show that genes brought within approx. 400 nm from one another display quite a high coexpression correlation. Despite the parsimonious nature of the model and assumptions that the authors use for this (testing more complex parameters might prove beneficial here), their postulations can quite adequately explain observations published by others that were previously left largely without interpretation.

      In my opinion, the main strength of this manuscript lies with the initial analysis of the fixed-cell data and the clear trends therein. The latter part, which nicely identifies caveats in available data and analyses and which makes a solid effort to combine live-cell with fixed-cell data, leaves more scenarios to be tested. Nevertheless, based on the outcome of this analysis (mostly found in Fig. 4), the value of ~400 nm as a physical proximity cutoff for co-expression is reasonable (based on previous knowledge) and does provide a solid first step in the direction of deciphering the rules that allow coordinated gene expression in mammalian cells.

      We agree that the modelling section is more of a first step and that future work will need to be done to investigate further. In the revision, we make this point explicit within the main text (See below).

      Overall, this is a conceptual advance of merit that can re-shape ways of approaching the still-open issue of gene co-bursting in light of novel (mostly imaging) technologies.

      We appreciate the comment.

    1. Author Response

      Reviewer #2 (Public Review):

      This paper by Angueyra, et al., adds to the field’s current understanding of photoreceptor specification and factors regulating opsin expression in vertebrates. Current models of specification of vertebrate photoreceptors are largely based on studies of mammals. However, a great number of animals including teleosts express a wider array of photoreceptor subtypes. Zebrafish for example have 4 distinct cone subtypes and rods. The approach is sound and the data are quite convincing. The only minor weaknesses are that the statistical analyses need to be revisited and the discussion should be a bit more focused.

      To identify differentially expressed transcription factors, the authors performed bulk RNA-seq of pooled, hand-sorted photoreceptors. The selection criterion was tightly controlled to limit unhealthy cells and cellular debris from other photoreceptors subtypes. The pooling of cells provided a considerable depth of sequencing, orders of magnitude better than scSeq. The authors identified known transcription factors and several that appear to be novel or their role has not been determined. The data are made available on the PIs website as is a program to access and compare the gene expression data.

      The authors then used CRISPR/Cas9 gene targeting of two known and several novel factors identified in their analysis for effects on cell fate decisions and opsin expression. Phenotyping performed on the injected larvae is possible, and the target genes were amplified and sequenced to demonstrate the efficiency of the gene targeting. Targeting of 2 genes with known functions in photoreceptor specification in zebrafish, Tbx2b and Foxq2, resulted in the anticipated changes in cell fate, although the strength of the alterations in cell fate in the F0 larvae appears to be less than the published phenotypes for the inherited alleles. Interestingly, the authors also identified the expression of an RH2 opsin in SWS2 cones, another cone type. The changes are subtle but important.

      The authors then targeted tbx2a, the function of which was not known. The result is quite interesting as it matches the increase of rods and decrease of UV cones observed in tbx2b mutants. However, the injected animals also showed RH2 opsin expression but are now in the LWS cone subtype. These data suggest that Tbx2 transcription factors repress misexpression of opsins in the wrong cell type.

      The authors also show that targeting additional differentially expressed factors does not affect photoreceptor fate or survival in the time frame investigated. These are important data to present. For these or any of the other targeted genes above, did the authors test for changes in photoreceptor number or survival?

      We have attempted to address this point, but the answer is not clear cut. We used activated caspase-3 immunolabeling as a marker of apoptosis (Lusk and Kwan 2022). At 5 dpf, the age we chose to make quantifications, we don’t see an increase in activated caspase-3 positive cells when we compare control and tbx2a F0 mutants (Reviewer Figure 1A-B). Labeled cells are very rare and located near the ciliary marginal zone irrespective of genotype. This suggests that there is no detectable active death at this late stage of development in tbx2 F0 mutants. Earlier in development, at 3 dpf, when photoreceptor subtypes first appear, there is also a normal wave of apoptosis in the retina (Blume et al. 2020; Biehlmaier, Neuhauss, and Kohler 2001), resulting in many cells positive for activated caspase-3; our preliminary quantifications don’t show a marked increase in the number of labeled cells in tbx2a F0 mutants, but we consider it likely that subtle effects might be obscured by the physiological wave of apoptosis (Reviewer Figure 1C-D).

      Reviewer Figure 1 - Assessment of apoptosis in tbx2a F0 mutants. (A-B) Confocal images of 5 dpf larval eyes of control (A and A’) and tbx2a F0 mutants (B and B’) counterstained with DAPI (grey) and immunolabeled against activated Caspase 3 (yellow) show sparse and dim labeling, restricted to cells located in the ciliary marginal zone, without clear differences between groups. (C-D) Confocal images of 3 dpf larval eyes of control (C and C’) and tbx2a F0 mutants (D and D’) immunolabeled against activated Caspase 3 show many positive cells, located in all retinal layers, as expected from physiological apoptosis at this stage of development and without clear differences between groups.

      Furthermore, the additional single-cell RNA-seq datasets we have reanalyzed suggest that tbx2a and tbx2b are expressed by other retinal neurons and progenitors and not just photoreceptors (Reviewer Figure 2), further confounding attempts at the quantification of apoptosis specifically in photoreceptor progenitors.

      Reviewer Figure 2 – Expression of tbx2 paralogues across retinal cell types. The transcription factors tbx2a and tbx2b are expressed by many retinal cells. Plots show average counts across clusters in RNA-seq data obtained by Hoang et al. (2020).

      At this stage, we consider that fully resolving this issue is important and will require considerably more work, which we will pursue in the future using full germline mutants and live-imaging experiments.

      Reviewer #3 (Public Review):

      Angueyra et al. tried to establish the method to identify key factors regulating fate decisions in the retinal visual photoreceptor cells by combining transcriptomic and fast genome editing approaches. First, they isolated and pooled five subtypes of photoreceptor cells from the transgenic lines in each of which a specific subtype of photoreceptor cells are labeled by fluorescence protein, and then subjected them to RNA-seq analyses. Second, by comparing the transcriptome data, they extracted the list of the transcription factor genes enriched in the pooled samples. Third, they applied CRISPR-based F0 knockout to functionally identify transcription factor genes involved in cell fate decisions of photoreceptor subtypes. To benchmark this approach, they initially targeted foxq2 and nr2e3 genes, which have been previously shown to regulate S-opsin expression and S-cone cell fate (foxq2) and to regulate rhodopsin expression and rod fate (nr2e3). They then targeted other transcription factor genes in the candidate list and found that tbx2a and tbx2b are independently required for UV-cone specification. They also found that tbx2a expressed in the L-cone subtype and tbx2b expressed in L-cones inhibit M-opsin gene expression in the respective cone subtypes. From these data, the authors concluded that the transcription factors Tbx2a and Tbx2b play a central role in controlling the identity of all photoreceptor subtypes within the retina.

      Overall, the contents of this manuscript are well organized and technically sound. The authors presented convincing data, and carefully analyzed and interpreted them. It includes an evaluation of the presented data on cell-type specific transcriptome by comparing it with previously published ones. I think the current transcriptomic data will be a valuable platform to identify the genes regulating cell-type specific functions, especially in combination with the fast CRISPR-based in vivo screening methods provided here. I hope that the following points would be helpful for the authors to improve the manuscript appropriately.

      1) The manuscript uses the word “FØ” quite often without any proper definition. I wonder how “Ø” should be pronounced - zero or phi? This word is not common and has not been used in previous publications. I feel the phrase “F0 knockout,” which was used in the paper cited by the authors (Kroll et al 2021), is more straightforward. If it is to be used in the manuscript, please define “FØ” and “CRISPR-FØ screening” appropriately, especially in the abstract.

      We have replaced “FØ” with “F0.” In our other citation (Hoshijima et al., 2019), “F0 embryo” was used throughout the paper. Following our references and Dr Kojima’s suggestion, we adopted “F0 mutant larva” as the most straightforward and least confusing term. We have also made changes in the abstract to define our approach more clearly and made appropriate changes throughout the manuscript.

      2) Figure 1-supplement 1 shows that opn1mw4 has quite high (normalized) FPKM in one of the S-cone samples in contrast to the least (or no) expression in the M-cone samples, in which opn1mw4 is expected to be detected. The authors should address a possible origin of this inconsistent result for opn1mw4 expression as well as a technical limitation of using the Tg(opn1mw2:egfp) line for detection of opn1mw4 expression in the GFP-positive cells.

      In Figure 1 - Supplement 1, we had attempted to provide a summarized figure of all phototransduction genes, but the big differences in expression levels (in particular, the high expression of opsin genes) forced us to use gene-by-gene normalization for display. Without normalization, the expression of opn1mw4 is very low across all samples, and its detection in that sole S-cone sample can likely be attributed to some degree of inherent noise in our methods. We have revised Figure 1 - Supplement 1: we find that we can avoid gene-by-gene normalization and still provide a good summary of the expression of phototransduction genes if the heatmap is broken down by gene families, which have more similar expression levels. In addition, we have added caveats to the use of the Tg(opn1mw2:egfp) line as our sole M-cone marker in the results section describing our RNA-seq approach, including our inability to provide data on Opn1mw4-expressing M cones.
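      The effect of gene-by-gene normalization described above can be illustrated with a minimal sketch. The FPKM values, the sample labels, and the helper name `normalize_per_gene` are hypothetical and chosen only for illustration; they are not data or code from the study. The sketch assumes that each gene is scaled to its own maximum across samples, which is one common way to normalize a heatmap row.

```python
# Hypothetical FPKM values, for illustration only (not data from the study).
samples = ["S-cone 1", "S-cone 2", "M-cone 1", "M-cone 2"]
fpkm = {
    "opn1sw2": [9000.0, 8500.0, 20.0, 15.0],  # highly expressed opsin (made-up values)
    "opn1mw4": [3.0, 0.0, 0.0, 0.0],          # barely detected gene (made-up values)
}


def normalize_per_gene(values):
    """Scale one gene's values so that its maximum across samples becomes 1."""
    peak = max(values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Row-wise (gene-by-gene) normalization, as assumed for the original heatmap.
normalized = {gene: normalize_per_gene(vals) for gene, vals in fpkm.items()}

if __name__ == "__main__":
    for gene in fpkm:
        print(gene, "raw:", fpkm[gene], "normalized:", normalized[gene])
```

      In this toy example both genes reach a normalized value of 1 in the first S-cone sample, even though the raw values differ by more than three orders of magnitude. Grouping genes with similar expression levels, as in the revised figure, avoids this kind of visual inflation.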

      3) The manuscript lacks a description of the sampling time point. It is well known that many genes are expressed with daily (or circadian) fluctuation (cf. Doherty & Kay, 2010 Annu. Rev. Genet.). For example, the cone-specific gene list in Fig.2C includes a circadian clock gene, per3, whose expression was reported to fluctuate in a circadian manner in many tissues of zebrafish including the retina (Kaneko et al. 2006 PNAS). It appears to be cone-specific at this time point of sample collection as shown in Fig.2, but might be expressed in a different pattern at other time points (eg, rod expression). The authors should add, at least, a clear description of the sampling time points so as to make their data more informative.

      We have included this information in the materials and methods. We collected all our samples during the most active peak of the zebrafish circadian rhythm between 11am and 2pm (3h to 6h after light onset) to avoid the influence of circadian fluctuations in our analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to develop an in vitro model of multiple species representing diversity in the CF airway as a platform for a range of studies on why polymicrobial communities resist therapy. The rationale for their design is sound and the methods appear justifiable and reproducible. The major strength of this work is in producing a method for a range of future work, ideally for multiple groups in the field. The primary findings are interesting but not groundbreaking. One weakness in the method of reporting interspecies interactions and another in evaluating alternative causes of lasR advantages present opportunities for a stronger research contribution beyond this terrific method.

      We thank the reviewer for this accurate summary of the data presented in our manuscript. We have addressed the concerns raised in the revised document. The modifications and comments can be seen in the “Essential Revisions” section above.

      Reviewer #2 (Public Review):

      Differences between the infection environment and in vitro model systems likely contribute to disconnects between the antimicrobial susceptibility profile of bacterial isolates and the clinical response of patients. The authors of this paper focus on a specific aspect of the infection environment, the polymicrobial nature of some chronic infections like those in people with Cystic Fibrosis (CF), as a factor that could impact antibiotic tolerance. They first use published genomic datasets and computational techniques to identify a clinically relevant, four-member polymicrobial community composed of Pseudomonas aeruginosa, Staphylococcus aureus, Streptococcus spp., and Prevotella spp. They then develop a high throughput methodology in which this community grows and persists in a CF-like environment and in which antibiotic susceptibility can be tested. The authors determine that living as a member of this community decreases the antibiotic tolerance of some strains of biofilm-associated P. aeruginosa and increases the tolerance of most strains of planktonic and biofilm-associated S. aureus and planktonic and biofilm-associated Streptococcus. They focus on the decreased tolerance of P. aeruginosa and determine that a ΔlasR mutant of P. aeruginosa does not display increased tobramycin susceptibility in the mixed community. One of the phenotypes associated with a ΔlasR mutant is an overproduction of phenazines. The authors find that by deleting the phenazine biosynthesis genes from ΔlasR, they can restore community-acquired susceptibility. They further investigate this phenomenon by showing that a specific type of phenazine, PCA, is significantly increased in mixed communities with the ΔlasR mutant compared to WT. Finally, they demonstrate that adding a specific phenazine, pyocyanin, to mixed communities can restore the tolerance of WT P. aeruginosa.

      Strengths:

      With this study the authors address a very important problem in infectious disease microbiology - our in vitro drug susceptibility assays do a poor job of mimicking the infection environment and therefore do a poor job of predicting how effective particular drugs will be for a particular patient. By demonstrating how an infection-relevant community modifies tolerance to a clinically relevant drug, tobramycin, the authors identify specific interactions that could be targeted with therapeutics to improve our ability to treat the chronic infections associated with CF. In addition, this study provides a framework for how to effectively model polymicrobial infections in vitro.

      The experiments in the paper are very rigorous and well-controlled. Statistical analysis is appropriate. The paper is very well-written and clear.

      The authors do an admirable job of using in silico analysis to inform their in vitro studies. Specifically, they provide a comprehensive rationale for why they chose and studied the specific community they did.

      The authors provide a very robust dataset which includes determining how strain differences of each of their four community members affect community dynamics and antibiotic tolerance. These types of analyses are laborious but very important for understanding how broadly applicable any given result is.

      We appreciate the reviewer’s thorough summary of our work and their positive comments.

      Weaknesses:

      The authors very clearly and convincingly demonstrate that WT P. aeruginosa becomes more susceptible to tobramycin in their mixed community. Our ability to turn these types of observations into therapeutic development depends on mechanistic insight. That said, it is unclear if the authors can make any solid conclusions about what specific aspects of the polymicrobial environment cause WT P. aeruginosa to become more susceptible. The authors make a compelling case that increased phenazine production by the ΔlasR mutant restores tolerance in the mixed community and that exogenous phenazine addition increases the survival of WT P. aeruginosa in the mixed community. However, it remains a plausible explanation that the effects of phenazines on tobramycin susceptibility are independent of the initial observation that WT P. aeruginosa becomes susceptible to tobramycin in the mixed community.

      We agree with the reviewer’s comment here as it pertains to the initial observation of P. aeruginosa becoming more susceptible to tobramycin in the mixed community. However, as mentioned by the reviewer, we provide several lines of evidence that phenazines play a key role in the tolerance of the lasR mutant to tobramycin, including genetic studies and feeding studies wherein exogenous addition of this molecule to WT P. aeruginosa phenocopies the lasR mutant exposed to tobramycin. Why the community impacts phenazine production of the WT strain is an open question and the subject of future work. We have modified the abstract of the manuscript as follows at Lines 41–43:

      “Our data suggest that the molecular basis of this community-specific recalcitrance to tobramycin for the P. aeruginosa LasR mutant is increased production of phenazines.”

      Some aspects of the methodology are unclear. Specifically, the authors note that they use a specific sealed container system to grow their strains in anoxic conditions, which mimic portions of CF sputum. However, it is unclear how the authors change medium over the course of their experiments, or how they test susceptibility to tobramycin, without exposing the cells to oxygen. It is well understood that oxygen exposure impacts the susceptibility of P. aeruginosa to tobramycin, so it is very important that the methodology involving oxygen deprivation and exposure is described in detail.

      We have made the necessary modifications to the manuscript as indicated in the “Essential Revisions” section to address these concerns (see Comment #3). Furthermore, new validation experiments were performed in a controlled anoxic environmental chamber that yielded observations similar to the data presented in the original manuscript, thereby confirming that we were using anoxic conditions with the GasPak anaerobic jar system (see Figure 1 - figure supplement 2 and Figure 2 - figure supplement 7).

      Lines 198–204: “The impact of residual oxygen negatively influencing the growth of P. melaninogenica in monoculture was ruled out by performing these experiments using an anoxic environmental chamber (Figure 1 – figure supplement 2). That is, we did not detect CFU counts for either planktonic or biofilm populations of P. melaninogenica when grown in ASM in the anaerobic chamber, but as a positive control, significant growth was detected when using a medium shown previously to support growth of this microbe (10) (Prevotella Growth Medium, or PGM) (Figure 1 – figure supplement 2).”

      Lines 406–414: “Also, we ruled out the possibility of remaining oxygen in ASM negatively impacting the viability of P. melaninogenica by reproducing our results using an anoxic chamber (Figure 1 – figure supplement 2). That is, we observed that P. melaninogenica can robustly grow as a planktonic or biofilm monospecies community in a medium capable of sustaining its growth (PGM) while this microbe fails to grow in ASM (Figure 1 – figure supplement 2). Thus, we argue that the mixed-community-specific growth of Prevotella spp. we observed across several conditions (Figure 1C, Figure 1 – figure supplement 5, Figure 2 – figure supplement 6) is not due to residual oxygen.”

      Lines 290–293: “Growing and replenishing the preformed biofilm communities with fresh ASM supplemented or not with tobramycin using an anoxic environmental chamber resulted in similar phenotypes for all tested microorganisms (Figure 2 – figure supplement 7), indicating that the use of the GasPak system provides a robust anoxic environment.”

      Lines 533–540: “Plates were incubated using an AnaeroPak-Anaerobic container with a GasPak sachet (ThermoFisher) at 37 °C for 24 hours. Then, unattached cells were aspirated with a multichannel pipette and the pre-formed biofilms replenished with 100 µl of fresh ASM on the bench and incubated for an additional 24 hours at 37 °C using an AnaeroPak-Anaerobic container with a GasPak sachet (ThermoFisher). Similar experiments were performed using an anoxic environmental chamber (Whitley A55 - Don Whitley Scientific, Victoria Works, UK) with 10% CO2, 10% H2, 80% N2 mixed gas at 37 °C, yielding results identical to those observed for the GasPak system.”

      Reviewer #3 (Public Review):

      This manuscript by Jean-Pierre et al. describes the creation and experimentation with a model CF lung community in an artificial sputum medium. The group uses data from 16S rRNA sequencing studies to select organisms for creating the model and then performs experiments to determine outcomes of growth competition and antibiotic tolerance in a community context. The main finding of the manuscript is that P. aeruginosa, notorious for its antimicrobial resistance phenotypes, is more susceptible to tobramycin in the community context than when grown alone. The manuscript is well prepared and follow-up experiments with mutant strains and phenazines greatly strengthen the project overall. The initial results paragraph where the authors go through the rationale for selecting the different organisms is perhaps a bit overkill, the organisms selected make sense based on their prevalence in CF airways, which in and of itself is a strong enough rationale. This aspect of the manuscript could be minimized to focus more on the exciting culture experiments in the latter parts of the results. Overall, this is a strong and well-crafted manuscript that will have a broad interest in the CF and microbial ecology fields.

      We thank the reviewer for this thoughtful review of our manuscript. We have not minimized the “front-end” of the paper because we believe the rationale for selecting the community and its members, and the validation of the model system are key for placing the resulting observations in a robust context, and for providing the underlying rationale to support the relevance of the findings.

      Major Critiques. I have two major critiques of this study.

      (1) Prevotella growth in monoculture. After reading the methods section it appears that the cultures were extensively washed and prepped prior to the inoculation into ASM. Prevotella did not grow alone, is this due to oxygen penetration of the cells during preparation? Perhaps oxygen is present in ASM prior to placement in an anaerobic bag? It is interesting, and perhaps worth exploring, whether the mixed community draws down oxygen from the media explaining the ability of Prevotella to grow. I suspect this is the case, but more detail is needed in the methods and this experiment would help us understand this interesting result.

      As presented in the “Essential Revisions” section (Comment #3), we have repeated the experiment using fully anoxic conditions (i.e., using an anoxic environmental chamber where the cultures were grown, washed and mixed before incubation) and still observed absence of growth of Prevotella cultivated in ASM in both biofilm and planktonic populations. Moreover, including a positive control, Prevotella Growth Medium, resulted in robust growth of this microbe. Taken together, our data suggest that residual oxygen in ASM is not the driver of the community-specific growth of P. melaninogenica.

      (2) Dilution of the community reproducing toby tolerance of P. aeruginosa. In supplemental figures, the replication of the 1:1000 dilution of the mixed community with P. aeruginosa shows poor replication and very large error bars. This experiment should be repeated to ensure it is reproducible.

      The diluted mixed community experiment was repeated a fourth time, yielding the same statistical conclusions. An updated “Figure 2 – figure supplement 1” was added to the paper. The highest (1:1000) dilution still yielded high variation, which could perhaps be explained by the low (i.e., ~10^3 CFU/mL) inoculum for S. aureus, S. sanguinis and P. melaninogenica used in these experiments (see updated “Microbial assays” paragraph of the “Materials and Methods” section). Thus, the variation at low inoculum is robust and reproducible. The Materials and Methods section was also updated to clarify the CFU counts used for those experiments. We have added modifications to the text as follows to address this critique:

      Lines 526–532: “The optical density (OD600) was then measured for each bacterial suspension and diluted to an OD600 of 0.2 in ASM. Monocultures and co-culture conditions were prepared from the OD600 = 0.2 suspension and diluted to a final OD600 of 0.01 for each microbial species in ASM corresponding to final bacterial concentrations of 1x10^7 CFU/mL, 3.5x10^6 CFU/mL, 1.2x10^6 CFU/mL and 4.6x10^6 CFU/mL of P. aeruginosa, S. aureus, Streptococcus spp. and Prevotella spp. respectively. A volume of 100 µl of bacterial suspension all at a final OD600 of 0.01 each in the mix was added to three wells.”

      Lines 558–570: “For experiments with varying concentrations of S. aureus, S. sanguinis and P. melaninogenica in monocultures and co-cultures, the organisms were grown from bacterial suspensions adjusted to an OD600 = 0.8 in ASM. Suspensions were further diluted in ASM to an OD600 of either 0.1, 0.001, 0.0001 or 0.00001 while maintaining P. aeruginosa at OD600 = 0.01 (approximating 1x10^7 CFU/mL) in all conditions. The OD600 = 0.1 dilution factor resulted in CFU/mL count average of 3.8x10^8 CFU/mL for S. aureus, 1.6x10^8 CFU/mL for S. sanguinis and 1.0x10^8 CFU/mL for P. melaninogenica. The OD600 = 0.001 dilution factor resulted in a CFU/mL count average of 6.7x10^5 CFU/mL for S. aureus, 1.1x10^5 CFU/mL for S. sanguinis and 1.4x10^5 CFU/mL for P. melaninogenica. The OD600 = 0.0001 dilution factor resulted in a CFU/mL count average of 4.2x10^4 CFU/mL for S. aureus, 3.3x10^4 CFU/mL for S. sanguinis and 4.6x10^4 CFU/mL for P. melaninogenica. The OD600 = 0.00001 dilution factor resulted in a CFU/mL count average of 5.6x10^3 CFU/mL for S. aureus, 4.4x10^3 CFU/mL for S. sanguinis and 6.2x10^3 CFU/mL for P. melaninogenica.”
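      The dilution arithmetic in the quoted methods can be sketched as follows. This is a minimal illustration, not the authors' protocol: the function names and the 10 mL final volume are assumptions, and the CFU/mL figures are the averages quoted above, read as powers of ten (for example, 3.8 × 10^8). The first helper applies C1·V1 = C2·V2 to an OD600 adjustment; the second expresses a reported count as CFU/mL per OD600 unit.

```python
def dilution_volumes(stock_od, target_od, final_volume_ml):
    """Return (stock_ml, diluent_ml) for an OD600 dilution, from C1*V1 = C2*V2."""
    if not 0 < target_od <= stock_od:
        raise ValueError("target OD must be positive and no greater than the stock OD")
    stock_ml = final_volume_ml * target_od / stock_od
    return stock_ml, final_volume_ml - stock_ml


# Average CFU/mL per target OD600, as quoted in the response (S. aureus only).
reported_cfu_per_ml = {
    0.1: 3.8e8,
    0.001: 6.7e5,
    0.0001: 4.2e4,
    0.00001: 5.6e3,
}


def cfu_per_od_unit(od600, cfu_per_ml):
    """Express a measured count as CFU/mL per OD600 unit."""
    return cfu_per_ml / od600


if __name__ == "__main__":
    # Hypothetical example: 10 mL of an OD600 = 0.01 suspension from an OD600 = 0.2 stock.
    stock_ml, diluent_ml = dilution_volumes(0.2, 0.01, 10.0)
    print(f"stock: {stock_ml:.2f} mL, diluent: {diluent_ml:.2f} mL")
    for od, cfu in reported_cfu_per_ml.items():
        print(f"OD600 {od}: {cfu_per_od_unit(od, cfu):.2e} CFU/mL per OD unit")
```

      Comparing the per-OD-unit values across the series is a quick way to see how closely plated counts track the nominal OD600 at each dilution step.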

    1. Author Response

      Reviewer #4 (Public Review):

      The study employs a number of methods, including TEM morphometric analysis, immunochemistry, western blotting, genomics, genetically modified models, whole heart measurements.

      However, the manuscript seems to be a collection of two unfinished works: one on the transition p20-p60 in post-natal development of the heart, second about the role of ephrinB1 in the maturation of the crests of the sarcolemma. Otherwise, it is not clear why in the first figure there is no staining for ephB1, and why there is staining for claudin 5 instead.

      The reason is explained in the text on page 6. The first figure explores the postnatal maturation of the CM crests and their molecular determinants, and our previous paper described claudin-5 as the first molecular determinant of the crests (Guilbeau-Frugier et al, Cardiovasc Research 2019). Based on our previous demonstration of ephrin-B1 as a direct claudin-5 partner and regulator (Genet et al, Circulation Research 2012), we proposed ephrin-B1 as another potential molecular determinant of the crests, which we explored for the first time in our current paper in revision. Moreover, because ephrin-B1 is part of a large family of direct physical cell-cell communication proteins (Eph-Ephrin system), its role in the lateral crest-crest interaction was also obvious.

      This is why, at the beginning of the paper, we explored claudin-5 and thereafter ephrin-B1, in order to further explore the functional role of the crests using the Efnb1 KO mouse model we had already established in the lab.

      The authors are trying to defend the idea that development of the heart in rats doesn't finish on postnatal day 20 and goes on for up to day 60. However, it is not convincing.

      It is no surprise transcription profile is different between day 20 and day 60, I am sure as life goes on development continues into aging and any comparison of samples collected with sufficient time lapse will give transcriptional differences. Whether these differences represent a truly separate development stage is not a clear-cut story.

      Most of the argument is based on morphometric study of TEM images.

      But also on confocal microscopy studies and more importantly on transcriptomic data.

      If it were evident that the transcription profile differs between day 20 and day 60, then most studies in this postnatal field would have extended their study window beyond P20, which is not the case. As we mentioned in the manuscript, most people in the field assumed terminal maturity of the CM based essentially on its typical rod shape, which is already acquired at P20. Growth of the heart between P20 and P60 was then assumed to rely only on an increase in tissue quantitative content and not on transcriptomic changes, i.e. in qualitative content.

      However, the method is not described at all. There is reference to another paper by the authors, but this paper doesn't provide a concise description of the morphometry either. It is unclear how randomisation of images and fields of view has been achieved and what statistical methods has been implemented. In TEM it is often possible to find all sorts of oddities depending on how you choose the images.

      We agree with the reviewer that TEM is often associated with “all sorts of oddities,” and that is the reason our recent paper (Guilbeau-Frugier et al, Cardiovasc Research 2019) was dedicated to the analysis of technical pitfalls. That entire paper addresses this question: how to process the cardiac tissue to avoid artifacts in crest/SSM visualization, and how to quantify them.

      Now, instead of only citing our previous paper, we have expanded the “Material and methods” / “Transmission electron microscopy (TEM) and quantitative analysis” section (Main manuscript, page 20-21) by describing in detail all the TEM observation and quantification steps.

      The question of randomization of images or of the number of fields of view is a general question in all imaging techniques and is not specific to our TEM study. In imaging, there is no randomization.

      All statistical analyses of TEM data quantifications are described in the figure legends. For instance, in Figure 1: (B) Quantification of crest heights / sarcomere length (left panel), SSM number / crest (middle panel) and SSM area (right panel) from TEM micrographs obtained from P20- or P60 rat hearts (P20 n=6, P60 n=6; 4 to 8 CMs/rat, ~ 70 crests/rat). However, to better clarify the “P20 n=6, P60 n=6”, we have now specified “P20 or P60 n=6 rats”. This has now been specified in the figure legends for all statistical analyses (highlighted in yellow in the revised manuscript).

      Why didn't the authors use microscopy of live isolated cells, which may be more relevant to study crest height?

      We explained this at the very beginning of the results section of our paper (first paragraph, second sentence, points i and ii). Using living CMs would not be meaningful, based on our two previous papers on this topic (Dague et al JMCC 2014 and Guilbeau-Frugier et al, Cardiovasc Research 2019). Our first paper was essentially based on AFM studies using isolated CMs, and we found that, rapidly after isolation, CM surface crests/SSM have a high tendency to shrink and disappear in control mice. This is why the second paper was based on an extensive characterization of the crests within the tissue using TEM experiments; the comparison of CM crests between tissue and living cells is also highlighted in that paper. More importantly, in this recent paper, we described for the first time, using high-resolution imaging techniques (TEM and STED), the existence of intermittent physical interactions between neighboring CMs on their lateral side through crest-crest interaction via the extracellular domain of claudin-5. This crest-crest physical interaction can only be observed within the tissue, since isolated adult CMs remain isolated and do not reproduce CM-CM physical interactions (through lateral or physical interactions at the longitudinal level, i.e. the intercalated disk level).

      Both claudin5 and EphrinB1 seem to be expressed highly after p5, which doesn't correlate with the proposed maturation of crests at days 20 to 60.

      Many processes do not rely only on gene/protein expression but on post-translational processes and localization/trafficking of proteins within the cell. This is exactly what we show with ephrin-B1 and claudin-5 proteins that traffic from the cytoplasm to the lateral membrane at the surface of the CMs between P20 and P60, as shown by our confocal images of the cardiac tissue while the global expression level of these two proteins doesn’t change (western blot results).

      There is no causative relationship between the lack of ephrinb1 and crest maturing, at least to my mind.

      Comparing the cardiac tissue between P20 an P60 and showing both ephrin-B1 trafficking at the CM lateral surface and crest maturation is obviously not a criterion of any relationship between these two events. However, when you delete a specific protein, i.e ephrin-B1, from a specific cell, i.e. the CM, and the phenotype of the KO mice is again a lack of crest maturation, you can at least deduce that ephrin-B1 is involved, directly or indirectly we don’t know, in the maturation process of the crests in the CM.

      Now, because of the constitutive deletion of Efnb1, we couldn’t completely exclude that the phenotype of the constitutive Efnb1 CM-KO mice we described at the adult stage was related to earlier developmental defects (secondary mechanisms) rather than directly to a specific alteration of CM surface crests/diastolic function at the adult stage. Therefore, to discriminate between these two possibilities, we have now used, in the revision process, a tamoxifen-inducible conditional knockout (Mer-Cre-Mer) of Efnb1 in the CM (MHC promoter). This mouse model has never been reported before, but its characterization (new Supplementary Figure 16) indicated that tamoxifen injection can lead to up to 50% Efnb1 deletion in CMs. Under these conditions, deletion of Efnb1 (tamoxifen injection) was initiated at the young adult stage (2 months old), and the systolic and diastolic function (echo-Doppler and LV catheterization) as well as the CM crest phenotype (TEM) were examined one month later. As shown in the new Figure 7, deletion of Efnb1 at the adult stage led to a partial loss of CM surface crests (new Fig. 7B), in agreement with the partial deletion of Efnb1, associated with a significant increase in the IVRT (echo-Doppler) and LVEDP (LV catheterization), with no modification of the ejection fraction (echo), compared to tamoxifen-injected control littermates (new Fig. 7C, D). Thus, these data clearly demonstrate that ephrin-B1 is a specific determinant of the crest architecture at the CM surface and of the diastolic function at the adult stage.

    1. Author Response

      Reviewer #3 (Public Review):

      The manuscript by Le T.D.V. et al used in vitro cell culture and inhibitors for cellular signaling molecules and found that GLP-1 receptor activation stimulated the phosphorylation of Raptor, which was PKA-mediated and Akt-independent. The authors reported the physiological function of this GLP-1R-PKA-Raptor in liraglutide stimulated weight loss. This timely study has high significance in the field of metabolic research for the following reasons.

      (1) The authors' findings are significant in the field of obesity research. GLP-1 receptor (GLP-1R) is a successful target for diabetes (and weight loss) therapeutics. However, the mechanisms of action for the weight-loss effect of GLP-1 agonists are not fully understood. Therefore, mechanistic studies to elucidate the signaling pathways of GLP-1 receptors pertaining to weight loss at the cellular level are timely.

      (2) G protein-coupled receptors (GPCRs) induce various signaling activities, which can be cell- and tissue-specific. As these are an important protein family for drug targeting, understanding the basic biology of these receptors is of interest to a broad readership.

      (3) The authors have made important discoveries that Exendin-4 stimulated mTORC1 signaling was essential for the anorectic effect induced by Exendin-4. The study reported in this current manuscript provides more details of brain GLP-1R signaling pathways and is innovative.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, took potential caveats into consideration, and made a justified conclusion.

      Recommendations for the authors:

      The manuscript can be further strengthened with more clarification on the following points.

      1) In Figure 1 panels B and C, please provide the quantification for pCREB/CREB. In Figure 1 panel D, please provide the quantification for pAkt/Akt.

      We thank the Reviewer for this suggestion. We now provide quantification of pCREB and pAkt expression in Supp. Fig. 1.

      2) The western blots to assess the signaling activities revealed the phosphorylation status of the key signaling molecules at a single time point. Whether the overall signaling dynamics have been affected is unclear.

      We agree with the reviewer on this point. We conducted initial time course experiments to identify a suitable time point for the subsequent experiments conducted in the present studies. The 1h time point presented in our results was chosen because it was the earliest time point at which both liraglutide stimulated mTORC1 signaling and this effect was inhibited by the various pharmacological inhibitors. We agree with the reviewer that at this point it is not clear whether the various inhibitors or the Ser791Ala mutation in Raptor modifies the dynamics of mTORC1 signaling. Although we have preliminary data in CHO-K1 cells suggesting that the temporal dynamics of these signaling events are not affected, this does not necessarily translate to the in vivo setting. Once we identify the key target tissue/cell type(s) mediating the weight loss effect of liraglutide via the PKA-Raptor interaction and generate the necessary mutant mice, we will test whether this affects signaling dynamics in vivo.

      3) Figure 3 panels A and B demonstrated the remarkable importance of the Ser791 Raptor. However, this PKA-resistant mutant did not completely abolish the weight loss effect of liraglutide. The authors pointed out the importance of AMPK in mTORC1 signaling. Other pathways that may complement GLP-1R-PKA-Raptor signaling can be further discussed.

      We agree with the reviewer that other signaling pathways are likely involved that contribute to the remaining weight-lowering effect of liraglutide. Besides AMPK, we have also included a discussion of Akt being a potential molecule that interacts with these pathways in vivo (lines 218-225). The word limitations of a Short Communication prevent us from further expanding on these possible mechanisms.

      4) Food intake was decreased on day 2 in Figure 3D but became comparable between WT and S791A Raptor groups on the following days. Could this be due to some compensatory mechanisms?

      This pattern of food intake response to GLP-1R agonists has been previously reported by our group and others (please see Brown JD et al. Am J Physiol Regul Integr Comp Physiol 2018 and Adams JM et al. Diabetes 2018). The reason for this is unclear at the moment, but we can speculate that the rebound in food intake is a compensatory mechanism to prevent the organism from continuously losing weight. We now also present data showing an initial drop in energy expenditure with liraglutide treatment that progressively returns to pre-treatment levels.

    1. Author Response

      Reviewer #3 (Public Review):

      The size of the excitation region and the size of the aster are linearly correlated but are drastically different in size. This provokes several questions.

      • Why does only one aster form if the region of excitation is over 10x the size? Why are there not multiple asters formed within this activation region?

      • A much larger excitation diameter than the size of the resultant structure suggests the amount of dimeric motor is not limiting. Why then does the size of the aster increase with excitation diameter?

      • A linear relationship between excitation region and aster size may suggest a constant density of material within the aster. While the intensity profile of a single aster is given in Fig 1C, the magnitude of intensity versus the estimated size of the aster would determine whether the system is reducible purely to changes in size/radial distribution.

      We thank the reviewer for the careful consideration of our work. In the experiments performed for this study, we were careful to be in a regime in which a single aster formed within the excitation region. However, by varying the concentration of components in the system, it is possible for multiple asters to form. See Figure R2 for example images of cases in which multiple asters formed.

      The increase in aster size with excitation region was also described previously in Ross, et al. 2019. In that work, we found that the aster size scales with the volume of the excitation region, suggesting that the number of microtubules limits aster size. This supports the hypothesis that there may be a density limit to the microtubules, likely due to steric interactions between them. We clarified this and added a reference to the Ross, et al. findings in lines 115-118, as follows:

      “In Ross, et al., it was determined that the aster size roughly scaled with the volume of the excitation area, suggesting that the number of microtubules limits the size of the aster. This hints that there may be a density limit to the microtubules in an aster.”

      Is dimerization reversible after activation? If the motors cannot unbind from each other, and act as crosslinkers (for as long as they remain bound) are they likely to accumulate within the aster over time? This may challenge the steady state assumption.

      We thank the reviewer for the thoughtful analysis. Dimerization is reversible after activation - the lifetime of the optogenetic bond is about 20 seconds (Guntas et al., 2015). In order to form an aster, we repeatedly activate the sample at 20 second intervals, so there is a balance between motors unbinding from each other and ones becoming dimerized. This balance can create a non-equilibrium steady state. We have clarified this in lines 78-80, as follows:

      “The optogenetic bond lasts for about 20 seconds before reverting to the undimerized state, thus in our experiments, we repeatedly illuminate the sample every 20 seconds (Guntas, et al. 2015).”
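      The balance described above can be illustrated with a minimal sketch. It is not the model used in the manuscript: it assumes that the roughly 20-second bond lifetime is the mean of a first-order (exponential) decay, and that each light pulse dimerizes a fixed fraction `f` of the currently undimerized motors. The parameter `f` and both function names are hypothetical and introduced only for illustration.

```python
import math


def dimer_fraction_after_pulses(f, period_s, lifetime_s, n_pulses):
    """Dimerized fraction right after each pulse, starting from no dimers.

    Assumptions (not from the source): bonds revert with first-order
    kinetics with mean lifetime `lifetime_s`, and each pulse dimerizes
    a fraction `f` of the motors that are undimerized at that moment.
    """
    decay = math.exp(-period_s / lifetime_s)  # surviving fraction per dark interval
    d = 0.0
    history = []
    for _ in range(n_pulses):
        d *= decay            # bonds revert between pulses
        d += f * (1.0 - d)    # pulse converts part of the free pool
        history.append(d)
    return history


def periodic_steady_state(f, period_s, lifetime_s):
    """Fixed point of the pulse/decay map: d* = f / (1 - (1 - f) * decay)."""
    decay = math.exp(-period_s / lifetime_s)
    return f / (1.0 - (1.0 - f) * decay)


if __name__ == "__main__":
    # 20 s pulse interval and ~20 s lifetime follow the text; f = 0.5 is arbitrary.
    trace = dimer_fraction_after_pulses(f=0.5, period_s=20.0, lifetime_s=20.0, n_pulses=10)
    print([round(x, 4) for x in trace])
    print(round(periodic_steady_state(0.5, 20.0, 20.0), 4))
```

      Under these assumptions the dimerized fraction settles onto a repeating cycle rather than growing without bound, which is the sense in which repeated activation can sustain a non-equilibrium steady state.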