  1. Apr 2024
    1. eLife assessment

      This study presents a valuable computational model for elaborating on the interpretation of chromosomal mosaicism in preimplantation embryos. The evidence supporting the claims of the authors is incomplete due to the assumption that it is possible to quantify the cells in the embryo, the oversimplification of mitotic errors, and the inclusion of the self-correction premise. The work will be of interest to embryologists and geneticists working in reproductive medicine.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript presents a compelling model to explain the impact of mosaicism in preimplantation genetic testing for aneuploidies.

      Strengths:

      A new view of mosaicism is presented with a computational model, that brings new insights into an "old" debate in our field. It is a very well-written manuscript.

      Weaknesses:

      Although the manuscript is very well written, it is written in a way that assumes the reader has existing knowledge of specific terms and topics. This was apparent through a lack of definitions and minimal background/context to the aims and conclusions for some of the authors' findings.

      There is a need for some examples to connect real evidence and scenarios from clinical reports with the model.

    3. Reviewer #2 (Public Review):

      Summary:

      Although an oversimplification of the biological complexities, this modeling work does add, in a limited way, to the current knowledge on the theoretical difficulties of detecting mosaicism in human blastocysts from a single trophectoderm biopsy in PGT. However, many of the premises that the modeling was built on are theoretical and based on unproven biological and clinical assumptions that could yet prove to be untrue. Therefore, the work should be considered only as a simplified model that could assist in further understanding of the complexities of preimplantation embryo mosaicism, but assumptions of real-world application are, at this stage, premature and should not be considered as evidence in favour of any clinical strategies.

      Strengths:

      The work has presented an intriguing theoretical model for elaborating on the interpretation of complex and still unclear biological phenomena such as chromosomal mosaicism in preimplantation embryos.

      Weaknesses:

      Lines 134-138: The spatial modeling of mitotic errors in the embryo was oversimplified in this manuscript. There is only limited (and non-comprehensive) evidence that mitotic errors leading to chromosome mosaicism arise from chromosome loss or gain only (e.g. anaphase lag). This work did not take into account the (more recognised) possibility of mitotic nondisjunction, where following the event there would be clones of cells with either one more or one less of the same chromosome. Although addressed in the discussion (lines 572-574), not including this in the most basic of modeling is a significant oversight that, based on the simple likelihood, could significantly affect results.
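
      The distinction drawn in this comment can be made concrete with a toy lineage calculation. The sketch below is illustrative only and is not the manuscript's model: it assumes synchronous divisions, no cell death or selection, a single error in one cell, and that anaphase lag produces a loss in one daughter only. The function names `divide` and `clone_counts` are hypothetical.

```python
from collections import Counter


def divide(copy_number, error=None):
    """Return the chromosome copy numbers of the two daughter cells."""
    if error is None:
        return [copy_number, copy_number]
    if error == "anaphase_lag":
        # Assumption: one daughter loses a copy, the other is unaffected.
        return [copy_number, copy_number - 1]
    if error == "nondisjunction":
        # One daughter gains the copy that the other daughter loses.
        return [copy_number + 1, copy_number - 1]
    raise ValueError(f"unknown error type: {error}")


def clone_counts(n_divisions, error_division, error):
    """Count cells per copy number after synchronous divisions.

    A single error hits the first cell at division `error_division`;
    every other division is error-free (a simplifying assumption).
    """
    cells = [2]  # start from a euploid zygote (two copies)
    for division in range(1, n_divisions + 1):
        next_cells = []
        for index, copy_number in enumerate(cells):
            hit = division == error_division and index == 0
            next_cells.extend(divide(copy_number, error if hit else None))
        cells = next_cells
    return Counter(cells)


lag = clone_counts(6, 3, "anaphase_lag")
ndj = clone_counts(6, 3, "nondisjunction")
print("anaphase lag:  ", dict(lag))
print("nondisjunction:", dict(ndj))
```

      Under these assumptions, a single error at the third of six divisions leaves reciprocal trisomic and monosomic clones of equal size after nondisjunction, whereas anaphase lag leaves only a monosomic clone next to a larger euploid population.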

      General comment: the premise of the manuscript is that an embryologist (embryology laboratory) is aware of and can accurately quantify the number of cells in a blastocyst or TE biopsy. The reality is that it is not possible to do this accurately without destroying the sample, which is obviously not clinically applicable. Based on many assumptions, the findings show that taking small biopsies poorly classifies mosaic embryos, which is not disputed. However, extrapolating this to the clinic and suggesting that a certain number of cells be biopsied (lines 539-540) is careless and potentially harmful, because it proposes a change in clinical practice without validation. Additionally, no embryologist in the field can tell how many cells are present in a clinical TE biopsy, making this suggestion even more impractical.
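
      The undisputed point that small biopsies classify mosaic embryos poorly can be illustrated with a simple sampling calculation. The sketch below is a hedged illustration, not the manuscript's method: it assumes a biopsy is a uniform random draw without replacement from the embryo's cells (ignoring the spatial clustering of clones), and the cell counts used (200 cells, 40 of them aneuploid, a 5-cell biopsy) are hypothetical values that, as noted above, cannot be measured in clinical practice. The function name `biopsy_distribution` is also hypothetical.

```python
from math import comb


def biopsy_distribution(total_cells, aneuploid_cells, biopsy_size):
    """Hypergeometric probability of seeing k aneuploid cells, k = 0..biopsy_size."""
    euploid_cells = total_cells - aneuploid_cells
    denominator = comb(total_cells, biopsy_size)
    return [
        comb(aneuploid_cells, k) * comb(euploid_cells, biopsy_size - k) / denominator
        for k in range(biopsy_size + 1)
    ]


# Hypothetical embryo: 200 cells, 20% aneuploid, 5-cell biopsy.
probs = biopsy_distribution(200, 40, 5)
for k, p in enumerate(probs):
    print(f"{k} aneuploid cells in biopsy: {p:.3f}")
```

      In this hypothetical case, roughly a third of biopsies would contain no aneuploid cells at all, so the embryo would look uniformly euploid even with perfect assay accuracy.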

      On a more general clinical consideration, the authors should acknowledge that when reporting findings of unproven clinical utility and unknown predictive values this inevitably results in negative consequences for infertile couples undergoing IVF. It is proven and established that when couples face the decision on how to manage a putative mosaicism finding, the vast majority decide on embryo disposal. It was recently reported in an ESHRE survey that about 75% of practitioners in the field consider discarding or donating to research embryos with reported mosaicism. A prospective clinical trial showed that about 30% live birth rate reduction can be expected if mosaic embryos are not considered (Capalbo et al., AJHG 2021). The real-world experience is that when mosaicism is reported, embryos with almost normal reproductive potential are discarded. The authors should be more careful with the clinical interpretation and translation of these theoretical findings.

      There is a robust consensus within the field of clinical genetics and genomics regarding the necessity to exclusively report findings that possess well-established clinical validity and utility. This consensus is grounded in the imperative to mitigate misinterpretation and ineffective actions in patient care. However, the clinical framework delineated in this manuscript diverges from the prevailing consensus in clinical genetics. Clinical genetics and genomics prioritize the dissemination of findings that have undergone rigorous validation processes and have demonstrated clear clinical relevance and utility. This emphasis is crucial for ensuring accurate diagnosis, prognosis, and therapeutic decision-making in patient care. By adhering to established standards of evidence and clinical utility, healthcare providers can minimize the potential for misinterpretation and inappropriate interventions. The framework proposed in this manuscript appears to deviate from the established principles guiding clinical genetics practice. It is imperative for clinical frameworks to align closely with the consensus guidelines and recommendations set forth by professional organizations and regulatory bodies in the field. This alignment not only upholds the integrity and reliability of genetic testing and interpretation but also safeguards patient well-being and clinical outcomes.

      References:

      ACMG Board of Directors. (2015). Clinical utility of genetic and genomic services: a position statement of the American College of Medical Genetics and Genomics. Genetics in Medicine, 17(6), 505-507. https://doi.org/10.1038/gim.2014.194

      Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., ... ACMG Laboratory Quality Assurance Committee. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine, 17(5), 405-424. https://doi.org/10.1038/gim.2015.30

      Line 61: "Self correction" - This terminology is unfortunately indiscriminately used in the field for PGT when referring to mosaicism and implies that the embryo can actively correct itself from a state of inherent abnormality. Apart from there being no evidence to suggest that there is an active process by which the embryo itself can correct chromosomal errors, most presumed euploid/aneuploid mosaic embryos will have been euploid zygotes and therefore "self-harm" may be a better explanation. True self-correction in the form of meiotic trisomy/monosomy rescue is of course theoretically possible but not at all clinically significant. The concept being conveyed in this part of the manuscript is not disputed but it is strongly suggested that the term "self correction" is not used in this context, nor in the rest of the manuscript, to prevent the perpetuation of misinformation in the field and instead use a better description.

      Lines 69-73: The ability to quantify aneuploidy in known admixtures of aneuploid cells is indeed well established. However, the authors claim that the translation of this to embryo biopsy samples is inferred with some confidence and that if a biopsy shows an intermediate chromosome copy number (ICN), the biopsy and the embryo are mosaic. There are no references provided here and indeed the only evidence in the literature relating to this is to the contrary. Multifocal biopsy studies have shown that an ICN result in a single biopsy is often not seen in other biopsies from the same embryo (Capalbo et al 2021; Kim et al., 2022; Girardi et al., 2023; Marin, Xu, and Treff 2021). Multifocal biopsies showing reciprocal gain and loss, which would provide stronger validation for the presence of true mosaicism, are also rare. In this work, the entire manuscript is based on the accuracy of ICN in a biopsy being reflective of mosaicism in the embryo. The evidence, however, points to a large proportion of ICN detected in embryo biopsies potentially being technical artifacts (misdiagnosing both constitutionally normal and abnormal (meiotic aneuploid) embryos as mosaic). Therefore, although results from the modelling provide insight into theoretical results, these cannot be used to inform clinical decision-making at all.

      Lines 87-89: The authors make the claim that emerging evidence is suggestive that the majority of embryos are mosaic to some degree. If in fact, mosaicism is the norm, the clinical importance may be limited.

      Lines 102-103: The statement that data show that the live birth rate per ET is generally lower in mosaic embryos than in euploid embryos comes from retrospective cohort studies that suffer from significant selection bias. The authors have ignored non-selection study results (Capalbo et al., AJHG 2021) that suggest that putative mosaicism has limited predictive value when assessed prospectively and blinded.

      Lines 94-98: The authors have misrepresented the works they have presented as evidence for biopsy result accuracy (Kim et al., 2023; Victor et al 2019; Capalbo et al., 2021; Girardi et al., 2023, and any others). These studies show that a mosaic biopsy is not representative of the whole embryo and can actually be from embryos where the remainder of the embryo shows no evidence of mosaicism. There is also a missing key reference of Capalbo et al, AJHG 2021, and Girardi et al., HR 2023 where multifocal biopsies were taken.

      Lines 371-372: "Selecting the embryo with the lowest number of aneuploid cells in the biopsy for transfer is still the most sensible decision". Where is the evidence for this other than the modeling which is affected by oversimplification and unproven assumptions? Although the statement seems logical at face value, there is no concrete evidence that the proportion of aneuploid cells within a biopsy is valuable for clinical outcomes, especially when co-evaluated with other more relevant clinical information.

      Lines 431-463: In this section, the authors discuss clinical outcome data from the transfer of putative mosaic embryos and make conclusions about the relationship between ICN level in biopsy and successful pregnancy outcomes. The retrospective and selective nature of the data used in forming the results has the potential to lead to incorrect conclusions when applied to prospective unselected data.

    4. Reviewer #3 (Public Review):

      Unfortunately, this study fails to incorporate the most important variable impacting the ability to predict mosaicism, the accuracy of the test. The fact is that most embryos diagnosed as mosaic are not mosaic. There may be 4 cases out of thousands and thousands of transfers where a confirmation was made. Mosaicism has become a category of diagnosis in which embryos with noisy NGS profiles are placed. With VeriSeq NGS it is not possible to routinely distinguish true mosaicism from noise. An analysis of NGS noise levels (MAPD) versus the rate of mosaics by clinic using the registry will likely demonstrate this is the case. Without accounting for the considerable inaccuracy of the method of testing the proposed modeling is meaningless.

      Recent data using more accurate methods of identifying mosaicism indicate that the prevalence of true preimplantation embryonic mosaicism is only 2%, which is also consistent with findings made post-implantation. This model fails to account for the possibility that, because so few embryos are actually mosaic, there is actually no relevance to clinical care whatsoever. In fact, differences in clinical outcomes of embryos designated as mosaic could be entirely attributed to poor embryo quality resulting in noise levels that make NGS results fall into the "mosaic" category.

      Additional comments:

      Indeed, as more data emerges, it appears that the majority of embryos from both healthy and infertile couples are mosaic to some degree (Coticchio et al., 2021; Griffin et al., 2022).

      This statement should be softened as all embryos will be considered mosaic when a method with a 10% false positive rate is applied to 10 more parts of the same embryo. The distinction between artifact and true mosaicism cannot be made with nearly all current methods of testing. When virtually no embryos display uniform aneuploidy in a rebiopsy study, there should be great concern over the accuracy of the testing used. The vast majority of aneuploidy is meiotic in origin.
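
      The arithmetic behind this comment can be sketched as follows. This is an illustration under a stated assumption, namely that each rebiopsy is an independent test with the same false positive rate; the function name `prob_any_false_positive` is hypothetical.

```python
def prob_any_false_positive(false_positive_rate, n_biopsies):
    """Probability that at least one of n independent tests is a false positive."""
    return 1.0 - (1.0 - false_positive_rate) ** n_biopsies


# A 10% false positive rate applied to 1, 5, 10, and 20 parts of one embryo.
for n in (1, 5, 10, 20):
    print(n, round(prob_any_false_positive(0.10, n), 3))
```

      Under this independence assumption, about two thirds of truly uniform embryos would show at least one spurious "mosaic" call across 10 biopsies, and the proportion keeps rising as more parts of the embryo are tested.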

      Experimental data provides strong evidence that, for the most part, the biopsy result obtained accurately represents the chromosome constitution of the rest of the embryo (Kim et al., 2022; Navratil et al., 2020; Victor et al., 2019).

      This statement is incorrect, given that a published systematic review of the literature indicates a 10% false positive rate based on rebiopsy results.

      This shows that accurately classifying a mosaic embryo based on a single biopsy is not robust.

      This is exactly why the practice of designating embryo mosaics with intermediate copy numbers should not exist.

    1. eLife assessment

      This useful work by Park attempts to use machine learning algorithms to identify correlates of different observed shedding patterns in two COVID-19 cohorts. The evidence supporting the study conclusions is incomplete due to the lack of uniformity in assays between the 2 cohorts, relevant metadata (previous infection/vaccination status, viral variant), early viral load data in the cohorts, and incomplete statistical analyses. With a strengthened analysis, the work may be of interest to virologists, clinicians, and public health scientists.

    2. Reviewer #1 (Public Review):

      Summary:

      This study by Park and colleagues uses longitudinal saliva viral load data from two cohorts (one in the US and one in Japan from a clinical trial) in the pre-vaccine era to subset viral shedding kinetics and then use machine learning to attempt to identify clinical correlates of different shedding patterns. The stratification method identifies three separate shedding patterns discriminated by peak viral load, shedding duration, and clearance slope. The authors also assess micro-RNAs as potential biomarkers of severity but do not identify any clear relationships with viral kinetics.

      Strengths:

      The cohorts are well developed, the mathematical model appears to capture shedding kinetics fairly well, the clustering seems generally appropriate, and the machine learning analysis is a sensible, albeit exploratory approach. The micro-RNA analysis is interesting and novel.

      Weaknesses:

      The conclusions of the paper are somewhat supported by the data but there are certain limitations that are notable and make the study's findings of only limited relevance to current COVID-19 epidemiology and clinical conditions.

      (1) The study only included previously uninfected, unvaccinated individuals without the omicron variant. It has been well documented that vaccination and prior infection both predict shorter duration shedding. Therefore, the study results are no longer relevant to current COVID-19 conditions. This is not at all the authors' fault but rather a difficult reality of much retrospective COVID research.

      (2) The target cell model, which appears to fit the data fairly well, has clear mechanistic limitations. Specifically, if such a high proportion of cells were to get infected, then the disease would be extremely severe in all cases. The authors could specify that this model was selected for ease of use and to allow clustering, rather than to provide mechanistic insight. It would be useful to list the AIC scores of this model when compared to the model by Ke.

      (3) Line 104: I don't follow why including both datasets would allow one model to work better than the other. This requires more explanation. I am also not convinced that non-linear mixed effects approaches can really be used to infer early model kinetics in individuals from one cohort by using late viral load kinetics in another (and vice versa). The approach seems better for making population-level estimates when there is such a high amount of missing data.

      (4) Along these lines, the three clusters appear to show uniform expansion slopes whereas the NBA cohort, a much larger cohort that captured early and late viral loads in most individuals, shows substantial variability in viral expansion slopes. In Figure 2D: the upslope seems extraordinarily rapid relative to other cohorts. I calculate a viral doubling time of roughly 1.5 hours. It would be helpful to understand how reliable of an estimate this is and also how much variability was observed among individuals.

      (5) A key issue is that a lack of heterogeneity in the cohort may be driving a lack of differences between the groups. Table 1 shows SpO2 values and lab values that all look normal. All infections were mild. This may make identifying biomarkers quite challenging.

      (6) Figure 3A: many of the clinical variables such as basophil count, Cl, and protein have very low pre-test probability of correlating with virologic outcome.

      (7) A key omission appears to be microRNA from pre- and early-infection time points. It would be helpful to understand whether microRNA levels at least differed between the two collection timepoints and whether certain microRNAs are dynamic during infection.

      (8) The discussion could use a more thorough description of how viral kinetics differ in saliva versus nasal swabs and how this work complements other modeling studies in the field.

      (9) The most predictive potential variables of shedding heterogeneity which pertain to the innate and adaptive immune responses (virus-specific antibody and T cell levels) are not measured or modeled.

      (10) I am curious whether the models infer different peak viral loads, duration, expansion, and clearance slopes between the 2 cohorts based on fitting to different infection stage data.
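
      The doubling-time estimate in point (4) follows from the standard relation between an exponential growth slope and doubling time, t_d = log10(2) / s, where s is the slope in log10 units per day. The sketch below is a minimal illustration of that conversion; the slope of 4.8 log10 units per day is a hypothetical value chosen because it corresponds to roughly 1.5 hours, not a figure reported in the manuscript, and the function name `doubling_time_hours` is likewise hypothetical.

```python
import math


def doubling_time_hours(log10_slope_per_day):
    """Convert an exponential growth slope (log10 units/day) to a doubling time in hours."""
    if log10_slope_per_day <= 0:
        raise ValueError("slope must be positive for a growing viral load")
    return 24.0 * math.log10(2.0) / log10_slope_per_day


# Hypothetical upslope of 4.8 log10 units per day.
print(round(doubling_time_hours(4.8), 2), "hours")
```

      Repeating the conversion for each individual's estimated slope would give the between-individual variability in doubling time that point (4) asks about.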

    3. Reviewer #2 (Public Review):

      Summary:

      This study argues that it has stratified viral kinetics for saliva specimens into three groups by the duration of "viral shedding"; the authors could not identify clinical data or microRNAs that correlate with these three groups.

      Strengths:

      The question of whether there is a stratification of viral kinetics is interesting.

      Weaknesses:

      The data underlying this work are not treated rigorously. The work in this manuscript is based on PCR data from two studies, with most of the data coming from a trial of nelfinavir (NFV) that showed no effect on the duration of SARS-CoV-2 PCR positivity. This study had no PCR data before symptom onset, and thus exclusively evaluated viral kinetics at or after peak viral loads. The second study is from the University of Illinois; this data set had sampling prior to infection, so has some ability to report the rate of "upswing." Problems in the analysis here include:

      -- The PCR Ct data from each study is treated as equivalent and referred to as viral load, without any reports of calibration of platforms or across platforms. Can the authors provide calibration data and justify the direct comparison as well as the use of "viral load" rather than "Ct value"? Can the authors also explain on what basis they treat Ct values in the two studies as identical?

      -- The limit of detection for the NFV PCR data was unclear, so the authors assumed it was the same as the University of Illinois study. This seems a big assumption, as PCR platforms can differ substantially. Could the authors do sensitivity analyses around this assumption?

      -- The authors refer to PCR positivity as viral shedding, but it is viral RNA detection (very different from shedding live/culturable virus, as shown in the Ke et al. paper). I suggest updating the language throughout the manuscript to be precise on this point.

      -- Eyeballing extended data in Figure 1, a number of the putative long-duration infections appear to be likely cases of viral RNA rebound (for examples, see S01-16 and S01-27). What happens if all the samples that look like rebound are reanalyzed to exclude the late PCR detectable time points that appear after negative PCRs?

      -- There's no report of uncertainty in the model fits. Given the paucity of data for the upslope, there must be large uncertainty in the up-slope and likely in the peak, too, for the NFV data. This uncertainty is ignored in the subsequent analyses. This calls into question the efforts to stratify by the components of the viral kinetics. Could the authors please include analyses of uncertainty in their model fits and propagate this uncertainty through their analyses?

      -- The clinical data are reported as a mean across the course of an infection; presumably vital signs and blood test results vary substantially, too, over this duration, so taking a mean without considering the timing of the tests or the dynamics of their results is perplexing. I'm not sure what to recommend here, as the timing and variation in the acquisition of these clinical data are not clear, and I do not have a strong understanding of the basis for the hypothesis the authors are testing.
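
      The reanalysis suggested above for apparent rebound cases could be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that a single negative PCR after a positive one marks clearance (the review does not specify a criterion), and the data layout and the function name `drop_rebound` are hypothetical.

```python
def drop_rebound(samples):
    """Drop positive PCR results that occur after the first post-infection negative.

    `samples` is a list of (day, is_positive) tuples sorted by day.
    Negative results are always kept; pre-infection negatives do not count
    as clearance because no positive result has been seen yet.
    """
    kept = []
    seen_positive = False
    cleared = False
    for day, is_positive in samples:
        if is_positive and cleared:
            continue  # positive after clearance: treated as rebound and excluded
        if is_positive:
            seen_positive = True
        elif seen_positive:
            cleared = True  # assumption: one negative after a positive = clearance
        kept.append((day, is_positive))
    return kept


# Hypothetical participant with a late positive after a negative test.
series = [(-2, False), (0, True), (2, True), (4, False), (6, True), (8, False)]
print(drop_rebound(series))
```

      Refitting the model to the filtered series and comparing cluster assignments with the original fit would show how much the long-duration group depends on these late detections.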

      It's unclear why microRNAs matter. It would be helpful if the authors could provide more support for their claims that (1) microRNAs play such a substantial role in determining the kinetics of other viruses and (2) they play such an important role in modulating COVID-19 that it's worth exploring the impact of microRNAs on SARS-CoV-2 kinetics. A link to a single review paper seems insufficient justification. What strong experimental evidence is there to support this line of research?

    4. Reviewer #3 (Public Review):

      The article presents a comprehensive study on the stratification of viral shedding patterns in saliva among COVID-19 patients. The authors analyze longitudinal viral load data from 144 mildly symptomatic patients using a mathematical model, identifying three distinct groups based on the duration of viral shedding. Despite analyzing a wide range of clinical data and micro-RNA expression levels, the study could not find significant predictors for the stratified shedding patterns, highlighting the complexity of SARS-CoV-2 dynamics in saliva. The research underscores the need for identifying biomarkers to improve public health interventions and acknowledges several limitations, including the lack of consideration of recent variants, the sparsity of information before symptom onset, and the focus on symptomatic infections.

      The manuscript is well written, although the explanation of the statistical methodologies could be clearer. This work could inform public health strategies and diagnostic testing approaches. However, a thorough development of new statistical analyses is needed, with major revisions to address the following points:

      (1) Patient characterization & selection: Patient immunological status at inclusion (and if it was accessible at the time of infection) may be the strongest predictor for viral shedding in saliva. The authors state that the patients were not previously infected by SARS-CoV-2. Was anti-N antibody testing performed? Were other humoral measurements performed or did everything rely on declaration? From Figure 1A, I do not understand the rationale for excluding asymptomatic patients. Moreover, the mechanistic model can handle patients with only three observations, so why are they not included? Finally, the 54 patients without clinical data can be used for the viral dynamics fitting and then discarded for the descriptive analysis. Excluding them can create a bias. All the discarded patients can help the virus dynamics analysis as it is a population approach. Please clarify. In Table 1 the absence of a sex covariate is surprising.

      (2) Exact study timeline for explanatory covariates: I understand the idea of finding « early predictors » of long-lasting viral shedding. I believe it is key and a great question. However, some samples (Figure 4A) seem to be taken at the end of the viral shedding. I am not sure it is really easier to run micro-RNA assays on saliva samples than a PCR. So I need to be better convinced of the impact of the possible findings. Generally, the timeline of the explanatory covariates is not described in a satisfactory manner in the current manuscript. Also, the evaluation and inclusion of the daily symptoms in the analysis are unclear to me.

      (3) Early Trajectory Differentiation: The model struggles to differentiate between patients' viral load trajectories in the early phase, with overlapping slopes and indistinguishable viral load peaks observed in Figures 2B, 2C, and 2D. The question arises whether this issue stems from the data, the nature of Covid-19, or the model itself. The authors discuss the scarcity of pre-symptom data, primarily relying on Illinois patients who underwent testing before symptom onset. This contrasts earlier statements on pages 5-6 & 23, where they claim the data captures the full infection dynamics, suggesting sufficient early data for pre-symptom kinetics estimation. The authors need to provide detailed information on the number or timing of patient sample collections during each period.

      (4) Conditioning on the future: Conditioning on the future in statistics refers to the problematic situation where an analysis inadvertently relies on information that would not have been available at the time decisions were made or data were collected. This seems to be the case when the authors create micro-RNA data (Figure 4A). First, the sampling times need to be clarified by the authors (for clinical outcomes as well). Second, proper causal inference relies on the assumption that the cause precedes the effect. This conditioning on the future may result in overestimating the model's accuracy. This happens because the model has been exposed to the outcome it is supposed to predict. This could call into question the (already weak) relation with mir-1846 level.

      (5) Mathematical Model Choice Justification and Performance: The paper lacks mention of the practical identifiability of the model (especially for tau, given the lack of early data). Moreover, it is expected that the immune effector model will be more useful at the beginning of the infection (for which data are the sparsest). Please provide AIC for comparison; saying that they have "equal performance" is not enough. Can you provide, at least in a point-by-point response, the VPC and convergence assessments?

      (6) Selected features of viral shedding: I wonder to what extent the viral shedding area under the curve (AUC) and normalized AUC should be added as selected features.

      (7) Two-step nature of the analysis: First you fit a mechanistic model, then you use the predictions of this model to perform clustering and prediction of groups (unsupervised then supervised). Thus you do not propagate the uncertainty intrinsic to your first estimation through the second step, i.e. all the selected viral load features actually have a confidence bound which is ignored. Did you consider a one-step analysis in which your covariates of interest play a direct role in the parameters of the mechanistic model as covariates? To pursue this type of analysis, SCM (Johnson et al. Pharm. Res. 1998), COSSAC (Ayral et al. 2021 CPT PsP), or SAMBA (Prague et al. CPT PsP 2021) methods can be used. Did you consider sampling from the posterior distribution rather than using EBE to avoid shrinkage?

      (8) Need for advanced statistical methods: The analysis is characterized by a lack of power. This can indeed come from the sample size, which is set by the amount of data available in the study. However, I believe the power could be increased using more advanced statistical methods. At least it is worth a try. First, considering the unsupervised clustering, summarizing the viral shedding trajectories with features collapses longitudinal information. I wonder if the R package « LongituRF » (and associated method) could help, see Capitaine et al. 2020 SMMR. Another interesting tool to investigate could be the latent class models R package « lcmm » (and associated method), see Proust-Lima et al. 2017 J. Stat. Softw. But the latter may be more far-fetched.

      (9) Study intrinsic limitation: The results cannot be extended to asymptomatic patients or to patients infected with recent VOCs. This definitely limits the impact of the results and their applicability to public health. However, for me, the novelty of the data analysis techniques used should also be taken into consideration.
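
      The model comparison requested in point (5) rests on the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below only illustrates the bookkeeping: the log-likelihoods and parameter counts are hypothetical placeholders, not values from the manuscript, and the model labels are used for illustration only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of estimated parameters).
fits = {
    "target_cell": (-1050.0, 5),
    "immune_effector": (-1046.5, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores.values())
for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}, delta = {score - best:.1f}")
```

      Reporting the AIC differences alongside the fits would let readers judge whether "equal performance" is a fair summary.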

      Strengths are:

      - Unique data and comprehensive analysis.
      - Novel results on viral shedding.

      Weaknesses are:

      - Limitation of study design.
      - The need for advanced statistical methodology.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The findings in this manuscript are important for gene editing in human-derived hematopoietic stem and progenitor cells. By optimizing the delivery tool, adding a DNA-PK inhibitor, and including spacer-breaking silent mutations, the editing efficiency is significantly increased, and the heterozygosity can be tuned. The editing is even across the hematopoietic hierarchy.

      Strengths:

      The precise gene editing is important in gene therapy in vitro and in vivo. The manuscript provides solid evidence showing the efficacy and uniqueness of their gene editing approach.

      Weaknesses:

      There are several extended and unique points shown in this paper but in a specific cell population.

      The findings are indeed in a specific cell lineage, though it should be noted that the editing crossed multiple cell types within that lineage. More importantly, HSPCs have substantial relevance to understanding adult stem cell biology, blood formation, and leukemia. Critically, they are also the target cells for a plethora of gene therapies for anemias, immunodeficiencies, and metabolic disorders, and are also being explored for use with CAR technologies. Indeed, CRISPR-based gene therapy was recently approved for clinical use. As such, the findings here are of substantial relevance for multiple areas of research including hematology, stem cell biology, cancer, immunology and more.

      Reviewer #2 (Public Review):

      Summary:

      This work by Cloarec-Ung et al. sets out to uncover strategies that would allow for the efficient and precise editing of primitive human hematopoietic stem and progenitor cells (HSPCs). Such effective editing of HSPCs via homology directed repair has implications for the development of tractable gene therapy approaches for monogenic hematopoietic disorders as well as precise engineering of these cells for clinical regenerative and/or cell therapy strategies. In the setting of experimental hematology, precision introduction of disease relevant mutations would also open the door to more robust disease modeling approaches. It has been recognized that to encourage HDR, NHEJ as the dominant mode of repair in quiescent HSPCs must be inhibited. Testing editing of human cord blood HSPCs, the authors first incorporate a prestimulation phase, then identify optimal RNP amounts and donor types/amounts using standard editing culture conditions, identifying optimal concentrations of AAV and short single-stranded oligonucleotide donors (ssODNs) that yield minimal impacts to cell viability while still enabling heightened integration efficiency. They then demonstrate the superiority of AZD7648, an inhibitor of NHEJ-promoting DNA-PK, in allowing for much increased HDR, with toxicities imparted by this compound reduced substantially by siRNAs against p53 (mean targeting efficiencies at 57 and 80% for two different loci). Although AAV offered the highest HDR frequencies, differing from ssODN by a factor of ~2, the authors show that spacer breaking sequence mutations introduced into the ssODN to better mimic the disruption of the spacer sequence provided by the synthetic intron in the AAV backbone yielded ssODN HDR frequencies equal to that attained by AAV.
      By examining editing efficiency across specific immunophenotypically identified subpopulations, they further suggest that editing efficiency with their improved strategy is consistent across stem and early progenitors, and they use colony assays to quantify an approximate 4-fold drop in total colony numbers but no skewing in the potentiality of progenitors in the edited HSPC pool. Finally, the authors provide a strategy using mutation-introducing AAV mixed with different ratios of silent ssODN repair templates to enable tuning of zygosity in edited CD34+ cells.

      Strengths:

      The methods are clearly described and the experiments for the most part also appropriately powered. In addition to using state of the art approaches the authors also provided useful insights into optimizing the practicalities of the experimental procedures that will aid bench scientists in effectively carrying out these editing approaches, for example avoiding longer handling times inherent when scaling up to editing over multiple conditions.

      The sum of the adjustments to the editing procedure have yielded important advances towards minimizing editing toxicity while maximizing editing efficiency in HSPCs. In particular, the significant increase in HDR facilitated by the authors' described application of AZD7648 and the preservation of a pool of targeted progenitors is encouraging that functionally valuable cell types can be effectively edited.

      The discovery of the effectiveness of spacer breaking changes in ssODNs allowing for substantially increased targeting efficiency is a promising advance towards democratizing these editing strategies given the ease of designing and synthesizing ssODNs relative to the production of viral donors.

      The ability to zygosity tune was convincingly presented and provides a valuable strategy to modify this HDR procedure towards more accurate disease modelling.

      Weaknesses:

      Despite providing convincing evidence that functional progenitors can be successfully edited by their procedure, as the authors acknowledge it remains to be verified to what degree the self-renewal capacity and in vivo regenerative potential of the more primitive fractions is maintained with their strategy.

      Another 53BP1-based editing strategy that also disrupts DNA-PK has demonstrated maintained allele frequencies over engraftment time (De Ravin et al. Blood 2021), which suggests that a transient disruption of DNA-PK shouldn’t compromise regenerative potential. Of course, we strongly agree that maintained regenerative potential is important in any editing strategy. As such, for the version of record we have added clonal LTC-IC assessment using conditions that we’ve previously demonstrated predict long-term repopulating potential (Knapp et al. Nat Cell Bio 2018). These data, which have been added to Figure 3, show no significant reduction in the frequency of the most potent LTC-IC in edited cells compared to unedited controls.

      Assessments of the potential for off-target effects via the authors' approach was somewhat cursory and would have benefited from a more thorough evaluation.

      Once again in the 53BP1 strategy, the authors of that study already performed CHANGE-seq, long-range PCR, NGS, and SKY with inhibition of this same pathway without obvious increases in off-target editing (as long as HDR donor was present, though they did interestingly observe increased large deletions when HDR donors were absent, De Ravin et al. Blood 2021). Our tests here were designed to confirm that our molecule was similarly not affecting off-target editing rather than to launch a large-scale investigation. We agree, however, that off-targets and particularly structural re-arrangements that could be missed by other approaches remain a concern. We have added in nanopore sequencing of the predicted off-target sites and thus verified more deeply that there was no change (indeed no observable off-target activity) at any of these sites. This data has been added to Figure 2 and to a new supplementary Figure S5. Additionally, while it’s beyond the scope of the current manuscript, a focused follow-up dedicated to structural rearrangements downstream of both single and multiple edits is currently in progress and will be submitted separately later this year.

      Viability was assessed by live cell counting however given the short-term nature of the editing assay, more sensitive readouts of potentially compromised cell health could have provided a more stringent assessment of how the editing methodology impacted cell fitness.

      Of course, we agree that viable cell counting does not fully predict whether the cell is viable in terms of retained proliferative potential or other functional potentials. This point was addressed for myeloid progenitors at least by the CFC assays already in the manuscript, as to form a colony these cells were definitionally viable at input. Indeed, in these tests, we did see a reduction beyond that of the viable counts as already discussed in the text. Similarly, we already inadvertently answered this in the general CD34+CD45RA- population in Figure 4C where we measured clonal growth following editing with different mutant to silent donor ratios. In this instance we observed 30-40% clonogenic frequencies (Figure 4C), though in this case without a specific non-edited control (as this was not the intended question). Nonetheless, this would indicate that any general viability loss was no more than observed in the CFC tests (even if we assume 100% cloning efficiency if the cells had been unedited). Finally, the clonal LTC-IC assays show that while there is perhaps some loss in more committed progenitors, those with the highest self-renewal potential are not compromised in the edited condition compared to control (Figure 3I).

      Recommendations for the authors

      Reviewer #2 (Recommendations For The Authors):

      It will be important to include the author-provided new paragraph in the discussion to contextualize this work in the existing HSPC editing landscape and your unique findings.

      A new paragraph detailing how our manuscript fits with other recently published works is now included in the discussion.

      The legend for Figure 3 needs correction. Panel E is incorrectly labeled as panel D and panel F is incorrectly labeled as panel E.

      Thank you for catching this typo. It has been fixed.

      In Figure 4 axis headings in panel C and D require clarity beyond simply titles of "Mean Frequency".

      These axis labels have been clarified.

    2. eLife assessment

      This study presents an important methodology to increase the efficiency and precision of gene editing in human hematopoietic stem and progenitor cells. The evidence supporting the claims is convincing in that primitive LTC-ICs were minimally affected as a result of the editing procedure and the lack of edits at predicted off-target sites. The work will be of interest to biologists studying hematopoietic stem and progenitor cells and genome editing for potential clinical applications.

    3. Reviewer #2 (Public Review):

      Summary:

      This work by Cloarec-Ung et al. sets out to uncover strategies that would allow for the efficient and precision editing of primitive human hematopoietic stem and progenitor cells (HSPCs). Such effective editing of HSPCs via homology directed repair has implications for the development of tractable gene therapy approaches for monogenic hematopoietic disorders as well as precise engineering of these cells for clinical regenerative and/or cell therapy strategies. In the setting of experimental hematology, precision introduction of disease relevant mutations would also open the door to more robust disease modeling approaches. It has been recognized that to encourage HDR, NHEJ as the dominant mode of repair in quiescent HSPCs must be inhibited. Testing editing of human cord blood HSPCs the authors first incorporate a prestimulation phase then identify optimal RNP amounts and donor types/amounts using standard editing culture conditions identifying optimal concentrations of AAV and short single-stranded oligonucleotide donors (ssODNs) that yield minimal impacts to cell viability while still enabling heightened integration efficiency. They then demonstrate the superiority of AZD7648, an inhibitor of NHEJ-promoting DNA-PK, in allowing for much increased HDR with toxicities imparted by this compound reduced substantially by siRNAs against p53 (mean targeting efficiencies at 57 and 80% for two different loci). Although AAV offered the highest HDR frequencies, differing from ssODN by a factor of ~2, the authors show that spacer breaking sequence mutations introduced into the ssODN to better mimic the disruption of the spacer sequence provided by the synthetic intron in the AAV backbone yielded ssODN HDR frequencies equal to that attained by AAV. 
By examining editing efficiency across specific immunophenotypically identified subpopulations they further suggest that editing efficiency with their improved strategy is consistent across stem and early progenitors and use colony assays to quantify an approximate 4-fold drop in total colony numbers but no skewing in the potentiality of progenitors in the edited HSPC pool. Finally, the authors provide a strategy using mutation-introducing AAV mixed with different ratios of silent ssODN repair templates to enable tuning of zygosity in edited CD34+ cells.

      Strengths:

      The methods are clearly described and the experiments for the most part also appropriately powered. In addition to using state-of-the-art approaches, the authors also provided useful insights into optimizing the practicalities of the experimental procedures that will aid bench scientists in effectively carrying out these editing approaches, for example avoiding longer handling times inherent when scaling up to editing over multiple conditions.

      The sum of the adjustments to the editing procedure have yielded important advances towards minimizing editing toxicity while maximizing editing efficiency in HSPCs. In particular, the significant increase in HDR facilitated by the authors' described application of AZD7648 and the preservation of a pool of targeted progenitors is encouraging that functionally valuable cell types can be effectively edited.

      The discovery of the effectiveness of spacer breaking changes in ssODNs allowing for substantially increased targeting efficiency is a promising advance towards democratizing these editing strategies given the ease of designing and synthesizing ssODNs relative to the production of viral donors.

      The ability to zygosity tune was convincingly presented and provides a valuable strategy to modify this HDR procedure towards more accurate disease modelling.

      Weaknesses:

      Despite providing convincing evidence that functional progenitors can be successfully edited by their procedure, as the authors acknowledge it remains to be verified to what degree the survival/self-renewal capacity and in vivo regenerative potential of the more primitive fractions is maintained with their strategy. That said the inclusion of LTC-IC assays that verify the lack of effect on these quite primitive cells is encouraging that functionality of stem cells will be similarly spared.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this letter, we respond to each of the reviewers’ comments. We support responses by referring to the revised manuscript and, where necessary, by including additional descriptions and analyses that we consider extrinsic to the manuscript itself. In this letter, all changes to the manuscript are shown in blue. As noted, the displayed figures have been added to the manuscript or the SI. We believe that we have successfully addressed all comments and that the quality of our paper has improved significantly.

      Comment 1: In addition to the technical comments by the reviewers, I would encourage the authors to discuss the dependency of their observations, e.g. emergence of microphase separation, not only on the sequence of the polypeptides, but also on the solution conditions. Similarly, the distributions of ions in the condensate bulk, interphase, and diluted phase, and hence the interfacial free energy, are significantly affected both by the chemical composition of the condensate and the salt concentration itself, see: https://pubs.acs.org/doi/10.1021/acs.nanolett.1c03138

      We thank the editor for this suggestion. Here, we have focused on the effect of sequence on condensate organization. We agree that how changes in solution conditions affect condensates, including microphase separation of ELPs, is potentially interesting as well. We note this as a possible future direction at multiple places in the revised Conclusions and Discussion:

      “The simulations successfully reproduced condensate stability variation upon amino acid substitution. While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature [cite] or salt [cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      “Such a microenvironment arises from the collective behavior of many proteins, can deviate from that of individual chains, and is likely sensitive to the solution conditions,[cite] which are held constant in our study. Future work on systems with double amino acid substitutions or changes to salt concentration or temperature could elucidate the generality of the mean field interpretation and the additivity of individual contributions.”

      Response to referee 1

      Comment 0: This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. The results should be published. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of “frustration,” which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid “frustration” becoming a “catch-all” term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to that of gyroids.

      We want to thank the reviewers for their insightful comments. We revised the manuscript extensively to improve its clarity and to address the reviewers’ concerns. In the following, we provide point-to-point responses to all the comments.

      Comment 1: It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors’ simulation. A relationship between simulated surface tension and “transition temperature” is provided in Fig.1C; but is the “transition temperature” (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry’s Tt is the “critical onset temperature”, the temperature when phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST, though the two may be correlated, or not, depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821-830 (2019)? Or is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors’ introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      We thank the reviewer for their comment. First, we apologize for the lack of clarity between the global lower critical solution temperature, Tc, and the transition temperature, Tt. We have modified the manuscript to be more explicit that the transition temperature we utilize is dependent on the solution conditions, instead of the global lower critical solution temperature.

      Author response image 1.

      Tt as a function of concentration for ELP[V5A2G3] constructs of different chain lengths. Logarithmic fits to the data for each construct using Eq. 1 are also shown. It is evident that the different curves converge to the critical temperature Tc at the critical concentration Cc. Figure reproduced from ref.[2] CC BY 4.0.

      However, as shown by Chilkoti and coworkers [1, 2] and in Author response image 1, the critical temperature of ELPs, Tc, is indeed linearly related to Tt through the following relationship:

      [equation]

      The above equation highlights the dependence of Tt on the chain length (length) and polymer concentration (conc). The parameter Cc is the corresponding theoretical polypeptide concentration that would be required to achieve Tc, and k is the proportionality constant. Instead of making computationally expensive predictions of condensate critical temperatures, we focused on the surface tension, which can be more readily determined from single constant temperature simulations as detailed in the Methods section. This decision made it computationally feasible to systematically probe the properties of all 20 amino acids in diblock ELPs in our multiscale model. Furthermore, an expected relationship between the critical temperature and the surface tension can be inferred based on the Flory-Huggins theory. In particular, relationships between the Flory-Huggins parameter, χ, and interfacial tension (τ) have been investigated, and the relationship can be approximated as

      [equation]

      where α is a positive constant, whose exact value depends on the proximity of χ to the critical value of χ necessary for phase separation (χC).[3, 4] As detailed in the new Supplemental Theory of the Supporting Information, for systems undergoing LCST,

      [equation]

      with […]. Therefore, we have

      [Eq. 4]

      Several conclusions can be drawn from Eq. 4. First, for α = 1, τ is linearly proportional to Tc. Secondly, τ decreases at larger values of Tc, a trend that is consistent with results presented in Figure 1 of the main text. Finally, as detailed in the Supplemental Theory, the inverse relationship between τ and Tc is only expected for systems exhibiting LCSTs. For systems with UCST, τ increases at larger Tc. Therefore, reproducing the correct trend supports the model’s ability to capture the temperature-dependent effect specific to the ELP system.
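
      The concentration and chain-length dependence of Tt described above can be illustrated with a minimal sketch. It assumes the logarithmic form Tt = Tc + (k/length) ln(Cc/conc), which is one reading of the symbols defined in the response (Tc, k, length, Cc, conc) and of the "logarithmic fits" mentioned in the image caption; it is not copied from the response. The function name and all numeric values are hypothetical.

```python
import math


def transition_temperature(t_c, k, length, c_c, conc):
    """Return Tt under the ASSUMED form Tt = Tc + (k / length) * ln(Cc / conc).

    t_c    : critical temperature Tc
    k      : proportionality constant
    length : chain length (same units as used when fitting k)
    c_c    : critical concentration Cc
    conc   : polymer concentration (same units as c_c)
    """
    if length <= 0 or c_c <= 0 or conc <= 0:
        raise ValueError("length, c_c and conc must all be positive")
    return t_c + (k / length) * math.log(c_c / conc)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the trend.
    t_c, k, c_c = 20.0, 100.0, 1000.0
    for length in (30, 60, 120):
        row = [transition_temperature(t_c, k, length, c_c, conc)
               for conc in (1.0, 10.0, 100.0, 1000.0)]
        print(length, [round(t, 2) for t in row])
```

      Under this assumed form, every chain length returns exactly Tc when conc equals Cc, which mirrors the convergence of the curves noted in the caption of Author response image 1; for a positive k and conc below Cc, longer chains and higher concentrations both give a lower Tt.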

      We modified the text to define the physical meaning of Tt more explicitly. Furthermore, we added a new section in the Supporting Information titled Supplemental Theory to detail the relationship between Tt, Tc, the Flory-Huggins parameter χ, and the surface tension τ. The updated text now reads:

      “Utilizing the simulated condensate conformations, we computed various quantities to benchmark against experimental measurements. While the critical temperature has been widely used as a measure for condensate stability, determining it computationally is expensive. As an alternative, we computed the surface tension, τ, using 100-µs-long MARTINI simulations performed with the NPNAT ensemble.[cite] As detailed in the Supplemental Theory in the Supporting information, an inverse relationship is expected between τ and the critical temperature, Tc, for systems exhibiting LCSTs. We further approximate Tc with the transition temperatures (Tt) of ELP sequences,[cite] which are the temperatures at which ELPs undergo an LCST transition at a specified solution condition. Tt was shown to be linearly proportional to TC[cite]. As expected, a negative correlation can be readily seen between computed surface tension and experimental Tt (Fig. 1C). This observed negative correlation between Tt and τ supports the simulation approach’s accuracy in reproducing the sequence-dependent changes in ELP phase behavior.”

      The reviewer is correct that MOFF does not explicitly account for temperature-dependent effects in its interaction parameters. But as mentioned above and indicated by the reviewer, the following steps with explicit solvent simulations in the multiscale strategy succeed in capturing sequence-dependent differences in ELP systems, which are evident in both transition temperature and surface tension.

      We cited the two references suggested by the reviewer in the introduction. We further added the following text in the discussion section to suggest explicitly exploring temperature-dependent effects as an interesting future direction.

      “While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature[cite] or salt[cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      Comment 2: “Frustration” and “frustrated” are used prominently in the manuscript to characterize certain observed molecular configurations (11 times total, in both the title and in the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors’ thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invocation of the “frustration” terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term “frustration” is lacking, leaving readers to wonder what molecular configurations are considered “frustrated” and what are not (i.e., is the claim of observation of frustration falsifiable?). For instance, “frustrated microphase separation” appears in both the title and abstract. A logical question one may ask is: “Are all microphase separations frustrated”? If the answer is in the affirmative, does invocation of the term “frustration” add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish microphase separations that are frustrated from those that are not frustrated? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity is preventing hydrophobic residues from coming together in a way to achieve the most favorable interactions as if there were no chain connectivity (one may imagine in that case all the hydrophobic residues will form a large cluster without microphase separation). Is this what the authors mean by “frustration”? 
If that’s true, isn’t that merely stating the obvious, at least for the observed microphase separation? In general, does “frustration” always mean deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term “frustration” in the present work, what is the zero-frustration reference state/configuration for microphase separation? And, similarly, what is the zero-frustration reference state/configuration when frustrated ELP-water interactions are discussed (p.14-p.15, Fig.5)? What do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around small nonpolar solute “frustrated”?

      We thank the reviewer for their insightful comment, and agree that the concept of “frustration” is important to our conclusions and, upon review, was too vague in our previous draft of the manuscript.

      For conceptual simplicity and to maximize transferability to real biological systems, we will focus our discussion of frustration on one specific type, which we term “chain frustration.” Chain frustration occurs in states where tertiary interactions between chemically distinct polymer blocks favor phase separation, while chain connectivity prevents macroscopic phase separation from occurring.[5] This frustration leads to microphase separation with microdomains of different monomers.

      We agree with the reviewer that “all microphase separations” are frustrated, and have revised the title to

      “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Furthermore, we also removed frustration from the abstract to read

      “The interspersion of hydrophilic and hydrophobic residues and a lack of secondary structure formation result in an interfacial environment, which explains both the strong correlation between ELP condensate stability and interfacial hydrophobicity scales, as well as the prevalence of protein-water hydrogen bonds.”

      We have limited our discussion of frustration to the incomplete separation of hydrophobic and hydrophilic groups. As pointed out by the reviewer, in this case, frustration refers to the fact that chain connectivity is preventing hydrophobic residues from coming together in a way to achieve the most favorable interactions as if there were no chain connectivity. The reference would be a perfectly macroscopic phase separation that partitions hydrophobic from hydrophilic groups.

      While the frustration from chain connectivity is well understood for block copolymers[5], its effect on producing the interfacial solvation environment, to the best of our knowledge, has not been emphasized before. We have revised the text at the point where we mention frustration to clearly define its meaning.

      “Therefore, while microphase separation occurs in ELP condensates, frustration remains in the system. Hydrophilic residues cannot completely separate from hydrophobic ones due to constraints imposed by the amino acid sequence, creating unique microenvironments.”

      When discussing the interactions between ELP and water, we used the hydrogen bond analysis to emphasize the interfacial environment. For example, the hydrophobic residues tend to “repel” water molecules, reducing the hydrogen bond density; on the other hand, hydrophilic residues and backbone retain water molecules. This difference resulted in the positive and negative correlation with Tt shown in Fig 5C. The behavior of water molecules is, therefore, inhomogeneous inside the condensate. We expect water molecules to become frustrated due to the simultaneous contact with both hydrophobic and hydrophilic chemical groups, and a perfect reference state would be the pure water environment. However, since this point is not central to our study, to avoid confusion, we have avoided mentioning frustration and revised the text to read

      “The water hydrogen bond density also highlights an interfacial environment of blended hydrophobic and hydrophilic regions.”

      After revising the text, frustration only appears three times in the manuscript.

      Comment 3: In the discussion about the correlation of various transfer free energy scales for amino acids and Urry’s critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale “is working”, is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      We thank the reviewer for their comment. The exact relationship between the interactions among amino acids and their transition temperature can be understood in terms of the Flory-Huggins theory, which describes the thermodynamics of polymer mixtures using a lattice model. The chemical composition of the mixture is built into the polymer-solvent interaction parameter, χ:

      [equation]

      where z is the coordination number, T is the temperature, kB is the Boltzmann constant, and {ϵpp, ϵss, ϵps} are the strengths of polymer-polymer, solvent-solvent, and polymer-solvent interactions, respectively.[6]

      From the original derivation of Flory-Huggins theory, it can be shown that phase separation occurs when χ is greater than its critical value, χC. From this condition, we can derive the critical temperature as

      [equation]

      Δϵ can indeed be interpreted as the free energy cost of transferring a polymer bead from a solution phase to a polymer phase. It corresponds to the change of energy from a mixed state, with contacts between polymer and solvent (ϵps), to the demixed state with only polymer-polymer (ϵpp) and solvent-solvent (ϵss) contacts.

      Therefore, the transfer free energy, and the interactions among amino acids of ELPs, are expected to correlate with the critical temperature. The above discussion has been incorporated into the new section Supplemental Theory in the Supporting Information. There, we also discuss the more general scenario where Δϵ is temperature dependent, which is essential for giving rise to LCST.
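
      For reference, the textbook Flory-Huggins expressions that the discussion above appears to draw on can be written as follows. This is a sketch under the standard lattice-model convention, not a copy of the equations in the Supplemental Theory. The sign convention chosen for Δϵ, the number of segments per chain N, and the treatment of Δϵ as temperature independent are all assumptions made here, and they may differ from the authors' definitions.

```latex
\begin{align}
\chi &= \frac{z\,\Delta\epsilon}{k_B T},
\qquad
\Delta\epsilon = \epsilon_{ps} - \tfrac{1}{2}\left(\epsilon_{pp} + \epsilon_{ss}\right), \\
\chi_C &= \frac{1}{2}\left(1 + \frac{1}{\sqrt{N}}\right)^{2}, \\
T_c &= \frac{z\,\Delta\epsilon}{k_B\,\chi_C}
\quad \text{(from } \chi(T_c) = \chi_C \text{, with } \Delta\epsilon \text{ independent of } T\text{)}.
\end{align}
```

      With a temperature-independent Δϵ > 0, χ falls as T rises, so demixing (χ > χC) occurs below Tc, which is UCST-type behavior. This is consistent with the point made above that a temperature-dependent Δϵ is needed to describe the LCST behavior of ELPs.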

      We have modified the main text in the discussions of Figure 4 to better explain these mathematical relationships and their necessary assumptions in order to help interpret our simulations. Here is an excerpt from where we discuss Figure 4:

      “The strong dependence of molecular organization on amino acid hydrophobicity suggests that the solvation environment of individual residues might be a determining factor for condensate stability. Indeed, as shown in the Supplemental Theory of the Supporting Information, the critical temperature is closely related to the free energy cost of transferring polymer beads from a solution state to a polymer-only environment. This transfer free energy is often used to quantify the hydrophobicity of amino acids [cite]. To explore their relationship more quantitatively, we compared the transition temperature for ELP condensates measured by Urry [cite] to several hydrophobicity scales.”

      Comment 4: To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors’ simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      We thank the reviewer for this suggestion. The results of Statt et al. and Wessén et al. indeed provide a nice comparison to our results. While we capture some of the same behavior they observe, the full array of chemical space in our model seems to give some additional morphologies as well.

      First, as predicted by the self-consistent field theory, block copolymers are expected to form primarily lamellar-like micelles that clearly separate the dense and dilute phases when the volume fraction, f, is 0.5 (Response to Comment 5). This prediction is indeed consistent with results from simulations with the HP model, and is consistent with our simulations when the substituted amino acid, X, is sufficiently polar.

      However, this observation is only one of several behaviors we observe. In particular, our simulations also produce gyroid-like structures, which are predicted to emerge at small volume differences, i.e. f ≈ 0.4 or f ≈ 0.6. These different configurations likely emerge due to the more realistic representation of amino acids in our model, which presents more frustration than the HP model. In particular, the backbone atoms are inherently hydrophilic and cannot separate from the hydrophobic side chains. Therefore, under microphase separation, it is inherently difficult to separate the different chemical groups to form lamellar or micelle-like structures. This produces a condensate interior with interfacial properties that may not be captured by the HP model.

      We make note of the micelle-like topologies predicted by HP models in the revised text, citing both Statt et al. and Wessén et al.:

      “Surprisingly, microphase separation did not produce lamellar morphology as expected for block copolymers with equal volume fraction of the two blocks (Fig. S3 in the Supporting Information) [cite]. In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      Comment 5: “Gyroid-like morphology” is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation, but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      We thank the reviewer for their comment. Gyroids are canonical structures for diblock copolymers.[5, 7, 8, 9] Their stability is predicted using self-consistent field theory (SCFT), and occurs due to the balance of the volume fraction of polymer block A (fA), the length of the polymer (N), and the Flory-Huggins interaction parameter (χ).[8, 9] The prediction from SCFT suggests that gyroids occur at smaller values of χN and values fA near, but not equal to 0.5 (Author response image 2).[10] We hypothesize that these configurations emerge at equal molar fraction of V and X amino acids due to small differences in solvation volume between each half of the polymer chain.

      Our support for gyroid-like structures is mainly from observations of two interpenetrating networks formed by the two ELP blocks. We have revised Figure S4 to clearly highlight the two networks as shown in Author response image 3.

      We have revised the main text to clearly define the gyroid-like structures as interpenetrating networks, and added the theoretical phase diagram of diblock copolymers predicted by SCFT as Figure S3 in the Supporting Information.

      “In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      We note, however, that proving that our observations are indeed gyroid structures requires more sophisticated mathematical analysis that is beyond the scope of the study. It is also possible that these structures are metastable in our simulations. We emphasize these caveats in the updated Discussion Section.

      “Further studies on the thermodynamic stability of these morphologies and comparing them with predictions from the self-consistent field theory shall provide more insights into the driving forces for their emergence [cite].”

      Author response image 2.

      Theoretical phase diagram[8] and corresponding morphologies for diblock copolymers. The phases are labeled as: body centered cubic (BCC), hexagonal cylinders (HEX), gyroid (GYR), and lamellar (LAM). fA is the volume fraction of a single polymer block, denoted A, χ is the Flory-Huggins interaction parameter, and N is the total degree of polymerisation. Figure reproduced from ref.[10] CC BY 4.0.

      Author response image 3.

      Representative configurations of (A) V5F5 and (B) V5L5 condensates from MARTINI simulations. The valine-substituted half of the chain is colored blue (V5) and the X-substituted half of the chain is colored red (X5). To highlight the interpenetrating networks formed by the two halves, only the X-substituted half of the chain is shown on the left. Simulation interfaces are repeated once periodically in the positive x and positive y dimensions for clarity. High-density regions formed by multiple X-substituted halves of the chains are highlighted in yellow circles, with one of the chains shown in green.

      Response to referee 2

      Comment 1: The experimental characterization relies on BODIPY and SBD reporting, respectively, on viscosity and polarity. The fluorescent signal of these dyes can possibly depend on many other factors, including quenching. Additional controls are required, or a more extensive discussion with additional references, and a mention to potential limitations of this approach.

      We agree with the reviewer that the fluorescence lifetime signal can be affected by many factors. Compared with fluorescence intensity, the fluorescence lifetime depends mainly on the intrinsic properties of the dye and on environmental factors. BODIPY and SBD have been used in biological systems to detect the microviscosity and micropolarity of condensates. Our group has used the same SBD and BODIPY fluorophores in previous work to quantify the microenvironment of protein aggregates and condensates. The extended data in those studies (ChemBioChem 20:1078–1087. doi: 10.1002/cbic.201800782; Aggregate 4:e301. doi:10.1002/agt2.301; Nat Chem Biol 1–9. doi:10.1038/s41589-023-01477-1) provide evidence that BODIPY is sensitive only to viscosity and SBD only to polarity, and that neither is sensitive to other environmental factors. As for quenching, fluorophores with extended pi-rich structures display an aggregation-caused quenching (ACQ) effect at high probe concentrations, which lowers the fluorescence lifetime and intensity. We usually labeled ELPs at a 20% molar ratio using NHS-ester fluorophores to obtain stock solutions. Because labeling is not fully efficient, the actual labeling ratio is much lower than 20%. The labeled ELP stock solution was then mixed with unlabeled ELP to obtain ELP solutions with low labeling fractions. We measured ELPs labeled with different fractions of dyes. The results show that only BODIPY exhibits slight ACQ at a high

      Author response image 4.

      FLIM images of ELP condensates labeled with different fractions of dyes. A) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% BODIPY labels. B) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% fraction of SBD. Droplets were formed with a final concentration of 70 µM ELP labeled with different fractions of BODIPY or SBD in 2 M NaCl solution. Scale bar:5 µm.

      To largely avoid the potential ACQ effect while retaining sufficient fluorescence signal, we ultimately used ELPs labeled with lower fractions of dye, 1% BODIPY and 2.5% SBD, to perform the FLIM experiments. The data in Figure 3 will be corrected with the following data.

      Author response image 5.

      Structures of NHS-BODIPY and NHS-SBD, and representative FLIM images of V30A30, A30V30, V30G30 and G30V30 labeled with respective fluorophores. The fluorescence lifetime of each image is the average acquired from three independent experiments. Scale bar: 5 µm.

      We revised the text in the section Microphase separation of ELP condensates as follows: “To experimentally test the microphase separation behavior uncovered in simulations, we studied the micro-physicochemical properties of the V-end and X-end of the peptides. We constructed diblock peptides with the combination of 30 pentameric repeats of V block and X (A or G) block, namely V30A30 and V30G30 (Experimental Sequences Section in the Supporting Information). The amino-termini of V30A30 and V30G30 sequences were subsequently labeled with environmentally sensitive BODIPY or SBD fluorophores [cite], whose lifetime could be measured to quantify the viscosity or polarity of the V-end (Fig. 3A, left panel) [cite]. These probes have been reported to be only sensitive to single physicochemical properties.[cite] To avoid artifacts induced by fluorophore labeling, we usually used ELPs labeled with a low fraction of dyes. We also constructed A30V30 and G30V30 diblock peptides, wherein the viscosity or polarity of the A-end or the G-end could be measured by fluorophores that are attached at the amino-terminus (Fig. 3A, right panel). Using FLIM, we found that the lifetime of BODIPY for the V-end (5.43 ns) was longer than that for the A-end (4.35 ns), suggesting that the V-end indeed has a higher microviscosity than the A-end (ηV = 2233.54 cP vs ηA = 969.57 cP). Accordingly, the lifetime of SBD was longer for the V-end (8.75 ns) than the A-end (7.00 ns), indicating that the micropolarity of the V-end was lower than the A-end (ϵV = 13.25 vs ϵA = 18.97). These observations could be largely attributed to the greater extent of dehydration at the V-end due to its higher local peptide density. We further showed that the observed differences are not results of possible artifacts arising from any subtle distinctions between the two sequences V30A30 and A30V30 (Experimental Characterization of ELP Condensates Section in the Supporting Information, Fig. S8-S9 in the Supporting Information). Similar results were observed using the V-G sequences. FLIM experiments revealed that the V-end was more viscous than the G-end (ηV = 2972.72 cP vs ηG = 1958.60 cP) and the V-end was less polar than the G-end (ϵV = 9.14 vs ϵG = 27.50). These experimental observations provided the first line of evidence to support the microphase separation, as suggested by the simulation results.”

      We revised the text in the section Experimental methods as follows

      “The proteins of interest were labeled with NHS ester fluorophore. We used ELPs with 1% BODIPY labels or 2.5% SBD labels to form condensates, which avoids the artifacts induced by fluorophores. Droplets were formed with the final concentration of 70 µM ELP in 2 M NaCl for V-A and 1.5 M (NH4)2SO4 for V-G diblock, respectively. A drop of droplet-containing solution was placed on a 0.17 mm coverslip with a 500 µm spacer. Images were acquired on a Leica Falcon Fluorescence Microscope equipped with Wil pulse laser and 63X/0.12 oil-immersion objective. The BODIPY was excited at 488 nm and the SBD was excited at 448 nm. The fluorescence lifetime fitting and image analysis were performed in LAS X and Image J.”

      We also used a lower concentration of free dyes to remeasure the properties of the ELP condensates. The Figure S9 data are corrected as follows. The slight differences between the results are caused by experimental errors, which don’t affect the conclusion.

      Author response image 6.

      FLIM image of unlabeled ELP condensates. A) Chemical structure of free fluorophore, which can measure the physicochemical properties of condensates without labeling. B) Representative FLIM images of V30A30 and A30V30. The mix is the mixture of V30A30 (35 µM) and A30V30 (35 µM). Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl solution with 1 µM fluorophore. C) Representative FLIM images of V30G30 and G30V30. Droplets were formed with a final concentration of 70 µM ELP in 1.5 M (NH4)2SO4 solution with 1 µM fluorophore. The mix is the mixture of V30G30(35 µM) and G30V30 (35 µM). Scale bar, 5 µm. The fluorescence lifetime of each image is the average from three independent measurements.

      We also revised the Sequence dependence of micro-viscosity and polarity section of the Supporting Information as follows

      “Since we used V30X30 and X30V30 to quantify the V- and X-end of the V-X blocks, it is possible that the observed differences arose from the innate property of the V30X30 and X30V30 sequences. To rule out this artifact, we formed the ELP condensates with sequences of V30X30, X30V30, or the V30X30 and X30V30 mixture. The condensates were subsequently treated with the aldehyde-BODIPY and methyl-ester SBD fluorophores without the NHS ester reactive warhead (Fig. S9A in the Supporting Information). After brief incubation, aldehyde-BODIPY and methyl-ester SBD fluorophores were recruited into and homogeneously distributed in the ELP condensates. The fluorescence lifetime of aldehyde-BODIPY was the same for V30A30 (4.96 ns), A30V30 (4.99 ns), and their mixture (4.98 ns) (Fig. S9B in the Supporting Information, upper panel). Interestingly, this value is around the average (4.89 ns) of the A-end (4.35 ns) and the V-end (5.43 ns) labeled NHS-BODIPY. For the SBD measurement, methyl-ester SBD resulted in almost identical lifetime values of V30A30 (8.25 ns), A30V30 (8.27 ns), and their mixture (8.28 ns) (Fig. S9B in the Supporting Information, lower panel), again around the average values (7.88 ns) of the A-end (7.00 ns) and the V-end (8.75 ns) labeled NHS-SBD. In addition to the V-A blocks, similar observations were made for the V-G blocks as V30G30 and G30V30 sequences (Fig. S9C in the Supporting Information). The slight difference between the results is attributed to experimental error. Because the fluorophores did not covalently label the amino-terminus of the ELP peptides, their lifetime reports closer to the averaged property of the condensates instead of the microscopic property of the V-end or the X-end when the number of molecules is sufficient and the molecular distribution has no preference.

      Our results reveal that the V30X30 and X30V30 condensates exhibited similar macroscopic viscosity or polarity, suggesting that the previously observed different viscosity or polarity of V30X30 and X30V30 could be attributed to the microscopic property of the V-end or X-end.”
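
      The averages quoted in the passage above can be checked directly from the end-labeled lifetimes reported in the text. The following is only a worked arithmetic check using those reported values; the symbols for the mean lifetimes are introduced here for illustration and do not appear in the manuscript.

```latex
\[
\bar{\tau}_{\mathrm{BODIPY}} = \frac{4.35\ \mathrm{ns} + 5.43\ \mathrm{ns}}{2} = 4.89\ \mathrm{ns},
\qquad
\bar{\tau}_{\mathrm{SBD}} = \frac{7.00\ \mathrm{ns} + 8.75\ \mathrm{ns}}{2} = 7.875\ \mathrm{ns} \approx 7.88\ \mathrm{ns}.
\]
```

      The free-dye lifetimes reported for V30A30, A30V30, and their mixture (4.96 to 4.99 ns for aldehyde-BODIPY, 8.25 to 8.28 ns for methyl-ester SBD) lie within about 0.1 ns and 0.4 ns of these means, respectively.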

      The FLIM technique combined with environment-sensitive fluorophores is a powerful tool for investigating the physicochemical properties of the microenvironment within condensates. However, the method has some limitations. Because the fluorophore is attached to the protein, we can only detect the microenvironment immediately surrounding the probe (the distance may be at the angstrom level). The fluorescence signals we obtain are statistical averages over complex microenvironments, and they are determined by the sampling position, orientation, and number of fluorescent probes. The quantified values can therefore be compared in relative terms, but they cannot accurately describe the physical or chemical states of different systems. In addition, the resolution of FLIM experiments is not sufficient to directly resolve the microstructure in condensates.

      Comment 2: It is unclear if, after the application of stretching, the micro-structure will eventually return to the original configuration or not. Overall, the point of this experiment remains somewhat unclear.

      We thank the reviewer for this comment. The ELP condensates are viscous fluids that can coalesce into larger droplets within seconds. Because of their high viscosity, ELP condensates show slow fluorescence recovery after photobleaching. When the condensates are stretched, their microstructure changes in response to the external force, and the fluorophores may be pulled out of their microenvironment. For such a dynamic system, we speculate that the microstructure will return to its original state once the condensate re-equilibrates, which may be a long process. However, it is hard to determine whether these microstructures have completely returned to their original configuration. The purpose of this experiment is to probe the microenvironment of each terminus from another angle. The experiment also provides evidence that the microenvironment around the V terminus is denser than that around the A terminus.

      Comment 3: The title is too generic and does not reflect the content of the work. There is no analysis of biological condensates. The results are specific to di-block polypeptides with specific sequences. This should be clearly specified in text and title.

      We have revised the title to “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”.

      Comment 4: MD is out of the expertise of this reviewer. However, when looking at the density profiles (Figure S2), the simulation does not seem to be fully converged. The densities fluctuate inconsistently along the Z direction. The authors should comment on assessing simulation convergence. In many cases, the section used for the density values in the plot (i.e., below 0.06 box lengths away from the condensate center) does not seem representative of the dense phase. It should be justified, why these simulations can still be used for density/hydrogen bonding analysis.

      We thank the reviewer for their comment, and agree that convergence of MD simulations is simultaneously important and difficult to control for. To demonstrate the convergence of our simulations, we have taken an example system (V5F5) and reproduced the density profile in 4 unique time windows of 50 ns each (Author response image 7A-D). We find that all distributions are nearly identical, indicating that further extending these simulations is unlikely to change our findings.
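
      The time-window check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script used for the manuscript: the function names, the bin count, and the synthetic Gaussian "trajectory" are all hypothetical stand-ins, and real input would be per-frame Z coordinates measured from the condensate center in units of the box length.

```python
import numpy as np

def windowed_density_profiles(z, masses, n_windows=4, n_bins=50, z_max=0.5):
    """Relative mass density versus |Z| in consecutive time windows.

    z      : (n_frames, n_particles) Z positions relative to the condensate
             center, in fractions of the box length
    masses : (n_particles,) particle masses
    Returns (bin_centers, profiles); profiles has shape (n_windows, n_bins).
    """
    edges = np.linspace(0.0, z_max, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    profiles = []
    # Split the frames into consecutive blocks (e.g. four 50 ns windows).
    for block in np.array_split(np.abs(z), n_windows, axis=0):
        w = np.broadcast_to(masses, block.shape)
        hist, _ = np.histogram(block.ravel(), bins=edges, weights=w.ravel())
        profiles.append(hist / hist.sum())  # normalize within each window
    return centers, np.array(profiles)

def max_window_deviation(profiles):
    """Largest absolute difference between any window and the window mean."""
    return np.abs(profiles - profiles.mean(axis=0)).max()

# Synthetic stand-in for a trajectory (NOT simulation data from the study).
rng = np.random.default_rng(0)
z = rng.normal(0.0, 0.1, size=(2000, 500))
masses = np.ones(500)

centers, profiles = windowed_density_profiles(z, masses)
deviation = max_window_deviation(profiles)
print(f"max deviation between windows: {deviation:.4f}")
```

      A small deviation between windows is consistent with, but does not prove, convergence; slow relaxation on time scales longer than the full trajectory would not be detected by this kind of check.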

      While we agree that the choice of 0.06 box lengths is arbitrary, it was chosen as an approximation for the interior of the condensate, where the more hydrophobic half of the protein chain tends to be at higher concentration. However, this choice is not important to our overall conclusion. Halving (Author response image 7E) or doubling (Author response image 7F) the cutoff maintains the inverse correlation between the protein density of the X5 half of the condensate and experimental transition temperature.
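
      The cutoff sensitivity test described above (halving or doubling the 0.06 box-length cutoff) can be sketched in the same spirit. Everything numerical below is made up for illustration: the six hypothetical sequences, their transition temperatures, and the exponential profile shape are assumptions and are not taken from Urry's measurements or from the simulations.

```python
import numpy as np

def interior_mean(z_centers, profile, cutoff):
    """Average a profile over bins closer to the center than the cutoff."""
    mask = z_centers < cutoff
    return profile[mask].mean()

def correlation_vs_cutoff(z_centers, profiles, tt, cutoffs):
    """Pearson coefficient between the interior X-half mass fraction and the
    transition temperature, evaluated separately for each cutoff."""
    result = {}
    for c in cutoffs:
        interior = np.array([interior_mean(z_centers, p, c) for p in profiles])
        result[c] = np.corrcoef(interior, tt)[0, 1]
    return result

# Hypothetical inputs: bins of width 0.01 box lengths, six invented sequences.
z_centers = np.arange(0.005, 0.25, 0.01)
tt = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # invented Tt values
rng = np.random.default_rng(1)
# Invented profiles whose interior mass fraction decreases as Tt increases.
profiles = np.array([
    (0.8 - 0.005 * t) * np.exp(-z_centers / 0.2)
    + rng.normal(0.0, 0.005, z_centers.size)
    for t in tt
])

rho = correlation_vs_cutoff(z_centers, profiles, tt, cutoffs=(0.03, 0.06, 0.12))
for cutoff, value in rho.items():
    print(f"cutoff {cutoff:.2f}: Pearson rho = {value:.3f}")
```

      If the sign and approximate magnitude of the coefficient are stable across the three cutoffs, the conclusion does not hinge on the particular choice of 0.06 box lengths.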

      Finally, in our multiscale simulation approach, the all-atom portion of the simulation is mostly used to examine water structure and protein solvation. We can see that dividing the simulation into four independent time estimates does not substantially change these properties, resulting in low standard deviations in Figure 5 and Figure 6. Similarly, our previous work on the dielectric of ELP condensates has shown that choosing different starting structures from MARTINI simulations is unlikely to affect the estimate of similar quantities.[11]

      Author response image 7.

      Checking convergence of all-atom simulations of ELP condensates. (A-D) The relative mass density along the Z-distance from the condensate center is shown for the V-substituted and X-substituted halves of V5F5 in four independent time windows of 50 ns each. The Z−axis is defined as the direction perpendicular to the condensate-water interface. The dashed line represents a Z-distance of 0.06 box lengths away from the condensate center, which was the original cutoff for correlation analysis. E-F) Correlation between the mass fraction of the X5 half of the condensate and transition temperature (Tt) from Urry.[12] The condensate is defined as having a Z-distance of 0.03 box lengths (E) or 0.12 box lengths (F) away from the condensate center. ρ is the Pearson correlation coefficient between the two data sets, and the dashed diagonal line is the best fit line. Error bars represent standard deviations of the mean taken over box length intervals of 0.01.

      References

      (1) McDaniel JR, Radford DC, Chilkoti A (2013) A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14:2866–2872.

      (2) Meyer DE, Chilkoti A (2004) Quantification of the effects of chain length and concentration on the thermal behavior of elastin-like polypeptides. Biomacromolecules 5:846–851.

      (3) Helfand E, Tagami Y (1972) Theory of the interface between immiscible polymers. J. Chem. Phys. 56:3592.

      (4) Roe RJ (1975) Theory of the interface between polymers or polymer solutions. I. Two components system. J. Chem. Phys. 62:490–499.

      (5) Shi AC (2021) Frustration in block copolymer assemblies. J. Phys. Condens. Matter 33.

      (6) Flory PJ (1942) Thermodynamics of high polymer solutions. J. Chem. Phys. 10:51.

      (7) Grason GM (2006) The packing of soft materials: Molecular asymmetry, geometric frustration and optimal lattices in block copolymer melts. Phys. Rep. 433:1–64.

      (8) Matsen MW, Bates FS (1996) Unifying weak- and strong-segregation block copolymer theories. Macromolecules 29:1091–1098.

      (9) Matsen MW, Schick M (1994) Stable and unstable phases of a diblock copolymer melt. Phys. Rev. Lett. 72:2660–2663.

      (10) Swann JM, Topham PD (2010) Design and application of nanoscale actuators using block-copolymers. Polymers 2:454–469.

      (11) Ye S et al. (2023) Micropolarity governs the structural organization of biomolecular condensates. Nat. Chem. Biol. pp 1–9.

      (12) Urry DW (1997) Physical chemistry of biological free energy transduction as demonstrated by elastic protein-based polymers. J. Phys. Chem. B 101:11007–11028.

    2. eLife assessment

      This important study investigates the structural organization of a series of diblock elastin-like polypeptide condensates. The methodology is highly compelling, as it combines multiscale simulations and fluorescence lifetime imaging microscopy experiments. The results increase our understanding of model biomolecular condensates.

    3. Reviewer #1 (Public Review):

      This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of "frustration," which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid "frustration" becoming a "catch-all" term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to that of gyroids. Accordingly, the manuscript should be revised to address the following:

      (1) It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors' simulation. A relationship between simulated surface tension and "transition temperature" is provided in Fig.1C; but is the "transition temperature" (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry's Tt is "critical onset temperature", the temperature when phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST - though the two may be correlated-or not-depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821-830 (2019)? Or, is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors' introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      (2) "Frustration" and "frustrated" are used prominently in the manuscript to characterize certain observed molecular configurations (11 times total, in both the title and in the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors' thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invocation of the "frustration" terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term "frustration" is lacking, leaving readers to wonder what molecular configurations are considered "frustrated" and what are not (i.e.,is the claim of observation of frustration falsifiable?). For instance, "frustrated microphase separation" appears in both the title and abstract. A logical question one may ask is: "Are all microphase separations frustrated"? If the answer is in the affirmative, does invocation of the term "frustration" add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish between microphase separations that are frustrated from those that are not frustrated? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity is preventing hydrophobic residues to come together in a way to achieve the most favorable interactions as if there were no chain connectivity (one may imagine in that case all the hydrophobic residues will form a large cluster without microphase separation). Is this what the authors mean by "frustration"? If that's true, isn't that merely stating the obvious, at least for the observed microphase separation? 
In general, does "frustration" always mean deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term "frustration" in the present work, what is the zero-frustration reference state/configuration for microphase separation? And, similarly, what is the zero-frustration reference state/configuration when frustrated ELP-water interactions are discussed (~p.14-p.15, Fig.5)? What do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around small nonpolar solute "frustrated"?

      (3) In the discussion about the correlation of various transfer free energy scales for amino acids and Urry's critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale "is working", is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      (4) To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors' simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      (5) "Gyroid-like morphology" is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      Comments on the revised manuscript:

      The authors have adequately addressed my previous concerns.

    4. Reviewer #2 (Public Review):

      Summary:

      Latham A.P. et al. apply simulations and FLIM to analyse several di-block elastin-like polypeptides and connect their sequence to the micro-structure of coacervates resulting from their phase-separation.

      Strengths:

      Understanding the molecular grammar of phase separating proteins and the connection with mesoscale properties of the coacervates is highly relevant. This work provides insights into micro-structures of coacervates resulting from di-block polypeptides.

      Weaknesses:

      The results apply to a very specific architecture (di-block polypeptides) with specific sequences.

    1. eLife assessment

      This study addresses a fundamental question: Do lipid rafts play a role in trafficking in the secretory pathway? By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell, from the ER to the cell surface via the Golgi. While their findings are solid, further experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of rafts.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance of entering REMs. The time it takes for POA excitability to settle back to its baseline consequently sets a permissive time window in which increased amounts of REMs can make up for the lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, and progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep by inhibiting histamine neurons (Chung et al)? And will some of these POA cells also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019) . Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2 → TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022), indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs → REMs transitions, indicating these neurons also are active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs leading to the subsequent rebound of REMs without affecting NREMs and consequently we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999) . However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2→TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin neurons, the neurons expressing these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2→TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point by point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation hold at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.
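
      As a quick consistency check on the numbers above (a minimal sketch, not part of the authors' analysis): assuming the two 3 hr halves of the restriction interval are weighted equally, the REMs percentage over the full 6 hr interval is the duration-weighted mean of the two halves. The function name `weighted_rem_percentage` is hypothetical, and the input values are the group means quoted in the paragraph above.

```python
def weighted_rem_percentage(percentages, durations_hr):
    """Duration-weighted mean of per-interval REMs percentages."""
    if len(percentages) != len(durations_hr):
        raise ValueError("need one duration per percentage")
    total = sum(durations_hr)
    return sum(p * d for p, d in zip(percentages, durations_hr)) / total


# Mean REMs percentages quoted for ZT1.5-4.5 and ZT4.5-7.5 (Fig. S4D),
# each interval lasting 3 hr.
overall = weighted_rem_percentage([1.66, 4.46], [3.0, 3.0])
print(round(overall, 2))
```

      Under this assumption the combined value is close to 3%, in line with the whole-interval figure the reviewer read from Fig. S4D, while the second half alone is closer to the value reported for eYFP mice in Fig. 4.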

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-Cre line is described in Zecharia AY et al., 2012, J Neurosci (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors base their conclusions heavily on an optogenetic tool that inhibits the activity of GAD2+ neurons; however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a comparable construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and our prior results that recapitulate similar findings as inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A .

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
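
      The two analysis choices mentioned here, averaging recordings within each mouse before testing and applying a Holm-Bonferroni correction to the pairwise comparisons, can be sketched as follows. This is an illustrative sketch only, not the authors' code; the function names `mean_per_mouse` and `holm_bonferroni` and the data layout (pairs of mouse ID and value) are hypothetical.

```python
from collections import defaultdict


def mean_per_mouse(records):
    """Average values within each mouse so that each mouse counts once.

    records: iterable of (mouse_id, value) pairs, one pair per recording.
    Returns a dict mapping mouse_id to the mean value for that mouse.
    """
    grouped = defaultdict(list)
    for mouse_id, value in records:
        grouped[mouse_id].append(value)
    return {mouse: sum(vals) / len(vals) for mouse, vals in grouped.items()}


def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        scaled = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted
```

      In such a workflow, the per-mouse means for each time bin would be compared against the baseline bin with paired tests, and the resulting p-values passed through `holm_bonferroni` before being compared with the significance threshold.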

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control laser and virus effect, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without optogenetic construct, SwiChR++) and the data is provided in Fig 2C .

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment in the text about what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that indeed there is no significant difference in wakefulness between REM restriction and baseline control conditions ( Fig. S4H ). Therefore, while the representative image in Fig 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB , S0960-9822(23)01585–3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). Responses to auditory stimulation, sleep loss and the EEG stages of sleep. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Dormancy/diapause/hibernation (depending on how the terms are defined) is a key life history strategy that allows the temporal escape from unfavorable conditions. Although environmental conditions do play a major role in inducing and terminating dormancy (authors call this energy limitation hypothesis), the authors test a mutually non-exclusive hypothesis (life-history hypothesis) that sex-specific selection pressures, at least to some extent, would further shape the timing of these life-history events. Authors use a meta-analytic approach to collect data (mainly on rodents) on various life-history traits to test trade-offs among these traits between sexes and how they affect entry and termination of dormancy.

      Strengths:

      I found the theoretical background in the Introduction quite interesting, to the point and the arguments were well-placed. How sex-specific selection pressures would drive entry and termination of diapause in insects (e.g. protandry), especially in temperate butterflies, is very well investigated. Authors attempt to extend these ideas to endotherms and trying to find general patterns across ectotherms and endotherms is particularly exciting. This work and similar evidence could make a great contribution to the life-history theory, specifically understanding factors that drive the regulation of life cycle timing.

      Weaknesses:

      (1) I felt that including 'ectotherms' in the title is a bit misleading as there is hardly any (in fact, any?) data presented on ectotherms. Also, most of the focus of the discussion is heavily mammal (rodent) focussed. I believe saying endotherms in the title as well is a bit misleading as the data is mammal-focused.

      We changed the title to: "Evolutionary trade-offs in dormancy phenology". This is a hybrid article comprising both a meta-analysis and a literature review. Each of these parts brings new elements to the hypotheses presented. The statistical analyses only concern mammals and especially rodent species. But the literature review highlighted links between the evolution of dormancy in ectotherms and endotherms that have not been linked in previous studies. We feel it is important for readers to know that much of the discussion will focus on the comparison of these two groups. But we understand that placing the term ectotherms in the title might suggest a meta-analysis including these two groups.

      In addition, we indicated more specifically in the abstract and at the end of the introduction that the article includes two approaches associated with different groups of animals.

      We also specified in the section « review criteria » that:

      Only one bird species is considered to be a hibernator, and no information is available on sex differences in hibernation phenology (Woods and Brigham 2004, Woods et al. 2019).

      We have also added a "study limitations" section, which explains that although the meta-analysis is limited by the data available in the literature, the information available for the species groups not studied seems to support our results.

      (2) I think more information needs to be provided early on to make readers aware of the diversity of animals included in the study and their geographic distribution. Are they mostly temperate or tropical? What is the span of the latitude as day length can have a major influence on dormancy timings? I think it is important to point out that data is more rodent-centric. Along the line of this point, is there a reason why the extensively studied species like the Red Deer or Soay Sheep and other well-studied temperate mammals did not make it into the list?

      We specified in the abstract and at the end of the introduction that the species studied in the meta-analysis are mainly Holarctic species. We have also added a map showing all the study sites used in the meta-analysis. Finally, we have noted in the methods, and in a "study limitations" section added at the end of the discussion, an explanation for those species that were not studied in the meta-analysis and the consequences for the interpretation of results.

      The hypotheses developed in this article are based on the survival benefits of seasonal dormancy thanks to a period of complete inactivity lasting several months. The Red Deer or Soay Sheep remain active above ground throughout the year.

      The effect of photoperiod on phenology is one of the mechanisms that has evolved to match an activity with the favorable condition. In this study, we are not interested in the mechanisms but in the evolutionary pressures that explain the observed phenology. Interspecific variation in the effect of photoperiod results from different evolutionary pressures, which we are trying to highlight. It is therefore not necessary to review mechanisms and effects of photoperiod, themselves requiring a lengthy review.

      We also tested the “physiological constraint hypothesis” on several variables. Temperature and precipitation are factors correlated with sex differences in phenology of hibernation. These factors allow consideration of the geographical differences that influence hibernation phenology.

      (3) Isn't the term 'energy limitation hypothesis' which is used throughout the manuscript a bit endotherm-centric? Especially if the goal is to draw generalities across ectotherms and endotherms. Moreover, climate (e.g. interaction of photoperiod and temperature in temperatures) most often induces or terminates diapause/dormancy in ectotherms so I am not sure if saying 'energy limitation hypothesis' is general enough.

      We renamed this hypothesis the "physiological constraint hypothesis" and we have made appropriate changes in the text so as not to focus physiological constraints solely on energy aspects.

      (4) Since for some species, the data is averaged across studies to get species-level trait estimates, is there a scope to examine within population differences (e.g. across latitudes)? This may further strengthen the evidence and rule out the possibility of the environment, especially the length of the breeding season, affecting the timing of emergence and immergence.

      For a given species, data on hibernation phenology are averaged for different populations, but also for the same population when measurements are taken over several years. To test these hypotheses on a population scale, precise data on reproductive effort would be needed for each population tested, but this concerns very few species (less than 5).

      Testing the effects of temperature and precipitation allows us to take into account the effects of climate on phenology.

      (5) Although the authors are looking at the broader patterns, I felt like the overall ecology of the species (habitat, tropical or temperate, number of broods, etc.) is overlooked and could act as confounding factors.

      Yes, that's why we also tested the physiological constraints hypothesis, including the effect of temperature and precipitation. For the life-history hypothesis, we also tested reproductive effort, which takes into account the number of offspring per year.

      (6) I strongly think the data analysis part needs more clarity. As of now, it is difficult for me to visualize all the fitted models (despite Table 1), and the large number of life-history traits adds to this complexity. I would recommend explicitly writing down all the models in the text. Also, the Table doesn't make it clear whether interaction was allowed between the predictors or not. More information on how PGLS were fitted needs to be provided in the main text which is in the supplementary right now. I kept wondering if the authors have fit multiple models, for example, with different correlation structures or by choosing different values of lambda parameter. And, in addition to PGLS, authors are also fitting linear regressions. Can you explain clearly in the text why was this done?

      To simplify the results, we reduced the number of models to just three: one for emergence and two for immergence. In place of Table 1, we have written the structure of the models used. We have added a sentence to the statistics section: “each PGLS model produces a λ parameter representing the effect of phylogeny ranging between 0 (no phylogeny effect) and 1 (covariance entirely explained by co-ancestry)”. We have tested only three PGLS models and the estimated lambda value for these models is 0.
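
      The role of λ described above can be illustrated with a minimal sketch, assuming that the λ in question is Pagel's λ, which rescales the shared-ancestry (off-diagonal) covariances between species before the generalized least squares fit. The function name and the three-species covariance matrix are hypothetical and are not taken from the manuscript or its code.

```python
def pagel_lambda_transform(vcv, lam):
    """Rescale a phylogenetic variance-covariance matrix by Pagel's lambda.

    Off-diagonal entries (covariance due to shared ancestry) are multiplied
    by lam; diagonal entries (each species' own variance) are left unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is expected to lie between 0 and 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance matrix for three species (illustration only).
example_vcv = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

no_phylogeny = pagel_lambda_transform(example_vcv, 0.0)    # species independent
full_phylogeny = pagel_lambda_transform(example_vcv, 1.0)  # matrix unchanged
```

      Under this assumption, an estimated λ of 0 removes all between-species covariance, so the fit treats species as independent observations, as in an ordinary regression.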

      (7) Figure 2 is unclear, and I do not understand how these three regression lines were computed. Please provide more details.

      We tested new models and modified existing figures.

      Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions.

      Weaknesses:

      The differences between the sexes in the physiological requirements for gametogenesis, which affect the timing of heterothermy and the need for euthermy during mammalian hibernation, are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine the seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between- and within-species phenologies of dormancy would have been interesting.

      Reviewer #2 (Recommendations For The Authors):

      This review provides a very interesting and ambitious among and within-species comparison of the seasonal timing of entry and exit from dormancy, emphasizing literature from hibernating mammals (sans bats and bears) and with attention to ectotherms. The authors test hypotheses related to the timing of food availability (energy) versus life history considerations (requirements for reproduction, avoiding predation) while acknowledging that these are not mutually exclusive. I offer advice for clarifications and description of the limitations of the data (accuracy of emergence and immergence times), but mainly seek more emphasis for small mammalian hibernators on the contrast for requirements for significant periods of euthermy prior to the emergence in males versus females, a contrast that has energetic and timing consequences in both the active and hibernation seasons.

      A consideration alluded to but not fully explained or discussed is the differences in mammals between species and sexes in the timing of what can be called ecological hibernation, which is the seasonal duration that an animal remains sequestered in its burrow or den, and heterothermic hibernation, between the beginning and end of the use of torpor. The two are not synonymous. When "emergence" is the first appearance above ground, there is a significant missing observation key to the energetic contrasts discussed in this review, that of this costly pre-emergence behavior.

      To explain the difference between heterothermic hibernation and ecological hibernation, we've added a passage to the "Review criteria" section of the Materials and methods:

      “In this study, we addressed what can be called ecological hibernation, i.e. the seasonal duration that an animal remains sequestered in its burrow or den, which is assumed to be directly linked to the reduced risk of predation. In contrast, we did not consider heterothermic hibernation, which corresponds to the time between the beginning and end of the use of torpor. So when we mention hibernation, emergence or immergence, the specific reference is to ecological hibernation.”

      In arctic and other ground squirrel species, males remain at high body temperatures for several days to a week after immerging into their burrows in the fall. More consistently and importantly, males that will attempt to breed in the spring end torpor but remain constantly in their burrows for as much as one month, at great expense, whilst undergoing testicular growth, spermatogenesis, spermiation, and sperm capacitation, processes that require continuous euthermy. Female arctic ground squirrels and non-breeding males do not, and typically enter their first torpor bout 1-2 days after immergence and first appear above ground 1-3 days after their last arousal in spring.

      The weeks spent euthermic in a cold burrow in spring by males while undergoing reproductive maturation require a significant energetic investment (can equate to the cost of the previous heterothermic period) that contrasts profoundly with the pre-mating energetic investment by females.

      Males cache food in their hibernacula and extend their active season in late summer/fall in order to do so, and feed from these caches in spring after resuming euthermy, often emerging at body weights similar to that at immergence. Similar between-sex differences in the timing of hibernation and heterothermy occur in golden-mantled and Columbian ground squirrels and likely most other Urocitellus spp., though they are less well described in other species. These differences are related to life histories and requirements for male vs. female gametogenesis and, at the same time, to energetic considerations: the costs to males of remaining euthermic while undergoing spermatogenesis, and the dependence of male gonadal development on individual body mass and cache size. These issues should be better discussed in this review.

      It is the time required to complete spermatogenesis, spermiation, and maturation of sperm, not the time for growth of different sizes of testes, that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We took this comment into account. As we found little evidence of an increase in testicular maturation time with relative testicular size (apart from table 4 in Kenagy and Trombulak, 1986), we no longer tested the effect of relative testicular size on protandry.

      We examined whether the ability to store food before hibernation might reduce protandry. Although food storage in the burrow may be favored for overcoming harsh environments or predation, model selection did not retain the food-storing factor. Thus, the ability to accumulate food in the burrow was not by itself likely to keep males of some species from emerging earlier (e.g. Cricetus cricetus, protandry: 20 days, Siutz et al., 2016). Early emerging males may benefit from consuming higher quality food or in competition with other males (e.g., dominance assertion or territory establishment, Manno and Dobson 2008).

      We developed these aspects in the discussion.

      While it is admirable to include ectotherms in such a broad review and modelling, I can't tell what data from how many ectothermic species contributed to the models and summary data included in the figures.

      Too few data on ectotherms were available to include ectotherms in the meta-analysis.

      Some consideration should be made to the limitations of the data extracted from the literature of the accuracy of emergence and immergence dates when derived from only observations or trapping data. The most accurate results come from the use of telemetry for location and data logging reporting below vs. above ground positioning and body temperature.

      We added a "study limits" section to the discussion to address all the limits in this commentary.

      L64 "favor reproduction", better to say "allow reproduction", since there is strong evolutionary pressure to initiate reproduction early, often anticipating favorable conditions for reproduction, to maximize the time available for young to grow and prepare for overwintering themselves.

      Also, generally, it is not how "harsh" an environment is but rather how short the growing season is.

      We took this comment into account.

      L80 More simply, individuals that have amassed sufficient energy reserves as fat and caches to survive through winter may opt to initiate dormancy. This may decrease but not obviate predation, since hibernating animals are dug from their burrows and eaten by predators such as bears and ermine.

      In this sentence, we indicated a gap between dormancy phenology and the growing season, which suggests survival benefits of dormancy other than from a physiological point of view. We've changed the sentence to make it clearer: “However, some animals immerge in dormancy while environmental conditions would allow them (from a physiological point of view) to continue their activity, suggesting other survival benefits than coping with a short growing season.”

      L88 other physiological or ecological factors.... (gametogenesis).

      In this study, we examine possible evolutionary pressures and therefore the environmental factors that may influence hibernation phenology. We focus on reproductive effort because, assuming predation pressure, we would expect a trade-off between survival and reproduction.

      L113 beginning early to afford long active seasons to offspring while not compromising the survival of parents.

      We added to the sentence:

      “For females, emergence phenology may promote breeding and/or care of offspring during the most favorable annual period (e.g., a match of the peak in lactational energy demand and maximum food availability, Fig. 1) or beginning early to afford long active seasons to offspring while not compromising the survival of parents.”

      L117 based on adequate preparation for overwintering and enter dormancy....

      We modified the sentence as follows:

      “recovering from reproduction, and after acquiring adequate energy stores for overwintering”

      L123 given that males outwardly invest the least time in reproduction yet generally have shorter hibernation seasons would seem to reject this hypothesis. This changes if you overtly include the time and energy that males expend while remaining euthermic preparing for hibernation, a cost that can be similar to energy expended during heterothermy.

      Males invest a lot of time in reproduction before females emerge (whether for competition or physiological maturation) and some males seem to be subject to long-term negative effects linked to reproductive stress (see Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313). Both processes may contribute to reducing the duration of male hibernation.

      L125 again, costs to support euthermy in males undergoing reproductive development is an investment in reproduction.

      You're right, but it's difficult to quantify. We tested a model that takes into account the reproductive effort during reproduction and prior to reproduction. We also considered the hypothesis that species living in a cold climate might have a low protandry while having a high reproductive effort due to their ability to feed in the burrow (interaction effect between reproductive effort and temperature). We think these changes answer your comment.

      L134 It isn't growing large testes that takes time, but instead completing spermatogenesis and maturation of sperm in the epididymides.

      We removed this part.

      L140 Later immergence in male ground squirrels is related to accumulation and defense of cached food, activities that are related to reproduction the next spring. An experimental analysis that would be revealing is to compare immergence times in females that completed lactation to the independence of their litters vs. females that did not breed or lost their litters. Who immerges first?

      Body mass variation from emergence to the end of mating in males seems to explain the delayed immergence of males in species that don't hide food in their burrows for hibernation. For example, in Spermophilus citellus, males immerge on average more than 3 weeks after females, yet they do not hide food in their burrows for the winter.

      Such a study already exists and shows that non-breeding females immerge earlier than breeding females. We refer to it:

      L386: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      L164 So you examined literature from 152 species but included data from only 29 species? Did you include data from social hibernators (marmots) that mate before emergence?

      With the current models, we have 28 different species. We have few species because very few have both data on sex differences and information on reproductive effort (especially for males).

      Data on sex differences in hibernation were not available for social hibernating species.

      L169 Were these data from trapping or observation results? How reliable are these versus the use of information from implanted data loggers or collars that definitively document when euthermy is resumed and/or when immergence and first emergence occurs (through light loggers)?

      We did not focus on heterothermic hibernation, but on ecological hibernation. We have no idea of the margin of error for these types of data, but we have discussed these limitations in the "Study limitations" section.

      L180, again, it is the time required to complete spermatogenesis and spermiation, not the time for the growth of different sizes of testes, that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We removed this part.

      L200 Males that accumulate caches in fall and then feed from those during the spring pre-emergence euthermic interval and after will often be at their seasonal maximum in body mass. Declining from that peak may not be stressful.

      It has been suggested that reproductive effort in Spermophilus citellus might induce long-term negative effects that delay male immergence.

      Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313.

      L210 How about altitude, which affects the length of the growing season at similar latitudes?

      We extracted the location of each study site to determine the temperature and precipitation at that precise location (based on interpolated climate surface). We therefore take into account differences in growing season (based on temperature) in altitude between sites.

      L267 How did whether males cache food or not figure into these comparisons? Refeeding before mating occurs during the pre-emergence euthermic interval.

      We removed this part.

      L332, 344 not a "proxy" but functionally related to advantages in mating systems with multiple mating males.

      We removed this part.

      L353 The need for a pre-emergence euthermic interval in male ground squirrels requires costs in the previous active season in accumulating and defending a cache and the proximal costs in spring while remaining at high body temperatures prior to emergence with resulting loss in body mass or devouring of the cache.

      You're right, but in this section, we briefly explain the benefits of food caching compared with other species that don't do so.

      L385 This review should discuss why females are not known to cache and contrast as "income breeders" from "capital breeder" males. What advantages of caches are females indifferent to (no need for a prolonged pre-emergence period) and what costs of accumulating caches do they avoid (prolonged activity period and defense of caches).

      We clarified the case of female emergence.

      L321 : “Thus, an early emergence of males may have evolved in response to sexual selection to accumulate energy reserve in anticipation of reproductive effort. Females, on the contrary, are not subject to intraspecific competition for reproduction and may have sufficient time before (generally one week after emergence) and during the breeding period to improve their body condition.”

      L388 I don't understand the logic of the conclusion that "did not ...adequately explain the late male immergence" in this section. The greater mass loss in males over the mating period is afforded by the presence of a cache that requires later immergence.

      We removed this part.

      L412 Not just congeners that invest less in reproduction, but within species individuals that do not attempt to breed in one or more years and thus have no reproductive costs should be an interesting comparison for differences in phenology from individuals that do breed. Non-breeders are often yearlings but can be a significant overall proportion of males that fail to fatten or cache enough to afford a pre-emergence euthermic period.

      L385: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      The sentence refers to individuals who reproduce little or not at all.

      L445 Males that gain weight between emergence and mating may do so by feeding from a cache regardless of how "harsh" an environment is.

      We observe this phenomenon even in species that are not known to hoard food:

      “Gains in body mass observed for some individuals, even in species not known to hoard food, may indicate that the environment allows a positive energy balance for other individuals with comparable energy demands.”

      L492 Some insects retreat to refugia in mid-summer to avoid parasitism (Gynaephora).

      Escape from parasites is also a benefit of dormancy.

      Fig 1 - It is difficult to see the differences in black and green colors, esp if color blind. Maternal effort is front-loaded within the active season (line for "optimal period" shown in midseason).

      Add "energy" underneath c) "Prediction (H1)" and "reproduction" underneath d) "Prediction (H2)". Explain the orange vs black, green colors of triangles.

      We made the necessary changes.

      Fig 2 - I don't buy the regression lines as significant in this figure. The red line cannot be a regression with two sample points, and without the left-most dot, nothing is significant.

      We deleted this graph.

      Fig 3 - females only?

      We deleted this graph.

    2. Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions. Limitations are discussed yet are impactful, namely that differences among and within species are contrasted only for ecological hibernation (the duration of remaining sequestered) and not for "heterothermic hibernation", the period between first and last torpor. Differences between the two can have significant energetic consequences, especially for mammals returning to euthermic levels of body temperature whilst remaining in their cold burrows before emerging, e.g. reproductively developing males in spring.

      Weaknesses:

      The differences between the sexes in the physiological requirements for gametogenesis, which affect the timing of heterothermy and the need for euthermy during mammalian hibernation, are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine the seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between- and within-species phenologies of dormancy would have been interesting.

    3. eLife assessment

      This valuable and ambitious review examines seasonal dormancy in various species, including hibernating mammals (excluding bats and bears) and ectotherms. It provides a solid test of hypotheses on dormancy timing, considering energetic constraints and life history as alternative drivers. The review will be of interest to evolutionary biologists.

    1. eLife assessment

      This paper addresses a fundamental issue in the field of autophagy: how is a protein responsible for autophagosome-lysosome fusion recruited to mature autophagosomes but not immature ones? The work succeeds in its ambition to provide a new conceptual advance. The evidence supporting the conclusions is convincing, with fluorescence microscopy, biochemical assays, and molecular dynamics simulations. This work will be of broad interest to cell biologists and biochemists studying autophagy, and also those focusing on lipid/membrane biology.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. How to precisely control the location and conformation of proteins is critical for maintaining autophagic flux. In particular, the recruitment of Syn17 to autophagosomes has remained unclear. In this paper, the authors describe a simple lipid-protein interaction model that goes beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an interesting unresolved question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate for causing STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for the recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By molecular dynamics simulation, STX17 TM is shown to be inserted into a membrane containing PI4P but not into a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4―figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with and without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).
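
      For readers unfamiliar with the test named in this legend, the following is a minimal sketch of Welch's unequal-variance t-test, assuming two independent samples such as per-cell positive rates with and without inhibitor. The function name and the numbers are hypothetical and are not data from the study. Only the Python standard library is used, so the conversion of the statistic to a p-value is left out.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each mean (sample variance divided by n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # The degrees of freedom are not n_a + n_b - 2, because the variances
    # are not assumed to be equal.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical values for illustration only.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(control, treated)
```

      The resulting statistic would then be compared against a t distribution with the (generally non-integer) degrees of freedom returned here.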

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed the molecular dynamics simulation using membranes with phosphatidylinositol, a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. These data suggest that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3).”

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.
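
      The quantity plotted in panel F can be illustrated with a minimal sketch: the mass-weighted mean z-coordinate of the helix atoms, compared with the planes of the lipid heads. The function names, the atom masses and coordinates, and the simple "between the two head planes" criterion for insertion are all assumptions made for illustration and are not taken from the simulation code or its analysis scripts.

```python
def z_center_of_mass(masses, z_coords):
    """Mass-weighted mean z-coordinate (z_cm) of a group of atoms."""
    if len(masses) != len(z_coords) or not masses:
        raise ValueError("masses and z_coords must be non-empty and of equal length")
    total_mass = sum(masses)
    return sum(m * z for m, z in zip(masses, z_coords)) / total_mass


def is_inserted(z_cm, z_lower_heads, z_upper_heads):
    """Assumed criterion: z_cm lies strictly between the two lipid-head planes."""
    return z_lower_heads < z_cm < z_upper_heads


# Hypothetical single frame: three atoms, coordinates in nm.
frame_masses = [12.0, 14.0, 16.0]
frame_z = [1.0, 2.0, 3.0]
z_cm = z_center_of_mass(frame_masses, frame_z)

# Hypothetical head-group planes at -2.0 nm and +2.0 nm.
inserted = is_inserted(z_cm, -2.0, 2.0)
```

      Applying such a calculation to every frame of a trajectory would give one z_cm curve per independent run, which is the kind of trace the panel shows.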

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

      We thank the reviewers for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion part.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism that involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We asked Reviewer #1 through the editor what this experiment meant and understood that it was to see the effects of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a buffer of high ionic strength. These data support the electrostatic interaction of STX17 with membranes. We have included these data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and close to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have corrected the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential because purified recombinant mGFP-STX17TM used in this study is soluble. However, it does not rule out such a mechanism, which would be a future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, to which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat for STX17 molecules, so we think it is unlikely that STX17 recognizes the packing defect.

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possibilities with regards to the interpretation of the negative charge/PI4P result in autophagic membranes are hoped to be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control to the molecular dynamic simulation study, STX17 TM insertion into a membrane containing other negative charge lipids, especially PI, needs to be tested. PI is a negative charge lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the reviewers for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results in Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as autophagosome-lysosome fusion (Nguyen et al (2016) J Cell Biol) it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes (CERT(PHD)) that were used, or alternatively to the C-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when fused to LC3B that targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive since the phosphatase is removing PI4P which is what is recruiting it but it could be a viable approach to keep PI4P levels low enough on mature autophagosomes so that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    3. Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to the closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. How to precisely control the location and conformation of proteins is critical for maintaining autophagic flux. Particularly, the recruitment of Syn17 to autophagosomes remains unclear. In this paper, the author describes a simple lipid-protein interaction model beyond previous studies focusing on protein-protein interactions. This represents conceptual advances.

    4. Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates the autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an unresolved interesting question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate that causes the STX17 recruitment to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for the recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By dynamic molecular simulation, STX17 TM is shown to be inserted to a membrane containing PI4P but not to a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

    5. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism that involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

    1. Reviewer #3 (Public Review):

      Summary:

      This manuscript by Liu et al. presents a case that CAPSL mutations are a cause of familial exudative vitreoretinopathy (FEVR). Attention was initially focused on the CAPSL gene from whole exome sequence analysis of two small families. The follow-up analyses included studies in which CAPSL was manipulated in endothelial cells of mice and multiple iterations of molecular and cellular analyses. Together, the data show that CAPSL influences endothelial cell proliferation and migration. Molecularly, transcriptomic and proteomic analyses suggest that CAPSL influences many genes/proteins that are also downstream targets of MYC and may be important to the mechanisms.

      Strengths:

      This multi-pronged approach found a previously unknown function for CAPSLs in endothelial cells and pointed at MYC pathways as high-quality candidates in the mechanism.

      Weaknesses:

      Two issues shape the overall impact for me. First, the unreported population frequency of the variants in the manuscript makes it unclear if CAPSL should be considered an interesting candidate possibly contributing to FEVR, or possibly a cause. Second, it is unclear if the identified variants act dominantly, as indicated in the pedigrees. The studies in mice utilized homozygotes for an endothelial cell-specific knockout, leaving uncertainty about what phenotypes might be observed if mice heterozygous for a ubiquitous knockout had instead been studied.

      In my opinion, the following scientific issues are specific weaknesses that should be addressed:

      (1) Please state in the manuscript the number of FEVR families that were studied by WES. Please also describe if the families had been selected for the absence of known mutations, and/or what percentage lack known pathogenic variants.

      (2) A better clinical description of family 3104 would enhance the manuscript, especially the father. It is unclear what "manifested with FEVR symptoms, according to the medical records" means. Was the father diagnosed with FEVR? If the father has some iteration of a mild case, please describe it in more detail. If the lack of clinical images in the figure is indicative of a lack of medical documentation, please note this in the manuscript.

      (3) The TGA stop codon can in some instances also influence splicing (PMID: 38012313). Please add a bioinformatic assessment of splicing prediction to the assays and report its output in the manuscript.

      (4) More details regarding utilizing a "loxp-flanked allele of CAPSL" are needed. Is this an existing allele, if so, what is the allele and citation? If new (as suggested by S1), the newly generated CAPSL mutant mouse strain needs to be entered into the MGI database and assigned an official allele name - which should then be utilized in the manuscript and who generated the strain (presumably a core or company?) must be described.

      (5) The statement in the methods "All mice used in the study were on a C57BL/6J genetic background," should be better defined. Was the new allele generated on a pure C57BL/6J genetic background, or bred to be some level of congenic? If congenic, to what generation? If unknown, please either test and report the homogeneity of the background, or consult with nomenclature experts (such as available through MGI) to adopt the appropriate F?+NX type designation. This also pertains to the Pdgfb-iCreER mice, which reference 43 describes as having been generated in an F2 population of C57BL/6 X CBA and did not designate the sub-strain of C57BL/6 mice. It is important because one of the explanations for missing heritability in FEVR may be a high level of dependence on genetic background. From the information in the current description, it is also not inherently obvious that the mice studied did not harbor confounding mutations such as rd1 or rd8.

      (6) In my opinion, more experimental detail is needed regarding Figures 2 and 3. How many fields, of how many retinas and mice were analyzed in Figure 2? How many mice were assessed in Figure 3?

      (7) I suggest adding into the methods whether P-values were corrected for multiple tests.

    2. eLife assessment

      This study explores the role of calcyphosine-like (CAPSL) in Familial Exudative Vitreoretinopathy (FEVR) via the MYC pathway, offering valuable insights into disease mechanisms that are supported by a solid, multi-pronged approach. The overall significance of the study might, however, be limited due to weak support from human genetic studies.

    3. Reviewer #1 (Public Review):

      Summary:

      The author presents the discovery and characterization of CAPSL as a potential gene linked to Familial Exudative Vitreoretinopathy (FEVR), identifying one nonsense and one missense mutation within CAPSL in two distinct patient families afflicted by FEVR. Cell transfection assays suggest that the missense mutation adversely affects protein levels when overexpressed in cell cultures. Furthermore, conditionally knocking out CAPSL in vascular endothelial cells leads to compromised vascular development. The suppression of CAPSL in human retinal microvascular endothelial cells results in hindered tube formation, a decrease in cell proliferation, and disrupted cell polarity. Additionally, transcriptomic and proteomic profiling of these cells indicates alterations in the MYC pathway.

      Strengths:

      The study is nicely designed with a combination of in vivo and in vitro approaches, and the experimental results are good quality.

      Weaknesses:

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. Further careful examination of human genetics evidence in both patient cohorts and the general population will help to clarify. In light of human genetics, more caution needs to be exercised when interpreting results from mice and cell models and how they relate to the human patient phenotype.

    4. Reviewer #2 (Public Review):

      Summary:

      This work identifies two variants in CAPSL in two-generation familial exudative vitreoretinopathy (FEVR) pedigrees, and using a knockout mouse model, they link CAPSL to retinal vascular development and endothelial proliferation. Together, these findings suggest that the identified variants may be causative and that CAPSL is a new FEVR-associated gene.

      Strengths:

      The authors' data provide compelling evidence that loss of the poorly understood protein CAPSL can lead to reduced endothelial proliferation in mouse retina and suppression of MYC signaling in vitro, consistent with the disease seen in FEVR patients. The study is important, providing new potential targets and mechanisms for this poorly understood disease. The paper is clearly written, and the data generally support the authors' hypotheses.

      Weaknesses:

      (1) Both pedigrees described appear to suggest that heterozygosity is sufficient to cause disease, but authors have not explored the phenotype of Capsl heterozygous mice. Do these animals have reduced angiogenesis similar to KOs? Furthermore, while the p.R30X variant protein does not appear to be expressed in vitro, a substantial amount of p.L83F was detectable by western blot and appeared to be at the normal molecular weight. Given that the full knockout mouse phenotype is comparatively mild, it is unclear whether this modest reduction in protein expression would be sufficient to cause FEVR - especially as the affected individuals still have one healthy copy of the gene. Additional studies are needed to determine if these variants alter protein trafficking or localization in addition to expression, and if they can act in a dominant negative fashion.

      (2) The manuscript nicely shows that loss of CAPSL leads to suppressed MYC signaling in vitro. However, given that endothelial MYC is regulated by numerous pathways and proteins, including FOXO1, VEGFR2, ERK, and Notch, and reduced MYC signaling is generally associated with reduced endothelial proliferation, this finding provides little insight into the mechanism of CAPSL in regulating endothelial proliferation. It would be helpful to explore the status of these other pathways in knockdown cells but as the authors provide only GSEA results and not the underlying data behind their RNA seq results, it is difficult for the reader to understand the full phenotype. Volcano plots or similar representations of the underlying expression data in Figures 6 and 7 as well as supplemental datasets showing the differentially regulated genes should be included. In addition, while the paper beautifully characterizes the delayed retinal angiogenesis phenotype in CAPSL knockout mice, the authors do not return to that model to confirm their in vitro findings.

      (3) In Figure S2D, the result of this vascular leak experiment is unconvincing as no dye can be seen in the vessels. What are the kinetics for biocytin tracers to enter the bloodstream after IP injection? Why did the authors choose the IP instead of the IV route for this experiment? Differences in the uptake of the eye after IP injection could confound the results, especially in the context of a model with vascular dysfunction as here.

      (4) In Figure 5, it is unclear how filopodia and tip cells were identified and selected for quantification. The panels do not include nuclear or tip cell-specific markers that would allow quantification of individual tip cells, and in Figure 5C it appears that some filopodia are not highlighted in the mutant panel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed randomly distributed, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.
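
      The averaging described above can be illustrated with a minimal sketch. It assumes the simple proximity-ratio definition of FRET (acceptor photons divided by all photons after donor excitation) and uses made-up photon counts; the function name `burst_fret` and the two hypothetical states are illustrative only and are not taken from the manuscript.

```python
def burst_fret(donor_photons, acceptor_photons):
    """Apparent FRET efficiency of a burst: acceptor / (donor + acceptor).

    Assumption: uncorrected proximity ratio after donor excitation.
    """
    total = donor_photons + acceptor_photons
    if total == 0:
        raise ValueError("burst contains no photons")
    return acceptor_photons / total


# Hypothetical photon counts (donor, acceptor) collected while one molecule
# visits two interdomain distances within the same burst.
state_a = (80, 20)  # lower-FRET distance
state_b = (16, 24)  # higher-FRET distance

e_a = burst_fret(*state_a)
e_b = burst_fret(*state_b)

# The burst-level calculation pools all photons, so the two distances
# collapse into a single intermediate value.
e_burst = burst_fret(state_a[0] + state_b[0], state_a[1] + state_b[1])

print(e_a, e_b, e_burst)
```

      In this toy case the pooled value lies between the two per-state values, which is why fast sampling of many distances within a burst can look like one FRET state in a histogram.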

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry value is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present FAA=0 and S=1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
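
      The stoichiometry argument above can be sketched in a few lines. This is a minimal illustration that assumes the commonly used definitions S = (FDD + FDA) / (FDD + FDA + FAA) and apparent E = FDA / (FDD + FDA); the function names and the classification thresholds are hypothetical, and the leakage and crosstalk corrections mentioned above are deliberately left out.

```python
def stoichiometry(f_dd, f_da, f_aa):
    """Raw dye stoichiometry S for one burst or dwell.

    f_dd: donor-excitation, donor-emission photon count (FDD)
    f_da: donor-excitation, acceptor-emission photon count (FDA)
    f_aa: acceptor-excitation, acceptor-emission photon count (FAA)
    Assumes the standard, uncorrected definition; no leakage or
    crosstalk correction is applied here.
    """
    total = f_dd + f_da + f_aa
    if total == 0:
        raise ValueError("burst contains no photons")
    return (f_dd + f_da) / total


def apparent_fret(f_dd, f_da):
    """Uncorrected (apparent) FRET efficiency E = FDA / (FDD + FDA)."""
    donor_excited = f_dd + f_da
    if donor_excited == 0:
        raise ValueError("no photons after donor excitation")
    return f_da / donor_excited


def classify(s, low=0.2, high=0.8):
    """Label a burst by stoichiometry. Thresholds are hypothetical."""
    if s >= high:
        return "donor-only"
    if s <= low:
        return "acceptor-only"
    return "donor+acceptor"
```

      With FAA = 0 the stoichiometry evaluates to 1 regardless of how the donor-excited photons split between the two detection channels, which is why a donor-only molecule is identified by S and not by its signal in the acceptor channel.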

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants is best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to prove that similar states are observed when measuring with fluorophores on the opposite side of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle and how they change in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of different salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other in a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are in the literature referred to as the quinary structure of proteins and they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Despite the current study being conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant amount of false positives – a tradeoff to avoid excluding particles belonging to low populated classes/orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after such first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the data excluded represents particles of lower quality that do not contribute to a high-resolution reconstruction, or that belong to low population protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initial automated picked particles contributing to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make our point more clear that we discuss models and rule out several possibilities but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions OpuA-K521C displays a high FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that two SBDs interact but not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we present these points now more explicitly in Figure 4 and the last part of the results section (page 9).
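
      The conversion between FRET efficiency and inter-dye distance used in point (1) follows the Förster relation E = 1 / (1 + (r/R0)^6). The sketch below is only illustrative: the Förster radius of the dye pair is not stated in this response, so the value of 5.0 nm used in the example is an assumption, and the function names are hypothetical.

```python
def distance_from_fret(efficiency, r0_nm):
    """Inter-dye distance r (nm) from FRET efficiency E.

    Inverts the Foerster relation E = 1 / (1 + (r / R0)**6).
    r0_nm is the Foerster radius of the dye pair; it must be supplied
    by the caller because it is not given in the text.
    """
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def fret_from_distance(r_nm, r0_nm):
    """FRET efficiency E for an inter-dye distance r (nm)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    ASSUMED_R0_NM = 5.0  # hypothetical Foerster radius, for illustration only
    for e in (0.7, 0.78, 0.8):
        r = distance_from_fret(e, ASSUMED_R0_NM)
        print(f"E = {e:.2f} -> r = {r:.2f} nm")
```

      Under that assumed radius, efficiencies of 0.7 to 0.8 map to distances of roughly 4.0 to 4.3 nm, in line with the approximately 4 nm quoted above; a different R0 scales the distances proportionally.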

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and applied the most advanced and state-of-the-art analysis tools, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the here reported physical interactions between the SBDs increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to study interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020), however the SBDs of OpuA appear to only dock in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification in case they would form a docked state. Therefore, we conclude that docking does not occur in low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al. 2019). For example, we have shown by SEC-MALLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but also be important in soluble substrate-receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence by: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate number was extracted from Table 1. Changed this in the text and figure: 112 s⁻¹ → 173 s⁻¹. It did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm and aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.

    2. eLife assessment

      The OpuA Type I ABC importer uses two substrate binding domains to capture extracellular glycine betaine and present the substrate to the transmembrane domain for subsequent transport and correction of internal dehydration. This study presents valuable findings addressing the question of whether the two substrate binding domains of OpuA dock and physically interact in a salt-dependent manner. The single-molecule fluorescence resonance energy transfer and cryogenic electron microscopy data that are presented provide convincing support for the existence of a transient interaction between the substrate binding domains that depends on ionic strength, laying a foundation for future studies exploring how this interaction is involved in the overall transport mechanism.

    3. Reviewer #1 (Public Review):

      Summary: The type I ABC importer OpuA from Lactococcus lactis is the best studied transporter involved in osmoprotection. In contrast to most ABC import systems, the substrate binding protein is fused via a short linker to the transmembrane domain of the transporter. Consequently, this moiety is called the substrate binding domain (SBD). OpuA has been studied in the past in great detail and we have a very detailed knowledge about function, mechanisms of activation and deactivation as well as structure.

      Strengths: Application of smFRET to unravel transient interactions of the SBDs. The method is applied at a superb quality and the data evaluation is excellent.

      Weaknesses: The proposed model is not directly supported by experimental data. Rather, alternative models are excluded as they do not fit the obtained data. However, this is now clearly stated in the manuscript.

    4. Reviewer #2 (Public Review):

      Summary: In this report the authors used solution-based single-molecule FRET and low resolution cryo-EM to investigate the interactions between the substrate-binding domains of the ABC-importer OpuA from Lactococcus lactis. Based on their results, the authors suggest that the SBDs interact in an ionic strength-dependent manner.

      Strengths: The strength of this manuscript is the uniqueness and importance of the scientific question, the adequacy of the experimental system (OpuA), and the combination of two very powerful and demanding experimental approaches.

      Weaknesses: A demonstration that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism will greatly strengthen the claims of the authors. The relation to cooperativity is also unclear.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful to the Editors for overseeing the review of our manuscript, and to the two reviewers for their thoughtful comments and suggestions for how it can be improved.

      I submit at this time a revision, as well as a detailed response (below) to each of the points raised in the first round of review.

      We feel the manuscript has been significantly improved by taking the reviewers' comments to heart. In a nutshell, we added new key pieces of data (impact of WIN site inhibition on global translation, rRNA production, as well as the requested cell biology analyses showing nucleolar stress), new analyses of the proteomics to counter potential concerns with normalization, and expanded/revised verbiage in key areas to clarify parts of the text that were confusing or problematic. The main figures have not changed; all new material is included in supplements to figures 2 and 3.

      Public Reviews

      Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53, and MDM4 axis.

      While this work is not conceptually new, it is an important extension of the observations of Aho et al. The results are clearly described and, in my view, very meaningful overall.

      Major points:

      (1) The authors make statements about the globality/selectivity of the responses in RNA-seq, ribo-seq, and quantitative proteomics. However, as far as I can see, none of these analyses have spike-in controls. I recommend either repeating the experiments with a spike-in control or carefully measuring transcription and translation rates upon WIN site inhibition and normalizing the omics experiments with this factor.

      The reviewer is correct that we did not include spike-in controls in our omics experiments. We would like to emphasize that none of the omics data in this manuscript have been processed in unorthodox ways, and that the major conclusions each have independent corroborating data.

      The selectivity in RPG suppression observed in RNA-Seq, for example, is supported by results from our target engagement (QuantiGene) assays; suppression of RPL22L1 mRNA levels is supported by quantitative and semi-quantitative RT-PCR, by western blotting, and by the results of our proteomic profiling; alternative splicing (and expression) of MDM4—and its dependency on RPL22—is also backed up by similar RT-PCR and western blotting data. The same applies for alternative splicing of RPL22L1.

      That said, we do appreciate the point the reviewer is making here, and have done our best to respond. We do not think it is a prudent investment in resources to repeat the numerous omics assays in the manuscript. We also considered normalizing for bulk transcription and translation rates as suggested, but it is not clear in practice how this would be done, and it could introduce additional variables and uncertainties that may skew the interpretation of results. Instead, to respond to this comment, we made the following changes to the manuscript:

      (1) We now explicitly state, for all omics assays, that spike-in controls were not included. These statements will prompt the reader to make their own assessment of the robustness of each of our findings and interpretations.

      (2) We have added new data to the manuscript (Figure 2—figure supplement 1A–B) measuring the impact of C6 and C16 on bulk translation using the OPP labeling method. These new data demonstrate that WIN site inhibitors induce a progressive yet modest decline in protein synthesis capacity. At 24 hours, there is no significant effect of either agent on protein synthesis levels. By 48 hours, a small but significant effect is observed, and by 96 hours translation levels are ~60% of what they are in vehicle-treated control cells. These new data are important because they support the idea that normalization has not blunted the responses we observe: the magnitudes of the effects are consistent between the different assays and tend to cap out at two-fold in terms of RPG suppression, translation efficiency, ribosomal protein levels, and protein synthesis capacity.

      (3) We have included additional analysis regarding the LFQMS, as described below, that specifically addresses the issue of normalization in our proteomics experiments.

      (2) Why are the majority of proteins upregulated in the proteomics experiment after 24 h in C6 (if really true after normalization with general protein amount per cell)? This is surprising and needs further explanation.

      The reviewer is correct in noting that (by LFQMS) ~700 proteins are induced after 24 hours of treatment of MV4:11 cells with C16 (not C6, as stated). The reviewer would like us to examine whether this apparent increase in proteins is a normalization artifact. In response to this comment, we have made the following changes to the manuscript:

      (1) Our new OPP labeling experiments (Figure 2—figure supplement 1A–B) show that there is no significant reduction in overall protein synthesis following 24 hours of C16 treatment. In light of this finding, it is unlikely that normalization artifacts, resulting from diminution of the pool of highly abundant proteins, create the appearance of these 700 proteins being induced. We now explicitly make this point in the text.

      (2) We now clarify in the methods how we seeded identical numbers of cells for DMSO and C16-treated cultures in these experiments, and—consistent with our finding that WIN site inhibitors have little if any effect on protein synthesis or proliferation at the 24 hour timepoint— extracted comparable amounts of proteins from these two treatment conditions (DMSO: 344.75 ± 21.7 µg; C16: 366.50 ± 15.8 µg; [Mean ± SEM]).

      (3) We now include in Figure 3—figure supplement 1A a plot showing the distribution of peptide intensities for each protein detected in each run of LFQMS before and after equal median normalization. This new analysis reveals that the distribution of intensities is not appreciably changed via normalization. Specifically, there is not a reduction in peptide intensities in the unnormalized data from 24 hours of C16 treatment that is reversed or tempered by normalization. This analysis provides further support for the notion that the increase we observe is not a normalization artifact.

      (4) We now include in Figure 3—figure supplement 1B–D a set of new analyses examining the relationship between the initial intensity of proteins in DMSO control samples (a crude proxy for abundance) versus the fold change in response to WIN site inhibitor. This analysis shows that we have as many "highly abundant" (10th decile) proteins increasing as we do decreasing in response to WINi. Thus, it appears as though the wholesale clearance of highly abundant proteins from the cell is not occurring at this early treatment timepoint. In addition, this analysis also shows that ribosomal proteins (RP) are generally the most abundant, most suppressed, proteins and that their fold-change at the protein level at 24 hours is less than two-fold, consistent again with the magnitude of transcriptional effects of C16, as measured by RNA-Seq and QuantiGene. The fact that the drop in RP levels is consistent with expectations based on other analyses provides further empirical support for the notion that protein levels inferred from LFQMS are authentic and not skewed by global changes in the proteome.
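      The equal median normalization referred to in point (3) can be illustrated with a minimal sketch. It assumes the common approach of shifting each run's log2 intensities to a shared median; the response does not describe the authors' actual pipeline, and the function name, input layout, and toy intensities below are hypothetical.

```python
import math
from statistics import median

def equal_median_normalize(runs):
    """Shift each run's log2 intensities so that all runs share one median.

    `runs` maps a run name to a list of positive raw intensities
    (hypothetical input layout, chosen for illustration only).
    """
    log_runs = {name: [math.log2(v) for v in vals] for name, vals in runs.items()}
    run_medians = {name: median(vals) for name, vals in log_runs.items()}
    target = median(run_medians.values())  # common reference median
    return {
        name: [v - run_medians[name] + target for v in vals]
        for name, vals in log_runs.items()
    }

# Toy intensities, not data from the manuscript.
runs = {"DMSO": [100.0, 200.0, 400.0], "C16": [200.0, 400.0, 800.0]}
normalized = equal_median_normalize(runs)
```

      Under this scheme, if the unnormalized distributions already overlap, the per-run shift is close to zero, which is the pattern the response reports for the 24 hour C16 samples.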

      The increase in proteins at this time point, we argue, is thus most likely genuine. It is not surprising that—at a timepoint at which protein synthesis is unaffected—several hundred proteins are induced by a factor of two. How this occurs, we do not know. It may be a transient compensatory mechanism, or it may be an early part of the active response to WIN site inhibitors. Lest the reader be confused by this finding, we have now added text to this section of the manuscript discussing and explaining the phenomenon in more detail.

      (3) The description of the two CRISPR screens (GECKO and targeted) is a bit confusing. Do I understand correctly that in the GECKO screen, the treated cells are not compared with nontreated cells of the same time point, but with a time point 0? If so, this screen is not very meaningful and perhaps should be omitted. Also, it is unclear to me what the advantages of the targeted screen are since the targets were not covered with more sgRNAs (data contradictory: 4 or 10 sgRNAs per target?) than in Gecko. Also, genome-wide screens are feasible in culture for multiple conditions. Overall, I find the presentation of the screening results not favorable.

      In essence, this is a single screen performed in two tiers. In Tier 1, we screened a complete GECKO library (six sgRNA/gene) with the earliest generation (less potent) inhibitor C6, and compared sgRNA representation against the time zero population. This screen would reveal sgRNAs that are specifically associated with response to C6, as well as those that are associated with general cell fitness and viability. We then identified genes connected to these sgRNAs, removed those that are pan essential, and built a custom library for the second tier using sgRNAs from the Brunello library (four sgRNA/gene). We then screened this custom library with both C6 and the more potent inhibitor C16, this time against DMSO-treated cells from the same timepoint.

      We acknowledge that this is not the most streamlined setup for a screen. But our intention was to compare two inhibitors (C6 and C16) and identify high confidence 'hits' that are disconnected from general cell viability, rather than generate an exhaustive list of all genes that, when disrupted, skew the response to WIN site inhibitor. The final result of this screen (Figure 4E) is a gene list that has been validated with two chemically distinct WIN site inhibitors and up to 10 unique sgRNAs per gene. We may not have captured every gene that can modulate response to WIN site inhibitor, but those appearing in Figure 4E are highly validated.

      To answer the reviewer's specific questions: (i) we cannot omit the Tier 1 screen because then there would be no rationale for what was screened in the second Tier; and (ii) the advantage of the custom Tier 2 library is that it allowed us to screen hits from the Tier 1 screen with four completely independent sgRNAs. Although there are not more sgRNAs for each gene in the Tier 2 versus the Tier 1 library, these sgRNAs are different and thus, for C6 at least, hits surviving both screens were validated with up to 10 unique sgRNAs.
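      The difference between the two tiers comes down to the reference population used when scoring sgRNA representation. The sketch below is an illustration only, assuming a simple counts-per-million log2 fold-change with a pseudocount; it is not the authors' analysis method, and the guide names and counts are invented.

```python
import math

def sgrna_log2_fold_change(sample_counts, reference_counts, pseudocount=1.0):
    """Log2 fold-change of each sgRNA's normalized abundance versus a reference.

    For a Tier 1-style comparison the reference would be the time-zero
    population; for a Tier 2-style comparison it would be DMSO-treated cells
    from the same timepoint.
    """
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    lfc = {}
    for guide, count in sample_counts.items():
        sample_cpm = (count + pseudocount) / sample_total * 1e6
        reference_cpm = (
            (reference_counts.get(guide, 0) + pseudocount) / reference_total * 1e6
        )
        lfc[guide] = math.log2(sample_cpm / reference_cpm)
    return lfc

# Invented read counts for two hypothetical guides.
time_zero = {"guide_A": 100, "guide_B": 300}
c6_treated = {"guide_A": 300, "guide_B": 100}
lfc = sgrna_log2_fold_change(c6_treated, time_zero)
```

      A positive value marks a guide enriched relative to the reference. With a time-zero reference, enrichment or depletion can reflect general fitness effects as well as drug response, which is why the second tier compares against DMSO-treated cells.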

      We apologize that the description of the CRISPR screens was not clearer, and have reworked this section of the manuscript to make our intent and our actions clearer.

      (4) Can re-expression of RPL22 rescue the growth arrest induced by C6?

      We have not attempted to complement the RPL22 knockout. But we do note that the evidence that loss of RPL22 confers resistance to WIN site inhibitor is strong: six (out of six) sgRNAs against RPL22 were significantly enriched in the Tier 1 screen, and independent knockout of RPL22 with the Synthego multi-guide system in MV4;11 and MOLM13 cells increases the GI50 for C16.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document the loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      Strengths:

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how much the WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins is accompanied by DNA damage inferred by H2AX-activation, activation of the p53 pathway, and apoptosis. Pathways of possible WINi resistance and synergies with other antineoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript.

      Weaknesses:

      A mild deficiency in the current manuscript is the absence of cell biological methods to complement the molecular biological and biochemical approaches so ably employed. Some microscopic observations and confirmation of nucleolar dysfunction and DNA damage would be reassuring.

      We thank the reviewer for their comments. We agree that an absence of cell biological methods was a deficiency in the original manuscript. In response to this comment, we have now added immunofluorescence (IF) analyses, examining the impact of C16 on nucleolar integrity and nucleophosmin (NPM1) distribution (Figure 3—figure supplement 4). These new data clearly show that C16 induces nucleolar stress at 72 hours—as measured by the redistribution of NPM1 from the nucleolus to the nucleoplasm. These new data fill an important gap in the story, and we are grateful to the reviewer for prompting us to perform these experiments.

      As part of the above study, we also probed for gamma-H2AX, expecting that we may see some signs of accumulation in the nucleoli (see comment #4 from Reviewer #2, below). We did not observe this response. Importantly, however, we did see that gamma-H2AX staining occurs only in what are overtly apoptotic cells. This is an important finding, because we had previously speculated that the induction of gamma-H2AX observed by Western blotting reflected part of a bona-fide response to DNA damage elicited by WIN site inhibitors. Instead, the IF data now leads us to conclude that this signal simply reflects the established fact that WIN site inhibitors induce apoptosis in this cell line (Aho et al., 2019). In response to this new finding, we have added additional discussion to the text and have removed or de-emphasized the potential contribution of DNA damage to the mechanism of action of WDR5 WIN site inhibitors. Again, we are grateful for this comment as it has prevented us from continuing to report/pursue erroneous observations.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      There is a typo in "but are are linked to mRNA instability when translation is inhibited".

      Thank you for catching this typo. It has now been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors report that WINi initially (at 24 hrs) increases the expression of most proteins while decreasing ribosomal proteins, but at 72 hours all proteins are depressed. The transient bump-up of non-translation-related proteins seems odd. A simple resolution to this somewhat strange observation is that there is no real increase in the other proteins, but because of the loss of a large fraction of the most abundant cellular proteins (the ribosomal proteins), the relative fraction of all other proteins is increased; that is, the increase of non-ribosomal proteins may be an artifact of normalization to a lower total protein content. Can this be explored?

      We are grateful to the reviewer for this comment. We have tried our best to respond, as detailed above in response to Reviewer #1 Public Comment #2.

      (2) It would be really nice to assess nucleolar status microscopically. Do nucleoli get bigger? Smaller? Do they have abnormal morphology? Is there nucleolar stress? What happens to rRNA synthesis and processing?

      We agree and thank the reviewer for raising this point. As noted in our response to Reviewer #2, above, we have included new IF that shows: (i) no obvious effect on nucleolar integrity, (ii) redistribution of NPM1 to the nucleoplasm (indicative of nucleolar stress), and (iii) induction of gamma-H2AX staining in apoptotic cells (indicative of apoptosis).

      Additionally, in response to this comment, we also looked at the impact of WIN site inhibitors on rRNA synthesis, using AzCyd labeling. These new data appear in Figure 3—figure supplement 3. Interestingly, these new data show that there is a progressive decline in rRNA synthesis, and that by 96 hours of treatment levels of both 18S and 28S rRNAs are reduced— again by about a factor of two. Our interpretation of this finding is that in response to the progressive decline in RPG transcription there is a secondary decrease in rRNA synthesis. This result is perhaps not surprising, but it does again add an important missing piece to our characterization of WIN site inhibitors and is further support for the concept that inhibition of ribosome production is a dominant part of the response to these agents.

      (3) The WINi elicited DNA damage is incompletely characterized, rather it is inferred from H2AX activation. Comet assays would help to confirm such damage.

      As noted in our response to Reviewer #2, our original inference of DNA damage, prompted by gamma-H2AX activation, is erroneous, and due instead to the ability of WIN site inhibitors to induce apoptosis. We thus did not pursue comet assays, etc., and removed discussion of potential DNA damage from the manuscript.

      (4) Staining and microscopic observation of H2AX would be very useful. Is the WINi provoked DNA damage nucleolar-localized? Does the deficiency of ribosomal proteins lead to localized genotoxic nucleolar stress - or alternatively does the paucity of ribosomes and decreased translation lead to imbalances in other cellular pathways, perhaps including some involved in overall genome maintenance which would provoke more global DNA damage and H2AX staining, not limited to the nucleolus.

      Again, please see our response to the Public Comment from Reviewer #2.

      (5) It would be important to assess the influence and effects of WINi on some p53 mutant, p53-/- and p53 wild-type cell lines. Given their prevalence, p53 status may be expected to alter WINi efficacy.

      The issue of how p53 status impacts the response to WINi is interesting and important, but we feel this is beyond the scope of the current manuscript. It is likely that many factors contribute to the response of cancer cells to these agents, and thus simply surveying some cancer lines for their response and linking this to their p53 status is unlikely to be very informative. Making definitive statements about the contribution of p53, and the differences between wild-type, loss-of-function mutants, gain-of-function mutants, and null mutants will require more extensive analyses and is fertile territory for future studies, in our opinion.

    2. eLife assessment

      This important paper reveals that one of the major roles of the WDR5 WIN site is to promote ribosome synthesis, and that by attacking the WIN site with inhibitors ribosome attrition occurs creating new vulnerabilities that can be therapeutically exploited. This deficiency of ribosomal proteins also provokes the p53 response. The data from a variety of approaches is generally very convincing, and together buttresses the authors' conclusions and interpretations quite nicely; overall, this paper will provide a justification for pre-clinical and translational studies of WDR5 interaction site inhibitors.

    3. Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53 and MDM4 axis.

      Comments on revised version:

      The authors have answered my points satisfactorily and the manuscript has become clearer and more meaningful as a result. In particular, the measurement of global translation rate is important and validates the upregulation of a number of proteins following WDR5 inhibitor treatment.

    4. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how such WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins occurs upon activation of p53-dependent apoptosis, likely driven by unbalanced ribosomal protein synthesis leading to MDM2 inhibition. This apoptosis is subsequently followed, as expected, by ɣH2AX activation. Pathways of possible WINi resistance and synergies with other anti-neoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript. The current version of the manuscript documents ribosomal stress revealed by leakage of NPM1 into the nucleoplasm while nucleolar integrity is preserved. A progressive loss of rRNA synthesis occurs upon drug treatment that is presumably secondary to the decrement in ribosomal protein production.

      Comments on revised version:

      (1) The authors, to my mind, have quite nicely and professionally addressed the comments of the reviewers and are to be congratulated on an important contribution to the elucidation of WDR5 biology and pathology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.

      Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

      Thank you.

      Reviewer #2 (Public Review):

      Summary:

      HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      Weaknesses:

      (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.

      We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.

      (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino acids are completely disordered (not extended) polymers, I think you would still expect significant FRET.

      FRET is extremely sensitive to distance (it falls off with the 6th power of distance). The difference in contour length (the maximum length of a peptide if fully extended) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see a slightly larger FRET signal than in the control (not significant), which is consistent with the idea that the larger fragments simply do not produce a large FRET signal.

      Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.

      We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
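      The sixth-power distance dependence invoked above follows the standard Förster relation, E = 1 / (1 + (r/R0)^6). The short sketch below evaluates it at a few separations. The Förster radius of 5 nm is an assumed placeholder and not a value taken from the manuscript, and the function name is hypothetical.

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """FRET efficiency from the Förster relation E = 1 / (1 + (r / R0)**6).

    forster_radius_nm defaults to 5.0 nm as an assumed placeholder; the real
    value depends on the specific donor/acceptor pair and conditions.
    """
    ratio = distance_nm / forster_radius_nm
    return 1.0 / (1.0 + ratio ** 6)

at_r0 = fret_efficiency(5.0)          # separation equal to R0
at_twice_r0 = fret_efficiency(10.0)   # separation of twice R0
at_45_nm = fret_efficiency(45.0)      # scale of the contour-length difference
```

      Efficiency is one half at a separation equal to R0 and drops to 1/65 at twice R0, so a modest increase in the average fluorophore separation is enough to push the signal toward baseline.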

      (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?

      All Citrine-tagged LRMP constructs (or close derivatives) were tested functionally by coexpression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are of course intrinsically non-functional as they do not include the ion-conducting pore.

      Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-termini of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow C-linker, S4-S5 linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4-S5 linker, and the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.

      (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from - 85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).

      With a large N, this is an expected distribution. In 5 previous reports from 4 different groups of HCN4 with cAMP in HEK 293 (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SDs of HCN4 (with/without cAMP) in these papers are 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification of outliers; no measures of statistical significance were altered by their removal.
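      As a rough check on the argument above, the width of a mean ± k·SD interval is simply 2·k·SD. The sketch below applies this to the 7.6 mV SD quoted for this study, assuming approximately normally distributed midpoints. The function name is hypothetical, and the literature averages (26.6 mV and 39.9 mV) are not recomputed here.

```python
def expected_span_mv(sd_mv, k):
    """Width in mV of the interval mean ± k·SD."""
    return 2 * k * sd_mv

sd_this_study = 7.6  # SD in mV quoted in the response for this paper
span_2sd = expected_span_mv(sd_this_study, 2)  # interval covering roughly 95% of data
span_3sd = expected_span_mv(sd_this_study, 3)  # interval labelled 99% in the response
```

      With an SD of 7.6 mV, the ± 2SD interval alone spans about 30 mV, which is comparable to the spread the reviewer points to in Figure 1.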

      This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?

      The exact P-values are NOT typically at the limit of statistical significance, about 2/3rds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.

      There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.

      Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.

      The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). For example, the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, essentially the same as the -117.8 mV average for the individual fits in panel B.

      We made an error in the text based on a previous manuscript version about the ordering of the tables that has now been fixed so these values should now be aligned.

      On this ground, it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would be also shown.

      Exemplary current traces have been added to all figures in the revised manuscript.

      (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (Fig. 6A) are crucial for the LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4-S5 linker, and C-linker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

      We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.

      Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confers isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.

      Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to di-cyclic nucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.

      Reviewing Editor (recommendations for the Authors):

      (1) Exemplar recordings need to be shown and some explanation for the wide variability in the V-half of activation.

      Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.

      (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).

      There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using AlphaFold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.

      (3) Role of the HCN4 C-terminus. Truncation of the unstructured HCN4 C-terminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.

      We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, our data is consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719x data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).

      (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.

      This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.

      (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.

      As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution for the caveats listed in the revised limitations section on page 20-21, small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.

      We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.

      (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F), completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel, (HCN4-2) in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by that of HCN2 was found to be partially responsive to LRMP. These two results appear inconsistent and not reconciled in the model proposed by the authors for how LRMP may be working.

      As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed consistent with our model. We have added some discussion of this idea in our discussion section.

      (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.

      We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (ie HCN2 MCSAF/VVGPT) (Fig. 7). And we have discussed our rationale for the S5 mutations as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.

      Reviewer #1 (Recommendations For The Authors):

      A) Comments:

      (1) Figure 1: Please show some representative current traces.

      Exemplar currents are now shown for each channel in the manuscript.

      (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.

      We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.

      (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.

      See response to the reviewing editor’s comment 7.

      B) Minor Comments:

      (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.

      We agree that the exact efficiency of co-transfection is contentious although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.

      (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227 which appears to be the middle of the coiled-coil?

      See response to the reviewing editor’s comment 2.

      (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.

      Fig. 5 has been revised as suggested.

      (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?

      This is correct. This has been explicitly stated in the revised manuscript (page 14).

      (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.

      This has been stated in the revised figure legend.

      (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.

      This sentence is not included in the revised manuscript, however, for the interest of the reviewer we have provided an alignment of this region across species here.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.

      We have clarified that LRMP does not shift the voltage-dependence in the absence of cAMP (page 10). In the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).

      (2) Resolution of Fig. 8b is low.

      We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.

      (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.

      A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.

      (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.

      This has been added to figure legends for the relevant figures in the revised manuscript.

      (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".

      Fixed. Thank you.

      (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?

      This information has been added to the methods (page 5). It is the same as Peters 2020.

      Reviewer #3 (Recommendations For The Authors):

      (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".

      This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).

      We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6) are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F where this fragment is added to HCN2 are a better controlled way of validating the importance of this region.

      (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."

      On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 1-62 and 1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.
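
      The "less than half" comparison quoted in this comment can be checked with simple arithmetic. The sketch below uses only the approximate shifts quoted above for HCN4-S719X (~7 mV with LRMP, ~18 mV without); the function name is hypothetical and introduced purely for illustration.

```python
def camp_shift_ratio(shift_with_lrmp_mv, shift_without_lrmp_mv):
    """Fraction of the cAMP-induced midpoint shift that remains with LRMP.

    Both arguments are cAMP-induced shifts in the midpoint of activation (mV).
    """
    if shift_without_lrmp_mv == 0:
        raise ValueError("shift without LRMP must be non-zero")
    return shift_with_lrmp_mv / shift_without_lrmp_mv


# Approximate values quoted for HCN4-S719X: ~7 mV with LRMP, ~18 mV without.
s719x_ratio = camp_shift_ratio(7.0, 18.0)
print(f"{s719x_ratio:.2f}")  # prints 0.39
```

      A ratio below 0.5 corresponds to the quoted statement that the shift with LRMP was less than half the shift without it. The same ratio cannot be formed for Δ1-62 and Δ1-130 from this comment alone, because their shifts in the absence of LRMP are not quoted here.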

      As you rightly stated at the end of the paragraph: "Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of the HCN4 N-terminus abolishes the LRMP effect, but it is also true that removal of the very C-terminus of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.

      Please see the response to the reviewing editor’s comment 3.

      (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.

      Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.

      Please see the response to the reviewing editor’s comment 2 and 5.

      Why didn't you test the C-terminus (from 719 till the end) of HCN4? This can help with understanding why truncation of the HCN4 C-terminus does affect LRMP, though partially (Fig. 4A).

      Please see the response to the reviewing editor’s comment 3.

      (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".

      I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" P545A/T547F mutations, should not perceive LRMP.

      Please see the response to the reviewing editor’s comment 6.

      (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 N-terminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".

      Why did you insert also the 3 mutations of S5? Are these mutations somehow involved in the cAMP transduction mechanism?

      You have already shown that in HCN4 only P545 and T547 (C-linker) are necessary for the LRMP effect. I suggest trying, at least, the chimera of HCN2 with only A467P/F469T. They should work without the 3 mutations in S5.

      Please see the response to the reviewing editor’s comment 7.

    2. eLife assessment

      This study identifies the molecular determinants of LRMP co-regulation of HCN4 activity. The evidence supporting the conclusions, which is compelling, is backed by rigorous electrophysiological and spectroscopic analysis. The work is important because it greatly enhances our understanding of the mechanisms of HCN channel regulation in a tissue-specific manner and highlights a functional role for more disordered regions that have yet to be structurally resolved.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

    4. Reviewer #2 (Public Review):

      Summary:

      The HCN4 isoform is found primarily in the sino-atrial node, where it contributes to the pacemaking activity. LRMP is an accessory subunit which prevents cAMP-dependent potentiation of the HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology and FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of the N- and C-termini, along with specific residues in the C-linker and S5 of HCN4, are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7, which recapitulates LRMP modulation on an HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      The authors adequately addressed my earlier concerns.

    5. Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-terminus of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (C-linker elbow, S4-S5 linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the Authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the C-linker elbow, the S4-S5 linker, and the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      - I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      - Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from -85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not). This problem is spread throughout the ms, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings? There are several other problems with Figure 1 and in all figures of the ms: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Fig 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1. On this ground it is difficult to judge the conclusions and it would also greatly help if exemplary current traces would also be shown.

      - "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (Fig. 6A) are crucial for the LRMP effect, but these two residues are not involved in the cAMP transduction centre (interface between HCND, S4-S5 linker and C-linker elbow), at least for the data accumulated till now in the literature. Indeed, the hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4 given the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in the light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will also be interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:

      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:

      Quantification of histology and detailed statistical analysis will further strengthen this manuscript.

      I have the following specific comments.

      (1) Since defining senescent cells solely based on one or two markers (SA-β-gal and p16) may not provide a robust characterization, it would be advisable to employ another well-established senescence marker, such as γ-H2AX or HMGB1, to corroborate the observed increase in senescent osteoclasts following LSI and aging.

      We value the comments provided by the reviewer. In accordance with your suggestion, we have performed co-staining of HMGB1 with Trap in Supplementary Figure 1 to corroborate the observed augmentation of senescent osteoclasts following LSI and aging.

      Author response image 1.

      (2) The connection between heightened Netrin-1 secretion by senescent osteoclasts following LSI or aging and its relevance to pain warrants thorough discussion within the manuscript to provide a comprehensive understanding of the entire narrative.

      We appreciate the reviewer's insightful comments. We have thoroughly addressed the entire narrative in the revised manuscript, as outlined below:

      During lumbar spine instability (LSI) or aging, endplates undergo ossification, leading to elevated osteoclast activity and increased porosity [1-4]. The progressive porous transformation of endplates, accompanied by a narrowed intervertebral disc (IVD) space, is a hallmark of spinal degeneration [4,5]. Considering that pain arises from nociceptors, it is plausible that low back pain (LBP) may be attributed to sensory innervation within endplates. Additionally, porous endplates exhibit higher nerve density compared to normal endplates or degenerative nucleus pulposus [6]. Netrin-1, a crucial axon guidance factor facilitating nerve protrusion, has been implicated in this process [7-9]. The receptor mediating Netrin-1-induced neuronal sprouting, deleted in colorectal cancer (DCC), was found to co-localize with CGRP+ sensory nerve fibers in endplates after LSI surgery [10,11]. In summary, during LSI or aging, osteoclastic lineage cells secrete Netrin-1, inducing extrusion and innervation of CGRP+ sensory nerve fibers within the spaces created by osteoclast resorption. This Netrin-1/DCC-mediated pain signal is subsequently transmitted to the dorsal root ganglion (DRG) or higher brain levels.

      (3) It appears that the quantitative data for TRAP staining in Figure 1j is missing.

      We appreciate the reviewer's comments. We have added the statistical data of TRAP staining (Figure 1p) to Figure 1 in the revised manuscript.

      Author response image 2.

      (4) Regarding Figure 6, could you please specify which panels were analyzed using a t-test and which ones were subjected to ANOVA? Alternatively, were all the panels in Figure 6 analyzed using ANOVA?

      We appreciate the reviewer’s comments here. Upon careful review, we have ensured that quantitative data in panels b, c, and f are analyzed using t-tests, while panels d, e, and g are subjected to one-way ANOVA. These updates have been reflected in the revised figure legend.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea. However, some minor issues need to be addressed.

      (1) In Figure 2C, "Number of SnCs/mm2", SnCs should be SnOCs.

      We apologize for the oversight. This has been rectified in the revised manuscript.

      Author response image 3.

      (2) In Figure 3A-E, is there any statistical difference between groups Young and Aged+PBS?

      We appreciate the reviewer's comments. Following your recommendation, we conducted additional statistical analyses to compare the young and PBS-treated aged mice, and we have incorporated these findings into the revised manuscript. The data reveal a significantly increased paw withdrawal frequency (PWF) in aged mice treated with PBS compared with young mice, particularly at 0.4g instead of 0.07g (Figure 3a, 3b). Moreover, aged mice treated with PBS exhibited a significant reduction in both distance traveled and active time when compared to young mice (Figure 3d, 3e). Additionally, PBS-treated aged mice demonstrated a significantly shortened heat response time relative to young mice (Figure 3c).

      Author response image 4.

      (3) Again, is there any statistical difference between the Young and Aged+PBS groups in Figure 4F-K?

      We appreciate the reviewer's comments. As per your suggestion, we conducted a thorough analysis to determine the statistical differences between the young and aged+PBS groups, and these statistical results have been implemented in the revised manuscript. The caudal endplates of L4/5 in PBS-treated aged mice exhibited a significant increase in endplate porosity (Figure 4f) and trabecular separation (Tb.Sp) (Figure 4g) compared to young mice.

      Additionally, PBS-treated aged mice showed a significant elevation in endplate score (Figure 4h), as well as an increased distribution of MMP13 and ColX within the endplates when compared to young mice (Figure 4i, 4j). Furthermore, TRAP staining revealed a significant increase in TRAP+ osteoclasts within the endplates of PBS-treated aged mice as compared to young mice (Figure 4k).

      Author response image 5.

      (4) What is the figure legend of Figure 7?

      The legend for Figure 7 (as below) is included in a separate PDF file labeled 'Figures and Legends.' We have carefully checked the revised manuscript and made sure all the legends are included.

      “Fig. 7. (a) Representative images of immunofluorescent analysis of CD31, an angiogenesis marker (green), Emcn, an endothelial cell marker (red) and nuclei (DAPI; blue) of adult sham, LSI and aged mice injected with PBS or ABT263. (b) Quantitative analysis of the intensity mean value of CD31 per mm2 in sham, LSI mice treated with PBS or ABT263. (c) Quantitative analysis of the intensity mean value of CD31 per mm2 in aged mice treated with PBS or ABT263. (d) Quantitative analysis of the intensity mean value of Emcn per mm2 in sham, LSI mice treated with PBS or ABT263. (e) Quantitative analysis of the intensity mean value of Emcn per mm2 in aged mice treated with PBS or ABT263. n ≥ 4 per group. Statistical significance was determined by one-way ANOVA, and all data are shown as means ± standard deviations.”

      (5) In "Mice" section, an Ethical code is suggested to be added.

      We appreciate the reviewer's comments. In accordance with your suggestion, we have included the Johns Hopkins University animal protocol number in the revised manuscript. The relevant paragraph has been updated to read: “All mice were maintained at the animal facility of The Johns Hopkins University School of Medicine (protocol number: MO21M276).”

      (6) In "Methods" section, please indicate the primers of GAPDH.

      We apologize for the absence of the GAPDH primers. Upon review, the GAPDH primers used were as follows: forward primer 5'-ATGTGTCCGTCGTGGATCTGA-3' and reverse primer 5'-ATGCCTGCTTCACCACCTTCTT-3'. These primer sequences have been included in the revised manuscript.

      (7) Preosteoclasts are regarded to be closely related to H-type vessel growth, so do the authors have any comments on this? Any difference or correlation between SnCs and preosteoclasts?

      The pre-osteoclast plays a crucial role in secreting anabolic growth factors that facilitate H-type vessel formation, osteoblast chemotaxis, proliferation, differentiation, and mineralization. The osteoclast represents the terminal differentiation phase, ultimately leading to the induction of resorption.

      Senescent cells, including senescent osteoclasts, are characterized by permanent cell cycle arrest and changes in their secretory profile, which can impact their function. In the context of osteoclasts, senescence can lead to a reduction in bone resorption capacity and impaired bone remodeling. Senescent osteoclasts are believed to contribute to age-related bone loss and bone-related diseases, such as osteoporosis.

      Reviewer #3 (Public Review):

      Summary:

      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:

      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:

      There are some points that can be improved:

      (1) Since this work primarily focuses on ABT263, it resembles a pharmacological study for this drug. It is preferable to provide references for the ABT263 concentration and explain how the administration was determined.

      Thank you for your comment. ABT263 has been extensively employed in diverse research studies (refs. 12-15). The concentration and administration of ABT263 followed the protocol outlined in the published paper (ref. 13). The reference on how to use ABT263 is cited in the method section: “ABT263 was administered to mice by gavage at a dosage of 50 mg per kg body weight per day (mg/kg/d) for a total of 7 days per cycle, with two cycles conducted and a 2-week interval between them39”.
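
      The regimen quoted above can be turned into a small worked calculation. This is only an illustrative sketch: the 30 g body weight is a hypothetical value, the function names are made up for this example, and the "2-week interval" is assumed to mean 14 dose-free days between the last day of the first cycle and the first day of the second.

```python
def dosing_days(days_per_cycle=7, cycles=2, interval_days=14):
    """List the study days (1-based) on which a dose is given.

    Assumes the interval is a dose-free gap between consecutive cycles.
    """
    days = []
    start = 1
    for _ in range(cycles):
        days.extend(range(start, start + days_per_cycle))
        start += days_per_cycle + interval_days
    return days


def daily_dose_mg(dose_mg_per_kg, body_weight_g):
    """Convert a mg/kg/d dosage into mg per day for one animal."""
    return dose_mg_per_kg * body_weight_g / 1000


# Hypothetical 30 g mouse at the quoted 50 mg/kg/d dosage.
schedule = dosing_days()
per_day = daily_dose_mg(50, 30)
total_mg = per_day * len(schedule)
print(schedule[0], schedule[6], schedule[7], schedule[-1], per_day, total_mg)
```

      Under these assumptions, dosing falls on days 1 to 7 and 22 to 28, for 14 dosing days in total.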

      (2) It would strengthen the study to include at least 6 mice per group for each experiment and analysis, which would provide a more robust foundation.

      Thank you for your comment here. In response, we conducted a new set of experiments, augmenting the majority of the sample size to six, and updated the corresponding statistical data in the revised manuscript.

      (3) In Figure 4, either use "adult" or "young" consistently, but not both. Additionally, it's important to define "sham," "young," and "adult" explicitly in the methods section.

      Thank you for your comment. We have addressed the inconsistency in the labeling of Figure 4. Additionally, we have explicitly defined "sham," "young," and "adult" in the methods section as follows: The control group (sham group) for the LSI group refers to C57BL/6J mice that did not undergo LSI surgery, while the control group (young group) for the Aged group refers to 4-month-old C57BL/6J mice.

      Author response image 6.

      (4) Assess the protein expression of Netrin 1 and NGF.

      Thank you for your comment here. We employed ELISA to assess the protein expression of Netrin-1 and NGF in the L3 to L5 endplates. The data revealed that compared to the young sham mice, LSI was associated with significantly greater protein expression of Netrin-1 and NGF, which was substantially attenuated by ABT263 treatment in LSI mice (Supplementary Fig. 2a, 2b).

      Author response image 7.

      Reference

      (1) Bian, Q. et al. Excessive Activation of TGFbeta by Spinal Instability Causes Vertebral Endplate Sclerosis. Sci Rep 6, 27093, doi:10.1038/srep27093 (2016).

      (2) Bian, Q. et al. Mechanosignaling activation of TGFbeta maintains intervertebral disc homeostasis. Bone Res 5, 17008, doi:10.1038/boneres.2017.8 (2017).

      (3) Papadakis, M., Sapkas, G., Papadopoulos, E. C. & Katonis, P. Pathophysiology and biomechanics of the aging spine. Open Orthop J 5, 335-342, doi:10.2174/1874325001105010335 (2011).

      (4) Rodriguez, A. G. et al. Morphology of the human vertebral endplate. J Orthop Res 30, 280-287, doi:10.1002/jor.21513 (2012).

      (5) Taher, F. et al. Lumbar degenerative disc disease: current and future concepts of diagnosis and management. Adv Orthop 2012, 970752, doi:10.1155/2012/970752 (2012).

      (6) Fields, A. J., Liebenberg, E. C. & Lotz, J. C. Innervation of pathologies in the lumbar vertebral end plate and intervertebral disc. Spine J 14, 513-521, doi:10.1016/j.spinee.2013.06.075 (2014).

      (7) Hand, R. A. & Kolodkin, A. L. Netrin-Mediated Axon Guidance to the CNS Midline Revisited. Neuron 94, 691-693, doi:10.1016/j.neuron.2017.05.012 (2017).

      (8) Moore, S. W., Zhang, X., Lynch, C. D. & Sheetz, M. P. Netrin-1 attracts axons through FAK-dependent mechanotransduction. J Neurosci 32, 11574-11585, doi:10.1523/JNEUROSCI.0999-12.2012 (2012).

      (9) Serafini, T. et al. Netrin-1 is required for commissural axon guidance in the developing vertebrate nervous system. Cell 87, 1001-1014, doi:10.1016/s0092-8674(00)81795-x (1996).

      (10) Forcet, C. et al. Netrin-1-mediated axon outgrowth requires deleted in colorectal cancer-dependent MAPK activation. Nature 417, 443-447, doi:10.1038/nature748 (2002).

      (11) Shu, T., Valentino, K. M., Seaman, C., Cooper, H. M. & Richards, L. J. Expression of the netrin-1 receptor, deleted in colorectal cancer (DCC), is largely confined to projecting neurons in the developing forebrain. J Comp Neurol 416, 201-212, doi:10.1002/(sici)1096-9861(20000110)416:2<201::aid-cne6>3.0.co;2-z (2000).

      (12) Born, E. et al. Eliminating Senescent Cells Can Promote Pulmonary Hypertension Development and Progression. Circulation 147, 650-666, doi:10.1161/CIRCULATIONAHA.122.058794 (2023).

      (13) Chang, J. et al. Clearance of senescent cells by ABT263 rejuvenates aged hematopoietic stem cells in mice. Nat Med 22, 78-83, doi:10.1038/nm.4010 (2016).

      (14) Lim, S. et al. Local Delivery of Senolytic Drug Inhibits Intervertebral Disc Degeneration and Restores Intervertebral Disc Structure. Adv Healthc Mater 11, e2101483, doi:10.1002/adhm.202101483 (2022).

      (15) Yang, H. et al. Navitoclax (ABT263) reduces inflammation and promotes chondrogenic phenotype by clearing senescent osteoarthritic chondrocytes in osteoarthritis. Aging (Albany NY) 12, 12750-12770, doi:10.18632/aging.103177 (2020).

    2. eLife assessment

      This fundamental study advances our understanding of the role of senescent osteoclasts (SnOCs) in the pathogenesis of spine instability. The authors provide compelling evidence that SnOCs induce sensory nerve innervation. Accordingly, reduction of SnOCs by the senolytic drug Navitoclax markedly reduces spinal pain sensitivity. This work will be of broad interest to regenerative biologists working on spinal pain.

    3. Reviewer #1 (Public Review):

      Summary:

      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:

      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:

      All my concerns have been well addressed; I have no further comments.

    4. Reviewer #2 (Public Review):

      Summary:

      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea.

    5. Reviewer #3 (Public Review):

      Summary:

      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:

      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:

      All my concerns have been well addressed; I have no further comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      The authors should include experiments such as Cryo-EM and genetically modified animals to demonstrate the physiological importance of the TMEM81 complex.

      While we intend to pursue cryo-EM studies of the putative complex (or subcomplexes thereof), this is clearly not a straightforward endeavor and goes beyond the scope of the present manuscript. Concerning the generation of genetically modified animals, we would like to underline that the majority of the proteins that we used for AlphaFold-Multimer complex predictions were precisely chosen based on the fact that - as detailed in the publications referenced in the Introduction - ablation of the respective genes caused sex-specific infertility due to defects in gamete fusion (the other criterion used for inclusion being structural similarity to IZUMO1 coupled with expression in the testis (IZUMO2-4 and TMEM81), or evidence from other kinds of experiments in the case of human-specific MAIA). Concerning TMEM81, experimental evidence for a direct involvement in gamete fusion is described in the referenced preprint by Deneke et al., which was submitted to bioRxiv concomitantly with the present work.

      Reviewer #2

      I believe that the manuscript would benefit from the authors providing more information about the systematic search (Figure 4). For example, by indicating for each pair tested the average pDock score in a 2D plot (or table) and as raw data in the supplementary information.

      Figure 4 has been modified to report both the top and the mean ranking scores for every interaction. Furthermore, additional metrics for the systematic search summarized in Figure 4, including pDockQ scores, are provided in this manuscript revision as supplementary Table S1.

      A global search, such as including all membrane proteins expressed in eggs or sperm, could not only be more informative but could also allow the reader to understand the pDock score discrimination power for this particular subset.

      The possibility of carrying out a global search was evaluated by performing preliminary computational experiments on an extended ensemble of sperm and egg proteins. In order to do so, we compiled a list of sperm membrane proteins by referring to 4 proteomic datasets (PMIDs 36384108, 36896575, 31824947, 24082039) and identifying ~600 proteins that were found in at least two of them; among these, 250 were single-pass type I or type II membrane proteins, or GPI-anchored proteins. Similarly, a list of 160 egg surface membrane proteins, excluding multipass and secreted ones, was obtained by comparing oocyte cDNA library NIH_MGC_257_N (Express Genomics, USA) with 4 proteomic datasets (PMIDs 35809850, 36042231, 29025019, 27215607). As we briefly commented at the beginning of the section “Prediction of interactions between human proteins associated with gamete fusion” of the revised manuscript, the tests carried out using the resulting list of sperm and egg proteins suggested that interpreting the results of a global search would be severely complicated by a relatively large number of putative false positives. Moreover, the tests showed that performing a complete systematic search would be beyond our current access to computing power. Based on these observations, we preferred to maintain the present study limited to proteins that had been previously clearly implicated in gamete fusion and/or matched specific structural features of IZUMO1.
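
      The filtering criterion described above (keeping proteins detected in at least two of four proteomic datasets) and the size of the resulting search space can be illustrated with a minimal sketch. The toy dataset contents and the function names are invented for this example and are not the authors' data or code; only the counts of 250 sperm and 160 egg candidates come from the response, and the pair count assumes that every sperm candidate is tested against every egg candidate as a binary complex.

```python
from collections import Counter


def found_in_at_least(datasets, min_hits=2):
    """Return the identifiers present in at least `min_hits` datasets.

    Each dataset is a set, so a protein is counted once per dataset.
    """
    counts = Counter(protein for dataset in datasets for protein in dataset)
    return {protein for protein, n in counts.items() if n >= min_hits}


def n_cross_pairs(n_sperm, n_egg):
    """Number of sperm-egg binary pairs in an all-against-all search."""
    return n_sperm * n_egg


# Toy placeholder datasets (hypothetical identifiers).
toy_datasets = [
    {"A", "B", "C"},
    {"B", "C", "D"},
    {"C", "E"},
    {"F"},
]
shared = found_in_at_least(toy_datasets)

# Candidate counts quoted in the response.
pairs = n_cross_pairs(250, 160)
print(sorted(shared), pairs)
```

      Under this assumption, an exhaustive binary search would involve 40,000 predicted complexes before any higher-order assemblies are considered, which is consistent with the computing-power concern raised in the response.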

      Figure 5 could be improved in clarity by schematically indicating to which cell each protein is anchored.

      This has been done in the revised version of the manuscript.

      Reviewer #3

      Major comments

      (1) In Figure 1, how the protein of mouse/human IZUMO1 and JUNO is purified is not mentioned in the main text nor in the Methods. Are the mouse IZUMO1-His and mouse JUNO-His transfected together or separately? Are human JUNO-His and human IZUMO1-Myc transfected together into HEK293 cells? And purified by IMAC?

      Transfection information has been included in the Methods section “Protein expression, purification and analysis” (previously “Protein expression and purification”). Concerning the purification procedure, we had already stated in the legend of Figure 1 that human JUNOE-His/IZUMO1E-Myc had been purified by IMAC before SEC, and have now done the same for mouse JUNOE-His and IZUMO1E-His.

      (2) It would be easier to understand the figure if the author could run a WB to indicate which band above JUNO is specifically IZUMO1-Myc in Figure 1.

      This has been done and reported in a new Figure S1 (with the original Figure S1 having now become Figure S2). Details about the antibodies used for immunoblot have been included in both Methods section “Protein expression, purification and analysis” and the Key Resources Table.

      (3) Figure 4: Analysis of more proteins that have been suggested as possible candidates for sperm-egg interaction will help to highlight the following results. Also, providing a score for the possibility of interaction might help in selecting those proteins in Figures 5 and 6.

      Please refer to the answer to the first question of Reviewer #2.

      (4) Figure 7: The authors take advantage of the latest developments in protein structure and interaction to model protein complex formation. However, some experimental validation, such as co-IP or pull-down assays, is necessary to verify at least some of these predicted interactions.

      We agree with the reviewer; however, for the reasons we discussed during our comparison of the biochemical properties of the JUNO/IZUMO1 interaction between mouse and human, pursuing this line of inquiry will likely necessitate an extensive set of parallel experiments using proteins from different species. This work is being planned and will be the focus of future studies. However, as we mentioned at the end of the Abstract, one should also consider that some of these complexes are likely to be highly transient. Because of this, while they may have important regulated roles in vivo (function at a specific time and place), they could be very challenging to detect using standard approaches in vitro. We thus see this as a significant advance that structural modeling could contribute to the identification of such functionally important but transient interactions.

      Minor points

      (1) In the abstract, "three sperm (IZUMO1, SPACA6 and TMEM81)" should be "three sperm proteins."

      The Abstract has been condensed to fit within the suggested 200-word limit and, as part of this, the sentence has been changed to “complex involving sperm IZUMO1, SPACA6, TMEM81 and egg JUNO, CD9”.

      (2) How do the predictions of the binary complex IZUMO1/CD9 (Figure S1B) or IZUMO1/CD81 (Figure S1C) suggest "the two egg tetraspanins are interchangeable"? Was it because they are quite similar? Please provide more explanation for this speculation. Interchangeable by function or for complex formation? To support the conclusion, biochemical data is required. Otherwise, it needs to be toned down.

      This is because, in the AlphaFold-Multimer predictions of the pentameric complex, CD9 and CD81 are placed in essentially the same way relative to the other subunits.

      We have now clarified this at the end of page 6:

      “(...) suggest that the two egg tetraspanins are interchangeable because they are predicted to bind to the same region of IZUMO1; (...)”

      (3) It would be more reader-friendly if the author could label the name of each protein in the figure in Figure S1, especially when the name is not written in the figure legend.

      This has been done in Figure S2 of the revised manuscript (corresponding to original Figure S1).

    2. eLife assessment

      This study offers valuable insights into the structural architecture of the mammalian egg-sperm fusion synapse, shedding light on the role of specific proteins in fertilization. The significance of the findings lies in the potential identification of a pentameric complex involved in gamete fusion by AlphaFold Multimer. The strength of evidence for the approach/methodology is solid, while the experimental validation is incomplete in supporting these interactions. This work will be of interest to biomedical researchers working on fertility and reproductive health.

    3. Reviewer #2 (Public Review):

      Summary:

      Fertilization is a crucial event in sexual reproduction, but the molecular mechanisms underlying egg-sperm fusion remain elusive. Elofsson A et al. used AlphaFold to explore possible synapse-like assemblies between sperm and egg membrane proteins during fertilization. Using a systematic search of protein-protein interactions, the authors proposed a pentameric complex of three sperm (IZUMO1, SPACA6, and TMEM81) and two egg (JUNO and CD9) proteins, providing a new structural model to be used in future structure-function studies.

      Strengths:

      (1) The study uses the AlphaFold algorithm to predict higher-order assemblies. This approach could offer insights into highly transient protein complexes, which are challenging to detect experimentally.

      (2) The article predicts a pentameric complex between proteins involved in fertilization, shedding light on the architectural aspects of the egg-sperm fusion synapse.

      Weaknesses:

      The proposed model, which is a prediction from a modeling algorithm, lacks experimental validation of the identity of the components and the predicted contacts.

      It is noteworthy that in an independent study, Deneke et al. provide experimental evidence of the interaction between IZUMO1/SPACA6/TMEM81 in zebrafish. This is an important element that supports the findings presented in this manuscript.

      Regarding the authors' response on the question of a global search: I understand that a global search might be difficult to interpret because of a large number of putative false positives. But it is this type of information that is needed to assess the validity of the model and the scoring power in the absence of any experimental validation. At minimum, the search should include a negative control set of proteins known to be unrelated to sperm fertilization or homologous egg-sperm fusion complexes from incompatible species to account for species-specific interactions.

      I acknowledge that experimentally validating highly transient complexes presents technical hurdles. However, a high-confidence structural model could enable the design of point mutations specifically disrupting the predicted interactions. Subsequent rescue experiments could then validate the directionality of these interactions. Ultimately, such experiments are crucial for robust model validation.

    4. Reviewer #3 (Public Review):

      Summary:

      Sperm-egg fusion is a critical step in successful fertilization. Although several proteins have been identified in mammals that are required for sperm-egg adhesion and fusion, it is still unclear whether there are other proteins involved in this process and how the reported proteins cooperate to complete the fusion process. In this study, the authors first identified TMEM81 as a structural homologue of IZUMO1 and SPACA6, and predicted the interactions with a pool of human proteins associated with gamete fusion, using AlphaFold-Multimer, a recent advance in protein complex structure prediction. The prediction is compelling and well discussed; experimental evidence to verify this interaction is lacking in this study but is provided by a complementary and independent study by another group.

      Strengths:

      The authors present a pentameric complex formation of four previously reported proteins involved in egg/sperm interaction together with TMEM81 using a deep learning tool, AlphaFold-Multimer.

      Weaknesses:

      It is intriguing to see that some of the proteins involved in sperm-egg interaction are successfully predicted to be assembled into a single multimeric structure by AlphaFold-Multimer. The experimental validation of the interactions is not directly supported in this study. As there are more candidate proteins in the process, testing other possible protein interactions more comprehensively will provide more rationale for the current 3D multi-protein modeling.

    1. eLife assessment

      This important study sheds light on the mechanisms underlying a rare brain disease, offering insight into the role of microglia in this complex pathophysiology. The evidence presented is solid, utilizing state-of-the-art laboratory models to explore cellular interactions and disease development. While further research is needed, this study will be of interest to neuroscientists and clinicians aiming to understand and combat similar neurodegenerative disorders.

    2. Reviewer #1 (Public Review):

      Here, using an organoid system, Wong et al. generated a new model of hereditary diffuse leukoencephalopathy with axonal spheroids, with which they investigated how CSF1R mutations affect the phenotypes of microglia/macrophages, and revealed metabolic changes in microglia/macrophages associated with a proinflammatory phenotype.

      In general, this paper is interesting and well-written, and tackles important issues to be addressed.

      This study suffers from several major concerns and limitations that dampen the value of the study. As the authors also mentioned, models that perfectly recapitulate the complexity of the HDLS brain would be required to better understand the molecular mechanisms of the disease. In this regard, it is unclear how well the organoid system in this study can recapitulate the condition in patients with HDLS (e.g. reduced microglia density, downregulated expression of P2YR12, pathological alterations). In addition, the authors used two different models with distinct mutations that could produce different readouts in CSF1R-mediated cellular responses.

      Although the reviewer does understand the importance of providing several options/tools to study rare diseases like HDLS and the difficulty of generating stable organoids with less variation, it is unclear if the different outcomes between HD1 and HD2 are generated through different mutations or simply due to different differentiation efficiency from iMacs (e.g. Figure 2B), which needs to be confirmed. Lastly, there is an over-interpretation regarding the results in Figure 6A. There is no difference between isoHD1 iMac control and HD1 Mut iMac.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper investigates a rare and severe brain disease called Hereditary Diffuse Leukoencephalopathy with Axonal Spheroids (HDLS). The authors aimed to understand how mutations in the gene CSF1R affect microglia, the resident immune cells in the brain, and which alterations and factors lead to the specific pathophysiology. To model the human brain with the pathophysiology of HDLS, they used the human-specific model system of induced pluripotent stem cell (iPSC)-derived forebrain organoids with integrated iPSC-derived microglia (iMicro) from patients with the HDLS-causing mutation and an isogenic cell line with the corrected genome. They found that iPSC-derived macrophages (iMac) with HDLS mutations showed changes in their response, including increased inflammation and altered metabolism. Additionally, they studied these iMacs in forebrain organoids, where they differentiate into iMicro, and showed transcriptional differences in isolated iMicro when carrying the HDLS mutation. In addition, the authors described the influence of the mutation within iMicro on the transcriptional level of neurons and neural progenitor cells (NPCs) in the organoid. They observed that one mutation showed implications for impaired development of neurons, possibly contributing to the progression of the disease.

      Overall, this study provides valuable insights into the mechanisms underlying HDLS and emphasizes the importance of studying diseases like these with a suitable model system. These findings, while promising, represent only an initial step towards understanding HDLS and similar neurodegenerative diseases, and thus, their direct translation into new treatment options remains uncertain.

      Strengths:

      The strength of the work lies in the successful reprogramming of two HDLS patient-derived induced pluripotent stem cells (iPSCs) with different mutations, which is crucial for the study of HDLS using human forebrain organoid models. The use of corrected isogenic iPSC lines as controls increases the validity of the mutation-specific observations. In addition, the model effectively mimics HDLS, particularly concerning deficits in the frontal lobe, mirroring observations in the human brain. Obtaining iPSCs from patients with different CSF1R mutations is particularly valuable given the limitations of rodent and zebrafish models when studying adult-onset neurodegenerative diseases. The study also highlights significant metabolic changes associated with the CSF1R mutation, particularly in the HD2 mutant line, which is confirmed by the HD1 line. In addition, the work shows transcriptional upregulation of the proinflammatory cytokine IL-1beta in cells carrying the mutation, particularly when they phagocytose apoptotic cells, providing further insight into disease mechanisms.

      Weaknesses:

      The authors have not elucidated the significance of the increased CSF1 dosage in Figure 2F, aside from its effect on cell viability, lacking a thorough discussion of this result. Additionally, while transcriptomic and metabolic alterations related to the mutation were demonstrated in iMac models, similar investigations in iMicros are absent, necessitating further experiments to validate the findings across cell models. The conclusion drawn regarding cytokine levels lacks robust support from the data, particularly considering the varied responses observed in different mutant lines. Further analysis of the secretome (e.g. via ELISA) could provide additional insights. Moreover, the characterization of iMicros is incomplete, with limited protein-level analysis (e.g. validate RNA-seq via flow cytometry). Additionally, the claim of microglial-like morphology lacks adequate evidence, as the provided image is insufficient for such an assessment. RNA-seq experiments should be represented better; it is not possible to read the legends or gene names in the figures. Maybe the data sets can be combined into one PCA and one overall analysis, e.g. via WGCNA-like analyses? This would make it easier for the reader to compare the two cell lines side by side. Furthermore, inaccuracies and omissions in the figure legends compromise the clarity of data representation. Statistical test information is missing. Finally, inconsistent terminology usage throughout the paper may confuse readers (iMac versus iMicros).

    1. eLife assessment

      In this study, the authors provide valuable evidence that the LGE is not a significant source of oligodendrocytes for the cortex. The reviewers did find some technical considerations that call for some modulation of the strength of the authors' conclusions and also pointed out some aspects of the data that were incomplete as presented.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors generated a novel transgenic mouse line OpalinP2A-Flpo-T2A-tTA2 to specifically label mature oligodendrocytes, and at the same time their embryonic origins by crossing with a progenitor cre mouse line. With this clever approach, they found that LGE/CGE-derived OLs make minimal contributions to the neocortex, whereas MGE/POA-derived OLs make a small but lasting contribution to the cortex. These findings are contradictory to the current belief that LGE/CGE-derived OPCs make a sustained contribution to cortical OLs, whereas MGE/POA-derived OPCs are completely eliminated. Thus, this study provides a revised and more comprehensive view on the embryonic origins of cortical oligodendrocytes.

      Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the authors also analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and the established combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors cannot conclude that the lack of LGE/CGE-derived OLs in the neocortex is less likely caused by competitive postnatal elimination and more likely due to limited production and/or allocation (lines 118-119). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Cai et al use a combination of mouse transgenic lines to re-examine the question of the embryonic origin of telencephalic oligodendrocytes (OLs). Their tools include a novel Flp mouse for labelling mature oligodendrocytes and a number of pre-existing lines (some previously generated by the last author in Josh Huang's lab) that allowed combinatorial or subtractive labelling of oligodendrocytes with different origins. The conclusion is that cortically-derived OLs are the predominant OL population in the motor and somatosensory cortex and underlying corpus callosum, while the LGE/CGE generates OLs for the piriform cortex and anterior commissure rather than the cerebral cortex. Small numbers of MGE-derived OLs persist long-term in the motor, somatosensory and piriform cortex.

      Strengths:

      The strength and novelty of the manuscript lie in the elegant tools generated and used, which have the potential to accurately resolve the contribution of different progenitor zones to telencephalic regions.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B)." (lines 111-112) is not really accurate as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winkler et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

    4. Reviewer #3 (Public Review):

      In the manuscript entitled "Embryonic Origins of Forebrain Oligodendrocytes Revisited by Combinatorial Genetic Fate Mapping," Cai et al. used an intersectional/subtractional strategy to genetically fate-map the oligodendrocyte populations (OLs) generated from medial ganglionic eminence (NKX2.1+), lateral ganglionic eminences, and dorsal progenitor cells (EMX1+). Specifically, they generated an OL-expressing reporter mouse line OpalinP2A-Flpo-T2A-tTA2 and bred it with region-specific neural progenitor-expressing Cre lines, EMX1-Cre for dOLs and NKX2.1-Cre for MP-OLs. They used a subtractional strategy in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line to predict the origins of OLs from lateral/caudal ganglionic eminences (LC). With their genetic tools, the authors concluded that neocortical OLs primarily consist of dOLs. Although the populations of OLs (dOLs or MP-OLs) from Emx1+ or Nkx2.1+ progenitors are largely consistent with previous findings, they observed that MP-OLs contribute minimally but persist into adulthood, rather than being eliminated as described in the previous report (PMID: 16388308).

      Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. This is in contrast to the previous observation of a contribution of LC-expressing progenitors (marked by Gsx2-Cre) to neocortical OLs (PMID: 16388308). The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 exhibits high enrichment in the LC during early embryonic development. The small population of Gsh2-positive cells present in the late embryonic cortex could originate or migrate from Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derive from Gsh2+ progenitors in the LC cannot be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial because the results from the combined Cre lines could look similar to those obtained when only one Cre line is active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

    1. eLife assessment

      This useful work describes a novel microscopy-based method to correlate the degree of pigmentation with the gene expression profile of human-induced pluripotent stem cell-derived Retinal Pigmented Epithelial (iPSC-RPE) cells at the single cell level. The presented evidence is solid in showing that there is heterogeneous gene expression in iPSC-derived RPE cells, and there is no significant correlation with the pigmentation. By analyzing the expression of some genes related to function, lysosomal- and complement-related pathways were partially enriched in darker cells. This methodology can be used by other researchers interested in analyzing gene expression related to microscopic images.

    1. eLife assessment

      This useful study reports that a week or more of hypoxia exposure in mice increases erythropoiesis and decreases the number of iron-recycling macrophages in the spleen, compromising their capacity for red blood cell phagocytosis – reflected by increased mature erythrocyte retention in the spleen. Compared to an earlier version, the study has been strengthened with mouse experiments under hypobaric hypoxia and complemented by extensive ex vivo analyses. Unfortunately, while some of the evidence is solid, the work as it currently stands only incompletely supports the authors' hypotheses. While the study would benefit from additional experiments that more directly buttress the central claims, it should be of interest to the fields of hemopoiesis and bone marrow biology and possibly also blood cancer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study examined a universal fractal primate brain shape. However, the paper does not seem well structured and is not well written. It is not clear what the purpose of the paper is. And there is a lack of explanation for why the proposed analysis is necessary. As a result, it is challenging to clearly understand what novelty in the paper is and what the main findings are.

      We have now restructured the paper, including a summary of the main purpose and findings as follows:

      “Compared to previous literature, we can summarise our main contribution and advance as follows:

      (i) We are showing for the first time that representative primate species follow the exact same fractal scaling – as opposed to previous work showing that they have a similar fractal dimension [Hofman1985, Hofman1991], i.e. slope, but not necessarily the same offset, as previous methods had no consistent way of comparing offsets.

      (ii) Previous work could also not show direct agreement in morphometrics between the coarse-grained brains of primate species and other non-primate mammalian species.

      (iii) Demonstrating in proof-of-principle that multiscale morphometrics, in practice, can have much larger effect sizes for classification applications. This moves beyond our previous work where we only showed the scaling law across [Mota2015] and within species [Wang2016], but all on one (native) scale with comparable effect sizes for classification applications [Wang2021].

      In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here that representative primate brains and mammalian brains indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      We plan to use the second paragraph as a plain-language summary of our work.

      Additionally, several terms are introduced without adequate explanation and contextualization, further complicating comprehension.

      We have now made sure that potential jargon is introduced with context and explanation. For example in Introduction: “This scaling law, relating powers of cortical thickness and surface area metrics, […]”

      Does the second section, "2. Coarse-graining procedure", serve as an introduction or a method?

      We have now renamed this section to “Coarse-graining Method” to indicate that this is a section about methods. However, to describe the methods adequately, we also expanded this section with introductory texts around the history and motivation of the method to provide context and explanations, as the reviewer rightly requested.

      Moreover, the rationale behind the use of the coarse-graining procedure is not adequately elucidated. Overall, it is strongly recommended that the paper undergoes significant improvements in terms of its structure, explanatory depth, and overall clarity to enhance its comprehensibility.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest to turn to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”
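
      The box-counting idea described in the paragraph above can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation: it counts occupied boxes for a toy point cloud rather than filling a cortical ribbon between pial and white matter surfaces, and the names `box_count` and `fractal_dimension` as well as the test object are hypothetical choices made for illustration only.

```python
import math


def box_count(points, box_size):
    """Count the distinct cubic boxes of edge `box_size` that contain at least one point."""
    occupied = {tuple(math.floor(c / box_size) for c in p) for p in points}
    return len(occupied)


def fractal_dimension(points, box_sizes):
    """Estimate the box-counting dimension as the least-squares slope of
    log N(s) against log(1/s), where N(s) is the number of occupied boxes."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(box_count(points, s)) for s in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy object (assumption, not brain data): a flat 64 x 64 sheet embedded in 3D.
sheet = [(x, y, 0) for x in range(64) for y in range(64)]
dim = fractal_dimension(sheet, [1, 2, 4, 8, 16])
```

      For a flat sheet the box count falls as the inverse square of the box size, so the fitted slope is 2; a folded cortex would be expected to give a value between 2 and 3. The set of occupied boxes at a given box size is itself the coarse-grained version of the object at that scale, which is the sense in which the response extends box counting into a coarse-graining method.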

      Reviewer #2 (Public Review):

      In this manuscript, Wang and colleagues analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The work builds on the scaling law introduced previously by co-author Mota, and Herculano-Houzel. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between the brains of young (~20 year old) and old (~80) humans. My main criticism of this manuscript is that the findings are presented in too abstract a manner for the scientific contribution to be recognized.

      We recognise that our work is at the intersection of complex mathematical concepts and a perplexing biological phenomenon. Therefore, our paper has to strike a balance between scientifically accurate and succinct descriptions whilst giving sufficient space to provide context and explanations.

      Throughout, we have now added text to provide more context, but also repeat key statements in plain-English terms.

      For example, we added the following text to highlight our key contributions.

      “In simple terms: we know that objects can have the same fractal dimension, but differ greatly in a range of other shape properties. However, we demonstrate here that representative primate brains and mammalian brains indeed share a range of other key shape properties, on top of agreeing in fractal dimension. This suggests a universal blueprint for mammalian brain shape and a common set of mechanisms governing cortical folding. As a practical additional outcome of our study, we could show that our novel method of deriving multiscale metrics of brain shape can differentiate subtle shape changes much better than the metrics we have been using so far at a single native scale.”

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1, constitute a novel procedure, but it is not strongly motivated, and it produces image segmentations that do not resemble real brains.

      To specifically explain the rationale behind the coarse-graining method, we added several clarifications, including the following paragraph:

      “As a starting point for such a coarse-graining procedure, we suggest to turn to a well-established method that measures fractal dimension of objects: the so-called box-counting algorithm [Kochunov2007, Madan2019]. Briefly, this algorithm fills the object of interest (say the cortex in our case) with boxes, or voxels of increasingly larger sizes and counts the number of boxes in the object as a function of box size. As the box size increases, the number of boxes decreases; and in a log-log plot, the slope of this relationship indicates the fractal dimension of the object. In our case, this method would not only provide us with the fractal dimension of the cortex, but, with increasing box size, the filled cortex would also contain less and less detail of the folded shape of the cortex. Intuitively, with increasing box size, the smaller details, below the resolution of a single box, would disappear first, and increasingly larger details will follow -- precisely what we require from a coarse-graining method. We therefore propose to expand the traditional box-counting method beyond its use to measure fractal dimension, but to also analyse the reconstructed cortices as different realisations of the original cortex at the specified spatial scale.”

      We also note in several places in the text that the coarse-grained brains are not to be understood as exact reconstructions of actual brains, but serve the purpose of a model:

      “[…] nor are the coarse-grained versions of human brains supposed to exactly resemble the location/pattern/features of gyri and sulci of other primates. The similarity we highlighted here are on the level of summary metrics, and our goal was to highlight the universality in such metrics to point towards highly conserved quantities and mechanisms.”

      “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains is purely on the level of morphometrics across the whole cortex.”

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel (section S2). This causes the cortical segmentation, such as the bottom row of Figure 1B, to increase in thickness with successive melting steps, to unrealistic values. For the rightmost figure panel, the cortex consists of several 4.9-sided voxels and thus a >2 cm thick cortex. A structure with these morphological properties is not consistent with the anatomical organization of a typical mammalian neocortex.

      Specifically on the point on increasing cortical thickness with increased level of coarse-graining, we have now added the following paragraph:

      “The observation that with increasing voxel sizes, the coarse-grained cortices tend to be smoother and thicker is particularly interesting: the scaling law in Eq. 1 can be understood as thicker cortices (T) form larger folds (or are smoother i.e. less surface area At) when brain size is kept constant (Ae). This way of understanding has also been vividly illustrated by using the analogy of forming paper balls with papers of varying thickness in [Mota2015]: to achieve the same size of a paper ball (Ae), the one that uses thicker paper (T) will show larger folds (or is smoother i.e. less surface area At) than the one using thinner paper. The scaling law can therefore be understood as a physically and biologically plausible statement, and here, we are encouraged that our algorithm yields results in line with the scaling law.”
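
      Eq. 1 itself is not reproduced in this response. The form below is the one usually attributed to [Mota2015] and is consistent with the scaling slope of 1.25 quoted elsewhere in this response; it is given here as an assumption to be checked against the manuscript, with A_t the total (pial) surface area, A_e the exposed surface area, T the cortical thickness and k a constant.

```latex
A_t \, T^{1/2} = k \, A_e^{5/4}
\quad\Longleftrightarrow\quad
\log A_t = \tfrac{5}{4} \log A_e - \tfrac{1}{2} \log T + \log k
```

      Read this way, holding A_e fixed and increasing T must lower A_t, which is the paper-ball intuition in the quoted paragraph: a thicker sheet crumpled to the same outer size ends up with less total surface, i.e. fewer and larger folds.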

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group possesses more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4c. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20-year-olds. A case could be made that the familiar way of describing brain tissue (cortical volume, white matter volume, thickness, etc.) is a more direct and intuitive way to describe differences between young and old adult brains than the obscure shape metric described in this manuscript. At a minimum, a demonstration of an advantage of the Figure 4a and 4b analyses over current methods for interpreting age-related differences would be valuable.

      We have demonstrated the utility of our new shape metrics in a separate paper [Wang2021]. However, we agree with the reviewer that, in this specific instance, it is much easier to understand the key message without considering the less traditional metrics. We have therefore completely revised this part of the Results section to highlight the advantage of multiscale morphometrics, and used the traditional metric of surface area to illustrate the point. The reasoning in surface area is much easier to follow, both visually and conceptually, exactly as the reviewer described.

      (3) In Discussion lines 199-203, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. First, the authors do not show (and it would be surprising if it were true) that self-similarity is observed for length scales smaller than the acquired MRI data for any of the datasets analyzed. The analysis is restricted to coarse-graining, not fine-graining.

      To clarify this point, we have added a supplementary section and the following sentence: “Note this method has also no direct dependency on the original MR image resolution, as the inputs are smooth grey and white matter surface meshes reconstructed from the images using strong (bio-)physical assumptions and therefore containing more fine-grained spatial information than the raw images (also see Suppl. Text 3).”

      We are indeed sampling at resolutions down to 0.2mm, which is below MR image resolution. The reviewer is, however, correct that we are only coarse-graining, not “fine-graining”. Coarse-graining, here, relates to more coarse than the smooth surface meshes though, not the MR image.

      Therefore, self-similarity on all length scales would seem to be too strong a constraint. Second, it is hard to imagine how this test could be used in practice. Specific examples of how gyrification mechanisms support or fail to support the generation of self-similarity across any length scale, would strengthen the authors' argument.

      We agree that spatial scales much below 0.2mm resolution may not be of interest, as these scales are only measuring the fractal properties, or “bumpiness”, of the surface meshes at the vertex level. We have therefore revised our statement in Discussion and clarified it with an example: “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that moreover is applicable to individual cortices. For example, any models that explicitly simulate a cortical surface could be directly coarse-grained with our method and compared to actual human and primate data provided here.”

      Some additional, specific comments are as follows:

      (4) The definition of the term A_e as the "exposed surface" was difficult to follow at first. It might be helpful to state that this parameter is operationally defined as the convex hull surface area.

      We agree and introduced this term now at first use: “The exposed surface area can be thought of as the surface area of a piece of cling film wrapped around the brain. Mathematically, for the remaining paper it is the convex hull of the brain surface.”

      Also, for the pial surface, A_t, there are several who advocate instead for the analysis of a cortical mid-thickness surface area, as the pial surface area is subject to bias depending on the gyrification index and the shape of the gyri. It would be helpful to understand if the same results are obtained from mid-thickness surfaces.

      This point is indeed being investigated independently of this study. Our provisional understanding is that in healthy human brains, at native scale, using the mid (or the white matter) surface introduced a systematic offset shift in the scaling law, but does not affect the scaling slope of 1.25. However, this requires a more in-depth investigation in a range of other conditions, and in the context of the coarse-grained shapes, which is on-going. Nevertheless, the scaling law, at first introduction already, has been using the pial surface area [Mota2015] and all subsequent follow-up studies followed this convention. To make our paper here accessible and directly comparable, we therefore used the same metric. Future work will investigate the utility of other metrics.

      (5) In Figure 2c, the surfaces get smaller as the coarse-graining increases, making it impossible to visually assess the effects of coarse-graining on the shapes. Why aren't all cortical models shown at the same scale?

      The purpose of rescaling the surfaces is to match the scaling plot (Fig 2A) directly, which shows shrinking surface areas Ae and At with increasing coarse-graining. Here, we are effectively keeping the size of the box constant and resizing the cortical surface instead, which is mathematically equivalent to changing the box size and keeping the cortical surface constant.

      An alternative interpretation of the “shrinking” is, therefore, that with increasingly smaller cortical surfaces, the folding details disappear, as we require from our coarse-graining method. This is also visually apparent, as the reviewer points out. We have added this to the explanation in the text.

      If we, however, changed the box size instead, the scaling law plot would be meaningless: for example, Ae would barely change with coarse-graining. We would therefore have needed to introduce more complexity in our analysis in terms of how we can measure the scaling law. Thus, we opted to present the simpler method and interpretation here.

      Nevertheless, we agree that a direct comparison would be beneficial and have thus added the videos for each species in supplementary under this link: https://bit.ly/3CDoqZQ. Upon completed peer review, we hope to integrate these directly into eLife’s interactive displays for this figure.

      (6) Text in Section 3.2 emphasizes that K is invariant with scale (horizontal lines in Figure 3), and asserts this is important for the formation of all cortices. However, I might be mistaken, but it appears that K varies with scale in Figure 4a, and the text indicates that differences in the S dependence are of importance for distinguishing young vs. old brains. Is this an inconsistency?

      We agree that it may be confusing to emphasise a “constant K” in the first set of results across species, and then later highlight a changing K in the human ageing results. To clarify, in the first set of results, we find a constant K relative to a changing S: the range in K across melted primate brains is less than 0.1, whereas in S it is over 1.2. In other words, S changes are an order of magnitude higher than K changes. Hence, we described K as “constant” relative to S.

      Nevertheless, K shows subtle changes within individuals, which is what we were describing in the human ageing results. These changes are within the range of K values described in the across species results.

      However, in the interest of clarity, we followed the reviewer’s suggestion of simplifying the last set of results on human ageing and therefore the variable K in human ageing now only appears in Supplementary. We have now added clarifications to the supplementary on this point.
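
      The quantities K and S discussed in this answer are not defined in the response itself. The sketch below assumes the definitions usually attributed to [Wang2021], in which K, S and a size term I are three mutually orthogonal combinations of log A_t, log A_e and log T^2; the function name `morphometrics`, the base-10 logarithm and the example numbers are illustrative assumptions, not values from the study.

```python
import math


def morphometrics(total_area, exposed_area, thickness):
    """Return (K, S, I) from total surface area A_t, exposed surface area A_e
    and cortical thickness T, under the assumed [Wang2021]-style definitions."""
    log_at = math.log10(total_area)
    log_ae = math.log10(exposed_area)
    log_t2 = math.log10(thickness ** 2)
    k = log_at - 1.25 * log_ae + 0.25 * log_t2       # offset ('tension') term
    s = 1.5 * log_at + 0.75 * log_ae - 2.25 * log_t2  # shape term
    i = log_at + log_ae + log_t2                      # isometric size term
    return k, s, i


# Hypothetical numbers: the same shape at two sizes (lengths doubled,
# so areas are multiplied by 4 and thickness by 2).
small = morphometrics(100.0, 100.0, 1.0)
large = morphometrics(400.0, 400.0, 2.0)
```

      Under these assumed definitions, uniformly rescaling a surface changes only I and leaves K and S untouched, so any change in K or S across coarse-graining levels reflects a change in shape rather than in size. This is the sense in which a range of under 0.1 in K can be compared directly with a range of over 1.2 in S.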

      Reviewer #3 (Public Review):

      Summary:

      Through a detailed methodology, the authors demonstrated that within 11 different primates, the shape of the brain matched a fractal of dimension 2.5. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain.

      Strengths:

      • New hierarchical way of expressing cortical shapes at different scales derived from the previous report through the implementation of a coarse-graining procedure.

      • Positioning of results in comparison to previous works reinforcing the validity of the observation.

      • Illustration of scale-dependence of effects of brain aging in the human.

      Weaknesses:

      • The impact of the contribution should be clarified compared to previous studies (implementation of new coarse graining procedure, dimensionality of primate brain vs previous studies, and brain aging observations).

      We have now made these changes, particularly by adding two paragraphs to the start of Discussion: one summarising the main contributions beyond previous work, and one paraphrasing the former in plain English for accessibility.

      • The rather small sample sizes, counterbalanced by the strength of the effect demonstrated.

      We have now increased the sample size of the human ageing analysis substantially to over 100 subjects and observe the same trends, but with an even stronger effect. We therefore believe that this revision serves as an additional internal validation of our data and methods.

      • The use of either averaged or individual brains for the different sub-studies could be made clearer.

      We have now added this to our Suppl methods: with the exception of the Marmoset, all brain surface data were derived from healthy individual brains.

      • The model discussed hypothetically in the discussion is not very clear, and may not be state-of-the-art (axonal tension driving cortical folding? cf. https://doi.org/10.1115/1.4001683).

      We have now added this citation to our Discussion and given it context:

      “Indeed, our previously proposed model [Mota2015] for cortical gyrification is very simple, assuming only a self-avoiding cortex of finite thickness experiencing pressures (e.g. exerted by white matter pulling, or by CSF pressure). The offset K, or 'tension term', precisely relates to these pressures, leading us to speculate that subtle changes in K correlate with changes in white matter property [Wang2016, Wang2021]. In the same vein of speculation, the scale-dependence of K shown in this work might therefore be related to different types of white matter that span different length scales, such as superficial vs. deep white matter, or U-fibres vs. major tracts. However, there are also challenges to the axonal tension hypothesis [Xu2010]. Indeed, white matter tension differentials in the developed brain may not explain location of folds, but instead white matter tension may contribute to a whole-brain scale 'pressure' during development that drives the folding process overall.”

      Reviewer #3 (Recommendations For The Authors):

      Many thanks to the authors for this elegant article. I will only report here on the cosmetics of the article.

      We thank the reviewer for their kind words and attention to detail and have made all the suggested changes and revised the paper generally for readability, grammar and spelling.

      p2: last line of abstract: 'for a range of conditions in the future'.

      p3 l.37: I would not self-describe this method as elegant as this is a subjective property.

      p3 l.38: 'that will render' -> I wouldn't use the future here.

      p.4 l.59: double spacing before ref [9]?

      p.6 l.99: 'approximate a fractal' -> why is 'a' italicized?

      p.7 fig.2: I would expect the colours to be detailed in the legend. Are there two data points per species because both hemispheres are treated separately?

      p.9 l.134-135: 'similar to and in terms of the universal law 'as valid as' -> please add commas for reading comfort: 'similar to, and, in terms of the universal law, 'as valid as'.

      p.9 l. 141: For all the cortices we analysed.

      p.9 Fig 3: I find the colours a bit confusing in Figs B and C. I find Fig C a bit confusing: what are all the lines representative of, and more specifically, the two lower lines with a different trajectory?

      p.10 l.155: '1̃500' -> '~1500'.

      p.13 l. 209: either 'speculate that' or 'wonder if'.

      p.14 l.232: 'neuron numbers' -> 'number of neurons'.

      p.26 S2 second paragraph: 'gryi' -> 'gyri'.

      p.30 l.3: please refrain from starting a sentence with I.e..

      p.30 last line before S3.2: 'The algorithmic implementation in MATLAB can be found on Zenodo: TBA' - I guess this is linked to you disclosing the code upon acceptance, but please complete before final submission.

      p.34 middle/bottom of page: 'The scheme described in Sec. S3.1' -> double spacing before S3.1?

      p.35 l.1: 'We simply replace' -> 'we simply replace' (no capital).

      p.36 Fig S5.1: explicit the same colouring of the points and boxes in legend

      p.38 Fig. S6.1: briefly describe the use of colours in the legend.

      p.39 Fig. S7.1: detail colours in the legend.

      p.41 Fig. S7.3: detail colours in the legend.

    2. eLife assessment

      This study presents a valuable framework and findings that contribute to our understanding of the brain as a fractal object by observing the stability of its shape property within 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, and to physicists and mathematicians interested in modeling the shapes of complex objects.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Figure 1

      • The "matched primary tumors" from TCGA include n=424 from cutaneous melanoma; but it is unclear where this is coming from; the PanCan Atlas for melanoma shows n=81 primary and 367 metastatic tumors. There are also additional large cohorts of ICI-treated metastatic tumors with RNAseq data (e.g. a metastatic melanoma cohort with 100+ patients https://doi.org/10.1038/s41591-019-0654-5) that would increase the numbers here.

      We thank the reviewer for their observation. We have replaced references to “primary” cancers with “TCGA” cancers as appropriate. While the TCGA analyses included metastatic samples, the majority of the TCGA tumors in most cohorts correspond to primary cancers or local metastases, a point which we added to the text. We retained Fig. 1D as the representative examples are actual primary samples. We have decided to defer analysis of additional melanoma cohorts for future inquiry.

      Figure 2

      • What is the basis for the split between high and low Dux4 expressing tumors at 1 TPM? Is it arbitrary, or based on some structure in the distribution? (e.g. bimodal distribution)

      Our previous analyses of RNA-seq datasets derived from early embryogenesis samples (PMID: 3132774, 28459457) showed that physiologic levels of DUX4 range from approximately 2 to 10 TPM. We added a description in the methods section, under “Genome annotations, gene expression, and Gene Ontology (GO) enrichment analyses,” of our conservative choice for the threshold: DUX4-positivity defined as expression levels > 1 TPM.

      Figure 3

      • Overall claim is that Dux4 expression is associated with worse survival in metastatic urothelial carcinomas treated with PD-L1 inhibitor. However, the rationale for the choice of split (Dux4 expression < 0.5 and > 1 TPM) to show is unclear (is this the 25th percentile? 75th percentiles?), and the rationale/interpretation of the "partial adjustment" for TMB by removing the bottom quartile of TMB feels non-rigorous and prone to bias. It doesn't feel like Fig 3bc contributes very much; Figure 4 really is the more rigorous analysis.

      We thank the reviewer for these comments and suggestions. We adjusted the analyses in Fig. 3C and Fig. S3 to be consistent with Fig. 1 and Fig. 2, in terms of the choice of split. We also clarified in the text how our initial, crude TMB adjustment served as an important indication for us to pursue more rigorous statistical approaches.

      Figure 4

      • Dux4 expression is independently associated with worse survival considering other clinical and molecular characteristics

      • I would include TGFB in the features considered in the table (in the supplementary but not the main table or forest plots, not sure why not?)

      • The choice of Dux4 expression split ( < 0.25 and > 1 TPM) feels arbitrary and is different than the split in Figure 3; what is the rationale for this? Also, how many patients does this exclude? (TPM between 0.25 and 1). What does the continuous value or median split for Dux4 expression give you for the CoxPH model?

      • Re: building a predictive model, excluding patients (e.g. between <0.25 and > 1 Dux4 TPM) makes the model difficult to apply (e.g. cannot apply to patients with Dux4 levels in the missing interval); a better predictive model would include all patients in the cohort.

      We thank the reviewer for their other suggestions. We have clarified in the text that our choice to define DUX4-negative samples as those with DUX4 expression levels < 0.25 TPM was made to preemptively address potential misclassifications due to decreased sensitivity of bulk RNA-seq at very low expression levels (PMID: 18516045). We believe our classifications with the new scheme are more reliable. We have also now specified in the text that our categorization excludes 126 patients. We have decided not to pursue the addition of TGFB or the exploration of an alternative split or a continuous version of DUX4 expression in the Cox Proportional Hazards analyses, but we appreciate the suggestions, which we will keep in mind for future studies.
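      The categorization described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the function and variable names are hypothetical, and only the two thresholds (> 1 TPM for DUX4-positive, < 0.25 TPM for DUX4-negative) come from the text. Because both inequalities are strict, values at exactly 0.25 or 1 TPM fall into the excluded band here.

```python
from typing import Dict, List, Optional

# Thresholds quoted in the response (TPM). All names are illustrative.
NEGATIVE_BELOW_TPM = 0.25
POSITIVE_ABOVE_TPM = 1.0


def classify_dux4(tpm: float) -> Optional[str]:
    """Return 'positive', 'negative', or None for the excluded middle band."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    if tpm > POSITIVE_ABOVE_TPM:
        return "positive"
    if tpm < NEGATIVE_BELOW_TPM:
        return "negative"
    # 0.25 <= TPM <= 1: not confidently positive or negative, so excluded.
    return None


def split_cohort(tpm_by_patient: Dict[str, float]) -> Dict[str, List[str]]:
    """Group patient identifiers by DUX4 status, keeping track of exclusions."""
    groups: Dict[str, List[str]] = {"positive": [], "negative": [], "excluded": []}
    for patient, tpm in tpm_by_patient.items():
        label = classify_dux4(tpm)
        groups[label or "excluded"].append(patient)
    return groups
```

      Reporting the size of the "excluded" group alongside the other two makes the cost of the split explicit, which is the point the reviewer raises about applying the model to patients in the missing interval.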

      Figure 5

      • An RSF (randomized survival forest) model predicts survival in Dux4+ vs Dux4- patient, and the Shapley values for landmark time analyses show time-varying effects of different features.

      • In some sense, the authors have already demonstrated that Dux4+ is associated with survival differences in ICI treated patients; so a model that predicts survival applied to Dux4+ and Dux4- patients that shows a difference in survival is unsurprising (even in a training/test set setting given that there is a difference in survival across the entire cohort). The quantified marginal effect (from a predictive perspective) of different features is what is interesting here. In that light, I'd like to see more validation of the model up front, specifically how close the predicted survival is to the actual survival of patients (e.g. the survival curves in Fig 5a but with actual survival of the Dux4- and Dux4+ cohorts superimposed on the predicted probabilities).

      We thank the reviewer for this suggestion. We have added a plot showing the superimposed survival probability estimates over time for the RSF and KM models for patients assigned to either the test or training sets in Fig. 5.

      SFig 5

      • Unclear how the authors got estimates of the # of expected deaths associated with covariates (e.g. "...we measured an increase in the number of predicted deaths associated with DUX4-positivity by approximately 16, over DUX4-negative status (Fig S5F-G).") from Shapley values as shown in the indicated figure - is this 16 out of the entire cohort? At a given time point? Would recommend perhaps showing the inferred absolute change in mortality (e.g. 8% absolute increase in mortality)

      Mortality is the expected number of deaths for the cohort over the observation window, measured as the sum of the CHF over time. We have clarified this in the Methods section, under “Random Survival Forest, feature importance, and partial dependence.” We have also changed the quantification to show the absolute mortality differences comparing patients with DUX4-negative and -positive tumors; we thank the reviewer for this suggestion. We have also clarified in the text that adjusted mortality was estimated via partial dependence, which operates using the correct units, as opposed to Shapley values, where attribution is scaled. Finally, we changed the referenced figure when discussing changes in mortality associated with TMB and DUX4 status (Fig. S5H-I); we appreciate the reviewer pointing out this error.
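      The definition of mortality given in this response, the sum of the cumulative hazard function (CHF) over time, can be illustrated with a short sketch. The helper names and the toy hazard values below are hypothetical and are not taken from the study; the sketch only shows the arithmetic of summing the CHF over a grid of time points and comparing two groups.

```python
from typing import List, Sequence


def cumulative_hazard(hazard_increments: Sequence[float]) -> List[float]:
    """Running sum of per-time-point hazard increments, i.e. the CHF on a time grid."""
    chf: List[float] = []
    total = 0.0
    for increment in hazard_increments:
        total += increment
        chf.append(total)
    return chf


def mortality(chf_values: Sequence[float]) -> float:
    """Mortality as described in the response: the CHF summed over the time points."""
    return sum(chf_values)


def absolute_mortality_difference(chf_a: Sequence[float], chf_b: Sequence[float]) -> float:
    """Difference in mortality between two groups evaluated on the same time grid."""
    if len(chf_a) != len(chf_b):
        raise ValueError("Both CHF curves must use the same time grid")
    return mortality(chf_a) - mortality(chf_b)


# Toy example with made-up hazard increments for two hypothetical groups.
chf_group_a = cumulative_hazard([0.10, 0.20, 0.30])
chf_group_b = cumulative_hazard([0.05, 0.10, 0.15])
difference = absolute_mortality_difference(chf_group_a, chf_group_b)
```

      Because the result is a sum over the observation window, it depends on the time grid used, which is why the reviewer's question about whether the figure refers to the whole cohort or a given time point matters.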

      Figure S1B-C

      • The authors argue that Dux4 expression is not an artifact of FFPE tissue by analyzing a mixed tumor cohort sequenced with both poly-A and hybrid probe capture in matched flash-frozen and FFPE tumor samples, showing that it is 1) detectable in both FFPE and flash-frozen tissue and 2) higher levels are detected in polyA sequencing/frozen tissue. However, the reference for this section (D. Robinson et al 2015) is a study of a cohort of prostate cancers with polyA bulk RNAseq sequencing; is this correct/is the data coming from a different study?

      • Analysis of scRNAseq (if available) would strengthen their analyses by better delineating the expression and response of interferon-gamma and downstream (e.g. antigen presentation) pathways in specific cell compartments, and potential differences in cell-cell interactions (e.g. using CellPhoneDB) associated with Dux4+ vs Dux4- tumors.

      • Do the investigators find similar findings in primary and metastatic tumors sequenced the same way (e.g. tcga primary vs met melanoma, albeit most of the met melanoma are Stage III lymph nodes)?

      We thank the reviewer for finding the citation error. We have corrected the manuscript to reflect the correct study we analyzed (PMID: 28783718). We also thank the reviewer for their additional suggestions, which undoubtedly would strengthen the current study. However, we have respectfully decided to defer these additional analyses for future study.

      Reviewer #2:

      It is strange as a statistician to see BIC and AIC represented as barplots, e.g. Figure 4B. There is no knowledge to be gained through this visual representation that would not otherwise be conveyed by just giving the numbers.

      We thank the reviewer for this suggestion. We understand that simply stating the numbers would be equally informative. However, we respectfully decided to retain our current versions of Figures 4 and S4 so that the numbers can be illustrated in a visual manner in the figures, rather than just stated in the text.
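      For readers unfamiliar with the two criteria discussed here, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of fitted parameters, L the maximized likelihood, and n the sample size. The sketch below is illustrative only: the log-likelihoods and sample size are made-up numbers, not values from the manuscript, and for Cox models some software uses the number of events rather than the number of patients as n, which is an assumption the reader should check.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical comparison of two nested models (numbers are invented).
baseline = {"log_likelihood": -105.0, "n_params": 2}
extended = {"log_likelihood": -100.0, "n_params": 3}
n_obs = 100

aic_baseline = aic(baseline["log_likelihood"], baseline["n_params"])
aic_extended = aic(extended["log_likelihood"], extended["n_params"])
bic_baseline = bic(baseline["log_likelihood"], baseline["n_params"], n_obs)
bic_extended = bic(extended["log_likelihood"], extended["n_params"], n_obs)
```

      Lower values indicate the preferred model under either criterion, so the comparison reduces to a handful of numbers per model, which is the reviewer's point about the barplots.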

    2. eLife assessment

      This study presents a valuable finding on the association between DUX4 expression with features of immune evasion in human tissue and clinical outcomes in patients with advanced urothelial cancer. The evidence supporting the claims of the authors is convincing, using a range of corroborative statistical techniques. Compared to an earlier version, the quality of the manuscript has been enhanced, for example Figure 5 now illustrates the key features of survival probability estimates over time for patients assigned to either the test or training set.

    3. Reviewer #1 (Public Review):

      Pineda et al investigate the hypothesis that expression of Dux4, an embryonic transcription factor, in tumor cells is associated with immune evasion and resistance to immunotherapy. They analyze existing cohorts of bulk RNAseq sequenced tumors across cancer types to identify Dux4 expression and association with survival. They find that Dux4 expression is detected in a higher proportion of metastatic tumors compared to primary tumors, is associated with decreased immune infiltrate and a variety of immune metrics and previously nominated immune signatures, and do an in depth evaluation of a cohort of metastatic urothelial cell carcinoma, finding that Dux4 expression is associated with a more immunodeficient tumor microenvironment (desert or excluded microenvironment) and worse survival in this aPDL1 treated cohort. They then find that Dux4 expression is a major independent predictor of survival in this cohort using different types of survival analyses (KM, Cox PH, and random survival forests). With prior existing biological data supporting the hypothesis (in prior work, the senior author has demonstrated Dux4 expression causally suppresses MHC-I expression in interferon-gamma treated cell lines), the current work links Dux4 expression with less immune activity in clinical tumor samples and with survival in ICI treated urothelial carcinomas, and demonstrates that Dux4 expression provides independent information towards survival including other molecular and clinical characteristics (TMB, ECOG PS as the other strongest markers), and provides interesting resolution on landmark analyses with TMB and Dux4 expression providing greater informativeness at later survival landmarks (e.g. 1 year and later), while ECOG PS has strong informativeness already at earlier time points. This work provides impetus towards more mechanistic and functional dissection of the mechanism of Dux4-associated changes with the tumor microenvironment (e.g. in vivo mouse studies) as well as potential interventional studies (e.g. Dux4 as a target in combination therapies). What the work does not provide is additional resolution on the mechanism of how Dux4 may be associated with a more immunodeficient microenvironment.

      The conclusions are generally well supported, but there are issues that would benefit from clarification and extension:

      - The finding that Dux4 expression is detected in a higher proportion of metastatic tumors and at higher levels compared to TCGA samples (Fig 1BC) is striking. However, at least for one tumor type (melanoma), the TCGA cohort is comprised of mostly locoregional metastatic tumors (n=81 primary and 367 metastatic tumors in the PanCan Atlas). Since there are annotations for primary and (locoregional) metastatic samples in TCGA, an analysis of the primary vs. locoregional metastasis vs distant metastatic samples seems reasonable and likely informative. The analysis of tumors with matched FFPE and flash frozen samples with hybrid probe capture and polyA sequencing, respectively, is a nice validation to show that the difference in Dux4 expression is not due to differences in preservation of starting material/sequencing in the metastatic samples vs TCGA samples (S1BC).

      - The finding that Dux4 expression in the metastatic urothelial carcinoma setting is associated with a more immunodeficient microenvironment (Figure 2) is clear and unambiguous using multiple lines of data and analyses (bulk RNAseq, DUX4-positive vs DUX4-negative tumors, different immune cell and cytokine signatures; IHC showing an association with immune deserts and immune excluded phenotypes). However, this is an association and does not demonstrate causality.

      - The survival analyses (Fig 3,4,5) show fairly convincingly that Dux4 provides independent predictive information beyond clinical variables and TMB towards survival in the aPDL1 treated metastatic urothelial carcinoma cohort. However, the choice to split the cohort into Dux4 negative (defined as < 0.25 TPM) and Dux4 positive (> 1 TPM) while excluding a large number of patients (n=126 pts) that fall in between has significant impact on the rigor of conclusions. This would benefit from showing all the data (e.g. including the 3rd group of in-betweens in the survival analyses as a separate group).

      - The authors demonstrate that adding Dux4 to clinical markers and TMB results in an improved predictive model for survival, but there are a few questions regarding this model as a clinical biomarker

        o Is Dux4 expression better than other correlated immune signatures/markers (e.g. interferon gamma, T effector signature, overall immune infiltrate) in providing additional information?

      - The use of random survival forests to quantify the (predictive) marginal effect of Dux4+ vs Dux4- expression on survival in a non-parametric model as well as shed light on association with survival at different landmark times using Shapley values is quite interesting and well conducted.

    4. Reviewer #2 (Public Review):

      Summary:

      This article takes an expansive look at the potential role of DUX4 in cancer treatment and prognosis, including its correlation with other key biomarkers, the potential for cancer to be resistant to treatment, and risk prediction.

      Strengths:

      The primary strength of this work is the breadth of the analyses. The authors have linked DUX4 to not just one but multiple points in the trajectory of cancer, which increases the face validity of their conclusion that DUX4 is meaningfully related to the course of a cancer as well as the prognosis for a patient.

      Statistically, the authors have taken care to properly validate their findings using appropriate bootstrapping and testing strategies.

      Weaknesses:

      Several weaknesses are noted. First, there is little-to-no description of the underlying sample population. It is only stated that "several large cohorts of patients with different metastatic cancers" were analyzed, and that a cohort of patients with advanced urothelial cancer was used for estimating associations with clinical outcomes. Lacking is information on the sampling mechanism, inclusion/exclusion criteria, treatment modalities, the definition of 'time = 0', the number of events observed, or even the sample size. Knowledge about the underlying study design would help explain some counterintuitive results, e.g. that the hazard of death among patients with Stage IV cancer is half that of those with Stage I cancer (Table 1); presumably this is not because Stage IV is actually protective but rather an artifact of the sampling scheme for these data. Second, the definition of negative versus positive DUX4 expression varies throughout the paper. In Figure 2B, Figure 3A, and Figure 3C, it is defined as >1 TPM vs. <= 1 TPM; in Figure 4A and Figure 5A, it is defined as >1 TPM vs. < 0.25 TPM; in Figure S1C it is partitioned into four groups, with boundaries defined at 0.25 TPM, 1 TPM, and 5 TPM. If categorization is needed, a rationale should be provided (ideally prospectively and not based upon the observed data, so as to avoid the perception of forking paths analyses), and it should be consistently applied. Third and finally, data seem to be occasionally excluded without rationale. For example, as mentioned above, the Cox model presented in Figure 4A seems to exclude all patients with DUX4 TPM between 0.25 and 1. Figure 3C excludes patients with TMB in the lowest quartile (although the decision was ostensibly to control for TMB confounding, there are more appropriate ways to do so that don't result in loss of data, e.g. a stratified KM plot). Excluding patients based upon a particular region of the covariate space makes interpreting the resulting model awkward.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Line 144, after eq. (1). Vectors d_i need to be defined. Are these the mapping of vectors e_i due to the active deformation? It would be useful to state then that d_3 is aligned with r'.

      Thank you for your suggestion, and the definition has been added to lines 146-149 for a better understanding of the model.

      • Line 144. Authors state a_i(0,0,Z)=0. Shouldn't this be true also for any angle, i.e., a_i(0,Theta, Z)=0?

      Thank you, we have revised it in line 144.

      • Line 156. G_0 is defined as Diag(1,g_0(t), 1), which seems to be using cylindrical coordinates. Previously, in line 147, vector argument X of \chi is defined with Cartesian coordinates (X,Y,Z). Shouldn't these be also cylindrical?

      We are very sorry for this error. Our initial configuration is defined in cylindrical coordinates, and we have revised line 151 of the manuscript accordingly.

      • Line 162. "where alpha and beta lie in the range [-pi/2, pi/2]" has already been indicated.

      Thank you for your mention, we have deleted duplicate information in line 166.

      • Line 171. W is defined as the strain energy density, while in equation (2), symbol W is the total energy (which depends on the previous W). Letters for total elastic and strain energy must be distinguished.

      Thank you, we have changed the letter for total energy in Eq.(2).

      • Line 176. "we take advantage of the weakness of" -> "we take advantage of the small value of".

      We have revised it in line 179.

      • Line 177. Why is there a subscript i in p_i? If these do not correspond to penalty p, but to parameters in eqn (3), the latter should have been introduced before this line.

      We have revised this error in line 180.

      • Line 186. "as the overall elongation \zeta". This parameter, axial extension, has not been defined yet.

      Thank you for your mention, the definition of \zeta is now given in line 146.

      • Figure 4. Why are the values of g_0 from the elastic model and equations (30)-(32) so non-smooth? Clarify what is being fit and what is the input in the latter equations. Final external radius R_3? Final internal radius R_1'?

      (1) To mimic the embryo, we consider a multi-layered cylindrical body so that the shear modulus of each layer is different. The continuity of both deformations and stresses is imposed (see Eq. (26)-Eq. (30)). This is the usual treatment for complex morpho-elastic systems. Obviously, $g_0$ originates from the actomyosin cortex so it appears only in the corresponding layer. Finally, all physical quantities such as deformations and stresses must be continuous.

      (2) The final outer radius is R_3, which represents the outer radius of C. elegans embryos. In addition to R_3, what we need to consider in this model are R_1=0.7, R_1’=0.768, R_2=0.8 and R_2’=0.96; these definitions have been added in the caption of Appendix 2—figure 1.

      • Line 663, equation (19). Parameter mu is multiplying penalisation term with p, while in equation (2) mu is only affecting the elastic part.

      These two different ways of expressing the energy function will ultimately affect the value of p, but the two p are not the same quantities, so they will not affect our results. To avoid misunderstandings, we will replace p in equation (19) with q.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my public summary, I find the writing really not adequate. I provide here a list of specific points that the authors should in my opinion address. As a general comment, I would delete many instances of 'the'.

      First, there are figures and whole paragraphs that do not seem to bring anything to the understanding of the phenomenon of C. elegans elongation, notably, Figs. 2, 3C-H, 5m, and 6. Figures 6G and 7 are the only figures containing results it seems. Some elements of the figures are repeated, for example, the illustration of the system's cross-section in Figs 3 and 5.

      Thank you for your suggestion, we have made some adjustments to our images to remove some of the duplicate information.

      Second, and this is my most important criticism: the mechanism of elongation by releasing elastic stress introduced by muscle contraction is not explained in clear terms anywhere in the text. At least, I was unable to understand it. On p 10 you write "This energy exchange causes the torsion-bending energy to convert into elongation energy, (...)" How this is done is not explained. I assume that the reference state is somehow changed through muscle contraction. The new reference state probably has a longer axis than the one before, but this would then be a plastic deformation and not purely elastic as claimed by the authors (ll 76: "This work aims to answer this paradox within the framework of finite elasticity without invoking cell plasticity (...)"). Is torsion important for this process or is it 'just' another way to store elastic energy in the system?

      We perfectly explain most of the exchange of energy between bending, torsion and elongation: indeed, we quantify all aspects of this transformation as the elastic elongation energy, and the dissipation processes which will cost energy. The dissipation evaluated here concerns the rotation of the worm due to the muscle geometry and the viscous friction at the inner surface of the egg. Torsion seems to appear in the late stages and only in some cases. As we show, it comes from a torque induced by the muscles, which are not vertical. Finally, the quantitative predictions of our model recover most of the published experimental results.

      Third, there are a number of strange phrasings and the notation is not helpful in places.

      We apologize for that; the manuscript is now more precise.

      Fourth, the title promises to explain how cyclic muscle contractions reinforce acto-myosin motors. I can't see this done in this work.

      The fact that the acto-myosin is reorganized between two sequences of contraction justifies the title. The complete reorganization of the actomyosin network would require a chemico-mechanical model that is not achieved here, perhaps in future work as data become available.

      In addition:

      We have chosen to respond globally rather than point by point to the referee’s recommendations.

      Typographic errors and vocabulary

      All English corrections and typos are now included in the main text.

      Figures and captions:

      Figures and captions have been improved.

      • Figure 1: Make the caption and the illustration more coherent. For example, only two cell types are distinguished; in the caption, you mention lateral cells, in the sketch seam cells. What is the difference between acto-myosin and muscle contraction? Muscle contraction is also acto-myosin-based.

      (1) The caption for Fig.1 is revised.

      (2) From a mechanical point of view, actomyosin bundles in C. elegans are orthoradial, whereas muscles are essentially parallel to the main axis of the body, so the geometry is completely different and of extreme importance for deformation. Muscle contractions are quasi-periodic, and we do not know the dynamics of the attached molecular motor of myosin. So of course, both contain actin and myosin (not exactly the same proteins), but our model is sensitive to more macroscopic properties.

      • Figure 2: I do not find this figure helpful. I might expect such a figure in a grant proposal, but much less in an article.

      Figure 2 shows the strategy of our work, we hope that readers can see at a glance what kind of analysis has been done through this figure: since our work is divided into several parts, readers can also unravel the logic through this scheme after reading the whole manuscript. So, this diagram is a guide, and it may be helpful and necessary.

      • Figure 3: Figure 3 A, right: What is the dashed line? B You indicate fibers, but your model does not contain fibers, does it? How do I get from the cube to the deformed object? What is the relation of C-H with the rest of the work? Furthermore, you mention seam cells in Fig. 1, but they are absent here. Why can you neglect them? Why introduce them in the first place? E What is a plant vine? F-H What rods are you referring to? Plants do not have muscles, right?

      We have modified this figure, and the original Figure 3 now corresponds to Figures 3 and 4.

      (1) The dashed line is the centerline after deformation.

      (2) The referee is wrong: our model represents the fibers by a higher shear modulus for the actomyosin cortex and for the muscles (see Table Appendix 1) and G_1 reflects the activities of the muscle and actin fibers.

      (3) The cube in Figure 3 is a mathematical 3D volume element that is subjected to stresses. Hyperelasticity modelling is based on such a representation.

      (4) C-H(new version: Fig.4 A-F): These images show similar deformations: bending and torsion as our C. elegans study. These figures indicate that such deformations are quite common in nature, even if the underlying mechanism is different.

      (5) This is a point we have already mentioned: we ignore the difference between the different types of epidermal cells and average their role in the early and second stages of elongation.

      (6) The plant vine is the 'botanical vine', see Goriely's article and book.

      (7) F-H(new version: Fig.4 D-F) do not have fixed rods, we set a curvature and torsion to fit the actual biological behavior.

      (8) Plants do not have muscles, but they grow, and our formalism for growth, pre-strain and material plasticity is very similar to the hyper-elasticity formalism.

      • Figure 4: Fig .4 A: "The central or inner part (0 < 𝑅 < 𝑅2, shear modulus 𝜇𝑖) except the muscles which are stiffer." I do not understand.

      In the new version, this figure corresponds to Fig.5. The shear modulus of the intrinsic part is very small, but the muscles are harder so we have to consider them separately, we have revised this sentence to avoid misunderstanding.

      • Figure 5: Fig 5 A and D: The schematic of the cross-section has appeared already in the previous figure. No need to repeat it here. The same holds for the schematic of the cylindrical embryo. Caption: "But, the yellow region is not an actual tissue layer and it is simply to define the position of muscles." Why do you introduce the yellow region at all? I do not think that it clarifies anything. "Deformation diagram, when left side muscles M_1 and M_2." Something seems to be missing here. Similarly in the next sentence. "the actin fiber orientation changes from the 'loop' to the 'slope'" Do the rings break up and form a helix?

      In the new version, this figure corresponds to Fig.6.

      (1) We have made revisions to these figures.

      (2) The yellow part can show the accurate location of four muscles, which is important for our model and further calculations.

      (3) We have revised this sentence in the caption of Fig. 6.

      (4) Actin rings do not change to a helix pattern, they will be only sloping.

      • Figure 6: Fig 6 A-C These panels do not go beyond Fig 5B. Fig 6D: what are these images supposed to show? They are not really graphs, but microscopy images. The caption is not helpful to understand, what the reader is supposed to see here. Fig 6F: do you really want to plot a linear curve?

      In the new version, Fig.5 and Fig.6 respectively correspond to Fig.6 and Fig.7.

      (1) Fig.6 shows the simulated images, whereas Fig.7 A-C shows the actual calculation results; they are different.

      (2) Fig.7 D can show the real condition during C. elegans late elongation, here, we would like to show the torsion of the C. elegans.

      (3) Yes, it is our result.

      Discussions concerning the biological referee questions:

      Ll 75: “how the muscle contractions couple to the acto-myosin activity" Again I find this misleading because muscle contraction relies on acto-myosin activity. Probably, you can find a better expression to refer to the activity of the actomyosin network in the epidermis. Do you propose any mechanism for how muscle contraction increases epidermal contractility? This does not seem to be the mechanism that you propose for elongation, is it?

      The actomyosin activity will not stop because of the muscle contraction. Obviously, these two processes cannot be independent. The energy released by a muscle contraction event can and must contribute to the reorganization of the actomyosin network that occurs during the elongation process. Indeed, despite the fact that the embryo elongates, the density of actin cables appears to be maintained, which automatically requires a redistribution of actin monomers. We propose a scenario in which muscle contraction increases actomyosin contractility via energy conversion. We show that after unilateral contraction there is an energy release for this once all dissipation factors are eliminated. We invite the reviewer to re-examine Figure 2 and invite biologists to seriously evaluate the density of molecular motors attached to the circumferential actin cable throughout the stretch process.

      Ll 133: "we decide to simplify the geometrical aspect because of the mechanical complexity" This is hardly a justification. Why is it appropriate?

      We would like to offer the reader the simplest possible model, with limited technical complexity and a limited number of unknown parameters.

      L 135: "active strains" Why not active stress?

      The two are equivalent; the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      L 170: "hyperelastic" Please, explain this term.

      It is the elasticity of very soft samples subjected to large deformations. For classic references, see the books of Ogden, Holzapfel and Goriely, all of which are mentioned in our paper.
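
      As a concrete illustration of the term, one standard hyperelastic law is the incompressible neo-Hookean model. It is given here only as a textbook example: this response does not state which strain-energy density the manuscript uses, so the choice of $W$ below is an assumption.

```latex
% Incompressible neo-Hookean strain-energy density (textbook example, assumed here).
% F: deformation gradient, \mu: shear modulus, p: Lagrange multiplier (pressure).
W(\mathbf{F}) = \frac{\mu}{2}\left(\operatorname{tr}\!\left(\mathbf{F}^{T}\mathbf{F}\right) - 3\right),
\qquad \det \mathbf{F} = 1,
\qquad \boldsymbol{\sigma} = \mu\,\mathbf{F}\mathbf{F}^{T} - p\,\mathbf{I}.
```

      Unlike linear elasticity, such an energy remains meaningful at large stretches, and the resulting stress depends nonlinearly on the deformation.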

      Major criticism

      Eq. 3 and Ll 227: "𝑝1 is the ratio between the free available myosin population and the attached ones divided by the time of recruitment" Why is the time of recruitment the same for all motors? "inverse of the debonding time" Is it the same as the unbinding rate? Why use the symbol p_2 for it? What is p_3?

      The model proposed to justify the increase in the activity of the actomyosin motors during the first phase is a mean-field model, so all quantities are averaged. We are not considering the theory of a single molecular motor but a collection of motors in a dynamic environment, so we do not need stochasticity here. Equation (3) concerns the compressive pre-strain, which by definition is a quantity varying between $0$ and $1$, and $X_g=1-G$. ... The debonding time is not the same as the debonding rate. The term $p_3$ indicates saturation and is derived from the law of mass action. The good agreement with the experimental data is shown in Fig.5 (A) and (B). An equivalent model has been developed by (M. Serra et al.).

      Serra M, Serrano Nájera G, Chuai M, et al. A mechanochemical model recapitulates distinct vertebrate gastrulation modes. Science Advances, 2023, 9(49).
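
      To make the roles of a recruitment term, a debonding term and a saturation term concrete, here is a minimal sketch of a mean-field rate equation. Eq. (3) of the manuscript is not reproduced in this response, so the functional form dx/dt = (p1 - p2) x - p3 x^2, the function names and the parameter values are all assumptions chosen for illustration; this is not the authors' equation.

```python
# Illustrative mean-field kinetics with recruitment (p1), debonding (p2)
# and mass-action saturation (p3).
# ASSUMPTION: this functional form is hypothetical; it is NOT Eq. (3) of the manuscript.


def steady_state(p1, p2, p3):
    """Non-trivial fixed point of dx/dt = (p1 - p2) * x - p3 * x**2."""
    return (p1 - p2) / p3


def integrate(x0, p1, p2, p3, dt=1e-3, t_end=50.0):
    """Explicit Euler integration of the assumed rate equation from x0 up to t_end."""
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x += dt * ((p1 - p2) * x - p3 * x * x)
    return x


if __name__ == "__main__":
    # Hypothetical rates in arbitrary units, for illustration only.
    p1, p2, p3 = 1.0, 0.4, 1.2
    print(integrate(0.01, p1, p2, p3), steady_state(p1, p2, p3))
```

      In this toy version the averaged quantity grows while recruitment outpaces debonding and then levels off at (p1 - p2) / p3, which is the kind of saturation the $p_3$ term is said to describe.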

      Ll 275: "This energy exchange causes the torsion-bending energy to convert into elongation energy, leading to a length increase during the relaxation phase, as shown in Fig.1 of Appendix 5." You have posed the puzzle of how contraction leads to elongation, and now that you resolve the puzzle, you simply say that torsion and bending energy are converted into elongation. How? Usually, if I deform an elastic object, it will return to its original configuration after releasing the external forces. Why is this not the case here?

      Furthermore, the central result of your work is presented in an Appendix!?

      We agree with the referee that an elastic object will return to its initial configuration by releasing stress, i.e. by giving up its accumulated elastic energy to the environment. But the elastic energy has to go somewhere, such as heat. We do not dare to say that the temperature of the worm increases during the muscle contractions.

      In fact, the referee's comment also assumes that full relaxation of the stresses is possible, which requires that the object is not a multi-layered specimen and/or that it is not enclosed in a box. Most living species are under stress, usually called residual stress. Our skin is under stress. Our fingerprints result from an elastic instability of the epidermis occurring during foetal life, as do our brain convolutions and our villi. So, it is obvious that stresses are maintained in multilayered living systems. Closer to the case of C. elegans, the existence of stresses has been demonstrated by laser ablation (fracture) experiments in the first stage. The fact that the fractures open proves the existence of stress: without stress, the cut would not open and would remain a straight line.

      Ll 379: "Although a special focus is made on late elongation, its quantitative treatment cannot avoid the influence of the first stage of elongation due to the acto-myosin network, which is responsible for a prestrain of the embryo." This statement is made repeatedly through the manuscript, but I do not understand, why you could not use an initial state without pre-strain.

      This is the basic concept of hyperelasticity. The reference state must be free of stress, so we cannot evaluate the first muscle contraction without treating the first elongation stage.

      Grammar, vocabulary and writing errors

      ll 31: "the influence of mechanical stresses (...) becomes more complex to be identified and quantified" Is the influence of mechanical stress too complex or too difficult to be identified/quantified?

      We have revised it in line 31, “The superposition of mechanical stresses, cellular processes (e.g., division, migration), and tissue organization is often too complex to identify and quantify.”

      Ll 41: "The embryonic elongation of C. elegans represents an attractive model of matter reorganization without a mass increase before hatching." Maybe "Embryonic elongation of C. elegans before hatching represents an attractive model of matter reorganization in the absence of growth.".

      We have revised it in line 41.

      L 42: "It happens after the ventral enclosure (...)" Maybe "It happens after ventral enclosure (...)".

      We have revised it in line 42.

      Ll 52: "The transition is well defined since the muscle participation makes the embryo rather motile impeding any physical experiments such as laser ablation (...)" Ablation of what?

      We have revised it in line 53: “The transition is well defined, because the muscle involvement makes the embryo rather motile, and any physical experiments such as laser fracture ablation of the epidermis, which could be performed and achieved in the first period (\cite{vuong2017interplay}), become difficult.”

      Ll 59: "a hollow cylinder composed of four parts (seam and dorso-ventral cells)" It is not clear, what the four parts are - in the parenthesis, two are mentioned.

      We have revised it in line 59. Fig.1 shows the whole structure: the dorsal, ventral and seam cells form the four parts of the epidermis.

      L 78: "several important issues at this stage remain unsettled" At which stage?

      It refers to the late elongation stage; we have added this information in line 78.

      Ll 85: "but how it works at small scales remains a challenge." Maybe "but how it works at small scales remains to be understood.".

      We have revised it in line 86.

      Ll 99: "the osmolarity of the interstitial fluid" This comes out of the blue. Before you only talked about mechanics, why now osmolarity? Also, the interstitial fluid is only mentioned now. It is important for the dissipative effects that you discuss later, right? If yes, then you should probably introduce it earlier.

      For a better understanding, we have changed osmolarity into viscosity in line 99.

      l 120: "The cortex is composed of three distinct cells" Maybe "distinct cell types".

      Thank you; we have revised it in line 120.

      L 121: "cytoskeleton organization and actin network configurations" What is the difference between cytoskeleton organization and actin network configuration? Also, either both should be plural or both singular, I guess.

      (1) The cytoskeleton (which includes microtubules) forms the epidermis of C. elegans embryos, and the actin network surrounds the epidermis.

      (2) Thank you for your suggestion; we have revised it in line 121.

      L 130: "which will be introduced hereafter" Maybe "which will be used hereafter".

      We have revised it in line 130.

      Ll 148: "The geometric deformation gradient" You usually denote vectors in bold face, so \chi should be bold, right? Define d_i in Eq.(1).

      Yes, we have added this information in line 147.

      L 172: "auxiliary energy density" Please, explain this term.

      We have changed "auxiliary energy density" into "associated energy density" in line 175. Energy density is the amount of energy stored in a given system or region of space per unit volume; the associated energy density in our manuscript is introduced to facilitate the calculations.

      Ll 188: "Similar active matter can be found in biological systems, from animals to plants as illustrated in Fig.3(C)-(E), they have a structure that generates internal stress/strain when growing or activity. (...)" Why such a general statement during the presentation of the results? The second part of the sentence seems to be incomplete.

      We would like to show that our method is general and can be used in many situations. We have revised the incorrect sentence in line 192.

      Ll 243: "a bending deformation occurs on the left for active muscles localized on left" Maybe "bending to the left occurs if muscles on the left are activated".

      Thank you, we have revised it in line 247.

      L 250: "we assume them are perfectly synchronous" Maybe "we assume them to contract simultaneously".

      We have revised it in line 252.

      L 258: "the muscle and acto-myosin activities are assumed to work almost simultaneously." Before it was simultaneously, now only almost!? What does almost mean?

      Sorry, we meant to express the same meaning in these two sentences; we have deleted the word ‘almost’ in line 261.

      Ll 294: "one can hypothesize several scenarios" After that, only one scenario is described it seems.

      Thank you, we have revised this sentence in line 299.

      L 341: "and then is more viscous than water" Maybe "and that is more viscous than water".

      We have revised it in line 345.

      L 373: "before the egg hatch" Maybe "before the embryo (or larva) hatches"?

      We have revised the sentence in line 367.

      L 409: "elephant trunk elongated" maybe "elephant trunk elongation".

      We have revised it in line 412.

      Ll 417: "As one imagines, it is far from triviality (...)" Does this remark help in any way to understand better C. elegans elongation? Also maybe "it is far from trivial".

      We have revised it in line 423.

      Ll 428: "can map the initial stress-free state B_0 to a state B_1, which reflects early elongation process" Maybe: "maps the initial stress-free state B_0 to a state B_1, which describes early elongation".

      We have revised it in line 428.

      L 429: "After in the residually stressed (...)" Maybe "Subsequently, we impose an incremental strain field G_1 that maps the state B_1 to the state B_2, which represents late elongation".

      We have revised it in line 429.

      l 763: "Modelling details of without pre-strain case" Maybe "Case without pre-strain" or "Modelling in the absence of pre-strain" Similarly for l 784.

      We have revised them in line 763 and line 784.

      Some questions of definition and understanding

      Ll 71: "We can imagine that once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side." I do not understand the sentence. Muscle activation leads to contraction, there is nothing to imagine here. Maybe you hypothesize that the muscles are attached to the epidermis such that muscle contraction leads to epidermis deformation?

      Yes, four muscle bands are attached to the epidermis, as shown in Fig.1. The deformation does not concern only the epidermis but the whole embryo during the bending events. We have modified the sentence to avoid misunderstanding; it now reads “Once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side.” in line 71.

      Ll 110: "However, it is less widely known that its internal striated muscles share similarities with skeletal muscles found in vertebrates in terms of both function and structure" Is it important for what you report, whether this fact is widely known?

      Yes, it is our opinion.

      Ll 112: "the role of the four axial muscles (...) is nearly contra-intuitive" Is it or is it not? If yes, why?

      Yes, it is. Muscles contract and thus produce compressive deformations. They are located along the axis of symmetry (up to a small deviation), so they cannot mechanically produce the expected elongation, in contrast to the orthoradial actomyosin network.

      However, elongation of C. elegans is observed experimentally, so yes, we consider the result counterintuitive.

      L 116: "fully heterogeneous cylinder" What is this?

      It means that the C. elegans embryo does not have the same elastic properties in different parts (or layers).

      L 129: "will collaborate to facilitate further elongation" To facilitate or to drive? If the former, what drives elongation?

      The contractions of the muscles and of the actin bundles together drive elongation.

      Ll 141: "the deformation in each section can be quantified since the circular geometry is lost with the contractions" The deformation could also be quantified if the sections remained circular, right?

      Yes. However, circularity is lost during each bending event.

      Ll 151: "we need to evaluate the influence of the C. elegans actin network during the early elongation before studying the deformation at the late stage. So, the deformation gradient can be decomposed into: (...) where (...) is the muscle-actomyosin supplementary active strain in the late period" I thought you were now studying the early stage?

      In this part, we are outlining how we can study the whole elongation (early and late), not just the early elongation stage. To evaluate the deformation induced by the first contraction of the muscles, we need to know the state of stress of the worm prior to this event, so we also need to recover the early period using the same formalism for the same structure.

      L 160: "When considering a filamentary structure with different fiber directions" Which filamentary structure are you talking about?

      Fig.3 B shows this model and the filamentary structure, which contains the actin and muscle fibers.

      Ll 174: "When the cylinder involves several layers with different shear modulus 𝜇 and different active strains, the integral over 𝑆 covers each layer" I do not understand this sentence. Also, you should probably write 'moduli' instead of modulus.

      This implies that when integrating over the whole cross-section S, we need to take into account each layer independently with its own shear modulus and sum the results.
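
      The layer-by-layer integration described above can be sketched numerically. The example below assumes concentric annular layers and an axisymmetric energy density per unit shear modulus; the radii, shear moduli and function names are hypothetical, and the real cross-section loses axisymmetry during bending, so this is only an illustration of "integrate each layer with its own modulus, then sum".

```python
import math


def layer_energy(r_in, r_out, mu, density, n=2000):
    """Return mu * integral of density(r) over the annulus r_in <= r <= r_out.

    Uses the midpoint rule in r with area element 2*pi*r*dr.
    ASSUMPTION: the density is axisymmetric (depends on r only).
    """
    dr = (r_out - r_in) / n
    total = 0.0
    for k in range(n):
        r = r_in + (k + 0.5) * dr
        total += density(r) * 2.0 * math.pi * r * dr
    return mu * total


def cross_section_energy(layers, density):
    """Sum the contributions of all layers; each layer is a tuple (r_in, r_out, mu)."""
    return sum(layer_energy(r_in, r_out, mu, density) for r_in, r_out, mu in layers)


if __name__ == "__main__":
    # Hypothetical radii and shear moduli in arbitrary units.
    layers = [(0.0, 0.7, 0.1), (0.7, 0.9, 1.0), (0.9, 1.0, 3.0)]
    print(cross_section_energy(layers, lambda r: 1.0))
```

      With a uniform density, the result reduces to the sum of each layer's area weighted by its shear modulus, which is a convenient sanity check.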

      L 176: "weakness of 𝜀" Do you mean \epsilon << 1?

      Yes

      Ll 178: "Given that the Euler-Lagrange equations and the boundary conditions are satisfied at each order, we can obtain solutions for the elastic strains at zero order 𝐚(𝟎) and at first order 𝐚(𝟏)." Are you thinking about different orders in an \epsilon expansion or the early and the late stages of elongation?

      Different orders are considered only for the late elongation study; the early elongation is treated exactly and so does not need a correction in \epsilon.
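
      Written out, the expansion referred to in this exchange has the generic form below. Only the zero- and first-order terms are named in the quoted sentence; the remainder term and the exact definition of $\varepsilon$ are not specified here and are left generic.

```latex
% Perturbative expansion of the elastic strains in the small parameter \varepsilon
% (used for late elongation only; early elongation is treated exactly).
\mathbf{a} = \mathbf{a}^{(0)} + \varepsilon\,\mathbf{a}^{(1)} + O(\varepsilon^{2}),
\qquad \varepsilon \ll 1 .
```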

      L 197: "fracture ablation" Please, define.

      This is an experiment in which a laser is used to make a cut in a small-scale object of study; the internal stresses are then inferred from the morphology of the cut (see the Ref ‘Assessing the contribution of active and passive stresses in C. elegans elongation’). We have added this definition in line 200.

      Ll 203: What motivated your choice of notations for the radii R_2'? The inner part of the cylinder is fluid? But above you wrote about a solid cylinder. Why should the inner part be compressible?

      (1) We need to define the location of actin cables, which concentrate at the outer periphery.

      (2) Our model is a hollow cylinder, and the inner part of the cylinder contains internal organs, tissues, fluids, and so on, so we consider it to be a compressible extremely soft material (Line 213).

      Ll 212: "𝑟(𝑅) is the radius after early elongation." And during?

      R is a variable; r(R) depends on R but also on time t. It represents the radius of the C. elegans embryo after the onset of elongation, i.e., after acto-myosin and muscle activities begin.

      L 232: \tau_p is probably t_p?

      Yes.

      L 240: "quite simultaneously" Please, be precise.

      In practice, it is difficult to define simultaneity unless rigorous experimental data demonstrate it. All we can extract from the Ref ‘Remodelage des jonctions sous stress mécanique’ is that the two events occur almost simultaneously, which is what we mean by "quite simultaneously".

      Ll 246: "a short period" What does short mean? Why is it relevant?

      From the experimental observations and data, we know that each contraction occurs very rapidly (within a few seconds), so we describe one contraction as lasting a short period.

      L 263: "the bending of the model will be increased" Is it really the model that is bent?

      Yes, it is the bending deformation predicted by the model; we have revised the sentence in line 266.

      Ll 265: "we observed a consistent torsional deformation (Fig.6(E)) that agrees with the patterns seen in the video" In which sense do these configurations agree? I do not see any similarity between panels D and E.

      Both show a torsion deformation.

      L 267: "torsion as the default of symmetry of the muscle axis" I do not understand.

      We discuss two cases in this work: one where the muscle follows the axis of the C. elegans embryo in the initial configuration, and another where the muscle is deflected by a slight angle. We have added more information in the manuscript (line 270).

      Ll 274: "Each contraction of a pair increases the energy of the system under investigation, which is then rapidly released to the body." Do you mean the elastic energy stored in the epidermis and central part of the embryo?

      Yes, the whole body.

      Ll 284: "The activation of actin fibers 𝑔𝑎1 after muscle relaxation can be calculated and determined by our model." Have you done it?

      Yes, we can obtain the value of g_a1, and then calculate the elongation.

      Ll 286 I do not understand, why you write about mutants at this place. Am I supposed to have already understood the basic mechanism of elongation? Why do you now write about the first stage?

      We would like to show that our formalism can model both wild-type and mutant C. elegans, and that the agreement with the experimental data is good.

      L 302: "The result is significantly higher than our actual size 210𝜇𝑚." How was significance assessed? Your actual size is probably more than 210µm.

      Here, we have considered two situations. In the first, the accumulated energy is entirely converted into elongation, so the calculated length is much larger than the experimental value of 210 µm. In the second, we take energy dissipation into account, which leads to 210 µm.

      L 433: "where 𝜆 is the axial extension due to the pre-strained" Maybe "where 𝜆 is the axial extension due to the pre-stress".

      In our manuscript, we define the pre-strain, not the pre-stress.

      L 438: "active filamentary tensor" Please, define.

      The active filamentary tensor is the tensor representing the activity of a cylindrical model composed of fibers with different orientations.

    2. Reviewer #1 (Public Review):

      The authors have made a novel and important effort to distinguish and include different sources of active deformations for fitting C. elegans embryo development: cyclic muscle contractions and actomyosin circumferential stresses. The combination and synchronisation of both contributions are, according to the model, responsible for different elongation rates, and can induce bending and torsion deformations, which are a priori not expected from purely contractile forces. The model can be applied to other growth processes in initially cylindrical shapes.

      The tilt of the fibers is an important assumption of the model. However, the fiber direction in Figure 3B is not clear enough to explain the tilting. The fiber in 3B has little in common with the fibers in the color part of the figure. Also, is vector m supposed to be tangent to the fiber? In the figure, it does not seem to be so. One would expect alpha to be a consequence of the deformation, not an input parameter, as it seems to be in the tests of Figure 6A. How is the value of alpha chosen? According to Figure 6, torsion is expected for alpha>0, but for beta=pi/2 and alpha>0 no torsion may be obtained. In fact, it seems that torsion should appear when cos(beta)*sin(alpha)>0. As a consequence, the value of beta should be given in Figure 6. Can the amount of torsion be tested as a function of alpha and beta?
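
      The criterion conjectured in the paragraph above is easy to tabulate. The sketch below encodes only that conjecture, cos(beta)*sin(alpha) > 0; it is not a result taken from the manuscript, and the function name and the tolerance are assumptions. The tolerance is needed because cos(pi/2) is not exactly zero in floating point.

```python
import math


def torsion_expected(alpha, beta, tol=1e-12):
    """Conjectured criterion only: torsion is expected when cos(beta) * sin(alpha) > 0.

    alpha, beta: fiber angles in radians, as named in the review.
    tol: small threshold so that round-off (e.g. cos(pi/2) ~ 6e-17) is not counted as positive.
    """
    return math.cos(beta) * math.sin(alpha) > tol


if __name__ == "__main__":
    # Scan a few hypothetical angle pairs.
    for alpha in (0.0, 0.1, 0.3):
        for beta in (0.0, math.pi / 4, math.pi / 2):
            print(f"alpha={alpha:.2f} beta={beta:.2f} torsion={torsion_expected(alpha, beta)}")
```

      Such a scan makes explicit which (alpha, beta) pairs would need to be reported alongside Figure 6 if the conjecture holds.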

      The transfer of energy and deformation is a very interesting aspect of the paper, and also crucial for the model and for predicting elongation. However, the modelling of this transfer remains very obscure and is only explained in the Appendix. Some more details on how the transfer is selected should be given in the main text. Can the transfer of energy be interpreted as a change of the relaxed reference configuration? Once the ratio of transferred energy is fixed, the assumption on the elongation distribution should be stated (uniform?). The authors should also define the factor g_a1 in the main text, and explain how this value is computed from the condition W_c=W_r.

      Given the convoluted shape of the embryo in the egg, contact may be a crucial mechanism for determining growth and torsion. The model does not include this contact, and this limitation should be reflected in the article.

      Minor comment:

      - Line 300: "we determine the optimal values for the activation parameters". Optimal with respect to which objective? The norm of the difference between experimental and computational displacements? How this is quantified needs to be specified.

    3. Reviewer #2 (Public Review):

      Summary:

      During C. elegans development, embryos undergo elongation of their body axis in the absence of cell proliferation or growth. This process relies in an essential way on periodic contractions of two pairs of muscles that extend along the embryo's main axis. How contraction can lead to extension along the same direction is unknown.

      To address this question, the authors use a continuum description of a multicomponent elastic solid. The various components are the interior of the animal, the muscles, and the epidermis. The different components form separate compartments and are described as hyperelastic solids with different shear moduli. For simplicity, a cylindrical geometry is adopted. The authors consider first the early elongation phase, which is driven by contraction of the epidermis, and then late elongation, where contraction of the muscles injects elastic energy into the system, which is then transferred into elongation. The authors get elongation that can be successfully fitted to the elongation dynamics of wild type worms and two mutant strains.

      Strengths:

      The work proposes a physical mechanism underlying a puzzling biological phenomenon. The framework developed by the authors could be used to explain phenomena in other organisms and could be exploited in the design of soft robots.

      Weaknesses:

      (1) The manuscript is hard to read without being very familiar with continuum descriptions of elastic media. This might make the work difficult to access for biologists. This is a real pity because the findings are potentially of great interest to developmental biologists and engineers alike.

      (2) The discussion of the worm's mechanical properties could go deeper. The authors hardly justify their assumptions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this, they suppose an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with it needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.

      Strengths:

      The strengths of the work are the very careful biochemical analyses and the interesting result for wild-type LRRK2.

      Weaknesses:

      A major unexplained weakness is why the mutant T1343A starts out with so much lower activity--it should be the same as wild-type, non-phosphorylated protein. Also, if a monomer-dimer transition is involved, it should be either all or nothing. Other approaches would add confidence to the findings.

      We thank the reviewer for these suggestions. We are aware that T1343A generally has a lower activity than the wild type. We would therefore like to emphasize that this mutant is the only one that does not show an increase in Km after ATP treatment. Other mutants that also have lower kcat values, such as T1503A, still show this characteristic change in Km. Our favored explanation for the lower kcat of T1343A is that this mutation lies within a critical region of the Roc domain, the so-called P-loop, and is very likely not structurally neutral. Concerning the dimer-monomer transition, we are convinced that more than one factor is involved in this equilibrium, most likely including, but not limited to, other LRRK2 domains (e.g. the WD40 domain), binding of co-factors (e.g. Rab29/Rab32 or 14-3-3) and membrane binding. Consistently, even with stapled peptides targeting the Roc or Cor domains, we were not able to shift the equilibrium completely to the monomer (Helton et al., ACS Chem Biol. 2021, 16:2326-2338; Pathak et al. ACS Chem Neurosci. 2023, 14(11):1971-1980). We have addressed these points in a revised version of the manuscript.
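
      For readers less familiar with the kinetic parameters discussed here, the sketch below shows how a higher Km combined with a lower kcat reduces both the hydrolysis rate at a given GTP concentration and the catalytic efficiency kcat/Km, using the standard Michaelis-Menten rate law. All numerical values and function names are hypothetical and are not taken from the manuscript.

```python
def mm_rate(substrate, kcat, km, enzyme_total=1.0):
    """Michaelis-Menten rate v = kcat * [E]_total * [S] / (Km + [S])."""
    return kcat * enzyme_total * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat / Km."""
    return kcat / km


if __name__ == "__main__":
    # Hypothetical parameters (arbitrary units), for illustration only.
    before = {"kcat": 1.0, "km": 100.0}   # e.g. without ATP pre-incubation
    after = {"kcat": 0.5, "km": 400.0}    # e.g. after ATP pre-incubation
    gtp = 100.0
    for label, p in (("before", before), ("after", after)):
        print(label, mm_rate(gtp, **p), catalytic_efficiency(**p))
```

      A mutant with a lower kcat but an unchanged Km, as described for T1343A after ATP treatment, would change the first quantity without showing the Km shift.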

      Reviewer #2 (Public Review):

      This study addresses the catalytic activity of a Ras-like ROC GTPase domain of LRRK2 kinase, a Ser/Thr kinase linked to Parkinson's disease (PD). The enzyme is associated with gain-of-function variants that hyper-phosphorylate substrate Rab GTPases. However, the link between the regulatory ROC domain and activation of the kinase domain is not well understood. It is within this context that the authors detail the kinetics of the ROC GTPase domain of pathogenic variants of LRRK2, in comparison to the WT enzyme. Their data suggest that LRRK2 kinase activity negatively regulates the ROC GTPase activity and that PD variants of LRRK2 have differential effects on the Km and catalytic efficiency of GTP hydrolysis. Based on mutagenesis, kinetics, and biophysical experiments, the authors suggest a model in which autophosphorylation shifts the equilibrium toward monomeric LRRK2 (locked GTP state of ROC). The authors further conclude that T1343 is a crucial regulatory site, located in the P-loop of the ROC domain, which is necessary for the negative feedback mechanism. Unfortunately, the data do not support this hypothesis, and further experiments are required to confirm this model for the regulation of LRRK2 activity.

      Specific comments are below:

      • Although a couple of papers are cited, the rationale for focusing on the T1343 site is not evident to readers. It should be clarified that this locus, and perhaps other similar loci in the wider ROCO family, are likely important for direct interactions with the GTP molecule.

      To clarify this point: we have not focused only on this specific locus, but have systematically mutated all known auto-phosphorylation sites within the RocCOR domain (see supplemental information). Furthermore, it has been shown that this site, at least in the RCKW (Roc to WD40) construct, is quantitatively phosphorylated (Deniston et al., Nature 2020, 588:344-349). We are aware that the T1343 residue is located within the P-loop and that this can impact nucleotide binding capacities (see response to reviewer 1).

      We have clarified and addressed these points in a revised version of the manuscript.

      • Similar to the above, readers are kept in the dark about auto-phosphorylation and its effects on the monomer/dimer equilibrium. This is a critical aspect of this manuscript and a major conceptual finding that the authors are making from their data. However, the idea that auto-phosphorylation is (likely) to shift the monomer/dimer equilibrium toward monomer, thereby inactivating the enzyme, is not presented until page 6, AFTER describing much of their kinetics data. This is very confusing to readers, as it is difficult to understand the meaning of the data without a conceptual framework. If the model for the LRRK2 function is that dimerization is necessary for the phosphorylation of substrates, then this idea should be presented early in the introduction, and perhaps also in the abstract. If there are caveats, then they should be discussed before data are presented. A clear literature trail and the current accepted (or consensus) mechanism for LRRK2 activity is necessary to better understand the context for these data.

      We agree with the reviewer. We have revised the introduction accordingly and added a paragraph on page 3 starting from line 27.

      • Following on the above concepts, I find it interesting that the authors mention monomeric cytosolic states, and kinase-active oligomers (dimers??), with citations. Again here, it would be useful to be more precise. Are dimers (oligomers?) only formed at the membrane? That would suggest mechanisms involving lipid or membrane-attached protein interactions. Also, what do the authors mean by oligomers? Are there more than dimers found localized to the membrane?

      Multiple studies have shown that LRRK2 is mainly monomeric in the cytosol, while it forms mainly dimeric or higher oligomeric states at the membrane (James et al., Biophys. J. 2012, 102, L41–L43; Berger et al., Biochemistry, 2010, 49, 5511–5523). However, we agree with the reviewer that it remains to be determined whether the most active state at the membrane is the dimeric form or a higher oligomeric state, especially since a recent study shows that LRRK2 can form active tetramers only when bound to Rab29 (Zhu et al., bioRxiv, 2022, DOI: 10.1101/2022.04.26.489605). We have clarified these points in the introduction of the revised version of the manuscript (page 3, line 27ff).

      • Fig 5 is a key part of their findings, regarding the auto-phosphorylation induced monomer formation of LRRK2. From these two bar graphs, the authors state unequivocally that the 'monomer/dimer equilibrium is abolished', and therefore, that the underlying mechanism might be increased monomerization (through maintenance of a GTP-locked state). My view is that the authors should temper these conclusions with caveats. One is that there are still plenty of dimers in the auto-phosphorylated WT, and also in the T1343A mutant. Why is that the case? Can the authors explain why only perhaps a 10% shift is sufficient? Secondly, the T1343A mutant appears to have fewer overall dimers to begin with, so it appears to readers that 'abolition' is mainly due to different levels prior to ATP treatment at 30 deg. I feel these various issues need to be clarified in a revised manuscript, with additional supporting data. Finally, on a minor note, I presume that there are no statistically significant differences between the two sets of bar graphs on the right panel. It would be wise to place 'n.s.' above the graphs for readers, and in the figure legend, so readers are not confused.

      Starting with the monomer-dimer equilibrium, we are convinced that more than the phosphorylation of T1343 is involved (see response to reviewer 1). Therefore, a 10% shift in our assay most likely underestimates the effect seen in cells. Consistently, the T1343A mutants show a similar increase in the Rab10 phosphorylation assay as the G2019S mutant. This shows that the identified feedback mechanism plays an important role in a cellular context. We have addressed this point in the revised manuscript on page 6, line 8ff. As far as the significance indicators in the bar charts are concerned, we agree with the reviewer. In order not to overload the figure, we finally decided to include all pairwise comparisons (post-hoc tests) in the supplement.

      • Figure 6B, Westerns of phosphorylation, the lanes are not identified and it is unclear what these data mean.

      We apologize for this mistake and have added the correct labeling in the revised version of the manuscript.

    2. eLife assessment

      This valuable manuscript reports on the relationship between GTP hydrolysis parameters and kinase activity of LRRK2, which is associated with Parkinson's disease. The authors provide a detailed accounting of the catalytic efficiency of the ROC GTPase domain of pathogenic variants of LRRK2, in comparison with the wild-type enzyme. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions, but the experimental evidence is currently incomplete.

    3. Reviewer #1 (Public Review):

      Summary:

      This study presents careful biochemical experiments to understand the relationship between LRRK2 GTP hydrolysis parameters and LRRK2 kinase activity. The authors report that incubation of LRRK2 with ATP increases the KM for GTP and decreases the kcat. From this they suppose an autophosphorylation process is responsible for enzyme inhibition. LRRK2 T1343A showed no change, consistent with it needing to be phosphorylated to explain the changes in G-domain properties. The authors propose that phosphorylation of T1343 inhibits kinase activity and influences monomer-dimer transitions.

      Strengths:

      Strengths of the work are the very careful biochemical analyses and interesting result for wild type LRRK2.

      Weaknesses:

      The conclusions related to involvement of a monomer-dimer transition are, to this reviewer, premature, and an independent method needs to be utilized to bolster this aspect of the story.

    4. Reviewer #2 (Public Review):

      As discussed in the original review, this manuscript is an important contribution to a mechanistic understanding of LRRK2 kinase. Kinetic parameters for the GTPase activity of the ROC domain have been determined in the absence/presence of kinase activity. A feedback mechanism from the kinase domain to GTP/GDP hydrolysis by the ROC domain is convincingly demonstrated through these kinetic analyses. However, a regulatory mechanism directly linking the T1343 phospho-site and a monomer/dimer equilibrium is not fully supported. The T1343A mutant has reduced catalytic activity and can form similar levels of dimer as WT. The revised manuscript does point out that other regulatory mechanisms can also play a role in kinase activity and GTP/GDP hydrolysis (Discussion section). The environmental context in cells cannot be captured from the kinetic assays performed in this manuscript, and the introduction contains some citations regarding these regulatory factors. This is not a criticism, the detailed kinetics here are rigorous, but it is simply a limitation of the approach. Caveats concerning effects of membrane localization, Rab/14-3-3 proteins, WD40 domain oligomers, etc... should be given more prominence than a brief (and vague) allusion to 'allosteric targeting' near the end of the Discussion.

      Specific comments

      (1) The revised version is better organized with respect to the significance of monomer/dimer equilibrium and the relevance of the GTP-binding region of ROC domain that encompasses the T1343 phospho-site. The relevance of monomers/dimers of LRRK2 from previous studies is better articulated and readers are able to follow the reasoning for the various mutations.

      (2) As a suggestion, I would change the following on page 6 to clarify for readers: "...would show no change in kcat and KM values upon in vitro ATP treatment" to: "...would show no change in kcat and KM values for GTP hydrolysis upon in vitro ATP treatment"

      (3) The levels of dimer in WT (+ATP) and T1343A (+/- ATP) are the same, about 40-45%. These data are cited when the authors state that ATP-induced monomerization is 'abolished' (page 6). My suggestion is to re-phrase this conclusion for consistency with data (Fig 5). For example, one can state that 'ATP incubation does not affect the percentage of dimer for the T1343A variant of LRRK2'. This would be similar to the authors' description of these data on page 8 - 'no difference in dimer formation upon ATP treatment'.

    1. eLife assessment

      In this valuable study, the authors examine the role of the cytoskeletal regulatory protein Abba in governing the process of cell genesis in the developing cortex. This study provides insights into the mechanisms of microcephaly, a developmental malformation. The evidence supporting the study was felt to be solid, but the reviewers did note some technical weaknesses that limit the strength of some of the interpretations.

    1. eLife assessment

      This is an important study on changes in newborns' neural abilities to distinguish auditory signals at 37 weeks of gestation. The evidence of change in neural discrimination as a function of gestational age is convincing, but further analysis of the acoustic signals and description of the infants' language environment would strengthen the interpretation of the results. The work contributes to the field of neurodevelopment and suggests potential clinical applications in neurodevelopmental disorders.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes

      • Enumerated the connectivity of the "sequence space" formed by these heritable architectures

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations)

      • A connection is made with experimental results on C. elegans.

      The study is timely, as there is a renewed interest in the last decade in non-genetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance, the attention paid to how one architecture "mutates" into another. Unfortunately, the manuscript as a whole does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      Despite being foundational, this work was systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their innumerable variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper.

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation.

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.
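      The digraph reading mentioned above (sensors as vertices with positive indegree and outdegree) can be made concrete with a minimal sketch. The edge-list representation, the function name `sensors`, and the three-node example are assumptions made for illustration; they are not taken from the paper or its code.

```python
def sensors(edges):
    """Return the vertices that have both an incoming and an outgoing edge.

    `edges` is a list of (source, target) pairs describing a directed graph.
    This is only the graph-theoretic reading of 'sensor'; the ESP framework
    adds biological constraints that a bare digraph does not capture.
    """
    sources = {u for u, _ in edges}   # vertices with outdegree > 0
    targets = {v for _, v in edges}   # vertices with indegree > 0
    return sources & targets


# Hypothetical chain x -> y -> z: only y both receives and sends input.
example_edges = [("x", "y"), ("y", "z")]
example_sensors = sensors(example_edges)
```

      In this reading, x and z remain entities of the graph but do not qualify as sensors, because each lacks either an input or an output.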

      Model

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      The Supplementary Information section titled ‘Simulation of simple ESP systems’ provides the requested detailed information and revisions to the writing provide the biologically grounded justification for parsing interacting regulators as ESP systems.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, indeed, beyond the scope of any one paper. Nevertheless, 225,000 arbitrary Entity-Sensor-Property systems were systematically explored and collections of parameters that lead to different behaviors were provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement after perturbation.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim was modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      I thank the reviewer for providing further explanation of this misunderstanding in the second round of review, which helps draw future readers to the sections in the paper that discuss this important point (also see response to Recommendations for the authors).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the author for their efforts in replying to the comments. I have updated my review accordingly; in particular, I have:

      (1) Removed my complaint that Heritability is nowhere defined

      (2) Removed issues with the presentation of the ODE model in the supplementary information.

      I thank the reviewer for raising these issues and acknowledging the improvements made.

      However, given that the manuscript is broadly unchanged from the initial one, many of my prior comments remain justified. Some key points:

      (1) The manuscript continues to be difficult to read, for the same reasons as I mentioned when reviewing the paper previously.

      (2) The utility of the "ESP" formalism is still unclear.

      • As the author notes, continuous ODEs are of course an idealisation of a system with discrete copy number.

      • However, discussing this is standard fare in any textbook dealing with chemical dynamics and stochastic processes -- see, for instance, the standard textbook by van Kampen.

      • This seems little reason to reject ODEs and implement a poorly defined formalism/simulation scheme.

      (3) The author claims that many questions raised are "beyond the scope of this study". Indeed, answering all of these questions is beyond the scope of any one study. However, as I initially wrote, the paper would be much stronger if it focused on a particular problem rather than the many vignettes depicted.

      The broad scope of this foundational paper necessitates addressing many issues, which may make it a difficult read for some readers. I hope that future work where each paper focuses on one of the aspects raised here will enable the extensive treatment of limited scope as suggested by the reviewer.

      The utility of ODEs is much appreciated and was indeed a computationally efficient way of exploring the vast space of regulatory architectures. As stated in the response to the public reviews, the Entity-Sensors-Property framework provides a biologically grounded way of parsing interacting regulators. This approach is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles.

      On a final note, on the subject of the comparison with DNA:

      Perhaps I have misunderstood something. I simply meant that the "maximal information" with 4 HRAs (12.45 bits) is certainly more than the "maximal information" with 4 basepairs (8 bits), but definitely less than the "maximal information" for four 1-kb genes (4^(4000) combinations, so 8000 bits...)

      Perhaps the author means that the growth in information of HRAs is faster than exponential. If so, that should be shown and then remarked on.

      For this reason, I maintain my comment that the comparison is tenuous.

      This issue was addressed once in the results section and again in the discussion section.

      The results section states that “The combinatorial growth in the numbers of HRAs with the number of interactors can thus provide vastly more capacity for storing information in larger HRAs compared to that afforded by the proportional growth in longer genomes.”

      The discussion section states that “Despite imposing heritability, regulated non-isomorphic directed graphs soon become much more numerous than unregulated non-isomorphic directed graphs as the number of interactors increase (125 vs. 5604 for 4 interactors, Table 1). With just 10 interactors, there are >3x10^20 unregulated non-isomorphic directed graphs [60] and HRAs are expected to be more numerous. This tremendous variety highlights the vast amount of information that a complex regulatory architecture can represent and the large number of changes that are possible despite sparsity of the change matrix (Fig. 3).”

      Thus, indeed as the reviewer surmises, the combinatorial explosion in information of HRAs with increases in interacting entities is faster than the proportional growth in information of genome sequence with increases in length.
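      The three figures quoted in this exchange can be checked with a short calculation. The sketch assumes that the reviewer's 12.45 bits is log2 of the 5604 architectures quoted for 4 interactors, and that each base of a sequence contributes log2(4) = 2 bits; the variable names are illustrative only.

```python
import math

# Assumption: 5604 distinct heritable regulatory architectures with 4 interactors.
hra_count_4_interactors = 5604
hra_bits = math.log2(hra_count_4_interactors)

# Four basepairs: 4**4 = 256 possible sequences.
four_bp_bits = math.log2(4 ** 4)

# Four 1-kb genes: 4000 bases at log2(4) = 2 bits per base.
four_genes_bits = 4000 * math.log2(4)

print(f"4-interactor HRAs: {hra_bits:.2f} bits")
print(f"4 basepairs:       {four_bp_bits:.0f} bits")
print(f"four 1-kb genes:   {four_genes_bits:.0f} bits")
```

      The calculation covers a single system size only, so it does not settle how either quantity scales with the number of interactors or with genome length, which is the point under dispute.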

      In summary, I thank the reviewers and editors for their help in improving the paper and would like to make the current manuscript the Version of Record.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      • Enumerate all 'heritable' architectures for up to 4 constituents.

      • A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes.

      • Enumerated the connectivity of the "sequence space" formed by these heritable architectures.

      • Incorporating stochasticity, the authors explore stability to noise (transient perturbations).

      • A connection is made with experimental results on C. elegans.

      The study is timely, as there has been a renewed interest in the last decade in nongenetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance:

      • The attention paid to how one architecture "mutates" into another, establishing the analogue of a "sequence space" for network motifs (Fig 3).

      • The distinction is drawn between permanent ("genetic") and transient ("epigenetic") perturbations that can lead to heritable changes.

      • The interplay between development, generational timescales, and physiological time (as in Fig. 5).

      I thank the reviewer for highlighting these aspects of the work.

      The manuscript would be very interesting if it focused on explaining and expanding these results. Unfortunately, as a whole, it does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      This first paper is foundational and therefore cannot be expected to solve all aspects of the problem of heredity. The work was nevertheless systematic in that (1) for the simple architectures modeled using ordinary differential equations (ODEs) with continuity assumptions, parameters that support steady states were systematically determined for each architecture and then every architecture was explored using genetic changes exhaustively, although epigenetic perturbations were not examined exhaustively because of their wide variety; and (2) for the more realistic modeling of architectures as Entity-Sensor-Property systems, the behavior of systems with respect to architecture as well as parameter space that lead to particular behaviors (persistence, heritable epigenetic change, etc.) was systematically explored. A more extensive exploration of parameter space that also includes the many ways that the interaction between any two entities/nodes could be specified using an equation is a potentially ever-expanding challenge that is beyond the scope of any single paper (see response to additional comments below).

      Specific aspects that remain to be addressed include the application of multiple notions of heritability to real networks of arbitrary size, considering different types of equations for change of each entity/node, and classifying different behavioral regimes for different sets of parameters. As is evident from this list of combinatorial possibilities, the space to be explored is vast and beyond the scope of this foundational paper.

      The key contribution of the paper is an articulation of the crucial questions to ask of any regulatory architecture in living systems rather than the addressing of any question that a field has recognized as ‘open’. Specifically, through the exhaustive listing of small regulatory architectures that can be heritable and the systematic analysis of arbitrary Entity-Sensor-Property systems that more realistically capture regulatory architectures in living systems, this work points the way to constrain inferences after experiments on real living systems. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this paper will raise awareness in both communities and lead to more constrained models that minimize biases introduced by incomplete knowledge of the network, which is always the case when analyzing living systems.

      Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      The Entities-Sensors-Property (ESP) framework is based on underlying biology and not graph theory, making an ESP system not entirely equivalent to a network or graph, which is much less constrained. The terms ‘entity’, ‘sensor’, and ‘property’ were defined and justified in a previous paper (Jose, J R. Soc. Interface, 2020). While nodes of a network can be parsed arbitrarily and the relationship between them can also be arbitrary, entities and sensors are molecules or collections of molecules that are constrained such that the sensors respond to changes in particular properties of other entities and/or sensors. When considered as digraphs, sensors can be seen as vertices with positive indegree and outdegree. The ESP framework can be applied across any scale of organization in living systems and this specific way of parsing interactions also discretizes all changes in the values of any property of any entity. In short, ESP systems are networks, but not all networks are ESP systems. Therefore, the results of network theory that remain applicable for ESP systems need further investigation. This justification is now repeated in the paper.

      The key utility of the ESP framework is that it is aligned with the development of mechanistic models for the functions of living systems while being consistent with heredity. In contrast, widely analyzed networks like protein-interaction networks, signaling networks, gene regulatory networks, etc., are not always constrained using these principles. In addition, the language of digraphs, where sensors can be seen as vertices with positive indegree and outdegree, has also been added to aid readers who are familiar with graph theory.

      Heritability

      The primary goal of the paper is to analyse the properties of those networks that constitute "heritable regulatory architectures". The definition of heritability is not clearly stated anywhere in the paper, but it appears to be that the steady-state of the network must have a non-zero expression of every entity. As this is the heart of the paper, it would be good to have the definition of heritable laid out clearly in either the main text or the SI.

      I have now defined the term as used in this paper early, which is indeed as surmised by the reviewer simply the preservation of the architecture and non-zero levels of all entities. I have also highlighted additional notions of heredity that are possible, which will be the focus of future work. These can range from precise reproduction of the concentration and the localization of every entity to a subset of the entities being reproduced with some error while the rest keep varying from generation to generation (as illustrated in Fig. 2 of Jose, BioEssays, 2018). Importantly, it is currently unclear which of these possibilities reflects heredity in real living systems.

      Model

      As described in the supplementary, but not in the main text, the author first chooses to endow these networks with simple linear dynamics; something like $\partial_t \vec{x} = A \vec{x} - T \vec{x}$, where the vector $\vec{x}$ is the expression level of each entity, $A$ has the structure of the adjacency matrix of the directed graph, and $T$ is a diagonal matrix with positive entries that determines the degradation or dilution rate of each entity. From a readability standpoint, it would greatly aid the reader if the long list of equations in the SI were replaced with the simple rule that takes one from a network diagram to a set of ODEs.

      I have abridged the description by eliminating the steady state expression for every HRA as suggested and simply pointed to the earlier version of the paper for those readers who might prefer the explicit derivations of these simple expressions. An overview is now provided for going from any network diagram to a set of ODEs.
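      The rule for going from a network diagram to a set of ODEs can be sketched as follows, using the linear form suggested in the comment above. This is a minimal illustration under stated assumptions, not the code used in the paper: the function name `linear_rhs`, the convention that `A[i][j]` is the signed effect of entity j on entity i, and the two-entity example are all hypothetical.

```python
def linear_rhs(x, A, T):
    """Return dx/dt = A x - T x for a list of entity levels x.

    A[i][j] is the signed weight of the edge j -> i (positive for promotion,
    negative for inhibition, 0 for no edge); T[i] is the positive turnover
    (degradation or dilution) rate of entity i.
    """
    n = len(x)
    return [
        sum(A[i][j] * x[j] for j in range(n)) - T[i] * x[i]
        for i in range(n)
    ]


# Hypothetical two-entity loop in which each entity promotes the other.
A_loop = [[0.0, 1.0],
          [1.0, 0.0]]
T_loop = [1.0, 1.0]

# With equal levels and matched rates, production balances turnover.
rates = linear_rhs([2.0, 2.0], A_loop, T_loop)
```

      Under this convention, each arrow in the diagram contributes one off-diagonal entry of A, so the full set of equations is fixed once the diagram and the rate constants are given.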

      The implementation of negative regulation is manifestly unphysical if the "entities" represent the expression level of, say, gene products. For instance, in regulatory network E, the value of the variable z can go negative (for instance, if the system starts with z= and y=0, and x > 0).

      Negative values for any entity were avoided in simulations by explicitly setting all such values to zero. This constraint has been added as a note in the section describing the equations for the change of each node/entity in each regulatory network. Specifically, the level of each entity/sensor was set to zero during any time step when the computed value for that entity/sensor was less than zero. This bounding of the function allows for any approach to zero while avoiding negative values. I apologize for the omission of this constraint from the supplemental material in the last submission. This constraint was used in all the simulations and therefore this change does not affect any of the results presented. In this way, it is ensured that the presence of negative regulation does not lead to negative values.
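      The bounding described above can be illustrated with a minimal sketch. A forward-Euler update and the linear form dx/dt = A x - T x are assumed here purely for illustration; the integration scheme, the function name `euler_step_clamped`, and the example values are not taken from the paper's code.

```python
def euler_step_clamped(x, A, T, dt):
    """Advance entity levels by one time step, flooring each level at zero.

    A[i][j] is the signed effect of entity j on entity i, and T[i] is the
    turnover rate of entity i (both are assumed conventions).
    """
    n = len(x)
    new_x = []
    for i in range(n):
        dxdt = sum(A[i][j] * x[j] for j in range(n)) - T[i] * x[i]
        # Any computed value below zero is replaced by zero for this step.
        new_x.append(max(0.0, x[i] + dt * dxdt))
    return new_x


# Hypothetical pair (x, z) in which x inhibits z and z starts at zero.
A_inhibit = [[0.0, 0.0],
             [-1.0, 0.0]]
T_none = [0.0, 0.0]
next_levels = euler_step_clamped([1.0, 0.0], A_inhibit, T_none, dt=0.1)
```

      Without the `max(0.0, ...)` floor, z would take the value -0.1 after this step; with it, z stays at zero while x is unchanged.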

      Formally, the promotion or inhibition of an entity or sensor can be modeled using any function that is either increasing (for promotion) or decreasing (for inhibition). This diversity of possibilities is one of the challenges that prevents exhaustive exploration of all functions. In fact, the use of ODEs after assuming a continuous function is an idealization that facilitates understanding of general principles but is not in keeping with the discreteness of entities or step changes in their values (amount, localization, etc.) observed in living systems. Other commonly used continuous functions include Hill functions for the rate of production of y given as $x^n/(k + x^n)$ for x activating y, which increases to ~1 as x increases, or given as $k/(k + x^n)$ for x inhibiting y, which decreases to ~0 as x increases. Increasing values of ‘n’ result in steeper sigmoidal curves. In reality, levels of all entities/sensors are expected to be discretized by measurement in living systems and the form of the function for any regulation needs empirical measurement in vivo (see response to comment below).
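
      The two Hill forms mentioned above can be written out directly. This is a sketch for illustration only; the function names are hypothetical, x is assumed to be non-negative, and k is assumed to be positive.

```python
def hill_activation(x, k, n):
    """Rate of production of y when x activates y: x^n / (k + x^n).

    Rises from 0 toward 1 as x increases.
    """
    return x**n / (k + x**n)


def hill_inhibition(x, k, n):
    """Rate of production of y when x inhibits y: k / (k + x^n).

    Falls from 1 toward 0 as x increases.
    """
    return k / (k + x**n)
```

      For the same x, k and n the two forms sum to 1, and raising n makes the transition around $x^n = k$ steeper, consistent with the sigmoidal behavior described above.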

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure of what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      While ODEs are easier to simulate and understand, they are less realistic as explained above. I have now added more explanation justifying the need for the subsequent simulation of Entity-Sensor-Property systems. I have also expanded the information provided for each aspect of the model (previously outlined in Fig. 4A and detailed within the code) in a Supplementary Information section titled ‘Simulation of simple ESP systems’.

      Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      The systematic study of all arbitrary regulatory architectures is beyond the scope of this paper and, as stated earlier, beyond the scope of any one paper. Nevertheless 225,000 arbitrary Entity-Sensor-Property systems were systematically explored and collections of parameters that lead to particular behaviors provided (e.g., 78,285 are heritable). These ESP systems more closely mimic regulation in living systems than the coupled ODE-based specification of change in a regulatory architecture.

      The example questions raised here are not only difficult to answer, but subjective and present a moving target for future studies. One, ‘how easy is it to mistake one architecture for another?’. Mistaking one architecture for another clearly depends on the number of different types of experiments one can perform on an architecture and the resolution with which changes in entities can be measured to find distinguishing features. Two, ‘What fraction share a response to a particular perturbation?’. ‘Sharing a response’ also depends on the resolution of the measurement of entities after perturbation.

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      The interpretations presented are conservative determinations of heritability based on the topology of the architecture. In other words, an architecture is considered heritable if it can be heritable for some set of parameters. Of course, parameter sets can be found that make any regulatory architecture not heritable. As stated earlier, exploring all parameters for even one architecture is beyond the scope of a single study because of the infinitely many ways that the interaction between any two entities can be specified.

      DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

      Considering every base being the unit of stored information in the DNA sequence results in the maximal possible storage capacity of a genome of given length. Any other equivalence between entity and units within the genome (e.g., 1 kb gene) will only reduce the information stored in the genome.

      Nevertheless, the claim has been modified to say that the information content of an ESP system can [italics added] be more extensive than the information content of the genome. This accounts for the possibility of an organism that has an inordinately large genome such that maximal information that can be stored in a particular genome sequence exceeds that stored in a particular configuration of all the contents in a cell.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses an interesting abstraction of epigenetic inheritance systems as partially stable states in biological networks. This follows on previous review/commentary articles by the author. Most of the molecular epigenetic inheritance literature in multicellular organisms implies some kind of templating or copying mechanisms (DNA or histone methylation, small RNA amplification) and does not focus on stability from a systems biology perspective. By contrast, theoretical and experimental work on the stability of biological networks has focused on unicellular systems (bacteria), and neglects development. The larger part of the present manuscript (Figures 1-4) deals with such networks that could exist in bacteria. The author classifies and simulates networks of interacting entities, and (unsurprisingly) concludes that positive feedback is important for stability. This part is an interesting exercise but would need to be assessed by another reviewer for comprehensiveness and for originality in the systems biology literature. There is much literature on "epigenetic" memory in networks, with several stable states and I do not see here anything strikingly new.

      The key utility of the initial part of the paper is the exhaustive enumeration of all small heritable regulatory architectures. The implications for the abundance of ‘network motifs’ and more generally any part of a network proposed to perform a particular function is that all such parts need to be compatible with heredity. This principle is generally not followed in the literature, resulting in incomplete networks being interpreted as having motifs or modules with autonomous function. Therefore, while the need for positive feedback for stability is indeed obvious, it is not consistently applied by all. For example, the famous synthetic circuit ‘the repressilator’ (Elowitz and Leibler, “A synthetic oscillatory network of transcriptional regulators”, Nature, 2000), which is presented as an example of ‘rational network design’, has three transcription factors that all sequentially inhibit the production of another transcription factor in turn, forming a feedback loop of inhibitory interactions. Therefore, the contributions of the factors that promote the expression of each entity are unknown and yet essential for heritability. The comprehensive listing of the heritable regulatory architectures that are simple provides the basis for true synthetic biology where the contributing factors for observed behavior of the network are explicitly considered only after constraining for heredity. Using this principle, the minimal autonomous architecture that can implement the repressilator is the HRA ‘Z’ (Fig. 1).

      An interesting part is then to discuss such networks in the framework of a multicellular organism rather than dividing unicellular organisms, and Figure 5 includes development in the picture. Finally, Figure 6 makes a model of the feedback loops in small RNA inheritance in C. elegans to explain differences in the length of inheritance of silencing in different contexts and for different genes and their sensitivity to perturbations. The proposed model for the memory length is distinct from a previously published model by Karin et al. (ref 49).

      I thank the reviewer for appreciating this aspect of the paper.

      Strengths:

      A key strength of the manuscript is to reflect on conditions for epigenetic inheritance and its variable duration from the perspective of network stability.

      I thank the reviewer for appreciating the importance of the overall topic.

      Weaknesses:

      • I found confusing the distinction between the architecture of the network and the state in which it is. Many network components (proteins and RNAs) are coded in the genome, so a node may not disappear forever.

      I have added language to clarify the many states of a network versus its architecture (also illustrated in Fig. 4 for ESP systems). Even loss of expression below a threshold can lead to permanent loss if there is not sufficient noise to induce re-expression. For example, consider the simple case of a transcription factor that binds to its own promoter, requiring 10 molecules for the activation of the promoter and thus production of more of the same transcription factor. If an epigenetic change (e.g., RNA interference) reduces the levels to fewer than 10 molecules and if the noise in the system never results in the numbers of the transcription factor increasing beyond 10, the transcription factor has been effectively lost permanently. In this way, reduction of a regulator can lead to permanent change despite the presence of the DNA. Many papers in the field of RNA silencing in C. elegans have provided strong experimental evidence to support this assertion.
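
      The threshold example above can be made concrete with a toy simulation. This is a sketch under stated assumptions, not a model from the paper: the per-step loss probability, the number of molecules produced per step, the number of steps, and the function name are all hypothetical, and the only noise included is random loss of molecules (no noise-induced re-expression is modeled).

```python
import random


def simulate_tf(initial, threshold=10, production=8, loss_prob=0.2,
                steps=200, seed=0):
    """Toy self-activating transcription factor with an activation threshold.

    At each step every molecule is independently lost with probability
    loss_prob, and `production` new molecules are made only if the count at
    the start of the step is at or above `threshold`.
    """
    rng = random.Random(seed)
    count = initial
    for _ in range(steps):
        # Promoter is active only when enough molecules are present.
        made = production if count >= threshold else 0
        # Random loss (degradation or dilution) of existing molecules.
        survivors = sum(1 for _ in range(count) if rng.random() >= loss_prob)
        count = survivors + made
    return count
```

      Starting above the threshold (for example `simulate_tf(30)`), production keeps replenishing the pool; starting just below it (for example `simulate_tf(9)`), the promoter is never activated and the count can only decline, even though the gene encoding the factor is still present.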

      • From the Supplementary methods, the relationship between two nodes seems to be all in the form of dx/dt = Kxy . Y, which is just one way to model biological reactions. The generality of the results on network architectures that are heritable and robust/sensitive to change is unclear. Other interactions can have sigmoidal effects, for example. Is there no systems biology study that has addressed (meta)stability of networks before in a more general manner?

      Indeed, the relationship between any two entities can in principle be modeled using any function. Extensive exploration of the behavior of any regulatory architecture – even the simplest ones – requires simplifications. For example, early work by Stuart Kauffman explored Boolean networks (see ref. 10 in the paper for history and extensive explanations). However, allowing all possible ways of specifying the interactions between components of a network makes analysis both a computational and conceptual challenge.

      • Why is auto-regulation neglected? As this is a clear cause of metastable states that can be inherited, I was surprised not to find this among the networks.

      Auto-regulation in the sense of some molecule/entity ultimately leading to the production of more of itself is present in every heritable regulatory architecture. Specifically, all auto-regulatory loops rely on a sequence of interactions between two or more kinds of molecules. For example, a transcription factor (TF) binding to the promoter of its own gene sequence, resulting in the production of more TF protein is a positive feedback loop that relies on many interacting factors (transcription, translation, nuclear import, etc.) and can be considered as ‘auto-regulation’ as it is sometimes referred to in the literature. In this sense, every HRA (A through Z) includes ‘auto-regulation’ or more appropriately positive feedback loops. For example, in the HRA ‘A’, x ‘auto-regulates’ itself via y.

      • I did not understand the point of using the term "entity-sensor-property". Are they the same networks as above, now simulated in a computer environment step by step (thus allowing delays)?

      Please see response to the other reviewer regarding the need for the Entity-Sensor-Property framework and how it is distinct from generic networks. Briefly, the ODE-based simple networks, while easy to analyze, are not realistic because of the assumptions of continuity. In contrast, ESP systems are more realistic with measurement discretizing changes in property values as is expected in real living systems.

      • The final part applies the network modeling framework from above to small RNA inheritance in C. elegans. Given the positive feedback, what requires explanation is how fast the system STOPs small RNA inheritance. A previous model (Karin et al., ref. 49) builds on the fact that factors involved in inheritance are in finite quantity hence the different small RNAs "compete" for amplification and those targeting a given gene may eventually become extinct.

      The present model relies on a simple positive feedback that in principle can be modulated, and this modulation remains outside the model. A possibility is to add negative regulation by factors such as HERI-1, that are known to limit the duration of the silencing.

      The duration of silencing differs between genes. To explain this, the author introduces again outside the model the possibility of piRNAs acting on the mRNA, which may provide a difference in the stability of the system for different transcripts. At the end, I do not understand the point of modeling the positive feedback.

      The previous model (Karin et al., Cell Systems, 2023) can describe populations of genes that are undergoing RNA silencing but cannot explain the dynamics of silencing particular genes. Furthermore, this model also cannot explain cases of effectively permanent silencing of genes that have been reported (e.g., Devanapally et al., Nature Communications, 2021 and Shukla et al., Current Biology, 2021). Finally, the observations of susceptibility to, recovery from, and even resistance to trans silencing (e.g., Fig. 5a in Devanapally et al., Nature Communications, 2021) require an explanation that includes modulation of the HRDE-1-dependent positive feedback loop that maintains silencing across generations.

      The specific qualitative predictions regarding the relationship between piRNA-mediated regulation genome-wide and HRDE-1-dependent silencing of a particular gene across generations could guide the discovery of potential regulators of heritable RNA silencing. The equations (4) and (5) in the paper for the extent of modulation needed for heritable epigenetic change provide specific quantitative predictions that can be tested experimentally in the future. I have also revised the title of the section to read ‘Tuning of positive feedback loops acting across generations can explain the dynamics of heritable RNA silencing in C. elegans’ to emphasize the above points.

      • From the initial analysis of abstract networks that do not rely on templating, I expected a discussion of possible examples from non-templated systems and was a little surprised by the end of the manuscript on small RNAs.

      The heritability of any entity relies on regulatory interactions regardless of whether a templated mechanism is also used or not. For example, DNA replication relies on the interactions between numerous regulators, with only the sequence being determined by the template DNA. The field of small RNA-mediated silencing facilitates analysis of epigenetic changes at single-gene resolution (Chey and Jose, Trends in Genetics, 2022). It is therefore likely to continue to provide insights into heritable epigenetic changes and how they can be modulated. Unfortunately, there are currently no known cases of epigenetic inheritance where the role of any templated mechanism has been conclusively excluded. Future research will improve our understanding of epigenetic states and their modulation in terms of changes in positive feedback loops as proposed in this study and potentially lead to the discovery of such mechanisms that act entirely independent of any template-dependent entity.

      Recommendations for the authors:

      I thank the reviewers for their specific suggestions to improve the paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper has many long paragraphs that attempt to explain results, make illustrations, and give intuition. Unfortunately, these are difficult to read. It would aid the reader greatly if these were, say, converted into cartoons (even if only in the SI), or made more accessible in some other way.

      I agree with the importance of making the material accessible to readers in multiple ways. I have now added a figure with schematics in the SI titled ‘Illustrations of key concepts’ (new Fig. S2), which collects concepts that are relevant throughout the paper and might aid some readers.

      The bulk of the supplementary is currently a collection of elementary mathematics results: to wit, pages 26 to 33 of the combined manuscript carry no more information than a quick description of the general model and the diagrams in Fig 1. Similarly, pages 34 to 39 (non-zero dilution rate), and pages 39 through 58 (response to permanent changes) each express a trivial mathematical point that is more than sufficiently made with one illustrative example.

      I agree with the reviewer and have condensed these pages as suggested. I have added a pointer to the earlier version as containing further details for the readers who might prefer the explicit listing of these equations.

      Overall, the paper appears to be a collection of numerical results obtained from different models, united by uncertain terminology that is not fully defined in this paper. The most promising aspects of the paper lie either in (a) combinatorially complete enumeration of all regulatory architectures, or (b) relating experimental manipulations in C. elegans to possible underlying regulatory architectures. Focusing on one or the other might improve the readability of the paper.

      The two sections of the paper are complementary and when presented together help with the integration of concepts rather than the siloed pursuit of theory versus experimental analysis. When this work was presented at meetings before submission, it was clear that different researchers appreciated different aspects. This divergence is also apparent in the two reviews, with each reviewer appreciating different aspects. I have repeated the definitions and justifications from the earlier paper (Jose, J R Soc Interface, 2020) to provide a more fluid transition between the two complementary sections of the paper. Knowing both sides could aid in the development of models that are not only consistent with measurable quantities (e.g., anything that can be considered an entity) but are also logically constrained (e.g., entities matched with sensors while avoiding any entities that do not have a source of production – i.e., avoiding nodes with indegree = 0).

      However, having said that, many results of these types are well-known in models of regulatory networks, and it is unclear what precisely warrants the new framework that the author is proposing. Indeed, it would be good to understand in what way the framework here is novel, and how it is distinguished from prior studies of regulatory networks.

      The key novelty of the work is the consideration of heritability for any regulation. With the explicit definition of the heritability for a regulatory architecture and the acknowledgement that there can be more than one notion of heredity, this paper now sets the foundation for examining many real networks in this light. I hope that the added justifications for the current framework in the revised paper strengthen these arguments. Future literature reviews on networks in general and how they address heritability or persistence will better define the prevalence of these considerations. Currently, most experimental biologists engaged in reductionist approaches and some systems biologists examining the function or prevalence of network motifs do not explicitly constrain their models for heritability or persistence. It is hoped that this work will raise awareness in both communities and lead to more constrained models that acknowledge incomplete knowledge of the network, which is always the case when analyzing living systems.

      Reviewer #2 (Recommendations For The Authors):

      Minor points/clarity

      • page 1 line 57: "transgenerational waveforms that preserve form and function" is unclear.

      This phrase was expanded upon in a previous paper (Jose, BioEssays, 2020). I have now added more explanation in this paper for completeness. The section now reads ‘For example, the localization and activity of many kinds of molecules are recreated in successive generations during comparable stages [1-3]. These recurring patterns can change throughout development such that following the levels and/or localizations of each kind of molecule over time traces waveforms that return in phase with the similarity of form and function across generations [2].’

      • page 7 line 3-6: the sentence has an ambiguous structure.

      I have now edited this long sentence to read as follows: ‘For systematic analysis, architectures that could persist for ~50 generations without even a transient loss of any entity/sensor were considered HRAs. Each HRA was perturbed (loss-of-function or gain-of-function) after five different time intervals since the start of the simulation (i.e., phases). The response of each HRA to such perturbations were compared with that of the unperturbed HRA.’

      • page 9 lines 25-27: the sentence is convoluted: are you defining epigenetic inheritance?

      I have simplified this sentence describing prior work by others (Karin et al., Cell Systems, 2023) and moved a clause to the subsequent sentence. This section now reads: ‘Recent considerations of competition for regulatory resources in populations of genes that are being silenced suggest explanations for some observations on RNA silencing in C. elegans [49]. Specifically, based on Little’s law of queueing, with a pool of M genes silenced for an average duration of T, new silenced genes arise at a rate λ that is given by M = λT’. I have also provided more context by preceding this section with: ‘Although the release of shared regulators upon loss of piRNA-mediated regulation in animals lacking PRG-1 could be adequate to explain enhanced HRDE-1-dependent transgenerational silencing initiated by dsRNA in prg-1(-) animals, such a competition model alone cannot explain the observed alternatives of susceptibility, recovery and resistance (Fig. 6A).’
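
      For readers unfamiliar with Little's law, the relationship invoked above can be illustrated with a short calculation. In queueing theory the law states that the average number of items in a system equals the arrival rate multiplied by the average time each item spends in the system; here the pool size M and the average silencing duration T determine the arrival rate (written λ). The function name and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def arrival_rate(pool_size, mean_duration):
    """Little's law rearranged: rate = M / T, from M = rate * T.

    pool_size     -- M, the number of genes silenced at steady state
    mean_duration -- T, the average duration of silencing (must be > 0)
    """
    if mean_duration <= 0:
        raise ValueError("mean_duration must be positive")
    return pool_size / mean_duration
```

      For instance, with an assumed pool of 100 silenced genes and an assumed average duration of 5 generations, `arrival_rate(100, 5)` gives 20 newly silenced genes per generation.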

      • page 13 lines 51-53. This last sentence of the discussion is ambiguous/unclear.

      I have now rephrased this sentence to read: ‘This pathway for increasing complexity through interactions since before the origin of life suggests that when making synthetic life, any form of high-density information storage that interacts with heritable regulatory architectures can act as the ‘genome’ analogous to DNA.’

      • Figure 2: the letters in the nodes are hard to read; the difference between full and dotted lines in the graphs also.

      I have enlarged the nodes and widened the gap in the dotted lines to make them clearer. I have also similarly edited Fig. 1 and Fig. S3 to Fig. S9.

    2. eLife assessment

      This useful manuscript explores conditions for epigenetic inheritance by studying the stability of simple network models to permanent and transient perturbations. A novel aspect of the study is that it unifies non-genetic inheritance phenomena across cell divisions of unicellular organisms and in the germline of multicellular organisms. However, the models studied are more a collection of vignettes of numerical studies than a systematic study, therefore the evidence presented remains incomplete. As a first step towards building a more systematic theoretical framework, this work will be of interest to colleagues in the field of epigenetic inheritance.

    3. Reviewer #1 (Public Review):

      The author studies a family of models for heritable epigenetic information, with a focus on enumerating and classifying different possible architectures. The key aspects of the paper are:

      - Enumerate all 'heritable' architectures for up to 4 constituents.
      - A study of whether permanent ("genetic") or transient ("epigenetic") perturbations lead to heritable changes
      - Enumerated the connectivity of the "sequence space" formed by these heritable architectures
      - Incorporating stochasticity, the authors explore stability to noise (transient perturbations)
      - A connection is made with experimental results on C. elegans.

      The study is timely, as there is a renewed interest in the last decade in non-genetic, heritable heterogeneity (e.g., from single-cell transcriptomics). Consequently, there is a need for a theoretical understanding of the constraints on such systems. There are some excellent aspects of this study: for instance, the attention paid to how one architecture "mutates" into another. Unfortunately, the manuscript as a whole does not succeed in formalising nor addressing any particular open questions in the field. Aside from issues in presentation and modelling choices (detailed below), it would benefit greatly from a more systematic approach rather than the vignettes presented.

      ## Terminology

      The author introduces a terminology for networks of interacting species in terms of "entities" and "sensors" -- the former being nodes of a graph, and the latter being those nodes that receive inputs from other nodes. In the language of directed graphs, "entities" would seem to correspond to vertices, and "sensors" those vertices with positive indegree and outdegree. Unfortunately, the added benefit of redefining accepted terminology from the study of graphs and networks is not clear.

      ## Model

      The model seems to suddenly change from Figure 4 onwards. While the results presented here have at least some attempt at classification or statistical rigour (i.e. Fig 4 D), there are suddenly three values associated with each entity ("property step, active fraction, and number"). Furthermore, the system suddenly appears to be stochastic. The reader is left unsure what has happened, especially after having made the effort to deduce the model as it was in Figs 1 through 3. No respite is to be found in the SI, either, where this new stochastic model should have been described in sufficient detail to allow one to reproduce the simulation.

      ## Perturbations

      Inspired especially by experimental manipulations such as RNAi or mutagenesis, the author studies whether such perturbations can lead to a heritable change in network output. While this is naturally the case for permanent changes (such as mutagenesis), the author gives convincing examples of cases in which transient perturbations lead to heritable changes. Presumably, this is due to the underlying multistability of many networks, in which a perturbation can pop the system from one attractor to another.

      Unfortunately, there appears to be no attempt at a systematic study of outcomes, nor a classification of when a particular behaviour is to be expected. Instead, there is a long and difficult-to-read description of numerical results that appear to have been sampled at random (in terms of both the architecture and parameter regime chosen). The main result here appears to be that "genetic" (permanent) and "epigenetic" (transient) perturbations can differ from each other -- and that architectures that share a response to genetic perturbation need not behave the same under an epigenetic one. This is neither surprising (in which case even illustrative evidence would have sufficed) nor is it explored with statistical or combinatorial rigour (e.g. how easy is it to mistake one architecture for another? What fraction share a response to a particular perturbation?)

      As an additional comment, many of the results here are presented as depending on the topology of the network. However, each network is specified by many kinetic constants, and there is no attempt to consider the robustness of results to changes in parameters.

      ## DNA analogy

      At two points, the author makes a comparison between genetic information (i.e. DNA) and epigenetic information as determined by these heritable regulatory architectures. The two claims the author makes are that (i) heritable architectures are capable of transmitting "more heritable information" than genetic sequences, and (ii) that, unlike DNA, the connectivity (in the sense of mutations) between heritable architectures is sparse and uneven (i.e. some architectures are better connected than others).

      In both cases, the claim is somewhat tenuous -- in essence, it seems an unfair comparison to consider the basic epigenetic unit to be an "entity" (e.g., an entire transcription factor gene product, or an organelle), while the basic genetic unit is taken to be a single base-pair. The situation is somewhat different if the relevant comparison was the typical size of a gene (e.g., 1 kb).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious compared to that of ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
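      As a rough check on the abundance comparison quoted above, the following minimal sketch computes how much higher the Smct1 and Smct2 chromatogram intensities are than that of Asct2. It assumes the quoted intensities are powers of ten (2.9e9, 1.6e8 and 1.5e7 AU); the dictionary and function names are illustrative only. As the quoted text itself cautions, intensities of different proteins are not strictly comparable, so the ratios are order-of-magnitude indications at best.

```python
import math

# Label-free chromatogram intensities (AU) in sham mice, as quoted in the text.
# Assumption: the printed values are powers of ten (2.9e9, 1.6e8, 1.5e7).
INTENSITY_AU = {"Smct1": 2.9e9, "Smct2": 1.6e8, "Asct2": 1.5e7}


def ratio_to_asct2(name: str) -> float:
    """Return the intensity of `name` relative to Asct2 (dimensionless)."""
    return INTENSITY_AU[name] / INTENSITY_AU["Asct2"]


if __name__ == "__main__":
    for name in ("Smct1", "Smct2"):
        ratio = ratio_to_asct2(name)
        # log10 of the ratio gives the difference in orders of magnitude.
        print(f"{name}/Asct2 = {ratio:.0f}-fold (log10 = {math.log10(ratio):.1f})")
```

      Under these assumptions the differences are roughly two orders of magnitude for Smct1 and one for Smct2.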

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in Elife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. Several factors bear on the alteration of D-/L-serine ratios, including protein expression levels at both apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to complete the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer: Km of 18.4 µM for L-serine (oocyte expression system; Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine (oocyte expression system; Foster et al. PLOS ONE 2016), and IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine (HEK293 expression systems; Foster et al. PLOS ONE 2016). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combined result of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
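      The response reports the B0AT3 change in two ways: as a "1.8 fold" decrease and as IRI/sham ratios of 0.56 (4 h) and 0.65 (8 h). The sketch below shows how the two conventions relate, assuming that "x-fold decrease" means the reciprocal of the IRI/sham ratio; that reading is an assumption, and the function names are illustrative only.

```python
def ratio_to_fold_decrease(ratio: float) -> float:
    """Convert an IRI/sham abundance ratio (< 1) to an 'x-fold decrease'.

    Assumption: an x-fold decrease means sham/IRI = x, i.e. 1 / ratio.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


def fold_decrease_to_ratio(fold: float) -> float:
    """Convert an 'x-fold decrease' back to an IRI/sham abundance ratio."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return 1.0 / fold


if __name__ == "__main__":
    # B0AT3 ratios quoted from Table S1 in the response.
    for label, ratio in (("4h", 0.56), ("8h", 0.65)):
        print(f"B0AT3 {label}: ratio {ratio} = {ratio_to_fold_decrease(ratio):.1f}-fold decrease")
    # SMCT1 is described as a 1.7-fold decrease.
    print(f"SMCT1: 1.7-fold decrease = ratio {fold_decrease_to_ratio(1.7):.2f}")
```

      Under this reading, the 4 h ratio of 0.56 corresponds to the quoted 1.8-fold decrease, and a 1.7-fold decrease in SMCT1 corresponds to a ratio of about 0.6.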

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We appreciate reviewer 1 for his/her effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      More than weakness I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescent data of the ASCT2 staining in the whole kidney at the low magnification and another region of Figure 3 (below) as well as immunohistochemistry from Human Protein Atlas (update: Jun 9th, 2023) did not show a strong signal of ASCT2 expression in other regions besides the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules, but not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing out this data, and we apologize for the confusion in the text. The total D-serine uptake in the cells was not enhanced; rather, the net uptake (total uptake minus background) increased. This increase results from the lower background under ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters such as proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. From Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the cells that do not express ASCT2 (more than 90%) use SMCT1 to reabsorb D-serine.
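      To make the affinity argument concrete, the sketch below evaluates the Michaelis-Menten fractional saturation, v/Vmax = [S] / (Km + [S]), at the urinary D-serine concentrations mentioned above. It assumes simple single-substrate Michaelis-Menten behaviour with no competing substrates, and it uses the SMCT1 Km of 3.39 mM and the ASCT2 Km of 167 µM quoted elsewhere in this response; the function and variable names are illustrative only. Fractional saturation says nothing about absolute flux, which also depends on transporter abundance and turnover rate.

```python
def fractional_saturation(s_um: float, km_um: float) -> float:
    """Michaelis-Menten fractional saturation v/Vmax = [S] / (Km + [S]).

    Both concentrations are in micromolar. Assumes a single substrate
    and no competing canonical substrates.
    """
    if s_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return s_um / (km_um + s_um)


# Km values for D-serine in µM, as quoted in the response.
KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}

if __name__ == "__main__":
    # Urinary D-serine range reported in the response (100 - 200 µM).
    for s_um in (100.0, 200.0):
        for name, km_um in KM_UM.items():
            frac = fractional_saturation(s_um, km_um)
            print(f"[S] = {s_um:.0f} µM: {name} v/Vmax = {frac:.2f}")
```

      Under these assumptions SMCT1 operates in the near-linear, low-saturation part of its curve at 100 – 200 µM, so its contribution would rest on high expression rather than affinity, which is the low-affinity, high-capacity picture proposed here.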

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. Proximal tubule (PT) is the first and the main region to maintain acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3- and creates CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in PT serve to maintain pH homeostasis, the pH regulation in both the lumen and the interior of PT cells is highly dynamic. We agree with the reviewer and have accordingly revised the text by emphasizing the pH around PT segments, rather than the final urine pH, and leaving the discussion open for the possibility of PAT1 function in PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 µM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18, 1st complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher, respectively, than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in brush-border membrane vesicles from kidney 1.6-fold and decreases that of SMCT1 to 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 µM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in greater renal reabsorption of D-serine than L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degrees of alteration are more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably points to an underestimation of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by the presence of post-translational modifications and multiple trans-membrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs in normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression levels, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism. A definitive conclusion would require a comprehensive elucidation of both L-serine and D-serine transport systems at both apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on the current knowledge.

      For D-serine transport systems, we found ASCT2 and SMCTs contributions in this study. Meanwhile, L-serine transport was previously reported to be mediated mainly by the neutral amino acid transporters B0ATs (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 functions as well. In IRI conditions, B0AT3 decreased 1.8 fold. We suggest that high ratios of D-/L-serine in IRI conditions are a combined outcome of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    2. eLife assessment

      This study shows compelling evidence that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a non-canonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multi-hierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

    3. Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it's clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

    4. Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early ischemia-reperfusion injury (IRI), a well-known acute kidney injury (AKI) model. In a second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1 and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1 and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and ischaemia-reperfusion injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      Weaknesses:

      Rather than weaknesses, I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.
      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.
      (2) In line 91, I suggest specifying that this is a renal IRI model.
      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?
      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?
      (5) Lines 254-255: Why was D-serine transport enhanced by ASCT2 knockdown in FlpInTR-SMCT1 or 2 cells?
      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.
      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 that there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?
      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

    5. Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that ischemia reperfusion injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. Along the way, the authors revealed that SMCT1 also transports D-serine.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of ischemia reperfusion injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under ischemia reperfusion injury (IRI).
      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      I am satisfied with the changes the authors have introduced in the text of the revised version of their manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) More explanation/description of Fig 3C and 3D would be helpful for readers, including the color code of 3D and black lines shown in both panels.

      We have added more description to the legend of Figure 3, and we have used the same color code as in Figure 2, which we now specifically note in the figure legend as well.

      (2) Differences between cranial and trunk NCC could be experimentally shown or discussed. Fig 4C shows some differences between these two populations, but in situ, results using Dlc1/Sp5/Pak3 probes in the trunk region may be informative, like Fig 5 supplement 2 for cranial NCCs.

      This is an important point. The focus of our study was on cranial neural crest cells, and the single cell sequencing data is therefore truly reflective of only cranial neural crest cells. We have not functionally tested for the roles of Dlc1/Sp5/Pak3 in trunk neural crest cells, however, based on the expression and loss-of-function phenotypes of Sp5 or Pak3 knockout mice, we predict they individually may not play a significant role. It remains plausible that Dlc1 could play an important role in the delamination of trunk neural crest cells, but we have not tested that definitively. Nonetheless, Sabbir et al 2010 showed in a gene trap mouse mutant that Dlc1 is expressed in trunk neural crest cells. Regarding the similarities and differences between cranial and trunk neural crest cells as noted by the reviewer with respect to Figure 4, it’s important to recognize the temporal differences illustrated in Figure 4: neural crest cell delamination proceeds in a progressive wave from anterior to posterior, and the analysis was designed to quantify cell cycle status before and during neural crest cell delamination. We have compared cranial and trunk neural crest cells in more detail in the discussion and also speculate what might happen in the trunk based on what we know from other species.

      (3) Discussion can be added about the potential functions of Dlc1 for NCC migration and/or differentiation based on available info from KO mice.

      We have added specific details regarding the published Dlc1 knockout mouse phenotype to the discussion, particularly with respect to the craniofacial anomalies which included frontonasal prominence and pharyngeal arch hyperplasia, and defects in neural tube closure and heart development. Although the study didn’t investigate the mechanisms underpinning the Dlc1 knockout phenotype, the craniofacial morphological anomalies would be consistent with a deficit in neural crest cell delamination reducing the number of migrating neural crest cells, as we observed in our Dlc1 knockdown experiments.

      Reviewer #2 (Recommendations For The Authors):

      The authors used the (Tg(Wnt1-cre)11Rth Tg(Wnt1-GAL4)11Rth/J) line but work from the Bush lab (see Lewis et al., 2013) has demonstrated fully penetrant abnormal phenotypes that affect the midbrain neuroepithelium, increased CyclinD1 expression and overt cell proliferation as measured by BrdU incorporation. The authors should explain why they used this mouse line instead of the Wnt1-Cre2 mice (129S4-Tg(Wnt1-cre)1Sor/J) in the Jackson Laboratory (which lacks the phenotypic effects of the original Wnt1-Cre line), or a "Cre-only" control, or at a minimum explain the steps they took to ensure there were no confounding effects on their study, especially since cell proliferation was a major outcome measure.

      This is an important point, and we thank the reviewer for raising it. Yes, it has been reported that the original Wnt1Cre mice exhibit a midbrain phenotype (Ace et al. 2013). However, it has also been noted that Wnt1Cre2 can exhibit recombination in the male germline leading to ubiquitous recombination (Dinsmore et al., 2022). Therefore, to avoid any potential for bias, we used an equal number of cells derived from the Wnt1 and F10N transgenic line embryos in our scRNA-seq, and this included multiple non-Cre embryos. Our scRNA-seq analysis was therefore not dependent upon Wnt1-Cre, not least because we used whole heads rather than fluorescence-sorted cells. However, Wnt1-Cre lineage tracing was advantageous from a computational perspective to help define cells that were premigratory and migratory in concert with Mef2c-lacZ based on their expression of YFP, LacZ or both. We note these specifics more clearly in the methods.

      The Results section (line 122) states that scRNA-seq was performed on dissociated cranial tissues but the Methods section (lines 583-584) implies that whole E8.5 mouse embryos were dissociated. Which was dissociated, whole embryos or just cranial tissues? Obviously, the latter would be a better strategy to enrich for cranial neural crest, but the authors also examine the trunk neural crest. This should be clarified in the text.

      We apologize that some of the details regarding the tissue isolation were confusing and we have clarified this in the methods and the text. For the record, after isolating E8.5 embryos, we then dissected the head from those embryos, and performed scRNA-seq on dissociated cranial tissues. As the reviewer correctly noted, this approach strategically enriches for cranial neural crest cells.

      The authors do not justify why they chose a knockdown strategy, which has its limitations including its systemic injection into the amniotic cavity, its likely global and more variable effects, and its need to be conducted in culture. Why the authors did not instead use a Wnt1-Cre-mediated deletion of Dlc1, which would have been "cleaner" and more specific to the neural crest, is not clear (maybe so they could specifically target different Dlc1 isoforms?). Also, the authors use Sox10 as a marker to count neural crest cells, but Sox10 may only label a subset of neural crest cells and thus some unaffected lineages may not have been counted. The authors should mention what is known about the regulation of Dlc1 by Sox10 in the neural crest. Although the data are persuasive, a second marker for counting neural crest cells following knockdown would make the analysis more robust. Can the authors explain why they did not simply use the Mef2c-F10N-LacZ line and count LacZ-positive cells (if fluorescence signal was required for the quantification workflow, then could they have used an anti-beta Galactosidase antibody to label cells)?

      We thank the reviewer for raising these important considerations. It has previously been noted that although Wnt1-Cre is the gold standard for conditional deletion analyses in neural crest cell development, especially migration and differentiation, it is not a good tool for functional studies of the specification and delamination of neural crest cells due to the timing of Wnt1 expression and Cre activation and excision (see Barriga et al., 2015). Therefore, we chose a knockdown strategy instead, and also because it allows us to more rapidly evaluate gene function. We agree that there are limitations to the approach with respect to variability, however, this is outweighed by the ability to repeatedly perform the knockdown at multiple and more relevant temporal stages such as E7.5 (which is prior to the onset of Wnt1-Cre activity), as well as target different isoforms, and also treat large numbers of embryos for quantitative analyses. The advantage of using Sox10 as a marker for counting neural crest cells is that at the time of analysis, cranial neural crest cells are still migrating towards the frontonasal prominences and pharyngeal arches, and the overwhelming majority of these cells are Sox10 positive. Moreover, we can therefore assay every Dlc1 knockdown embryo for Sox10 expression and count the number of migrating neural crest cells. The limitation of using the Mef2c-F10N-LacZ line is that this transgenic line is maintained as a heterozygote, and thus only half the embryos in a litter could reasonably be expected to be lacZ+. But combining Sox10 and Mef2c-F10N-LacZ fluorescent immunostaining for similar analyses in the future is a great idea.

      Reviewer #3 (Recommendations For The Authors):

      The putative intermediate cells differentially express mRNAs for genes involved in cell adhesion, polarity, and protrusion relative to bona fide premigratory cells (Fig. 2E). This is persuasive evidence, but only differentially expressed genes are shown. Discussing those markers that have not yet changed, e.g. Cdh1 or Zo1 (?), would be instructive and help to clarify the order of events.

      We thank the reviewer for this suggestion and we have provided more detail about adherens junctions and tight junctions. Cdh1 is not expressed, and although Myh9 and Myh10 are expressed, we did not detect any significant changes. ZO1 is a tight junction protein encoded by the gene Tjp1, which, along with other tight junction protein-encoding genes, is downregulated in intermediate NCCs as shown in Figure 2E.

      It is unclear whether the two putative intermediate state clusters differ other than their stage of the cell cycle. Based on the trajectory analysis in Fig. 3C-D, the authors state that these two populations form simultaneously and independently but then merge into a single population. However, without further differential expression, it seems more plausible that they represent a single population that is temporarily bifurcated due to cell cycle asynchrony.

      We have addressed the cell cycle question in the discussion by noting that while it is possible the transition states represent a single population that is temporarily bifurcated due to cell cycle asynchrony, if this were true, then we should expect S phase inhibition to eliminate both transition state groups. Instead, our trajectory analyses suggest that the transition states are initially independent, and furthermore, S phase inhibition did not affect delamination of the other population of neural crest cells.

      The authors do not present an in-depth comparison of these neural crest intermediate states to previously reported cancer intermediate states. This analysis would reveal how similar the signatures are and thus how extrapolatable these and future findings in delaminating neural crest are to different types of cancer.

      We have also added more detail to the discussion to address the potential for similarities and differences in neural crest intermediate states compared to previously reported cancer intermediate states. The challenge, however, is that none of the cancer intermediate states have been characterized at a molecular level. Nonetheless, with the limited molecular markers available, we have not identified any similarities so far, but our datasets are now available for comparison with future cancer EMP datasets.

      The reduction in SOX10+ cells may be in part or wholly attributable to inhibition of proliferation AFTER delamination. Showing that there are premigratory NCCs in G2/M at ~E8.0 would bolster the argument that this population is present from the earliest stages.

      The presence of premigratory neural crest cells in G2/M is shown by the scRNA-seq data and cell cycle staining data in the neural plate border.

      Lines 248-249: The pseudo-time analysis in Fig 3C/D does indicate that the two most mature cell clusters (pharyngeal arch and frontonasal mesenchyme) may arise from common or similar migratory progenitors. However, given the decades of controversy about fate restriction of neural crest cells, the statement that "EMT intermediate NCC and their immediate lineages are not fate restricted to any specific cranial NCC derivative at this timepoint" should be toned down so as to not give the impression that they have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives.

      We appreciate this comment, because as the reviewer noted, there has been considerable literature and debate about the fate restriction and plasticity of neural crest cells, and indeed we did not intend to imply we have identified common progenitors of ectomesenchyme and neuro/glial/pigment derivatives. That can only be truly functionally demonstrated by clonal lineage tracing analyses. Rather, we interpret our pseudo-time analyses to indicate that irrespective of cell cycle status at the time of delamination, these two populations come together with equivalent mesenchymal and migratory properties, but in the absence of fate determination in the collective of cells. This does not mean that individual cells are common progenitors of both ectomesenchyme and neuro/glial/pigment derivatives. The nuance is important, and we address this more carefully in the text.

      Lines 320-321: "...this overlap in expression was notably not observed in older embryos in areas where EMT had concluded". It is unclear whether the markers no longer overlap in older embryos (i.e. segregate to distinct populations) or are simply no longer expressed.

      The data in Figure 5 demonstrates the dynamic and overlapping expression of Dlc1, Sp5 and Pak3 in the different clusters of cells as they transition from being neuroepithelial to mesenchymal. In contrast to Sp5 and Pak3, Dlc1 is not expressed by premigratory neural crest cells but is expressed at high levels in all EMT intermediate stage neural crest cells. Later as Dlc1 continues to be expressed in migrating neural crest cells, Pak3 and Sp5 are downregulated. But the absence of overlapping expression in the dorsolateral neural plate at the conclusion of EMT coincides with their downregulation in that territory.

      In the final results section on Dlc1, the previously published mutant mouse lines are referenced as having "craniofacial malformation phenotypes". The lack of detail given on what those malformations are (assuming descriptions are available) makes the argument that they may be related to insufficient delamination less persuasive. The degree of knockdown correlates so well with the percentage reduction in migratory neural crest (Fig. 6) that one would imagine a null mutant to have a very severe phenotype.

      The inference from the reviewer is correct and indeed Dlc1 null mutant mice do have a severe phenotype. We have added more specific details regarding the craniofacial and other phenotypes of the Dlc1 mutant mice to the discussion. Of note, the frontonasal prominences and the pharyngeal arches are hypoplastic in E10.5 Dlc1 mutant embryos, which would be consistent with a neural crest cell deficit. Although a deficit in neural crest cells can be caused by multiple distinct mechanisms, our Dlc1 knockdown analyses suggest that the phenotype is due to an effect on neural crest cell delamination which diminishes the number of migrating neural crest cells.

      Use the same y-axis for Fig. 4C/D

      This has been corrected.

      Fig. 6C: Please note in the panel which gene is being measured by qPCR

      This has been corrected to denote Dlc1.

      Lines 108-117: More concise language would be appropriate here.

      As requested, we were more succinct in our language and have shortened this section.

      The SABER-FISH images are very dim. I realize the importance of not saturating the pixels, but the colors are difficult to make out.

      We thank the reviewer for pointing this out and have endeavored to make the SABER-FISH images brighter and easier to see.

    2. eLife assessment

      This fundamental study reports compelling findings that intermediate states exist in epithelial-mesenchymal transition (EMT) during natural development and differentiation of mammalian neural crest cells, similar to recent reports in cancer. The authors determined that there were at least two paths to delamination and migration - one that occurs during S-phase of cell cycle and another during G2/M phase, and that the process of delamination is not restricted to cell fate. Finally, the authors showed that expression of Dlc1 may be used to identify cells in an intermediate state of EMT as well as their spatial location in the mouse embryo. The work will be of interest to developmental biologists, neurobiologists and cancer researchers.

    3. Reviewer #1 (Public Review):

      Summary:

      This study describes the molecular identity of the intermediate status of cranial neural crest cells (NCCs) during the initial delamination process. Taking advantage of single-cell RNA seq, the authors identify new populations of cells during EMT characterized by a specific set of gene expressions, including Dlc1. Premigratory cranial NCCs differentiate through different trajectories depending on their cell cycle phases but converge into a common progenitor, then differentiate into mesenchymal cells expressing region-specific genes.

      Strengths:

      Single-cell RNA seq data convincingly support what the authors claim. This is the first identification of intermediate states between premigratory and migratory cranial NCCs. Silencing one of the marker genes, Dlc1, reduces the migratory activity of cranial NCCs. These findings deepen our understanding of the mechanism of EMT in general.

      Comments on revised version:

      Weaknesses:

      None after substantial revision.

    4. Reviewer #2 (Public Review):

      Zhao et al., focus on mechanisms through which cells convert from epithelium to mesenchyme and become migratory. This phenomenon of epithelial-to-mesenchymal transition (EMT) occurs during both embryonic development and cancer progression. During cancer progression, EMT seemingly includes cells at intermediate states as defined by the combinatorial expression of epithelial and mesenchymal markers. But the importance of these markers and the role of these intermediate states remains unclear. Moreover, whether EMT during development also involves equivalent intermediate cell states is not known. To address this gap in knowledge, the authors devise a strategy to identify and characterize changes that an embryonic population of cells called the cranial neural crest undergo as they delaminate from the neuroepithelium and become a highly migratory population of mesenchymal cells that ultimately give rise to a broad range of derivatives.

      To isolate and study the neural crest, the authors use embryos collected at E8.5 from two transgenic mouse lines. Wnt1-Cre;RosaeYFP labels Wnt1-positive neuroepithelial cells in the dorsolateral neural plate, which includes pre-migratory neural crest that reside in the dorsal neuroectoderm and neural plate border before induction (as well as some other lineages). Mef2c-F10N-LacZ leverages a neural crest cell-specific enhancer of Mef2c to control LacZ expression in predominantly migratory neural crest. This dual genetic approach that allows the authors to distinguish and compare pre-migratory and migratory neural crest cells is a strength of the work.

      To assay for the differential expression of genes involved in the EMT and migration of cranial neural crest, the authors perform single cell RNA sequencing (scRNA-seq) using current methods. A strength is a large sample size per mouse line, and relatively high numbers of single cells analyzed. The authors identify six major cell/tissue types present in mouse E8.5 cranial tissues using known markers, which they then segregate into a cranial neural crest cluster using a well-reasoned bioinformatic strategy. The cranial neural crest cluster contains pre-migratory and migratory cells that they partition further into five subclusters and then characterize using the differential expression and combinatorial patterns of neural crest specifier genes, markers of pre-migratory neural crest, markers of early versus late migratory neural crest, markers of undifferentiated versus differentiated neural crest, tissue-specific markers, and region-specific markers. One weakness is that there is little attempt to map potential novel genes and/or pathways that also distinguish these clusters.

      The authors then go on to subdivide the five cranial neural crest subclusters into almost two dozen smaller subclusters, again using the combinatorial expression of known markers (e.g., neural crest genes, cell junction genes, and cell cycle genes). A weakness is that the marker analysis and accompanying interpretation of the results rely heavily on the purported roles of different genes as described in the published work of others, which potentially introduces some untested assumptions and a bit of hand-waving into the study. Moreover, the limited correlation between mRNA and protein abundance for cell cycle markers is well documented in the literature but the authors rely heavily on gene expression to determine cell cycle status. Even though the authors add a compelling EdU/pHH3 double-labeling experiment and cell cycle inhibition studies, the work would be strengthened by including some analysis of protein expression to see if the cell cycle correlations hold up. Nonetheless, the subcluster and cell cycle analyses lead the authors to conclude that there are a series of intermediate cell states between neural crest EMT and delamination, and that cell cycle regulation is a defining feature and necessary component of those states. These novel findings are generally well supported by the data.

      To test if there are spatiotemporal differences in the localization of neural crest cells during EMT in vivo, the authors apply a cutting-edge technique called signal amplification by exchange reaction for multiplexed fluorescent in situ hybridization (SABER-FISH), which they validate using standard in situ hybridization. The authors select specific marker genes that seem justified based on their scRNA-seq dataset, and they generate a series of convincing images and quantitative data that add valuable depth to the story.

      As a functional test of their hypothesis that one of the genes indicative of an EMT intermediate stage (i.e., Dlc1) is essential for neural crest migration, the authors use a lentivirus-mediated knockdown strategy. A strength is that the authors include appropriate scramble and cell death controls as part of their experimental design.

      The authors use Sox10 as a marker to count neural crest cells, but Sox10 may only label a subset of neural crest cells and thus some unaffected lineages may not have been counted. Although the data are persuasive, a second marker for counting neural crest cells following knockdown would make the analysis more robust.

      Overall, this is a first-rate study with many more strengths than weaknesses. The authors generate high quality data, and their interpretations are reasonable and balanced. Another strength is the writing, which is clear and well organized, and the figures (including supplemental), which are excellent and provide unambiguous visualization of some very complex data sets. The methods are state-of-the-art and are effectively executed, and they will be useful to the broader cell and developmental biology community. The work contains well-substantiated findings and supports the conclusion that EMT is a highly dynamic, multi-step process, which was previously thought to be more-or-less binary. Such findings will alter the way the field thinks about EMT in neural crest and the work will likely serve as an important example alongside cancer metastasis.

    5. Reviewer #3 (Public Review):

      Summary:

      Zhao et al. address the question of whether intermediate states of the epithelial-to-mesenchymal transition (EMT) exist in a natural developmental context as well as in cancer cells. This is important not only for our understanding of these developmental systems but also for their development as resources for new anti-cancer approaches. Guided by single-cell RNA sequencing analysis of delaminating mouse cranial neural crest cells, they identify two distinct populations with transcriptional signatures intermediate between neuroepithelial progenitors and migrating crest. Both clusters are also spatially intermediate and are actively cycling, with one in S-phase and one in G2/M. They show that blocking progression through S phase prior to the onset of delamination and knockdown of intermediate state marker Dlc1 both reduce the number of migratory cells that have completed EMT. Overall, the work provides a modern take and new insights into the classical developmental process of neural crest delamination.

      Strengths:

      • Deep analysis of the scRNAseq dataset revealed previously unappreciated cell populations intermediate between premigratory and migratory crest.
      • The observation that delaminating/intermediate neural crest cells appear to be in S or G2/M phase is interesting and worth reporting, though the ultimate significance remains unclear, given that they do not make distinct derivatives depending on their cycle state.
      • The authors employ new methods for multiplex spatial imaging to more accurately define their populations of interest and their relative positions.
      • The authors present evidence that intermediate state gene Dlc1 (a Rho GAP) is not just a marker but functionally required for neural crest delamination in mouse, as previously shown in chicken.

      Weaknesses:

      • Similar experiments involving blockade of cell cycle progression and Dlc1 dose manipulation were previously performed in chick models, as noted in the discussion. The newly-defined intermediate states give added context to the results, but they are not entirely novel.

    1. eLife assessment

      This valuable study further discloses the function of LRRK2 in BDNF-dependent synaptic processes in identifying postsynaptic actin cytoskeleton as a convergent site of LRRK2 pathophysiological activity. Multiple approaches in different cellular models provide mostly solid (but at times preliminary) evidence to support (many) of the conclusions, overall consistent with bioinformatics analyses covering previously published work. While an exciting start that should be pursued, examples are suggested by reviewers to add in additional experimentation to better support the expansive interpretation. The identification of mechanisms of LRRK2 action at the synapse is considered highly significant, as better knowledge in this regard may provide insight into why dopaminergic cells die with over-active LRRK2.

    2. Reviewer #1 (Public Review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and synaptosome preparations from the brain. They examine an LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health?

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. Some effects of BDNF stimulation appear impaired in (some of the) LRRK2 knock-out scenarios (but not all). A phosphoproteomic analysis of PD mutant Knock-in mouse brain synaptosomes is included.

      Weaknesses:

      The data sets are disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is very light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are likely underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      As examples to aid reader interpretation:

      (a) pS935 LRRK2 seems to go up at 5 minutes but goes down below pre-stimulation levels after (at times when BDNF-induced phosphorylation of other known targets remains very high). This is ignored in favour of discussion/investigation of initial increases, and the fact that BDNF does many things (which might indirectly contribute to initial but unsustained changes to pLRRK2) is not addressed.

      (b) Drebrin coIP itself looks like a very strong result, as does the increase after BDNF, but this was only demonstrated with a GFP over-expression construct despite several mouse and neuron models being employed elsewhere and available for coIP of endogenous LRRK2. Also, the coIP is only demonstrated in one direction. Similarly, the decrease in drebrin levels in mice is not assessed in the other model systems, coIP wasn't done, and mRNA transcripts are not quantified (even though others were). Drebrin phosphorylation state is not examined.

      (c) The large differences in the CRISPR KO cells in terms of BDNF responses are not seen in the primary neurons of KO mice, suggesting that other differences between the two might be responsible, rather than the lack of LRRK2 protein.

      (d) No validation of hits in the G2019S mutant phosphoproteomics, and no other assays related to the rest of the paper/conclusions. Drebrin phosphorylation is different but unvalidated, or related to previous data sets beyond some discussion. The fact that LRRK2 binding occurs, and increases with BDNF stimulation, should be compared to its phosphorylation status and the effects of the G2019S mutation.

    3. Reviewer #2 (Public Review):

      Taken as a whole, the data in the manuscript show that BDNF can regulate PD-associated kinase LRRK2 and that LRRK2 modifies the BDNF response. The chief strength is that the data provide a potential focal point for multiple observations across many labs. Since LRRK2 has emerged as a protein that is likely to be part of the pathology in both sporadic and LRRK2 PD, the findings will be of broad interest. At the same time, the data used to imply a causal throughline from BDNF to LRRK2 to synaptic function and actin cytoskeleton (as in the title) are mostly correlative and the presentation often extends beyond the data. This introduces unnecessary confusion. There are also many methodological details that are lacking or difficult to find. These issues can be addressed.

      (1) The writing/interpretation gets ahead of the data in places and this was confusing. For example, the abstract highlights prior work showing that Ser935 LRRK2 phosphorylation changes LRRK2 localization, and Figure 1 shows that BDNF rapidly increases LRRK2 phosphorylation at this site. Subsequent figures highlight effects at synapses or with synaptic proteins. So is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF? Figure 2H shows that LRRK2-drebrin interactions are enhanced in response to BDNF in retinoic acid-treated SH-SY5Y cells, but are synapses generated in these preps? How similar are these preps to the mouse and human cortical or mouse striatal neurons discussed in other parts of the paper (would it be anticipated that BDNF act similarly?) and how valid are SH-SY5Y cells as a model for identifying synaptic proteins? Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2? Or do LRRK2 levels in synaptosomes change in response to BDNF? The presentation requires re-writing to stay within the constraints of the data or additional data should be added to more completely back up the logic.

      (2) The experiments make use of multiple different kinds of preps. This makes it difficult at times to follow and interpret some of the experiments, and it would be of great benefit to state the species ("mouse" or "human") and cell type (cortical, glutamatergic, striatal, GABAergic, etc.) more consistently.

      (3) Although BDNF induces quantitatively lower levels of ERK or Akt phosphorylation in LRRK2KO preps based on the graphs (Figure 4B, D), the western blot data in Figure 4C make clear that BDNF does not need LRRK2 to mediate either ERK or Akt activation in mouse cortical neurons and in 4A, ERK in SH-SY5Y cells. The presentation of the data in the results (and echoed in the discussion) writes of a "remarkably weaker response". The data in the blots demand more nuance. It seems that LRRK2 may potentiate a response to BDNF that in neurons is independent of LRRK2 kinase activity (as noted). This is more of a point of interpretation, but the words do not match the images.

      (4) Figure 4F/G shows an increase in PSD95 puncta per unit length in response to BDNF in mouse cortical neurons. The data do not show spine induction, dendritic spine density, or spine morphogenesis as suggested in the accompanying text (page 8). Since the neurons are filled with or express GFP, spine density, or the number of spines bearing PSD95 puncta, could be added. However, the data as reported would be expected to reflect spine and shaft PSDs and could also include some nonsynaptic sites.

      (5) Experimental details are missing that are needed to fully interpret the data. There are no electron microscopy methods outside of the figure legend. And for this and most other microscopy-based data, there are few to no descriptions of what cells/sites were sampled, how many sites were sampled, and how regions/cells were chosen. For some experiments (like Figure 5D), some detail is provided in the legend (20 segments from each mouse), but it is not clear how many neurons this represents, where in the striatum these neurons reside, etc. For confocal z-stacks, how thick are the optical sections and how thick is the stack? The methods suggest that data were analyzed as collapsed projections, but they cite Imaris, which usually uses volumes, so this is confusing. The guide (sgRNA) sequences that were used should be included. There is no mention of sex as a biological variable.

      (6) For Figures 1F, G, and E, how many experimental replicates are represented by blots that are shown? Graphs/statistics could be added to the supplement. For 1C and 1I, the ANOVA p-value should be added in the legend (in addition to the post hoc value provided).

      (7) Why choose 15 minutes of BDNF exposure for the mass spec experiments when the kinetics in Figure 1 show a peak at 5 mins?

      (8) The schematic in Figure 6A suggests that iPSCs were plated, differentiated, and cultured until about day 70 when they were used for recordings. But the methods suggest they were differentiated and then cryopreserved at day 30, and then replated and cultured for 40 more days. Please clarify if day 70 reflects time after re-plating (30+70) or total time in culture (70). If the latter, please add some notes about re-differentiation, etc.

      (9) When Figures 6B and 6C are compared it appears that mEPSC frequency may increase earlier in the LRRK2KO preps than in the WT preps since the values appear to be similar to WT + BDNF. In this light, BDNF treatment may have reached a ceiling in the LRRK2KO neurons.

      (10) Schematic data in Figures 5A and C and Figures 5B and E are too small to read/see the data.

    1. eLife assessment

      This is a valuable study of the spatial organization of innate granulomas following Chromobacterium violaceum infection and the expression of CC and CXC chemokines in the granuloma at several time points following infection. There is a wealth of information to be gained from this study. However, the analysis of these granulomas is incomplete, with room for orthogonal validation of some of the key findings with additional animals (using ISH or IHC), in addition to a more quantitative analysis of some of the currently more qualitative conclusions.

    2. Reviewer #1 (Public Review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      The spatial transcriptomics data presented in the figures could be considered a valuable resource if shared, with the potential for improved and clarified analyses. The primary conclusion of the paper, that C. violaceum infection in the liver cannot be contained without macrophages, would benefit from clarification.

      While the spatial transcriptomic data generated in the figures are interesting and valuable, they could benefit from additional information. The manual selection of regions of granulomas for analysis could use additional context - was the rest of the liver not sequenced, or excluded for other reasons? Including a healthy liver in the analysis could serve as a control for any lasting effects at the final time point of 21 days. Providing more context for the scalebars throughout the spatial analyses, such as whether the data are raw counts or normalized based on the number of reads per spatial spot, would be helpful for interpretation, as changes in expression could signal changes in the numbers of cells or changes in the gene expression of cells.

      In Figure 4, qualitative measurements are valuable, but having an idea of the raw data for a few of the pursued chemokines/receptors would aid interpretation. It would also be beneficial to clarify whether the reported values are across all clusters and consider focusing on clusters with the greatest change in expression. Figures 5E and F would benefit from clarification regarding the x-axis units and whether the expression levels are summed across all clusters for each time point. Additionally, information on the sequencing depth of the samples would be helpful, particularly as shallow sequencing of RNA can result in poor capture of low-expression transcripts.

      Regarding the conclusion of the essentiality of macrophages in granuloma formation, it may be prudent to further investigate the role of macrophages versus CCR2. Analyzing total cell counts in the liver after infection could provide insight into whether the decrease in the fraction of macrophages is due to decreased numbers or infiltration of other cell types. Consideration of experiments deleting macrophages directly, instead of CCR2, could provide more definitive evidence of the necessity of macrophage migration in containing infections.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result. On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control. I would hesitate to agree with the authors' interpretation that their data proves macrophages are serving as a physical barrier from the uninvolved liver. It is possible and likely that they are contributing to bacterial control through direct immunological activity and not simply as a structural barrier.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per time point. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability. Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous, both between and within individuals. It will be difficult to assert how broadly generalizable the transcriptomic features are to other C. violaceum granulomas. Furthermore, this undermines any opportunity for statistical testing of features between time points, limiting the potential value of the temporal data.

      Another caveat to these data is the limited or incompletely informative data analysis. The authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, it would be beneficial to see more general analyses, considering that Visium captures the entire transcriptome. Some important questions that are left unanswered by this study are:

      What major genes defined each spatial cluster?

      What were the top differentially expressed genes across time points of infection?

      Did the authors choose to focus on chemokines/receptors purely from a hypothesis perspective or did chemokines represent a major signature in the transcriptomic differences across time points?

      In addition to the absence of deep characterization of the spatial transcriptomic data, the study lacks sufficient quantitative analysis to back up the authors' qualitative assessments. Furthermore, the authors are underutilizing the spatial information provided by Visium, with no spatial analysis conducted to quantify the patterning of expression or the spatial correlation between factors.

      Impact:

      The authors' analysis helps highlight the chemokine profiles of host-protective granulomas. As the authors comment in their discussion, these findings have important similarities and differences with other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate strategies for host-directed therapy.

    1. eLife assessment

      This valuable study by Bartas and colleagues examined how patterns of monosynaptic input to specific cell types in the ventral tegmental area are altered by drugs of abuse. The authors applied a dimensionality reduction approach (principal component analysis) and showed that various drugs of abuse, and somewhat surprisingly the anesthesia alone (ketamine/xylazine), caused changes in the distribution of inputs labeled by the transsynaptic rabies virus. While there are some issues to be addressed, the evidence supporting the conclusions is overall solid, and provides information that is of value to the field, as well as a cautionary note on the interpretation of rabies virus-based tracing experiments.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors distinguished afferent inputs to different cell populations in the VTA using dimensionality reduction approaches and found significantly distinct patterns between normal and drug treatment conditions. They also demonstrated negative correlations between the drug-induced input changes and the expression of genes encoding ion channels or proteins involved in synaptic transmission, and showed that knockdown of one of the voltage-gated calcium channels decreased inputs.

      Weaknesses:

      (1) For quantification of brain regions in this study, boundaries were based on the Franklin-Paxinos (FP) atlas according to previous studies (Beier KT et al 2015, Beier KT et al 2019). It has been reported that significant discrepancies exist between the anatomical labels in the FP atlas and the Allen Brain Atlas (ref: Chon U et al., Nat Commun 2019). Although a summary of the conversion is provided as a sheet, the authors need to describe how consistent or different the brain boundaries they defined in the manuscript are with the Allen Brain Atlas by adding histology images. Also, I wonder how reliable the annotations were for over a hundred animals with manual quantification. The authors should briefly explain this in the Materials and Methods section rather than only citing previous studies.

      (2) Regarding the ellipsoids in the PC, although it's written in the manuscript that "Ellipsoids were centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes", it's intuitively hard to understand in some figures such as Figure 2O, P and Figure S1. The authors need to make their data analysis methods more accessible by providing source code to the public.

      (3) In histology images (Figure 1B and 3K), the authors need to add dashed lines or arrows to guide the reader's attention.

      (4) In Figure 2A and G, there appear to be significant differences in other brain regions such as NAcMed or PBN. If they are also statistically significant, the authors should note them as well and mark them with asterisks (*).

      (5) In Figure 2N about the spatial distribution of starter cells, the authors need to add histology images for each experimental condition (i.e. saline, fluoxetine, cocaine, methamphetamine, amphetamine, nicotine, and morphine) as supplement figures.

      (6) In the manuscript, it is necessary to explain why Cacna1e was selected among other calcium ion channels.
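
      To make the ellipsoid description quoted in point (2) concrete, here is a minimal Python sketch of one possible reading of it: the ellipse for a condition is centered at the mean PC1/PC2 coordinate and its semi-axes are one sample standard deviation along each plotted axis. This is an illustration under that assumption, not the authors' code; the function names and the toy coordinates are hypothetical.

```python
from statistics import mean, stdev


def condition_ellipse(points):
    """Summarize one condition's (PC1, PC2) scores as an axis-aligned ellipse.

    Assumption: "primary and secondary axes" means the PC1 and PC2 plot axes.
    """
    pc1 = [p[0] for p in points]
    pc2 = [p[1] for p in points]
    return {
        "center": (mean(pc1), mean(pc2)),      # average coordinate of the condition
        "semi_axes": (stdev(pc1), stdev(pc2)),  # one sample SD along each axis
    }


def inside(ellipse, point):
    """Return True if a point lies within the one-SD ellipse."""
    (cx, cy), (a, b) = ellipse["center"], ellipse["semi_axes"]
    return ((point[0] - cx) / a) ** 2 + ((point[1] - cy) / b) ** 2 <= 1.0


# Hypothetical PC scores for four animals in one condition.
toy_scores = [(-1.0, 0.0), (1.0, 0.0), (0.0, -0.5), (0.0, 0.5)]
ellipse = condition_ellipse(toy_scores)
```

      If the ellipses were instead aligned to each condition's own covariance structure, the semi-axes would come from the eigenvalues of the 2x2 covariance matrix rather than from per-axis standard deviations.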

    3. Reviewer #2 (Public Review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives.

      Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and that a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset.

      Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse.

      Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Specific Points:

      (1) Beyond advancements in bioinformatics, readers may find it insightful to explore whether the PCA/UMAP-based approach yields novel biological insights. For example, the authors are encouraged to discuss more functional implications of PBN and LH in the context of drugs of abuse, as their labeling abundance could elucidate the PC2 axis in Fig. 2M.

      (2) While I appreciate the experimental data on Cacna1e knockdown, I am unclear about the rationale behind specifically focusing on Cacna1e. The logic behind the statement, "This means that expression of this gene is not inhibitory towards RABV transmission," is also unclear. Loss-of-function experiments only signify the necessity or permissive functions of a gene. In this context, Cacna1e expression levels are required for efficient RabV labeling, but this neither supports nor excludes the possibility that this gene expression instructively suppresses RabV labeling/transmission, which could be assessed through gain-of-function experiments.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in the VTA under different anesthesia methods and after drug administration (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). They found that input patterns under different conditions are separated, and identified some key brain areas that contribute to this separation. They also searched a database for gene expression patterns that are common across input brain areas showing changes with anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this will be a nice starting point for follow-up studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To resolve and further test the claim that TBI did not induce cell proliferation:

      How many brains did they analyse? Sample sizes must be provided in Figure S1.

      As per reviewer’s suggestion, we removed one of the unsupported claims shown in Figure S1. The original Figure S1 is shown below with the sample number added.

      Author response image 1.

      The authors could either improve the TBI method or the detection of cells in S-phase, mitosis or cycling. They could use PCNA-GFP or BrdU, EdU or FUCCI instead and at least provide evidence that they can detect cells in S-phase in intact brains. Timing is critical (ie cell cycle is longer than in larvae) so multiple time points should be tested. Or they could use pH3 but test more time points and rather large sample sizes. If they are not able to provide any evidence, then their lack of evidence is no evidence. The authors should consider removing pH3 and PCNA-GFP related claims instead.

      We have removed pH3 and PCNA-GFP related results and claims.

      Other unsupported claims:

      Figure 2A-C is not very clear what they are showing, but it is not evidence of astrocyte hypertrophy. It does not have cellular resolution and does not show the cell size, membranes, nor number

      (1) We have avoided the term “hypertrophy” and changed the description throughout the text to “astrocyte swelling”.

      (2) Images in the resolution of Figure 2E and 2F were able to show the enlarged soma of astrocytes, suggesting swelling.

      What is the point of using RedStinger in Figure 2?

      We used RedStinger to label the astrocyte nuclei.

      Figure S5 is not convincing, as anti-Pvr does not look localised to specific cells. Instead, it looks like uniform background. If they really think the antibody is localised, they should do double stainings with cell type specific markers. If the antibody does not work, then remove the data and the claim. They could test with RNAi knock-down in specific cell types and qRT-PCR which cells express pvr instead.

      We have removed the claim that “Pvr is predominantly expressed in astrocytes” and changed the description to “Immunostainings using the anti-Pvr antibodies revealed that endogenous Pvr expression is low in the control brains, yet significantly enhanced upon TBI. Reducing Pvr expression, but not Pvr overexpression, in astrocytes blocked the TBI-induced increase of Pvr expression (Figure S5)”.

      Figure S6: it is unclear what they are trying to show, but these data do not demonstrate that astrocytes do not engulf debris after TBI, as there isn't sufficient cellular resolution to make such claim. Firstly, they analyse one single cell per treatment. Secondly, the cell projections are not visible in these images, and therefore engulfment cannot be seen. The authors could remove the claim or visualise whether astrocytes phagocytose debris or not either using clones or with TEM.

      We agree with the reviewer that our images do not have the resolution to make this claim. We have removed Figure S6 and corresponding text description.

      On statistics:

      The statistical analysis needs revising as it is wrong in multiple places, e.g. Figure 1F,G,H; Figure 2D. They only use Student t-tests. These can only be used when data are continuous, normally distributed, and only two samples are compared; if more than two normally distributed samples are compared, then use one-way ANOVA and multiple comparisons tests. If data are categorical, use Chi-Square.

      We have double checked and compared the experimental group to the control separately using the Student t-tests throughout the study.
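
      For readers following the exchange above, the one-way ANOVA that the reviewer recommends for more than two groups can be sketched in a few lines of Python. This is a generic textbook calculation of the F statistic, not an analysis from the manuscript; the function name and the three toy groups are hypothetical, and a real analysis would also need a p-value and post hoc multiple-comparison tests from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for a control and two experimental groups.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

      A single F test across all groups, followed by corrected pairwise comparisons, keeps the overall false-positive rate controlled, whereas running a separate t-test for each experimental group against the control does not.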

      Other points for improvement:

      Figure 2E,F: what are GFP puncta and how are they counted?

      I. Each GFP punctum looks like a little circle, likely representing a functional or dysfunctional structure. The biology of the GFP puncta is currently unknown.

      II. We used the ImageJ to quantify the GFP puncta:

      (1) Image > Type > 8-bit

      (2) Process > Subtract Background (rolling ball radius: 10)

      (3) Image > Adjust > Threshold > Apply

      (4) Analyze > Set Measurements > choose "Area" and "Limit to threshold" > OK, then Analyze > Measure

      (5) Count the number of puncta in the selected area.

      (6) Calculate the number of puncta per square micron.
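
      The thresholding and counting steps listed above can be illustrated with a small, dependency-free Python sketch. It is not the authors' ImageJ workflow: background subtraction is omitted, puncta are defined here as 4-connected groups of above-threshold pixels, and density is computed over the whole image area, all of which are assumptions made for illustration. The function name, threshold, pixel size, and toy image are hypothetical.

```python
from collections import deque


def count_puncta(image, threshold, microns_per_pixel):
    """Count bright connected regions and return (count, count per square micron)."""
    rows, cols = len(image), len(image[0])
    # Threshold step: True where a pixel is at or above the cutoff.
    mask = [[pixel >= threshold for pixel in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    puncta = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Unvisited bright pixel: start a new punctum and flood-fill it.
            puncta += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    # Assumption: density is normalized to the full image area.
    area_um2 = rows * cols * microns_per_pixel ** 2
    return puncta, puncta / area_um2


# Hypothetical 8-bit image containing three separate bright spots.
toy_image = [
    [0, 0, 0, 0, 0, 0],
    [0, 200, 210, 0, 0, 0],
    [0, 0, 0, 0, 180, 0],
    [0, 0, 0, 0, 190, 0],
    [0, 150, 0, 0, 0, 0],
]
count, density = count_puncta(toy_image, threshold=128, microns_per_pixel=0.5)
```

      In this toy case the image covers 7.5 square microns and contains three puncta. If the area were instead limited to a selected region of interest, only the denominator would change.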

      All genotypes must be provided (including for MARCM clones), currently they are not.

      We have shown the full genotype in the corresponding legend.

      Figure 7O,P indicate on figure that these are RNAi

      We have revised the labels to RNAi in Figure 7O,P.

      Reviewer #2 (Recommendations For The Authors):

      Several typos are present in the text.

      We have read the manuscript carefully and corrected typos throughout.

    2. eLife assessment

      This study represents a valuable finding on the neuron-glia communication and glial responses to traumatic brain injury (TBI). The data supporting the authors' conclusions on TBI analysis, RNA-seq on FACS sorted astrocytes, genetic analyses on Pvr-JNK/MMP1 are solid. However, cellular aspects of the response to TBI, statistical analysis, and molecular links between Pvr-AP1 are incomplete, which could be further strengthened in the future by more rigorous analyses.

    3. Reviewer #1 (Public Review):

      Li et al report that upon traumatic brain injury (TBI), Pvr signalling in astrocytes activates the JNK pathway and up-regulates the expression of the well-known JNK target MMP1. They FACS-sort astrocytes and carry out RNAseq analysis, which identifies pvr as well as genes of the JNK pathway as particularly up-regulated after TBI. They use conventional genetics loss of function, gain of function and epistasis analysis with and without TBI to verify the involvement of the JNK-MMP1 signalling pathway downstream of Pvr. They also show that blocking endocytosis prolongs the involvement of this pathway in the TBI response.

      The strengths are that multiple experiments are used to demonstrate that TBI in their hands damaged the BBB, induced apoptosis and increased MMP1 levels. The RNAseq analysis on FACS sorted astrocytes is nice and will be valuable to scientists beyond the confines of this paper. The functional genetic analysis is conventional, yet sound, and supports claims of JNK and MMP1 functioning downstream of Pvr in the TBI context.

      For this revised version the authors have removed all the unsupported claims. This renders their remaining claims more solid. However, it has resulted in the loss of important cellular aspects of the response to TBI, limiting the scope and value of the work.

      The main weakness is that novelty and insight are both rather limited. Others had previously published that both JNK signalling and MMP1 were activated upon injury, in multiple contexts (as well as the articles cited by the authors, they should also see Losada-Perez et al 2021). That Pvr can regulate JNK signalling was also known (Ishimaru et al 2004). The authors claim that the novelty was investigating injury responses in astrocytes in Drosophila. However, others had investigated injury responses by astrocytes in Drosophila before. It had been previously shown that astrocytes - defined as the Prospero+ neuropile glia, and also sharing evolutionary features with mammalian NG2 glia - respond to injury both in larval ventral nerve cords and in adult brains, where they proliferate regenerating glia and induce a neurogenic response (Kato et al 2011; Losada-Perez et al 2016; Harrison et al 2021; Simoes et al 2022). The authors argue that the novelty of the work is the investigation of the response of astrocytes to TBI. However, this is of somewhat limited scope. The authors mention that MMP1 regulates tissue remodelling, the inflammatory process and cancer. Exploring these functions further would have been an interesting addition, but the authors did not investigate what consequences the up-regulation of MMP1 after injury has in repair or regeneration processes.

      The statistical analysis is incorrect in places, and this could affect the validity of some claims.

      Altogether, this is an interesting and valuable addition to the repertoire of articles investigating neuron-glia communication and glial responses to injury in the Drosophila central nervous system (CNS). It is good and important to see this research area in Drosophila grow. This community together is building a compelling case for using Drosophila and its unparalleled powerful genetics to investigate nervous system injury, regeneration and repair, with important implications. Thus, this paper will be of interest to scientists investigating injury responses in the CNS using Drosophila, other model organisms (eg mice, fish) and humans.

    4. Reviewer #3 (Public Review):

      In this study, the authors used the Drosophila model to characterize molecular details underlying traumatic brain injury (TBI). The authors used transcriptomic analysis of astrocytes collected by FACS sorting of cells derived from Drosophila heads following brain injury and identified upregulation of multiple genes, such as Pvr receptor, Jun, Fos, and MMP1. Additional studies identified that Pvr positively activates the AP-1 transcription factor (TF) complex consisting of Jun and Fos, activation of which leads to the induction of MMP1. Finally, the authors found that disruption of endocytosis and endocytotic trafficking facilitates Pvr signaling and subsequently leads to induction of AP-1 and MMP1.

      Overall, this study provides important clues to understanding molecular mechanisms underlying TBI. The identified molecules linked to TBI in astrocytes could be potential targets for developing effective therapeutics. The obtained data from transcriptional profiling of astrocytes will be useful for future follow-up studies. The manuscript is well-organized and easy to read.

      However, the connection suggested by the authors between Pvr and AP-1, potentially mediated through the JNK pathway, lacks strong experimental support in my view. It's important to recognize that AP-1 activity is influenced by multiple upstream signaling pathways, not just the JNK pathway, which is the most well-characterized among them. Therefore, assuming that AP-1 transcriptional activity solely reflects the activity of the JNK pathway without additional direct evidence is unwarranted. To strengthen their argument, the study could benefit from direct evidence implicating the JNK pathway in linking Pvr to AP-1. This could be achieved through genetic studies involving mutants or transgenes targeting key components of the JNK pathway, such as Bsk and Hep, the Drosophila homologues of JNK and JNKK, respectively. Alternatively, employing p-JNK antibody-based techniques like Western blotting, while considering the potential challenges associated with p-JNK immunohistochemistry, could provide further validation. This important criticism regarding the molecular link between Pvr and AP-1 has been overlooked.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the roles of the axon growth regulator Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. The evidence supporting the claims of the authors is solid, although further work directly testing the roles of different sema7a isoforms would strengthen the analysis. The work will be of interest to developmental neuroscientists studying circuit formation.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They also attempt to outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, an autocrine role for Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes.

      Finally, critical controls are absent from the overexpression paradigm.

      Comments: Thank you for your valuable comments. We have analyzed the hair cell scRNA transcriptome data of zebrafish neuromasts from published works and have not identified known expression of receptors of the Sema7A protein, particularly PlexinC1 and Integrin β1 molecules (reference 4 and 15) in hair cells. This result suggests that the Sema7A protein molecule, either secreted or membrane-bound, does not possess its cognate receptor to elicit an autocrine function on the hair cells. Moreover, the GPI-anchored Sema7A lacks a cytosolic domain. So it is unlikely that Sema7A signaling directly induces the formation of presynaptic ribbons. We propose that the decrease in average number and area of synaptic aggregates likely reflects decreased stability of the synaptic structures owing to lack of contact between the sensory axons and the hair cells, which has been identified in zebrafish neuromasts (reference 38).

      Thank you for pointing out the missing critical control experiments. Additional control experiments (lines 333-346) with a new figure (Figure 5) have been added.

      These issues weaken the claims made by the authors including the statement that they have identified differential roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Comments: We have rephrased our statement and argue in lines 428-430 that our experiments “suggest a potential mechanism for hair cell innervation in which a local Sema7Asec diffusive cue likely consolidates the sensory arbors at the hair cell cluster and the membrane-anchored Sema7A-GPI molecule guides microcircuit topology and synapse assembly.”

      The manuscript itself would benefit from the inclusion of details in the text to help the reader interpret the figures, tools, data, and analysis.

      Comments: We have made significant revisions to the text and figures to improve clarity and consistency of the manuscript.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtacrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below).

      Comments: Thank you for making this point. To investigate the presence of both sema7a transcripts in the hair cells of the lateral-line neuromasts, we used the Tg(myo6b:actb1EGFP) transgenic fish to capture the labeled hair cells by fluorescence-activated cell sorting (FACS) and isolated total RNA. Using transcript-specific DNA oligonucleotide primers, we have identified the presence of both sema7a transcript variants in the hair cells of the neuromast. Even though we have not developed transcript-specific knockout animals, we speculate that the presence of both transcript variants in the hair cells implies that they function in distinct fashions. We have changed our interpretation in lines 32-34 to “Our findings propose that Sema7A likely functions both as a juxtacrine and as a secreted cue to pattern neural circuitry during sensory organ development.”

      In future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us determine the role of the membrane-bound Sema7AGPI molecule as well as the Sema7Asec in sensory arborization and synaptic assembly.

      In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Comments: Thank you for this insightful comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit, where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory-axon collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axons associate with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capturing collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Reviewer #3 (Public Review):

      Summary:

      This study demonstrates that the axon guidance molecule Sema7a patterns the innervation of hair cells in the neuromasts of the zebrafish lateral line, as revealed by quantifying gain- and loss-of-function effects on the three-dimensional topology of sensory axon arbors over developmental time. Alternative splicing can produce either a diffusible or membrane-bound form of Sema7a, which is increasingly localized to the basolateral pole of hair cells as they develop (Figure 1). In sema7a mutant zebrafish, sensory axon arbors still grow to the neuromast, but they do not form the same arborization patterns as in controls, with many arbors overextending, curving less, and forming fewer loops even as they lengthen (Figure 2,3). These phenotypes only become significant later in development, indicating that Sema7a functions to pattern local microcircuitry, not the gross wiring pattern. Further, upon ectopic expression of the diffusible form of Sema7a, sensory axons grow towards the Sema7a source (Figure 4). The data also show changes in the synapses that form when mutant terminals contact hair cells, evidenced by significantly smaller pre- and post-synaptic punctae (Figure 5). Finally, by replotting single cell RNA-sequencing data (Figure 6), the authors show that several other potential cues are also produced by hair cells and might explain why the sema7a phenotype does not reflect a change in growth towards the neuromast. In summary, the data strongly indicate that Sema7a plays a role in shaping connectivity within the neuromast.

      Strengths:

      The main strength of this study is the sophisticated analysis that was used to demonstrate fine-level effects on connectivity. Rather than asking "did the axon reach its target?", the authors asked "how does the axon behave within the target?". This type of deep analysis is much more powerful than what is typical for the field and should be done more often. The breadth of analysis is also impressive, in that axon arborization patterns and synaptic connectivity were examined at 3 stages of development and in three-dimensions.

      Weaknesses:

      The main weakness is that the data do not cleanly distinguish between activities for the secreted and membrane-bound forms of Sema7a, which the authors speculate may influence axon growth and synapse formation respectively. The authors do not overstate the claims, but it would have been nice to see some additional experimentation along these lines, such as the effects of overexpressing the membrane-bound form,

      Comments: We have accepted this useful suggestion. In lines 333-346 and in Figure 5 we have demonstrated the impact of overexpressing the membrane-bound transcript variant on arborization pattern of the sensory axons.

      Some analysis of the distance over which the "diffusible" form of Sema7a might act (many secreted ligands are not in fact all that diffusible), or

      Comments: We have reported this in lines 311-317 and in Figure 4F,G.

      Some live-imaging of axons before they reach the target (predicted to be the same in control and mutants) and then within the target (predicted to be different).

      Comments: We have accepted this useful suggestion. We demonstrate the dynamics of the sensory arbors that are attracted to an ectopic Sema7Asec source in lines 325-332, Figure 4I,J; Figure 4—figure supplement 2A, and Videos 13-16.

      Clearly, although the gain-of-function studies show that Sema7a can act at a distance, other cues are sufficient. Although the lack of a phenotype could be due to compensation, it is also possible that Sema7a does not actually act in a diffusible manner within its natural context. Overall, the data support the authors' carefully worded conclusions. While certain ideas are put forward as possibilities, the authors recognize that more work is needed. The main shortcoming is that the study does not actually distinguish between the effects of the two forms of Sema7a, which are predicted but not actually shown to be either diffusible or membrane linked (the membrane linkage can be cleaved). Although the study starts by presenting the splice forms, there is no description of when and where each splice form is transcribed.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the absence of a signal.

      We therefore utilized fluorescence-activated cell sorting (FACS) to capture the labeled hair cells and isolated total RNA to perform RT-PCR using transcript-specific DNA oligonucleotide primers. We identified the presence of both the secreted and the membrane-bound transcripts in four-day-old neuromasts (lines 80-84, Figure 1B-D).

      Additionally, since the mutants are predicted to disrupt both forms, it is a bit difficult to disentangle the synaptic phenotype from the earlier changes in circuit topology - perhaps the change at the level of the synapse is secondary to the change in topology.

      Comments: Thank you for the insightful suggestion. We have analyzed the relationship between the sensory arbor network topology and the distribution of postsynaptic structures (lines 384-403, Figure 6G-I). We identified that the distribution of the postsynaptic aggregates is closely associated with the topological attributes of the sensory circuit. We further clarify the potential origin of disrupted synaptic assemblies in sema7a-/- mutants in lines 380-382 and lines 417-420.

      Further, the authors do not provide any data supporting the idea that the membrane bound form of Sema7a acts only locally. Without these kinds of data, the authors are unable to attribute activities to either form.

      Comments: We have accepted this useful suggestion and have prepared the Figure 5 with the necessary details.

      The main impact on the field will be the nature of the analysis. The field of axon guidance benefits from this kind of robust quantification of growing axon trajectories, versus their ability to actually reach a target. This study highlights the value of more careful analysis and as a result, makes the point that circuit assembly is not just a matter of painting out paths using chemoattractants and repellants, but is also about how axons respond to local cues. The study also points to the likely importance of alternative splice forms and to the complex functions that can be achieved using different forms of the same ligand.

      Reviewer #4 (Public Review):

      Summary:

      The work by Dasgupta et al identifies Sema7a as a novel guidance molecule in hair cell sensory systems. The authors use both the genetic and imaging power of the zebrafish lateral-line system for their research. Based on expression data and immunohistochemistry experiments, the authors demonstrate that Sema7a is present in lateral line hair cells. The authors then examine a sema7a mutant. In this mutant, Sema7a protein levels are nearly eliminated. Importantly, the authors show that when Sema7a is absent, afferent terminals show aberrant projections and fewer contacts with hair cells. Lastly the authors show that ectopic expression of the secreted form of Sema7a is sufficient to recruit aberrant terminals to non-hair cell targets. The sema7a innervation defects are well quantified. Overall, the paper is extremely well written and easy to follow.

      Strengths:

      (1) The axon guidance phenotypes in sema7a mutants are novel, striking and thoroughly quantified.

      (2) By combining both loss of function sema7a mutants and ectopic expression of the secreted form of Sema7a the authors demonstrate that Sema7a is both necessary and sufficient to guide sensory axons.

      Weaknesses:

      (1) Control. There should be an uninjected heatshock control to ensure that heatshock itself does not cause sensory afferents to form aberrant arbors. This control would help support the hypothesis that exogenously expressed Sema7a (via a heatshock driven promoter) is sufficient to attract afferent arbors.

      Comments: Thank you for the suggestion. We have added the uninjected heatshock control experiment in Figure 5 and described experimental details in the text, lines 343-345.

      (2) Synapse labeling. The numbers obtained for postsynaptic labeling in controls do not match up with the published literature - they are quite low. Although there are clear differences in postsynaptic counts between sema7a mutants and controls, it is worrying that the numbers are so low in controls. In addition, the authors do not stain for complete synapses (pre- and post-synapses together). This staining is critical to understand how Sema7a impacts synapse formation.

      Comments: Thank you for raising this issue. We believe the low average numbers of the postsynaptic punctae in control neuromasts arise from lack of formation of postsynaptic aggregates beneath the immature hair cells, which are abundant in early stages of neuromast maturation. We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We do not think analyzing the complete synapse structure will add much to our understanding of how Sema7A influences synapse formation and maintenance.

      (3) Hair cell counts. The authors need to provide quantification of hair cell counts per neuromast in mutant and control animals. If the counts are different, certain quantification may need to be normalized.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells, and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Developmental delay. It is possible that loss of Sema7a simply delays development. The latest stage examined was 4 dpf, an age that is not quite mature in control animals. The authors could look at a later age, such as 6 dpf to see if the phenotypes persist or recover.

      Comments: The homozygous sema7a-/- mutants are unviable and die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). These observations suggest that if the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of disrupted arborization patterns over time. But instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but a specific outcome of the loss of Sema7A protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      Issue 1: One of the most interesting conclusions in this manuscript is the function of the GPI-anchored vs. secreted form of Sema7a in axon structure and synapse formation. In lines 357-360 of the discussion (for example) the authors state that they have shown that the GPI-anchored form of Sema7a is responsible for contact-mediated synapse formation while the secreted form functions as a chemoattractant for axon arbor structure. "We have discovered dual modes of Sema7A function in vivo: the chemoattractive diffusible form is sufficient to guide the sensory arbors toward their target, whereas the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses." However, the data do not support this conclusion. Specifically, no analysis is done showing unique expression of either isoform in hair cells and no functional analysis is done to conclusively determine which isoform is important for either phenotype.

      Comments: We have shown that both sema7a transcripts are expressed in the hair cells of four-day-old neuromasts (lines 78-84, Figure 1C,D). Ectopic expression of the sema7asec transcript variant robustly attracts the lateral-line sensory arbors toward itself, whereas ectopic expression of the sema7aGPI variant fails to impart sensory guidance from a distance, suggesting that the membrane-bound form likely participates in contact-mediated neural guidance. These experiments decisively show, for the first time in zebrafish, the dual modes of Sema7A function in vivo. However, we agree that the sema7aGPI transcript-specific knockout animal would be essential to conclusively prove that the membrane-attached form is primarily involved in forming accurate neural circuitry and contact-mediated formation and maintenance of synapses. Hence, we have very carefully stated in lines 427-428 that “the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses”. We will follow up on this suggestion in our upcoming manuscript that will incorporate transcript-specific genetic ablations.

      Though the authors present RT-PCR analysis of sema7a isoforms, it is not interpretable. The second reverse primer will also recognize the full-length transcript (from what I can gather) so it does not simply show the presence of the secreted form. Is there a unique 3'UTR for the short transcript that can be used? Additionally, for the GPI-anchored version can you use a forward primer that is not present in the short isoform? This would shed some light on the respective levels of both transcripts.

      Comments: The C-termini of the two transcript variants are distinct and we have designed distinct primers that will selectively bind to each transcript (lines 503-511). Since we have not performed quantitative polymerase chain reaction (qPCR), the relative levels of each transcript are hard to determine.
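
      The specificity question raised by the reviewer amounts to asking whether a reverse primer's binding site occurs in one transcript or in both. The sketch below frames that as a string check. The sequences are short invented toy strings, not the real sema7a transcripts, and all names are hypothetical; a real check would use the deposited transcript sequences and a dedicated primer-specificity tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_primer_targets(primer, transcripts):
    """Names of transcripts (sense strand, 5'->3') that a reverse primer can bind.

    A reverse primer anneals to the sense strand, so its binding site is the
    reverse complement of the primer sequence.
    """
    site = reverse_complement(primer)
    return [name for name, seq in transcripts.items() if site in seq.upper()]


# Toy transcripts: a shared 5' region followed by distinct 3' ends.
shared_region = "ATGGCTCCAGTTCTGAAG"
toy_transcripts = {
    "GPI_variant_toy": shared_region + "GGATCCTTACGCA",
    "sec_variant_toy": shared_region + "TTGACCAGTAAGC",
}

# A reverse primer placed in the shared region binds both toy transcripts...
primer_in_shared = reverse_complement("CCAGTTCTGAAG")
# ...whereas one placed in the distinct 3' end binds only one of them.
primer_in_unique_end = reverse_complement("GGATCCTTACGCA")

both = reverse_primer_targets(primer_in_shared, toy_transcripts)
only_gpi = reverse_primer_targets(primer_in_unique_end, toy_transcripts)
```

      In this toy case a primer in the shared region cannot distinguish the variants, while one in the distinct 3' end can, which is the design the authors describe.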

      Alternatively, and perhaps of more use, in situ hybridization using unique probes for each isoform would allow you to determine which are actually present in hair cells.

      Comments: We have tried this approach and explained the point earlier (refer to lines 203-212 of this response letter).

      To decisively state that these isoforms have unique functions in axon terminal structure and synapse formation, other experiments are also essential. For example, RNA-mediated rescue analyses using both isoforms would tell you which can rescue the axonal structure and synapse size/number phenotypes. Overexpression of the GPI-anchored form, like the secreted form in Figure 4, would allow you to determine if only the secreted form can cause abnormal axon extension phenotypes. Expression of both forms in hair cells (using a myo6b promoter for example) would allow assessment of their role in presynapse formation.

      Comments: We have ectopically expressed the sema7aGPI transcript variant near the sensory arbor network and observed that Sema7A-GPI fails to impart sensory axon guidance from a distance.

      Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      For the overexpression experiments, expression of mKate alone (with and without heat shock) is also a critical control to include.

      Comments: We have incorporated two control experiments: (1) larvae injected with hsp70:sema7asec-mKate2 plasmid that were not heat shocked and (2) uninjected larvae that were heat shocked. We think these two controls are sufficient to demonstrate that the abnormal arborization patterns are not artifacts generated by plasmid injection and heat shock.

      Issue 2: A second concern is the lack of data showing support cell and hair cell formation and function is unaffected. Analysis of support and hair cell number with loss of Sema7a as well as simple analyses of mechanotransduction (FM4-64) would help alleviate concerns that phenotypes are due to disrupted neuromast formation and basic hair cell function rather than a specific role for Sema7a in this process.

      Comments: We have measured the hair cell numbers in both control and sema7a-/- mutants across developmental stages. We have added this to our submitted raw data.

      We have utilized the styryl fluorophore FM4-64 to test the mechanotransduction function of the hair cells in sema7a-/- mutants. We have detailed our findings in lines 137-141 and in Figure 2—figure supplement 1C,D.

      Expression analysis of Sema7a receptors would also help strengthen the argument for a specific effect on lateral line afferent axons.

      Comments: Thank you for this suggestion. Currently, we do not possess an RNA transcriptome dataset for the lateral line ganglion. This deficit limits a systematic screen for lateral-line sensory neuronal gene expression, either through antibody stains or via HCR-mediated in situ techniques. In the future we plan to develop an RNA transcriptome for the lateral-line ganglion and identify potential binding partners for Sema7A.

      Issue 3: The manuscript could also be improved to include more detail in some areas and less in others. In general, each section has a fairly long lead up but lacks important experimental details that would help the reader interpret the data. For example:

      Figure 1: What is the label for the lateral line axons? Is it a specific transgenic? The legend states that 3 asterisks indicate p<0.0001. What about the other asterisk combinations?

      Comments: We have clarified these issues in lines 118-121 and in lines 906-907.

      Figure 2: For the network analysis, are the traces for all axons that branch to innervate the neuromast?

      Comments: Yes, we have traced the entire arbor containing all the axons that branched from the lateral line nerve and extended toward the clustered hair cells. The three-dimensional traces depict a skeletonized representation of the arbor network.

      Can the tracing method distinguish individual axons?

      Comments: No, our goal is to understand how the axon-collective behaves around the clustered hair cells during development.

      How do you know where an end is versus continued looping?

      Comments: We have categorically defined the topological attributes in lines 187-191 and in Figure 3A.

      Also, are all neuromasts similarly affected or is there a divergence based on which organ you are imaging? What neuromast was imaged in this and other figures?

      Comments: Yes, all the neuromasts in the trunk and tail regions were affected similarly by the sema7a mutation. We did not observe any region-specific phenotypic outcome. We consistently imaged the trunk neuromasts, particularly the second, third, and fourth neuromasts.

      Discussion: The short discussion failed to put these findings into context or to discuss how this unique topological arrangement of axon terminals impacts function.

      Comments: We have added a new segment, lines 432-448, in the discussion section which mentions the potential role of the topological features in arranging the distribution pattern of the postsynaptic densities and thereby potentially influencing the network’s ability to gather sensory inputs through properly placed postsynaptic aggregates.

      Can you speculate on how the looping structure may alter number of synaptic contacts per axon for instance? For this, it would be useful to know if normally the synapses form on loops versus bare terminals.

      Comments: Thank you for this insightful suggestion. We have performed detailed analysis, as mentioned in lines 384-397, to characterize the distribution of the postsynaptic densities between the two topological attributes.

      Does this looping facilitate single axons contacting more hair cells of the same polarity? Would that be beneficial?

      Comments: Looping behaviors indeed facilitate the contact between the axons and the hair cells. As we have observed, the primary topological attribute that the sensory arbor network underneath the clustered hair cells adopts is a loop. The bare terminals are predominantly projected transverse to the clustered hair cells and lack contact with them. Whether a single axon, being part of a loop, preferentially contacts hair cells of the same polarity is yet to be determined. We can address this question by mosaic labeling a single axon in the arbor network and determining its association with the hair cells. We intend to do these experiments in our upcoming manuscript.

      Minor concerns:

      (1) For the stacked charts quantifying topological features, I found interpreting them challenging. Is it possible to put these into overlapping histograms or line graphs to better compare wild type to mutant directly?

      Comments: Thank you for your suggestion. We tried several ways to represent our data and found that the stacked charts optimally signify our analysis and depict the characteristic phenological differences between the control and the sema7a-/- mutants.

      (2) There are numerous strong statements throughout not directly supported by the data, e.g. lines 110-113; 206-208; 357-360 and others. These should be tempered.

      Comments: For lines 110-113, we have updated this section with new experiments and the new segment is represented in lines 115-126.

      For lines 206-208, we have updated the statement to “This result suggests that the stereotypical circuit topology observed in the mature organ may emerge through transition of individual arbors from forming bare terminals to forming closed loops encircling topological holes” in lines 225-227.

      Reviewer #2 (Recommendations For The Authors):

      The authors should be careful about making any assumptions about which form of sema7a is active in NMs. Their RT-PCR demonstrates the presence of both isoforms in a whole animal; however, whether they are similarly present in HCs is not investigated here.

      Comments: We have addressed this concern and have updated the manuscript with new experiments, detailed in lines 78-84.

      Also, there is an issue of translation and trafficking to the membrane with subsequent secretion. An important experiment that would address this question is expressing two sema7a isoforms in mutant HCs and asking whether this can suppress the mutant phenotype.

      Comments: Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      Presumably, sema7a is trafficked to the membrane during HC maturation. This is consistent with the authors' observation that sema7a localization is changing as NM mature. However, actin-sema7a co-labeling does not actually show whether sema7a is on the membrane. Labeling HCs with a membrane marker (transgene) would be much more convincing. Alternatively, can the authors show sema7a localization actually correlates with the presence of sensory axon terminals? They already have immunos that label both. Thus, this should be pretty straightforward.

      Comments: Thank you for these suggestions. We have addressed these issues in lines 112-114, and in lines 119-126.

      Figure 2 should have a control panel, so the reduced sema7a staining can be compared to the control side-by-side.

      Comments: We have depicted Sema7A staining in control neuromasts in multiple images, including Figure 1E, Figure 1H, and in Figure 2—figure supplement 1B. We have kept the control panel in the supplementary figure due to space restrictions in Figure 2.

      Arborization topology: While I appreciate the very careful characterization of the topology for wild-type and mutant NMs, I think it would be much more informative to mark individual axons and then analyze their topology. The main reason is that the authors cannot really distinguish whether some aspects of the topology they describe are due to the densely packed, overlapping terminals of multiple axons or reflect a characteristic, higher-order organization of individual axons. Because of this, they cannot be certain what is really happening with sema7a mutant terminals. Relatedly, while it is clear that the overall topology is abnormal in the mutant, the authors should be careful in concluding that sema7a regulates specific aspects of it. The overall structure is probably highly interconnected, so perturbing one parameter would likely affect all the others.

      Comments: Thank you for this comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference number 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory axon-collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axon-collective associates itself with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Experiments with the secreted sema7a isoform would be much more informative if they were compared/contrasted to the GPI anchored isoform.

      Comments: We added a new section, lines 338-351, and a new Figure 5 to address this issue.

      The phenotype of ectopic projections in sema7a overexpression experiments is pretty dramatic, especially given the fact that these were performed in wild-type animals. Does this mean that the phenotype would be even more dramatic in sema7a mutants, as they have more bare axon terminals according to the authors' analysis? Have the authors attempted this type of experiment?

      Comments: That is an interesting suggestion. We have not tested that yet. Our guess is that in the sema7a-/- mutants, the abundant bare terminals will be far more sensitive to an ectopic source of Sema7A. But even in the sema7a-/- mutants, other chemotropic cues are still functional, which may impart certain restrictions on how many bare terminals are allowed to leave the neuromast region.

      Reviewer #3 (Recommendations For The Authors):

      (1) No raw data are shown, such that it is difficult to assess variability across animals or within animals, just the overall trends within the whole dataset. Raw data need to be shown for every measurement, at least in supplemental figures. It would also be useful to reliably show control next to mutant in the same plot, as it is a bit hard to compare across panels, which occurs in several figures.

      Comments: We have uploaded all the raw data related to each experiment.

      (2) Given the focus on the two possible forms of Sema7a, the authors should use HCR or another form of reliable in situ hybridization to show the spatiotemporal pattern of expression of each isoform.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the lack of a signal.

      (3) The authors should explain the criteria used to select the 22 embryos used to analyze the effects of expressing diffusible Sema7a.

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      (4) Although arbors were imaged in live embryos, time is never presented as a variable, so I cannot tell whether axon topology was changing as the images were collected. This needs to be clarified.

      Comments: We imaged the trunk neuromasts of both control and sema7a-/- mutant live zebrafish larvae at 2, 3, and 4 dpf. We imaged the control and the sema7a-/- mutants of each developmental stage in parallel, within a span of two hours, and repeated these experiments multiple times to gather almost a hundred larvae from each genotype. Even though the sensory arbor network is dynamic, we believe that imaging both genotypes in parallel within a span of two hours, and averaging almost a hundred larvae from each genotype, minimizes the temporal variability observed in the arbor architecture.

      (5) Ideally, the authors should use CRISPR/cas-9 to create a mutation in the C-terminus that would prevent production of the GPI-anchored form and not of the diffusible form. I understand if this is too much work to do in a short time, and would be satisfied with another experiment that could distinguish roles for at least one isoform more clearly. For instance, it would be interesting to see an analysis of how far an axon can be from a source to detect diffusible Sema7a (live imaging would be ideal for this) and then to show that the effect is different when the membrane bound form is expressed.

      Comments: Thank you for this comment. We are currently working on generating transcript-specific knockout animals.

      We have added live timelapse video microscopy data in lines 330-337, Figure 4H-J, Figure 4—figure supplement 2, and Videos 15 and 16.

      We have added a new segment analyzing the membrane-bound transcript variant in lines 338-351.

      Reviewer #4 (Recommendations For The Authors):

      Feedback to authors

      Overall, this is a very important and novel study. Currently the manuscript does need revision.

      Major concerns:

      (1) Controls. For the ectopic expression of Sema7a, injection of a construct expressing Sema7a under a heatshock promoter is used to drive ectopic expression. No-heatshock (injected) animals are used as a control. In many systems heatshock can impact neuron morphology, and heatshock proteins are required for normal neurite and synapse formation. Please examine sensory axons in uninjected wildtype animals with heatshock.

      Comments: We have added this control experiment in a new segment, explained in detail in lines 348-350 and Figure 5.

      (2) Synapse staining - regarding Figure 5 and related supplement

      Understanding whether guidance defects ultimately impact synapse formation is an important aspect of this paper. Therefore, it is necessary to have accurate measurements of the number of complete synapses, and the overall numbers of pre- and postsynaptic components. Currently the data plotted in Figure 5 is extensive, but the way the data is laid out, the relevant comparisons are challenging to make. Perhaps include this quantification in the supplement, and move the data from the supplement to the main figure? The quantifications in the supplement are easier to follow and easier to compare between genotypes.

      Comments: We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We believe that showing only the average numbers will not reveal the changes in the distribution of the synaptic structures during development and across genotypes.

      Looking at the data itself, there seem to be some discrepancies with the synaptic counts compared to published work. While the CTBP numbers seem in order, the Maguk numbers do not. In both mutant and control there are many hair cells without any Maguk puncta/aggregates, leading to 0.75-1 postsynapses per hair cell (Figure 5 supplement H-I). Typically, the numbers should be more comparable to what was obtained for CTBP, 3-4 puncta per cell (Figure 5 supplement B-C), especially by 3-4 dpf. 3-4 CTBP or Maguk puncta per cell is based on previously published immunostaining and EM work.

      The Maguk immunostaining, especially at early stages (2-3 dpf) is challenging. To compound a challenging immunostain, around 2019 Neuromab began to outsource the purification of their Maguk antibody. After this outsourcing our lab was no longer able to get reliable label with the Maguk antibody from Neuromab.

      Millipore sells the same monoclonal antibody and it works well: https://www.emdmillipore.com/US/en/product/Anti-pan-MAGUK-Antibody-clone-K2886,MM_NF-MABN72

      I would recommend this source.

      Comments: Thank you for suggesting the new MAGUK antibody. We have utilized this new MAGUK antibody from Millipore and added a new segment in lines 389-408. In future publications we will utilize this antibody to capture the postsynaptic densities in the sensory arbors.

      The discrepancies in the postsynaptic punctae number in our control larvae may arise due to the reliability of the Neuromab MAGUK antibody. We have utilized this same antibody to stain the sema7a-/- mutants and have observed a significant decrease in MAGUK punctae number and area. On grounds of keeping parity between the control and the sema7a-/- mutants, we have decided to keep our experimental results in the manuscript.

      In addition to a more accurate Maguk label, a combined pre- and post-synaptic label is essential to understand whether synapses pair properly in the sema7a mutants. This can be accomplished using subtype specific antibodies using goat anti-mouse IgG1/Maguk and goat anti-mouse IgG2a/CTBP secondaries.

      Comments: Thank you for suggesting this. We are preparing another manuscript in which we will utilize this technique along with other suggestions to tease apart the role of distinct transcript variants in regulating neural guidance and synapse formation.

      (3) Does sema7a lesion impact the number of hair cells per neuromast? If hair cell numbers are reduced several of the quantifications could be impacted.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells, and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Could innervation just be developmentally delayed in sema7a mutants? At 4 dpf the sensory system is just starting to come online and could still be in the process of refinement. Did you look at slightly older ages, after the sensory system is functional behaviorally, for example, 6 dpf? Do the core phenotypes (synapse defects and excess arbors) persist at 6 dpf in the sema7a mutants?

      Comments: The homozygous sema7a-/- mutants are unviable and start to die at 6 dpf. We therefore restricted our analysis until 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of the disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Minor comments to address:

      Results

      Page 4 lines 89-91. For the readers, explain why you examined levels of Sema7a in rostral and caudal hair cells. Also, this sentence is, in general, a little bit misleading: it initially reads as though there is no difference in Sema7a at 1.5-4 dpf.

      Comments: In lines 44-48, we explain that the neuromast contains mechanoreceptive hair cells of opposing polarities that help it detect water currents from opposing directions. In lines 93-106, we tested whether the Sema7A level varies between the two polarities. We observed that the Sema7A level is similar between the two polarities of hair cells, but the average Sema7A intensity increases significantly over the developmental period of 2 dpf to 4 dpf in both rostrally and caudally polarized hair cells.

      Page 10-11 Lines 263-270. What was the frequency of these 2 outcomes- out of the 22 cases with ectopic expression?

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      Discussion

      Page 14 Lines 359-360. There is not enough evidence provided in this work to suggest that the membrane attached form of Sema7a is playing a role. Both the secreted and membrane form are gone in the sema7a mutants. If the membrane attached form was specifically lesioned, and resulted in a phenotype, then there would be sufficient evidence. Currently there is strong evidence for a distinct role for the secreted form. Although the authors qualify the outlined statement with the word 'likely', stating this possibility in the discussion take-home is misleading.

      Comments: In future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us differentiate between the roles of the membrane-bound Sema7AGPI molecule and the secreted Sema7Asec in sensory arborization and synaptic assembly.

      It might be interesting in either the intro or discussion to reference the role Sema3F in axon guidance in the mouse auditory epithelium. https://elifesciences.org/articles/07830

      Comments: We have added this reference in lines 61-64.

      Figures

      Please indicate on one of your Figures where the mutation is (roughly) in the sema7a mutant (in addition to stating it in the results).

      Comments: We have added this information in Figure 2—figure supplement 1A.

      Either state or indicate in a Figure where the epitope used to make the Sema7a antibody is located, to show that the antibody is predicted to recognize both isoforms.

      Comments: We have stated the details of the epitope in lines 528-529.

      Figure 2-S1: What is the scale in panel A? Is it different between mutant and wildtype?

      Comments: We have updated the images. New images are depicted in Figure 2—figure supplement 1A.

      Methods

      What were the methods used to quantify synapse number and area?

      Comments: We have added a new section in lines 702-708 to explain the measurement techniques.

    2. eLife assessment

      Dasgupta and colleagues make a valuable contribution to understanding how the guidance factor Sema7a promotes connections between mechanosensory hair cells and afferent neurons of the zebrafish lateral line system. The authors provide solid evidence that loss of Sema7a function results in fewer contacts between hair cells and afferents through comprehensive quantitative analysis. Additional work is needed to distinguish the effects of different isoforms of Sema7a to determine whether there are specific roles of secreted and membrane bound forms.

    3. Reviewer #1 (Public Review):

      Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They also attempt to outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, the effect of loss of Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes. These issues weaken the claims made by the authors including the statement that they have identified dual roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

    4. Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below). In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

    5. Reviewer #3 (Public Review):

      The data reported here demonstrate that Sema7a defines the local behavior of growing axons in the developing zebrafish lateral line. The analysis is sophisticated and convincingly demonstrates effects on axon growth and synapse architecture. Collectively, the findings point to the idea that the diffusible form of sema7a may influence how axons grow within the neuromast and that the GPI-linked form of sema7a may subsequently impact how synapses form, though additional work is needed to strongly link each form to its proposed effect on circuit assembly.

      Comments on revised submission:

      The revised manuscript is significantly improved. The authors comprehensively and appropriately addressed most of the reviewers' concerns. In particular, they added evidence that hair cells express both Sema7A isoforms, showed that membrane bound Sema7A does not have long range effects on guidance, demonstrated how axons behave close to ectopic Sema7A, and analyzed other features of the hair cells that revealed no strong phenotypes. The authors also softened the language in many, but not all places. Overall, I am satisfied with the study as a whole.

    6. Reviewer #4 (Public Review):

      This study provides direct evidence showing that Sema7a plays a role in axon growth during the formation of peripheral sensory circuits in the lateral-line system of zebrafish. This is a valuable finding because the molecules for axon growth in hair-cell sensory systems are not well understood. The majority of the experimental evidence is convincing, and the analysis is rigorous. The evidence supporting Sema7a's juxtacrine vs. secreted role and involvement in synapse formation in hair cells is less conclusive. The study will be of interest to cell, molecular and developmental biologists, and sensory neuroscientists.

    1. eLife assessment

      This study makes a valuable contribution to spatial transcriptomics by rigorously benchmarking cell-type deconvolution methods, assessing their performance across diverse datasets with a focus on biologically relevant, previously unconsidered aspects. The authors demonstrate the strengths of RCTD, cell2location, and SpatialDWLS for their performance, while also revealing the limitations of many methods when compared to simpler baselines. By implementing a full Nextflow pipeline, Docker containers, and a rigorous assessment of the simulator, this work offers robust insights that elevate the standards for future evaluations and provides a resource for those seeking to improve or develop new deconvolution methods. The thorough comparison and analysis of methods, coupled with a strong emphasis on reproducibility, provide solid support for the findings.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors)

      I only have one comment for improvement of this study and it has to do with the comparison of simulators that they conducted. There are many other simulators around now, including scDesign3, spaSim, SPIDER, SRTSIM, etc. Are any of those methods worth including in the comparison?

      Indeed, many of the mentioned simulators did not exist when we initially developed synthspot, and upon closer examination, they are not directly comparable to our tool.

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.
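      The scaling argument for scDesign3 can be checked with an ordinary least-squares line through the four reported timings. The sketch below is illustrative only: the helper names are hypothetical, and the 20,000-gene target is an assumed stand-in for a whole transcriptome, far outside the measured range.

```python
# Reported downsampling timings for scDesign3 (genes simulated, seconds elapsed).
genes = [10, 50, 100, 150]
seconds = [30, 130, 245, 360]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(genes, seconds)


def predicted_runtime_hours(n_genes):
    """Extrapolated runtime in hours, assuming the linear trend holds."""
    return (slope * n_genes + intercept) / 3600.0


# Assumption: roughly 20,000 genes for a whole-transcriptome simulation.
whole_transcriptome_hours = predicted_runtime_hours(20_000)
print(f"{slope:.2f} s per gene, about {whole_transcriptome_hours:.1f} h for 20,000 genes")
```

      Under these assumptions the fit gives a little over two seconds per gene, which translates to many hours per simulated dataset if the trend continues.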

      Nonetheless, the inherent modularity of our Nextflow framework makes it possible for users to simply run the deconvolution methods on data that has been simulated by other simulators if need be.

      Additionally, we have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      The other aspect of the simulation comparison that I'm missing is some kind of spatial metric. There are metrics about feature correlation, sample-sample correlation, library size, etc. But, what about spatial correlation (e.g., Moran's I or similar). Perhaps comparing the distribution of Moran's I across genes in a simulated and real dataset would be a good first start.

      We would like to clarify that synthspot does not actually simulate the spatial location of spots, but synthetic regions where spots from the same region share similar compositions. Hence, incorporating a spatial metric in the comparison is not feasible. However, as RCTD is the only method that explicitly uses spot locations in its model (Supplementary Table 2, "Location information"), we believe that generating synthetic datasets with actual coordinates would not significantly impact the conclusions of the study.
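      For reference, the statistic the reviewer proposes can be sketched as follows. This is a minimal, generic implementation of global Moran's I for one gene, not code from the benchmark; the chain-shaped neighbour matrix is a toy assumption. It makes explicit why spot coordinates (or at least a neighbour graph) are required as input.

```python
def morans_i(values, weights):
    """Global Moran's I for one feature.

    values:  one measurement per spot (e.g. expression of a single gene).
    weights: square spatial weight matrix, weights[i][j] > 0 if spots i, j are neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


def chain_weights(n):
    """Toy neighbour structure: spots on a line, each adjacent to the next."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w


clustered = morans_i([1, 1, 1, 0, 0, 0], chain_weights(6))    # similar neighbours
alternating = morans_i([1, 0, 1, 0, 1, 0], chain_weights(6))  # dissimilar neighbours
print(clustered, alternating)
```

      Positive values indicate that neighbouring spots carry similar expression and negative values indicate the opposite. Neither can be computed for synthetic regions that have no spot positions.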

      Reviewer #2 (Public Review)

      On the other hand, the authors state that in silver standard datasets one half of the scRNA-seq data was used for simulation and the other half was used as a reference for the algorithms, but the method of splitting the data, i.e., at random or proportionally by cell type, was not specified.

      The data was split proportionally by cell type. To clarify this, we have included an additional sentence in the main text under the first paragraph of “Cell2location and RCTD perform well in synthetic data”, as well as in Figure S2.
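      A split that is proportional by cell type can be sketched as below. This is an assumed illustration rather than the authors' code: the function name, the fixed seed, and the cell type labels are hypothetical.

```python
import random
from collections import defaultdict


def split_by_cell_type(cell_types, seed=0):
    """Split cell indices into two halves so each cell type is divided evenly.

    Returns (simulation_half, reference_half) as sorted lists of indices.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        by_type[label].append(idx)

    simulation_half, reference_half = [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        cut = len(indices) // 2
        simulation_half.extend(indices[:cut])   # used to generate synthetic spots
        reference_half.extend(indices[cut:])    # given to the methods as reference
    return sorted(simulation_half), sorted(reference_half)


# Hypothetical annotation vector with unequal cell type abundances.
labels = ["neuron"] * 60 + ["astrocyte"] * 30 + ["microglia"] * 10
sim_idx, ref_idx = split_by_cell_type(labels)
print(len(sim_idx), len(ref_idx))
```

      Splitting within each cell type, rather than over all cells at once, keeps rare populations represented in both halves.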

      Reviewer #2 (Recommendation for the authors)

      Figure legends in Figures 3, 4 and across most Supplementary material are almost illegible. Please consider increasing font size for better readability.

      Thank you for bringing this to our attention. The font size has been increased for all main and supplementary figures. Additionally, the supplementary figures have also been exported in higher resolution.

      Supplementary Notes Figure 2c reads "... total count per sampled multiplied by..."

      This has been adapted, as well as the captions of Supplementary Notes Figure 3c and 4c which had the same typo.

      Reviewer #3 (Public Review)

      The simulation setup has a significant weakness in the selection of reference single-cell RNAseq datasets used for generating synthetic spots. It is unclear why a mix of mouse and human scRNA-seq datasets were chosen, as this does not reflect a realistic biological scenario. This could call into question the findings of the "detecting rare cell types remains challenging even for top-performing methods" section of the paper, as the true "rare cell types" would not be as distinct as human skin cells in a mouse brain setting as simulated here.

      We appreciate the reviewer’s concern and would like to clarify that within one simulated dataset, we never mix mouse and human scRNA-seq data together. The synthetic spots generated for the silver standards are always sampled from a single scRNA-seq or snRNA-seq dataset. Specifically, for each of the seven public scRNA-seq datasets, we generate synthetic datasets with one of nine abundance patterns, resulting in a total of 63 synthetic datasets. These abundance patterns only affect the sampling priors that are used—the spots are still created with combinations of cells sampled from the same dataset.

      Furthermore, it is unclear why the authors developed Synthspot when other similar frameworks, such as SRTsim, exist. Have the authors explored other simulation frameworks?

      While there are other simulation frameworks available now, synthspot was designed to specifically address the requirements of our study, offering unique capabilities that make it suitable for deconvolution evaluation. Moreover, many of the simulators did not exist when we initially developed our tool. We have added the following rationale for why we developed synthspot in “Synthspot allows simulation of artificial tissue patterns”:

      “On the other hand, general-purpose simulators are typically more focused on other inference tasks, such as spatial clustering and cell-cell communication, and are unsuitable for deconvolution. For instance, generative models and kinetic models like those of scDesign3 and scMultiSim are computationally intensive and unable to model entire transcriptomes. SRTSim focuses on modeling gene expression trends and does not explicitly model tissue composition, while spaSim only models tissue composition without gene expression.”

      In our response to Reviewer 1 copied below, we also outline specific reasons why other simulators were not suitable for our benchmark:

      • scDesign3: The runtime of scDesign3 is quite long as a result of its generative model. The example provided in its tutorial only simulates 183 genes and takes over seven minutes when using four cores on a system with Intel Xeon E5-2640 CPUs running at 2.5GHz. In a small downsampling analysis, we simulated 10, 50, 100, and 150 genes with scDesign3 and observed runtimes of 30, 130, 245, and 360 seconds, respectively. This seems to indicate a linear relationship between the number of genes and the runtime, therefore rendering it unsuitable for simulating whole-transcriptome datasets for deconvolution.

      • spaSim: spaSim focuses on modelling cell locations in different tissue structures but does not provide gene expression data. It is designed for testing cell colocalization capabilities rather than simulating gene expression.

      • SPIDER: Although SPIDER appears to have some overlap with our work, it seems to be in the early stages of development. The GitHub repository contains only two scripts without any documentation, and the preprint does not provide instructions on how to use the tool.

      • SRTSim: SRTSim explicitly states in its publication that it is not suitable for evaluating cell type deconvolution, as its focus is on simulating gene expression data without modelling cell type composition.

      • scMultiSim: scMultiSim, like scDesign3, is limited in its capability to model the entire transcriptome.

      Finally, we would have appreciated the inclusion of tissue samples with more complex structures, such as those from tumors, where there may be more intricate mixing between cell types and spot types.

      We acknowledge the reviewer's suggestion and have incorporated a melanoma dataset from Karras et al. (2022) in response to this suggestion. This study profiled melanoma tumors by using both scRNA-seq and spatial technologies. The scRNA-seq consists of eight immune cell types and seven melanoma cell states. We have included this study as an additional silver standard and case study, the latter of which is presented in a separate section following the liver analysis (and a corresponding section in Methods).

      We found that method performances on synthetic datasets generated from this melanoma dataset follow previous trends (Figure S3-S5). However, the inclusion of the case study led to the following changes in the overall rankings: cell2location and RCTD are now tied for first place (previously RCTD ranked first), and Seurat and SPOTlight have swapped places. Despite these changes, the core messages and conclusions of our paper remain unchanged. All relevant figures (Figures 1a, 2, 3a, 4a, 6b, 7a, S3-S6, S9) have been updated to incorporate these new analyses and results.

      Reviewer #3 (Recommendation for the authors)

      To maintain consistency in the results, it is recommended to exclude the human scRNAseq set when generating synthetic spots. Furthermore, addressing the other significant weaknesses mentioned earlier would be beneficial.

      Please refer to our response to the public review where we address the same remark.

      It is essential to differentiate this work from previous benchmarking and simulation frameworks.

      In addition to the rationale on why we developed our own framework (see response to the public review), we have included the following text in the discussion that highlights our versatile approach when using a real spatial dataset for evaluation:

      “In the case studies, we demonstrated two approaches for evaluating deconvolution methods in datasets without an absolute ground truth. These approaches include using proportions derived from another sequencing or spatial technology as a proxy, and leveraging spot annotations, e.g., zonation or blood vessel annotations, that typically have already been generated for a separate analysis.”

      Furthermore, we conducted an extra analysis in the liver case study, generating synthetic datasets with one experimental protocol and using the remaining two as separate references (Figure S13). This further illustrates the usefulness of our simulation framework, which we mentioned by appending this sentence in the discussion:

      “As in our silver standards, users can select the abundance pattern most resembling the real tissue to generate the synthetic spatial dataset, as we have also demonstrated in the liver case study.”

    3. Reviewer #1 (Public Review):

      Cell type deconvolution is one of the early and critical steps in the analysis and integration of spatial omic and single cell gene expression datasets, and there are already many approaches proposed for the analysis. Sang-aram et al. provide an up-to-date benchmark of computational methods for cell type deconvolution.

      In doing so, they provide some (perhaps subtle) additional elements that I would say are above the average for a benchmarking study: i) a full Nextflow pipeline to reproduce their analyses; ii) methods implemented in Docker containers (which can be used by others to run their datasets); iii) a fairly decent assessment of their simulator compared to other spatial omics simulators. A key aspect of their results is that they are generally very concordant between real and synthetic datasets. It is also important that the authors include an appropriate "simpler" baseline method to compare against; surprisingly, several methods performed below this baseline. Overall, this study has the potential to set the standard of benchmarks higher because of these elements.

      The only weakness of this study that I can readily see is that this is a very active area of research and we may see other types of data start to dominate (CosMx, Xenium) and new computational approaches will surely arrive. The Nextflow pipeline will make the prospect of including new reference datasets and new computational methods easier.

    4. Reviewer #2 (Public Review):

      In this manuscript, Sang-aram et al. provide a systematic methodology and pipeline for benchmarking cell type deconvolution algorithms for spatial transcriptomic data analysis in a reproducible manner. They developed a tissue pattern simulator that starts from single-cell RNA-seq data to create silver standards and used spatial aggregation strategies from real in situ-based spatial technologies to obtain gold standards. By using several established metrics combined with different deconvolution challenges they systematically scored and ranked 12 different methods and assessed both functional and usability criteria. Altogether, they present a reusable and extendable platform and reach very similar conclusions to other deconvolution benchmarking papers, including that RCTD, SpatialDWLS and Cell2location typically provide the best results. Major strengths of the simulation engine include the ability to downsample and recapitulate several cell and tissue organization patterns.

      More specifically, the authors of this study sought to construct a methodology for benchmarking cell type deconvolution algorithms for spatial transcriptomic data analysis in a reproducible manner. The authors leveraged publicly available scRNA-seq, seqFISH, and STARMap datasets to create synthetic spatial datasets modeled after that of the Visium platform. It should be noted that the underlying experimental techniques of seqFISH and STARMap (in situ hybridization) do not parallel that of Visium (sequencing), which could potentially bias simulated data. Furthermore, to generate the ground truth datasets, cells and their corresponding count matrix are represented by simple centroids. Although this simplifies the analysis, it might not accurately reflect Visium spots, where cells could lie on a boundary and affect deconvolution results.
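      The centroid simplification can be illustrated with the sketch below, which assigns each cell to a spot only if its centroid falls inside the spot radius. This is an assumed reconstruction and not the authors' aggregation code: it uses a square grid for simplicity, and the default dimensions (55 µm spot diameter, 100 µm centre-to-centre distance) follow the Visium layout.

```python
import math
from collections import Counter, defaultdict


def aggregate_to_spots(cells, spot_diameter=55.0, center_distance=100.0):
    """Pool single cells into pseudo-spots and return cell type proportions.

    cells: iterable of (x, y, cell_type) with centroid coordinates in micrometres.
    Returns {(spot_x, spot_y): {cell_type: proportion}}.
    """
    radius = spot_diameter / 2.0
    counts = defaultdict(Counter)
    for x, y, cell_type in cells:
        # Nearest spot centre on a square grid (assumption; Visium is hexagonal).
        spot_x = round(x / center_distance) * center_distance
        spot_y = round(y / center_distance) * center_distance
        # A cell counts fully if its centroid is inside the spot, otherwise not at all.
        if math.hypot(x - spot_x, y - spot_y) <= radius:
            counts[(spot_x, spot_y)][cell_type] += 1
    return {
        spot: {t: n / sum(c.values()) for t, n in c.items()}
        for spot, c in counts.items()
    }


# Hypothetical cells: three near the origin, one between spots, one near (100, 100).
cells = [(5, 5, "A"), (10, -3, "A"), (-8, 2, "B"), (50, 50, "A"), (102, 98, "B")]
spots = aggregate_to_spots(cells)
print(spots)
```

      The all-or-nothing assignment is where the reviewer's concern arises: a real cell straddling a spot boundary would contribute part of its transcripts, whereas a centroid is either inside or outside.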

      The authors thoroughly and rigorously compare methods while addressing situational discrepancies in model performance, indicative of a strong analysis. The authors make a point to address both inter- and intra-dataset reference handling, which has a significant impact on performance, as the authors note in the text and conclusions. Indeed, supplying optimal reference data is important, potentially the most important factor, for achieving the best performance, and hence it is important to understand that experimental design or sample matching is at least as important as selecting the ideal deconvolution tool.

      Similarly, the authors conclude that many methods are still outperformed by bulk deconvolution methods (e.g. MuSiC or NNLS); however, it needs to be noted that these 'bulk' methods are also among the most sensitive when using an external (inter) dataset (S10), which likely resembles the more realistic scenario for most labs.

      As the authors also discuss, it's important to realize that deconvolution approaches are typically part of larger exploratory data analysis (EDA) efforts and require users to change parameters and input data multiple times. Thus, running time, computing needs, and scalability are probably key factors that researchers would like to consider when looking to deconvolve their datasets.

      The authors achieve their aim of benchmarking different deconvolution methods, and the results from their thorough analysis support the conclusion that creating cell type deconvolution algorithms that can handle both cell abundance and rarity throughout a given tissue sample is challenging.
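      To make the NNLS baseline concrete, the sketch below solves a non-negative least-squares problem for one spot by projected gradient descent. This is a deliberately simple stand-in written for illustration, not the solver used in the benchmark, and the signature matrix and spot profile are invented toy values.

```python
def nnls_projected_gradient(signatures, spot, n_iter=500):
    """Minimise ||signatures @ w - spot||^2 subject to w >= 0.

    signatures: genes x cell types matrix of reference expression profiles.
    spot:       observed expression vector for one spot (length = genes).
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # Gram matrix S^T S and projection S^T y.
    gram = [[sum(signatures[g][a] * signatures[g][b] for g in range(n_genes))
             for b in range(n_types)] for a in range(n_types)]
    sty = [sum(signatures[g][a] * spot[g] for g in range(n_genes)) for a in range(n_types)]
    # 1 / trace is a safe step size because trace >= largest eigenvalue.
    step = 1.0 / sum(gram[a][a] for a in range(n_types))

    w = [0.0] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[a][b] * w[b] for b in range(n_types)) - sty[a]
                for a in range(n_types)]
        w = [max(0.0, w[a] - step * grad[a]) for a in range(n_types)]  # project onto w >= 0
    return w


# Toy example: two cell types, three genes, spot = 70% type A + 30% type B.
signatures = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
spot = [7.0, 3.0, 5.0]
weights = nnls_projected_gradient(signatures, spot)
proportions = [w / sum(weights) for w in weights]
print(proportions)
```

      Because the estimate depends entirely on how well the reference signatures match the spot data, a baseline of this kind is expected to be sensitive to the choice of reference, which is the point raised above.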

      The reproducibility of the methods described will have significant utility for researchers looking to develop cell type deconvolution algorithms, as this platform will allow simultaneous replication of the described analysis and comparison to new methods.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, as well as patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping, where gateway regions introduce sensory information to the global synergistic workspace, where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals w/ Gaussian assumptions and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input48.”

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.
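      As a complementary, much simpler illustration of the definition quoted by the reviewer (synergy as the joint mutual information in excess of the largest marginal mutual information), the sketch below works through a single-target Gaussian case analytically. It is not the Phi-ID analysis used in the paper: the model Z = X + Y + noise, the unit-variance sources, and the function name are all assumptions made for this example.

```python
import math


def gaussian_mmi_pid(rho=0.0, noise_var=1.0):
    """Single-target PID (in nats) for Z = X + Y + noise with Gaussian variables.

    X and Y have unit variance and correlation rho; noise is independent with
    variance noise_var. Redundancy is taken as the minimum marginal mutual
    information (MMI), so synergy is the joint information minus the maximum marginal.
    """
    var_z = 2.0 + 2.0 * rho + noise_var            # Var(Z)
    var_z_given_one = (1.0 - rho ** 2) + noise_var  # Var(Z | X) = Var(Z | Y)
    i_joint = 0.5 * math.log(var_z / noise_var)     # I(X, Y; Z)
    i_single = 0.5 * math.log(var_z / var_z_given_one)  # I(X; Z) = I(Y; Z)
    return {
        "joint": i_joint,
        "redundancy": i_single,      # min of the two (equal) marginals
        "unique_x": 0.0,             # marginals are equal, so nothing is unique
        "unique_y": 0.0,
        "synergy": i_joint - i_single,
    }


independent = gaussian_mmi_pid(rho=0.0, noise_var=1.0)
correlated = gaussian_mmi_pid(rho=0.8, noise_var=1.0)
print(independent["synergy"], correlated["synergy"])
```

      In this toy setting, correlating the two sources raises the redundant term and lowers the synergistic term: the more the sources share, the less is gained by observing them jointly.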

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, is extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set to explore them in? For example, in the work by Newman, Beggs, and Sherrill (née Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which the information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y; whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic, unlike other ways of identifying edges that differ significantly between two groups or conditions, does not consider edges in isolation, but only as part of a single connected component.

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could usefully be applied to other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that the RSN are probably quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).
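
      The participation coefficient referred to in this response can be sketched as follows, using the standard network-theoretic definition P_i = 1 - sum_m (k_im / k_i)^2, where k_i is the total strength of node i and k_im is its strength towards module m. This is a minimal sketch under stated assumptions: the function name and toy data are hypothetical, the matrix is assumed symmetric and non-negative (for example, pairwise redundancy values), and the module labels stand in for RSN assignments. It is not the authors' code.

```python
import numpy as np

def participation_coefficient(weights, modules):
    """Participation coefficient of each node of a weighted undirected network.

    weights : (n, n) symmetric, non-negative matrix (assumed input)
    modules : length-n sequence of module labels (e.g. RSN assignments)
    Nodes with zero total strength are assigned 0.
    """
    weights = np.asarray(weights, dtype=float)
    modules = np.asarray(modules)
    strength = weights.sum(axis=1)  # k_i
    sum_sq = np.zeros_like(strength)
    for m in np.unique(modules):
        k_im = weights[:, modules == m].sum(axis=1)  # strength towards module m
        ratio = np.divide(k_im, strength,
                          out=np.zeros_like(strength), where=strength > 0)
        sum_sq += ratio ** 2
    pc = 1.0 - sum_sq
    pc[strength == 0] = 0.0
    return pc

# Toy network: nodes 0 and 2 split their strength evenly across two modules,
# nodes 1 and 3 connect to a single module only.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
pc = participation_coefficient(W, [0, 0, 1, 1])
```

      A node whose interactions are spread evenly over many modules approaches the maximum value, which matches the intuition of a broadcaster making the same information available to multiple modules.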

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.
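
      For concreteness, the set-difference notation would read as follows in the standard two-source partial information decomposition. This is a sketch that assumes Eqs. 2 and 3 of the manuscript take this standard form, which is not shown here; the symbols Red, Unq, and Syn are written generically.

```latex
\begin{align}
I(X;Z)   &= \mathrm{Red}(X,Y;Z) + \mathrm{Unq}(X;Z \setminus Y) \\
I(X,Y;Z) &= \mathrm{Red}(X,Y;Z) + \mathrm{Unq}(X;Z \setminus Y)
            + \mathrm{Unq}(Y;Z \setminus X) + \mathrm{Syn}(X,Y;Z)
\end{align}
```

      Here Unq(X; Z \ Y) denotes the information about Z that is available from X but not from Y, so the backslash reads as "excluding Y" rather than as conditioning or division.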

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean? Pretty uninformative. The matrix of redundant connections looks mostly black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • Caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". I didn't get it clear how do you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, are identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and an indication of what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    2. Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which the information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is still a standing issue that could be the basis for an improved analysis: the concepts of gateways and broadcasters allude to a directionality in the information flow. In fact, Figure 1 depicts Stage (i) and Stage (iii) as one-way processes. However, the identification of gateway and broadcaster regions relies on matrices that are symmetrical, i.e. they are not directed. Would it be possible to assess the gateway or broadcaster nature of a region taking into account the directionality of the information flow? In other words, if region X is a gateway, I would expect a synergistic relationship between the past of X,Y and present of Y (Y not being a gateway) towards the present of X; but not necessarily the other way around (i.e. the present of Y being less dependent on the past/present of X). A similar reasoning can be made for broadcasters.

      Although regional differences in haemodynamics complicate attempts to map directed information flow from fMRI recordings, perhaps the IID framework could be leveraged to extract directed data (i.e., there are many atoms that are explicitly directed). As an avenue for future research, it would be interesting to discuss the feasibility or limitations of such analysis.

      Also, there is something confusing in Figure 4B-C and its description. Awake should be similar to recovery (they are both awake, aren't they? Not much info is given, anyway); thus it seems counterintuitive that anesthesia minus awake looks so different than anesthesia minus recovery. The first is mostly blue-ish and the second is mostly red. Is it possible that Figure 4C is actually recovery minus anesthesia? That would make much more sense, also for Figure 4D. Please correct me if I am wrong.

    3. Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. They provided an interpretation of the reduced human consciousness states in terms of the proposed model of brain information processing, which could usefully be applied to other states of consciousness. The manuscript is well-organized, and the results are important and could be interesting for a broad range of literature, suggesting interesting new ideas for the field to explore.

      Comments on revised version:

      The authors have addressed all my comments made in the previous revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons warp the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest md-C, rather than md-L, are essential for swallowing behavior.

      36 hours after the ablation of the labellum, the signals of md-L were hardly observable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated after 36 hours and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP exhibited no significant changes in the brain after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies exhibited responses to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we utilized +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to mitigate potential false signals originating from the tool itself (Author response image 1, scale bar = 50μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies discussed in this manuscript, we also incorporated genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may emerge due to close proximity between neurons even with nSyb-GRASP.

      Author response image 1

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this juncture. At present, we can only posit the potential for synaptic connections between md-C and motor neurons. A firmer conclusion may be attainable with the use of comprehensive whole-brain connectome data in future studies.

      Author response image 2

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone.

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we encountered challenges in finding fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to utilize Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, Tmc, and nompC. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons are used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to that of the MN-GAL4 used in previous work (Manzo et al., PNAS, 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting the probable 6 Hz fluorescence changes with GCaMP6s.

      In general, we observed an increase in fluorescent signals during swallowing, but movement of the live flies during swallowing disturbed the imaging, so we could not obtain a clean calcium trace for md-C neurons. Patching the md-C neurons would be a more convincing approach to strengthen our findings. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching md-C somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed methods used in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe could confirm that md-C do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thanks for your suggestion; the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical properties of food (e.g. viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content is sensed and controls swallowing has remained elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces for regulation of swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only a few studies have reported how physical properties of food affect feeding, especially through pharyngeal mechanosensory neurons. First, they found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly in the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) as responsible for controlling swallowing by regulating motor neurons (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging is employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory to motor circuit for this rhythmic movement is admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs) (Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature, 2023, PMID: 37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Compared to the patterned light (0.1 s light on and 0.1 s light off) used in our optogenetic experiments, we noted no significant changes in their responses to continuous light stimulation. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in mechanosensory channel mutants nompC, piezo, and TMC, have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It's important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that most flies would be willing to drink water. The deprivation time was 12 hours for flies with nompC and Tmc mutated or flies with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 h of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and video 1&5.

      When fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could account for a large proportion of the time spent swallowing, whereas only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, which is significantly different. Although the percentage of flies displaying incomplete pumps is similar between nompC mutants and wild-type flies, their swallowing behavior is quite different, as shown in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might"

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants are quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported; this would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete filling for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): “A major weakness of the paper…” and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answers for reviewer #2 question d).
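
      The before/during comparison requested for Figure 4K can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names, the use of the pre-ingestion frames as the baseline F0, and the toy fluorescence values are all assumptions.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def delta_f_over_f(trace, baseline_frames):
    """Convert raw fluorescence to dF/F using the first `baseline_frames` frames as F0."""
    f0 = mean(trace[:baseline_frames])
    return [(f - f0) / f0 for f in trace]


def before_vs_during(trace, ingestion_start):
    """Return (mean dF/F before ingestion, mean dF/F during ingestion).

    `ingestion_start` is the hypothetical frame index at which the fly starts to ingest.
    """
    dff = delta_f_over_f(trace, ingestion_start)
    return mean(dff[:ingestion_start]), mean(dff[ingestion_start:])


# Toy trace (arbitrary units): four frames before ingestion, two frames during.
toy_trace = [100, 100, 100, 100, 150, 150]
before, during = before_vs_during(toy_trace, ingestion_start=4)
```

      Comparing the two means within the same preparation keeps the comparison in vivo on both sides, which is the point of the reviewer's request.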

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF did not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies can be used for feeding assays (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior is affected if the LSO or cibarium is damaged, so the labellum must be removed very carefully.

      Author response image 4

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for public review of reviewer #3. The time gap between applying ATP on md-C and activation of MN11 or MN12 did not show significant differences, and we think the reason is that the ex vivo conditions could not completely mimic the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D: The authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text (S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for the corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons was posted in the response to Reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been zoomed in. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained muscles in the cibarium with phalloidin, and the image is shown in Figure 4A; we did not find overlap between md-C neurons and muscles. An image of md-L neurons was posted as Author response image 4.

      In the majority of our experiments, we employed water to test swallowing behavior, while we used methylcellulose water solution to test swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP since they hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours for water deprivation. Flies that could not survive 24 h of deprivation were deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the time scale, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We maintain objectivity in our approach since, during the recording of swallowing behavior, the fly was fixed, and we exclusively used data for analysis when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results align with those generated by the time series analyzer V3 of ImageJ.

      For the assessment of total volume of ingestion, we followed the CAFE method, utilizing a measurable glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
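
      The two quantities described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline or the ImageJ plugin they used: the threshold-crossing pump detector, the `threshold` and `frame_rate_hz` parameters, and the toy values are hypothetical. The input trace is assumed to be the already inverted blue-food intensity in the cibarium ROI, so that rising values mean filling.

```python
def count_pumps(trace, threshold):
    """Count upward threshold crossings; each crossing is treated as one filling event."""
    pumps = 0
    above = trace[0] >= threshold
    for value in trace[1:]:
        now_above = value >= threshold
        if now_above and not above:
            pumps += 1
        above = now_above
    return pumps


def pump_frequency(trace, frame_rate_hz, threshold):
    """Pump frequency (Hz) = number of pumps / recording duration in seconds."""
    duration_s = len(trace) / frame_rate_hz
    return count_pumps(trace, threshold) / duration_s


def ingestion_rate_nl_per_s(volume_nl, feeding_time_s):
    """Ingestion rate (nL/s) = total ingested volume / feeding time."""
    return volume_nl / feeding_time_s


# Toy trace: 10 frames recorded at an assumed 10 frames per second (1 s total).
toy_trace = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
n_pumps = count_pumps(toy_trace, threshold=0.5)
freq_hz = pump_frequency(toy_trace, frame_rate_hz=10, threshold=0.5)
rate = ingestion_rate_nl_per_s(volume_nl=100, feeding_time_s=20)
```

      A detector of this kind only resolves pumps if the camera frame rate is well above twice the pump frequency, which is why the frame rate question raised by the reviewer matters for the measurement.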

      The changes seem small, in spite of the claim of statistical significance.

      The observed stability in pump frequency within a given genotype underscores the importance of even seemingly small changes, which are statistically significant. We speculate that the stability in swallowing frequency suggests the existence of a redundant mechanism to ensure the robustness of the process. Disruption of one channel might potentially be partially compensated for by others, highlighting the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influences emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for the corrections; all the places mentioned have been revised.

    2. Reviewer #3 (Public Review):

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only a limited number of studies have reported how physical properties of food affect feeding, especially via pharyngeal mechanosensory neurons. First, the authors found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly in the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) as responsible for controlling swallowing by regulating motor neurons (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

    3. eLife assessment

      This valuable study investigates the role of mechanosensory feedback during swallowing in adult Drosophila. The authors provide convincing evidence that three mechanotransduction channel genes are required for ingestion rhythms and localize the role of these genes to a specific subpopulation of pharyngeal mechanosensory neurons. However, there is incomplete evidence to support the conclusions that these sensory neurons are necessary for swallowing, respond to stretch during swallowing, and connect to the motor neurons that control swallowing. This work may be of interest to neuroscientists interested in motor control of feeding behavior.

    4. Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discovered that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons wrap the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      - The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying
      - The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively
      - They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Weaknesses:

      - A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated
      - Evidence that the defects they see in pumping can be specifically attributed to md-C is based on severing the labellum and allowing md-L neurons to degrade.
      - GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.
      - MN11/MN12 LexA lines are not included in the manuscript and their expression patterns (shared with the reviewers in the author response) do not appear to contain any motor neurons. Double labeling with previously described MN11 and MN12 motor neuron Gal4 lines is needed to support the claim that these LexA lines in fact label MN11 and MN12.
      - As seen in Figure Supplement 2, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone
      - Example traces of md-C calcium imaging during ingestion in vivo are not included, and evidence that md-C neurons respond to mechanical force is lacking
      - A positive control (perhaps demonstrating that sugar sensory neurons respond to sucrose in this preparation) is needed to assess whether the lack of response to sucrose ex vivo in Figure 4K is informative
      - Proximity between md-C neurons and muscles is not evidence that they sense stretch
      - Reporting of posthoc tests needs to be improved throughout the manuscript, as it is not clear which comparisons are noted with asterisks in the figures.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reply to comments:

      (1) It was not clear why the phylogenetic analysis included non-validated GPCRs that clustered with the validated peptidergic receptors. Would restricting the phylogenetic analyses only to confirmed peptidergic GPCRs alter the topology of the tree and subsequent conclusions of independent expansion?

      Thank you for this comment. In general, phylogenetic analyses become more robust if a larger diversity and fuller complement of sequences are included. With very sparse sampling, sequences that are homologous but not orthologous may be misleadingly grouped together, because intermediate sequences have been left out. For tree building, we thus did not want to focus only on experimentally validated receptors but also on all receptors that are phylogenetically related to the validated receptors. Only this approach can ensure a comprehensive exploration of the relationship of peptidergic receptors. The broader phylogenetic approach was also essential to identify orthologs to the experimentally validated Nematostella receptors across other cnidarian species.

      (2) Clearly, other neuropeptide signaling systems in cnidarians remain to be discovered but this paper represents a huge step forward.

      We appreciate this assessment of the paper. We agree that many systems remain to be discovered. Our paper will also help with the identification of further receptors both in Nematostella as well as other cnidarian species. Please note that we have made specific receptor-ligand predictions for several cnidarian species based on our phylogenetic analysis. Our phylogenies could also help prioritize the study of the remaining orphan Nematostella GPCRs.

      (3) There are limitations in what can be interpreted from single cell transcriptomic data but the data nevertheless provide the foundations for future studies involving i). detailed anatomical analysis of neuropeptide and neuropeptide receptor expression in N. vectensis using mRNA in situ hybridization and/or immunohistochemical methods and ii). functional analysis of the physiological/behavioral roles of neuropeptide signaling systems in N. vectensis

      We fully agree with this comment. The analysis of the available single-cell sequence resources clearly represents only the first step of anatomical and functional analyses. Our aim was to place the identified peptide-receptor interactions into a whole-organism context with cell type resolution, to highlight the potential complexity of peptidergic signaling in this organism and to facilitate the exploration and conceptualisation of our biochemical screen.

      Comments to authors

      (1) In future, when preparing manuscripts, please use page and line numbers; it makes the task below for reviewers much easier!

      We appreciate the suggestion and will do this for future manuscripts.

      (2) In the abstract the term "extensively wired" is used. In the context of neuropeptide mediated volume transmission this may not be an appropriate term to use because use of the word "wired" is likely to be associated with point-to-point type classical synaptic transmission; "extensively connected" would be better.

      Thank you for this comment. We have changed the text in the abstract to “extensively connected”.

      (3) Introduction: Please change "seven-transmembrane proteins and show a slower evolutionary rate than proneuropeptide..." to "seven-transmembrane proteins that show a slower evolutionary rate than proneuropeptide..."

      Changed.

      (4) Under the section "Creation of a Nematostella neuropeptide library", what is meant by "our regular expressions"? This needs to be rephrased to make it clearer what is meant.

      We have now rephrased the relevant sentence to make our approach clearer.

      “This predicted secretome was filtered with regular expressions to detect sequences with the repetitive dibasic cleavage sites (K and R in any combination) and amidation sites, using a custom script from a previous publication (Thiel et al., 2021).”

      and later:

      “Based on the MS data, we included the additional, non-dibasic N-terminal cleavage sites into our script that uses regular expressions to search for repetitive cleavage sites (Thiel et al., 2024) and re-screened the predicted secretome.”
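
      To illustrate what a regular-expression filter of this kind can look like, here is a minimal sketch. It is not the authors' script (Thiel et al., 2021; 2024): the motif `G[KR]{2}` (a glycine amidation site followed by a dibasic cleavage site), the `min_sites` threshold, the function names, and the toy sequences are all assumptions made for illustration.

```python
import re

# Assumed motif: amidation glycine followed by two basic residues (K and R in any combination).
AMIDATED_DIBASIC = re.compile(r"G[KR]{2}")


def count_cleavage_motifs(seq):
    """Count non-overlapping occurrences of the assumed amidation + dibasic motif."""
    return len(AMIDATED_DIBASIC.findall(seq.upper()))


def filter_candidates(secretome, min_sites=3):
    """Keep names of sequences that carry the motif at least `min_sites` times.

    `secretome` maps a sequence name to its amino-acid sequence.
    """
    return [
        name
        for name, seq in secretome.items()
        if count_cleavage_motifs(seq) >= min_sites
    ]


# Toy secretome: one repetitive precursor-like sequence and one sequence without the motif.
toy_secretome = {
    "candidate": "MKLLAAQGRFGKRAAQGRFGRRSSQGRFGKKEE",
    "control": "MKLLAASSTTPPLLEEDDAAVV",
}
hits = filter_candidates(toy_secretome)
```

      Requiring several repeats of the motif, rather than a single hit, is what makes the filter selective for precursors that encode multiple peptide copies; the non-dibasic N-terminal sites mentioned above would need an additional pattern.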

      (5) Under the section "Creation of a Nematostella neuropeptide library" the phrase "differ in the length of their N-terminus" needs to be changed to "differ in the length of their N-terminal region". The N-terminus is, as its name implies, one end of the peptide/protein so it can't have a length as such.

      Changed.

      (6) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates",

      Change:

      "For a more detailed analysis, we then reduced our sampled species to the cnidarian, the bilaterian with experimentally confirmed GPCRs and Petromyzon marinus, and the two placozoan species (Figure 2B)."

      To

      "For a more detailed analysis, we then reduced our sampled species to cnidarians, bilaterians with experimentally confirmed GPCRs and Petromyzon marinus, and two placozoan species (Figure 2B)."

      Changed.

      (7) Under the section "Analysis of metazoan class A GPCRs and selection of N. vectensis neuropeptide-receptor candidates" - change "We re-run" to "We re-ran"

      Changed.

      (8) Throughout the paper reference is made to a variety of neuropeptides that have or are predicted to have an N-terminal pyroglutamate. However, these are referred to without indicating this post-translational modification e.g. QGRFamide.

      This should be corrected throughout the paper, in the text, and figures. Two abbreviations for pyroglutamate are used in the literature:

      pQ, which shows that the encoded amino acid is Q (Glutamine)

      pE, which shows that the post-translationally modified amino-acid is glutamate (E)

      In the neuropeptide field, pQ seems to be more widely used than pE, so our recommendation would be to use pQ.

      In the revised version we now write pyroQ whenever we refer to the actual peptide. We now only use the peptide name without indicating this modification when we refer to the precursor of these peptides.

      (9) The title for Figure 5 is rather short and vague. A title like "Tissue-specific expression of neuropeptide precursors and receptors in Nematostella" seems more appropriate

      We appreciate the reviewer's input, and we have made the change accordingly. The revised figure legend now reads: “Tissue-specific expression of neuropeptide precursors and receptors (GPCRs) in N. vectensis.”

      (10) All of the figures in the paper have been saved in bitmap format (e.g. tiff), which means that the resolution of the figures may end up being poor in the published article. All of the figures in this paper should be saved in vector format (e.g. eps) so that there is no loss of resolution when the size of the file/figure is reduced.

      We have now uploaded all figures in vector format (.eps or .pdf) to prevent any loss of resolution.

      (11) In Figure 3 - supplement 2 - the neuropeptides are referred to here as PRGamides and GPRGamides. Some consistency is needed here. And in Figure B, the G of one of the GPRGamides is not shown in black.

      Thank you for spotting this mistake. We now give the correct peptide sequence in parentheses as "GPRGamide". We also highlighted the missing GPRGamide in the figure.

    2. eLife assessment

      This work identifies cnidarian neuropeptides and pairs them to their GPCR, then shows that neuropeptide signaling systems have evolved and diversified independently in cnidarians and bilaterians. Neuropeptide-receptor partners were experimentally identified using established and widely used methodologies including single cell mapping, providing compelling evidence for the conclusions of the paper. This impressive accomplishment provides fundamental new insights into the evolution of neuropeptide signaling systems and will be of broad interest to neurobiologists and evolution of development researchers.

    3. Joint Public Review:

      Neuropeptide signaling is an important component of nervous systems, where neuropeptides typically act via G-protein coupled receptors (GPCRs) to regulate many physiological and behavioral processes. Neuropeptides and their cognate GPCRs have been extensively characterized in bilaterian animals, revealing that a core set of neuropeptide signaling systems originated in common ancestors of extant Bilateria. Neuropeptides have also been identified in cnidarians, which are a sister group to the Bilateria. However, the GPCRs that mediate the effects of neuropeptides in cnidarians have not been identified.

      In this paper the authors perform a phylogenetic analysis of GPCRs in metazoans and report that the orthologs of bilaterian neuropeptide receptors are not found in cnidarians. This indicates that neuropeptide signaling systems have largely evolved independently in cnidarians and bilaterians. To accomplish this, they generated a library of putative and known neuropeptides computationally identified in the genome of the cnidarian sea anemone Nematostella vectensis. These peptides were systematically screened for their ability to activate any of the 161 putative Nematostella GPCRs.

      This work identified 31 validated GPCRs. These, together with GPCRs that cluster with them, were then used to demonstrate the independent expansion of GPCRs in cnidarian and bilaterian lineages. The authors then mapped validated receptors and ligands to the Nematostella single cell data to provide an overview of the cell types expressing these signaling genes. In addition, the authors have begun to analyze neuropeptide signaling networks in N. vectensis by showing potential signaling connections between cell types expressing neuropeptides and cell types expressing cognate receptors.

      This work is the most extensive pharmacological characterization of neuropeptide GPCRs in a cnidarian to date and thus represents an important accomplishment, and is one that will improve our understanding of how peptidergic signaling evolved in animals and its impact on evolution of nervous systems. In addition, this impressive work transforms our knowledge of neuropeptide signaling systems in cnidarians and provides the foundations for extensive functional characterization of neuropeptide systems in the context of nervous systems that exhibit radial symmetry, contrasting with the bilaterally symmetrical architecture of the majority of bilaterian nervous systems.

      The reviewers did not detect any weaknesses in the work but asked that the authors comment on the following points, which they have done in the revised version.

      (1) Clearly, other neuropeptide signaling systems in cnidarians remain to be discovered but this paper represents a huge step forward.

      (2) There are limitations in what can be interpreted from single cell transcriptomic data but the data nevertheless provide the foundations for future studies involving (i) detailed anatomical analysis of neuropeptide and neuropeptide receptor expression in N. vectensis using mRNA in situ hybridization and/or immunohistochemical methods and (ii) functional analysis of the physiological/behavioral roles of neuropeptide signaling systems in N. vectensis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Chen et al. entitled, "The retina uncouples glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While intriguing if determined to be true, these cell-specific conclusions are called into question due to the ex vivo experimental setup with the inclusion of RPE, the fact that the treatments were not cell-specific nor targeted at an enzyme specific to a certain cell within the retina, and no stable isotope tracing nor mitochondrial function assays were performed. Hence, without significant cell-specific methods and future experimentation, the primary claims are not supported.

      Strengths:

      This study attempts to improve on the issues that have limited the results obtained from previous ex vivo retinal explant studies by culturing in the presence of the RPE, which is a major player in the outer retinal metabolic microenvironment. Additionally, the study utilizes multiple pharmacologic methods to define retinal metabolism and substrate utilization.

      Weaknesses:

      A major weakness of this study is the lack of in vivo supporting data. Explant cultures remove the retina from its dual blood supply. Typically, retinal explant cultures are done without RPE. However, the authors included RPE in the majority of experimental conditions herein. However, it is unclear if the metabolomics samples included the RPE or not. The inclusion of the RPE, which is metabolically active and can be altered by the treatments investigated herein, further confounds the claims made regarding the neuroretina. Considering the pharmacologic treatments utilized with the explant cultures are not cell-specific and/or have significant off-target effects, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. While the authors use immunofluorescence to support their claims at multiple points, demonstrating the presence of certain enzymes in the photoreceptors, many of these enzymes are present throughout the retina and likely the RPE. Finally, the claims presented here are in direct contradiction to recent in vivo studies that used cell-specific methods when examining retinal metabolism. No discussion of this difference in results is attempted.

      Response: We agree with the reviewer that in vivo studies could be very interesting indeed. However, technologically it will be extremely difficult to (repeatedly/continuously) sample the retina of an experimental animal and to combine this with an interventional study, with a subsequent metabolomic analysis. We do not currently have access to such technology nor are we aware of any other lab in the world capable of doing such studies. Moreover, virtually all prior studies on retinal metabolism have been done on explanted retina without RPE. This includes the seminal studies by Otto Warburg in the 1920s. In contrast, our retinal samples, including all those used for the metabolomic analyses, included the RPE, except for the no RPE condition that was used as a comparator for the earlier investigations.

      We note that our metabolomic analysis was done for all five experimental conditions where each condition included at least five independent samples (each derived from different animals).

      The reviewer is correct to say that our organotypic explant cultures are early post-natal, with explantation performed at post-natal day 9 and culturing until day 15. Since our retinal explant system has been validated extremely well over more than three decades of pertinent research (see for instance: Caffé et al., Curr Eye Res. 8:1083-92, 1989), we are confident that photoreceptors mature in vitro in ways that are very similar to the in vivo situation. As far as studies in adult retina (i.e. three months or older) are concerned, this is indeed an important question that will be addressed in future studies. Studies employing stable isotope labelling may also be very informative and are planned for the future, also in order to properly determine fluxes. This will likely require an extension to our NMR hardware with a 15N channel probe, something that we plan on implementing in the future.

      We are aware that a number of questions relating to retinal metabolism are controversial and that the use of other methodology or experimental systems may lead to alternative interpretations. We have now included citations of other studies that use, for example, conditional and/or inducible knock-outs or in vivo blood sampling (e.g. Wang et al., IOVS 38:48-55, 1997; Yu et al., Invest Ophthalmol Vis Sci. 46:4728-33, 2005; Swarup et al., Am J Physiol Cell Physiol. 316:C121-C133, 2019; Daniele et al., FASEB Journal 36:e22428, 2022) and discuss the pros and cons of such approaches (e.g. in Lines 376-384; 454-472).

      Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2 - 4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes, and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable and I commend the authors for working to learn more about retina metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different than the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution; they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al. use pharmacological interventions to assess the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined though. It is also hard to know how these drugs penetrate the tissue and it needs to be validated that the intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function, and survival, which makes looking at a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again on time scales closer to minutes and hours instead of days.

      It is not clear from the text if the ex vivo samples with RPE/choroid intact are analyzed for metabolomics with the RPE/choroid still intact or if this is removed. If it is not removed, the comparison to the retina without RPE/choroid needs to be re-interpreted for the contribution of metabolites from the added tissue. The composition of the tissue is different and cannot be disentangled from the changes to the neural retina specifically.

      While the data is interesting and may give insights into some rod and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model the authors have over-interpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

      Response: We thank the reviewer for the insightful comments and agree that some of our interpretations may have been phrased too assertively. We have therefore rephrased and toned down our conclusions in many instances in the text, and changed the manuscript title to now read “Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle”.

      Nevertheless, when considering the major known metabolic pathways and their possible impact on metabolite patterns after the experimental manipulations used here, we believe our interpretations to be consistent with the data obtained. Conversely, the previously suggested retinal aerobic glycolysis cannot explain most of the data we have obtained. Furthermore, even a predominant use of the classical “full” Krebs-cycle/OXPHOS would not explain the metabolite patterns found (e.g. alanine, N-acetylaspartate (NAA)). While this does not in itself mean that our interpretations are all correct, they seem plausible in view of the data at hand and will hopefully stimulate further research on retinal energy metabolism using complementary technologies that were not available to us for the purpose of this study.

      We comment that our organotypic retinal explant cultures, while they do contain their very own, native RPE, do not comprise the choroidal vasculature (in our explantation procedure the RPE readily detaches from the choroid).

      As far as the drugs used on retinal explants are concerned, we note that:

      (1) all three compounds used are extremely well validated, with literally thousands of studies and decades of research to their credit (i.e., 1,9-dideoxyforskolin: >270 publications since 1984; Shikonin: >1000 publications since 1977; FCCP: >2800 publications since 1967),

      (2) all experimental conditions show clear and differential drug effects, as shown, for instance, by the principal component analysis in Figure 1I and the cluster analysis in Figure 2A,

      (3) the response patterns observed for key metabolites match the anticipated drug effects (e.g. decreased glucose consumption with 1,9-dideoxyforskolin; decreased lactate levels with Shikonin; lactate accumulation with FCCP).

      One can therefore be reasonably certain that these drugs did penetrate the explanted retina and that their respective drug targets were hit. Assessing dose-responses would certainly be interesting, however, the aim of this initial study was not pharmacodynamics but a general manipulation of energy metabolism. Moreover, given the extensive validation of these drugs, off-target effects seem not very likely at the concentrations used.

      We agree with the reviewer that using a longitudinal, time-series type of analysis could give additional insights. We note that each additional time-point will require retinae from 25 animals and a very resource-intensive and time-consuming metabolomic analysis, together with a significantly more complex multivariate analysis (metabolite, experimental condition, time). This is a completely new undertaking that is simply not feasible as an extension of the present study.

      To look at pathway activity in more direct ways is a very good idea. To this end, we aim to implement in the future an idea put forward by the reviewers, namely 13C-labeling and additionally 15N-labeling and tracing for specific metabolic fuels (e.g. glucose, lactate and anaplerotic amino acids such as glutamate and branched chain amino acids).

      The reviewer is of course correct to say that the culture condition is somewhat artificial and that this may have introduced changes in the metabolism. However, as noted above in the first response to reviewer #1, the organotypic retinal culture system, using a defined medium, free of serum and antibiotics, has been extremely well studied and validated for decades (cf. Caffé et al., Curr Eye Res. 8:1083-92, 1989). Importantly, this system allows retinal viability, histotypic organization, and function to be maintained over many weeks in culture. Moreover, most previous studies on retinal metabolism have also used explanted retina – acute or cultured – i.e. experimental approaches that are similar to what we have used and that may be liable to their own artefactual changes in metabolism. This includes the seminal, 1920s studies by Otto Warburg, or the 1980s studies by Barry Winkler, the results of which the reviewers do not seem to doubt.

      We further agree that studying retinal metabolism in a situation closer to in vivo conditions would be thrilling. However, to our knowledge, there is to date no retina model that fully mimics the complex interplay of the blood metabolome with metabolic tissue activity. This likely means that for each metabolic condition to be studied (e.g. hyperglycemia, cachexia, etc.), a fairly large number of animals will need to be sacrificed for the molecular investigation of ex vivo retinal biopsies, which would mean a tremendous animal burden.

      We hope the reviewer will appreciate that the revised manuscript now includes numerous improvements, along with new, additional datasets and figures, references to further relevant literature, and – as mentioned above – a more cautious phrasing of our interpretations and conclusions, including a more careful wording for the manuscript title.

      Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining the tissue with key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data support multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors. The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides robust quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in the susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provides novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening, C13 labeling or knock-out or over-expression experiments are necessary. Further, while there is good data that RPE cultures in vitro strongly recapitulate RPE phenotypes in vivo, ex vivo neural retina cultures undergo rapid death. Thus, conclusions about metabolism from explants should either be well correlated with existing literature or lead to targeted in vivo studies. This paper currently lacks both.

      Response: As mentioned above in the first answers to reviewers #1 and #2, we think of our study as a starting point that may provide novel directions for a whole series of investigations into retinal energy metabolism. In particular, novel technologies may in the future make it possible to decipher the different metabolic phenotypes of the 100+ distinct retinal cell types by in situ spatial metabolomics and lipidomics. Currently, we still have to limit the scope of our studies to only certain aspects of this topic. We thus agree that some of our interpretations need to be formulated more carefully and we have done so in the revised version of our manuscript. We also agree with the reviewer that carbon (13C) labelling and tracing studies will be very informative and will engage in such studies in the future. Besides 13C, we aim to further employ 15N labelled substrates, which are especially suitable for studying the fate of amino acids.

      As far as our organotypic retinal explant system is concerned, it is arguably one of the best validated such systems available (see responses to reviewers #1 and #2). While the reviewer is correct to say that the neuroretina without RPE degenerates relatively quickly in vitro, in our system, with the neuroretina and its native RPE cultured together, we can routinely culture the retina for four weeks or more, without major cell loss (Söderpalm et al., IOVS 35:3910-21, 1994; Belhadj et al., JoVE 165, 2020). Thus, our retinal cultures with RPE do not undergo rapid death. Within the time-frame of the present study (6 days in vitro) culturing-induced cell death is minimal and unlikely to influence our analyses. For further, more detailed answers to the reviewers’ questions please see our detailed point-to-point response below.

      We agree with the reviewer that eventually in vivo studies will be important to confirm our interpretations. As mentioned in our initial response to reviewer #1, such studies will be very challenging and new technologies may need to be developed before in vivo investigations can deliver the answers to the questions at hand (see answer to question Rev#3.17 below), especially if the interplay between substrate availability from the blood metabolome and the retinal metabolic pathway activity is to be studied.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      Rev#1.1. The animals should be screened for and lack rd8.

      Response: This is a pertinent question from the reviewer. Ever since we first became aware, around 2010, of the presence of rd8 mutations in certain mouse lines from major vendors (e.g. Charles River, Jackson Labs), we have set up regular screening of all our mouse lines for this Crb1 mutation. Accordingly, the mouse lines used in this study were confirmed to be free of the rd8 / Crb1 mutation. A corresponding remark has now been inserted into the SI materials and methods section (Lines 37-38).

      Rev#1.2. GLUT1 looks significantly different from in vivo to in vitro. Recommend co-staining with RHO and cone markers (PNA or CAR) to further delineate where it is being expressed. The in vitro cultures appear to have much shorter outer segments (OS). Considering OS biosynthesis is thought to drive a good deal of metabolic adaptations, how relevant is the in vitro model system to what is truly occurring in vivo?

      Response: The GLUT1 staining shown in Figure 1 displays the in vivo situation. Since this may not have been entirely clear from the previous figure legend, we have now labelled this as “in vivo retina” and distinguish it from “in vitro” samples in the legend to Figure 1 (Lines 774-778). As far as the comparison of GLUT1 staining in vivo (Figure 1A3) vs. in vitro (Figure S1C3) is concerned, in both situations a strong RPE labelling is clearly visible, with essentially no GLUT1 label within the neuroretina.

      Nevertheless, to better delineate the expression of GLUT1 in the outer retina, we have now performed an additional co-staining with rhodopsin (RHO) as rod marker and peanut agglutinin (PNA) as cone marker, as suggested by the reviewer (new supplemental Figure S1). In brief, this co-staining confirms the strong expression of GLUT1 in the RPE, while there is essentially no GLUT1 detectable in rod or cone photoreceptors.

      Retinal explants in long-term cultures do indeed have somewhat shorter outer segments compared to same age in vivo counterparts (Caffé et al., Curr Eye Res. 8:1083-1092, 1989). However, in the short-term cultures (6 DIV) and at the age studied here (P15) outer segments have only just started to grow out and are around 10 - 12 µm long, both in vitro and in vivo (cf. LaVail, JCB 58:650-661, 1978). Thus, the metabolism required for outer segment synthesis should be equivalent when in vitro and in vivo situations are compared. For considerations on outer segments in retinal explant cultures see also Rev#3.2 and Rev#3.29.

      Rev#1.3. Also, recent publications have shown that GLUT1 is expressed in the neuroretina including rods, cones, and Müller glia. Was GLUT1 not appreciated in these cells in your ex vivo samples and if so, why? Likewise, these same studies previously demonstrated GLUT1 resulted in rod degeneration but not cone. The results presented here differ significantly. Why the difference in results and is it secondary to the in vitro vs. in vivo setting? Furthermore, the authors state that they thought the no RPE situation would be similar to the GLUT1 inhibitor experimental condition but instead, they were vastly different. Is this secondary to the fact that GLUT1 is expressed outside the RPE?

      Response: We are aware that there is a controversy regarding GLUT1 expression in the neuroretina, please see also our response to question Rev#3.1 below. As far as our immunostaining for GLUT1 on in vivo retina is concerned, we find an unambiguous and very marked expression of GLUT1 in RPE cells, at both basal and apical sides. Compared to the RPE, the neuroretina appears devoid of GLUT1 staining. However, at very high gamma values a faint staining in the neuroretina becomes visible, a staining which from its appearance – processes spanning the entire width of the retina – is most compatible with Müller glia cells. Under normal circumstances we would have dismissed such a faint staining as background and false positive. Given the sometimes highly contradictory reports in the literature, we cannot fully exclude a weak expression of GLUT1 also in cells other than the RPE, with Müller glial cells perhaps being the most likely candidate. At any rate, GLUT1 expression in the neuroretina can only be much weaker than in the RPE, making its relevance for overall retinal metabolism unclear.

      As far as recent publications studying GLUT1 in the retina are concerned, we know of the study by Daniele et al. (FASEB Journal 36:e22428, 2022), which used a rod-specific, conditional knock-out of GLUT1 and found a relatively slow rod degeneration. We are not aware of a selective GLUT1 knock-out in cones, nor are we aware of conditional GLUT3 knock-outs in the retina. For further discussion of the Daniele et al. study please see Rev#3.13.

      The reviewer is right, initially we were thinking that, since GLUT1 was expressed only (predominantly) in RPE, the metabolic response to GLUT1 inhibition should look similar to the no RPE situation. However, this initial hypothesis did not consider a key fact: The RPE builds the blood retinal barrier and the tight-junction coupled RPE cells are a barrier to any larger molecule, including glucose. Removing the barrier by removing the RPE dramatically increases the availability of glucose to the retina, a phenomenon that is likely exacerbated by the expression of the high affinity/high capacity GLUT3 on photoreceptors (cf. Figure S1A). In other words, when the RPE is removed the outer retina is “flooded” with glucose and we believe that this is probably the main factor that explains why the metabolic response to GLUT1 inhibition (1,9-DDF group) is so different from the no RPE condition.

      We have now included an additional corresponding explanation in the discussion (Lines 422-429). Furthermore, we have added an entire new subchapter to the discussion to debate the expression of glucose transporters in the outer retina (Lines 454-472).

      Rev#1.4. Shikonin's mechanism of action via protein aggregation and lack of specificity for PKM2 vs PKM1 at 4 µM is an experimental limitation that needs to be taken into account. All treatments utilized are not cell-specific.

      Response: While the reviewer is correct to say that Shikonin may have multiple cellular targets and a diverse range of possible applications as an anti-inflammatory, antimicrobial, or anticancer agent (cf. Guo et al., Pharmacol. Res. 149:104463, 2019), numerous studies support its specificity for PKM2 over PKM1, at concentrations ranging from 1 – 10 µM (Chen et al., Oncogene 30:4297-306, 2011; Zhao et al., Sci. Rep. 8:14517, 2018; Traxler et al., Cell Metab. 34:1248-1263, 2022). We settled for 4 μM as an intermediate concentration, considering its effectiveness and specificity in previous studies. We have now inserted references detailing the specificity and concentration range of Shikonin into the SI Materials and Methods section (Line 62).

      The concern that “all treatments” are not cell-specific is debatable. Certainly, any given compound may have off-target effects; yet, since the compounds we used in our study have all been studied for decades (see above, initial response to Reviewer #2), their off-target profile is well established and unlikely to play an important role here. Moreover, in our study the cell specificity does not come from the compounds used but from where their targets are expressed. As shown in Figure 1A and in Figure S1C, Shikonin's target PKM2 is almost exclusively expressed in photoreceptor inner segments. Hence, it seems very reasonable to expect that the vast majority of the metabolomic changes observed after Shikonin treatment are related to photoreceptors. We note that this assertion would still be true even if there were a low-level expression of PKM2 in other retinal cell types and/or if Shikonin had moderate off-target effects on other enzymes, since the bulk of the effect on the quantitative metabolomic dataset would still originate from PKM2 inhibition in photoreceptors.

      Rev#1.5. What was the method of cone counting in Figure 1?

      Response: Cones were counted per 100 µm of retinal circumference based on an arrestin-3 staining (cone arrestin, CAR).

      This information is now included in the SI Materials and Methods section under “Microscopy, cell counting, and statistical analysis” (Lines 99-100).

      Rev#1.6. How do you know that FCCP is not altering RPE ox phos, disrupting the outer retinal microenvironment and leading to cell death, and therefore, the effects seen are not photoreceptor-specific but rather downstream from the initial insult in RPE?

      Response: We propose that FCCP will be acting on both photoreceptors and RPE cells (and all other retinal cell types) at essentially the same time, over the experimental time-frame. Thus, OXPHOS should be inhibited in all cells simultaneously. However, FCCP will primarily affect cells that actually use OXPHOS to a large extent, while cells relying on other metabolic pathways (e.g. glycolysis) will hardly be affected.

      We believe the very strong effect of FCCP, seen exclusively in rod photoreceptors, to be a direct drug effect. While we cannot fully exclude an indirect effect via the RPE – as proposed by the reviewer – we think this to be unlikely because:

      (1) RPE viability was not compromised by FCCP treatment.

      (2) If the reviewer's hypothesis were correct, then cone photoreceptors should also have been affected (e.g. because now the RPE consumes all glucose, leaving nothing for cones). However, cones were essentially unaffected by the FCCP treatment, making a dependence on RPE OXPHOS unlikely. This is especially so because blocking GLUT1 and glucose import on the RPE with 1,9-DDF had only relatively minor effects on rod photoreceptor viability but strongly affected cones. This indicates that the RPE is mainly shuttling glucose through to photoreceptors, especially to cones, and this function does not seem to be impaired by FCCP treatment.

      (3) We found that enzymes required for Krebs-cycle and OXPHOS activity (i.e. citrate synthase, fumarase, ATP synthase γ) are predominantly expressed in photoreceptors but virtually absent from RPE (Figure 3D, see also answer to following question).

      (4) The density of mitochondria (i.e. the target for FCCP) is far lower in RPE than in photoreceptors, as evidenced also by the COX staining shown in Figure 1A. Hence, photoreceptors are far more likely to be hit by FCCP treatment than RPE cells.

      To accommodate the reviewer's concern, we have now added a further comment into the discussion (Lines 440-442).

      Rev#1.7. While Figure 3D is interesting, it offers no significant insight into mechanisms as the enzyme levels are not being compared to control nor is mitochondrial fitness in these conditions being assessed, which would provide greater insight than just showing that these enzymes are present in the inner segments, which are known to be rich in mitochondria. Additionally, stating that the low ATP is secondary to decreased Krebs cycle activity and ox phos based on merely ATP levels is not supported by metabolite levels minus citrate nor ox phos enzyme levels or oxygen consumption. Also, citrate is purported to be decreased in the table in Figure 2 in the no RPE condition; however, Supplemental Figure 2 demonstrates this change is not significant then the same data is presented in Supplemental Figure 3 and it is statistically significant again. Why the difference in data and why is the same data being shown multiple times?

      Response: The immunostaining shown in Figure 3D shows the in vivo retina, or in other words the localization of enzymes in the native situation. Since this may not have been obvious in the previous manuscript version, we have added a corresponding comment to the legend of Figure 3 (Line 806). The localization of the Krebs-cycle/OXPHOS enzymes citrate synthase, fumarase, and ATP synthase mainly to photoreceptors, but not (or much less) to RPE, is another piece of evidence supporting the idea that OXPHOS is predominantly performed by photoreceptors (see also answer to previous question Rev#1.6).

      The decreased ATP levels (together with citrate, aspartate, NAA) shown in Figure 3 in the no RPE group are an indication that photoreceptor Krebs-cycle activity may be decreased but not abolished in the absence of RPE. Importantly, GTP levels are not reduced in the no RPE group (Figure 2). Since large amounts of GTP can only be synthesized by either SUCLG-1 in the Krebs-cycle or by NDK-mediated exchange with ATP, the most plausible interpretation is that Krebs-cycle dependent ATP-synthesis was decreased in the no RPE situation, but that the (mini) Krebs-cycle or Cahill-cycle, notably the step from succinyl-CoA to succinate, was running. Since there is no RPE in this group, this strongly suggests important Krebs-cycle/OXPHOS activity in photoreceptors where the majority of the corresponding enzymes are located (see above).

      We thank the reviewer for pointing out that the information on group comparisons may not have been presented with sufficient clarity. In the figures mentioned by the reviewer the data is shown and compared in different contexts: the table in Figure 2B and the data in Figure S3 (now renumbered to Figure S5) refer to two-way comparisons of treatment condition to control, to elucidate individual treatment effects. Meanwhile Figure S2 (now supplementary Figure S3) refers to a 5-way comparison for a general overview that puts all five groups in context with each other. These differences in comparisons and normalization to the respective common standards entail the use of different statistical tools, resulting in different p-values. The statistical testing approaches and thresholds are now disclosed in the figure legends, and additionally in the SI Materials and Methods section (Lines 145-155).

      Rev#1.8. When were the ex vivo samples taken for metabolomics, and if taken when significant TUNEL staining and cell death have occurred, are the changes in metabolism due to cell death or a true indication of differential metabolism? Furthermore, it is unclear if the metabolomics samples included the RPE or not. Considering these treatments will affect most cells in the retina and the RPE, which is included in the ex vivo samples, it is difficult to ascertain that these changes are secondary to the effects on photoreceptors alone.

      Response: The samples for metabolomics included the RPE (except for the no RPE condition) and were taken at the same time as the tissues for histological preparations and TUNEL assays, i.e. they were all taken at post-natal day 15. This has now been clarified in the SI Materials and Methods section (Lines 108-110).

      We cannot entirely exclude an effect of ongoing cell death caused by the different drug treatments on the retinal metabolome. However, since in the experimental treatments cell death was still comparatively low (even in the FCCP condition, overall cell death was only around 10% of the total retina), and the metabolomic analysis considered the entire tissue, the impact of cell death per se on the total metabolome will be comparatively minor (≤ 10%, i.e. within the typical error margin of the metabolomic analysis).

      As mentioned above, the drug treatments should in principle affect all retinal cells at the same time. However, only cells that express the drug targets (i.e. 1,9-DDF targets GLUT1 in RPE cells, Shikonin targets PKM2 in photoreceptors; cf. Figure 1A) should react to the treatment. Even FCCP, in the paradigm employed, will only affect those cells that rely heavily on OXPHOS. Our data indicate that this is almost certainly the case for rods, whereas cones, RPE cells, and essentially all of the inner retina are not affected by FCCP treatment, strongly suggesting that OXPHOS is of minor importance for these cell populations.

      Rev#1.9. Why were the FCCP and no RPE groups compared? If they have similar metabolite patterns as noted in Figure 2, would that suggest that FCCP's greatest effect is on the ox phos of RPE and the metabolite patterns are secondary to alterations in RPE metabolism? Also, the increase in citrate and decrease in NAD may be related to effects on RPE mitochondrial metabolism when comparing these groups, and the disruption of RPE metabolism may then result in PARP staining of photoreceptors.

      Response: The reason for the pair-wise comparison of the no RPE and FCCP groups initially was indeed the similarity in metabolite patterns. This was now rephrased accordingly in the results section “Photoreceptors use the Krebs-cycle to produce GTP” (Lines 218-219). The interpretation that the reviewer proposed here is interesting, but does not conform with the data analysis of this and other group comparisons.

      Instead, the similarity between the metabolic patterns found in the no RPE and FCCP groups further supports the idea that a lack of RPE decreases retinal OXPHOS and increases glycolysis. This interpretation is based on the following observations:

      (1) Mitochondrial density in the RPE is far lower than in photoreceptors (see COX staining in Figure 1A), thus quantitatively the metabolite pattern caused by a disruption of OXPHOS (via FCCP treatment) will be dominated by metabolites generated by photoreceptors. For the same reason the depletion of retinal NAD+, and the concomitant increase in photoreceptor PAR accumulation after FCCP treatment, is unlikely to be due to changes in RPE.

      (2) Similarly, citrate synthase (CS) was found to be almost exclusively expressed in photoreceptor inner segments, with little expression in RPE (Figure 3D). Hence, the quantitative increase of citrate levels after FCCP treatment can only originate in photoreceptors.

      (3) The comparison of the control (with RPE) against the no RPE group suggested an increase in (aerobic) glycolysis in the absence of RPE, evidenced notably by a retinal accumulation of lactate, BCAAs, and glutamate (Figure 3A). The very same metabolite pattern is seen for the FCCP treatment (Figure 1B), indicating a marked upregulation of glycolysis (Figure 6C). The latter observation suggests that photoreceptors, after disruption of OXPHOS, switch to an exclusively glycolytic metabolism, which, however, rods cannot sustain (Figure 1C, D).

      (4) Glucose consumption and lactate release are increased in the no RPE group vs. control (new Supplementary Figure 4). A similar increase in glucose consumption and lactate production is seen in the FCCP group, suggesting that the no RPE situation also disrupts OXPHOS in photoreceptors.

      Rev#1.10. The conclusions being reached are difficult to interpret secondary to the experimental procedures and the fact that the treatments are not cell-specific and RPE is included with the neuroretina as well. Likewise, stating FCCP is altering the Krebs cycle in the neuroretina is difficult to believe as there are no changes in the Krebs cycle when compared to the control, which also has RPE.

      Response: We agree with the reviewer that some of the conclusions may have been somewhat speculative. Accordingly, we have toned down our conclusions in several instances in the text, notably in the abstract, introduction, and discussion.

      When it comes to Krebs cycle intermediates a key limitation of our study is indeed the lack of carbon-tracing and metabolic flux analysis as noted by the reviewers, a limitation that we now highlight more strongly in the discussion of the revised manuscript (Lines 545-549). While it is highly probable that the flux of Krebs cycle intermediates is altered by FCCP, our steady-state data does not show significant changes in the metabolites citrate, fumarate, and succinate. However, our study does show a highly significant decrease in GTP levels, which as explained above, is a key indicator of Krebs cycle activity/inactivity. Moreover, while GTP levels were reduced also in the no RPE group, GTP was still significantly higher in the no RPE group compared to the FCCP treatment. Our interpretation of this finding is that there is Krebs-cycle/OXPHOS activity in the neuroretina, which is abolished by FCCP.

      Rev#1.11. Supplemental Figure 4C and D states that GAC inhibition affected only photoreceptors, but GAC is expressed throughout the retina and so the inhibition is altering glutamine-glutamate homeostasis throughout the retina. Clearly, based on histology, one can see that the architecture of the retina, especially at the highest dose, is lost likely because all cells are being affected. So it is not photoreceptor-specific and even at low doses one can see that the inner retina is edematous. Moreover, with such a high amount of TUNEL staining in the ONL, are rods more affected than cones?

      Response: In our hands the immunostaining for Glutaminase C (GAC) labelled predominantly cone inner segments, the OPL, and perhaps bipolar cells (Figure S1A). The deleterious effects mentioned by the reviewer are only seen at the highest concentration of the GAC inhibitor compound 968. This concentration (10 µM) is 100-fold higher than the dose that produces a significant loss of cones in the outer retina (0.1 µM). We therefore think that this data points to the extraordinary reliance of cones on glutamine and glutamate. As can be seen from the images (Figure S4C) illustrating the effects of 0.1 and 1 µM Compound 968 treatment, the ONL thickness is not significantly reduced by the GAC inhibitor. This strongly indicates that at these doses the rods are not affected by GAC inhibition.

      Rev#1.12. The no RPE vs 1,9 DDF data may be interpreted as preventing glucose transport in the RPE increases BCAA catabolism by the RPE, which has been shown to utilize BCAA in culture systems. To this end, when the RPE is not present, the BCAA is increased as compared to the control with RPE.

      Response: Our original interpretation of this data was that after GLUT1 inhibition and a correspondingly reduced retinal glucose uptake, the retina switched to an increasing use of anaplerotic substrates, including BCAAs. This is supported by the concomitant upregulation of the Cahill-cycle product alanine and the mini-Krebs-cycle product N-acetylaspartate (NAA). Yet, we agree with the reviewer that BCAAs could also be consumed by the RPE. We have now changed our conclusion at the end of the results chapter “Reduced retinal glucose uptake promotes anaplerotic metabolism“ to also highlight this possibility (Lines 261-262).

      Rev#1.13. It is unclear why so much effort is comparing the no RPE group to the treatment groups and not comparing the control group to the different treatment groups.

      Response: Previous studies – including the seminal studies of Otto Warburg from the early 1920s – had always used retina without RPE. This “no RPE” situation is therefore something of a reference for our entire study, which is why we dedicated more effort to its analysis. We have now inserted a corresponding remark into the manuscript (Lines 182-184).

      Rev#1.14. The conclusions are significantly overstated especially with regards to rods versus cones as these are not cell-specific treatments. For example, the control vs 1,9 DDF vs FCCP clearly shows that there is mitochondrial dysfunction due to decreased NAD, increased AMP/ATP ratio, decreased Asp but increased Gln, and a compensatory increase in lactate production.

      Response: We agree with the reviewer and have tried to phrase our statements in a more measured fashion. Notably, we have toned down our statements in the title, abstract, results, discussion, and several of the subchapter headings.

      Rev#1.15. While metabolic conclusions are drawn on serine/lactate ratio, this ratio is driven by the drastic changes in lactate and not so much serine in the treatment conditions as it was rather stable. Likewise, substrates beyond glucose have the potential to fuel the TCA cycle and make GTP via SUCLG1, such as fatty acids, other AAs, etc. Therefore, this ratio may not tell the entire story about anaplerotic metabolism. Furthermore, knowing that RPE utilize BCAAs to fuel their TCA cycle, the no RPE condition may simply have increased BCAAs due to lack of metabolism by the RPE, which drives the GTP/BCAA ratio. To state that the neuroretina was utilizing BCAAs for anaplerosis is not well supported based on the current data. Similarly, what is to say that the GTP/lactate ratio in the no RPE situation is not driven by the fact that the RPE is no longer present to act as acceptor of retinal lactate production or that more glucose is reaching the retina since the RPE is not present to accept and utilize that produced. Glucose uptake was not assessed to further address these issues.

      Response: We agree with the reviewer that metabolite ratios may not tell the full story underlying retinal metabolism. However, because they are based on robust, quantitative, and highly reproducible NMR data, they are an important part of the metabolomics toolbox. The reviewer correctly observed that the changes in lactate levels are more dramatic than those in serine. Still, serine was also significantly increased in the no RPE, 1,9-DDF, and Shikonin groups. Together with the lactate changes (same or opposite direction), the resulting serine/lactate ratios display marked alterations.

      When it comes to the supply of other potential energy substrates mentioned by the reviewer, i.e. fatty acids or amino acids other than BCAAs, these are only supplied in minimal amounts in the defined, serum free R16 medium (Romijn, Biology of the Cell, 63, 263-268, 1988) and – if used to any important extent – would be rapidly depleted by the retina. Thus, for a culture period of 2 days in vitro between medium changes these energy sources are not available and thus cannot be used by the retina.

      Our conclusion that the retina is using anaplerosis is based not only on the observations made in the no RPE group but also on, for instance, the metabolite ratios seen in the 1,9-DDF treatment group. In this group decreased glycolytic activity may correspond to increased serine synthesis and anaplerosis.

      As far as glucose uptake is concerned, we have analysed the medium samples at P15 (equivalent to the retina tissue collection time point) and now present data that addresses this question more directly via the consumption of glucose from and release of lactate to the culture medium (New Supplementary Figure 4C, D). This new dataset provides another independent observation showing that:

      (1) Glucose consumption/lactate release (i.e. aerobic glycolysis) is high in the no RPE situation but low in the control situation. In other words, retinal aerobic glycolysis is most likely stimulated by the absence of RPE.

      (2) 1,9-DDF treatment decreases glucose consumption/lactate release as would be expected from a GLUT1 blocker. Since ATP and GTP production are high nonetheless, this indicates that other substrates (i.e. anaplerosis) were used for retinal energy production, in agreement with the analysis shown in Figure 6C.

      (3) The FCCP treatment, which disrupts oxidative ATP-production, increases glucose consumption/lactate release in a way similar to the no RPE situation. Yet, the no RPE retina can still generate sizeable amounts of GTP but not ATP. Together, this provides further evidence that neuroretinal OXPHOS is decreased in the absence of RPE.

      Rev#1.16. The evidence for the mini-Krebs cycle is intriguing but weak considering it is based on certain enzymes being expressed in the photoreceptors, which had already been shown to be present in other publications, and a single ratio of metabolites that is increased in FCCP. One would expect this ratio to be increased under FCCP regardless. There is no stable isotope tracing with certain fuels to confirm the existence of the mini-Krebs cycle.

      Response: We thank the reviewer for this suggestion. We agree that our evidence for the mini-Krebs-cycle (and the Cahill-cycle) may be to some extent circumstantial and that additional technologies would help to obtain further supportive data. Still, here we would like to invite the reviewer to a thought experiment where he/she could try and interpret our data without considering the Cahill- or the mini-Krebs-cycle. At least we ourselves, when we engaged in such thought experiments, were unable to explain the data observed without these alternative energy-producing cycles. Most notably, we were unable to explain the strong accumulation of either alanine or N-acetyl-aspartate (NAA) when only considering glycolysis and (full) Krebs-cycle metabolism. Of course, this may still be considered “weak” evidence, and we expect that future studies including complementary technologies will either confirm or expand our interpretation of the existing data set.

      The suggestion to perform stable isotope-labelled tracing with potential alternative fuels (e.g. glutamate, glutamine, pyruvate, etc.) is very attractive indeed. While such studies are likely to shed further light on the metabolic pathways proposed, they will entail very extensive experimental work, with multiple different conditions and concentrations and a variety of analysis methods (e.g. a 1.7 mm NMR probe equipped with a 15N channel), which is currently not feasible as an extension of the present manuscript. Nevertheless, we will certainly consider this approach for future follow-up studies once such techniques are available and will screen for suitable collaboration partners. A corresponding comment on such future possibilities has now been inserted into the discussion (Lines 545-549).

      Rev#1.17. The discussion does not mention how this data contradicts a recent in vivo study looking at Glut1 knockout in the retina (Daniele et al. FASEB. 2022) or previous in vivo studies that suggest cones may be less sensitive to changes in glucose levels (Swarup et al. 2019). This is a key oversight.

      Response: We thank the reviewer for pointing this out. We now included these studies in the revised discussion in a new subchapter on the expression of glucose transporters in the outer retina (Lines 454-472). For a critical review of the Daniele et al., 2022 study please also see our more detailed response to question Rev#3.13 below.

      Rev#1.18. GAC is expressed in more than just cones so making cell-specific statements regarding fuel utilization is not well supported.

      Response: Our immunostaining for GAC revealed a strong expression in cone inner segments (Figure S1A3). While this does not exclude (relatively minor) expression in other retinal cell types, cones are likely to be more reliant on GAC activity than other cell types. See also answer above.

      Rev#1.19. Suggesting that rods utilize the mini-Krebs cycle based on AAT2 being seen in the inner segments without at least co-staining for RHO or PNA is weak evidence for such a cycle. AAT looks to be expressed in the inner segments of all photoreceptors.

      Response: We have taken up this suggestion from the reviewer and now provide an additional co-staining for AAT1 and AAT2 with rhodopsin. Note that in response to a pertinent comment from Reviewer #3 we have changed the abbreviation for aspartate aminotransferase from “AAT” to the more commonly used “AST” throughout the manuscript.

      New images showing a co-staining for AST1 and AST2 with rhodopsin now replace the former image set in Figure 7D. In brief, the new images show the expression of both AST1 and AST2 across the retina, with, notably, expression in the inner segments of photoreceptors but not in the outer segments, where rhodopsin is expressed.

      Reviewer #3 (Recommendations For The Authors):

      Rev#3.1. The staining for the glucose transporters GLUT1 and GLUT3 does not reflect what has previously been published by two different groups that were validated by cell-specific knockout mice. As mentioned by the author GLUT1 and GLUT3 have differences in transport kinetics, which would affect their metabolism. Therefore, the lack of GLUT1 in photoreceptors would suggest that photoreceptor metabolism is not faithfully replicated in this system. This difference from the previous literature should be discussed in the discussion.

      Response: As the reviewer pointed out, the expression of GLUT1 in the retina is somewhat controversial, with much older literature showing expression on the RPE, while some more recent studies claim GLUT1 expression in photoreceptors. For a brief discussion of our GLUT1 immunostaining please see also our answer to question Rev#1.3 above.

      Although the retinal expression of GLUT1 was outside the focus of our study, we feel we must address this point in more detail: In the brain the generally accepted setup for GLUT1 and GLUT3 expression is that low-affinity GLUT1 (Km = 6.9 mM) is expressed on glial cells, which contact blood vessels, while high-affinity GLUT3 (Km = 1.8 mM) is expressed on neurons (Burant & Bell, Biochemistry 31:10414-20, 1992; Koepsell, Pflügers Archiv 472, 1299–1343, 2020). This setup matches decreasing glucose concentration with increasing transporter affinity, for an efficient transport of glucose from blood vessels, to glial cells, to neurons. In the retina, the cells that contact the choroidal blood vessels are the tight-junction-coupled RPE cells. As shown by us and many others, RPE cells strongly express GLUT1 (cf. Figure 1A-3.). To ensure efficient glucose transport from the RPE to photoreceptors, photoreceptors must express a glucose transporter with higher glucose affinity than GLUT1. We show that this is indeed the case, with photoreceptors expressing GLUT3 (cf. Supplemental Figure 1-5.). While a partly teleological explanation does not per se prove that our data is correct, at least our data is plausible. In contrast, the glucose transporter setup sometimes claimed in the literature is biochemically implausible, i.e. it would require the flow of metabolites (glucose) to go against a gradient of transporter affinities, and we are not aware of an example of such a setup occurring anywhere in nature.

      However, at this point we cannot exclude low levels of GLUT1 expression on Müller glia cells or even photoreceptors. This expression could, for instance, be relevant in cases where cells were shuttling excess glucose – perhaps produced through gluconeogenesis – onwards to other retinal cells. Still, GLUT1 expression can only be minor when compared to RPE since a major expression would destroy the glucose affinity gradient (see above) required for efficient glucose shuttling into the energy hungry photoreceptors.
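
      The affinity argument above can be illustrated with a small calculation. The following is a minimal sketch, assuming simple Michaelis-Menten transport kinetics and using only the Km values quoted in this response (6.9 mM for GLUT1, 1.8 mM for GLUT3). The function name and the example glucose concentrations are hypothetical and chosen purely for illustration; the sketch ignores Vmax, transporter density, and any asymmetry in transport, so it is not a model of the retina.

```python
def fractional_saturation(glucose_mM: float, km_mM: float) -> float:
    """Fraction of maximal transport rate under simple Michaelis-Menten kinetics.

    v / Vmax = [S] / (Km + [S])
    """
    if glucose_mM < 0 or km_mM <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mM / (km_mM + glucose_mM)


# Km values as quoted in the response text above.
KM_GLUT1_MM = 6.9  # lower-affinity transporter
KM_GLUT3_MM = 1.8  # higher-affinity transporter

# Hypothetical glucose concentrations (mM), illustration only.
for glucose in (5.0, 2.0, 0.5):
    glut1 = fractional_saturation(glucose, KM_GLUT1_MM)
    glut3 = fractional_saturation(glucose, KM_GLUT3_MM)
    print(f"{glucose:4.1f} mM glucose: GLUT1 {glut1:.2f}, GLUT3 {glut3:.2f} of Vmax")
```

      Under these assumptions the lower-Km transporter operates closer to its maximal rate at every glucose concentration, and the relative advantage grows as glucose becomes scarcer, which is the sense in which a gradient of increasing affinity favours transport toward the photoreceptors.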

      To address this request by the reviewer (and also reviewer #1) we now discuss the question of glucose transporter expression in the outer retina in a new subchapter of the discussion (Lines 454-472).

      Rev#3.2. Photoreceptor metabolism and aerobic glycolysis are tied to photoreceptor function, as demonstrated by Dr. Barry Winkler. The authors should provide data or mention (if previously published) about photoreceptor OS growth and function in this system.

      Response: The studies of Barry Winkler (e.g. Winkler, J Gen Physiol. 77, 667-692, 1981) confirmed the original work of Otto Warburg and expanded on the idea that the neuroretina was using aerobic glycolysis. Importantly, Winkler used a very similar experimental setup as Warburg had used, namely explanted rat retina without RPE. In light of our data where we compare metabolism of mouse retina with and without RPE – where retina cultured without RPE confirms the data of Warburg and Winkler – it appears most likely that the purported aerobic glycolysis occurs mostly in the absence of RPE but only to a lower extent in the native retina.

      Photoreceptor outer segment outgrowth is somewhat slower in the organotypic retinal explant cultures compared to the in vivo situation (cf. Caffe et al., Curr Eye Res. 8:1083-1092, 1989 with LaVail, JCB 58:650-661, 1978; see also answer to reviewer #1). Importantly, organotypic retinal explant cultures and their photoreceptors are fully functional and remain so for extended periods in culture (Haq et al., Bioengineering 10:725, 2023; Tolone et al., IJMS 24:15277, 2023). This information has now been added to the manuscript discussion section, into the new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395).

      Rev#3.3. It is unclear from the description of the experiment in both the results and methods if 1,9DDF, Shikonin, and FCCP were added to both apical and basal media compartments or one or the other and should be specified. The details of what was on the apical compartment would be helpful, as the model is supposed to allow for only nutrients from the basal compartment (as indicated by the authors themselves). Is the apical compartment just exposed to air? How does this affect survival?

      Response: In organotypic retinal explant cultures the RPE rests on the permeable culturing membrane such that the basal side is in contact with the membrane and the medium below (for a schematic drawing see Figure S1B), while the apical side is covered by a thin film of medium created by the surface tension of water (Caffe et al., Curr Eye Res. 1989; Belhadj et al., JoVE, 2020). This thin liquid film ensures sufficient oxygenation and is an important factor that allows the retinal explant to remain viable for several weeks in culture. If the retinal cultures were submerged by the medium, their viability – especially that of the photoreceptors – would drop dramatically and would typically be below 3-5 days. Therefore, in the retinal organotypic explant cultures used here, the nutrients and the drugs applied do indeed reach the outer retina from the basal side, i.e. similar to how they would in vivo.

      To address this question from the reviewer, corresponding clarifications have been inserted into the SI Materials and Methods section (Lines 64-66).

      Rev#3.4. As the metabolomic data obtained was quantitative, several metabolites discussed should be analyzed in terms of ratios, for example, Glutathione and glutathione disulfide should be reported as a ratio. In addition, as ATP, ADP, and AMP were measured, they can be used to calculate the energy charge of the tissue.

      Response: We thank the reviewer for these suggestions and have created corresponding graphs for GSH / GSSG ratio and energy charge. These new graphs have now been added to the SI datasets, to the new Supplementary Figure 4. To accommodate other requests from the Reviewers, this new Figure also contains additional new datasets on glucose and lactate concentrations (see further comments above and below). Please note that all later SI Figures have been renumbered accordingly.

      In brief, the ratios for GSH/GSSG show no significant changes between control and the different experimental groups. Meanwhile, the adenylate energy charge of the retinal tissues shows a significant decrease for the Shikonin group and the FCCP group. Note that in the new Supplementary Figure 4A, the dotted lines indicate the energy charge window typical for most healthy cells (0.7 – 0.95).
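
      As an illustration of the two quantities discussed above, the following minimal sketch computes the adenylate energy charge using the standard definition (ATP + 0.5 ADP) / (ATP + ADP + AMP), together with a simple GSH/GSSG ratio. The function names and the example concentrations are hypothetical and are not taken from the manuscript's dataset.

```python
def adenylate_energy_charge(atp, adp, amp):
    """Return the adenylate energy charge, (ATP + 0.5*ADP) / (ATP + ADP + AMP).

    Inputs are concentrations in the same (arbitrary) unit.
    """
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


def gsh_gssg_ratio(gsh, gssg):
    """Return the ratio of reduced (GSH) to oxidised (GSSG) glutathione."""
    if gssg <= 0:
        raise ValueError("GSSG concentration must be positive")
    return gsh / gssg


# Hypothetical concentrations, chosen only for illustration.
charge = adenylate_energy_charge(atp=3.0, adp=0.8, amp=0.2)
in_typical_window = 0.7 <= charge <= 0.95  # window quoted in the response
print(round(charge, 3), in_typical_window)
```

      With these made-up values the energy charge lies inside the 0.7 – 0.95 window mentioned above; a tissue with depleted ATP would give a lower value.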

      Rev#3.5. I think a missed opportunity when discussing the possible taurine/hypotaurine shuttle would be the impact on the osmosis of the subretinal space as taurine has been hypothesized as a major osmolyte.

      Response: This is another interesting recommendation from the reviewer. To address this point, we have now introduced a corresponding paragraph and references in the discussion of the manuscript (Lines 503-504; 512-514).

      Rev#3.6. In Figure 3, the distribution of these enzymes should also be studied under the no RPE condition as the culture treatment took several days for these metabolic changes to occur.

      Response: The images shown in Figure 3D are from the in vivo retina. Since this may not have been very clear in the previous manuscript version, we have now added a corresponding explanation to the legend of Figure 3. As far as we can tell, the expression and localization of neuroretinal enzymes does not change in cultured retina, during the culture period (compare Figure 1A with Supplementary Figure S1C). However, when it comes to the metabolite taurine its production (localization) changes dramatically in the no RPE situation where taurine is essentially undetectable by immunostaining (not shown but see metabolite data in Figure 2A, Figure 3A).

      Rev#3.7. In Figures 4 and 5, it is unclear why the experimental groups were not compared to the control and requires further explanation. Furthermore, the authors should justify the concentrations of drugs used as the cell death could have risen from toxicity to the drugs and not due to disruption of metabolism.

      Response: The reviewer is right, the rationale for these comparisons may not have been laid out with sufficient clarity. In Figure 4 the no RPE and FCCP groups are compared because both groups showed similar metabolite changes relative to the control situation. The no RPE to FCCP comparison thus focussed on the details of the – at first seemingly minor – differences between these two groups. This has now been clarified in the corresponding part of the results (Lines 218-219).

      In Figure 5A, B we compare the no RPE and 1,9-DDF groups with each other, notably because the data obtained seemingly contradicted our initial expectation that these two groups should show similar metabolite patterns. Also here, we have now inserted an additional explanation for this choice of comparisons (Lines 252-253).

      In Figure 5C, D we compare the Shikonin and FCCP groups with each other. The idea behind this comparison was that in the 1st group glycolysis was blocked while in the 2nd group OXPHOS was inhibited, or in other words, here we compared what happened when the two opposing ends of energy metabolism were manipulated in opposite directions. This reasoning is now given in the results section (Lines 265-268).

      As far as the choice of drugs and concentrations is concerned, we used only compounds that have been extremely well validated through up to five decades of scientific research (see initial response to Reviewer #2 above). We therefore are confident that at the concentrations employed the results obtained stem from drug effects on metabolism and not from generic, off-target toxicity. Then again, as we show, prolonged (i.e. 4 days) block of energy metabolism pathways does cause cell death.

      Rev#3.8. In line 203, the authors discuss GTP as being primarily a mitochondrial metabolite, however, photoreceptors would require a localized source of GTP synthesis in the outer segments as part of phototransduction, and therefore GTP in photoreceptors cannot be a mitochondrial-specific reaction in photoreceptors. Furthermore, the authors mentioned NDK as being a possible source of GTP, but they do not show NDK localization despite it being reported in the literature to be localized in the OS.

      Response: The question as to the source of GTP in photoreceptor outer segments is indeed highly relevant. For GTP production in mitochondria see the answer to the next question below (Rev#3.9). An early study showed nucleoside-diphosphate kinases (NDK) to be expressed on the rod outer segments of bovine retina (Abdulaev et al., Biochemistry 37:13958-13967, 1998). More recently NDK-A was shown to be strongly expressed in photoreceptor inner segments (Rueda et al., Molecular Vision 22:847-885, 2016). We now refer to both studies in the results section of the manuscript (Line 227-228).

      Rev#3.9. In the "Impact on glycolytic activity, serine synthesis pathway, and anaplerotic metabolism" section, the authors claim in the no RPE group glycolytic activity was higher due to a depressed GTP-to-lactate ratio. However, this reviewer is under the impression that GTP production in photoreceptors is not mitochondrial specific, so this ratio doesn't make sense (I could be mistaken, however). A better ratio would have been pyruvate/lactate or glucose/lactate when discussing increased glucose consumption.

      Response: We appreciate the reviewer’s comment, yet we do indeed believe we can show that GTP-production in our experimental context is mainly mitochondrial. As explained in the manuscript results section (“Photoreceptors use the Krebs-cycle to produce GTP”), there are essentially only two possibilities for a photoreceptor to produce sizeable amounts of GTP: in the mitochondria via SUCLG1 – i.e. an enzyme highly expressed in photoreceptor inner segments (Figure 5D) – and in the cytoplasm via NDK from excess ATP. The claim about the depressed GTP-to-lactate ratio in the no RPE situation takes this into account. Importantly, since in the no RPE situation ATP-levels are significantly lower than GTP, here GTP can only be produced via SUCLG1 and OXPHOS. Moreover, this contrasts with the FCCP group where mitochondrial OXPHOS is disrupted and both ATP and GTP are depleted.

      As far as ratios with pyruvate and glucose are concerned, we agree that these could potentially be very interesting to analyse. Unfortunately, in our retinal tissue 1H-NMR spectroscopy-based metabolomics analysis the levels of both pyruvate and glucose were below the detection limits, which likely reflects their rapid metabolic turnover (cf. table S1). While this might be attributable to the marked consumption of these metabolites within the tissue, it does not allow us to calculate the suggested ratios to lactate. Then again, in the supernatant medium which was collected at the same time point as the retina tissue, we can readily detect glucose and lactate levels; for these data please see the new Supplementary Figure 4.

      Rev#3.10. Aspartate aminotransferase should be abbreviated as AST, as it is more commonly noted.

      Response: In response to this comment from the reviewer, we have changed the abbreviation for aspartate aminotransferase from AAT to AST throughout the manuscript.

      Rev#3.11. In the discussion the assumptions of the ex vivo culture systems should be clearly stated. One that was not mentioned, but affects the implications of the data, is that the retinas used in this study are from the developing mouse eye. Another important assumption that was made in this paper was that the changes in retinal metabolism were due to photoreceptors even though the whole neural retina was included.

      Response: The reviewer is correct; we have added these two points to the discussion section of the manuscript. Notably, we now included a new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395) to present different in vitro and in vivo test systems.

      Rev#3.12. Starting at line 347: As the authors know, the RPE has been shown to be highly reliant on mitochondrial function, and disruption of RPE mitochondrial metabolism leads to photoreceptor degeneration (numerous papers have shown this). Furthermore, the lower levels of lactate detected in their explants when RPE was present suggests that lactate is actively transported out of the neural retina by the RPE.

      Response: The reviewer is right about lactate being exported from the retina to the blood stream in vivo, or, in our in vitro study, to the culture medium. In the new dataset showing glucose and lactate concentrations in the culture medium (new Supplementary Figure 4C, D), we show that without RPE (no RPE group) the retina releases significantly more lactate into the medium than control retina with RPE. At the same time the no RPE retina consumes more glucose than control retina.

      Rev#3.13. Line 360: Again, in mouse photoreceptors (by bulk RNAseq and scRNAseq), there is no GLUT3 expression (encoded by slc2a3). It was also recently shown by Dr. Nancy Philp's lab that rod photoreceptors express GLUT1, encoded by slc2a1 (PMCID: PMC9438481). The differences reported in this study and previous studies should be discussed.

      Response: Although this comment may not make us very popular, we are somewhat sceptical of RNAseq data (especially single cell RNAseq) since the underlying methodology – at the current level of technological development – is notoriously unreliable when it comes to the assessment of low abundance transcripts and suffers from poor batch reproducibility, compared to NMR based metabolomics. Due to methodological constraints, RNAseq data have a propensity to display erroneously high or low expression. Moreover, and perhaps even more important, dissociated cells in scRNAseq studies undergo rapid gene expression changes that can significantly falsify the image obtained (Rajala et al., PNAS Nexus 2:1-12, 2023). Finally, it cannot be emphasized enough that mRNA expression profiles DO NOT equate to protein expression and there are numerous examples of divergent expression profiles when mRNA and protein are compared.

      The Daniele et al. study (FASEB Journal 36:e22428, 2022; PMCID: PMC9438481) used in situ hybridization to study the mRNA expression of GLUT1 (slc2a1) and GLUT3 (slc2a3). In line with our comment just above, the Daniele et al. study may provide for an example of divergence between mRNA and protein expression, since it seemingly showed only minor expression of GLUT1/slc2a1 in the RPE, i.e. precisely in the one cell type that is well-known for its very strong GLUT1 protein expression.

      Furthermore, Daniele et al. used a conditional GLUT1 knock-out in photoreceptors induced by repeated Tamoxifen injections. The photoreceptor GLUT1 knock-out led to a relatively mild phenotype with only about 45% of the outer nuclear layer lost over a 4-month time-course. This is in stark contrast with the FCCP or the 1,9-DDF treatment, which would ablate nearly all rod photoreceptors in under one or two weeks, respectively.

      As a side note, Tamoxifen is an oestrogen receptor antagonist (with partial agonistic behaviour) with a long history of causing retinal and photoreceptor damage. Notably, oestrogen receptor signalling is important for maintaining photoreceptor viability (Nixon & Simpkins, IOVS 53:4739-47, 2012; Xiong et al., Neuroscience 452:280-294, 2021). Therefore, the relatively minor effects of the conditional GLUT1 KO in photoreceptors found in Daniele et al. may have been confounded by direct tamoxifen photoreceptor toxicity. On a wider level, this possible confounding factor related to the use of Tamoxifen points to general problems associated with certain forms of genetic manipulations.

      We now mention the controversy around the expression of glucose transporters in the retina, including the Daniele et al. study in a new subchapter of the discussion on "Expression of glucose transporters in the outer retina” (Lines 454-472).

      Rev#3.14. Lines 370-372: FCCP caused a strong cell death phenotype in rods, however under stress rods upregulate the secretion of RdCVF, which leads to cone photoreceptor survival by the upregulation of aerobic glycolysis in cones. The data should be re-interpreted in the context of this previous literature.

      Response: We thank the reviewer for this comment; however, we could not find a reference that would state that “…under stress rods upregulate the secretion of RdCVF”. What we did find was a reference stating that similar factors such as thioredoxins (TRX80) are secreted from blood monocytes under stress (Sahaf & Rosén, Antioxid Redox Signal 2:717-26, 2000). However, we consider these cells to be too dissimilar to rod photoreceptors to warrant a corresponding comment. Moreover, the research group who discovered RdCVF originally showed that rod-secreted RdCVF cannot prevent cone degeneration if the corresponding Nxnl1 gene is knocked-out in cones, arguing for a cell-autonomous mechanism of RdCVF-dependent cone protection (Mei et al., Antioxid Redox Signal. 24:909-23, 2016).

      Since it is very possible that we may have missed the correct reference(s), we would welcome further guidance by the reviewer.

      Rev#3.15. Line 374: 1,9-DDF caused a 90% loss of cones, however, previous studies by Dr. Nancy Philp have shown glucose deprivation in the outer retina affects primarily rod photoreceptors. The differences should be discussed.

      Response: We thank the reviewer for directing us to these studies. As mentioned above (Rev#3.13.) the Daniele et al. 2022 study yielded only relatively mild effects for a rod-specific conditional GLUT1 KO on photoreceptor viability. Similarly, in an earlier study (Swarup et al., Am J Physiol Cell Physiol. 316: C121–C133, 2019) the Philp group found that a GLUT1 KO in the RPE also caused only a minor phenotype in the photoreceptor layer. We would argue that if glucose, and by extension aerobic glycolysis, were indeed of major importance for (rod) photoreceptor survival, the degenerative effect of these genetic GLUT1 ablations should have been devastating and should have destroyed most of the outer retina in a matter of days. The fact that this was not seen in either study is another piece of independent evidence that rod photoreceptors do not rely to any major extent on glycolytic metabolism.

      The two studies from the Philp lab (Swarup et al., 2019; Daniele et al., 2022) are now cited in the discussion (Lines 417-419 and 458-460).

      Rev#3.16. Line 375: Yes Dr. Claudio Punzo and Dr. Leveillard Thierry along with other groups have shown glycolysis is required to maintain cone survival when under stress, however, the authors should emphasize that it is under stress that this is observed.

      Response: In response to this comment we have now specifically extended our corresponding remark in the discussion of the manuscript (Lines 446-447).

      Rev#3.17. The section "Cone photoreceptors use the Cahill-cycle". The presence of ALT in photoreceptors was surprising and suggests alternatives to the Cori reaction. However, previous measurements of glucose and lactate from localized in vivo cannulation of animal eyes suggest the majority of glucose taken up by the retina is released back to the blood as lactate. Again, this section should discuss this idea in terms of the previous literature.

      Response: Here, we believe the reviewer is referring to studies performed in the late 1990s where, in anaesthetized cats, the lactate concentration in blood samples obtained from choroidal vein cannulation was compared against that in blood samples obtained from femoral arteries (Wang et al., IOVS 38:48-55, 1997). We note that a more relevant in vivo measurement of retinal glucose consumption and lactate production would likely require the simultaneous cannulation of the central retinal artery (CRA) and the central retinal vein (CRV). This would need to be combined with repeated (online) blood sampling, drug applications, and subsequent metabolomic analysis. We are not aware of any in vivo studies where such procedures have been successfully performed and further miniaturization and increased sensitivity of metabolomic analytic equipment will likely be required before such an undertaking may become feasible. Even so, such studies may not be feasible in small rodents (mice, rats) and may instead require larger animal species (e.g. dog, monkey) to overcome limitations in eye and blood sample size.

      We have now extended the discussion of our manuscript with a new subchapter on “The retina as an experimental system for studies into neuronal energy metabolism”. Within this new subchapter we now present two different in vivo experimental approaches that addressed retinal energy metabolism (Lines 376-384). Moreover, we now present new data on retinal lactate release to the culture medium, showing, for instance, a strong increase in lactate release in the no RPE condition compared to control (new Supplementary Figure 4).

      Rev#3.18. Lines 431-433: The study cited suggested that the mitochondrial AST was detected in other cells, in agreement with the data shown. However, the authors' statements in this section are misleading as they do not take into consideration the contribution of AST from other cell types.

      Response: The reviewer is right, we found both AST1 and AST2 to be expressed not only in photoreceptor inner segments but also in the inner retina, especially in the inner plexiform layer (new Figure 6D). Since this might indicate mini-Krebs-cycle activity also in retinal synapses, we have added a corresponding comment to the discussion (Lines 540-543).

      Grammatical and wording fixes:

      Rev#3.19. Line 98 - "the recycling of the photopigment, retinal."

      Response: We have inserted a comma after “photopigment”.

      Rev#3.20. Results section and Figure 1 start without providing context for the model system where staining is being done.

      Response: We have added this information to the beginning of the results section (Lines 105-106).

      Rev#3.21. Supplementary Figure 2 is not mentioned in the main text - there is no context for this figure.

      Response: Supplementary Figure 2 was originally referenced in the legend to Figure 2. We now mention supplementary figure 2 (now renumbered to supplementary figure S3) also in the main text, in the results section under “Experimental retinal interventions produce characteristic metabolomic patterns” (Line 148).

      Rev#3.22. Volcano plot in Supplementary Figures 3, 5, 6, 7, and 8 don't indicate what Log2(FC) is in reference to.

      Response: The log2 fold change (FC) is calculated as follows: log2 (fold change) = log2 (mean metabolite concentration in condition A) - log2 (mean metabolite concentration in condition B) where condition A and condition B are two different experimental groups being compared. This is now explained in the SI Materials and Methods (Lines 145-147) and indicated in abbreviated form in the figure legends. Please note that supplemental figures have now been renumbered due to the insertion of an additional, new Figure.
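
      The definition above can be written out as a short sketch. The function name and the example concentrations are hypothetical; only the formula itself (difference of the log2 group means, condition A minus condition B) is taken from the response.

```python
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b):
    """Return log2(mean of condition A) - log2(mean of condition B).

    Each argument is a sequence of metabolite concentrations
    (one value per sample) in the same unit.
    """
    mean_a = mean(condition_a)
    mean_b = mean(condition_b)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("group means must be positive to take log2")
    return math.log2(mean_a) - math.log2(mean_b)


# Hypothetical concentrations for one metabolite in two groups.
group_a = [4.2, 3.9, 4.1]
group_b = [2.0, 2.1, 1.9]
print(log2_fold_change(group_a, group_b))
```

      A positive value means the metabolite is on average more abundant in condition A than in condition B; swapping the two groups flips the sign.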

      Rev#3.23. Line 331 - "and allowed to analyze the..." is incorrect phrasing.

      Response: This phrasing was changed.

      Rev#3.24. Line 343 "cycled"

      Response: This phrasing was changed.

      Rev#3.25. Line 446 is misworded.

      Response: This phrasing was changed.

      Technical questions:

      Rev#3.26. At what point after explant was the IHC done in Supplemental Figure 1? If early, but experiments are done later, there's a chance things are more disorganized at the end of the experiment.

      Response: Staining and metabolomics analysis were both done at the end of each experiment, at the same time, at P15. This is now mentioned in the SI materials and methods section (Lines 67, 108-110).

      Rev#3.27. FCCP affects plasma membrane permeability, which is particularly critical in neurons that undergo repolarization and depolarization - how do we know FCCP acts on cell death via metabolism? See: https://www.sciencedirect.com/science/article/pii/S2212877813001233

      Response: The reviewer is correct, a significant permeabilization of cell membranes in general would likely cause extensive neuronal cell death, unrelated to a disruption of OXPHOS. However, the FCCP concentration used here (5 µM) is at the lower end of what was used in the mentioned Kenwood et al. study (Mol Metab. 3:114-123, 2014) and the effect on cell membrane permeability in tissue culture is likely to be rather small, as opposed to what was seen by Kenwood et al. in cultures of individual cells. This view is supported by the fact that in our FCCP treatments, we did not observe any significant increases of cell death in any retinal cell type (including RPE) other than in rod photoreceptors. Together with the fact that only photoreceptors strongly express Krebs-cycle/OXPHOS related enzymes, this strongly suggests that the FCCP effects seen by us were due to disruption of OXPHOS.

      Rev#3.28. Numerous metabolite comparisons are being made throughout the manuscript – what type of multiple hypothesis testing corrections are utilized? Only certain figures mention multiple hypothesis testing (e.g. Figure 6).

      Response: In general, in this manuscript we used two different statistical methods: 1) For two-group comparisons, we used an unpaired, two-tailed t-test, which reports a p-value with 95% confidence interval without additional correction for multiple hypothesis testing (e.g. in Figure 2, Suppl. Figures 4, 6, 7, 8). 2) For multiple group comparisons we used a one-way ANOVA analysis with Tukey’s multiple comparisons post-hoc test (except suppl. Figure 9 where Fisher’s LSD post-hoc test was used). The information on which statistical test was used for what dataset is now given in the figure legends and in the SI Material and Methods section.
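
      For readers less familiar with the first of these tests, the sketch below computes the test statistic of an unpaired t-test between two groups. It assumes equal variances (the classical pooled-variance Student form); the response does not state which variant was used, and the function name and sample values are hypothetical. Obtaining the two-tailed p-value additionally requires the t distribution, which a statistics package would normally supply.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical metabolite concentrations in two experimental groups.
t_value, dof = unpaired_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t_value, dof)
```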

      Rev#3.29. For Figure 3, how do we know that the removal of RPE is causing the metabolite changes due to RPE-PR coupling? How do you rule out the fact that it isn’t just: I – a thicker physical barrier between media and the neural retina that is causing the changes, or II – removal of RPE from PR causes OS shearing and a stress response that alters metabolism?

      Response: We believe these concerns can be ruled out: The RPE cells are linked by tight junctions and are not “just a thicker barrier” but a barrier that is almost impermeable for most metabolites unless they are carried by specific transporters. Outer segment shearing via RPE removal would indeed be a concern if we had used adult retina. However, we explanted the retina at P9 when it does not possess any sizeable outer segments yet. As a matter of fact, photoreceptors grow out outer segments only after P9.

      Rev#3.30. While 1,9-dideoxyforskolin blocks GLUT1, it is known to have other effects, including on potassium channels. How do we know the effects of 1,9-dideoxyforskolin are specific to GLUT1? Utilizing a GLUT1 KO and showing no additional effects when adding 1,9-dideoxyforskolin would be helpful as a control.

      Response: This is a good suggestion from the reviewer. We note that this is technically not easy to achieve as it would require an RPE-specific knock-out that should be inducible at a given experimental time-point, in a quantitative manner. The study by Swarup et al. (see above Rev#3.13.) used an RPE specific knock-out that was, however, not inducible. Moreover, if the corresponding inducible knock-out animals could be generated, then the stochastic nature of the inducing treatment would probably affect only a limited number of cells within a given cell population. In our experimental context, a less than quantitative knock-out would significantly complicate interpretation of results, even to the point that no additional insight might be gained.

      Rev#3.31. The analysis in Figure 6, even with attempts to control drug treatments, is highly speculative. One really needs animals with predominately cones vs. predominately rods to do this analysis (e.g. with NRL mice).

      Response: The reviewer is right, the analysis shown in Figure 6 was an explorative approach to try and deduce features of rod and cone metabolism. This is now mentioned in the results section (Lines 282-284). Since the experiments were not initially intended to address such questions, by necessity the interpretations remain speculative. The comparison of mouse mutants in which there are either no cones (e.g. cpfl1 mouse) or no rods (e.g. NRL knock-out mouse) may make it possible to disentangle the metabolic contributions of rods and cones. We appreciate the suggestion from the reviewer and have now inserted a related suggestion for future studies into the discussion section (Lines 450-452).

      Rev#3.32. Overall, much of the paper suggests intriguing pathways, but without C13 tracing or relevant genetic knock-outs, the pathways would have to be speculative rather than definitive.

      Response: We agree with the reviewer that further research, including 13C and 15N-tracing studies, will be necessary to evaluate which pathway(s) are used by what retinal cell type under what condition. Still, the high robustness and quantitative nature of the NMR metabolomics data allows us to draw pathway conclusions based on metabolites that are unique to specific pathways/cell types or using ratios. We now refer to the advantages of such carbon-tracing studies in the discussion of the manuscript (Lines 545-549).

      Stylistic suggestions:

      Rev#3.33. This is a very dense paper to read. It would be helpful for each figure to have a summary diagram of the relevant metabolite changes and how they fit together. Further, for those not metabolism-inclined, defining the mini-Kreb’s, Cahill, and Cori cycles and their brief implications at some point early in the manuscript would be helpful.

      Response: We have been thinking a lot about how we could add in the suggested summary diagrams into each figure. Unfortunately, whatever idea we contemplated would have significantly increased the complexity of the figures, while the actual benefit in terms of improved understandability was unclear.

      However, we did include the suggestion from the reviewer to present the terms Cori, Cahill-, and mini-Krebs-cycle already in the introduction and we hope that this has improved the understandability of the manuscript overall (Lines 79-92).

      Rev#3.34. More discussion about the step-by-step ways that the mini-Kreb’s reaction “uncouples” glycolysis from the Kreb’s cycle would be helpful. What do you mean by “uncouple” in this context?

      Response: We thank the reviewer for this suggestion. Uncoupling in this context means that glycolysis and Krebs cycle are not metabolically coupled to each other via pyruvate. Instead both pathways can run independently from each other and in parallel, as long as the Krebs-cycle uses glutamate, BCAAs or other amino acids as fuels. We now also address this point already in the introduction of the manuscript (Lines 87-90).

      Conceptual questions:

      Rev#3.35. As the proposal that PR undergo heavy amounts of OXPHOS is controversial, it would be helpful for the authors to review the literature on lactate production by the retina and what studies have shown previously about retina use of lactate, specifically lactate making its way into TCA cycle intermediates, suggesting OXPHOS, in PRs.

      Response: In response to this question we have added several new references to the introduction and discussion of the manuscript. The question of lactate production (aerobic glycolysis) vs. the use of OXPHOS is now discussed in Lines 77-81, Lines 367-384.

      Rev#3.36. Why would cones die more in the no RPE condition? The authors suggest this has something to do with GLUT1 expression on RPE and the transport of glucose to cones. Even if we accept that cones are highly glycolytic, loss of RPE should expose the neural retina to even more glucose in your experimental set-up.

      Response: This is a very interesting question from the reviewer. Indeed, loss of the RPE and blood-retinal barrier function should increase photoreceptor access to glucose, even more so if they are expressing high affinity GLUT3. In the discussion (Lines 420-424), we speculate that this may trigger the Crabtree effect, shutting down OXPHOS and causing the cells to exclusively rely on glycolysis. This, however, will likely not yield sufficient ATP to maintain their viability, so that they “starve” to death even in the presence of ample glucose. Since cones require at least twice as much ATP as rods, they may be more sensitive to a Crabtree-dependent shut-down of OXPHOS. However, if this speculation was correct then the question remains why the FCCP treatment, which abolishes OXPHOS more directly, does not cause cone death. Here, we again can only speculate that high glucose may have additional toxic effects on cones that are independent of OXPHOS. We now try to present this reasoning in the discussion (Lines 426-429).

    2. eLife assessment

      Chen and colleagues utilize an in situ explant model of the neural retina and retinal pigment epithelium (RPE), along with small molecule inhibition of key metabolic enzymes and targeted metabolomic analysis, to decipher key differences in metabolic pathways used by rods, cones, Muller glia, and the RPE. They conclude that rods are heavily reliant on oxidative metabolism, cones are heavily reliant on glycolysis, and multiple mechanisms exist to decouple glycolysis from oxidative metabolism in the retina. This study provides valuable metabolomic data and insights into the metabolic flexibility of different retinal cells. However, current evidence is still incomplete as several of the conclusions from the paper stand in contradiction to other published findings and the authors naturally suggest experiments that will be needed in the future to validate the hypothesized pathways and refute existing published data. Such future validation includes animal models with tissue specific knockout of the key enzymes probed in the study; inhibiting the targets of this study with more than 1 small molecule that is structurally different, and at different doses and timings; using retinal explants from matured animals; performing labeled metabolite tracing experiments; and direct assessment of mitochondrial function (via OCR) under various manipulations.

    3. Reviewer #1 (Public Review):

      Summary:

      In the resubmitted manuscript by Chen et al. entitled, "Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While the authors have toned down some of their bold conclusions made in the original manuscript, they did very little in the way of providing additional well-controlled experiments, including cell-specific treatments, genetic knockouts, or stable isotope tracing to support their conclusions. Rather, the authors proceed to speculate more without additional data. The major issues raised by this reviewer were not adequately addressed. As such, the conclusions continue to be highly speculative and not well supported with evidence.

      Strengths of resubmission:

      The resubmission toned down some of its bold statements.

      Weaknesses of resubmission:

      Major weaknesses of this study persist, including a lack of in vivo supporting data. Also, retinal explant culture metabolomics are done in neuroretina with RPE attached; the RPE is metabolically active and can be altered by the treatments investigated herein, further confounding the claims made regarding the neuroretina. While including the RPE in the explant model is commended, it needs to be separated from the retina prior to metabolomics to get a better sense of each tissue's metabolism. Also, melanin within RPE will hinder immunofluorescence signal, so one cannot state that RPE do not express certain enzymes based solely on immunofluorescence. Pharmacologic treatments are not cell-specific, as the enzymes are expressed in numerous cells within the retina and RPE, and/or the treatments have significant off-target effects (such as shikonin). So, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. In vivo, stable-isotope retinal metabolomics are being done and have been done, so stating this technology is beyond our field is false. Therefore, the conclusions reached in this manuscript are still not supported.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2-4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes/transporters and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable and I commend the authors for working to learn more about retinal metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different from the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution; they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al. use pharmacological interventions to assess the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined, though. It is also hard to know how these drugs penetrate the tissue, and it needs to be validated that their intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function and survival, which makes a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again on time scales closer to minutes and hours instead of days.

      While the data is interesting and may give insights into some rod- and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model, the authors have overinterpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

    5. Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting disease-altered retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining tissue for key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data suggest multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors.

      The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provide novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening (rather than speculating on the possibility that they are happening), further experiments are necessary. As it currently stands, results from this study contradict results from other studies - in particular that cones, not rods, are most reliant on glycolysis. The authors attempt to address these contradictions, but without further experimentation, logical arguments carry only so much weight. At a minimum, the authors have argued that the small molecules they use are exquisitely specific for their intended targets, but validating results with a second small molecule that hits the same target but is structurally different would bolster their claims. Genetically knocking down the intended targets with interfering RNA technology would also be possible, as would explant cultures from knock-out animals. Without these studies to confirm target specificity, combined with the fact that conclusions from this study contradict existing studies in the literature, the results have to be categorized as speculative and hypothesis-generating rather than conclusive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their comments, as well as for the time dedicated to making useful suggestions that have helped improve the manuscript. We have responded to the concerns raised by the reviewers, and after that, we have also responded to the different points highlighted in the Recommendations for the authors:

      Reviewer #1

      While in vivo injury was used to assess regeneration from subsets of PNS neurons, different in vitro neurite growth or explant assays were used for further assessments. However, the authors did not assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro. Such results will be important in interpreting the results.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Intriguingly, even in individual groups of PNS neurons, not all neurons regenerate to the same extent. It is known that the distance between the cell body and the lesion site affects neuronal injury responses. It would be interesting to test this in the observed regeneration.

      Although it is true that the distance can affect the outcome, here we used a physiological model where all neurons are lesioned at the same point in the nerve. Not only is the distance different for motoneurons, but so is the microenvironment surrounding their somas; therefore, the direct comparison of these neurons with sensory neurons is limited. We extended the discussion on this matter in the new manuscript.

      Fig 1: The authors quantified the number of regenerating axons at two different time points. However, the total numbers of neurons/axons in each subset are different. The authors should use these numbers to normalize their regenerative axons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. It would be informative for the authors to analyze/discuss this further.

      From our point of view, these two options can both be considered differential responses to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      Fig 6: Is it possible to assess the regenerative effects of knockdown Med12 after in vivo injury?

      It is possible, but it is out of the scope of this work. Here, we aimed to describe the regenerative response and validate our data by testing a potential target for specific regeneration. Future studies will focus on the modulation of this specific regeneration both in vitro and in vivo.

      Reviewer #2

      It seems that the most intriguing outcome of this paper revolves around the role of Med12 in nerve regeneration. The authors should prioritize this finding. Drawing a conclusion regarding Med12's role in proprioceptor regeneration based solely on this in vitro model may be insufficient. This noteworthy result requires further investigation using more animal models of nerve regeneration.

      The main goal of this work was to compare the regenerative responses of different neuron subpopulations. We modulated Med12 to validate our data and the potential of our findings. Unfortunately, investigating in depth the role of Med12 in regeneration is out of the scope of this paper. For this reason, we did not prioritise this finding here. As this finding was striking, we strongly agree that the next step should be studying how it modulates regeneration.

      One critique revolves around the authors' examination of only a single time point within the dynamic and continuously evolving process of regeneration/reinnervation. Given that this process is characterized by dynamic changes, some of which may not be directly associated with active axon growth during regeneration, and encompasses a wide range of molecular alterations throughout reinnervation, concentrating solely on a single time point could result in the omission of critical molecular events.

      We agree that this is probably the main limitation of this study, as we discussed in the text. We chose 7 days postinjury as a standard time point widely described in the literature and to have a correlate with our histological data. Although the main aim was to compare populations, analyzing an additional time point after injury could add valuable information.

      Reviewer #3

      No concerns were expressed by that reviewer.

      Recommendations for the authors:

      The authors should assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Optional:

      It will be interesting to test if the distances between the cell body and the lesion site contribute to the observed differences in individual subsets of PNS neurons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. At least the authors should discuss these.

      From our point of view, these two options can both be considered differential responses to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      While the paper is technically well-executed, the conclusions and some of the findings appear to be incomplete and challenging to draw meaningful conclusions from. This manuscript presents some interesting findings, but the title is quite broad and may suggest that the authors have unveiled fundamental mechanisms explaining the varying regenerative abilities of peripheral axons. However, the results do not substantiate such a conclusion. Further comments and suggestions follow.

      We eliminated the word “regenerative (response)” from the title, as it could lead readers to think that all changes seen in these neurons are related only to regeneration. We think that “Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury” highlights the differences between neurons that we found without misleading readers into thinking that we described regenerative mechanisms in all neurons.

      What's notably absent here is the validation of certain genes found with the ribosomes, especially those highlighted in the subsequent figures. The question arises as to whether the changes depicted in the figures align with changes in the DRGs in vivo. Is there concordance between the presence of these genes and their transcriptional changes? It would greatly enhance the study's value if the authors could show evidence of upregulation or downregulation of certain genes over time in tissue sections, utilizing techniques such as in situ hybridization or immunocytochemistry.

      We selected some factors that were specifically upregulated in subsets of neurons to corroborate these findings by immunohistochemistry. Changes in the immunofluorescence of P75 in motoneurons and ATF2 in cutaneous mechanoreceptors were evaluated in controls and in animals that received a nerve crush one week before. Supplementary figures with the images have been added.

      The authors discovered intriguing distinctions, such as the presence of specific signaling pathways unique to neurons projecting to muscle as opposed to those projecting to the skin. Among these pathways were those associated with receptor tyrosine kinases like VEGF, erbB, and neurotrophin signaling, among others. The question now arises: do these pathways play a role in natural peripheral regeneration processes? To answer this, it is imperative to conduct in vivo studies. However, the authors employed an in vitro DRG neurite outgrowth assay to demonstrate that various types of neurons exhibit different responses to the presence of different neurotrophins. This does not reflect what actually happens in vivo. While neurotrophins indeed play a role in neuron survival and axon extension during development, their role in postnatal periods changes over time, and it remains unclear whether they play any role in the natural regenerative processes of the peripheral nerve. Therefore, this experiment may not be directly relevant in this case, especially during the early axon extension period of the regenerating axons. If the authors aim to establish a causal link with neurotrophin signaling, it becomes crucial to conduct in vivo experiments by manipulating the expression of key molecules like the receptors.

      It has been widely described that different types of peripheral neurons have a differential expression of Trk receptors, even in the adult, and that these respond differentially to neurotrophins. In our study, we do not establish a causal relationship between the expression of Trk and neurite extension, but instead we show (as many others have) that distinct neurons respond differentially to these neurotrophins. The fact that in vivo studies fail to show a clear effect does not necessarily mean that neurotrophins are not specific. It might mean that their effect is not strong enough to be a useful guide in the complex microenvironment found after an injury. For instance, NGF acts on TrkA (present in some neurons), but in vivo it has been shown to accelerate the clearance of myelin debris in Schwann cells (Li et al., 2020), which could facilitate regeneration of all types of axons, masking any potential specific effect on the subtypes of neurons expressing TrkA. In contrast, in an in vitro setting on neuronal cultures, the specific neuronal effect can be more evident.

      Additionally, it's worth noting that another paper utilizing the same methodology and experimental setup (PMID: 29756027, "Translatome Regulation in Neuronal Injury and Axon Regrowth" by Rozenbaum et al.) exists. Are there any significant differences or shared findings with that study?

      This study shows the transcriptomic response 4, 12 and 24 hours after an injury in a very similar experimental setup. They focus on comparing the neuronal vs the glial response to the injury, using a Ribotag line that tags ribosomes from all neurons in the DRG rather than specific neuron subtypes. As the time postinjury (24h vs 7 days) and the cell types studied are different, we could not directly compare our results. We did see an upregulation in both datasets of previously described growth-associated genes (Jun, Atf3, Sox11, Sprr1a, Gal…). We included the article in the references for its relevance to the topic.

      It would be helpful for readers to illustrate the finding of the fastest axon regeneration of nociceptors by showing fluorescence micrographs of the nerve samples in addition to the graphs shown in Fig. 1 C/D.

      In figure 1B, we show fluorescence micrographs of the nerves 7 days postinjury. As explained in the results, we counted the number of axons at 2 distances from the injury; we did not analyse the fastest axon. This is due to technical reasons: 7 days after the injury the fastest axon has surpassed our evaluation point, which was the furthest distance that we could assess in our experimental setting in a consistent manner. If the reviewer thinks that we need to include more images from our evaluations (from 9 dpi for example), we could prepare a new figure.

      The labeling in Fig. 2B is confusing. Is the CHAT immunoreactivity shown in the last panel illustrated by green or red signals? Is the red signal counterstaining with beta-tubulin?

      The labelling was changed in the figure to increase clarity.

      The references to the supplementary data throughout the manuscript are confusing. For example, where can the "Supp data 2" be found? (mention on p. 14 in the merged pdf file). Are they referring to the Excel spreadsheets?

      We divided the supplementary material into supplementary figures/table (found in the pdf) and supplementary data. Supplementary data refers to Excel spreadsheets found outside the pdf file. We hope this will be clearer after the final formatting of the article.

      What does the following statement on p. 14 mean?: "The caveat in these analyses was that molecular classification by these approaches may be arbitrary, and not reflective of protein repurposing." This reviewer notes that these databases consider the fact that components participate in different pathways.

      Indeed, we aimed to explain that many proteins participate in different pathways, and this is a limitation of the enrichment analysis. We modified the sentence in the text.

      First paragraph on p. 15: The PPAR and AMPK pathways have much broader roles, and are not only "related to fatty acid metabolism". This factual inaccuracy should be corrected in the manuscript.

      The sentence has been corrected.

      The authors should consider showing increased TGF-beta signaling in their neurons after downregulation of Med12 given the previous implication of TGF-beta signaling in axon regeneration.

      We tried to demonstrate the effect of our knockdown on the TGF-beta pathway by analyzing the expression of typical targets of this pathway by qPCR in our cultures. However, we could not detect any difference. We think that this can have two explanations: (1) as only a few cells upregulate Med12 whereas many cells downregulate it, the effect is masked (presumably only proprioceptors will have a significant difference in this pathway and, thus, it would be very difficult to see the effect), or (2) Med12 is not exerting its effect through this pathway. We added a supplementary figure with these data and discussed it in the manuscript.

      It would be helpful to eliminate typos and improve syntax/grammar/style.

      We revised the text to improve style.

    2. eLife assessment

      The valuable findings in this study show that subpopulations of peripheral sensory neurons display different capacities for regeneration after a similar injury. Nociceptor neurons show greater regeneration than mechanoreceptors, proprioceptors and motor neurons. This differential responsiveness of neuronal subtypes was traced to activation of different transcriptional programs, which were carefully analyzed and quantitated, resulting in solid evidence for the conclusions.

    3. Joint Public Review:

      Bolivar et al. set out to explore whether four distinct neuronal subtypes within the peripheral nervous system exhibit varying potentials for axon regeneration following nerve injury. To investigate this question, they harnessed the power of four distinct reporter mouse models featuring fluorescent labeling of these neuronal subtypes. Their findings reveal that axons of nociceptor neurons exhibit faster regeneration than those of motor neurons, with mechanoreceptors and proprioceptors displaying the slowest regeneration rate.

      To delve into the molecular mechanisms underlying this divergence in regeneration potential, the authors employed the Ribotag technique in mice. This approach enabled them to dissect the differential translatomes of these four neuronal populations after nerve injury, comparing them to uninjured neurons. Their comprehensive expression profiling data uncovers a remarkably heterogeneous response among these neuron subtypes to axon injury.

      To focus on one identified target with a mechanistic experiment as a proof of concept, their analysis highlights a striking upregulation of MED12 in proprioceptors, leading to the hypothesis that this molecule may play an inhibitory role, contributing to the comparatively slower regeneration of proprioceptor axons when compared to other neuronal subtypes. This hypothesis gains support from their in vitro model, where siRNA-mediated downregulation of MED12 results in a significant increase in neurite outgrowth in proprioceptive neurons after plating in cell culture dishes.

      Overall, this is an interesting study, and in conjunction with similar work from others will be highly valuable for neurobiologists studying how to modulate the regeneration of axons from distinct neuronal subtypes. The quality of data presentation appears to be very good in general, and the manuscript is appropriately written.

      Comments on revised version:

      Because there are multiple explanations for the differential regeneration responses, the authors have provided further discussion about how regeneration may be regulated in vitro and in vivo. The detection of a gene, Med12, which is upregulated in proprioceptive neurons, but not in nociceptors and mechanoreceptors, gives support to the existence of specific programs of responses in the peripheral nervous system after injury. Further investigation is needed to define this responsiveness in detail.

      Another response is the role of neurotrophins and their receptors. The authors have considered outcomes as a result of different Trk receptor signaling and also the effect of TGFbeta and IL6 as cytokine modulators. Added to this list is the possibility that axon guidance molecules and downstream substrates may also play a role.

      The original title was considered to be too broad and did not explain all the mechanistic aspects of this study. Therefore, a revised title, "Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury", was used. It is suitable for the results reported in this manuscript.

    1. Author Response

      Public Reviews:

      Reviewer #1

      Strengths:

      Overall, the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. The 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages at which 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg mice start producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation is that abeta and tau oligomers disrupt the activity of VIP neurons.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we will revise the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) The authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70-micron-thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method, together with automatic analysis, allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We will be sure to include the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      3xTg mice show early Aß accumulation in VIP-positive interneurons.

      3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      RE: We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

    2. eLife assessment

      This study describes important findings related to early disruptions in disinhibitory modulation exerted by VIP+ interneurons, in CA1 in a transgenic model of Alzheimer's disease pathology. The authors provide a convincing analysis at the cellular, synaptic, network, and behavioral levels on how these changes correlate and might be related to behavioral impairments during these early stages of AD pathology.

    3. Reviewer #1 (Public Review):

      Summary:

      The work in the manuscript titled "Altered firing output of VIP interneurons and early dysfunctions in CA1 hippocampal circuits in the 3xTg mouse model of Alzheimer's disease" utilized patch-clamp techniques to explore the electrophysiological characteristics of VIP interneurons in the early stages of AD using the 3xTg mouse model. The study revealed that VIP interneurons exhibited prolonged action potentials and reduced firing rates. These changes could not be attributed to modifications in input signals or morphological transformations. The authors attributed aberrant VIP activity to the accumulation of beta-amyloid in those interneurons.

      The decreased frequency of VIP inhibitory events was associated with no observed changes in excitatory drive to these interneurons. Consequently, heightened activity in the general population of CA1 interneurons was observed during a decision-making task and an object recognition test. In light of these findings, the authors concluded that the altered firing patterns of VIP interneurons may initiate early-stage dysfunction in hippocampal CA1 circuits, potentially influencing the progression of AD pathology.

      Strengths:

      Overall the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation of abeta and tau oligomers disrupting the activity of VIP neurons is plausible.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70-micron thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

    4. Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      - 3xTg mice show early Aß accumulation in VIP-positive interneurons.

      - 3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      - 3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      - 3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      - 3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      - 3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their constructive comments. The following are our point-by-point responses.

      Reviewer #1 (Recommendations For The Authors):

      Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?

      Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)

      Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.

      We have incorporated sleep profiles of the nAChRbeta2 mutant and w1118 into Fig 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.

      Author response image 1.

      Point 3- Dh31-EGFP-FRT expression patterns look different in figS1 A (or fig1 H) and J. why that?

      We re-examined the original data. Both samples (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are Dh31EGFP.FRT and, as displayed below, show consistent primary expression subsets. Any observed disparities in region "e" could potentially be attributed to variations during dissection.

      Author response image 2.

      Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.

      We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)

      Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see hyun et al 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but fig2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.

      Thank you for pointing out that, in the pan-neuronal knockout of PDFR by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), and that the significant decrease in the morning anticipation index (Fig. 2I of the previous manuscript) and the advanced evening activity are not as pronounced as those observed in han5304 (Fig. 3C in Hyun et al., 2005).

      First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. In Pdfr knockout flies, activity tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn occurs at about ZT21, and the activity at ZT18 is comparable to that at ZT24. This is significantly different from the two control groups, whose activity tends to increase from ZT18 to ZT24, with an activity peak at ZT24.

      The activity from ZT6 to ZT12 increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas in the two control groups activity increased more slowly from ZT6 to ZT12, with no plateau but an activity peak at ZT12.

      Author response image 3.

      Second, we have incorporated the phenotype of the Pdfr mutant we previously generated (Pdfr-attpKO; Deng et al., 2019) alongside that of the Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes of Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies are very similar. In this experimental repeat, a much more obvious advanced evening activity peak was observed in both pan-neuronal knockout flies and Pdfr-attpKO flies.

      To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies, such as in Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).

      Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.

      Author response image 4.

      Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (renn et al) and pdfr (hyun et al) mutants).

      We have incorporated period data into Fig. 2I. Indeed, conditional knock out of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data), also in Fig. 2I of the revised version.

      In the revised Fig. 2I, we tested 45 Pdfr-attpKO flies during DD condition (3 out of 48 flies died during video tracing in DD condition), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.

      Author response image 5.

      Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.

      We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)

      Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.

      Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)

      Author response image 6.

      Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.

      We analyzed the CNMaR-/- activity in the dark-dark (DD) condition over a span of six days. Results revealed a higher power in CNMaR mutants compared to control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Considering that mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal; we therefore did not further repeat and analyze the DD behavior of the CNMaR mutant. We believe this raises another question beyond the scope of our current discussion.

      Reviewer #2 (Recommendations For The Authors):

      Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses ChAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as some of the most intensely studied clock neurons, are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably, PDF is not included in the expression analysis; this peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to the analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.

      The concerns mentioned above are:

      (1) sLNvs express PDF and sNPF but not Dh31;

      (2) ChAT presents in 5th-sLNv and one LNd but not in other sLNvs;

      (3) PDFR presents in sLNvs but not l-LNvs.

      (4) PDF is not included in the analysis.

      To verify the accuracy of these intersection analyses, all of which relate to PDF-positive neurons (except the 5th-sLNv and LNds), we stained PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.

      First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. The difference from previous literature is attributed to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.

      Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains we analyzed, only two showed positive sLNvs. To enhance the accuracy of the intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).

      Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis of Pdf-KI-LexA and Clk856-GAL4 only showed scattered signals, as shown below (D, also in Fig. S9C). In these cases, some expected positive signals were not observed in our dissection. The possible reason could be the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Therefore, our intersection results must be missing some positive signals.
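
      The one-third rule described above for marking weakly supported subsets in Table S4 can be illustrated with a short sketch. The function and argument names are hypothetical, and the code only restates the threshold given in the response; it is not the authors' analysis pipeline.

```python
from fractions import Fraction


def is_marked_as_sparse(positive_brains, total_brains):
    """Return True when a subset is positive in fewer than 1/3 of analyzed brains.

    Hypothetical helper: counts are passed in directly, and exact
    fractions are used so that a ratio of exactly 1/3 is not marked.
    """
    if total_brains <= 0:
        raise ValueError("total_brains must be positive")
    return Fraction(positive_brains, total_brains) < Fraction(1, 3)
```

      With the counts reported for ChAT-KI-LexA∩Clk856-GAL4 (two positive brains out of ten), the record would be marked.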

      Author response image 7.

      Finally, we revised the text to (Line 286-317):

      To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).

      Regarding previously reported Nplp1 in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplitude effect of the binary expression system. The low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.

      Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.

      Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.

      We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.

      Point 3-One minor issue that would avoid unnecessary confusion by readers familiar with the circadian literature is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.

      The centering of averaged activity profiles on the 12 h of darkness is done to highlight the phenotype of advanced morning activity. To prevent any confusion among readers, we have included a sentence in the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures. (Fig. 2H legend)" (Line 957-959)

      Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter within the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.

      We have revised the relevant parts.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).

      We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)

      Reviewer #3 (Recommendations For The Authors):

      Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.

      We have attempted to enlarge the font as much as possible in the revised version.

      Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?

      The results of Fig. 1H-1J are now discussed in the revised version (Line 145-147). We did not calculate an exact efficiency because the GFP intensity is not stable enough: it may change with dissection, mounting, or laser intensity during our experimental process. Therefore, in all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative judgment rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).

      Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).

      We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.

      Point 4-Line 792. Could induced be introduced?

      Yes, we have now corrected this typo.

      Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.

      We have now revised the labels (Fig. S1).

      Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.

      We have now added the number of flies in Fig. S2 as reported in Fig. S1. As mentioned in our response to Point 2, due to the instability of the GFP signal, we are unable to provide a quantitative efficiency in this context.

      Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.

      We have now deleted them.

      Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate "no anticipation"; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?

      We have revised the description of the calculation in the methods section (Line 612). After calculating the ratio of the last three hours of activity to the total six hours of activity, we subtracted 0.5 from the result. Therefore, the index should be ≤0.5. When the index is equal to 0, it indicates no morning anticipation.
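
      The index calculation described above can be sketched as follows. This is a minimal illustration that assumes activity is binned hourly and that the six-hour window is the six hours preceding the transition; the function name, the input layout, and the handling of a window with no activity are hypothetical and not taken from the manuscript.

```python
def morning_anticipation_index(hourly_activity):
    """Ratio of the last 3 h of activity to the full 6 h window, minus 0.5.

    hourly_activity: six activity counts, oldest first (assumed layout).
    Returns 0 when activity is evenly spread (no anticipation) and at
    most 0.5 when all activity falls in the last three hours.
    """
    if len(hourly_activity) != 6:
        raise ValueError("expected exactly six hourly bins")
    total = sum(hourly_activity)
    if total == 0:
        # Assumption: a window with no activity is treated as no anticipation.
        return 0.0
    last_three_hours = sum(hourly_activity[3:])
    return last_three_hours / total - 0.5
```

      Under these assumptions, evenly distributed activity gives an index of 0, and activity confined to the final three hours gives the maximum of 0.5, which matches the bounds stated in the response.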

      Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for by the specific KD strategy, its duration, or the promoter strength? It is surprising no explanation is provided in the text (page 9 line 179).

      In our pursuit of establishing a broadly effective method for gene editing, Fig. 2H-2L and Fig. 2D revealed that previous attempts have fallen short of achieving this objective. The observed inefficiency may be attributed to the intensity of the promoter, resulting in inadequate expression. Alternatively, the insufficient duration of the operation may also contribute to the lack of success. However, in the context of sleep and rhythm research applications, the age of the fruit fly tests is typically fixed, limiting the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may pose challenges related to cytotoxicity, as reported in previous studies (Port et al., 2014). We refrain from offering specific explanations, as we lack a definitive plan and cannot provide additional robust evidence to support the above speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.

      Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.

      We have revised related part in the manuscript (Line 223-231):

      Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with GGSGP linker, termed Cas9.M9.

      Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (NOTE:UDP was gained by accident)

      Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.

      Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?

      In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.
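
      The two-control criterion described above can be sketched as a small decision rule. This is only an illustration: the function name, the significance threshold of 0.05, and the example p-values are assumptions, not values taken from the manuscript.

```python
def differs_from_both_controls(p_vs_sgrna_control, p_vs_cas9_control, alpha=0.05):
    """Return True only if the experimental group differs from BOTH controls.

    p_vs_sgrna_control: p-value against the sgRNA-only control (hypothetical input)
    p_vs_cas9_control:  p-value against the Cas9-only control (hypothetical input)
    alpha:              significance threshold (assumed to be 0.05 here)
    """
    return p_vs_sgrna_control < alpha and p_vs_cas9_control < alpha


# Hypothetical case resembling the situation described for Fig. 2K:
# significant against one control, not against the other.
one_sided_result = differs_from_both_controls(0.01, 0.40)

# Hypothetical case that would satisfy the criterion.
two_sided_result = differs_from_both_controls(0.01, 0.02)
```

      Under this rule, a difference from only one control is not reported as an effect.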

      Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.

      We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”

      Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?

      We have now revised the legend of Table S4 to address this concern. As indicated in: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.

      Point 14-Finally, the sentence “The figure is...” requires revision.

      We have now revised it: “The exact cell number for each subset is annotated in Table S4”.

      Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.

      First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.

      Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.

      Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn't it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?

      The N# value represents the GFP-positive count, and we have revised the legend of Table S4 accordingly. The largest N# value denotes the total number of brains analyzed for a specific CCT line. It is possible that, due to variations in our dissection or mounting process, some clusters were only observed in 1-2 brains out of the total brains analyzed. To enhance the accuracy of the intersection analysis results, we marked all positive signal records in which positive subsets were found in fewer than 1/3 of the total analyzed brains (Table S4).
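
      The marking rule described above can be sketched as follows. The function name, the cluster names, and the counts are hypothetical; the only elements taken from the response are that the largest N# is treated as the number of brains analyzed and that positive records below one third of that number are marked.

```python
def summarize_cct_line(positive_counts):
    """Summarize one CCT line from its per-cluster GFP-positive counts (N#).

    positive_counts maps a cluster name to its N# (hypothetical data layout).
    Returns (total_brains, marked_clusters), where total_brains is the largest
    N# and marked_clusters are positive clusters seen in fewer than 1/3 of brains.
    """
    total_brains = max(positive_counts.values())
    marked = sorted(
        cluster
        for cluster, n in positive_counts.items()
        # Positive (n > 0) but observed in fewer than one third of the brains.
        if n > 0 and 3 * n < total_brains
    )
    return total_brains, marked


# Hypothetical counts for a single CCT line.
example = {"s-LNv": 9, "l-LNv": 9, "DN1p": 2, "LNd": 0}
total, marked_clusters = summarize_cct_line(example)
```

      Comparing `3 * n` with the total keeps the one-third test in integer arithmetic, which avoids rounding at the boundary.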

      Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).

      We have now revised this. (Line 286-317)

      Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258) it would be interesting to understand the difference between the proposed method and the one already in use, which compares the same parameter, i.e., the slope (defined as “the slope of the best-fitting linear regression line over a period of 6 h prior to the transition”; Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?

      This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before light on or off. Linear regression is not well-suited for a single fly due to the high variability in activity at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.

      As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.
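
      For reference, the regression-based slope quoted in the reviewer's comment (a best-fitting line over the 6 h before a light transition) can be sketched as below. The 30-min bin width, the function name, and the example data are assumptions made for illustration; the sketch does not implement MAPI or EAPI, whose definitions are given in the manuscript rather than in this response.

```python
def anticipation_slope(activity, bin_hours=0.5, window_hours=6.0):
    """Least-squares slope of activity over the window ending at the transition.

    activity:     binned activity counts, ordered in time, ending at the
                  light transition (hypothetical input format)
    bin_hours:    bin width in hours (assumed 0.5 h)
    window_hours: length of the fitting window (6 h, as in the quoted definition)
    Returns the slope in activity units per hour.
    """
    n = int(round(window_hours / bin_hours))
    if len(activity) < n:
        raise ValueError("not enough bins to cover the fitting window")
    window = activity[-n:]
    times = [i * bin_hours for i in range(n)]
    mean_t = sum(times) / n
    mean_a = sum(window) / n
    # Ordinary least squares: slope = S_xy / S_xx.
    s_xx = sum((t - mean_t) ** 2 for t in times)
    s_xy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, window))
    return s_xy / s_xx


# Hypothetical trace: flat activity, then a steady rise of 1 count per 30-min bin.
trace = [0, 0, 0] + [i + 1 for i in range(12)]
slope = anticipation_slope(trace)
```

      Averaging many flies before fitting, as described above, would reduce the noise in `activity` at the cost of losing the per-fly estimate.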

      Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read “similar to Fig S7”.

      We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.

      Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?

      We have now revised the annotation of Table S6. Bold marking refers to results that fall more than one SD from the control group.
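
      One possible reading of the bold-marking rule is sketched below. It assumes that a result is marked when it lies outside the control mean plus or minus one sample standard deviation; the function name, the use of the sample (rather than population) SD, and the numbers are all assumptions.

```python
from statistics import mean, stdev


def outside_one_sd(value, control_values):
    """Return True if value lies more than one SD from the control mean.

    control_values: measurements from the control group (hypothetical data).
    The sample standard deviation is used here, which is an assumption.
    """
    control_mean = mean(control_values)
    control_sd = stdev(control_values)
    return abs(value - control_mean) > control_sd


# Hypothetical control measurements: mean 12, sample SD 2.
control = [10, 12, 14]
marked_in_bold = outside_one_sd(15, control)
left_unmarked = outside_one_sd(13, control)
```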

      Point 21-Line 314. The sentence needs revision.

      We have revised these sentences.

      Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, an absence of morning anticipation and an advanced morning anticipation are not necessarily opposite phenotypes.

      We have revised the related description.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      References

      Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.

      Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.

      Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.

      Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.

      Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.

      Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.

      Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.

      Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.

      He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.

      Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.

      Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.

      Johard, H.A., Yoishii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.

      Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.

      Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.

      Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. Proc Natl Acad Sci USA 103, 12580-12585.

      Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.

      Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.

      Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.

      Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.

      Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.

      Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.

      Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.

      Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.

      Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.

    2. eLife assessment

      This paper expands the genetic toolset that was previously developed by the Rao lab to introduce the conditional downregulation of neurotransmission components in Drosophila. As a proof of principle, the authors tested their new collection and provide evidence of the contribution of CNMamide (a neuropeptide) to the temporal control of locomotor activity patterns. These are overall important findings supported by compelling evidence.

    3. Reviewer #1 (Public Review):

      Summary:

      The paper of Mao et al. expands the genetic toolset that was previously developed by the Rao lab (Deng et al., 2019) to introduce the conditional KO or downregulation of neurotransmission components in Drosophila. The authors then use these tools to investigate neurotransmission in the clock neurons of the Drosophila brain. They first test some known components and then analyze the contribution of the CNMa neuropeptide and its receptor to circadian behavior. The results indicate that CNMa acts from a subset of DN1ps (dorsal clock neurons) to set the phase of the morning peak of locomotor activity in light:dark cycles, with an advanced morning activity in the absence of the neuropeptide. Interestingly, the receptor for the PDF neuropeptide appears to be acting in some of the CNMa neurons to control morning activity.

      Strengths/weaknesses:

      This is clearly a very useful new set of tools to restrict the manipulation of these components to specific neuronal populations, and overall (see specific points below), the paper convincingly shows that the tools indeed allow the efficient and specific elimination of neuropeptides/receptors from subsets of neurons. The analysis of the CNMa function in the clock network reveals a new and interesting function for CNMa in the control of morning anticipation in LD conditions. This function appears to depend on CNMa-expressing DN1ps.

      Comment on revised version:

      I believe that the authors properly addressed the main points that were raised in my comment on version 1.

    4. Reviewer #2 (Public Review):

      Original Review:

      In this study Mao and co-workers deliver a substantial suite of genetic tools in support of the senior author's recent proposal to create a "chemoconnectomic" tool kit for the expression mapping and conditional disruption of specific neurotransmitter systems within fly neurons of interest. Specifically, they describe the creation of two toolsets for recombination-based and CRISPR/Cas9-based conditional knockouts of genes supporting neurotransmitter and neuromodulator function, and a Flp-Out and Split-LexA toolkit for the examination of gene expression within defined subsets of neurons. The authors report the creation of conditional genetic tools for the disruption/mapping of approximately 200 chemoconnectomic gene products, examine the general effectiveness of these tools in the fly brain, and apply them to the circadian clock network in an attempt to reveal new information regarding the transmitter/modulator systems involved in daily behavioral timing. The authors provide clear evidence of the effectiveness of the new methods along with a transparent assessment of the variability of the tools. In addition, they present evidence that the neuropeptide CNMa influences the morning peak of daily activity in the fly by regulating the timing of activity increases in anticipation of dawn.

      A major strength of the study is the transparent assessment of the effectiveness and variability of the conditional genetic approaches developed by the authors. The authors have largely achieved their aims and the study therefore represents a major delivery on the promise of chemoconnectomics made by the senior author in 2019 (Neuron, Vol. 101, p. 876). Though there are some concerns about the variability of knockout effectiveness, off target effects of the knockout strategies, and (especially) the accuracy of the gene expression approach, the tools created for this study will almost certainly be useful for the field and support a great deal of future work.

      Comments on revised version:

      The authors have responded to each of my concerns. Most importantly, they have made the discrepancies within the study and between the study and previously published work clearer to the reader. They have also corrected statements that are not consistent with the current state of the field. The issue regarding opposing effects of PDF signaling and CNMa, which was also raised by Reviewer One, still stands, notwithstanding the edits made to the text.

    5. Reviewer #3 (Public Review):

      Summary:

      Mao and colleagues generated powerful reagents to genetically analyse chemical communication (CCT) in the brain, and in the process uncovered a function for the CNMa neuropeptide expressed in a subset of DN1p neurons that contributes to the temporal organization of locomotor activity, i.e., the timing of morning anticipation.

      Strengths:

      The strength of the manuscript lies in the generation/characterization of new tools for conditionally targeting a well-defined set of CCT genes along with the design and testing of improved versions of Cas9 for efficient knockout. Such invaluable resources will be of interest to the whole community. The authors employed these tools and intersectional genetics to provide an alternative profiling of clock neurons, which is complementary to the ones already published. Furthermore, they uncovered a role for CNMamide, expressed in two DN1ps, in the timing of morning anticipation.

      Weaknesses:

      All prior concerns have been addressed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some conclusions need to be strengthened by further evidence or discussion. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott, and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neuron excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Strengths:

      (1.1) The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott, and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neuron excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      Weaknesses:

      (1.2) The next comments are minor criticisms, as the major conclusions of the paper are well substantiated. Most of the results presented in the article have been obtained from experiments with DRG neuron cultures, and surely there is a greater degree of complexity and heterogeneity about the degeneracy of nociceptor excitability in the "in vivo" condition. Indeed, the authors show in Figures 7 and 8 data that support their hypothesis and an increased influence of Nav1.7 on nociceptor excitability after inflammation, but also a higher variability in the nociceptor spiking responses. On the other hand, DRG neurons targeted in this study (YFP (+) after crossing with Nav1.8-Cre mice) are >90% nociceptors, but not all nociceptors express Nav1.8 in vivo. As shown by Li et al., 2016 ("Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity"), there is a high heterogeneity of neuron subtypes within sensory neurons. Therefore, some caution should be taken when translating the results obtained with the DRG neuron cultures to the more complex "in vivo" panorama.

      We agree that most but not all Nav1.8+ DRG cells are nociceptors and that not all nociceptors express Nav1.8. We targeted small neurons that also express (or at some point expressed) Nav1.8, thus excluding larger neurons that express Nav1.8. This allowed us to hone in on a relatively homogeneous set of neurons, which is crucial when testing different neurons to compare between conditions (as opposed to testing longitudinally in the same neuron, which is not feasible). We expect all neurons are degenerate but likely on the basis of different ion channel combinations. Indeed, even within small Nav1.8+ neurons, other channels that we did not consider likely contribute to the degenerate regulation (as now better reflected in the revised Discussion).

      That said, there are multiple sources of heterogeneity. We suspect that heterogeneity is more increased after inflammation than after axotomy because all DRG neurons experience axotomy when cultured whereas neurons experience inflammation differently in vivo depending on whether their axon innervates the inflamed area (now explained on lines 214-215). This is not so much about whether the insult occurs in vivo or in vitro, but about how homogeneously neurons are affected by the insult. Granted, neurons are indeed more likely to be heterogeneously affected in vivo since conditions are more complex. But our goal in testing PF-71 in behavioral tests (Fig. 8) was to show that changes observed in nociceptor excitability in Figure 7, despite heterogeneity, were predictive of changes in drug efficacy. In short, we establish Nav interchangeability by comparing neurons in culture (Figs 1-6), but we then show that similar Nav shifts can develop in vivo (Fig 7) with implications for drug efficacy (Fig 8). Such results should alert readers to the importance of degeneracy for drug efficacy (which is our main goal) even without a complete picture of nociceptor degeneracy or DRG neuron heterogeneity. Additions to the Discussion (lines 248-259, 304-308) are intended to highlight these considerations.

      (1.3) Although the authors have focused their attention on Nav channels, it should be noted that degeneracy concerning other ion channels (such as potassium ion channels) could also impact the nociceptor excitability. The action potential AHP in Figure 1, panel A is very different comparing the DIV0 (blue) and DIV4-7 examples. Indeed, the conductance density values for the AHP current are higher at DIV0 than at DIV7 in the computational model (supplementary table 5). The role of other ion channels in order to obtain equivalent excitability should not be underestimated.

      We completely agree. We focused on Nav channels because of our initial observation with TTX and because of industry’s efforts to develop Nav subtype-selective inhibitors, whose likelihood of success is affected by the changes we report. But other channels are presumably changing, especially given observed changes in the AHP shape (now mentioned on lines 304-308). Investigation should be expanded to include these other channels in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and its effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      (2.1) The above results are clearly and impressively supported by the experiments and data shown. All methods are described in detail, presumably allow good reproducibility, and were suitable to address the corresponding question. The only exception is the description of the computer model, which should be described in more detail.

      We failed to report basic information such as the software, integration method and time step in the original text. This information is now provided on lines 476-477. Notably, the full code is available on ModelDB plus all equations including the values for all gating parameters are provided in Supplementary Table 5 and values for maximal conductance densities for DIV0 and DIV7 models are provided in Supplementary Table 6. Changes in conductance densities to simulate different pharmacological conditions are reported in the relevant figure legends (now shown in red). We did not include model details in the main text to avoid disrupting the flow of the presentation, but all the model details are reported in the Methods, tables and/or figure legends.

      (2.2) The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions, although the clinical implications could be highlighted in a more detailed manner.

      Weaknesses:

      As mentioned before, the results that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes are impressive. However, there is some "gap" between the DRG culture experiments and acutely dissociated DRGs from mice after CFA injection. In the extensive experiments with cultured DRG neurons, different time points after dissociation were compared. Although it would have been difficult for functional testing to examine additional time points (besides DIV0 and DIV4-7), at least mRNA and protein levels should have been determined at additional time points (DIV) to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly.

      Characterizing the time course of NaV expression changes is worthwhile but, insofar as such details are not necessary to establish that excitability is degenerate, it was not included in the current study. Furthermore, since mRNA levels do not parallel the functional changes in Nav1.7 (Figure 6A), we do not think it would be helpful to measure mRNA levels at intermediate time points. Measuring protein levels would be more informative; however, as now explained on lines 362-369, neurons were recorded at intermediate time points in initial experiments and showed a lot of variability. Methods that could track fluorescently-tagged NaV channels longitudinally (i.e. at different time points in the same cell) would be well suited for this sort of characterization, but will invariably lead to more questions about membrane trafficking, phosphorylation, etc. We agree that a thorough characterization would be interesting but we think it is best left for a future study.

      It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. This would better link the following data demonstrating that in acutely dissociated nociceptors after CFA injection, the inflammation-induced increase in NaV1.7 membrane expression enhances the effect of (or more neurons respond to) the NaV1.7 inhibitor PF-71, whereas fewer CFA neurons respond to the NaV1.8 inhibitor PF-24.

      These are some of the many good questions that emerge from our results. We are not particularly keen to investigate what happens over several days in culture, since this is not so clinically relevant, but it would be interesting to compare changes induced by nerve injury in vivo (which usually involves neuroinflammatory changes) and changes induced by inflammation. Many previous studies have touched on such issues but we are cautious about interpreting transcriptional changes, and of course all of these changes need to be considered in the context of cellular heterogeneity. It would be interesting to decipher if changes in NaV1.7 and NaV1.8 are directly linked so that an increase in one triggers a decrease in the other, and vice versa. But of course many other channels are also likely to change (as discussed above), and they too warrant attention, which makes the problem quite difficult. We look forward to tackling this in future work.

      The results shown explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have important implications for the future development of Nav-selective analgesics. However, this point, which is also evident from the title of the manuscript, is discussed only superficially with respect to clinical outcomes. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how?

      We wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that. Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. At the end of the results (line 235), which is, we think, what prompted the reviewer’s comment, we point to the Discussion. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult. We certainly don’t have all the answers, but we hope our results will point readers in a new direction to help answer such questions.

      Another point directly related to the previous one, which should at least be discussed, is that all the data are from rodents, or in this case from mice, and this should explain the clinical data in humans. Even if "impediment to translation" is briefly mentioned in a slightly different context, one could (as mentioned above) discuss in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans.

      We are not aware of human data that speak directly to nociceptor degeneracy but degeneracy has been observed in diverse species; if anything, human neurons are probably even more degenerate based on progressive expansion of ion channel types, splice variants, etc. over evolution. Of course species differences extend beyond degeneracy and are always a concern for translation, because of a species difference in the drug target itself or because preclinical pain testing fails to capture the most clinically important aspects of pain (which we mention on line 35). Line 39 now reiterates that these explanations for translational difficulties are not mutually exclusive, but that degeneracy deserves greater consideration than it has hitherto received. Indeed, throughout our paper we imply that degeneracy may contribute to the clinical failure of Nav subtype-specific drugs, but those failures are certainly not evidence of degeneracy. In the Discussion (lines 320-321), we now cite a recent review article on degeneracy in the context of epilepsy, and point out how parallels might help inform pain research. We wish we had a more direct answer to the reviewer’s request; in the absence of this, we hope our results motivate readers to seek out these answers in future research.

      Although speculative, it would be interesting for readers to know whether a treatment regimen based on "time since injury" with NaV1.7 and NaV1.8 inhibitors might offer benefits. Based on the data, could one hypothesize that NaV1.7 inhibitors are more likely to benefit (albeit in the short term) in patients with neuropathic pain with better patient selection (e.g., defined interval between injury and treatment)?

      We like that our data prompt this sort of prediction. However, this is potentially complicated since the injury may be subtle, which is to say that the exact timing may not be known. There are scenarios (e.g. postoperative pain) where the timing of the insult is known, but in other cases (e.g. diabetic neuropathy) the disease process is quite insidious, and different neurons might have progressed through different stages depending on how they were exposed to the insult. Our own experiments with CFA are a case in point. Notwithstanding the potential difficulties in gauging the time course, any way of predicting which Nav subtype is dominant could help in choosing more strategically which drug to use.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They report that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels. The findings may be relevant for the design of better strategies targeting NaV channels to treat pain.

      Strengths:

      The paper addresses the important issue of understanding, from an interesting perspective, the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels. The experiments seem to have been properly performed and most conclusions are justified. The paper is concisely written and easy to follow.

      Weaknesses:

      (1) The most critical issue I find in the manuscript is the claim that different combinations of NaV channels result in equivalent excitability. For example, in the Abstract it is stated that: "...we show that nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same. I think that the culprit of this issue is that the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons convey information. For instance, the time dependence of the instantaneous frequency, the actual firing pattern, may be important too. Moreover, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain). A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      This concern seems to boil down to the question: how equivalent is equivalent? The spike shape or the full input-output curve for a DIV0 neuron (Nav1.8-dominant) is never equivalent to what’s seen in a DIV4-7 neuron (Nav1.7-dominant), but nor are any two DIV0 neurons strictly equivalent, and likewise for any two DIV4-7 neurons. Our point is that DIV0 and DIV4-7 neurons are far more similar (less discriminable) in their excitability than expected from the qualitative difference in their TTX sensitivity (and from repeated claims in the literature that Nav1.7 is necessary for spike generation in nociceptors). Nav isoforms need not be identical to operate similarly; for instance, Nav1.8 tends to activate at “suprathreshold” voltages, but this depends on the value of threshold; if threshold increases, Nav1.8 can activate at subthreshold voltages (see Fig 5). We have modified lines 155-175 to help clarify this.

      We completely agree that firing rate is not the only way to convey sensory information, and of course injecting current directly into the cell body via a patch pipette is not a natural stimulus. These are all factors to keep in mind when interpreting our data. Nonetheless, our data show that excitability is similar between DIV0 and DIV4-7, so much so that data from any one neuron (without pharmacological tests or capacitance measurements) would likely not reveal if that cell is DIV0 or DIV4-7; this “indiscriminability” qualifies as “equivalent” for our purposes, and is consistent with phrasing used by other authors studying degeneracy. Notably, not every DIV4-7 neuron exhibits spike height attenuation (see Fig. 1A), likely because of concomitant changes in the AHP that were not captured in our computer model or directly tested in our experiments. This highlights that other channel changes may also contribute to degeneracy and the maintenance of repetitive spiking.

      (2) In Fig. 1, is 100 nM TTX sufficient to inhibit all TTX-sensitive NaV currents? More common in literature values to fully inhibit these currents are between 300 to 500 nM. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). It seems that 100 nM TTX was not enough, leading to an underestimation of the amplitude of the TTX-sensitive currents.

      As now summarized in Supplementary Table 3 (which is newly added), 100 nM TTX is >20x the EC50 for Nav1.3 and Nav1.7 (but is still far below the EC50 for Nav1.8). Based on this, TTX-sensitive channels are definitely blocked in our TTX experiments.
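
      To make the ">20x" argument concrete, the sketch below computes fractional block under a simple one-site Hill model. The Hill coefficient of 1 and the function name `fraction_blocked` are assumptions made for illustration only; they are not taken from the manuscript or from Supplementary Table 3. Concentration is expressed as a multiple of the half-maximal value, so no specific potency is assumed.

```python
def fraction_blocked(conc, half_max, hill=1.0):
    """Fraction of channels blocked under a simple Hill model.

    conc and half_max must share the same units. hill=1.0 is an
    assumption (one-site binding), not a value from the manuscript.
    """
    ratio = (conc / half_max) ** hill
    return ratio / (1.0 + ratio)


# Express concentration as a fold-multiple of the half-maximal value,
# so that half_max can be set to 1.0 without assuming any real potency.
for fold in (1, 20, 100):
    blocked = fraction_blocked(fold, 1.0)
    print(f"{fold}x half-maximal concentration: {blocked:.3f} blocked")
```

      Under these assumptions, a concentration 20 times the half-maximal value blocks 20/21 of the channels, or roughly 95%, and higher multiples leave a correspondingly smaller unblocked fraction.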

      (3) Page 8, the authors conclude that "Inflammation caused nociceptors to become much more variable in their reliance of specific NaV subtypes". However, how did the authors ensure that all neurons tested were affected by the CFA model? It could be that the heterogeneity in neuron properties results from distinct levels of effects of CFA.

      We agree with the reviewer. We also believe that variable exposure to CFA is the most likely explanation for the heightened variability in TTX-sensitivity reported in Figure 7 (now more clearly explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the CFA and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing), and that demonstration is true even if some neurons do not switch. Subsequent testing in Figure 8 shows that enough neurons switch to have a meaningful effect in terms of the behavioral pharmacology. So, notwithstanding tangential concerns, we think our CFA experiments succeeded in showing that Nav channels can switch in vivo and that this impacts drug efficacy.

      Recommendations for the authors:

      All reviewers agreed that these results are solid and interesting. However, the reviewers also raised several concerns that should be addressed by the authors to improve the strength of the evidence presented. Revisions considered to be essential include:

      (1) Discuss how degeneracy concerning other ion channels (such as potassium ion channels) could also impact nociceptor excitability (reviewer #1). Additionally, the translation of results from DRG neuron cultures to "in vivo" nociceptors should be better discussed.

      We have added a new paragraph to the Discussion (lines 248-259) to remind readers that despite our focus on Nav channels, other ion channels likely also change (and that these changes involve diverse regulatory mechanisms that require further investigation). Likewise, despite our focus on the changes caused by culturing neurons, we remind readers that subtler, more clinically relevant in vivo perturbations can likewise cause a multitude of changes. We end that paragraph by emphasizing that although accounting for all the contributing components is required to fully understand a degenerate system, meaningful progress can be made by studying a subset of the components. We want to emphasize this because there is some middle ground between focusing on one component at a time (which is the norm) vs. trying to account for everything (which is an infeasible ideal). Additional text on lines 304-308 also addresses related points.

      (2) Discuss how different combinations of NaV channels result in equivalent excitability, in the context of the experimental conditions used (see main comment by reviewer #3). It should also be discussed in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans (reviewer #2).

      Regarding the first part of this comment, reviewer 3 wrote in the public review that “The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same.” Differences in gating properties are commonly used to argue that different Nav subtypes mediate different phases of the spike, for example, that Nav1.7 initiates the spike whereas Nav1.8 mediates subsequent depolarization because Nav1.7 and Nav1.8 activate at perithreshold and suprathreshold voltages, respectively (see lines 134-135, now shown in red). But such a comparison is overly simplistic insofar as it neglects the context in which ion channels operate. For instance, if Nav1.7 is not expressed or fully inactivates, voltage threshold will be less negative, enabling Nav1.8 to contribute to spike initiation; in other words, previously “suprathreshold” voltages become “perithreshold”. Figure 5 is dedicated to explaining this context-sensitivity; specifically, we demonstrate with simulations how Nav1.8 takes over responsibility for initiating a spike when Nav1.7 is absent or inactivated. Text on lines 155-184 has been edited to help clarify this. Regarding the second part of this comment, we are not aware of any direct evidence from human sensory neurons that different sodium channels produce equivalent excitability, but that is certainly what we expect. We suggest that failure of Nav subtype-specific drugs is, at least in part, because of degeneracy, but such failures do not demonstrate degeneracy unless other contributing factors can be excluded (which they can’t). Recognizing degeneracy is difficult, and so variability that might be explained by degeneracy will go unexplained or attributed to other factors unless, by design or serendipity, experiments quantify the effects of degeneracy (as we have attempted to do here).

      We now cite a recent review article on degeneracy and epilepsy (line 320), which addresses relevant themes that might help inform pain research; for instance, most existing antiseizure medications act on multiple targets whereas more recently developed single-target drugs have proven largely ineffective. This situation is similar to, but better documented than, that for analgesics. With this in mind, we revised the text to emphasize the circumstantial nature of existing evidence and the need to test more directly for degeneracy (lines 320-323).

      (3) Extend the discussion about the poor clinical outcomes with the use of subtype-selective NaV inhibitors. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how? (reviewer #2)

      As discussed above, we are careful to avoid speculating on which clinical results are attributable to degeneracy. Instead, our take-home message (see Discussion, lines 309-323) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is not trivial. Interpreting clinical data is also complicated by the fact that we are either dealing with genetic mutations (with unclear compensatory changes) or pharmacological results (where NaV1.7-selective drugs have a multitude of problems that might contribute to their lack of efficacy, separate from effects of degeneracy). We have striven to contextualize our results (e.g. last paragraph of results, lines 222-235). We think this is the most we can reasonably say based on the limitations of existing clinical data.

      (4) Provide a clearer and more detailed description of the computational model (reviewers #2 and #3).

      We added important details on lines 476-477 but, in our honest opinion, our computational model is thoroughly explained. The issue seems to boil down to whether details are included in the Results vs. being left for the Methods, tables and figure legends. We prefer the latter.

      (5) Better clarify the effects of the CFA model, to provide further evidence relating inflammation with nociceptors variability (reviewers #2 and #3)

      As explained in response to a specific point by reviewer #3, we believe that variable exposure to CFA explains the heightened variability in TTX-sensitivity reported in Figure 7 (now explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the inflammation and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing); that demonstration holds true even if some neurons do not switch. Subsequent testing (Fig 8) shows that enough neurons switch to affect drug efficacy assessed behaviorally. This is emphasized with new text on lines 225-227. Overall, we think our CFA experiments succeed in showing that Nav channels can switch in vivo and, despite variability, that this occurs in enough neurons to impact drug efficacy.

      (6) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews.

      Detailed responses are provided below for all feedback and changes to the text were made whenever necessary, as identified in our responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor points/recommendations:

      Protein synthesis inhibition by cercosporamide could be the direct cause of a smaller-than-expected increase in Nav1.7 levels at DIV5. But for Nav1.8, there is a mitigation in the increased levels at DIV5, that only could be explained by several indirect mechanisms, including membrane trafficking and posttranslational modifications (phosphorylation, SUMOylation, etc.) on Nav1.8 or protein regulators of Nav1.8 channels. The authors suggest that "translational regulation is crucial", but also insinuate that other processes (membrane trafficking, etc.) could contribute to the observed outcome. It is difficult to assess the relative importance of these different explanations without knowing the exact mechanisms that are acting here.

      We agree. We relied on electrophysiology (and pharmacology) to measure functional changes, but we wanted to verify those data with another method. We expected mRNA levels to parallel the functional changes but, when that did not pan out, we proceeded to look at protein levels. Perhaps we should have stopped there, but by blocking protein translation, we show that there is not enough Nav1.7 protein already available that can be trafficked to the membrane. That does not explain why Nav1.8 levels drop. Our immunohistochemistry could not tease apart membrane expression from overall expression, which limits interpretation. We have enhanced the text to discuss this (lines 200-204), but further experiments are needed. Though admittedly incomplete, our initial findings help set the stage for future experiments on this matter.

      Page 15, typo: "contamination from genomic RNA" -> "contamination from genomic DNA" (appears twice).

      This has been corrected on lines 420 and 421.

      Page 17: I could not find the computer code at ModelDB (http://modeldb.yale.edu/267560). It seems to be an old web link. It should be available at some web repository.

      We confirmed that the link works. Entry is password-protected (password = excitability; see line 476). Password protection will be removed once the paper is officially published.

      Page 19, reference 36, typo: "Inhibitio of" -> "Inhibition of".

      This has been corrected (line 557).

      Page 33, typo: "are significantly larger than differences at DIV1" -> "are significantly larger than differences at DIV0".

      This has been corrected (line 796).

      Page 35, figure 6 legend. The number of experiments (n) is not indicated for panel C data.

      N = 3 is now reported (line 828).

      Reviewer #2 (Recommendations For The Authors):

      p. 3/4 and Data of Fig. 6: It should be commented on why days 1-3 were not investigated. An investigation of the time course (by higher frequency testing) would certainly have an added value because it would be possible to deduce whether the changes develop slowly and gradually, or whether the excitability induced by different NaVs changes suddenly. At least mRNA and protein levels should be determined at additional time points to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly. It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. Or is the latter question clear in the literature?

      We now explain (lines 362-369) that intermediate time points (DIV1-3) were tested in initial current clamp recordings. Those data showed that TTX-sensitivity stabilized by DIV4 and differed from the TTX-insensitivity observed at DIV0. TTX-sensitivity was mixed at DIV1-3 and cross-cell variability complicated interpretation. Subsequent experiments were prioritized to clarify why NaV1.7 is not always critical for nociceptor excitability, contrary to past studies. Our efforts to measure mRNA and protein levels were primarily to validate our electrophysiological findings; we are also interested in deciphering the underlying regulatory processes but this is an entire study on its own. Unfortunately, the existing literature does not help or point to an explanation for the Nav1.7/1.8 shift we observed.

      Our evidence that mRNA levels do not parallel functional changes argues against pursuing transcriptional changes in Nav1.7, though transcriptional changes in other factors might be important. Interpretation of immuno quantification would be complicated by the high variability we observed with the physiology at intermediate time points and, furthermore, we cannot resolve surface expression from overall expression based on available antibodies. Methods conducive to longitudinal measurements would be more appropriate (as now mentioned on lines 367-369). In short, a lot more work is required to understand the mechanisms involved in the switch, but we think the existing demonstration suffices to show that NaV1.7 and NaV1.8 protein levels vary, with crucial implications for which Nav subtype controls nociceptor excitability, and important implications for drug efficacy. Explaining why and how quickly those protein levels change will be no small feat and is best left for a future study.

      p. 4 and following: In order to enable the interpretation of the used concentration of PF-24, PF71, and ICA, the respective IC50 should be indicated.

      A table (now Supplementary Table 3; line 861) has been added to report EC50 values for all drugs for blocking NaV1.7, NaV1.8 and NaV1.3. The concentrations we used are included in that table for easy comparison.

      p. 5, end of the middle paragraph: Here it should be briefly explained -for less familiar readers- why NaV1.1 cannot be causative (ICA inhibits NaV1.1 and 1.3).

      We now explain (lines 117-120) that NaV1.1 is expressed almost exclusively in medium-diameter (A-delta) neurons whereas NaV1.3 is known to be upregulated in small-diameter neurons, and so the effect we observe in small neurons is most likely via blockade of NaV1.3.

      p. 6, lines 4/5: At least once it should read computer model instead of model.

      “Computer” has been added the first time we refer to DIV0 or DIV4-7 computer models (lines 138-139).

      p. 6: the difference between Fig. 4B and Fig. 4 - Figure suppl. 1 should be mentioned briefly.

      We now explain (lines 150-154) that Fig. 4B involves replacing a native channel with a different virtual channel (to demonstrate their interchangeability) whereas Fig. 4 - Figure supplement 1 involves replacing a native channel with the equivalent virtual channel (as a positive control).

      p. 6/7: the text and the conclusions regarding Figure 5 are difficult to follow. Somewhat more detailed explanations of why which data demonstrate or prove something would be helpful.

      The text describing Figure 5 (lines 155-175) has been revised to provide more detail.

      p. 7, last sentence of the first paragraph: How is this supported by the data? Or should this sentence be better moved to the discussion?

      This sentence (now lines 182-184) is designed as a transition. The first half – “a subtype’s contribution shifts rapidly (because of channel inactivation)” – summarizes the immediately preceding data (Figure 5). The second half – “or slowly (because of [changes in conductance density])” – introduces the next section. The text shown in square brackets has been revised. We hope this will be clearer based on revisions to the associated text.

      p. 7, second paragraph, line 3: Please delete one "at both".

      Corrected

      p. 7, second paragraph: Please explain why different time points (DIV4-7, DIV5, or DIV7) were used or studied.

      Initial electrophysiological experiments determined that TTX sensitivity stabilized by DIV 4 (see response to opening point) and we did not maintain neurons longer than 7 days, and so neurons recorded between DIV4 and 7 were pooled. If non-electrophysiological tests were conducted on a specific day within that range, we report the specific day, but any day within the DIV4-7 range is expected to give comparable results. This is now explained on lines 365-367.

      p. 8: the text regarding Fig. 7 should also include the important data (e.g. percentage of neurons showing repetitive spiking) mentioned in the legend.

      This text (lines 216-220) has been revised to include the proportion of neurons converted by PF-71 and PF-24 and the associated statistical results.

      Fig. 1: third panel (TTX-sensitive current...) of D & Fig. 2 subpanel of A (Nav1.8 current...). These panels should be explained or mentioned in the text and/or legends.

      We now explain in the figure legends (lines 708-710; 714-715; 736-738) how those currents are found through subtraction.

      Fig. 2 - figure supplement 2. One might consider taking Panel A to Fig. 2 so that the comparison to DIV0 is apparent without switching to Suppl. Figs.

      We left this unchanged so that Figures 2 and 3 are equivalently organized, with negative control data left to the supplemental figures. eLife formatting makes it easy to reach the supplementary figure from the main figure, so we hope this won’t be an impediment to readers.

      Fig. 6 C, middle graph (graph of Nav1.7): Please re-check, whether DIV5 none vs. 24 h and none vs. 120 h are really significantly different with such a low p-value.

      We re-checked the statistics and the difference pointed out by the reviewer is significant at p=0.007. We mistakenly reported p<0.001 for all comparisons, and so this p value has been corrected; all the other p values are indeed <0.001. Notably, the data are summarized as median ± quartile because of their non-Gaussian distribution; this is now explained on line 827 (as a reminder to the statement on lines 461-462). Quartiles are more comparable to SD than to SEM (in that quartiles and SD represent the distribution rather than confidence in estimating the mean, like SEM), and so medians can differ very significantly even if quartiles overlap, as in this case.
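
      The distinction drawn above between quartiles (which describe the spread of the data) and SEM (which describes confidence in the estimated mean) can be illustrated with a short sketch. The numbers are synthetic values chosen purely for illustration and are not data from the study; the helper names are likewise hypothetical.

```python
import statistics


def median_and_quartiles(values):
    """Return (median, lower quartile, upper quartile)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5


# Synthetic sample (NOT data from the study), and the same
# distribution sampled four times as densely.
small = [1, 2, 3, 4, 5, 6, 7, 8, 9]
large = small * 4

for name, sample in (("n=9", small), ("n=36", large)):
    med, q1, q3 = median_and_quartiles(sample)
    print(f"{name}: median={med}, quartiles=({q1}, {q3}), SEM={sem(sample):.3f}")
```

      In this sketch the quartiles are unchanged when the sample size grows, because they reflect the distribution itself, whereas the SEM shrinks. That is why overlapping quartile ranges do not by themselves rule out a highly significant difference between medians.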

      Reviewer #3 (Recommendations For The Authors):

      (1) A critical issue in the manuscript is the use of teleological language. It is likely that this is not the intention, but careful revision of the language should be done to avoid the use of expressions that confer purpose to a biological process. Please, find below a list of statements that I consider require correction.

      • In the Abstract, the first sentence: "Nociceptive sensory neurons convey pain signals to the CNS using action potentials". Neurons do not really "use" action potentials, they have no will or purpose to do so. Action potentials are not tools or means to be "used" by neurons. Other examples of misuse of the verb "use" are found in several other sentences:

      "...nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8"

      "Flexible use of different NaV subtypes - an example of degeneracy - compromises..."

      "Nociceptors can achieve equivalent excitability using different sodium channel subtypes"

      "...degeneracy - the ability of a biological system to achieve equivalent function using different components..."

      "...nociceptors can achieve equivalent excitability using different sodium channel subtypes..."

      "Our results show that nociceptors can achieve similar excitability using different NaV channels"

      "...the spinal dorsal horn circuit can achieve similar output using different synaptic weight combinations..."

      "Contrary to the view that certain ion channels are uniquely responsible for certain aspects of neuronal function, neurons use diverse ion channel combinations to achieve similar function"

      "In summary, our results show that nociceptors can achieve equivalent excitability using different NaV subtypes"

      “Use” can mean to put into action (without necessarily implying intention). Based on definitions of the word in various dictionaries, we feel we are well within the realm of normal usage of this term. In trying to achieve a clear and succinct writing style, we have stuck with our original word choice.

      • At the end of page 5 and in the legend of Fig. 7, the word "encourage" is not properly used in the sentence "The ability of NaV1.3, NaV1.7 and NaV1.8 to each encourage repetitive spiking is seemingly inconsistent with the common view...". Encouraging is really an action of humans or animals on other humans or animals.

      Like for “use”, we verified our usage in various dictionaries and we do not think that most readers will be confused or disturbed by our word choice. We use “encourage” to explain that increasing NaV1.3, NaV1.7 or NaV1.8 can increase the likelihood of repetitive spiking; we avoided “cause” because the probability of repetitive spiking is not raised to 100%, since other factors must always be considered.

      • In the Abstract and other places in the manuscript, the word "responsibility" seems to be wrongly employed. It is true that one can say, for instance, on page 4 last paragraph "we sought to identify the NaV subtype responsible for repetitive spiking at each time point". However, to confer channels with the human quality of having "responsibility" for something does not seem appropriate. See also page 8 last paragraph, the first paragraph of the Discussion, and the three paragraphs of page 11.

      Again, we must respectfully disagree with the reviewer. We appreciate that this reviewer does not like our writing style but we do not believe that our style violates English norms.

      (2) In the first sentence of the Abstract, nociceptive sensory neurons do not convey "pain signals". Pain is a sensation that is generated in the brain.

      “Pain” is used as an adjective for “signal” and is used to help identify the type of signal. Nonetheless, since the word count allowed for it, we now refer to “pain-related signals” (line 10).

      (3) I do not see the point of plotting the firing rate as a function of relative stimulus amplitude (normalized to the rheobase, e.g., Fig. 1A bottom panels, Fig. 2B, bottom-right, Fig. 2 Supp2A right, Fig. 3 B bottom panels, etc) instead of as a function of the actual stimulus amplitude. I have the impression that this maneuver hides information. This is equivalent to plotting the current amplitudes as a function of the voltage normalized by the voltage threshold for current activation, which is obviously not done.

      This is how the experiments were performed, so it would be impossible to perform the statistical analysis using the absolute amplitudes post-hoc; specifically, stimulus intensities were tested at increments defined relative to rheobase rather than in absolute terms. There are pros and cons to each approach, and both approaches are commonly used. Notably, we report the value of rheobase on the figures so that readers can, with minimal arithmetic, convert to absolute stimulus intensities. No information is hidden by our approach.
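
      The conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming that stimulus intensities are expressed as multiples of rheobase; the function name and the rheobase values are hypothetical and are not taken from the manuscript.

```python
def to_absolute_pA(relative_multiples, rheobase_pA):
    """Convert stimulus intensities given as multiples of rheobase to pA."""
    return [m * rheobase_pA for m in relative_multiples]


# Hypothetical cells with different rheobase values (not data from the study).
relative_steps = [1, 2, 3]
cell_a = to_absolute_pA(relative_steps, rheobase_pA=100.0)
cell_b = to_absolute_pA(relative_steps, rheobase_pA=250.0)

print(cell_a)
print(cell_b)
```

      In this sketch the same relative step maps to different absolute currents in the two cells, which is the trade-off between the two plotting conventions discussed in this exchange.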

      (4) On page 4 it is stated that "We show later that similar changes develop in vivo following inflammation with consequences for drug efficacy assessed behaviourally (see Fig. 8), meaning the NaV channel reconfiguration described above is not a trivial epiphenomenon of culturing". However, what happens in culture may have nothing in common with what happens in vivo during inflammation. Thus, the latter data may not serve to answer whether the culture conditions induce artifacts or not. I suggest tuning down this statement by changing "meaning" to "suggesting".

      On line 97, we now write “suggesting”.

      (5) Page 5, first paragraph, I miss a clear description of the mathematical models. Having to skip to the Methods section to look for the details of the models as the artifices introduced to simulate different conditions is rather inconvenient.

      So as not to disrupt the flow of the presentation with methodological details, we only provide a short description of the model in the Results. We have slightly expanded this to point out that the conductance-based model is also single-compartment (line 111). We provide a very thorough description of our model in the Methods, especially considering all the details provided in Supplementary Tables 1, 5 and 6. We also report conductance densities and % changes in figure legends (lines 722, 747-748; now shown in red). This is also true for Figure 3-figure supplement 2 (lines 756-759). We tried very hard to find a good balance that we hope most readers will appreciate.

      (6) Page 6, second paragraph, simulations do not serve to "measure" currents.

      The sentence has been revised to indicate that simulations were used to “infer” currents during different phases of the spike (line 155).

      (7) Page 7, regarding the title of the subsection "Control of changes in NaV subtype expression between DIV0 and DIV4-7", the authors measured the levels of expression, but not really the mechanisms "controlling" them. I suggest writing "changes in NaV subtype expression between DIV0 and DIV4-7"

      We have removed “control of” from the section title (line 185).

      (8) What was the reason for adding a noise contribution in the model?

      We now explain that noise was added to reintroduce the voltage noise that is otherwise missing from simulations (line 474). For instance, in the absence of noise, membrane potential can approach voltage threshold very slowly without triggering a spike, which does not happen under realistically noisy conditions. Of course membrane potential fluctuates noisily because of stochastic channel opening and a multitude of other reasons. This is not a major issue for this study, and so we think our short explanation should suffice.
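
      The effect described above can be illustrated with a toy example. This is a minimal sketch that swaps in a leaky integrator with a fixed threshold for the authors' conductance-based model; all parameter values and the function name are hypothetical. The steady-state voltage sits just below threshold, so the noiseless trace approaches threshold without ever crossing it, whereas added voltage noise produces a crossing.

```python
import math
import random


def first_crossing_ms(noise_sd_mV, seed=0, t_max_ms=2000.0, dt_ms=0.1):
    """Return the time of the first threshold crossing, or None if there is none.

    Toy leaky integrator (not the authors' model); all values are illustrative.
    """
    tau_ms = 10.0          # membrane time constant
    v_drive_mV = -50.5     # steady-state voltage, just below threshold
    v_thresh_mV = -50.0    # fixed spike threshold
    v = -65.0              # initial voltage
    rng = random.Random(seed)
    # Scale the noise so that its stationary SD is roughly noise_sd_mV.
    sigma = noise_sd_mV * math.sqrt(2.0 / tau_ms)
    n_steps = int(round(t_max_ms / dt_ms))
    for step in range(1, n_steps + 1):
        v += dt_ms * (v_drive_mV - v) / tau_ms               # deterministic relaxation
        v += sigma * math.sqrt(dt_ms) * rng.gauss(0.0, 1.0)  # voltage noise
        if v >= v_thresh_mV:
            return step * dt_ms
    return None


print(first_crossing_ms(noise_sd_mV=0.0))  # noiseless: no crossing
print(first_crossing_ms(noise_sd_mV=1.0))  # noisy: a crossing time in ms
```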

      (9) Please, define the concept of degeneracy upon first mention.

      Degeneracy is now succinctly defined in the abstract (line 20).

    2. eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some results remain inconclusive and need to be strengthened by further evidence. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

    3. Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neurons excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding the neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability and ion channels.

      Strengths:

      The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neurons excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      In this revised version, the quality of the manuscript has been visibly improved. In my opinion, the questions and concerns raised by reviewers have been addressed clearly. After a detailed reading of this version and the comments to the reviewers, I have no additional comments or criticisms.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      The results are clearly and impressively supported by the experiments and data shown. During the revision, the manuscript was consistently improved and the concerns of the first reviews were resolved. All methods are described in detail, and presumably, allow good reproducibility and were suitable to address the scientific question.

      The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of the NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions and clinical implications are highlighted as far as is currently justifiable due to the still limited experience in translation. This appears well-considered, not too speculative, and ultimately appropriate.

      The main weaknesses of the first version were fixed during the revision:

      (i) After revising the manuscript, the initial weakness that the computer model was described superficially has been fixed. Important information was added to the main text and additional information, including the full code and equations and values are deposited on ModelDB or are given in the Supplementary information (Suppl. Table 5 & 6).

      (ii) The authors now comment that corresponding studies on protein levels or e.g. neuroinflammatory changes could support the characterization of the time course of membrane expression and cellular changes, but this should be addressed in future studies, as these analyses would also raise new questions, such as about membrane trafficking, post-translational modifications, etc. This is plausible and has now been appropriately addressed in the text.

      (iii) During the initial review the authors were asked to discuss the promising role of NaV1.7 in the light of clinical results. In their response, the authors confidently state that they "wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that." They do, however, emphasize their take-home message, which is well supported: "Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors' reliance on NaV1.7 is itself variable - much more than past studies would have readers believe. ... The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult."

      Against the background of this argumentation, it must be admitted that the decision not to present as yet unproven speculations is probably appropriate from a scientific point of view and that this ultimately proves the critical assessment of one's own data and the limitations of the study. This is undoubtedly acceptable and - in retrospect - probably the right way to go.

    5. Reviewer #3 (Public Review):

      Summary:

      In this study the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They claim that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels.

      Strengths:

      The paper addresses the important issue of understanding the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels.

      Weaknesses:

      (1) - The main claim of this paper is that "nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". From this, they allude to the manifestation of "degeneracy", a concept implying that a biological process can occur via distinct sets of underlying components.

      In my opinion, the analysis of the data is biased towards the authors' interpretation.

      - First, when comparing the excitability across neurons one should relate the response (in this case mean firing frequency) to the absolute size of the stimulus, not to the size of the stimulus normalized to the rheobase (see e.g., Figs. 1A). From this particular figure the authors conclude that the excitability is similar in the culture stages DIV0 and DIV4-7, but these data were not directly compared.

      - Second, the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons might convey information. For instance, the time dependence of the instantaneous frequency, that is, the actual firing pattern, may also be important.

      - Third, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain).

      - Fourth, out of principle, the gating properties of NaV1.7 and NaV1.8 channels are not identical, and therefore their contributions to excitability should not be the same. A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      (2) - The quality of some recordings is dubious. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). These traces show abnormally fast inactivation and even transient deflections above zero current line. These are obvious artifacts of the subtraction procedure, probably due to unstable current amplitudes along the recording time. Similar odd-looking traces are shown in Fig. 3A.

      (3) - I would like to point out that the main Significance Statement of the manuscript reads "The analgesic efficacy of subtype-selective drugs hinges on which subtype controls excitability". In addition to being extremely obvious for anyone knowing a bit about pain signaling, the authors did not test the analgesic efficacy of any drug in this study.

      (4) - A critical issue in the manuscript is the unnecessary use of phrases that imply that biological entities have some sort of willpower, flirting with anthropomorphism and teleological language.

      Sentences such as "Nociceptive sensory neurons convey pain signals to the CNS using action potentials" (see the Abstract) should be avoided. Neurons do not really "use" action potentials, they have no will to do so. Action potentials are not tools or means to be "used" by neurons. There are many other examples of misuse of the verb "use" in many other sentences. These were pointed out during the revision phase, but unfortunately the authors refused to correct them.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Our answer to the final point(s) raised is as follows:

      "We thank the reviewer for the comment. We checked our datasets accordingly. Typically, the n of cells showed deviations of maximally 20% from experiment to experiment (e.g. 16-24 cells per experiment). Additionally, experiments were performed using different passages of the cells. Moreover, data were validated at different time-points during the study using newly thawed cell lines."


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bischoff et al present a carefully prepared study on a very interesting and relevant topic: the role of ion channels (here a Ca2+-activated K+ channel BK) in regulating mitochondrial metabolism in breast cancer cells. The potential impact of these and similar observations made in other tumor entities has only begun to be appreciated. That being said, the authors pursue in my view an innovative approach to understanding breast cancer cell metabolism. Considering the following points would further strengthen the manuscript:

      We thank reviewer #1 for the overall positive feedback on our study.

      Methods:

      (1) The authors use an extracellular Ca2+ concentration (2 mM) in their Ringer's solutions that is almost twice as high as the physiologically free Ca2+ concentration (ln 473). Moreover, the free Ca2+ concentration of their pipette solution is not indicated (ln 487).

      Indeed, we utilized 2 mM of Ca2+ in the physiologic live-cell imaging buffer. This concentration could actually be a little lower than the total Ca2+ concentration (ranging usually from 2.2 to 2.6 mM) in the body, while the free Ca2+ concentration is typically half as high. Nevertheless, we find multiple studies different from ours, which utilized 2 mM for their live-cell-based experiments. Please check the following studies, which represent only a small selection:

      https://doi.org/10.1038/s41598-019-49070-8

      https://doi.org/10.1016/j.bpj.2020.08.045

      https://doi.org/10.1016/j.redox.2022.102319

      However, to ensure that the applied conditions are physiologically relevant, we repeated experiments using MMTV-PyMT WT and MMTV-PyMT BK-KO cells and compared cytosolic Ca2+ concentrations over time in response to cell stimulation with ATP, either in the presence of 1.0 mM (Author response image 1A) or 2.0 mM extracellular Ca2+ (Author response image 1B). The respective graphs are attached below for the reviewer’s inspection. As expected, we found that the intracellular Ca2+ concentration in MMTV-PyMT WT and BK-KO cells was dependent on the extracellular Ca2+ concentration. Importantly, however, irrespective of the exact Ca2+ concentration applied, we observed a similar difference in basal cytosolic Ca2+ between MMTV-PyMT WT and BK-KO cells (Author response image 1C).

      Author response image 1.

      Cytosolic Ca2+ concentrations over time in the presence of 1.0 mM or 2.0 mM extracellular Ca2+.

      Concerning the Ca2+ concentration in the patch-pipette – we are very glad that you uncovered an error in our description and apologize for the mistake. Actually, the information the reviewer is referring to was already given in the previous version of the manuscript, but unclear because a comma was shifted (see line 487 in the originally submitted manuscript). The Ca2+ concentration of the patch-pipette was 0.1 mM in the presence of 0.6 mM EGTA, which should (according to Ca-EGTA calculator, https://somapp.ucdmc.ucdavis.edu/pharmacology/bers/maxchelator/CaEGTA-NIST.htm) be equivalent to ~30 nM of free Ca2+ in the patch pipette. We corrected the mistake in the manuscript and thank the reviewer again for spotting this inaccuracy.
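
      The order of magnitude quoted above can be checked with a simple 1:1 binding calculation. This is a minimal sketch, not the cited calculator: it assumes an apparent Ca-EGTA dissociation constant of 150 nM (an illustrative value; the apparent constant depends strongly on pH, temperature and ionic strength, which the cited calculator accounts for) and ignores Mg2+ and other buffers. The function and constant names are hypothetical.

```python
import math


def free_calcium_M(ca_total_M, egta_total_M, kd_M):
    """Free Ca2+ (in M) for 1:1 Ca-EGTA binding with apparent dissociation constant kd_M.

    Solves free^2 + (kd + egta_total - ca_total) * free - kd * ca_total = 0
    for the positive root, written in a numerically stable form.
    """
    b = kd_M + egta_total_M - ca_total_M
    disc = math.sqrt(b * b + 4.0 * kd_M * ca_total_M)
    return 2.0 * kd_M * ca_total_M / (b + disc)


ASSUMED_KD_M = 150e-9  # assumption: apparent Kd, not a value from the manuscript

free = free_calcium_M(ca_total_M=0.1e-3, egta_total_M=0.6e-3, kd_M=ASSUMED_KD_M)
print(f"free Ca2+ = {free * 1e9:.1f} nM")
```

      Under these assumptions, 0.1 mM total Ca2+ with 0.6 mM EGTA gives a free Ca2+ concentration of a few tens of nM, in line with the ~30 nM stated above.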

      (2) Ca2+I measurements: The authors use ATP to elicit intracellular Ca2+ signals. Is this then a physiological stimulus for Ca2+ signaling in breast cancer? What is the rationale for using ATP? Moreover, it would be nice to see calibrated baseline values of Ca2+i.

      We thank the reviewer for the comment and suggestion. Importantly, it was demonstrated recently that all of the utilized cell lines respond to treatment with extracellular ATP with a prominent increase in [Ca2+]i, most probably indicating the expression of purinergic receptors, which was a prerequisite to observe ATP-induced changes in [Ca2+]i.

      https://doi.org/10.1038/s41419-022-05329-z,

      https://doi.org/10.1093/carcin/bgt493

      https://doi.org/10.1038/s41598-018-26459-5

      Furthermore, ATP plays a crucial role in the tumor microenvironment, where high rates of cell death occur. Hence, ATP is of pathophysiologic relevance for the utilized cancer cell lines.

      https://doi.org/10.1038/s41568-018-0037-0

      https://doi.org/10.3390/cells9112496

      https://doi.org/10.1002/jcp.30580

      Following the suggestions by Reviewer #1 (and #2), we included calibrations of Ca2+cyto and Ca2+mito in the manuscript, by depleting the intracellular Ca2+ stores using Ionomycin in the absence of extracellular Ca2+ (EGTA) to validate the basal difference in Ca2+cyto and Ca2+mito. Additionally, Ca2+cyto was calibrated under basal and inhibitor treated conditions, and values in nM are given in the text (p. 5, lines 185-190, 193-195 and 199-200, in the tracked changes version of the MS). The new data can be found in new Figure S2F – Figure S2J and new Figure S2R – Figure S2V. Moreover, we calculated basal [Ca2+]cyto in the different BKCa pro- and deficient cell lines and under inhibitor treated conditions. We additionally added information about the pathophysiologic relevance of ATP in the tumor microenvironment in lines 175-178 in the tracked changes version of the manuscript.

      (3) Membrane potential measurements: It would be nice to see a calibration of the potential measurements; this would allow us to correlate the IV relationship with membrane potential. Without calibration, it is hard to compare unless the identical uptake of the dye is shown. Does paxilline or IbTx also induce depolarization?

      We thank the reviewer for the suggestion. Indeed, membrane potential calibrations/ measurements using the membrane potential sensitive dye Dibac4(3) would be interesting, however, technically hardly feasible. The reason is that the principle of the dye is based on different uptake in response to differences in membrane potential, and not ratiometric as for most other dyes/ sensors used. Considering this limitation, we decided to perform membrane potential measurements by patch-clamp analysis. Additionally, we performed these experiments upon inhibition of PM-located BKCa by IBTX. Current-clamp experiments confirmed the difference in basal membrane potential between MMTV-PyMT WT and BK-KO cells (consult new Figure S1C and lines 127-130 in the tracked changes version of the manuscript). Interestingly, IBTX treatment depolarized the PM potential to the BK-KO cell level, which validates that BK activity and PM potential are connected. In addition to this approach, we utilized our recently developed genetically encoded K+ sensors revealing basal differences in [K+]cyto between MMTV-PyMT WT and BK-KO cells. Also this difference between both genotypes was equalized by IBTX as the respective treatment increased [K+]cyto only in WT cells, which most likely explains the cause of PM depolarization (consult lines 130-135 in the tracked changes version of the manuscript and new Figure S1D and Figure S1E).

      (4) Mito-potential measurements: Why did the authors use such a long time course and preincubate cells with channel blockers overnight? Why did they not perform paired experiments and record the immediate effect of the BK channel blockers in the mito potential?

      We thank the reviewer for the suggestion. We performed TMRM-based experiments with MMTV-PyMT WT cells in response to short-term exposure to paxilline, which did not significantly affect the mitochondrial membrane potential, at least within 15 minutes of treatment (Author response image 2). This indicates that further downstream processes subsequent to (mito) BKCa inhibition affect the mitochondrial membrane potential (MMP), most probably including remodeling processes of the respiratory chain, mitochondrial ion homeostasis or glycolytic activity, ultimately also delivering reduction equivalents to mitochondria. Our final goal was to validate potential differences between a BKCa pro- and deficient cell model, whereby the latter cells lacked the BKCa channel since its origination. Hence, “long-term” (~12h) BKCa inhibition as performed in our experiments rather reflects the BK-KO cell situation. Taken together with the new experiment (Author response image 2), we can now state that the effect of BK inhibition on the MMP is at least not the consequence of an acute (within minutes) channel blockade.

      Author response image 2.

      Mitochondrial membrane potential, as measured using TMRM, in response to acute short-term administration of 5µM paxilline, followed by mitochondrial depolarization using FCCP.

      (5) MTT assays are also based on mitochondrial function - since modulation of mito function is at the core of this manuscript, an alternative method should be used.

      We thank the reviewer for the important comment. We performed additional, immunofluorescence-based experiments using Ki-67 staining to assess cell proliferation rates. The newly added data can be found in the text, lines 409-412 in the tracked changes version of the manuscript and new Figure S6D-F. The results obtained confirm the MTT-based results (Fig.6H-I).

      Results:

      (1) Fig. 5G: The number of BK "positive" mitoplasts is surprisingly low - how does this affect the interpretation? Did the authors attempt to record mitoBK current in the "whole-mitoplast" mode? How does the mitoBK current density compare with that of the plasma membrane? Is it possible to theoretically predict the number of mitoBK channels per mitochondrion to elicit the observed effects? Can these results be correlated with the immuno-localization of mitoBK channels?

      Indeed, the number of BKCa-positive mitoplasts appears low at first glance. However, as these experiments were performed in a mitoplast-attached mode, it is important to keep in mind that only a very small area of the actual mitoplast is investigated with each patch. If no channel was detected in such a region, the patch was depicted as “empty”, as presented in Fig.5G, which does not, however, mean that the entire mitochondrion was actually BKCa negative. Hence, the density of BKCa in the IMM might be higher than expected from our experiments. Nevertheless, earlier results using glioblastoma cell lines – considered to be among the cell lines most enriched in mitoBKCa – already demonstrated a quite low density of BKCa β4 regulatory subunit in mitochondria – please see figure 2B in the following paper: 10.1371/journal.pone.0068125 – which (based on 1:1 stoichiometry of α and β subunits) also suggests that the density of the alpha subunit of BKCa might be low in this compartment.

      Author response image 3.

      Schematic representation of mitoplast-attached patch-clamp experiments.

      Theoretically, density predictions of mitoBK compared to PM localized BKCa would be possible if whole-mitoplast experiments were performed, however, we are unsure what added value this information would actually bring, allowing the pharmacologic modulation of structures originally located within the mitochondrial matrix. Please also consult Author response image 3. According to the most recent models, even if there are other views on this, mitoBKCa is oriented in a way that the C-terminus with its Ca2+ binding bowl is located within the mitochondrial matrix. Hence, to allow Ca2+ sensitivity experiments of the channel, broken up (by swelling) mitoplasts are required to make the Ca2+ binding bowl accessible for Ca2+ manipulations in the bath solution. This approach does not allow us to compare the channel density to that of the PM.

      Finally, to the best of our knowledge, a combination of immunofluorescence with mitoplast patch-clamp experiments is not feasible yet, and would probably be impossible due to the low density of the mitoBKCa as well as the lack of highly sensitive and specific antibodies.

      (2) There are also reports about other mitoK channels (e.g. Kv1.3, KCa3.1, KATP) playing an important role in mitochondrial function. Did the authors observe them, too? Can the authors speculate on the relative importance of the different channels? Is it known whether they are expressed organ-/tumor-specifically?

      Author response image 4.

      Representative single channels different to mitoBKCa detected in MDAMB-453 mitoplasts.

      The reviewer is right, other K+ channels have been found in mitochondria and these also play a role in tumor cells. This is also consistent with our data (Fig.5G), where we observed other channels in the mitoplasts of BCCs as well. This was the case in all four cell lines tested. According to their conductance and our expectations from literature, these channels may e.g. include mitoIKCa, mitoSKCa, mitoKATP or others (10.1146/annurev-biophys-092622-094853). As we focused, however, on patches containing a mitoBKCa, we did not further pharmacologically characterize these channels. Two examples of channels we found in these mitoplasts besides BKCa are presented for reviewers’ inspection (Author response image 4). As our manuscript focusses on mitoBKCa, we did not further classify these channels in smaller subgroups according to their conductance, as we feel that a differentiation between BKCa (~210 pS), and channels showing a conductance ≤150pS, or a conductance ≤100 pS is sufficient. Furthermore, this additional information would dilute our story too much, making it difficult for the (non-specialist) reader to follow the common thread of the study. We added respective information in the manuscript, however. Please consult lines 365-366 in the tracked changes version of the manuscript.

      Reviewer #1 is right, the different K+ channels observed might of course be organ- or tumor-specific. For example, it has been reported that the expression of K+ channels is different in various cancer cell (lines) (https://doi.org/10.2174/13816128113199990032, 10.1016/j.pharmthera.2021.107874, 10.1038/nrc3635), a fact which, also according to our study, might be exploited for pharmacological manipulation, aiming to affect proliferation/apoptosis of cancer cells. Further, a recently published single-cell and spatially resolved atlas of human breast cancer implies that the expression of different K+ channels (such as mitoIKCa, mitoSKCa, mitoKATP) might even differ between cancer- and non-cancer cells within a single tumour (https://doi.org/10.1038/s41588-021-00911-1).

      Reviewer #2 (Public Review):

      Summary:

      The large-conductance Ca2+ activated K+ channel (BK) has been reported to promote breast cancer progression, but it is not clear how. The present study carried out in breast cancer cell lines, concludes that BK located in mitochondria reprograms cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer.

      Strengths:

      The use of a wide array of modern complementary techniques, including metabolic imaging, respirometry, metabolomics, and electrophysiology. On the whole, experiments are astute and well-designed and appear carefully done. The use of BK knock-out cells to control for the specificity of the pharmacological tools is a major strength. The manuscript is clearly written.

      There are many interesting original observations that may give birth to new studies.

      Weaknesses:

      The main conclusion regarding the role of a BK channel located in mitochondria is not sufficiently supported. Other perfectible aspects are the interpretation of co-localization experiments and the calibration of Ca2+ dyes. These points are discussed in more detail in the following paragraphs:

      We thank reviewer #2 for the thorough assessment of our study.

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Unfortunately not, at least with the available evidence. While it is clear these cells have a BK in mitochondria (characteristic K+ currents detected in mitoplasts) and it is also well substantiated that the metabolic effects in intact cells are explained by an intracellular BK (paxilline effects absent in the BK KO), it does not follow that both observations are linked. Given that ectopic BKDEC appeared at the surface, a confounding factor is the likely expression of BK in other intracellular locations such as ER, Golgi, endosomes, etc. To their credit, authors acknowledge this limitation several times throughout the text ("...presumably mitoBK...") but not in other important places, particularly in the title and abstract.

      We thank the reviewer for this important comment and amended the title and abstract, respectively. The title of the manuscript was changed to “mitoBKCa is functionally expressed in murine and human breast cancer cells and potentially contributes to metabolic reprogramming.” Additionally, we changed appropriate passages in the text, to emphasize that mitoBKCa potentially mediates the metabolic reprogramming, but other intracellular channels could also contribute to these processes.

      (2) MitoBK subcellular location. Pearson correlations of 0.6 and about zero were obtained between the locations of mitoGREEN on one side, and mRFP or RFP-GPI on the other (Figs. 1G and S1E). These are nice positive and negative controls. For BK-DECRFP however, the Pearson correlation was about 0.2. What is the Z resolution of apotome imaging? Assuming an optimum optical section of 600 nm, as obtained by a 1.4 NA objective with a confocal, that mitochondria are typically 100 nm in diameter and that BK-DECRFP appears to stain more structures than mitoGREEN, the positive correlation of 0.2 may not reflect colocalization. For instance, it could be that BK-DECRFP is not just in mitochondria but in a close underlying organelle e.g. the ER. Along the same line, why did BK-RFP also give a positive Pearson? Isn´t that unexpected? Considering that BK-DEC was found by patch clamping at the plasma membrane, the subcellular targeting of the channel is suspect. Could it be that the endogenous BK-DEC does actually reside exclusively in mitochondria (a true mitoBK), but overflows to other membranes upon overexpression? Regarding immunodetection of BK in the mitochondrial Percoll preparation (Fig. S5), the absence of NKA demonstrates the absence of plasma membrane contamination but does not inform about contamination by other intracellular membranes.

      Indeed, it seems that BKCa-DEC is not an exclusive mitoBKCa, at least not upon (over-/)expression in MCF-7 cells. It is known from the literature that mitochondrial K+ channels are encoded by the nuclear genome, as no obvious gene for a K+ channel is found in the mitochondrial genome. Channel proteins are synthesized by cytosolic ribosomes and likely translocated into mitochondria via the TOM/TIM system. Although some K+ channels possess a mitochondrial targeting sequence at the N-terminus, their import is mostly far from a general mechanism, and this seems also to be true for BK channels. In the case of the K+ channel Kv1.3, an even more complex scenario is hypothesized, as the channel located in the PM could be transferred to mitochondria via mitochondria-associated membranes (MAM) structures of the ER (https://doi.org/10.3390/ijms20030734). Yet, the detailed mechanism for BK shuttling to mitochondria is not fully understood. Possibly, overflow is exactly what is happening, due to very high levels of BK-DEC expression upon transfection. However, that the channel translocates to the IMM upon transfection is not surprising and was also demonstrated for other cell models including HEK293 (see e.g. 10.1038/s41598-021-90465-3). Unfortunately, transfection efficiency of MCF-7 is quite low compared to HEK293; hence, quantitative statements from mito-patches upon transfection are difficult.

      In order to ensure that the mitochondrial colocalization is not a matter of poor microscope resolution, we reperformed these experiments using confocal imaging on a Zeiss LSM980 with an Airyscan 2 detector, yielding z resolutions of ~ 450 nm. These experiments confirmed the increased colocalization of BKCa-DEC with mitochondria compared to BKCa lacking the DEC exon. Furthermore, this imaging at higher resolution demonstrated, that, unfortunately, colocalization might not be the best analysis, as especially fragmented mitochondria showed a clear MitoGREEN stained matrix, surrounded by red fluorescence derived from BKCaDECRFP present in the IMM (revised Fig. 1G).
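
      The limitation described above can be illustrated with a minimal sketch of the Pearson colocalization score. This is not the analysis pipeline used in the study; the function name and the one-dimensional intensity profiles are hypothetical, and optical blur is ignored. It only shows that a membrane signal surrounding a matrix signal yields a low or even negative Pearson value despite both labels belonging to the same organelle.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical line profiles across a single mitochondrion (arbitrary units).
matrix_stain = [0, 0, 10, 10, 10, 0, 0]    # matrix label fills the organelle
membrane_stain = [0, 10, 0, 0, 0, 10, 0]   # membrane label forms a ring around it
overlap_stain = [0, 0, 9, 11, 10, 0, 0]    # a label that truly overlaps the matrix

ring_score = pearson_colocalization(matrix_stain, membrane_stain)
overlap_score = pearson_colocalization(matrix_stain, overlap_stain)
```

      In this toy case the ring-shaped signal scores below zero while the overlapping signal scores close to one, which is why a modest Pearson value at high resolution does not by itself exclude a mitochondrial membrane localization.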

      To validate the results derived from immunoblotting, we additionally stained the membranes for TMX1, a marker for the ER membrane. This analysis confirmed the high purity of the mitochondrial isolation without ER-membrane contamination after percoll purification, and hence validated the presence of BKCa in the mitochondrial membrane (revised Fig. S5D). The additional information can be found in lines 156-159 in the tracked changes version of the manuscript.

      (3) Calibration of fluorescent probes. The conclusion that BK blockers or BK expression affects resting Ca2+ levels should be better supported. Fluorescent sensors and dyes provide signals or ratios that need to be calibrated if comparisons between different cell types or experimental conditions are to be made. This is implicitly acknowledged here when monitoring ER Ca2+, with an elaborate protocol to deplete the organelle in order to achieve a reading at zero Ca2+.

      We thank the reviewer for the important comment. Please note that at no point in the manuscript do we aim to compare different cell lines concerning their intracellular Ca2+ concentration, but we only compare the same cell lines after the different treatments, as we are aware of this limitation of fluorescent probes. However, to validate the differences in intracellular Ca2+ concentrations, we calibrated the signals derived from Fura-2 and 4mtD3cpV using ionomycin in combination with cellular Ca2+ depletion/saturation. The newly added data can be found in the text, lines 185-190, 192-195, 199-200, and 228-230 in the tracked changes version of the manuscript, as well as new Figure S2F – Figure S2J and new Figure S2R – Figure S2V.
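
      For readers unfamiliar with such calibrations, the following minimal sketch shows the standard conversion of a ratiometric signal into a Ca2+ concentration (the Grynkiewicz equation, commonly applied to Fura-2). It is an illustration only: the response does not state the exact formula or parameters used, so the function name and all numerical values below are hypothetical. R_min and R_max stand for the ratios recorded after Ca2+ depletion and after saturation with ionomycin, respectively.

```python
def ratio_to_calcium(ratio, r_min, r_max, kd_nm, beta=1.0):
    """Convert a ratiometric reading into [Ca2+] in nM.

    Implements [Ca2+] = Kd * beta * (R - R_min) / (R_max - R), where beta is
    the ratio of the Ca2+-free to the Ca2+-saturated signal at the second
    excitation wavelength.
    """
    if not r_min < ratio < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical calibration values, chosen only to exercise the function.
resting_calcium = ratio_to_calcium(ratio=1.0, r_min=0.5, r_max=1.5, kd_nm=200.0)
stimulated_calcium = ratio_to_calcium(ratio=1.3, r_min=0.5, r_max=1.5, kd_nm=200.0)
```

      Because the relation is non-linear near R_max, equal changes in the raw ratio do not correspond to equal changes in Ca2+, which is the reason calibrated values are preferable when conditions are compared.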

      Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Author response image 5.

      Dot plot for data shown in Figure 2I.

      The reviewer is right, it seems that BKCaRFP may also affect [Ca2+]mito. However, the effect is not significant (p>0.999) using the Kruskal-Wallis test followed by Dunn’s multiple comparison test, which we applied because the data are not normally distributed. In contrast, p=0.0002 for ctrl vs. BKCa-DECRFP and p=0.0022 for BKCaRFP vs. BKCa-DECRFP. We added a scatter dot plot of the respective data as Author response image 5 for reviewer’s inspection. Additionally, first, even using a more stringent statistical test by only comparing ctrl vs. BKCaRFP using the Mann-Whitney test, the results are not significant, as the p-value was determined at 0.4467, and second, we performed the requested Ca2+ calibration using ionomycin under these conditions, which confirmed the difference between ctrl cells and BKCa-DECRFP expressing cells, but not BKCaRFP expressing ones. Please see Figure S2V.

      Reviewer #3 (Public Review):

      The original research article, titled "mitoBKCa is functionally expressed in murine and human breast cancer cells and promotes metabolic reprogramming" by Bischof et al, has demonstrated the underlying molecular mechanisms of alterations in the function of Ca2+ activated K+ channel of large conductance (BKCa) in the development and progression of breast cancer. The authors also proposed that targeting mitoBKCa in combination with established anti-cancer approaches, could be considered as a novel treatment strategy in breast cancer treatment.

      The paper is clearly written, and the reported results are interesting.

      Strengths:

      Rigorous biophysical experimental proof in support of the hypothesis.

      Weaknesses:

      A combinatorial synergistic study is missing.

      We thank reviewer #3 for the positive summary of our study. Indeed, we propose that targeting of mitoBKCa in combination with established anti-cancer drugs may represent a novel anti-cancer treatment strategy. Unfortunately, we feel that the manuscript is very condensed already, and that adding respective required experiments and data to support this hypothesis will make the flow of the manuscript more complex or even incomprehensible. As no attempts linking mitoBKCa activity with anti-cancer therapies have been made so far, we removed the respective information from the abstract and only discuss this aspect.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Statistics: Legends have to contain information about the number of biological replicates (N) and cells analysed (n). Statistics must be calculated with the averages of the replicates.

      Author response image 6.

      Representative single cell responses of Fura-2 loaded MMTV-PyMT WT cells.

      We thank the reviewer for the comment and added the missing details to all figure legends.

      We feel that using each cell represents exactly the power of high-resolution live-cell imaging, as there is no better biological replicate than a single separated cell, which is observed by fluorescence microscopy. This analysis is also able to visualize cell-to-cell differences in the microscopy area, similarly to patch-clamp experiments, where each single cell or mitoplast patched is used as a single replicate. Please find a representative dataset derived from fluorescence microscopy of different responses of neighboring single cells in Author response image 6.

      (2) Fig. 1G: This is a poor resolution figure, mostly because of its far too small size; in its current form it bears very little information.

      We agree with reviewer #1 and reperformed the imaging experiments using high resolution confocal imaging and exchanged the respective images. We feel that this increased the quality of the images significantly. Unfortunately, we were not able to increase the size of the images in the main figure, hence, we added magnifications of the respective images as new Figure S1I.

      (3) Fig. 1H: What do the dotted grey lines and the labels stand for?

      We believe Reviewer #1 is probably referring to Figure 1G. As indicated in the figure panel and in the text, the grey dotted lines and labels indicate the colocalization scores of mtRFP and RFP-GPI with MitoGREEN, respectively. These data are also shown in Figure S1H, including error bars and statistics. We added additional information in the text to make the meaning of the lines clearer to the reader. Please consult lines 149 – 150 in the tracked changes version of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Short of a way to tackle BK function and metabolism specifically in mitochondria, the conclusion may best be toned down to "intracellular BK". For the time being the term "mitoBK" appears too ambitious.

      We feel that you are right and that our previous overstatement requires adaptation, as a clear (100%) attribution of the observed metabolic effects solely to mitoBKCa is not definitively possible. We have therefore amended all relevant passages in the entire MS accordingly.

      (2) MitoBK subcellular location. Please address the points raised in the Public Review.

      As stated above we addressed the point raised in the public review accordingly (please consult new Figure S1I and revised Figure 1G).

      (3) Calibration of fluorescent probes. Please provide calibrations for cytosolic and mitochondrial Ca2+, for example, the standard high Ca2+/ionophore/metabolic inhibition treatment to reach saturation followed by Ca2+ chelation to obtain zero Ca2+.

      We thank the reviewer for the comment. As you can see from our response to the public review, we performed the respective experiments, and datasets were added in the manuscript.

      (4) Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      Please consult our response to the (same) comment in the public review.

      (5) Line 228. The statement "Similar results were obtained in MDA-MB-453 cells" is confusing. As shown in Fig. 3, pax reduced ECAR and OCR in MMTV-PyMT WT cells. As ibtx was without effect, it is suggested that intracellular BK supports metabolism. However, the effect of pax on MDA cells was the opposite. Doesn´t this divergence speak against a universal role of intracellular BKs in promoting metabolism in BCCs? A similar point may be made regarding metabolomics, which showed no effects of pax on lactate and pyruvate in MMTV-PyMT WT cells but stimulation in MDA cells. Perhaps the word "promotes" in the title of the figure should be replaced by something more neutral like "affects" or "alters", as used elsewhere.

      We thank the reviewer for pointing out the overstatement regarding intracellular BK functions and changed the title of the figure as suggested.

      With regard to the experiments mentioned, we would like to point out the following aspects:

      First, the cell lines used strongly differ in their metabolic settings under basal conditions. While both, MMTV-PyMT and MDA-MB-453 cells seem to show similar basal ECAR levels (if BKCa was present), their OCR seems to differ strongly. MMTV-PyMT cells seem to show a basal OCR which is almost at the maximum already, while MDA-MB-453 cells possess a tremendous capacity in their OCR, as observed upon mitochondrial uncoupling using FCCP. Of note, both, ECAR and OCR are indirect metabolic measures. On the one hand, ECAR measures extracellular acidification, which is accomplished by H+ along with lactate secretion. However, lactate secretion is not the only process leading to extracellular acidification, and ECAR may hence measure a variety of H+ releasing processes, including processes of vesicle secretion. On the other hand, OCR is not directly linked to ATP production, as mitochondrial complex IV is consuming O2, ATP, however, is produced by mitochondrial complex V. This becomes even more evident when having a look on OCRs after FCCP treatment – under these conditions, the H+ gradient is destroyed and ATP synthase activity is reduced, OCR, however, increases to the maximum due to increased supply of mitochondrial complex IV with H+.

      Second, please note that the LC-MS-based metabolomics data derive from a static single time point and not from over-time “live” read-outs. Moreover, underlying dynamics of the parameters measured cannot be assessed. Hence, as an example, increasing levels of pyruvate can e.g. indicate faster generation, or slower subsequent degradation/metabolization. A clear in-depth statement about what is happening under basal and BKCa inhibitor treated conditions is hence not possible. The only conclusion possible to draw from these experiments is that paxilline treatment differentially affects metabolic pathways in these cells.

      Based on these limitations of both methods, we decided to perform our in-depth fluorescence microscopy-based analysis, which provided strong evidence for an effect of intracellular BKCa channels on mitochondrial ATP production. Despite opposing effects of BKCa inhibition on OCR in MMTV-PyMT WT and MDA-MB-453 cells, mitochondrial ATP production was reduced, if BKCa-DECRFP was expressed/intracellular BKCa was functional.

      In line with these findings, mitoBKCa was recently described as an uncoupling protein, which could furthermore explain the differential effects of intracellular BKCa inhibition on OCR. https://doi.org/10.1038/s41598-021-90465-3

      Minor

      (6) Fig. 1C. Average fluorescence intensity in 6 experiments was about 20% higher in BK-KO cells relative to WT. Such a small difference is significant but should not be evident to the eye. The pictures selected for illustration appear to show a much larger difference and therefore may not be representative. If this is the case, please omit them. The same goes for the other representative pictures.

      Author response image 7.

      Representative images at different brightnesses.

      Please note, that the analysis of the images was done in an unbiased way using a Fiji macro. After analysis, we chose representative images, which were closest to the average.

      Furthermore, we must kindly disagree with the reviewer, as changes of 20% in fluorescence intensity are indeed evident to the eye (consult Author response image 7). These panels show the same image at different brightness levels with intensity differences of 20%. Hence, we feel that all the images the reviewer was referring to are representative of the values given.

      (7) Line 130. The definition of "recent" is of course relative, but 10 years?

      We are very glad that you have discovered this “inconsistency”, and reworded the respective phrase accordingly.

      (8) Line 327. "conductivity" is the property of a medium, "conductance" is the property of a component, such as a channel.

      We thank the reviewer for the important comment. We revised the text accordingly.

      (9) Various figures. FRET sensor data are expressed as Ratio(FRET/CFP). This is unusual, typically it should be FRET ratio (YFP/CFP), FRET ratio(mTFP/Venus), etc. Please note that the FRET partners differ between sensors.

      We acknowledge the comment of the reviewer. It is correct that fluorescent proteins vary widely between the sensors (used). Please note, however, the following: The emission measured from these sensors actually represents FRET, as CFP but not YFP is directly excited. Hence, emission is FRET, not the “intrinsic” fluorescence of the YFP. This is getting more and more important to differentiate, as there are probes existing, which can also be “alternately” excited, i.e. CFP and YFP separately, which will then yield the YFP/CFP ratio (https://doi.org/10.1021/acssensors.8b01599). In case of only CFP excitation, we feel, that the term FRET/CFP is preferable over other labelings such as YFP/CFP.

      (10) BK-DEC makes BCCs cells less oxidative. However, BK-DEC was first described in cardiomyocytes, which are among the most oxidative cell types. It would be useful if authors could address this apparent contradiction in the Discussion Section.

      That is an exciting point that we addressed as follows in the revised MS:

      First, it is important to mention that cardiac myocytes do not show a metabolic Warburg setting and are – under physiologic conditions – maintained in a high O2 environment.

      Second, a recent study from our group addressed the question about the role of mitoBKCa in primary cardiac myocytes. Indeed, mitoBKCa was functionally expressed in these cells. Interestingly, under physiologic conditions, the channel did not alter (multiple) cell behaviours nor overall cardiac physiology in a mouse model. However, upon induction of ischemia/ reperfusion injury, a lack of BK increased cardiac susceptibility to cell death resulting in increased infarction size (https://doi.org/10.1161/CIRCULATIONAHA.117.028723). Hence, also in this cell model, BKCa only played a role under oxygen limited conditions/ conditions where mitochondria were not properly functioning. Thus, the results derived from cardiac myocytes support our recent findings in BCCs, as BKCa mediates BCC resistance to hypoxic stress/ makes BCCs more independent from oxidative metabolism.

      Parts of this discussion were included in the revised MS. Please consult lines 490-500 in the tracked changes version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The study is very well designed and most of the computational analyses were done rigorously.

      We highly appreciate the positive feedback by reviewer #3.

      (2) The authors should discuss the expression of BKCa in different subsets of breast cancer. Authors may also debate on the level of steroid receptors and BKCa expressions.

      We thank reviewer #3 for the important suggestion and added the requested information in the discussion, lines 445-447 and 450-454 in the tracked changes version of the manuscript.

      (3) In the discussion section, the authors mentioned that the MCF7 cell is the best model to study this hypothesis. Does it imply that triple-negative breast cancer cell lines express lower levels of BKCa? The authors should discuss this.

      We thank the reviewer for the interesting comment; we would like to point out that the ERα-positive MCF-7 cell line was used to study experimental overexpression of BKCa at an otherwise low baseline level. This does not imply that BKCa is expressed at lower levels in TNBC cell lines; in fact a recent study showed the opposite, i.e. overexpression of BKCa in TNBC patients (10.1186/s12885-020-07071-1). Consistent with our work, the authors conclude that the channel could even be a new strategy for development of a targeted therapy in TNBC. We also added this information in the discussion, lines 450-454 in the tracked changes version of the manuscript.

      (4) The authors propose that combinatorial targeting of mitoBKCa along with known breast cancer chemotherapeutics can open a new horizon in breast cancer treatment. However, the authors did not perform any experiment to show the synergistic effect as mentioned.

      As already stated in the public reviews, we feel that the manuscript is very condensed already, and that adding the respective experiments and data will make the flow of the study even more complex. For the moment, we removed all information and statements linking mitoBKCa with anti-cancer treatment strategies from the abstract and only discuss this aspect. We hope that the reviewer agrees with us that an extensive analysis of the functional mitoBKCa status in the context of established breast cancer therapies must be addressed by (our) future studies.

      Minor Comments:

      There are several typos and grammatical errors that need further attention and rephrasing.

      We thank the reviewer for the comment and revised the text accordingly.

    2. Reviewer #1 (Public Review):

      Original Review:

      Bischof et al present a carefully prepared study on a very interesting and relevant topic: the role of ion channels (here a Ca2+-activated K+ channel BK) in regulating mitochondrial metabolism in breast cancer cells. The potential impact of these and similar observations made in other tumor entities has only begun to be appreciated. That being said, the authors pursue in my view an innovative approach to understanding breast cancer cell metabolism.

      Considering the following points would further strengthen the manuscript:

      Methods:

      (1) The authors use an extracellular Ca2+ concentration (2 mM) in their Ringer's solutions that is almost twice as high as the physiologically free Ca2+ concentration (ln 473). Moreover, the free Ca2+ concentration of their pipette solution is not indicated (ln 487).

      (2) Ca2+i measurements: The authors use ATP to elicit intracellular Ca2+ signals. Is this a physiological stimulus for Ca2+ signaling in breast cancer? What is the rationale for using ATP? Moreover, it would be nice to see calibrated baseline values of Ca2+i.

      (3) Membrane potential measurements: It would be nice to see a calibration of the potential measurements; this would allow one to correlate the IV relationship with membrane potential. Without calibration it is hard to compare unless the identical uptake of the dye is shown. Do paxilline or IbTx also induce a depolarization?

      (4) mito-potential measurements: Why did the authors use such a long time course and preincubate cells with channel blockers overnight? Why did they not perform paired experiments and record the immediate effect of the BK channel blockers on the mito potential?

      (5) MTT assays are also based on mitochondrial function. Since modulation of mito function is at the core of this manuscript, an alternative method should be used.

      Results:

      (1) Fig. 5G: The number of BK "positive" mitoplasts is surprisingly low. How does this affect the interpretation? Did the authors attempt to record mitoBK current in the "whole-mitoplast" mode? How does the mitoBK current density compare with that of the plasma membrane? Is it possible to theoretically predict the number of mitoBK channels per mitochondrion needed to elicit the observed effects? Can these results be correlated with immuno-localization of mitoBK channels?

      (2) There are also reports about other mitoK channels (e.g. Kv1.3, KCa3.1, KATP) playing an important role in mitochondrial function. Did the authors observe them, too? Can the authors speculate on the relative importance of the different channels? Is it known whether they are expressed organ-/tumor-specifically?

      Comments on revised version:

      The authors responded to all of my comments - except for one - in a satisfactory way so that I have no further concerns. The authors have prepared a very interesting piece of work that advances the field.

      However, I disagree with respect to their interpretation of statistics. Individually analyzed cells are not the best biological replicate per se. In my view a true replicate requires the use of an independent batch of cells derived from a new passage. The statistical analysis can only be based on the total number of n cells if each replicate contributes the same number of cells. If this is not the case, the authors will have to calculate the average of each replicate first so that they are equally weighted.
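
      The weighting issue can be made concrete with a small sketch. The replicate labels and values below are hypothetical and are not taken from the manuscript; the sketch only shows how pooling all cells lets a replicate with more cells dominate the result, whereas averaging each replicate first weights all replicates equally.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(measurements):
    """Average single-cell values within each biological replicate.

    `measurements` is an iterable of (replicate_id, value) pairs.
    """
    by_replicate = defaultdict(list)
    for replicate_id, value in measurements:
        by_replicate[replicate_id].append(value)
    return {rep: mean(values) for rep, values in by_replicate.items()}


# Hypothetical data: replicate 1 contributes four cells, replicate 2 only two.
cells = [
    ("passage_1", 1.0), ("passage_1", 1.2), ("passage_1", 1.1), ("passage_1", 0.9),
    ("passage_2", 2.0), ("passage_2", 2.2),
]

per_replicate = replicate_means(cells)
pooled_mean = mean(value for _, value in cells)        # dominated by passage_1
equal_weight_mean = mean(per_replicate.values())       # each passage counts once
```

      With unequal cell numbers the two estimates differ, and only the second treats each independent passage as one observation (N) for subsequent statistical testing.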

    3. eLife assessment

      The large-conductance Ca2+ activated K+ channel BKCa has been reported to promote breast cancer progression. The present study presents convincing evidence that an intracellular subpopulation of this channel reprograms breast cancer cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer. This important finding advances the field of cancer cell metabolism and has potential therapeutic implications.

    4. Reviewer #2 (Public Review):

      Summary:

      The large-conductance Ca2+ activated K+ channel (BK) has been reported to promote breast cancer progression, but it is not clear how. The present study, carried out in breast cancer cell lines, concludes that BK located in mitochondria reprograms cells towards the Warburg phenotype, one of the metabolic hallmarks of cancer.

      Strengths:

      The use of a wide array of modern complementary techniques, including metabolic imaging, respirometry, metabolomics and electrophysiology. On the whole, experiments are astute and well designed, and appear carefully done. The use of BK knock-out cells to control for the specificity of the pharmacological tools is a major strength. The manuscript is clearly written. There are many interesting original observations that may give birth to new studies.

      Weaknesses: The main conclusion regarding the role of a BK channel located in mitochondria is not sufficiently supported. Other perfectible aspects are the interpretation of co-localization experiments and the calibration of Ca2+ dyes. These points are discussed in more detail in the following paragraphs:

      (1) May the metabolic effects be ascribed to a BK located in mitochondria? Unfortunately not, at least with the available evidence. While it is clear these cells have a BK in mitochondria (characteristic K+ currents detected in mitoplasts) and it is also well substantiated that the metabolic effects in intact cells are explained by an intracellular BK (paxilline effects absent in the BK KO), it does not follow that both observations are linked. Given that ectopic BK-DEC appeared at the surface, a confounding factor is the likely expression of BK in other intracellular locations such as ER, Golgi, endosomes, etc. To their credit authors acknowledge this limitation several times throughout the text ("...presumably mitoBK...") but not in other important places, particularly in title and abstract.

      (2) mitoBK subcellular location. Pearson correlations of 0.6 and about zero were obtained between the locations of mitoGREEN on one side, and mRFP or RFP-GPI on the other (Figs. 1G and S1E). These are nice positive and negative controls. For BK-DECRFP however the Pearson correlation was about 0.2. What is the Z resolution of apotome imaging? Assuming an optimum optical section of 600 nm, as obtained by a 1.4 NA objective with a confocal, that mitochondria are typically 100 nm in diameter and that BK-DECRFP appears to stain more structures than mitoGREEN, the positive correlation of 0.2 may not reflect colocalization. For instance, it could be that BK-DECRFP is not just in mitochondria but in a close underlying organelle e.g. the ER. Along the same line, why did BK-RFP also give a positive Pearson? Isn´t that unexpected? Considering that BK-DEC was found by patch clamping at the plasma membrane, the subcellular targeting of the channel is suspect. Could it be that the endogenous BK-DEC does actually reside exclusively in mitochondria (a true mitoBK), but overflows to other membranes upon overexpression? Regarding immunodetection of BK in the mitochondrial Percoll preparation (Fig. S5), absence of NKA demonstrates absence of plasma membrane contamination, but does not inform about contamination by other intracellular membranes.

      (3) Calibration of fluorescent probes. The conclusion that BK blockers or BK expression affects resting Ca2+ levels should be better supported. Fluorescent sensors and dyes provide signals or ratios that need be calibrated if comparisons between different cell types or experimental conditions are to be made. This is implicitly acknowledged here when monitoring ER Ca2+, with an elaborate protocol to deplete the organelle in order to achieve a reading at zero Ca2+.

      (4) Line 203. "...solely by the expression of BKCa-DECRFP in MCF-7 cells". Granted, the effect of BKCa-DECRFP on the basal FRET ratio appears stronger than that of BK-RFP, but it appears that the latter had some effect. Please provide the statistics of the latter against the control group (after calibration, see above).

      The revised version of the manuscript has incorporated my suggestions to a very reasonable degree, in several cases with new experiments. The details of these improvements can be found in the correspondence.

    5. Reviewer #3 (Public Review):

      The original research article, titled "mitoBKCa is functionally expressed in murine and human breast cancer cells and promotes metabolic reprogramming" by Bischof et al, has demonstrated the underlying molecular mechanisms of alterations in the function of Ca2+ activated K+ channel of large conductance (BKCa) in the development and progression of breast cancer. The authors also proposed that targeting mitoBKCa, in combination with established anti-cancer approaches, could be considered a novel strategy for breast cancer treatment.

      The paper has been modified according to the reviewer's comments. Most of the queries raised by this reviewer were answered. However, the preclinical implication of this study could also be demonstrated in combinatorial treatment with known chemotherapeutic drugs, which is lacking in this manuscript. Hopefully, the authors will consider this in their future study.

    1. eLife assessment

      This important study discovered DBT as a novel gene implicated in the resistance to MG132-mediated cytotoxicity and potentially also in the pathogenesis of ALS and FTD, two fatal neurodegenerative diseases. The authors provided convincing evidence to support a mechanism by which loss of DBT suppresses MG132-mediated toxicity via promoting autophagy. This work will be of interest to cell biologists and biochemists, especially in the FTD/ALS field.

    2. Reviewer #1 (Public Review):

      Summary:

      Through an unbiased genome-wide KO screen, the authors found that loss of DBT suppresses MG132-mediated death of cultured RPE cells. Further analyses suggested that loss of DBT reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls.

      Strengths:

      The idea is novel, the evidence is convincing, and the data are clean. The findings have implications for human diseases.

      Weaknesses:

      None.

    3. Reviewer #2 (Public Review):

      Summary:

      Hwang, Ran-Der et al utilized a CRISPR-Cas9 knockout screen in human retinal pigment epithelium (RPE1) cells to evaluate for suppressors of toxicity by the proteasome inhibitor MG132 and identified that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but with no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagy flux compared to wild-type cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells, and in accordance causes activation of AMPK. They then show downstream altered autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Then they express the ALS mutant TDP43 M337V or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also utilize Drosophila models and show that either an RNAi knockdown or a CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43 M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence.

      Strengths:

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are well-justified.

      Weaknesses:

      None

    1. Reviewer #3 (Public Review):

      Summary:

      The authors aimed to develop an automated tool to easily collect, process, and annotate the biomedical literature for higher efficiency and better reproducibility.

      Strengths:

      Two highlights of the team's efforts are Pubget (for efficiently and reliably retrieving articles from PubMed) and labelbuddy (for annotating text). They make text-mining of the biomedical literature more accessible, effective, and reproducible, streamlining text-mining and meta-science projects. The data were collected and analyzed using solid and validated methodology and demonstrated a very promising direction for meta-science studies.

      Weaknesses:

      Further development is needed to support additional literature sources and to strengthen AI-powered functions.

    2. eLife assessment

      The study presents an important ecosystem designed to support literature mining in biomedical research, showcasing a methodological framework that includes tools like Pubget for article collection and labelbuddy for text annotation. The solid evidence presented for these tools suggests they could streamline the analysis and annotation of scientific literature, potentially benefiting research across a range of biomedical disciplines. While the primary focus is on neuroimaging literature, the applicability of these methods and tools might extend further, offering useful advancements in the practices of meta-research and literature mining.

    3. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors present new tools to collect and process information from the biomedical literature that could typically be used in a meta-analytic framework. The tools have been specifically developed for the neuroimaging literature. However, many of their functions could be used in other fields. The tools mainly enable downloading batches of papers from the literature, extracting relevant information along with meta-data, and annotating the data. The tools are implemented in an open ecosystem that can be used from the command line or Python.

      Strengths:

      The tools developed here are really valuable for the future of large-scale analyses of the biomedical literature. This is a very well-written paper. The presentation of the use of the tools through several examples corresponding to different scientific questions really helps the readers to foresee the potential application of these tools.

      Weaknesses:

      The tools are command-based and store outcomes locally. So users who prefer to work only with GUI and web-based apps may have some difficulties. Furthermore, the outcomes of the tools are constrained by inherent limitations in the scientific literature, in particular, here the fact that only a small portion of the publications have full text openly available.

    4. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors described the litmining ecosystem that can flexibly combine automatic and manual annotation for meta-research.

      Strengths:

      Software development is crucial for cumulative science and of great value to the community. However, such works are often greatly under-valued in the current publish-or-perish research culture. Thus, I applaud the authors' efforts devoted to this project. All the tools and repositories are public and can be accessed or installed without difficulty. The results reported in the manuscript also make a compelling case that the ecosystem is relatively mature.

      Weaknesses:

      First and foremost, the logic flow of the current manuscript is difficult to follow.

      The second issue is that the results from the litmining ecosystem were not validated and the efficiency of using litmining was not quantified. To validate the results, it would be better to directly compare the results of litmining with recognized ground truth in each of the examples. To prove the efficiency of the current ecosystem, it would be better to use quantitative indices for comparing litmining and the other two approaches (in terms of time and/or other costs in a typical meta-research project).

      The third family of issues concerns the functionality of the litmining ecosystem. As the authors mentioned, the ecosystem can be used for multiple purposes; however, the description here is not sufficient for researchers to incorporate the litmining ecosystem into their meta-research project. Imagine that a group of researchers is interested in using the litmining ecosystem to facilitate their meta-analyses: how should they incorporate litmining into their workflow? I have this question because, in a complete meta-analysis, researchers are required to (1) search in more than one database to ensure the completeness of their literature search; (2) screen the retrieved articles, which requires inspection of the abstract and the pdf; (3) search for all possible pdf files of included articles instead of relying only on the open-access pdf files in the PMC database. That said, if researchers are interested in using litmining in a meta-analysis that follows reporting standards such as PRISMA, the following functionalities are crucial:
      (a) How to incorporate the literature search results from different databases;
      (b) After downloading the meta-data of articles from databases, how to identify whose pdf files can be downloaded from PMC and whose pdf files need to be searched from other resources;
      (c) Is it possible to also annotate pdf files that were not downloaded by pubget?
      (d) How to maintain and update the meta-data and intermediate data for a meta-analysis by using litmining? For example, after searching in a database using a specific command and conducting their meta-analysis, researchers may need to update the search results and include items after a certain period.
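
      As an illustration of what point (a) asks for, the sketch below merges record lists exported from several databases and drops duplicates, matching on DOI where available and on a normalized title otherwise. It is only a minimal sketch under stated assumptions: the record format (dicts with optional `doi` and `title` keys) and the function names are hypothetical, and nothing here is part of pubget or labelbuddy.

```python
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def _record_key(record):
    """Build a de-duplication key for one record (hypothetical format)."""
    doi = (record.get("doi") or "").strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if doi:
        return ("doi", doi)
    # Fall back to a case- and whitespace-normalized title.
    title = " ".join((record.get("title") or "").lower().split())
    return ("title", title)


def merge_search_results(*result_sets):
    """Merge record lists from several databases, keeping the first
    occurrence of each article and preserving input order."""
    merged = {}
    for records in result_sets:
        for record in records:
            merged.setdefault(_record_key(record), record)
    return list(merged.values())
```

      Title matching is a crude fallback: records lacking both a DOI and a title would all collapse into a single entry, so a real workflow would need a more careful matching step and manual checks.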

    1. eLife assessment

      The authors identify a population of neurons with a specific complement of markers that originate in a distinct location from where cerebellar nuclear precursor cells have been thought to originate, that show distinct developmental properties. The discovery of a new germinal zone giving rise to a new population of CN neurons is an important finding, and it enriches our understanding of cerebellar development. The claims are supported by solid evidence and the authors use a wide range of technical approaches, including transgenic mice, that allow them to disentangle the influence of distinct developmental organizers.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors are interested in the developmental origin of the neurons of the cerebellar nuclei. They identify a population of neurons with a specific complement of markers that originate in a distinct location from where cerebellar nuclear precursor cells have been thought to originate and that show distinct developmental properties. The cerebellar nuclei have been well studied in recent years both to understand their development and through an evolutionary lens, which supports the importance of this study. The discovery of a new germinal zone giving rise to a new population of CN neurons is an exciting finding, and it enriches our understanding of cerebellar development, which has previously been viewed as quite straightforward: cerebellar inhibitory cells arise from the ventricular zone and excitatory cells arise from the rhombic lip.

      Strengths:

      One of the strengths of the manuscript is that the authors use a wide range of technical approaches, including transgenic mice that allow them to disentangle the influence of distinct developmental organizers such as ATOH.
      Their finding of a novel germinal zone and a novel population of CN neurons is important for developmental neuroscientists and cerebellar neuroscientists.

      Weaknesses:

      One important question raised by this work is what these newly identified cells eventually become in the adult cerebellum. Are they excitatory or inhibitory? Do they correspond to a novel cell type or perhaps one of the cell classes that have been recently identified in the cerebellum (e.g. Fujita et al., eLife, 2020)? Understanding this would significantly bolster the impact of this manuscript.

      The major weakness of the manuscript is that it is written for a very specialized reader who has a strong background in cerebellar development, making it hard to read for a general audience. It's challenging to follow the logic of some of the experiments as well as to contextualize these findings in the field of cerebellar development.

    3. Reviewer #2 (Public Review):

      Summary:

      Canonically cerebellar neurons are derived from 2 primary germinal zones within the anterior hindbrain (dorsal rhombomere 1). This manuscript identifies an important, previously underappreciated origin for a subset of early cerebellar nuclei neurons - the dorsal mesencephalon. This is an exciting finding. While the conclusions are generally supported, several of the figure panels are of inferior quality and do not readily convey the results the authors assert.

      Strengths:

      The authors have identified a novel early population of cerebellar neurons with likely novel origin in the midbrain. They have used multiple assays to support their conclusions, including immunohistochemistry and in situ analyses of a number of markers of this population which appear to stream from the midbrain into the dorsal anterior cerebellar anlage.

      The inclusion of Otx2-GFP short-term lineage analyses and analysis of Atoh1 -/- animals also provides considerable support for the midbrain origin of these neurons, as streams of cells seem to emanate from the midbrain. However, without live imaging, there remains the possibility that these streams of cells are not actually migrating, and rather, gene expression is changing in static cells. Hence the authors have conducted midbrain DiI labelling experiments of short-term and long-term cultured embryos showing DiI-labelled cells in the developing cerebellum. These studies confirm the migration of cells from the midbrain into the early cerebellum.

    1. eLife assessment

      This study convincingly demonstrates the ability to revert a neurodevelopmental defect with a dietary intervention. While the exact mechanisms remain to be elucidated, the authors establish a simple but important system to study not only the PI3K/Akt/FOXO pathway but also the action of ketone bodies and their potential therapeutic use. This study will be of particular interest to the large community of scientists studying E/I disequilibrium in the nervous system.

    2. Reviewer #1 (Public Review):

      Summary:

      This interesting study explores the mechanism behind an increased susceptibility of daf-18/PTEN mutant nematodes to paralyzing drugs that exacerbate cholinergic transmission. The authors use state-of-the-art genetics and neurogenetics coupled with locomotor behavior monitoring and neuroanatomical observations using gene expression reporters to show that the susceptibility occurs due to low levels of DAF-18/PTEN in developing inhibitory GABAergic neurons early during larval development (specifically, during the larval L1 stage). DAF-18/PTEN is convincingly shown to act cell-autonomously in these cells upstream of the PI3K-PDK-1-AKT-DAF-16/FOXO pathway, consistent with its well-known role as an antagonist of this conserved signaling pathway. The authors exclude a role for the TOR pathway in this process and present evidence implicating selectivity towards developing GABAergic neurons. Finally, the authors show that a diet supplemented with a ketone body, β-hydroxybutyrate, which also counteracts the PI3K-PDK-1-AKT pathway, promoting DAF-16/FOXO activity, partially rescues the proper development (morphology and function) of GABAergic neurons in daf-18/PTEN mutants, but only if the diet is provided early during larval development. This strongly suggests that the critical function of DAF-18/PTEN in developing inhibitory GABAergic neurons is to prevent excessive PI3K-PDK-1-AKT activity during this critical and particularly sensitive period of their development in juvenile L1 stage worms. Whether the sensitivity of GABAergic neurons to DAF-18/PTEN function is a defining and widespread characteristic of this class of neurons in C. elegans and other animals, or rather a particularity of the unique early-stage GABAergic neurons investigated, remains to be determined.

      Strengths:

      The study reports interesting and important findings, advancing the knowledge of how daf-18/PTEN and the PI3K-PDK-1-AKT pathway can influence neurodevelopment, and providing a valuable paradigm to study the selectivity of gene activities towards certain neurons. It also defines a solid paradigm to study the potential of dietary interventions (such as ketogenic diets) or other drug treatments to counteract (prevent or revert?) neurodevelopment defects and stimulate DAF-16/FOXO activity.

      Weaknesses:

      (1) Insufficiently detailed methods and some inconsistencies between Figure 4 and the text undermine the full understanding of the work and its implications.

      The incomplete methods presented, the imprecise display of Figure 4, and the inconsistency between this figure and the text, make it presently unclear what are the precise timings of observations and treatments around the L1 stage. What exactly do E-L1 and L1-L2 mean in the figure? The timing information is critical for the understanding of the implications of the findings because important changes take place with the whole inhibitory GABAergic neuronal system during the L1 stage into the L2 stage. The precise timing of the events such as neuronal births and remodelling events are well-described (e.g., Figure 2 in Hallam and Jin, Nature 1998; Fig 7 in Mulcahy et al., Curr Biol, 2022). Likewise, for proper interpretation of the implication of the findings, it is important to describe the nature of the defects observed in L1 larvae reported in Figure 1E - at present, a representative figure is shown of a branched commissure. What other types of defects, if any, are observed in early L1 larvae? The nature of the defects will be informative. Are they similar or not to the defects observed in older larvae?

      (2) The claim of proof of concept for a reversal of neurodevelopment defects is not fully substantiated by data.

      The authors state that the work "constitutes a proof of concept of the ability to revert a neurodevelopmental defect with a dietary intervention" (Abstract, Line 56); however, the authors do not present sufficient evidence to distinguish between a "reversal" and a prevention of the neurodevelopment defect by the dietary intervention. This clarification is critical for therapeutic purposes and claims of proof-of-concept. To the best of my understanding, reversal formally means the defect was present at the time of therapy, which is then reverted to a "normal" state with the therapy. On the other hand, prevention would imply an intervention that does not allow the defect to develop to begin with, i.e., the altered or defective state never arises. In the context of this study, the authors do not convincingly show reversal. This would require showing "embryonic" GABAergic neuron defects or showing convincing data in newly hatched L1 (0-1h); it is unclear whether they do so, as I have failed to find this information in the manuscript. Again, the method description needs to be improved, and the implications can be very different if the data presented in Figure 2D-E regard newly born L1 animals (0-1h) or L1 animals at, say, 5-7h after hatching. This is critical because the development of the embryonically-born GABAergic DD neurons, for instance, is not finalized embryonically. Their neurites still undergo outgrowth (albeit limited) upon L1 birth (see DataS2 in Mulcahy et al., Curr Biol 2022), hence they are susceptible to both committing developmental errors and to responding to nutritional interventions to prevent them. In contrast to embryonic GABAergic neurons, embryonic cholinergic neurons (DA/DB) do not undergo neurite outgrowth post-embryonically (Mulcahy et al., Curr Biol 2022), a fact which could provide some mechanistic insight considering the data presented.
      However, neurites from other post-embryonically-born neurons also undergo outgrowth post-embryonically, but mostly during the second half of the L1 stage following their birth up to mid-L2, with significant growth occurring during the L1-L2 transition. These are the cholinergic (VA/VB and AS neurons) and GABAergic (VD) neurons. The fact that AS neurons undergo a similar amount of outgrowth as VD neurons is informative as to whether VD neurons are or are not susceptible to daf-18/PTEN activity. Independently, DD neurons are still quite unique in other aspects (see below), which could also bring insight into their selective response.

      Finally, even adjusting the claim to "constitutes a proof-of-concept of the ability to prevent a neurodevelopmental defect with a dietary intervention" would not be completely precise, because it is unclear how much this work "constitutes a proof of concept". This is because, unless I misunderstood something, dietary interventions are already applied to prevent neurodevelopment defects, such as when folic acid supplementation is recommended to pregnant women to prevent neural tube defects in newborns.

      (3) The data presented do not warrant the dismissal of DD remodeling as a contributing factor to the daf-18/PTEN defects.

      Inhibitory GABAergic DD neurons are quite unique cells. They are well-known for their very particular property of remodeling their synaptic polarity (DD neurons switch the nature of their pre- and post-synaptic targets without changing their wiring). This process is called DD remodeling. It starts in the second half of the L1 stage and finishes during the L2 stage. Unfortunately, the fact that the authors find a specific defect in early GABAergic neurons (which are very likely these unique DD neurons) is not explored in sufficient detail and depth. The facts that these neurons are not fully developed at L1, that they still undergo limited neurite growth, and that they are poised for striking synaptic plasticity in a few hours set them apart from the other explored neurons, such as early cholinergic neurons, which show a more stable dynamics and connectivity at L1 (see Mulcahy et al., Curr Biol 2022).

      The authors use their observation that daf-18/PTEN mutants present morphological defects in GABAergic neurons prior to DD remodeling to argue that the DAF-18/PTEN-dependent effects are "not a consequence of deficient rearrangement during the early larval stages". However, DD remodeling is just another cell-fate-determined process and as such, its timing, for instance, can be affected by mutations in genes that affect cell fates and developmental decisions, such as daf-18 and daf-16, which affect developmental fates such as those related to the dauer fate. Specifically, the authors do not exclude the possibility that the defects observed in the absence of either gene could be explained by precocious DD remodeling. Precocious DD remodeling can occur when certain pathways, such as the lin-14 heterochronic pathway, are affected. Interestingly, lin-14 has been linked with daf-16/FOXO in at least two ways: during lifespan determination (Boehm and Slack, Science 2005) and in the L1/L2 stages via the direct negative regulation of an insulin-like peptide gene ins-33 (Hristova et al., Mol Cell Bio 2005). It is likely that the prevention of DD dysfunction requires keeping insulin signaling in check (downregulated) in DD neurons in early larval stages, which seems to coincide with the critical timing and function of daf-18/PTEN. Hence, it will be interesting to test the involvement of these genes in the daf-18/daf-16 effects observed by the authors.

      Discussion on the impact of the work on the field and beyond:

      The authors significantly advance the field by bringing insight into how DAF-18/PTEN affects neurodevelopment, but fall short of understanding the mechanism of selectivity towards GABAergic neurons, and most importantly, of properly contextualizing their findings within the state-of-the-art C. elegans biology.

      For instance, the authors do not pinpoint which type of GABAergic neuron is affected, despite the fact that there are two very well-described populations of ventral nerve cord inhibitory GABAergic neurons with clear temporal and cell fate differences: the embryonically-born DD neurons and the post-embryonically-born VD neurons. The time point of the critical period apparently defined by the authors (pending clarifications of methods, presentation of all data, and resolution of inconsistencies between the text and figures in the submitted manuscript) could suggest that DAF-18/PTEN is required in either or both populations, which would have important and different implications. An effect on DD neurons seems more likely because an image is presented (Figure 2D) of a defect in an L1 daf-18/PTEN mutant larva with 6 neurons, which means the larva was processed at a time when VD neurons were not yet born or expressing pUnc-47, so supposedly it is an image of a larva in the first half of the L1 stage (0-~7h?). DD neurons are also likely the critical cells here because the neurodevelopment errors are partially suppressed when the ketogenic diet is provided at an "early" L1 stage, but not later (e.g., from L2-L3 according to the text, L2-L4 according to the figure?).

      This study brings important contributions to the understanding of GABAergic neuron development in C. elegans, but unfortunately, it is justified and contextualized mostly in distantly related fields, where the study has a dubious impact at this stage, rather than in the central field of the work (post-embryonic development of C. elegans inhibitory circuits), where the study has a stronger impact. This study is fundamentally about a cell fate determination event that occurs in a nutritionally-sensitive developmental stage (post-embryonic L1 larval stage), yet the introduction and discussion are focused on more distantly related problems such as excitatory/inhibitory (E/I) balance, pathophysiology of human diseases, and treatments for them. Whereas speculation is warranted in the discussion, the limited in-depth consideration of the known biology of these neurons and organisms weakens the impact of the study as written. For instance, the critical role of DAF-18/PTEN seems to occur at the early L1 larval stage, a stage that is particularly sensitive to nutritional conditions. The developmental progression of L1 larvae is well-known to be sensitive to nutrition: e.g., L1 larvae arrest development in the absence of food, something that is exploited in nematode labs to synchronize animals at the L1 stage by allowing embryos to hatch into starvation conditions (water). Development resumes when they are exposed to food. Hence, the extensive postembryonic developmental trajectory that GABAergic neurons need to complete is expected to be highly susceptible to nutrition. Is it? The sensitivity towards the ketogenic diet intervention seems to favor this. In this sense, the attribution of the findings to issues with the nutrition-sensitive insulin-like signaling pathway seems quite plausible, yet this possibility seems insufficiently considered and discussed.

      Finally, the fact that imbalances in excitatory/inhibitory (E/I) inputs are linked to Autism Spectrum Disorders (ASD) is used to justify the relevance of the study and its findings. Maybe at this stage, the speculation would be more appropriate if restricted to the discussion. In order to be relevant to ASD, for instance, the selectivity of PTEN towards inhibitory neurons should occur in humans too. However, at present, the E/I balance alteration caused by the absence of daf-18/PTEN in C. elegans could simply be a coincidence due to the uniqueness of the post-embryonic developmental program of GABAergic neurons in C. elegans. To be relevant, human GABAergic neurons should also pass through a unique developmental stage that is critically susceptible to the PI3K-PDK1-AKT pathway in order for DAF-18/PTEN to have any role in determining their function. Is this the case? Hence, even in the discussion, where the authors state that "this study provides universally relevant information on.... the mechanisms underlying the positive effects of ketogenic diets on neuronal disorders characterized by GABA dysfunction and altered E/I ratios", this claim seems unsubstantiated as written particularly without acknowledging/mentioning the criteria that would have to be fulfilled and demonstrated for this claim to be true.

    3. Reviewer #2 (Public Review):

      Summary:

      Disruption of the excitatory/inhibitory (E/I) balance has been reported in Autism Spectrum Disorders (ASD), with which PTEN mutations have been associated. Giunti et al choose to explore the impact of PTEN mutations on the balance between E/I signaling using as a platform the C. elegans neuromuscular system, where both cholinergic (E) and GABAergic (I) motor neurons regulate muscle contraction and relaxation. Mutations in daf-18/PTEN specifically affect the GABAergic (I) system, both morphologically and functionally, while leaving the cholinergic (E) system unaffected. The study further reveals that the observed defects in the GABAergic system in daf-18/PTEN mutants are attributed to reduced activity of DAF-16/FOXO during development.

      Moreover, ketogenic diets (KGDs), known for their effectiveness in disorders associated with E/I imbalances such as epilepsy and ASD, are found to induce DAF-16/FOXO during early development. Supplementation with β-hydroxybutyrate in the nematode at early developmental stages proves to be both necessary and sufficient to correct the effects on GABAergic signaling in daf-18/PTEN mutants.

      Strengths:

      The authors combined pharmacological, behavioral, and optogenetic experiments to show the GABAergic signaling impairment at the C. elegans neuromuscular junction in DAF-18/PTEN and DAF-16/FOXO mutants. Moreover, by studying the neuron morphology, they point towards neurodevelopmental defects in the GABAergic motoneurons involved in locomotion. Using the same set of experiments, they demonstrate that a ketogenic diet can rescue the inhibitory defect in the daf-18/PTEN mutant at an early stage.

      Weaknesses:

      The morphological experiments hint towards a pre-synaptic defect to explain the GABAergic signaling impairment, but it would have also been interesting to check the post-synaptic part of the inhibitory neuromuscular junctions such as the GABA receptor clusters to assess if the impairment is only presynaptic or both post and presynaptic.

      Moreover, all observations done at the L4 stage and/or adult stage don't discriminate between the different GABAergic neurons of the ventral nerve cord, i.e., the DDs, which are born embryonically and undergo remodeling at the late L1 stage, and the VDs, which are born post-embryonically at the end of the L1 stage. Those additional elements would provide information on the mechanism of action of the FOXO pathway and the ketone bodies.

      Conclusion:

      Giunti et al provide fundamental insights into the connection between PTEN mutations and neurodevelopmental defects through DAF-16/FOXO and shed light on the mechanisms through which ketogenic diets positively impact neuronal disorders characterized by E/I imbalances.

    4. Reviewer #3 (Public Review):

      Summary:

      This is a conceptually appealing study by Giunti et al in which the authors identify a role for PTEN/daf-18 and daf-16/FOXO in the development of inhibitory GABA neurons, and then demonstrate that a diet rich in ketone body β-hydroxybutyrate partially suppresses the PTEN mutant phenotypes. The authors use three assays to assess their phenotypes: (1) pharmacological assays (with levamisole and aldicarb); (2) locomotory assays and (3) cell morphological assays. These assays are carefully performed and the article is clearly written. While neurodevelopmental phenotypes had been previously demonstrated for PTEN/daf-18 and daf-16/FOXO (in other neurons), and while KB β-hydroxybutyrate had been previously shown to increase daf-16/FOXO activity (in the context of aging), this study is significant because it demonstrates the importance of KB β-hydroxybutyrate and DAF-16 in the context of neurodevelopment. Conceptually, and to my knowledge, this is the first evidence I have seen of a rescue of a developmental defect with dietary metabolic intervention, linking, in an elegant way, the underpinning genetic mechanisms with novel metabolic pathways that could be used to circumvent the defects.

      Strengths:

      What their data clearly demonstrate is conceptually appealing. In my opinion, the biggest contribution of the study is the ability to revert a neurodevelopmental defect with a dietary intervention that acts upstream of, or in parallel to, DAF-16/FOXO.

      Weaknesses:

      The model shows AKT-1 as an inhibitor of DAF-16, yet their studies show no differences from wildtype in akt-1 and akt-2 mutants. AKT is not a major protein studied in this paper, and it can be removed from the model to avoid confusion, or the result can be discussed in the context of the model to clarify interpretation.

      When testing additional genes in the DAF-18/FOXO pathway, there were no significant differences from wild type in most cases. This should be discussed. Could there be an alternate pathway via DAF-18/DAF-16, excluding the PI3K pathway or are there variations in activity of PI3K genes during a ketogenic diet that are hard to detect with current assays?

      The consequence of SOD-3 expression in the broader context of GABA neurons was not discussed. SOD-3 was also measured in the pharynx but measuring it in neurons would bolster the claims.

      If they want to include AKT-1, seeing its effect on SOD-3 expression could be meaningful to the model.

    1. eLife assessment

      This valuable study presents new observations on white matter organisation at the micron scale, using a combination of synchrotron imaging and diffusion MRI across two species. Notably, the authors provide solid evidence for the fasciculation of axons within major fibre bundles into laminar structures, though these structures are not consistently observed across modalities or species. The study will be of general interest to neuroanatomists and those interested in white matter imaging.

    2. Reviewer #1 (Public Review):

      This study presents valuable observations of white matter organisation from diffusion MRI and two types of synchrotron imaging in both monkeys and mice. Cross-modality comparisons are interesting as the different methods are able to probe anatomical structures at different length scales, from single axons in high-resolution synchrotron (ESRF) imaging, to clusters of axons in lower-resolution synchrotron (DESY) data, to axon populations at the mm-scale in diffusion MRI. By acquiring all modalities in monkey and mouse ex vivo samples, the authors can observe principles of fibre organisation, and characterise how fibre characteristics, such as tortuosity and micro-dispersion, vary across select brain regions and in healthy tissue versus a demyelination model. The results are solid, though some statements (in the abstract/discussion) do not appear to be fully supported, and statistical tests would help confirm whether tissue characteristics are similar/different between different conditions.

      One very interesting result is the observation of apparent laminar organisation of fibres in ex vivo monkey white matter samples. DESY data from the corpus callosum shows fibres with two dominant orientations (one L-R, one slightly inclined), clustered in laminar structures within this major fibre bundle. Thanks to the authors providing open data, I was able to look through the raw DESY volume and observe regions with different "textures" (different orientations) in the described laminar arrangement. That this organisation can be observed by eye, as well as by structure tensor, is fairly convincing. As not all readers will download the data themselves, the manuscript could benefit from additional figures/videos to demonstrate (1) the quality of the DESY data and (2) a more 3D visualisation of the laminar structures (where the coronal plane shows convincing columnar structure or stripes). Similarly in Figure 5A, though this nicely depicts two populations with different orientations, it is somewhat difficult to see the laminar structure in the current image.

      ESRF data of the centrum semiovale (CS) contributes evidence for similar laminar structures in a crossing fibre region, where primarily AP fibres are shown to cluster in 3 laminar structures. As above, further visualisations of the ESRF volume in the CS (as shown in Figure 4E) would be of value (e.g. showing consistency across the 4 volumes, 2D images showing stripey/columnar patterns along different axes, etc).

      A key limitation of this result is that, though the DESY data from the CC seems convincing, the same structures were not observed in high-resolution synchrotron (ESRF) data of the same tissue sample in the corpus callosum. This seems surprising and the manuscript does not provide a convincing explanation for this inconsistency. The authors argue that this is due to the limited FOV of the ESRF data (~200x200x800 microns). However, the observed laminar structures in DESY are ~40 microns thick, and ESRF data from the CST suggests laminar thicknesses in the range of 5-40 microns with a similar FOV. This suggests the ESRF FOV would be sufficient to capture at least a partial description of the laminar organisation. Further, the DESY data from the CC shows columnar variations along the LR axis, which we might expect to be observed along the long axis of the ESRF volume of the same sample. Additional analyses or explanations to reconcile these apparently conflicting observations would be of value. For example, the authors could consider down-sampling the ESRF data in an appropriate manner to make it more similar to the DESY data, and running the same analysis, to see if the observed differences are related to resolution (i.e. the thinner laminar structures cluster in ways that they look like a thicker laminar structure at lower resolution), or crop the DESY data to the size of the ESRF volume, to test whether the observed differences can be explained by differences in FOV.

      Laminar structures were not observed in mouse data, though it is unclear if this is due to anatomical differences or somewhat related to differences in data quality across species.

      The authors further quantify various other characteristics of the white matter, such as micro-dispersion, tortuosity, and maximum displacement. Notably, the microscopic FA calculated via structure tensor is fairly consistent across regions, though not modalities. When fibre orientations are combined across the sample, they are shown to produce similar FODs to dMRI acquired in the same tissue, which is reassuring. As noted in the text, the estimates of tortuosity and max displacement are dependent on the FOV over which they are calculated. Calculating these metrics over the same FOV, or making them otherwise invariant to FOV, could facilitate more meaningful comparisons across samples and/or modalities.

      Though the results seem solid, some statements, particularly in the abstract and discussion, do not seem to be fully supported by the data. For example, the abstract states "Our findings revealed common principles of fibre organisation in the two species; small axonal fasciculi and major bundles formed laminar structures with varying angles, according to the characteristics of major pathways.", though the results show "no strong indication within the mouse CC of the axonal laminar organisation observed in the monkey". Similarly, the introduction states: "By these means, we demonstrated a new organisational principle of white matter that persists across anatomical length scales and species, which governs the arrangement of axons and axonal fasciculi into sheet-like laminar structures." Further comments on the text are provided below.

      One observation not notably discussed in the paper is that the spherical histograms of Figure 3E/H appear to have an anisotropic spread of the white points about 0,0. It would be interesting if the authors could comment on whether this could be interpreted as the FOD having asymmetric dispersion and if so, whether the axis of dispersion relates to the fibre orientations of the laminar structures.

      A limitation of the study is that it considers only small ex vivo tissue samples from two locations in a single postmortem monkey brain and slightly larger regions of mouse brain tissue. Consequently, further evidence from additional brain regions and subjects would be required to support more generalised statements about white matter organisation across the brain.

      Given the monkey results, the mouse study (section 2.5 onwards) lacks some motivation. In particular, it is unclear why a demyelination model was studied and if/how this would link to the laminar structure observed in the monkey data. Further, it is unclear how comparable tortuosity/max deviation values are across species, considering the differences in data quality and relative resolution, given that the presented results show these values are very modality-dependent.

      The paper introduces a new method of "scale-space" parameters for structure tensors. Since, to my understanding, this is the first description of the method, some simple validation of the method would be welcomed. Further, the same scale parameters are not used across monkeys and mice, with a larger kernel used in mice (Table 2) which is surprising given their smaller brain size. Some explanation would be helpful.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors combine diffusion MRI and high-resolution x-ray synchrotron phase-contrast imaging in monkey and mouse brains to investigate the 3D organization of brain white matter across different scales and species. The work is at the forefront of the anatomical investigation of the human connectome and aligns with several current efforts to bridge the resolution gap between what we can see in vivo at the millimeter scale and the complexity of the human brain at the sub-micron scale. The authors compare the 3D white matter organization across modalities within 2 small regions in one monkey brain (body of the corpus callosum, centrum semiovale) and within one region (splenium of the corpus callosum) in healthy mice and in one murine model of focal demyelination. The study compares measures of tissue anisotropy and fiber orientations across modalities, performs a qualitative comparison of fasciculi trajectories across brain regions and tissue conditions using streamlined tractography based on the structure tensor, and attempts to quantify the shape of fasciculi trajectories by measuring the tortuosity index and the maximum deviation for each reconstructed streamline. Results show measures of anisotropy and fiber orientations largely agree across modalities, especially for larger FOV data. The high-resolution data allows us to explore the fiber trajectories in relation to tissue complexity and pathology. The authors claim the study reveals new common organization principles of white matter fibers across species and scales, for which axonal fasciculi arrange into sheet-like laminar structures.

      Strengths:

      The aim of the study is of central importance within present efforts to bridge the gap between macroscopic structures observable in vivo in humans using conventional diffusion MRI and the microscopic organization of white matter tissue. Results obtained from this type of study are important to interpret data obtained in vivo, inform the development of novel methodologies, and expand our knowledge of the structural and thus functional organization of brain circuits.

      Multi-scale data acquired across modalities within the same sample constitute extremely valuable data that is often hard to acquire and represent a precious resource for validation of both diffusion MRI tractography and microstructure methods.

      The inclusion of multi-species data adds value to the study, allowing the exploration of common organization principles across species.

      The addition of data from a murine cuprizone model of focal demyelination adds interesting opportunities to study the underlying biological changes that follow demyelination and how these impact tissue anisotropy and fiber trajectories. These data can inform the interpretation and development of diffusion MRI microstructure models.

      Weaknesses:

      The main claim of a newly discovered laminar organization principle that is consistent across scales and species is not supported strongly enough by the data. The main evidence in support of the claim comes from the larger FOV data obtained from the body of the corpus callosum in the monkey brain. A laminar organization principle is partially shown in the centrum semiovale in the monkey brain and it is not shown in mice data. Additionally, the methods lack details to help the correct interpretation of these findings (e.g., how were these fasciculi defined?; how well do they represent different axonal populations?; what is the effect of blood vessels on the structure tensor reconstruction?; how was laminar separation quantified?) and the discussion does not provide a biological background for this organization. The corpus callosum sample suggests axons within a bundle of fibers are organized in a sheet-like fashion, while data from the centrum semiovale suggest fibers belonging to different fiber bundles are organized in a sheet-like arrangement. While I acknowledge the challenges in acquiring such high-resolution data, additional samples from different regions in the same animals and from different animals would help strengthen this claim.

      The main goal of the study is to bridge the organization of white matter across anatomical length scales and species. However, given the substantial difference in FOVs between the two imaging modalities used, and the absence of intermediate-resolution data, it remains difficult to effectively understand how these results can be used to inform conventional diffusion MRI. In this sense, the introduction does not do a good enough job of building a strong motivation for the scientific questions the authors are trying to answer with these experiments and for the specific methodology used.

      The cuprizone data represent a unique opportunity to explore the effect of demyelination on white matter tissue. However, this specific part of the study is not well motivated in the introduction and seems to represent a missed opportunity for further exploration of the qualitative and quantitative relationship between diffusion MRI and sub-micron tissue information (although unfortunately not within the same brain sample). This is especially true considering the diffusion MRI protocol for mice would allow extrapolation of advanced measures from different tissue compartments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper represents important findings when identifying untargeted metabolomics and its differences between metabolomes of different biological samples. GromovMatcher is the fantasy name for the soft development. The main idea behind it is built on the assumption of featuring and matching complex datasets. Although the manuscript reflects a solid analysis, it remains incomplete for validation with putative non-curated datasets.

      We are grateful to the eLife editor for taking the time and effort to assess our manuscript.

      We are, however, unsure of what the editor means by “it remains incomplete for validation with putative non-curated datasets”. As noted by Reviewer 2, manually curated datasets that could be used for validation are scarce. Most publicly available datasets do not contain sufficient information to establish a ground truth matching on which GromovMatcher, M2S, or metabCombiner can be tested. Even in the case where such a ground truth matching can be established, it must be performed by hand through a manual matching process, which is extremely time-consuming and requires very specific expertise. This, in our opinion, only highlights the need for automatic alignment methods such as metabCombiner, M2S or GromovMatcher.

      We do agree that the performance of GromovMatcher (and its competitors) needs to be validated further, and we plan to continue validating GromovMatcher as additional data becomes available in EPIC and other cohorts. With that in mind, the lack of publicly available validation data is the reason why we conducted such an extensive simulation study, arguably more comprehensive than previous validations, exploring challenging settings that we believe reflect real-life scenarios (main text “Validation on ground-truth data” and Appendix 3). We would like to stress that this allows us to highlight previously ignored limitations of the previously published methods, metabCombiner and M2S.

      We wish to thank the editor and reviewers for their time and efforts in reviewing our manuscript which led to many significant additions to our paper. Namely we:

      • Performed an additional sensitivity analysis (Appendix 3) exploring how an imbalance in the number of features or samples between two studies being matched (e.g. the dataset split), affects the quality of matchings found by GromovMatcher, metabCombiner, and M2S.

      • Investigated how changing or removing the reference dataset (Appendix 5) in the EPIC study (main text “Application to EPIC data”), affects the results of GromovMatcher.

      • Improved alignment matrix visualizations in Fig. 3a for all four methods tested on the validation data, to highlight more clearly which feature matches were correctly identified or missed.

      The revised paper is uploaded as the file “main_elife_revision.pdf” where all revisions are highlighted in blue as well as a copy “main_elife_revision_nohighlights.pdf” where revisions are not highlighted.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have implemented the Optimal Transport algorithm in GromovMatcher for comparing LC/MS features from different datasets. This paper gains significance in the proteomics field for performing meta-analysis of LC/MS data.

      Strengths:

      The main strength is that GromovMatcher achieves significant performance metrics compared to other existing methods. The authors have done extensive comparisons to claim that GromovMatcher performs well.

      Weaknesses:

      There are two weaknesses.

      (1) When the number of features is reduced the precision drops to ~0.8.

      We would like to clarify that this drop in precision occurs in the challenging setting where only a small proportion of metabolites are shared between both datasets (e.g., the overlap, or proportion of shared features, was 25% in our simulation study). When two untargeted metabolic datasets share only 25% of their features, this is a challenging setting for any automated matching method, as the vast majority (75%) of the features in both datasets must remain unmatched.

      In such settings, the reviewer correctly observes that the precision of GromovMatcher algorithms (GM and GMT) drops within the range of 0.80 - 0.85 (Figure 3b, top left panel). Such a precision of 0.8 or larger is still competitive compared with the alternative methods MetabCombiner (mC) and M2S whose precisions drop below 0.8 (see main text Fig. 3b, top left panel).

      Precision is measured as the number of metabolite pairs correctly matched divided by all matches identified by a method. In other words, even in the challenging setting when the number of shared features (true matches) between both datasets is small (e.g. low 25% overlap), upwards of 80% of the feature matches found by GromovMatcher are correct which is a very encouraging result.
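
      The definition above can be made concrete with a minimal sketch. It assumes matches are represented as pairs of feature indices; the function name `matching_scores` and the toy data are illustrative and do not come from the GromovMatcher code. Recall and F1 follow their standard definitions.

```python
def matching_scores(predicted, truth):
    """Return (precision, recall, f1) for two collections of
    (feature_in_dataset_1, feature_in_dataset_2) pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: correct matches divided by all matches the method reported.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: correct matches divided by all true matches.
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy example: 4 true matches, 5 reported, 3 of them correct.
truth = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted = {(0, 0), (1, 1), (2, 2), (3, 4), (5, 6)}
precision, recall, f1 = matching_scores(predicted, truth)
```

      In this toy case three of the five reported matches are correct, so precision is 0.6, while three of the four true matches are recovered, so recall is 0.75.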

      (2) How applicable is the method for other non-human datasets?

      We thank the reviewer for raising this question. The crux of the matter concerning the application to animal data revolves around the hypothesis that correlations between metabolites in two different studies are preserved. Theoretically, the metabolome operates under similar principles in humans, governed by an underlying network of biochemical reactions. Consequently, in comparable human populations, the GM hypothesis is likely to hold to some extent.

      However, in practice, application to animal data is more complicated. Animal studies tend to have smaller sample sizes and often stem from intervention-driven scenarios, such as mice subjected to specific diets or chemicals. This results in deliberate alterations in metabolic structures which makes finding two comparable animal studies less likely. To investigate the reviewer’s question, we have searched through the two predominant LC-MS dataset repositories (MetaboLights and NIH Metabolomics Workbench) but did not find any pairs of comparable animal studies due to the reasons mentioned above. One potential strategy to navigate this issue could entail regressing the metabolic intensities against the variables that notably differ between the two animal populations and running GM using the residual intensities. This would be an interesting direction for future research and additional validation would be needed to test the robustness of GM in this setting.

      Reviewer #2 (Public Review):

      Summary:

      The goal of untargeted metabolomics is to identify differences between metabolomes of different biological samples. Untargeted metabolomics identifies features with specific mass-to-charge ratio (m/z) and retention time (RT). Matching those to specific metabolites based on the model compounds from databases is laborious and not always possible, which is why methods for comparing samples on the level of unmatched features are crucial.

      The main purpose of the GromovMatcher method presented here is to merge and compare untargeted metabolomes from different experiments. These larger datasets could then be used to advance biological analyses, for example, for the identification of metabolic disease markers. The main problem that complicates merging different experiments is m/z and RT vary slightly for the same feature (metabolite).

      The main idea behind the GromovMatcher is built on the assumption that if two features match between two datasets (that feature i from dataset 1 matches feature j from dataset 2, and feature k from dataset 1 matches feature l from dataset 2), then the correlations or distances between the two features within each of the datasets (i and k, and j and l) will be similar. The authors then use the Gromov-Wasserstein method to find the best matching matrix from these data.
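
      The assumption described above can be illustrated with a toy sketch. This is not the authors' algorithm: GromovMatcher relies on an optimal transport solver and can leave features unmatched, whereas the code below simply brute-forces every one-to-one assignment between two tiny distance matrices and keeps the one that best preserves within-dataset distances. The names `gw_cost` and `best_assignment`, and the data, are hypothetical.

```python
from itertools import permutations


def gw_cost(dist_a, dist_b, assignment):
    """Squared mismatch between within-dataset distances when feature i
    of dataset 1 is matched to feature assignment[i] of dataset 2."""
    n = len(assignment)
    return sum(
        (dist_a[i][k] - dist_b[assignment[i]][assignment[k]]) ** 2
        for i in range(n)
        for k in range(n)
    )


def best_assignment(dist_a, dist_b):
    """Exhaustive search over one-to-one assignments (toy sizes only)."""
    n = len(dist_a)
    return min(permutations(range(n)), key=lambda p: gw_cost(dist_a, dist_b, p))


# Distances between three features measured in dataset 1.
dist_a = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.7],
    [0.9, 0.7, 0.0],
]
# Dataset 2 holds the same features in shuffled order:
# its feature x corresponds to feature source[x] of dataset 1.
source = [2, 0, 1]
dist_b = [[dist_a[source[x]][source[y]] for y in range(3)] for x in range(3)]

assignment = best_assignment(dist_a, dist_b)
```

      Because the three pairwise distances are all different, only one assignment reproduces them exactly, and the search recovers the shuffle without using m/z or RT at all. Exhaustive search scales factorially, which is why a practical method needs an optimal transport formulation instead.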

      The variation in m/z between the same features in different experiments is a user-defined value and it is initially set to 0.01 ppm. There is no clear limit for RT deviations, so the method estimates a non-linear deviation (drift) of RT between two studies. GromovMatcher estimates the drift between the two studies and then discards the matching pairs where the drift would deviate significantly from the estimate. It learns the drift from a weighted spline regression.
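
      The drift-based filtering step can be sketched as follows. Two simplifications are assumed and should be read as stand-ins: an ordinary least-squares line replaces the weighted spline regression described above, and the cut-off for "deviating significantly" is set at a hypothetical three median absolute deviations of the residuals. The function names and retention times are illustrative only.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares line (stand-in for the weighted spline)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def filter_by_drift(pairs, n_mads=3.0):
    """Keep candidate matches (rt_study1, rt_study2) whose residual from
    the estimated drift lies within n_mads median absolute deviations."""
    xs = [rt1 for rt1, _ in pairs]
    ys = [rt2 for _, rt2 in pairs]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in pairs]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    return [
        pair
        for pair, r in zip(pairs, residuals)
        if abs(r - center) <= n_mads * mad
    ]


# Hypothetical retention times (minutes): a drift of about +0.5 with
# small noise, plus one implausible candidate match at rt = 5.0.
candidates = [
    (1.0, 1.6), (2.0, 2.4), (3.0, 3.6), (4.0, 4.4),
    (5.0, 8.5), (6.0, 6.4), (7.0, 7.6), (8.0, 8.4),
]
kept = filter_by_drift(candidates)
```

      A straight line cannot capture the non-linear drift that motivates the spline in the paper, so this sketch only conveys the filtering logic: estimate the trend from the candidate matches, then drop the pairs that sit far from it.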

      The authors validate the performance of their GromovMatcher method by a validation experiment using a dataset of cord blood. They use 20 different splits and compare the GromovMatcher (both its GM and GMT iterations, whereby the GMT version uses the deviation from estimated RT drift to filter the matching matrix) with two other matching methods: M2S and metabCombiner.

      The second validation was done using a (scaled and centered) metabolomics dataset from cancer studies in the EPIC cohort that was manually matched by an expert. This dataset was also used to show that using automatic methods can identify more features that are associated with a particular group of samples than what was found by manual matching. Specifically, the authors identify additional features connected to alcohol consumption.

      Strengths:

      I see the main strength of this work in its combination of all levels of information (m/z, RT, and higher-order information on correlations between features) and using each of the types of information in a way that is appropriate for the measure. The most innovative aspect is using the Gromov-Wasserstein method to match the features based on distance matrices.

      We thank the reviewer for acknowledging this strength of our proposed GromovMatcher method.

      The authors of the paper identify two main shortcomings with previously established methods that attempt to match features from different experiments: a) all other methods require fine-tuning of user-defined parameters, and, more importantly, b) do not consider correlations between features. The main strength of the GromovMatcher is that it incorporates the information on distances between the features (in addition to also using m/z and RT).

      Weaknesses:

      The first, minor, weakness I could identify is that there seem not to be plenty of manually curated datasets that could be used for validation.

      We thank the reviewer for raising this issue concerning manually curated validation data.

      Manually curated datasets available for validation purposes are indeed scarce. This stems from the laborious nature of matching features across diverse studies, hence the need for automatic matching methods. Our future strategy involves further validation of the GromovMatcher approach as more data becomes accessible in EPIC and other cohorts.

      The scarcity of real-life publicly available datasets that can be used for validation purpose is the reason why we conducted an extensive simulation study (main text “Validation on ground-truth data” and Appendix 3). It is notably thorough, arguably more comprehensive than previous validations, utilizes real-life untargeted data, and imitates situations where data originates from distinct untargeted metabolomics studies, complete with realistic noise parameters encompassing RT, mz, and feature intensities. Our validation study comprehensively explores the performance of GromovMatcher, M2S, and metabCombiner, including in challenging realistic settings where there is a nonlinear drift in retention times, varying levels of feature overlaps between studies, normalizations of feature intensities, as well as imbalances in the number of features and samples present in the studies being matched.

      The second is also emphasized by the authors in the discussion. Namely, the method as it is set up now can be directly used only to compare two datasets.

      This is indeed a limitation that is common to all three methods considered in this paper. However, all these methods, GromovMatcher, M2S, and metabCombiner, can still be used to compare and pool multiple datasets using a multi-step procedure. Namely, this can be done by designating a 'reference' dataset and aligning all studies to it one by one. We take this exact approach in our paper when aligning the CS, HCC, and PC studies of the EPIC data in positive mode (main text “Application to EPIC data”). Namely, the HCC and PC studies are both aligned to the CS study by running GromovMatcher twice, and after obtaining these matchings, our analysis is restricted to those features in HCC and PC that are present in the CS study.
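
      The multi-step procedure described above can be sketched in a few lines, assuming each pairwise alignment has already been run and is available as a mapping from a study's features to the reference study's features. The function name `common_features` and all feature labels are hypothetical and do not reflect the EPIC data.

```python
def common_features(reference_features, matchings):
    """Return the reference features that were matched in every study.

    matchings maps a study name to a dict
    {feature in that study: matched feature in the reference study}.
    """
    shared = set(reference_features)
    for study_to_reference in matchings.values():
        shared &= set(study_to_reference.values())
    return shared


# Hypothetical labels: CS is the reference, HCC and PC are aligned to it
# one at a time by two separate pairwise matching runs.
cs_features = ["cs1", "cs2", "cs3", "cs4"]
matchings = {
    "HCC": {"h1": "cs1", "h2": "cs2", "h3": "cs4"},
    "PC": {"p1": "cs2", "p2": "cs4", "p3": "cs3"},
}
shared = common_features(cs_features, matchings)
```

      Only features that reach the reference in every pairwise run survive the intersection, which is one reason the result can depend on which study is chosen as the reference.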

      After the reviewer’s comment, we have added an additional sensitivity analysis in Appendix 5, to compare the results produced by GromovMatcher depending on the choice of the reference study. Namely, setting the reference study to either the CS study or the HCC study, GromovMatcher identified 706 and 708 common features respectively, with an overlap of 640 features. This highlights that the choice of reference does matter to some extent. In our original analysis of the EPIC data, choosing CS as the reference was motivated by the fact that CS had the largest sample size (compared to HCC and PC) and a subset of features in HCC and PC were already matched by experts to the CS study which we could use for validation (see Loftfield et al. (2021). J Natl Cancer Inst.).

      As mentioned in the discussion section of our manuscript, the recently proposed multimarginal Gromov-Wasserstein algorithm (Beier, F., Beinert, R., & Steidl, G. (2023). Information and Inference) could potentially allow multiple metabolomic studies to be matched using one optimization routine (e.g. without the designation of a ‘reference study’ for matching). We have not explored this possibility in depth yet as fast numerical methods for multimarginal GW are still in their infancy. Also, such multimarginal methods rely on the computation and storage of coupling or matching matrices that are tensors where the number of dimensions is equal to the number of datasets being matched. Therefore, multimarginal methods have large memory costs, which currently precludes their application for the matching of multiple metabolomics datasets.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was struggling with the representation used in Figure 3a. The gray points overlayed over the green points on a straight line are difficult to visually quantify. I found that my eyes mainly focused on the pattern of the red dots.

      Figure 3a has been modified to improve visual clarity. Namely, we have consistently reordered the rows and columns of the coupling matrices such that the true positive matches (green points) are spatially separated from the false negative matches (red points). The fraction of true positive and false negative matches can now be appreciated much more clearly by eye in Figure 3a.

      (2) I would also like to add the caveat that I cannot judge whether the authors used the other two methods that they compare with GromovMatcher (the M2S and metabCombiner) optimally. But I also do not see any evidence that they did not. Hopefully one of the other reviewers can address that.

      We appreciate the reviewer for highlighting the comparison of our approach GromovMatcher to the other existing methods M2S and MetabCombiner (mC). Both M2S and mC depend on tens of hyperparameters each with a discrete or continuous set of values that must be properly optimized to infer accurate matchings between dataset features. We detail in Appendix 2 how the hyperparameters of the M2S and mC methods are optimally tuned to achieve the best possible performance on the validation ground-truth data. Namely, both in the simulation study and on EPIC data, we grid-search over all important hyperparameters in the M2S and mC methods and choose those parameter combinations that result in the highest F1 score, averaged over 20 random trials. We remark that no such hyperparameter optimization was performed for our GromovMatcher method. As shown in Figures 3 and 4 of the main text, we find that GromovMatcher outperforms M2S and mC even in these cases when the hyperparameters of M2S and mC are tuned to predict optimal feature matchings.

      Given the large combinatorial space of hyperparameter choices, we believe we have thoroughly tested the important hyperparameter combinations that users of M2S and mC would be likely to explore in their own research.
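
As an illustration of the selection procedure described in this response, the following is a minimal Python sketch, not the code used in the study. It assumes a hypothetical callable `run_matcher(params, seed)` that returns a set of matched feature pairs for one random trial; the function names, the grid layout, and the toy parameters in the usage note are all assumptions made for the example.

```python
import itertools
import statistics


def f1_score(predicted, truth):
    """F1 of a predicted set of feature pairs against a ground-truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def grid_search(run_matcher, grid, truth, n_trials=20):
    """Return (best_params, best_mean_f1) over every combination in `grid`.

    `run_matcher(params, seed)` is a hypothetical stand-in for a call to a
    matching method; it must return an iterable of matched pairs.
    `grid` maps each hyperparameter name to the list of values to try.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Average the F1 score over independent random trials.
        scores = [
            f1_score(run_matcher(params, seed), truth)
            for seed in range(n_trials)
        ]
        mean_f1 = statistics.mean(scores)
        if mean_f1 > best_score:
            best_params, best_score = params, mean_f1
    return best_params, best_score
```

In use, `grid` might look like `{"k": [2, 4], "tol": [1, 2]}` for two made-up hyperparameters, and the returned dictionary is the combination with the highest mean F1 across trials.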

      (3) Validation

      (3a) The first validation is done on a split cord blood dataset. I could not clearly see from the paper how sensitive the result is to the dataset split.

      We are grateful for the reviewer’s question and have included new experiments in Appendix 3 which show how the results of GromovMatcher, M2S, and MetabCombiner are affected by the dataset split. In our original manuscript, our validation ground-truth experiment began with an untargeted metabolomic dataset consisting of n = 499 samples and p = 4,712 metabolic features which is split equally into two datasets consisting of an equal number of samples n1 = n2 and an equal number of metabolic features p1 = p2. The features of these equal-sized datasets would then be matched by our method.

      Now in Appendix 3 (Figs. 1-3) we show the sensitivity of all three alignment methods (GromovMatcher, M2S, and MetabCombiner) when we vary the fraction of samples in dataset 1 over dataset 2 given by n1/ n2, the overlap in shared features between both datasets, and the fraction of metabolic features in dataset 1 that are not present in dataset 2 which affects the feature sizes of both datasets p1/ p2. We find that all alignment methods are able to maintain a consistent precision and recall score when these three dataset split parameters are varied. GromovMatcher achieves a higher precision and recall than M2S and MetabCombiner for all choices of dataset split, agreeing with the validation experiment results from the main text (see main text Fig. 3). All three methods tested decrease in precision (without dropping in recall) when dataset 1 and dataset 2 contain an equal number of unshared features (e.g. when p1 = p2). Therefore, these sensitivity experiments in Appendix 3 show that our results in the main text are performed in the most challenging setting for the dataset split.

      (3b) The second validation was done using a (scaled and centered) metabolomics dataset drawn from cancer datasets in the EPIC cohort that was manually matched by an expert. Here the authors observe that metabCombiner has good precision, but lags in recall. And M2S has a very similar performance to GromovMatcher. The authors explain this by the fact that the drift in RT between the two experiments is mostly linear and thus does not affect the M2S performance. Can the authors find a different validation dataset where the drift in RT is not linear? If yes, it would be interesting to add it to the paper.

      We thank the reviewer for raising this question. As mentioned above, curated validation datasets such as the EPIC study analyzed in our paper are very rare and we do not currently have a validation study with a nonlinear retention time drift.

      Nevertheless, we performed an additional analysis of simulated data (reported in Appendix 2 – “M2S hyperparameter experiments” and Appendix 2 – Table 1) that demonstrates the decrease in M2S performance when the simulated drift is nonlinear. As presented in Appendix 2 – Table 1, in a low-overlap setting with a linear drift, which corresponds to the EPIC data, M2S precision and recall were 0.831 and 0.934, respectively, compared with 0.769 and 0.905 in the main analysis, where the drift was nonlinear.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.

      Public Reviews:

      We thank all Reviewers for their insightful comments and excellent suggestions.

      Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, mechanistically which is mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion.

      No comment from authors

      The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      No comment from authors

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it does not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to the potentiation of the efficacy of immune checkpoint blockade (ICB).

      No comment from authors for preceding text, comment addresses the following text

      Moreover, the mechanism of the increased antigen presentation pathway by pyrimidine depletion mediated by CDK9/PTEFb was not validated by genetic KD or KO targeting by CDK9/PTEFb pathways.

      We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.

      While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.

      Author response image 1.

      Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged

      At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).

      It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.

      Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.

      Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57BL/6J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.

      Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).

      To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for 15mg/kg dose, and we will assume that 1) there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. This yields a maximum free BQ concentration of 12 x 3 = 36µM.
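
The arithmetic behind the two preceding estimates can be reproduced with a short Python sketch. It assumes a one-compartment model with first-order elimination, instantaneous absorption, and linear superposition of doses, so that the steady-state peak is a geometric series; these modelling assumptions and the function names are ours for illustration and are not stated in the response. The input values (3µM, 2µM, 12.0 hours, 24 hours, 12 days) are those quoted above.

```python
def fraction_remaining(interval_h, half_life_h):
    """Fraction of drug left after one dosing interval (first-order decay)."""
    return 0.5 ** (interval_h / half_life_h)


def steady_state_peak(cmax_single, half_life_h, interval_h):
    """Peak concentration at steady state under repeated identical doses.

    Each dose adds cmax_single on top of what remains of all earlier doses:
    cmax * (1 + r + r^2 + ...) = cmax / (1 - r), with r the fraction
    remaining after one interval.
    """
    r = fraction_remaining(interval_h, half_life_h)
    return cmax_single / (1.0 - r)


def no_clearance_upper_bound(cmax_single, n_doses):
    """Upper bound if nothing is cleared: every dose simply adds up."""
    return cmax_single * n_doses


# 15 mg/kg daily: single-dose Cmax of 3 uM, half-life 12 h, dosing every 24 h.
print(round(steady_state_peak(3.0, 12.0, 24.0), 2))  # 4.0

# 10 mg/kg daily: Cmax scaled to 2 uM, as assumed in the response.
print(round(steady_state_peak(2.0, 12.0, 24.0), 2))  # 2.67

# Twelve daily doses with no clearance at all, using the 15 mg/kg Cmax.
print(no_clearance_upper_bound(3.0, 12))  # 36.0
```

With a 12-hour half-life and a 24-hour interval, one quarter of each dose remains when the next is given, so the accumulation factor is 1/(1 − 0.25) ≈ 1.33.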

      Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.

      Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.

      We concur with this assessment.

      Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.

      Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.

      However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.

      Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      [No comment from authors]

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.

      No comment from authors

      However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.

      We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.

      Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.

      Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.

      The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.

      Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.

      The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.

      Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The main issue is that it did not directly examine whether the increased antigen presentation by DHODH inhibition contributed to the potentiation of the efficacy of immune-check blockade (ICB). The additional effect of BQ in the xenograft tumor study was not examined to determine if it was due to increased antigen presentation toward the cancer cells or due to merely cell cycle arrest effect by pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to answer this issue.

      We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Additionally, in the in vivo study, the increase in surface MHC-I at the protein level induced by BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. To support the story presented in the study, these data would be necessary.

      We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.

      (3) The mechanism of the increased antigen presentation pathway by pyrimidine depletion mediated by CDK9/PTEFb was not validated by genetic KD or KO targeting by CDK9/PTEFb pathways. In general, results only by the inhibitor assay have a limitation of off-target effects.

      Please see our above reply to Reviewer #1 comment making this same point, where we spell out our rationale for not pursuing these experiments.

      (4) High concentrations of BQ (> 50 uM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). The authors should discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose or not.

      Please see our above reply to Reviewer #1 comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:

      “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”

      (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan-Meier curves reflect more of an additive response than a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.

      We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and a subsequent course of ICB then reduces tumor burden further, without the two therapies functioning in synergy. To address this, the authors should measure the duration of BQ-mediated induction of antigen presentation after stopping treatment.

      We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Minor Points

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.
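
      The kind of comparison described for Fig 4G (fold change in Pol II occupancy versus fold change in mRNA abundance) can be sketched as a simple least-squares fit. This is a minimal illustration only: the log2 fold-change values below are hypothetical placeholders, not values from Tan et al. or from the manuscript, the function name is an assumption, and no p-value is computed.

```python
import math

def fit_line_and_r(xs, ys):
    """Return (slope, intercept, Pearson r) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical log2 fold changes (treated/control) for four genes.
pol2_log2fc = [0.0, 1.0, 2.0, 3.0]   # Pol II occupancy (invented values)
mrna_log2fc = [0.0, 1.0, 1.0, 3.0]   # mRNA abundance (invented values)

slope, intercept, r = fit_line_and_r(pol2_log2fc, mrna_log2fc)
```

      A positive slope with r close to 1 would indicate that genes gaining more Pol II occupancy also gain more mRNA, which is consistent with, though not proof of, a transcriptional mechanism.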

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.

      While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.

      However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in the absence of genetic manipulation.

      Additionally, please see the new Fig 4E and 4F; Fig 4E is a repeat of Fig 4B using HCT116 cells. Figure 4E shows that, in this cell line, CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. This data strongly implicates CDK9, because it excludes a CDK9-degradation-independent effect of PROTAC2.

      (5) Figure 2B needs a legend.

      Thank you for pointing this out. We have added a legend to Fig 2B.

      (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:

      “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”

      Reviewer #3 (Recommendations For The Authors):

      The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:

      • In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).

      We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for this data, as it can easily be overlooked here since uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.

      We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.

      The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.

      We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.

      We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that it is diploid and therefore less genetically deranged compared to our other cancer cell lines.

      • Figure 2F shows an elegant experiment to discard off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been involved in ferroptosis, a highly immunogenic type of cell death. What are the authors' thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?

      The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.

      The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.

      Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).

      • The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?

      We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.

      We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).

      • What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?

      Because this was a survival assay, direct comparison of tumor volumes between groups was not possible at later time points, since mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
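
      The attrition effect described above can be illustrated with a minimal sketch. All tumor volumes and the 2000 mm³ endpoint threshold below are hypothetical and chosen only to show the arithmetic; they are not taken from the study.

```python
# Hypothetical illustration only: the volumes and the endpoint threshold are invented.
ENDPOINT_MM3 = 2000  # assumed euthanasia threshold, not from the study

def mean_of_survivors(volumes, endpoint=ENDPOINT_MM3):
    """Mean tumor volume over mice still on study (volume below the endpoint)."""
    alive = [v for v in volumes if v < endpoint]
    return sum(alive) / len(alive) if alive else None

# The same hypothetical cohort of four mice on two measurement days.
day_10 = [900, 1200, 1500, 1800]
day_14 = [1100, 1600, 2300, 2600]  # two mice have crossed the endpoint

mean_day_10 = mean_of_survivors(day_10)       # all four mice contribute
mean_day_14 = mean_of_survivors(day_14)       # only the two smallest tumors remain
true_mean_day_14 = sum(day_14) / len(day_14)  # what the mean would be with no removals
```

      In this toy cohort every tumor grows between the two days, yet the mean over surviving mice stays flat because the largest tumors drop out of the group. This is why group means at late time points understate tumor burden in a survival design.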

      • The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or b) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?

      For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.

      • Page 5, line 181 - do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?

      We believe the reviewer is referring to the following sentence:

      “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”

      In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).

    2. Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, which is mechanistically mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion. The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it does not directly examine whether the increased antigen presentation induced by DHODH inhibition actually contributed to the potentiation of immune checkpoint blockade (ICB) efficacy. Moreover, the mechanism by which pyrimidine depletion increases antigen presentation through CDK9/PTEFb was not validated by genetic KD or KO targeting the CDK9/PTEFb pathway. Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose.

      Comment on the revised version:

      In their response letter, the authors appropriately addressed the reviewer's comments.

      However, it is unfortunate that these comments are not reflected in the main text. Consequently, readers may encounter the same questions. Therefore, the reviewer recommends mentioning them in the discussion or limitations of the study, even if briefly, to address readers' concerns. In particular, addressing comments such as the dosage of BQ being lower than the reported pro-ferroptotic dose (PMID 37407687) and the lack of examination of the potential impact of immune cell depletion on the efficacy of BQ treatment would be necessary for considering the proposed mechanism. The latter limitation is also raised by the other reviewer.

    3. Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      Major Points:

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan Meier curves reflect more of an additive response versus a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. When concurrently administered, the survival of the mice is the same as with brequinar alone, suggesting that the efficacy of ICB was diminished. However, if ICB is administered following an initial dose of BQ, there is an added survival benefit of a magnitude that is similar to ICB alone. This result seems to contradict the title. Furthermore, the authors should show the longitudinal growth curves of these tumors.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and then a subsequent course of ICB also reduces tumor burden but not that the two therapies are functioning in synergy. To address this, the authors should measure the duration of BQ mediated induction of antigen presentation after stopping treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      Minor Points:

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      (5) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Comments on latest version:

      The authors address the majority of the points raised in my previous review. However, no additional in vivo experiments were performed, which seems necessary for the major conclusions of the paper.

      I disagree with the authors' assessment of Major Point 3 in my review. I have updated the text of Major Point 3 in my public review to further clarify my position.

      My final assessment is that if the authors want to claim that DHODH inhibition potentiates immune checkpoint blockade, as is stated in the title, then further in vivo experimentation is needed.

    4. Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group builds on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increased expression of genes in the antigen presentation pathway and MHC I after BQ treatment which is mediated strictly by pyrimidine depletion and CDK9/P-TEFb. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. In fact, this combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the solid in vitro mechanistic data supported by analysis in multiple cell lines. The in vivo data show compelling additive effects of DHODH inhibitors and ICB. However, more controls and experiments would be required to define the nature of these effects and to confirm that the mechanistic in vitro data is conserved in vivo.

      This is a relevant manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic approach combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these have not shown optimal efficacy yet.

      Comments on revised version:

      The authors have addressed my questions regarding validation of gene expression in other cell lines. They have also provided an explanation about why in vivo evaluations could not be performed for the experiment in Figure 5E.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study, utilizing CITE-Seq to explore CML, is considered a useful contribution to our understanding of treatment response. However, the reviewers express concern about the incomplete evidence due to the small sample size and recommend addressing these limitations. Strengthening the study with additional patient samples and validation measures would enhance its significance.

      We thank the editors for the assessment of our manuscript. In view of the comments of the three reviewers, we have increased the number of CML patient samples analyzed to confirm all the major findings included in the manuscript. In total, more than 80 patient samples across different approaches have now been analyzed and incorporated in the revised manuscript.

      To the best of our knowledge, this is the first single cell multiomics report in CML and differs substantially from the recent single cell omics-based reports where single modalities were measured one at a time (Krishnan et al., 2023; Patel et al., 2022). Thus, the sc-multiomic investigation of LSCs and HSCs from the same patient addresses a major gap in the field towards managing efficacy and toxicity of TKI treatment by enumerating CD26+CD35- LSCs and CD26-CD35+ HSCs burden and their ratio at diagnosis vs. 3 months of therapy. The findings suggest design of a simpler and cheaper FACS assay to simultaneously stratify CML patients for TKI efficacy as well as hematologic toxicity.

      Reviewer 1:

      Summary:

      This manuscript by Warfvinge et al. reports the results of CITE-seq to generate single-cell multi-omics maps from BM CD34+ and CD34+CD38- cells from nine CML patients at diagnosis. Patients were retrospectively stratified by molecular response after 12 months of TKI therapy using European Leukemia Net (ELN) recommendations. They demonstrate heterogeneity of stem and progenitor cell composition at diagnosis, and show that compared to optimal responders, patients with treatment failure after 12 months of therapy demonstrate increased frequency of molecularly defined primitive cells at diagnosis. These results were validated by deconvolution of an independent previously published dataset of bulk transcriptomes from 59 CML patients. They further applied a BCR-ABL-associated gene signature to classify primitive Lin-CD34+CD38- stem cells as BCR:ABL+ and BCR:ABL-. They identified variability in the ratio of leukemic to non-leukemic primitive cells between patients, showed differences in the expression of cell surface markers, and determined that a combination of CD26 and CD35 cell surface markers could be used to prospectively isolate the two populations. The relative proportion of CD26-CD35+ (BCR:ABL-) primitive stem cells was higher in optimal responders compared to treatment failures, both at diagnosis and following 3 months of TKI therapy.

      Strengths:

      The studies are carefully conducted and the results are very clearly presented. The data generated will be a valuable resource for further studies. The strengths of this study are the application of single-cell multi-omics using CITE-Seq to study individual variations in stem and progenitor clusters at diagnosis that are associated with good versus poor outcomes in response to TKI treatment. These results were confirmed by deconvolution of a historical bulk RNAseq data set. Moreover, they are also consistent with a recent report from Krishnan et al. and are a useful confirmation of those results. The major new contribution of this study is the use of gene expression profiles to distinguish BCR-ABL+ and BCR-ABL- populations within CML primitive stem cell clusters and then applying antibody-derived tag (ADT) data to define molecularly identified BCR:ABL+ and BCR-ABL- primitive cells by expression of surface markers. This approach allowed them to show an association between the ratio of BCR-ABL+ vs BCR-ABL- primitive cells and TKI response and study dynamic changes in these populations following short-term TKI treatment.

      Weaknesses:

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence. Although the authors discuss the potential heterogeneity of primitive stem cells, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented. Another limitation is that the BCR-ABL+ versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples. We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings. It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment. Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells, they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      Reviewer #1 (Recommendations For The Authors):

      Minor comment: Fig 4 legend -E and F should be C and D.

      We thank the reviewer for positive assessment of our work. Here, we highlight the updates in the revised manuscript considering the feedback received.

      Minor comment: Fig 4 legend -E and F should be C and D.

      We have edited the revised manuscript accordingly.

      One of the limitations of the study is the small number of samples employed, which is insufficient to make associations with outcomes with confidence.

      Although we performed CITE-seq for 9 CML patient samples at diagnosis, we extended our investigations to include additional samples (e.g., large-scale deconvolution analysis of samples, Fig 3 C-E, qPCR for BCR::ABL1 status, Fig. 6A, and the ratio between CD35+ and CD26+ populations at diagnosis and during TKI therapy, Fig. 6C-D) as described in the manuscript.

      In comparison to scRNA-seq, multiomic CITE-seq involves preparation and sequencing of separate libraries corresponding to RNA and ADTs, making it even more resource demanding and limiting our capacity to process an extensive number of patient samples. To confirm our findings in a larger cohort, we have therefore adopted a computational deconvolution approach, CIBERSORT, to analyze a larger number of independent samples (n=59). This reflects a growing, sustainable trend to study larger numbers of patients in the face of still prohibitively expensive but potentially insightful sc-omics approaches (for example, please see Zeng et al, A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).
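
      The general idea of computational deconvolution (estimating cell-type fractions in a bulk profile from reference signatures) can be sketched as follows. This is a deliberately simplified stand-in: it uses ordinary least squares on two cell types, whereas CIBERSORT itself is based on nu-support vector regression. The function name, the marker-gene signature values, and the 0.3/0.7 mixture are all hypothetical.

```python
# Simplified stand-in for deconvolution: ordinary least squares on two cell types.
# CIBERSORT itself uses nu-support vector regression; this is NOT that algorithm.
# All expression values below are hypothetical.

def deconvolve_two_types(signature, bulk):
    """Estimate the fractions of two cell types in a bulk expression profile.

    signature: list of (expr_in_type_a, expr_in_type_b), one pair per marker gene
    bulk: list of bulk expression values for the same genes, in the same order
    """
    # Build the normal equations (S^T S) f = S^T y for a two-column matrix S.
    aa = sum(a * a for a, _ in signature)
    ab = sum(a * b for a, b in signature)
    bb = sum(b * b for _, b in signature)
    ay = sum(a * y for (a, _), y in zip(signature, bulk))
    by = sum(b * y for (_, b), y in zip(signature, bulk))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("signature columns are collinear")
    # Solve the 2x2 system with Cramer's rule.
    frac_a = (ay * bb - by * ab) / det
    frac_b = (by * aa - ay * ab) / det
    # Clip negative estimates and rescale so the fractions sum to 1.
    frac_a, frac_b = max(frac_a, 0.0), max(frac_b, 0.0)
    total = frac_a + frac_b
    if total == 0:
        raise ValueError("bulk profile is not explained by either signature")
    return frac_a / total, frac_b / total

# Hypothetical signatures: (primitive-cluster expression, myeloid-cluster expression).
signature = [(10.0, 1.0), (8.0, 2.0), (1.0, 9.0), (2.0, 12.0)]
# Simulated bulk sample: 30% primitive cells and 70% myeloid cells.
bulk = [0.3 * a + 0.7 * b for a, b in signature]

primitive_frac, myeloid_frac = deconvolve_two_types(signature, bulk)
```

      Because the simulated bulk profile is an exact mixture, the sketch recovers the input fractions. Real bulk transcriptomes are noisy and involve many cell types, which is why dedicated tools with robust regression are used in practice.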

      However, in view of the comment, we have now substantially increased the number of analyzed patients in the revised manuscript. This includes an increased number of patient samples to investigate the ratio between CD35 and CD26 marked populations at diagnosis and after 3 months of TKI therapy (from n=8 to n=12 with now 6 optimal responders and 5 treatment failure at diagnosis and after TKI therapy), qPCR for BCR::ABL1 expression status at diagnosis (from n=3 to n=9), and follow-up of BCR::ABL1 expression in three additional samples after TKI therapy. Moreover, we examined the CD26 and CD35 marked populations for expression of GAS2, one of our top candidate LSC signature genes, in three additional samples at diagnosis and at 3-month follow-up. Thus, >80 patient samples across different approaches have been analyzed to strengthen all major conclusions of the study.

      We emphasize that we were cautious in generalizing the observation obtained from any one approach and sought to confirm any major finding using at least one complementary method. As an example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because our independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid cell clusters (Fig. 3E).

      Although the authors discuss the potential heterogeneity of primitive stem cells, they do not directly address the heterogeneity of hematopoietic potential or response to TKI treatment in the results presented.

      Thanks for noting the discussion on heterogeneity of the primitive stem cells. As described in the original manuscript, the figure 6 D-E showed a relationship between heterogeneity and TKI therapy response. The results showed that CD35+/CD26+ ratio within the HSC fraction associated with this therapy response. We have now increased the number of patient samples analyzed and present the updated results in the revised manuscript (now figure 6 C-D). These observations set the stage for assessing whether long term therapy outcome can also be influenced by heterogeneity at diagnosis.

      We have shown the hematopoietic potential of HSCs marked by CD35 expression in an independent parallel study and therefore only mentioned it concisely in the current manuscript. A combination of scRNA-seq, scATAC-seq and cell surface proteomics showed CD35+ cells at the apex of healthy human hematopoiesis, containing an HSC-specific epigenetic signature and molecular program, as well as possessing self-renewal capacity and multilineage reconstitution in vivo and in vitro. The preprint is available as Sommarin et al. ‘Single-cell multiomics reveals distinct cell states at the top of the human hematopoietic hierarchy’, bioRxiv; https://www.biorxiv.org/content/10.1101/2021.04.01.437998v2.full

      We also note that the surface markers identified were previously reported by the same authors using different single-cell approaches, which limits the novelty of the findings.

      Our current manuscript is indeed a continuation of, and builds on, our previous paper (Warfvinge R et al. Blood, 2017). In contrast to our previous report, which was limited to examination of only 96 genes per cell, CITE-seq allowed us to examine the molecular program of cells using unbiased global gene expression profiling. Finally, although CD26 once again appears as a reliable marker of BCR::ABL1+ primitive cells, CD35 emerges as a novel and previously undescribed marker of BCR::ABL1- residual stem cells. A combination of CD35 and CD26 allowed us to efficiently distinguish between the two populations housed within the Lin-34+38/low stem cell immunophenotype.

      Another limitation is that the BCR-ABL+ versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. The BCR-ABL status of cells sorted based on CD26 and CD35 was evaluated in only two samples.

      Single-cell detection of fusion transcripts is challenging because of the low detection sensitivity of single-cell RNA-seq, as has been noted previously (Krishnan et al. Blood, 2023, Giustacchini et al. Nature Medicine, 2017, Rodriguez-Meira et al. Molecular Cell, 2019). However, this is likely to change with the inclusion of target-specific probes in scRNA-seq library preparation protocols. Nonetheless, in view of the comment, we have included more patient samples (from the previous n=3 to the current n=10, including TKI-treated samples) for direct assessment of BCR::ABL1 status by qPCR analysis; the updated results are included in the revised manuscript (Figure 6A).

      It will be important to determine whether the GEP and surface markers identified here are able to distinguish BCR-ABL+ and BCR-ABL- primitive stem cells later in the course of TKI treatment.

      We performed qPCR to check for BCR::ABL1 status, and the level of GAS2, one of the top genes expressed in CML cells within CD26+ and CD35+ cells at diagnosis and following 3 months of TKI therapy. The results showed that while CD26+ are BCR::ABL1+, the CD35+ cells are BCR::ABL1- at both time points. Moreover, the expression of LSC-specific gene, GAS2 was specific to BCR::ABL1+ CD26+ cells at both diagnosis as well as following 3 months of TKI therapy. The new results are presented in figure 6B in the revised manuscript.

      Finally, although the authors do describe differential gene expression between CML and normal, BCR:ABL+ and BCR:ABL-, primitive stem cells they have not as yet taken the opportunity to use these findings to address questions regarding biological mechanisms related to CML LSC that impact on TKI response and outcomes.

      We agree with the reviewer that our major focus here was to characterize the cellular heterogeneity coupled to treatment outcome and therefore we did not delve deep into the molecular mechanisms underlying TKI response. However, in response to this comment, as mentioned above, we noted that one of the top genes in BCR::ABL1 cells (Fig. 4C; right; in red), GAS2 (Growth Arrest-Specific 2), was expressed at both diagnosis and during TKI therapy within CD26+ cells relative to CD35+ cells (updated figure 6B). Interestingly, GAS2 was also detected in CML LSCs in a recent scRNA-seq study (Krishnan et al. Blood, 2023), suggesting that GAS2 upregulation could be a consistent molecular feature of CML cells. GAS2 has previously been reported as deregulated in CML (Janssen JJ et al. Leukemia, 2005, Radich J et al, PNAS, 2006) and has been linked to control of the cell cycle, apoptosis, and response to imatinib (Zhou et al. PLoS One, 2014). Future investigations are warranted to assess whether GAS2 could play a role in the outcome of long-term TKI therapy.

      Reviewer 2:

      Summary:

      The authors use single-cell "multi-omics" to study clonal heterogeneity in chronic myeloid leukemia (CML) and its impact on treatment response and resistance. Their main results suggest 1) cell compartments and gene expression signatures are both shared in CML cells (versus normal), yet 2) some heterogeneity of multiomic mapping correlated with ELN treatment response; 3) further definition of a unique combination of CD26 and CD35 surface markers associated with gene expression defined BCR::ABL1+ LSCs and BCR::ABL1- HSCs. The manuscript is well-written, and the method and figures are clear and informative. The results fit the expanding view of cancer and its therapy as a complex Darwinian exercise of clonal heterogeneity and the selective pressures of treatments.

      Strengths:

      Cutting-edge technology by one of the expert groups of single-cell 'omics.

      Weaknesses:

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies. Thus, it is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider. Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      We thank the reviewer for appraisal of our manuscript. We take the opportunity to point out the updates in the revised manuscript in view of the comments.

      Very small sample sizes, without a validation set. The obvious main problem with the study is that an enormous amount of results and conjecture arise from a very small data set: only nine cases for the treatment response section (three in each of the ELN categories), only two normal marrows, and only two patient cases for the division kinetic studies.

      As the reviewer has noted, single-cell omics experiments remain resource-demanding, which limits the number of patients that can be analyzed. As described above in response to the comments from reviewer 1, multiomic CITE-seq allows extraction of two modalities in comparison to a typical scRNA-seq; however, this also further limits the number of samples that can be processed in a sustainable way. This was one of the motivations to analyze a larger number of independent samples (n=59) while benefiting from the insights gained from CITE-seq (n=9). Furthermore, by analyzing CD34+ cells from bone marrow and peripheral blood of CML patients, including both responders and non-responders after one year of Imatinib therapy, we were able to significantly diversify the patient pool, a diversity that was lacking in our CITE-seq cohort. As mentioned above, this reflects a growing trend of analyzing larger numbers of patients while anchoring the analysis on prohibitively expensive but potentially insightful sc-omics approaches (for example, please see Zeng et al., A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia, Nature Medicine, 2022).

      As emphasized above, we frequently sought to confirm the findings from one approach using a complementary method and independent samples. For example, although CITE-seq (n=9) showed altered frequency of all cell clusters between optimal and poor responders (Fig. 3B), we refrained from generalizing because an independent large-scale computational deconvolution analysis (n=59) only substantiated the altered proportion of primitive and myeloid clusters.

      In view of the comment, we have now increased the number of patients analyzed during the revision process. These include increased numbers to investigate the ratio between CD35+ and CD26+ populations at diagnosis, as well as 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells (see response to reviewer 1 for details). Altogether, >80 patient samples across different approaches were analyzed to strengthen the conclusions.

      During the revision, we analyzed cells from 8 CML patients for cell cycle status using gene activity scores. These results, together with the previously reported cell division kinetics data, are now described in supplementary figures 9C-F.

      It is very difficult to know the "noise" in the system - the stability of clusters and gene expression and the normal variation one might expect, versus patterns that may be reproducibly study artifact, effects of gene expression from freezing-thawing, time on the bench, antibody labeling, etc. This is not so much a criticism as a statement of reality: these elegant experiments are difficult, time-consuming, and very expensive. Thus in the Discussion, it would be helpful for the authors to just frankly lay out these limitations for the reader to consider.

      We agree with the reviewer that sc-omics approaches can be noisy despite continuing efforts to denoise single cell datasets through both experimental and bioinformatic innovations. Therefore, we have updated the discussion as recommended by the reviewer (paragraph 5 in the discussion).

      We also note that CITE-seq, in contrast to scRNA-seq alone provides dual features: surface marker/protein as well as RNA for annotating the same cluster. In our manuscript, for example, cell clusters in UMAP for normal BM; Fig 1B were described using both surface markers (Fig. 1C) and RNA (Fig. 1D) making the cluster identity robust. To further elaborate this approach, a new supplementary figure 1C shows annotations of clusters using both RNA and surface markers.

      To potentially address the issue of stability of clusters and gene expression, we compared the marker genes for major clusters from nBM from this study (supplementary table 4, Warfvinge et al.) with those described recently in a scRNA-seq study by Krishnan et al. (supplementary table 8, Blood, 2023) using Cell Radar, a tool that identifies and visualizes which hematopoietic cell types are enriched within a given gene set (description: https://github.com/KarlssonG/cellradar; direct link: https://karlssong.github.io/cellradar/). To compare, we used our in-house gene list for the major clusters and mapped the same number of top marker genes, ranked by log2FC, from the corresponding clusters in Krishnan et al. as inputs to Cell Radar. The Cell Radar plot outputs are shown below.

      Author response image 1.

      This approach showed broad similarities across clusters from this study with their counterparts from the other study suggesting the cluster identities reported here are likely to be robust. Please note these figures are for reviewer response only and not included in the final manuscript.

      Also in the Discussion, it would be interesting for the authors to consider what's next: what type of validation would be needed to make these studies translatable to the clinic? Is there a clever way to use these data to design a faster/cheaper assay?

      Our findings that the CD26 and CD35 surface markers enrich for BCR::ABL1+ and BCR::ABL1- cells, respectively, suggest that a simpler, faster and cheaper FACS panel could possibly quantify leukemic and non-leukemic stem cells in CML patients. We anticipate that future investigations, including clinical studies, might examine whether CD26-CD35+ cells could be plausible candidates for restoring normal hematopoiesis once the TKI therapy diminishes the leukemic load, and whether patients with low counts of CD35+ cells at diagnosis have a relatively higher chance of developing hematologic toxicity such as cytopenia during therapy.

      We briefly mentioned this possibility in the discussion; however, we have now moved it to another paragraph to highlight the same. Please see paragraph 5 in the revised manuscript.

      Reviewer 3:

      Summary:

      In this study, Warfvinge and colleagues use CITE-seq to interrogate how CML stem cells change between diagnosis and after one year of TKI therapy. This provides important insight into why some CML patients are "optimal responders" to TKI therapy while others experience treatment failure. CITE-seq in CML patients revealed several important findings. First, substantial cellular heterogeneity was observed at diagnosis, suggesting that this is a hallmark of CML. Further, patients who experienced treatment failure demonstrated increased numbers of primitive cells at diagnosis compared to optimal responders. This finding was validated in a bulk gene expression dataset from 59 CML patients, in which it was shown that the proportion of primitive cells versus lineage-primed cells correlates to treatment outcome. Even more importantly, because CITE-seq quantifies cell surface protein in addition to gene expression data, the authors were able to identify that BCR/ABL+ and BCR/ABL- CML stem cells express distinct cell surface markers (CD26+/CD35- and CD26-/CD35+, respectively). In optimal responders, BCR/ABL- CD26-/CD35+ CML stem cells were predominant, while the opposite was true in patients with treatment failure. Together, these findings represent a critical step forward for the CML field and may allow more informed development of CML therapies, as well as the ability to predict patient outcomes prior to treatment.

      Strengths:

      This is an important, beautifully written, well-referenced study that represents a fundamental advance in the CML field. The data are clean and compelling, demonstrating convincingly that optimal responders and patients with treatment failure display significant differences in the proportion of primitive cells at diagnosis, and the ratio of BCR-ABL+ versus negative LSCs. The finding that BCR/ABL+ versus negative LSCs display distinct surface markers is also key and will allow for a more detailed interrogation of these cell populations at a molecular level.

      Weaknesses:

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      Reviewer #3 (Recommendations For The Authors):

      My only recommendation is to bolster findings with additional CML and healthy donor samples.

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

      We thank the reviewer for the positive assessment of our manuscript. As mentioned in response to comments from reviewers 1 and 2, CITE-seq remains a resource-consuming single-cell method, which limits the number of patients that can be analyzed. However, during the revision process, we increased the number of patient samples analyzed in other assays; these include an increased number of samples to investigate the ratio between CD35+ and CD26+ populations at diagnosis and after 3 months of TKI therapy, qPCR for BCR::ABL1, and patients examined for GAS2, one of the top genes expressed in CML cells. Thus, >80 patient samples across different assays have been analyzed to strengthen the conclusions. (Please see the response to reviewer 1 for more details.)

    2. eLife assessment

      This study presents fundamental insights into the heterogeneity of chronic myeloid leukemia (CML) stem cells and their response to tyrosine kinase inhibitor therapy, shedding light on potential mechanisms underlying treatment failure. The study's robust methodology, supported by validation with bulk RNA-seq data and surface marker analysis, provides compelling evidence for the identified associations between cellular composition and treatment outcome. These findings contribute to our understanding of CML pathogenesis and may inform the development of more targeted therapeutic strategies.

    3. Reviewer #1 (Public Review):

      Summary:

      This manuscript by Warfvinge et al. reports the results of CITE-seq to generate single-cell multi-omics maps from BM CD34+ and CD34+CD38- cells from nine CML patients at diagnosis. Patients were retrospectively stratified by molecular response after 12 months of TKI therapy using European Leukemia Net (ELN) recommendations. They demonstrate heterogeneity of stem and progenitor cell composition at diagnosis, and show that compared to optimal responders, patients with treatment failure after 12 months of therapy demonstrate increased frequency of molecularly defined primitive cells at diagnosis. These results were validated by deconvolution of an independent previously published dataset of bulk transcriptomes from 59 CML patients. They further applied a BCR-ABL-associated gene signature to classify primitive Lin-CD34+CD38- stem cells as BCR:ABL+ and BCR:ABL-. They identified variability in the ratio of leukemic to non-leukemic primitive cells between patients, showed differences in expression of cell surface markers and determined that a combination of CD26 and CD35 cell surface markers could be used to prospectively isolate the two populations. The relative proportion of CD26-CD35+ (BCR:ABL-) primitive stem cells was higher in optimal responders compared to treatment failures, both at diagnosis and following 3 months of TKI therapy.

      Strengths:

      The studies are carefully conducted and the results are very clearly presented. The data generated will be a valuable resource for further studies. The strengths of this study are the application of single-cell multi-omics using CITE-Seq to study individual variations in stem and progenitor clusters at diagnosis that are associated with good versus poor outcomes in response to TKI treatment. These results were confirmed by deconvolution of a historical bulk RNAseq data set. Moreover, they are also consistent with a recent report from Krishnan et al. and are a useful confirmation of those results. The major new contribution of this study is the use of gene expression profiles to distinguish BCR-ABL+ and BCR-ABL- populations within CML primitive stem cell clusters and then applying antibody-derived tag (ADT) data to define molecularly identified BCR:ABL+ and BCR-ABL- primitive cells by expression of surface markers. This approach allowed them to show an association between the ratio of BCR-ABL+ vs BCR-ABL- primitive cells and TKI response and study dynamic changes in these populations following short-term TKI treatment.

      Weaknesses:

      The number of samples studied by CITE-Seq is limited. However, the authors have confirmed their key observations in additional samples. The BCR-ABL+ versus BCR-ABL- status of cells was not confirmed by direct sequencing for BCR-ABL. However, we recognize that the methodologies to perform these analyses on single cells are still evolving and the authors have shown that CD26 and CD35 expression can consistently identify BCR-ABL+ versus BCR-ABL- cells. It will be of interest to learn whether the GEP and surface markers identified here can distinguish BCR-ABL+ primitive stem cells later in the course of TKI treatment.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, Warfvinge and colleagues use CITE-seq to interrogate how CML stem cells change between diagnosis and after one year of TKI therapy. This provides important insight into why some CML patients are "optimal responders" to TKI therapy while others experience treatment failure. CITE-seq in CML patients revealed several important findings. First, substantial cellular heterogeneity was observed at diagnosis, suggesting that this is a hallmark of CML. Further, patients who experienced treatment failure demonstrated increased numbers of primitive cells at diagnosis compared to optimal responders. This finding was validated in a bulk gene expression dataset from 59 CML patients, in which it was shown that the proportion of primitive cells versus lineage-primed cells correlates to treatment outcome. Even more importantly, because CITE-seq quantifies cell surface protein in addition to gene expression data, the authors were able to identify that BCR/ABL+ and BCR/ABL- CML stem cells express distinct cell surface markers (CD26+/CD35- and CD26-/CD35+, respectively). In optimal responders, BCR/ABL- CD26-/CD35+ CML stem cells were predominant, while the opposite was true in patients with treatment failure. Together, these findings represent a critical step forward for the CML field and may allow more informed development of CML therapies, as well as the ability to predict patient outcomes prior to treatment.

      Strengths:

      This is an important, beautifully written, well-referenced study that represents a fundamental advance in the CML field. The data are clean and compelling, demonstrating convincingly that optimal responders and patients with treatment failure display significant differences in the proportion of primitive cells at diagnosis, and the ratio of BCR-ABL+ versus negative LSCs. The finding that BCR/ABL+ versus negative LSCs display distinct surface markers is also key and will allow for more detailed interrogation of these cell populations at a molecular level.

      Weaknesses:

      CITE-seq was performed in only 9 CML patient samples and 2 healthy donors. Additional samples would greatly strengthen the very interesting and notable findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the reviewers for their thoughtful analysis and questions.

      A brief overview of the changes to the manuscript is provided here, with individual responses to the reviewer comments following.

      The methods section has been expanded to better explain the techniques used in our analyses. The CTCF binding data section has likewise been expanded to include more detail on the dataset and our analysis of its contents. All other requested clarifications have been added to the relevant areas of the results.

      Beyond specific requests from the reviewers, we made the following changes.

      We felt that a particular terminology choice on our part resulted in some confusion: the use of “SNPs” to refer to genetic variants within our Diversity Outbred samples. While we used SNPs that lay closest to the center of our haplotype predictions as our representative loci for each linkage disequilibrium block, this was done for computational purposes only. We did not focus most of our analyses on the haplotypes themselves, because of the uncertainty of which variants within an LD block actually participated in the genetic-epigenetic interactions we imputed.

      Thus, we edited the text to remove mention of “SNPs” unless our analysis did directly and deliberately profile SNPs themselves. In all other cases, we now refer to “haplotypes”, “genetic variants”, or “variants”. This should help increase clarity in the manuscript as a whole.

      A small error was discovered within the labelling and processing of regression model outputs in chromosome 14. A consistency check was run on all chromosomes, finding that only Chr 14 was affected. Chr 14 was rerun in its entirety to verify its results, with the previous results now archived within our databases uploaded on Synapse (see Methods for a link). All relevant calculations and figures were regenerated, resulting in an average shift of 1% or less across the manuscript. All analyses remain highly statistically significant.

      Responses to comments from Reviewer #1

      Methods

      • Sequencing depth was retrieved from the original publication on the primary multiomics dataset. (Line 105-106)

      • A line was added regarding initial mouse genome alignment for the original publication: we explain the GigaMUGA genotyping array, used for the DO mESC samples. For our ChIP-seq data, we reword to specify: we used liftovers from imputed strain-specific genomes to B6 mm10. (Lines 108-110; 116-120; 168-170)

      • Aneuploidy removal is expanded upon in a similar fashion: the original QC identified chromosome-level gene expression differences to remove aneuploid samples. (Line 111)

      • Mention of the pre-publication use of an alternative null model has been removed, given its lack of relevance to the rest of the text. While it was interesting to compare to the standard null model, it amounts to a side note that distracts from the focus of the paper. (Line 137-139).

      • Descriptive subheadings have been added.

      Results

      • Line 179 (now Line 191) now points to Methods.

      • Line 189-200 (now Line 188-204): language altered to better explain our intent: We wished to perform an intrachromosomal scan across the whole genome for non-additive genetic-epigenetic interactions. However, there were computational limits to how many possible combinations of gene, haplotype, and ATAC-seq peak we could feasibly test. We thus generated a random subset of possible combinations. This was also performed to identify target regions for focused analyses.

      • Line 195 (now line 206, expanded on in Line 210): Clarification added on the significance of our result: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Instead, we see 0.07% of our models coming in at adj. p < 1x10^-7.

      • Line 199 (now Line 216): The requested calculations were run, and are now included in table S3. We found that within 4 Mb of a given gene, less than 10% of variants and ATAC peaks clustered closer to each other than they did to the gene they affected.

      Please note that this figure has a level of uncertainty due to linkage disequilibrium. Thus, rather than precisely answering the question “[are there haplotype-ATAC pairs] that are in the same locality but further away from the gene?”, we asked "is the ATAC peak closer than the gene to the point where we have the highest confidence of correctly calling the interacting genotype?". The relevant code has been deposited in our Synapse repository (see Methods for link).

      • Line 205 (now restructured in Line 221-228): The text has been edited to specify our intent. We are referring to a set of TAD-focused regression models we generated (see Methods) that comprehensively included all possible interactions between genes, and all haplotypes and ATAC peaks within +/- 1 TAD of the gene.

      • (Line 227): We specified that the previously-published TAD boundary dataset we used was retrieved from the Bing Ren lab’s Hi-C projects, which imputed locations of TAD boundaries in B6 mESCs.

      • We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The Euler plots in Figure 1D-E reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      Because these non-additive regulatory interactions have not been comprehensively studied, we wished to determine whether there were any regulatory factors within our data that would not be detected as significant via more conventional methods, such as correlation analysis, mediation analysis, or regression analysis without an interaction term. Our Euler plots show that there are large subsets of both ATAC-seq peaks and haplotypes that are exclusively found in non-additive interactions. This is our justification for focusing on non-additive interactions for the rest of the paper.

      • Line 256 (now Line 252-255): We further clarified the above in this section: correlation and mediation analyses were previously completed by the team which initially analyzed the DO mESC dataset (Skelly et al. 2020, Cell Stem Cell). They performed a correlation analysis between open chromatin and gene expression (Skelly et al. Fig. 2A), and identified expression quantitative trait loci (eQTL) (Skelly et al. Fig. 2E). We felt that more direct comparisons to the Skelly et al. data would distract readers from our focus on genetic-epigenetic interactions. Thus, we limited our discussion of non-interacting regulatory relationships to Figures 1-2, and a brief mention in Figure 5.

      • Line 290 (now Line 337): We pulled promoter locations from the FANTOM5 database of mouse promoters, and included analysis in both the text and Figure S4A-B.

      • (Line 475-476): we clarified “DO founder SNPs” to “SNPs from the non-reference DO founder strains”.

      • Line 472 (restructured in Lines 531-564): We have expanded on this section, including answers to the reviewer’s questions regarding ChIP-seq peak counts and overlap with the TAD map we used for our other analyses, and an expanded discussion of the strain-specific CTCF binding we identified in our ChIP-seq analysis.

      Responses to comments from Reviewer #2:

      (1) Typo corrected.

      (2) Lines 194-195 (now line 206, expanded on in Line 210): We have expanded upon the intent and expectations of our analysis. In summary: if non-additive genetic-epigenetic interactions were not a significant explanatory factor for gene expression, we would expect to see no enrichment of low p-value results. Thus, we would expect 0.00001% of results to reach adj. p < 1 × 10⁻⁷. Instead, we see 0.07% of our models coming in at adj. p < 1 × 10⁻⁷, roughly four orders of magnitude greater than expected.

      (3) Lines 226-230 (Expanded on in Lines 252-276): We have relabeled Figure 1 and tweaked the surrounding text to clear up some confusing aspects. The percentages in the text are derived from the data summarized in the Euler plots in Figure 1D-E. These plots reflect the fact that each ATAC-seq peak and haplotype can be in multiple relationships with local genes and regulatory factors. Some of these relationships will be simple correlation between their presence and gene expression, while others may co-regulate alongside independent regulatory factors, or engage in non-additive regulatory interactions.

      (4) Line 261-263 (now lines 299-300): A companion to Figure 2B has been added (Fig. S3), which provides interaction counts for each ATAC-seq peak that contributed to Figure 2B. A horizontal line is included to highlight the locations of the highly-interacting ATAC peaks.

      (5) Analysis regarding Figure 3B had been removed from its original context. It has now been restored to the manuscript (Line 368-371).
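
      The enrichment argument in response (2) can be checked with a short calculation. This is only an illustrative sketch, not the authors' analysis code: it assumes that, with no true signal, the fraction of models falling below a p-value cutoff is at most the cutoff itself, and the variable names are hypothetical.

```python
import math

# Values quoted in response (2); variable names are illustrative only.
threshold = 1e-7                 # adjusted p-value cutoff
observed_fraction = 0.07 / 100   # 0.07% of models fall below the cutoff

# Assumption: under the null, the expected fraction of results below
# the cutoff is no larger than the cutoff itself.
expected_fraction = threshold

# How many times more low-p-value results are observed than expected.
fold_enrichment = observed_fraction / expected_fraction
orders_of_magnitude = math.log10(fold_enrichment)

print(f"fold enrichment: {fold_enrichment:.0f}")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
```

      Under that assumption the observed excess is several thousand-fold, which is what "about four orders of magnitude" refers to.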

    2. eLife assessment

      This important manuscript reports interactions between genetic variation, DNA accessibility, and chromatin structure in gene expression at a genome-wide scale. The authors found that most of these interactions occur within topologically associating domains (TADs) and that 3D genome structure data can be efficiently used to guide the discovery of significant genetic and epigenetic influences on gene expression. Overall, this convincing study highlights the importance of 3D chromatin structure in controlling how gene expression is regulated by genetic and epigenetic processes.

    3. Reviewer #1 (Public Review):

      This is an important manuscript that links gene expression to genetic variants and regions of open chromatin. The mechanisms of genetic gene regulation are essential to understanding how standing genetic variation translates to function and phenotype. This data set has the ability to add substantial insight into the field. In particular, the authors show how the relationships between variants, chromatin, and genes are spatially constrained by topologically associating domains.

    4. Reviewer #2 (Public Review):

      The experiments described in the manuscript are well designed and executed. Most of the data presented are of high quality, convincing, and in general support the conclusions made in the manuscript. This manuscript should be of great interest to the field of mammalian gene regulation and the approaches used here can have broader applications in studying genetic and epigenetic regulations of gene expression. The key finding reported here, the importance of 3D chromatin structure in controlling gene expression, although not unexpected, offers a better understanding of the physiological roles of TADs.

      Comments on revised version:

      I think the authors have substantially addressed reviewers' concerns. I have no further comments to add.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) Since you only included patients with early-onset preeclampsia in the study, I suggest revising the title to "Identification of novel syncytiotrophoblast membrane extracellular vesicle derived protein biomarkers in early-onset preeclampsia...."

      We have changed our title to early-onset preeclampsia.

      (2) Under methods, you state that placenta was obtained from women undergoing elective cesarean section. Was this because all the study patients were delivered before the onset of labor? Or were laboring patients specifically excluded from the study?

      Indeed, labor influences the extracellular vesicles (EVs) generated. To ensure consistency in our samples and avoid this variable, we chose placentas obtained from elective cesarean sections (CS) for our study.

      (3) In Table 1 on page 10, the 8th row (Birth weight grams) needs to be reformatted. The mean birthweights for normal pregnancy and preeclampsia should be the same.

      We have reformatted the table and now use ranges instead of brackets.

      (4) In the legend for Table 1, the sentence beginning on page 10, line 227, and continuing onto page 11, line 228, does not make sense. Part of the sentence was omitted inadvertently.

      We have modified this sentence to:

      Detergent treatment, which could break down EVs, with NP-40 confirmed that the majority (99%) of our samples were largely vesicular since only 0.1 ± 0.12% of BODIPY FL N-(2-aminoethyl)-maleimide and PLAP double-positive events were detected (a reduction of 99%) (Figure 1E and 1H).

      (5) As you acknowledge, the sample size (12 patients) was small. This is understandable because early-onset preeclampsia occurs in <1% of parturients. You could collaborate with other centers in future studies to increase the sample size.

      Thank you very much for your comment. We are willing to cooperate on future research and will try to expand our sample size in subsequent studies.

      Reviewer #2 (Recommendations For The Authors):

      (1) This is one of the many "catalogue" papers where placental exosome proteins in preeclampsia are profiled. Thus, the manuscript lacks novelty. The only novelty factor is the authors have isolated exosomes by a different method and even separated the small and large exosomes. However, there is no mention of how these exosomes differ from each other in terms of their functionality. Thus it is hard to judge the biological significance of this work.

      We appreciate your insights regarding the novelty of our study. While numerous papers have profiled placental exosome proteins in preeclampsia, our methodology for enriching sSTB-EVs (exosomes) offers a distinct perspective. We believe that the separation of sSTB-EVs (exosomes) and medium/large STB-EVs (microvesicles) introduces a differentiation that extends beyond mere profiling, with implications for their functionality. Previous studies have shown that placental EVs of different sizes have distinct characteristics (Zabel RR, et al. Enrichment and characterization of extracellular vesicles from ex vivo one-sided human placenta perfusion. Am J Reprod Immunol. 2021 Aug;86(2)). Furthermore, the way cells internalize and respond to EVs may depend on the size of the EV (Zhuang X, et al. Treatment of brain inflammatory diseases by delivering exosome encapsulated anti-inflammatory drugs from the nasal region to the brain. Mol Ther. 2011 Oct;19(10)). It will therefore be important for future studies to distinguish between EVs of different sizes.

      (2) The authors must demonstrate that these two types of EVs are also produced in vivo by detecting them in the serum of women.

      Thank you for the comment. Many previous studies have shown the two types of placental EVs in women's blood. Nakahara et al.'s (PMCID: PMC7755551) extensive review compiles studies that have specifically isolated various subtypes of placenta-derived EVs from maternal circulation. We have also readdressed it in the introduction.

      (3) The authors must compare the proteomes of serum-derived placental exosomes and the proteome of the STBs isolated from the perfusion experiments to judge how overlapping the outcomes are from those produced naturally and those produced under ex vivo conditions.

      We appreciate the reviewer's suggestion to compare the proteomes of serum-derived placental sSTB-EVs (exosomes) with those from STBs isolated through perfusion experiments. Indeed, such a comparison would provide valuable insights into the similarities and differences between naturally produced and ex vivo-generated sSTB-EVs (exosomes). However, isolating placental EVs from maternal circulation for comprehensive proteomic profiling presents challenges. It requires a volume of serum or plasma large enough to enable the isolation of placenta-specific EVs from among the numerous EVs in the circulation. In addition, it requires multiple intricate steps, such as ultracentrifugation followed by immunoprecipitation, each of which can lead to the loss of EVs. Furthermore, given the high concentration of lipoproteins in plasma relative to EVs, there is a significant risk of obtaining low-purity isolates from the outset. These challenges might compromise the comparability of results between placenta-specific EVs from maternal circulation and those from ex vivo perfusion. Nevertheless, we acknowledge the value of such an endeavor and will consider incorporating this aspect in future studies as EV and proteomic methodology and technology improve and become more sensitive.

      (4) I have a major issue with the chosen study subjects. While the study title and the manuscript mention preeclampsia, as per the inclusion criteria mentioned in lines 88-90, the patients would have HELLP syndrome. Please clarify what was used and modify the manuscript accordingly.

      Thank you very much for finding this error. Our patients had none of the features that would qualify them for HELLP syndrome. We have edited to:

      PE was defined as new (after 20 weeks) systolic blood pressure of 140 mmHg or diastolic pressure of 90 mmHg, proteinuria (protein/creatinine ratio of 30 mg/mmol or more). None of our patients had maternal acute kidney injury, liver dysfunction, neurological features, hemolysis, or thrombocytopenia.

      (5) It is hard to reconcile how only 15 proteins were identified in the placental extract while 300+ in EVs. There is a methodological issue in the mass spec or extraction. With such widely different denominators in the total proteins identified, it is hard to compare the outcomes in terms of the three sample types.

      We acknowledge the reviewer's concerns regarding the disparity in protein counts between the placental extract and the EVs. Ultimately, more is not necessarily better. Several factors might contribute to this discrepancy. Firstly, it is plausible that certain proteins exhibit selective affinity for EVs of varying sizes, leading to a more diverse range of proteins than in the placental extract. We were also stringent in our analysis so that we could select proteins whose biological differences are more likely to be reproducible with a different validation method, such as western blotting. Additionally, although the placental extract might contain a higher total protein concentration, this does not necessarily translate to a richer diversity of disease-specific proteins. These nuances should be considered when comparing protein outcomes across sample types.

      (6) I am unable to understand the terms least differentially expressed and most differentially expressed. Do the authors mean upregulated and downregulated? Please clarify and use the terms appropriately by providing fold change values.

      We appreciate the reviewer's request for clarification. We intended to provide a relative measure of expression for the terms 'least differentially expressed' and 'most differentially expressed'. The terms are roughly equivalent to down- and upregulated. Regarding EVs, we avoid using the terms 'upregulated' and 'downregulated' as EVs act as transporters and do not possess regulatory functions per se. However, for the placenta, we recognize the relevance of these terms.

      (7) The data presented is very superficial and lacks methodological details. The authors should provide the total number of targets achieved after mass spec, the cutoffs used, the FDRs, and other details.

      We apologize for the omission. We have added these details to the method section.

      (8) It is not clear how these differentially abundant proteins were identified. What was the cutoff used? Were they identified in all the replicates?

      We apologize for the omission. We have added these details to the method section.

      (9) How many samples were subjected to the discovery cohort, and how many were in the validation cohort? Were they the same or different? If the samples were different, how many PE samples had differentially abundant proteins by both methods?

      The study utilized 12 samples for initial discovery and another 12 for western blot validation. The validation samples specifically targeted proteins of interest, rather than undergoing another comprehensive mass spectrometry analysis.

      (10) It is striking that the authors report the expression of prostatic acid phosphatase in the placenta. In my understanding of placental biology, this gene or protein is not known to be expressed by the placenta. Please perform immunofluorescence to demonstrate that this protein is indeed produced in the STBs.

      Research has revealed that even though it's called prostate-specific antigen, it's created in tissues other than the prostate, such as the placenta. Here are several references to support this claim: PMID: 10634405, PMID: 7533063, PMID: 8939403, and PMID: 8945610. Hence it is likely not beneficial to demonstrate what many researchers have already demonstrated.

      (11) Please validate the differential abundance of these proteins in the exosomes isolated from the plasma of women with and without preeclampsia. A serial measurement will be of high value to determine how early as compared to hypertension, these biomarkers can predict preeclampsia.

      We are validating each EV-carried marker individually in the circulation (plasma or serum), localizing them in the placenta, and performing downstream functional analysis. This article is already lengthy, and it would likely be too cumbersome to include the details of all individual proteins in this manuscript. However, we have already published papers on Siglec 6 (PMID: 32998819) and Neprilysin (PMID: 30929513), and others will be published soon. We agree that serial measurement would be of great value, not just for determining how early, compared with hypertension, these biomarkers can predict preeclampsia, but also as a potentially more sensitive or specific test. This will be the subject of subsequent papers.

      (12) The authors are recommended to carry out immunofluorescence to localize the differentially abundant proteins in the placental sections and show that they are specific to STBs.

      We have already provided a similar response earlier (see response to point 11). In addition, while it is preferable, the biomarkers don't necessarily need to be specific to STB. Not all biomarkers are mechanistic agents/targets, and not all mechanistic agents are biomarkers. However, mechanistic agents should preferably be placental-specific. For example, the total sFLT1, the most studied biomarker, is not exclusively synthesized in the placenta, even though the placental-specific isoform represents a small fraction of the total sFLT-1. For example, in the non-placental world, alkaline phosphatase (ALP) is not exclusively produced by the liver but is a ‘biomarker’ of cholestatic disease.

      (13) Table 1 should give the range, and the SD could be given as ± instead of in brackets.

      Thank you for your suggestion. We have edited it accordingly.

      (14) It is necessary to provide the gestational age of the onset of hypertension to get a judgment of how long these women were preeclamptic, culminating in HELLP.

      We want to emphasize that none of our patients experienced HELLP syndrome. In the results section, we have included the gestational age at the time of diagnosis in the table for preeclampsia. It's crucial to understand that the gestational age at diagnosis is distinct from the gestational age when hypertension initially appeared. Detecting the exact gestational age of hypertension onset would be challenging, and it would likely require a prospective or randomized clinical trial with continuous monitoring, possibly on a daily basis. Our study, however, is retrospective, so we can only comment on the gestational age at diagnosis.

      (15) For newborns, the term "sex" is used, not "gender".

      Thank you for your suggestion. We have edited it accordingly.

      (16) Figure 2 is stretched and hard to read

      Thank you for your suggestion. We have edited it accordingly by creating two separate images to promote readability.

      (17) Line 278 change the sentence "there fifteen (15) proteins in the placenta" to "there were fifteen (15) proteins in the placenta"

      Thank you for your suggestion. We have edited it accordingly.

      (18) Line 288 you mean least and not lease

      Thank you for your suggestion. We have edited it accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our knowledge of how parasites evade the host complement immune system. The new cryo-EM structure of the trypanosome receptor ISG65 bound to complement component C3b is highly compelling and well-supported by biochemical experiments. This work will be of broad interest to parasitologists, immunologists, and structural biologists.

      We thank the reviewers and editorial team for this assessment of our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to use structural biology (cryo-EM), surface plasmon resonance, and complement convertase assays to understand the mechanism(s) by which ISG65 dampens the cytotoxicity to, and cellular clearance of, trypanosomes opsonised with C3b by the innate immune system.

      The cryo-EM structure adds significantly to the authors' previous crystallographic data because the latter was limited to the C3d sub-domain of C3b. Further, the in vitro convertase assay adds an additional functional dimension to this study.

      The authors have achieved their aims and the results support their conclusions.

      The role of complement in immunity to T. brucei (or lack thereof) has been a significant question in molecular parasitology for over 30 years. The identification of ISG65 as the C3 receptor and now this study providing mechanistic insights represents a major advance in the field.

      Reviewer #2 (Public Review):

      This is an excellent paper that uses structural work to determine the precise role of one of the few invariant proteins on the surface of the African trypanosome. This protein, ISG65, was recently determined to be a complement receptor and specifically a receptor of C3, whose binding to ISG65 led to resistance to complement-mediated lysis. But the molecular mechanism that underlies resistance was unknown.

      Here, through cryoEM studies, the authors reveal the interaction interface (two actually) between ISG65 and C3, and based on this, make inferences regarding downstream events in the complement cascade. Specifically, they suggest that ISG65 preferably binds the converted C3b (rather than the soluble C3). Moreover, while conversion to a C3bB complex is not blocked, the ability to bind complement receptors 1 and 3 is likely blocked.

      Of course, all this is work on proteins in isolation and the remaining question is - can this in fact happen on the membrane? The VSG-coated membrane is supposed to be incredibly dense (packed at the limits of physical density) and so it is unclear whether the interactions that are implied by the structural work can actually happen on the membrane of a live trypanosome. This is not necessarily a dig but it should be addressed in the manuscript perhaps as a caveat.

      We thank the reviewer for their positive response to our work. We fully agree with the reviewer about the caveats which come from this work being done in a biochemical context. We have addressed this in lines 223-224 and 327-333.

      Reviewer #3 (Public Review):

      The authors investigate the mechanisms by which ISG65 and C3 recognize and interact with each other. The major strength is the identification of eco-site by determining the cryoEM structure of the complex, which suggests new intervention strategies. This is a solid body of work that has an important impact on parasitology, immunology, and structural biology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A paper by Sulzen et al was published online on 27th April in Nature Communications that has a similarity (the cryo-EM structure) to this paper. This does not detract from the value of this paper. The authors should, however, include a "compare and contrast" section in this paper to explain similarities and differences in the conclusions. For example, while this paper demonstrates that ISG65 does not prevent C3 convertase activity, the Sulzen paper suggests it does prevent C5 convertase activity. The compatibility of these conclusions should be discussed.

      Two studies of ISG65 were published shortly after submission of this manuscript (Sulzen et al and Lorenzen et al) and we have added a brief comparison of the conclusions of these papers here. These mentions include lines 151, 155-6, 201-2, 274-278, 292-93 and 321-323. For a more in-depth comparison we have published an opinion piece in Trends in Parasitology, which discusses all three of these papers and which we also now reference here.

      Could the authors comment as to whether they think the association of C3b with the unstructured region of ISG65 comes about via S-S shuffling? I.e., is C3b first thioester-linked to VSG, and does this then rearrange to ISG65 through C3b-ISG65 proximity?

      We thank the reviewer for the interesting suggestion. However, we are not aware of evidence showing that C3b, which has been conjugated to a target protein through its covalent ester bond, then becomes transferred to a second target protein. As ISG65 can bind to C3 as well as C3b, we think that the conjugate could form when ISG65-bound C3 converts to C3b, becomes reactive and, through proximity, is most likely to conjugate to ISG65. Whether this occurs to a substantial degree in trypanosomes, or whether it is more likely that ISG65 interacts with C3b which is already VSG-conjugated, requires further experiments. We have edited lines 217-222 to make this point more clearly.

      Reviewer #3 (Recommendations For The Authors):

      The authors previously reported that the ISG65 C-terminus is so flexible that it is not resolved in their 2022 ISG65-C3d (TED of C3b) crystal structure, which is also the case here in the cryo-EM structure of ISG65-C3b. Thus, I am wondering how C3b might find the flexible C-terminus and form a covalent bond.

      We think that the answer to the reviewer’s question relates to local concentration. When two reactive compounds are not attached together, they diffuse freely in three dimensions and their likelihood of colliding and reacting is subject to the randomness of Brownian motion. However, if they bind together through an interaction distinct from the reactive residues, this increases their relative local concentration and the likelihood of collision and reaction taking place. In the case of ISG65, this is coupled with the ability of ISG65 to bind to C3 before it converts to C3b and becomes reactive. The interaction of ISG65 with C3/C3b will therefore bring together the reactive residues and increase the probability that they will collide and form a conjugate. Our control with BSA, which does not bind to C3/C3b and does not form these conjugates, supports this conclusion. We have edited lines 217-222 to clarify.

      I also find it puzzling that deleting L2 or L3 in ISG65, which they found to form additional contacts with the CUB domain of C3b (binding 12 times tighter), does not affect ISG65-C3b conjugate formation in the in vitro C3 convertase formation assay.

      When we consider the affinities that the L2 and L3 loop deletion variants have for C3b, and the concentration of ISG65 in the C3 convertase assay, we would predict that the conjugates still form with the L2 and L3 variants. This binding would therefore increase the relative local concentration of the reactive residues and ensure preferential conjugate formation, as we observe.

      (1) Page 2 bottom line, "In particular, loop 2 forms a direct contact with the CUB domain of ISG65, centered around an electrostatic", ISG65 should be C3b.

      We thank the reviewer for spotting this. It has been corrected.

      (2) Page 4, "We found that ISG65 does not complete with either factor B or Factor D and does not block the binding of factor Bb (Figure 3b). This suggests that the C3 convertase can form in the presence of ISG65", "complete" should be "compete".

      It has been corrected.

      (3) Page 4, "revealed that in the presence of ISG65 a high molecular weight band appeared, which we identified through mass spectrometry to be a conjugate of ISG65 with C3b". There is no mass spectrometry data in the manuscript to support this.

      We agree with the reviewer that this data should be included in the paper and have now added it as Supplementary Table 3.

      (4) Page 5, "By inhibiting binding of CR2 to C3d, ISG65 will reduce the likelihood that B-cell receptor binding to trypanosome antigens will result in B-cell activation and antibody production." - this sentence is a bit confusing.

      We have clarified this point in lines 243-245.

      (5) Related to Figure 2a. "This structure reveals the two distinct interfaces formed between ISG65 and C3b (Figure 2a)." It would be clearer to label where interface 1 and interface 2 are in Figure 2a.

      We have now labelled interfaces 1 and 2 above the insets in Figure 2a.

      (6) Related to Figure 2C. I suggest mutagenesis to validate ISG65 L2/L3 - C3b CUB domain interaction, i.e. mutate ISG65 (N188, R187, Y190) and perform SPR with C3b.

      We agree with the reviewer that this experiment was a valuable validation of our structural data. To achieve this aim, we changed our SPR assay, coupling C3 variants to the chip surface in an orientation which would match their conjugation to a pathogen and allowing us to reliably compare the affinities of ISG65 variants. We then assessed the binding of ISG65, ISG65∆L2, and the ISG65L2N188A,H189A,Y190A proposed by the reviewer. As predicted from the structure, both loop 2 deletion and mutation reduced the affinity for C3b but did not affect the affinity for C3d, suggesting that the difference in affinity of ISG65 for C3b and C3d is due to the observed interface 2. This new data is described in lines 150-168 and is presented in Figure 2c.

      (7) Related to Figure 3a. Is the C3b only structure in the presence of ISG65 the real C3b only? Discussion can be added.

      Our cryoEM analysis of the ISG65-C3b mixture yielded three-dimensional classes, some of which contained clear density for ISG65 and others in which there was no density for ISG65. While the reviewer is technically correct, and we cannot be 100% sure that there is not an entirely disordered ISG65 attached to these ‘unbound’ C3b, we think that this is extremely unlikely. In either case, these ‘unbound’ C3b are indistinguishable from other structures of C3b and the argument in the paper stands. We have added a clause in lines 178-179 to make this point.

      (8) Related to Figure 3e. There is no label for WT and deletion mutants. Also, L1 and L3 deletion does not seem to show on the gel.

      We have added these labels.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the Reviewing Editor and two additional reviewers for the insightful input they gave us on the first version of our manuscript on allosteric activity regulation of the anaerobic ribonucleotide reductase from Prevotella copri. We have revised the manuscript in the light of the reviewers' comments. In particular, we have added experiments using hydrogen-deuterium exchange mass spectrometry (HDX-MS) to probe the accessibility and mobility of different parts of the protein structure in the apo-state and in the presence of dATP/CTP and ATP/CTP. The results strongly confirm the binding of nucleotides to the activity and specificity sites, as seen biochemically and structurally. On the question of the mobility of the glycyl radical domain, the HDX-MS experiments suggest an increased mobility in the presence of dATP, though the results are not as clear-cut as for the nucleotide binding. The HDX-MS analyses are complicated by the fact that they reflect all species in solution, which are evidently multiple for all states of PcNrdD. Finally, we have rephrased key parts of the results and discussion, and modified the title, to avoid any implication that we believe the glycyl radical domain becomes extensively disordered; rather, it becomes more mobile to the extent that it cannot be seen in the cryo-EM structures.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but other aspects of the manuscript, which are incomplete, could be improved by including additional functional characterization and more evidence for the proposed mechanism of inhibition by dATP. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent substrate binding and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript could be improved, however, by performing additional experiments to establish that the mechanism of inhibition can be observed in other contexts and it is not an artifact of the structural approach. Additionally, some of the presentations of biochemical data could be improved to comply with standard best practices.

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We thank the editor and reviewers for their positive evaluation of the potential impact of our work. We completely agree that hypotheses based on structural data require orthogonal experimental verification. However, the number and consistency of the cryo-EM structures speak in favour of the data being representative of conditions in solution. We feel that in particular cryo-EM data should be relatively free of artefacts, e.g. biased or incorrect relative domain orientations, compared to crystallography, where crystal packing effects can affect these parameters. As we write in response to Reviewer #2, it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and only partly ordered in the dATP-bound tetramers. Further verification experiments will be performed in future but are outside the scope of the present article.

      We will improve the presentation of the biochemical data in a revised version.

      General comments:

      (1) It would be ideal to perform an additional experiment of some type to confirm the order-disorder phenomena observed in the cryo-EM structures to rule out the possibility that it is an artifact of the structure determination approach. Circular dichroism might be a possibility?

      Circular dichroism reports only on the approximate relative proportions of helix, sheet and loop structure in a protein, thus we believe that it would not be a sensitive enough tool to distinguish between ordered and disordered states. We are considering what alternative methods might be appropriate.

      (2) Does the disordering phenomenon of one subunit in the ATP-bound structures have any significance - could it be related to half-of-sites activity? Does this RNR exhibit half-of-sites activity?

      Half-of-sites activity has not been biochemically proven in any ribonucleotide reductase in spite of the fact that it was first suggested in 1987 (PMID: 3298261). However, strong structural indication was recently published in the form of the holo-complex of the class Ia ribonucleotide reductase from Escherichia coli, which is highly asymmetrical and in which productive contacts forming an intact proton-coupled electron transfer pathway are only formed between one of two pairs of monomers (PMID: 32217749). We have not been able to prove half-of-sites activity for PcNrdD due to low overall radical content, but the structural results are indeed consistent with such an activity.

      (3) Does the disordering of the GRD with dATP bound have any long-term impact on the stability of the Gly radical? I realize that the authors tested the ability to form the Gly radical in the presence of dATP in Fig. 4 of the manuscript. But it looks like they only analyzed the samples after 20 min of incubation. Were longer time points analyzed?

      Radical content was measured after 5 min and 20 min incubation; 5 min incubations (not included in the manuscript) consistently gave higher radical content compared to 20 min incubation. Longer time points were not analysed, as we assumed that the radical content would be even lower after 20 min.

      (4) Did the authors establish whether the effect of dATP inhibition on substrate binding is reversible? If dATP is removed, can substrates rebind?

      This is an interesting question. We measured KDs for dATP in the micromolar range and are hence confident that dATP binding is reversible. Our measurements do not, however, directly prove that inhibition of the enzyme is reversible. Nevertheless, it is worth noting that the protein as purified was precipitated and analysed by UV-visible spectroscopy. The as-purified PcNrdD contained 30% nucleotide contamination. The as-purified sample was then analysed by HPLC and we identified a major peak, corresponding to dATP/dADP. Therefore, purification conditions had to be optimised to remove the nucleotides. This is evidence that PcNrdD that has “seen” dATP can subsequently bind substrates in the presence of ATP. We will describe the purification more clearly in a revision.

      (5) In some figures (Fig. 6e, for example), the cryo-EM density map for the nucleotide component of the model is not continuous over the entire molecule. Can the authors comment on the significance of this phenomenon? Were the ligands validated in any way to ensure that the assignments were made correctly?

      Indeed we sometimes saw discontinuous density for the nucleotides, both in the active site and in the specificity site. However, the break was almost always near the C5’ carbon atom, which is common to all nucleotides. While we cannot readily explain this phenomenon, the nucleotides refined well with full occupancy, giving B-factors similar to those of the surrounding protein atoms. The identity of the nucleotide could always be inferred from a) the size of the base (purine or pyrimidine); b) the known nucleotide combinations added to the protein before grid preparation; c) prior knowledge on the combinations of effector and substrate that have been found valid for all RNRs since the first studies of allosteric specificity regulation.

      Reviewer #2 (Public Review):

      This manuscript describes the functional and structural characterization of an anaerobic (Class III) ribonucleotide reductase (RNR) with an ATP cone domain from Prevotella copri (PcNrdD). Most significantly, the cryo-EM structural characterization revealed the presence of a flap domain that connects the ATP cone domain and the active site and provides structural insights about how nucleotides and deoxynucleotides bind to this enzyme. The authors also demonstrated the catalytic functions and the oligomeric states. However, many of the biochemical characterizations are incomplete, and it is difficult to make mechanistic conclusions from the reported structures. The reported nucleotide-binding constants may not be accurate because of the design of the assays, which complicates the interpretation of the effects of ATP and dATP on PcNrdD oligomeric states. Importantly, statistical information was missing in most of the biochemical data. Also, while the authors concluded that the dATP binding makes the GRD flexible based on the absence of cryo-EM density for GRD in the dATP-bound PcNrdD, no other support was provided. There was also a concern about the relevance of the proposed GRD flexibility and the stability of Gly radical. Overall, the manuscript provides structural insights about Class III RNR with ATP cone domain and how it binds ATP and dATP allosteric effectors. However, ambiguity remains about the molecular mechanism by which the dATP binding to the ATP cone domain inhibits the Class III RNR activity.

      Strengths:

      (1) The manuscript reports the first near-atomic-resolution structures of a Class III RNR with an ATP cone domain in complex with ATP and dATP. These structures revealed the NxN flap domain proposed to form an interaction network between the substrate, the linker to the ATP cone domain, the GRD, and loop 2 important for substrate specificity. The structures also provided insights into how ATP and dATP bind to the ATP cone domain of Class III RNR. Also, the structures suggested that the ATP cone domain is directly involved in the tetramer formation by forming an interaction with the core domain in the presence of dATP. These observations serve as an important basis for future study on the mechanism of allosteric regulation of Class III RNR.

      (2) The authors used a wide range of methodologies including activity assays, nucleotide binding assays, oligomeric state determination, and cryo-EM structural characterization, which were impressive and necessary to understand the complex allosteric regulation of RNR.

      (3) The activity assays demonstrated the catalytic function of PcNrdD and its ability to be activated by ATP and low-concentration dATP and inhibited by high-concentration dATP.

      (4) ITC and MST were used to show the ability of PcNrdD to bind NTP and dATP.

      (5) GEMMA was used successfully to determine the oligomeric state of PcNrdD, which suggested that PcNrdD exists in dimeric and tetrameric forms, whose ratio is affected by ATP and/or dATP.

      Weaknesses:

      (1) Activity assays.

      The activity assays were performed under conditions that may not represent the nucleotide reduction activity. The authors initiated the Gly radical formation and nucleotide reduction simultaneously. The authors also showed that the amount of Gly radical formation was different in the presence of ATP vs dATP. Therefore, it is possible that the observed Vmax is affected by the amount of Gly radical. In fact, some of the data fit poorly into the kinetic model. Also, the number of biological and technical replicates was not described, and no statistical information was provided for the curve fitting.

      The highest turnover activity of PcNrdD measured in presence of ATP was 1.3 s-1 (470 nmol/min/mg), a kcat comparable to recently reported values for anaerobic and aerobic RNRs from Neisseria bacilliformis, Leeuwenhoekiella blandensis, Facklamia ignava, Thermus virus P74-23, and Aquifex aeolicus (PMID: 25157154, PMID: 29388911, PMID: 30166338, PMID: 34314684, PMID: 34941255). The general trend illustrated in Figure 1 is that ATP has an activating effect on enzyme activity, whereas high concentrations of dATP have an inactivating effect on activity, which cannot be explained by suboptimal assay conditions since our EPR results consistently show that more radical is formed in incubations with dATP compared to incubations with ATP. Curve fitting methods used are listed in Materials and Methods (as specified in the Figure 1 legend), and standard errors for all specified curve fitting results (from triplicate experiments) are shown in Figure 1.

      (2) Binding assays.

      The interpretation of the binding assays is complicated by the fact that dATP binds both a- and s-sites and ATP binds a- and active sites. dATP may also bind the active site as the product. It is unknown if ATP binds s-site in PcNrdD. Despite this complexity, the binding assays were performed under the condition that all the binding sites were available. Therefore, it is not clear which event these assays are reporting.

      Both ITC and MST experiments involving ATP and dATP binding to the a-site were performed in the presence of at least 1 mM GTP substrate (5 mM in MST) to fill the active site, and 1 mM dTTP effector to fill the s-site (specified in the legend to Figure 2). These conditions enable binding of ATP or dATP only to the a-site in the ATP-cone.

      (3) Oligomeric states.

      Due to the ambiguity in the kinetic parameters and the binding constants determined above, the effects of ATP and dATP on the oligomeric states are difficult to interpret. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding. For the effects of dATP, the authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP alone vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. Therefore, these observations show no correlation between the oligomeric state and the catalytic activity.

      The results in Figure 3 show that in the presence of 100 µM ATP plus 100 µM CTP the oligomeric equilibrium is 64% dimers and 36% tetramers, and in the presence of 50-100 µM dATP it is 32% dimers and 68% tetramers. We agree that there is no clear and strong correlation between oligomeric state and inhibition, and we will try to make this clearer in a revised version. Meanwhile, SEC experiments at higher nucleotide concentrations will be done to strengthen our observations.

      (4) Effects of dATP binding on GRD structure

      One of the key conclusions of this manuscript is that dATP binding induces the dissociation of GRD from the active site. However, the structures did not provide an explanation for how the dATP binding affects the conformation of GRD or whether the dissociation of GRD is a direct consequence of dATP binding or it is due to the absence of nucleotide substrate. Also, Gly radical is unlikely to be stable when it is not protected from the bulk solvent. Therefore, it is unlikely that the GRD dissociates from the active site unless the inhibition by dATP is irreversible. Further evidence is needed to support the proposed mechanism of inhibition by dATP.

      We admit that it has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker can only be partly modelled in the dATP-bound tetramers. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes disorder of the GRD, given that all are part of a connected system (described as “nexus” in the manuscript). The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap.

      In any case a major conclusion of the work is that dATP does not inhibit the anaerobic RNR by prevention of glycyl radical formation but by prevention of its subsequent transfer. We agree that further evidence is required to support the proposed mechanism, but given the extent of the data already presented in the manuscript, we feel that such studies should be the subject of a future publication.

      (5) Functional support for the observed structures.

      Evidence for connecting structural observations and mechanistic conclusions is largely missing. For example, the authors proposed that the interactions between the ATP cone domain and the core domain are responsible for tetramer formation. However, no biochemical evidence was provided to support this proposal. Similarly, the functional significance of the interaction through the NxN flap domain was not proved by mutagenesis experiments.

      We did actually make mutants to verify the observed interactions, but several of them did not behave well in our hands, e.g. with regard to protein stability. Since we have no evidence that oligomerisation is coupled to inhibition, and since we did not observe any conservation between protein sequences in the interaction area, we chose not to pursue this point further. The main merit of the tetramer structures is that they allowed a high-resolution view of dATP binding to the ATP-cone and a comparison to previously-observed ATP-cones. Nevertheless, mutation experiments, also including the NxN flap, could be the subject of future work.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination. One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Ang) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      Given the resolution of some of the structures in the remote regions that appear to be of importance, the rigor of the work could have been improved by complementing these experimental studies with molecular dynamics (MD) simulations to reveal the dynamics of the GRD and loops/flaps at the active site.

      We have discussed with expert colleagues the possibility of carrying out MD simulations on the different states in order to study the differential effects of ATP and dATP binding on the dynamics of the GRD. However, they felt that the chance of obtaining meaningful results was low, particularly since some structural elements are missing from the models for both forms, in particular the linker between the ATP-cone and the core.

      The biochemical data supporting the loss of substrate binding with dATP association is compelling, but the binding studies of the (d)ATP regulatory molecules are not; the authors noted less-than-unity binding stoichiometries for the effectors.

      Most of the methods used measure only binding strength, not the number of binding sites (N), whereas ITC also measures number of sites. N is dependent on the integrity of the protein, i.e. the number of protein molecules in a preparation that are involved in binding, and quite often gives lower values than the theoretical number of binding sites.

      Also, the work would benefit from additional support for oligomerization changes using an additional biochemical/biophysical approach.

      SEC (chromatography), GEMMA (mass spectrometry) and cryo-EM were used to study oligomerization. Since each method has restrictions on nucleotide concentrations as well as protein concentrations that can be used, the results are not directly comparable, but all three methods indicate nucleotide dependent oligomerization changes. The SEC results will be included in a revised version.

      Overall, the authors have mostly achieved the overall aims of the manuscript. With focused modifications, including additional control experiments, the manuscript should be a welcome addition to the RNR field.

      Recommendations for the authors: Reviewer #1 (Recommendations For The Authors):

      (1) The last sentence of the abstract is not complete. The structures implicate a complex network of interactions in ... ? What do they implicate?

      A couple of words seem to have been missed from the abstract. We have rewritten the end of the abstract to emphasise better that the dynamical transitions involve a linked network of interactions and not just the GRD.

      (2) A reference is needed in the second sentence of the introduction.

      We have added a reference as requested.

      (3) Page 2, paragraph 2. The authors state "two beta subunits (NrdB) harboring a stable radical." This is not accurate. First of all, each beta subunit harbors its own cysteine oxidant. And in several subclasses, that oxidant is not a stable radical but an oxidized metal cluster. Please revise to improve accuracy and also provide appropriate references.

      We have revised the description and added a recent reference.

      (4) Page 4, Fig. 1, panels C and D. The fit of the curve to the data is pretty poor. Is there an explanation? Could the data be improved in some way? In general, it is also best practice nowadays to show the individual data points in addition to the error bars in plots like the ones shown in Figure 1. Please modify the plots to include the individual data points in this figure - and probably also the subsequent figures showing binding data.

      We have modified relevant panels in Figures 1, 2 and 5 as requested.

      (5) Page 12, first paragraph. The authors state that one of the monomers in the ATP-CTP structure is well ordered and the other is less ordered. It would be ideal to show in a figure the basis for this conclusion using the cryo-EM maps. The "less ordered" monomer appears to be fully modeled.

      Since the 2-fold axis of the dimer is vertical, the GRD of the left-hand monomer is hidden from view at the back of the molecule in Figure 6. For this monomer there was a small amount of density that allowed modelling of part of the glycyl radical loop (though not the tip containing the radical Gly itself) and the NxN flap, albeit with significantly higher mobility. We have illustrated this through an additional supplement for Figure 6 (figure supplement 2) in which the B-factors of the residues are shown both as a ribbon with radius proportional to the B-factor and through colouring. We hope that the four views in Figure 6 (figure supplement 2) together illustrate the relative mobility of different parts of the dimer.

      It would also be ideal to show the basis for the conclusion that the entire GRD is disordered in the dATP-bound dimer structure.

      Thank you for this suggestion. We have added a fifth supplement to Figure 8 in which we show the cryo-EM reconstruction for the dATP-bound dimer in two orientations, with the ATP-CTP-bound structure superimposed, which clearly shows that the entire GRD, the ATP-cones, linker and NxN flap are all disordered in both monomers.

      Reviewer #2 (Recommendations For The Authors):

      (1) Units to describe enzyme activity.

      • The unit for the specific activity in the main text (nmol/min•mg) is unusual. It is most likely a typo of nmol/min/mg or nmol/(min•mg).

      We have changed this to nmol/min/mg in the text.

      • The unit for the Vmax is unusual and should not be confused with the specific activity. By definition, Vmax is the velocity of a reaction at a defined enzyme concentration/amount. For example, if an assay of 10 mg enzyme yielded 470 nmol of product in 1 min, Vmax is 470 nmol/min, whereas the specific activity is 47 nmol/min/mg.

      The velocity as calculated above is ca 1.3 s-1. We have added kcat values to accompany the specific activities given.
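
      As a rough consistency check on the units discussed in this point, the sketch below reproduces the reviewer's example (470 nmol/min from 10 mg of enzyme) and back-calculates the molar mass that would reconcile the quoted specific activity (470 nmol/min/mg) with the quoted kcat (1.3 s-1). The molar mass of the catalytic unit is not stated in this exchange, so the result is an implied value under the assumption that "mg" refers to mg of NrdD protein; the function names are illustrative only.

```python
# Unit check for the figures quoted in this exchange.
# Assumption: "mg" means mg of NrdD protein. The molar mass of the catalytic
# unit is NOT given in the text, so it is back-calculated rather than assumed.


def specific_activity(rate_nmol_per_min: float, enzyme_mg: float) -> float:
    """Specific activity (nmol/min/mg) from an observed rate and enzyme mass."""
    return rate_nmol_per_min / enzyme_mg


def kcat_per_s(spec_act_nmol_min_mg: float, molar_mass_g_per_mol: float) -> float:
    """Turnover number (s^-1) from specific activity and enzyme molar mass."""
    mol_product_per_s_per_mg = spec_act_nmol_min_mg * 1e-9 / 60.0
    mol_enzyme_per_mg = 1e-3 / molar_mass_g_per_mol
    return mol_product_per_s_per_mg / mol_enzyme_per_mg


def implied_molar_mass(spec_act_nmol_min_mg: float, kcat: float) -> float:
    """Molar mass (g/mol) that reconciles a specific activity with a kcat."""
    return kcat * 1e-3 / (spec_act_nmol_min_mg * 1e-9 / 60.0)


# Reviewer's example: 10 mg of enzyme giving 470 nmol of product in 1 min.
reviewer_example = specific_activity(470.0, 10.0)

# Authors' figures: 470 nmol/min/mg alongside kcat = 1.3 s^-1.
implied_mass = implied_molar_mass(470.0, 1.3)

print(f"specific activity in reviewer's example: {reviewer_example:.0f} nmol/min/mg")
print(f"molar mass implied by the authors' figures: {implied_mass / 1000:.0f} kDa")
```

      Under these assumptions the two quoted figures agree for a catalytic unit of about 166 kDa. Whether that corresponds to a monomer or a dimer cannot be determined from the text here.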

      (2) Steady-state kinetic analysis.

      • The steady-state kinetic analysis in Figure 1 needs to be repeated. While the nonlinear curve fitting for Figure 1a is reasonable, those in Figures 1b, 1c, and 1d were outside the error range. Consequently, the reported kinetic parameters are unlikely accurate. The authors should repeat the assays with different enzyme preparation to account for all the errors. If the fit curve is still outside the error range, the kinetic model is likely incorrect, and the authors need to investigate different kinetic models.

      The replotted Figure 1 now includes two different experiments for 1b (four replicates in total).

      • The authors should report the number of replicates and the statistical data for the curve fitting.

      The figure legend has been updated with statistical data for all curve fits, and the number of replicates has been added.

      • The authors should report Vmax, Ki, and KL for Figure 1d.

      Results in Figures 1c and 1d are less straightforward than those in Figures 1a and 1b where the s-site is filled with dTTP, favouring binding of GTP to the active site. The curve fit in Figure 1c is disturbed at high concentrations of ATP, which plausibly competes with the CTP substrate and results in inhibition by formed dATP. The curve fit in Figure 1d is less certain since reduction of substrate is low due to intrinsic CTP reduction in absence of effector and partially overlapping activation and inhibition effects of dATP.

      • The authors should consider presenting the data in a log scale because of the complex nature of the activation/inhibition at the lower concentrations of dATP.

      Log scale plots are included as insets in Figures 1b and 1d.

      • The basal level of CTP reduction in the absence of an effector nucleotide should be reported with an error.

      The error value has been added in the figure legend for the basal level of CTP reduction in the absence of effector.

      (3) Equations for the kinetic analysis.

      • The equations should be numbered and referred to in the Figure 1 legend.

      All equations are specified and numbered in Materials and Methods. The equation used for each curve fit in the panels in Figure 1 is specified in the figure legend.

      • KL must be defined in the main text. I suppose this is Kd for ATP or dATP. The equation for KL determination is missing brackets for dNTP.

      KL (the concentration of an allosteric effector that gives half maximal enzyme activity) is defined in Materials and Methods where the equation is described. KL is not the same as KD (the dissociation constant for a ligand and its receptor). Brackets have been added to equation 1.

      • I believe dNTP in the first equation is incorrect because ATP was the ligand for Figures 1A and 1C.

      [dNTP] in the first equation has been changed to [NTP/dNTP] to indicate that both ribonucleotides and deoxyribonucleotides can bind.

      • The second equation can be expressed as dATP as I believe this is the only ligand that inhibits the enzyme.

      We prefer to keep the more general [dNTP] in the equation.

      • The equation used for the fitting in Figure 1d must be defined more clearly than "a combination of the two equations".

      The equation used for the curve fit in Figure 1d has been specified as equation 3 in Materials and Methods.
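
      To illustrate what "a combination of the two equations" could look like, the sketch below multiplies a hyperbolic activation term (half-maximal at KL) by a simple inhibition term (half-maximal at Ki). This is a generic textbook form and an assumption; it is not taken from the manuscript and may differ from its equation 3. It also ignores the basal, effector-free activity mentioned for CTP reduction, and the parameter values are hypothetical, chosen only to make the shape easy to check.

```python
import math

# Generic biphasic model: activation at low effector, inhibition at high.
# Assumption: v = Vmax * [L]/(KL + [L]) * 1/(1 + [L]/Ki). This is NOT
# necessarily the manuscript's equation 3, and basal activity is ignored.


def biphasic_rate(datp_um: float, vmax: float, k_l_um: float, k_i_um: float) -> float:
    """Rate for an effector that both activates (KL) and inhibits (Ki)."""
    activation = datp_um / (k_l_um + datp_um)
    inhibition = 1.0 / (1.0 + datp_um / k_i_um)
    return vmax * activation * inhibition


def peak_concentration(k_l_um: float, k_i_um: float) -> float:
    """Effector concentration giving maximal rate in this model.

    Setting the derivative of the rate to zero gives [L] = sqrt(KL * Ki).
    """
    return math.sqrt(k_l_um * k_i_um)


# Hypothetical parameters for illustration only (not measured values).
K_L_UM = 10.0
K_I_UM = 90.0

peak = peak_concentration(K_L_UM, K_I_UM)
for conc in (1.0, 10.0, peak, 90.0, 1000.0):
    rate = biphasic_rate(conc, 1.0, K_L_UM, K_I_UM)
    print(f"{conc:7.1f} uM effector -> relative rate {rate:.3f}")
```

      In this form the curve rises, peaks at the geometric mean of KL and Ki, and then falls. Plotting on a log concentration axis, as requested for Figures 1b and 1d, makes both phases visible.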

      (4) Design of the activity assays

      It is not clear if the activity assays report the rate of glycyl radical formation or nucleotide reduction. The authors mixed NrdD and NrdG and initiated the reaction by adding formate (essential for nucleotide reduction) and dithionite (Gly radical formation). The Gly radical formation is slow (in min time scale). The authors reported that ATP/dATP affected the rate of Gly radical formation and in the presence of ATP, Gly radical formation was incomplete even after 20 min. Therefore, it is possible that within the timescale of the activity assays (5 min), the reactions could be partially limited by the Gly radical formation, which may be the reason for the poor curve fitting.

      Activity assays were performed with 5 min pre-incubation without dithionite and formate (no glycyl radical formation) and 10 min incubation after addition of dithionite and formate (glycyl radical formation plus substrate reduction). During earlier tests, NrdD and NrdG were first preincubated in the presence of dithionite (glycyl radical formation) and after addition of formate the substrate reduction was monitored during 20 min. These experiments resulted in lower enzyme activity, whereas higher activity was achieved only upon formate addition to the preincubation reaction. We suppose that the presence of dithionite, which is a strong reducing agent, affected NrdD stability and the reaction was stabilised by the presence of formate at an earlier stage of the reaction. For the EPR conditions used in the paper, 5 min incubation gave higher radical content compared to 20 min, and the reported activity assay gave highest activity after 10 min incubation; kcat of 1.3 s-1.

      (5) Methods section for the activity assays.

      • The concentration of dTTP, ATP, and dATP used in the assays must be described.

      We thank the reviewer for pointing out this omission and we have now specified the concentrations used.

      • Although the authors mentioned that they changed the concentration of dTTP, such data were not presented. Is this correct? Did the authors fix the dTTP concentration for the GTP reduction?

      We apologise for the ambiguity and have specified that the dTTP concentration was fixed at 1 mM in the GTP experiments and that only the ATP or dATP concentrations were varied.

      (6) Discrepancy between Ki/KL and Kd.

      • There is a significant ambiguity remaining about the binding event that the ITC and MST results are reporting. Although dATP binds to both a- and s-sites and ATP binds to both active site and a-site, only a single binding event was observed in both cases. To distinguish the dATP binding to a- and s-sites and the active site, the authors should perform binding assays using mutant enzymes with only one of the binding sites available for dATP/ATP binding.

      MST and ITC were performed in presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), thus dATP is blocked from binding to the s-site and ATP from binding to the active site.

      • There are significant differences between Kd determined by MST or ITC and Ki/KL determined by the activity assays. Kd measurements were performed in the absence of the substrate nucleotides, while the assays required substrates. There may be complications from the presence of NrdG and the Gly radical formation. The authors must clearly describe all these complications and the discrepancy between Kd and Ki/KL.

      MST, ITC and enzyme assays were all performed in the presence of substrate, and enzyme assays also contained NrdG, which was not present in the MST and ITC analyses. While KD is a thermodynamic constant representing the affinity of ligand to its binding site - in our case an effector nucleotide to the ATP-cone, KL is a kinetic constant (the allosteric effector concentration that gives half maximal activity) representing the relationship between the effector concentration and the reaction speed and is affected by the enzyme turnover number (kcat). The relationship between KD, KL and Ki is further complicated by conformational and possibly oligomeric state changes of NrdD upon binding of allosteric effectors, which occurs on a slower time scale than the rapid exchange of nucleotides in allosteric sites.
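
      The difference between KD and KL can be made concrete with the ATP values quoted in the reviewer comments on oligomeric states (KD of about 25 µM from ITC/MST, KL of 780 µM from the activity assays). The sketch below is a simplified illustration that assumes a single independent site, free ligand approximately equal to total ligand, and a hyperbolic kinetic response with no basal activity. None of these assumptions is confirmed by the text.

```python
# Compare site occupancy (thermodynamic, KD) with kinetic response (KL)
# at the ATP concentrations used in the GEMMA experiments (50 and 100 uM).
# Assumptions: one independent site, free ligand ~ total ligand, simple
# hyperbolic response with no basal activity.


def fractional_occupancy(ligand_um: float, k_d_um: float) -> float:
    """Fraction of sites occupied at equilibrium for single-site binding."""
    return ligand_um / (k_d_um + ligand_um)


def fractional_activation(effector_um: float, k_l_um: float) -> float:
    """Fraction of maximal activity; equals 0.5 when effector_um == k_l_um."""
    return effector_um / (k_l_um + effector_um)


K_D_ATP_UM = 25.0   # approximate value quoted from ITC/MST
K_L_ATP_UM = 780.0  # value quoted from the activity assays

comparison = {
    conc: (
        fractional_occupancy(conc, K_D_ATP_UM),
        fractional_activation(conc, K_L_ATP_UM),
    )
    for conc in (50.0, 100.0)
}

for conc, (occupancy, activation) in comparison.items():
    print(f"{conc:.0f} uM ATP: occupancy {occupancy:.2f}, activation {activation:.2f}")
```

      Under these assumptions the a-site would be about 67% and 80% occupied at 50 and 100 µM ATP, while the kinetic response would be only about 6% and 11% of maximum. This illustrates why the two constants need not agree.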

      • The results of ATP/dATP copurification experiments shown in Figure 2 - figure supplement 1 show the preference of dATP binding over ATP. However, the results do not necessarily support the competition between ATP and dATP for binding to the ATP cone domain. It is still possible that dATP binding to the s-site diminishes the binding of ATP to the a-site.

      Our aim was to exclude the possibility that ATP and dATP can bind to the ATP-cone at the same time and not to study competition between the two. Nevertheless, to eliminate the possibility that dATP binding to the s-site could affect nucleotide binding to the a-site, in two out of three conditions described in the supplementary figure, the experiments were performed in the presence of dTTP to prevent binding of dATP to the s-site.

      (7) Oligomeric states.

      • The authors must present the GEMMA results without ATP or dATP. Otherwise, the effects of ATP and dATP on the oligomeric state are not clear.

      We cannot report GEMMA results without ATP or dATP because apo-PcNrdD was unstable in the GEMMA buffer and clogged the capillaries. Instead, SEC analysis was performed on apo-PcNrdD in a more suitable buffer and showed a homogeneous peak corresponding to a dimer (included as Figure 3 - figure supplement 1).

      • Figure 3 does not support the induction of a2 upon ATP binding. The concentrations of ATP used in these experiments (50 and 100 uM) were significantly lower than KL determined by the activity assays (780 uM), while it is close to the Kd values determined by ITC or MST (~25 uM). Since it is unclear what binding events ITC and MST are reporting, the data in Figure 3 does not provide support for the claimed effects of ATP binding.

      MST and ITC were performed in the presence of substrate (1 mM GTP) and s-site effector (1 mM dTTP in ITC experiments, and 5 mM dTTP in MST experiments), and they thus measure binding of ATP or dATP to the ATP cone. SEC analysis with 2 µM apo-PcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimers and tetramers in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.

      • The effects of dATP must be presented more clearly. The authors did not observe a significant difference in oligomeric states between 50 or 100 uM dATP vs 50 uM dATP and 100 uM CTP. The former condition has dATP ~ 2x higher than the Kd and KL (Figure 1b) and therefore could be considered as "inhibited". On the other hand, NrdD should be fully active under the latter condition. The absence of difference in the oligomeric states between these two different conditions suggested to me that the oligomeric state does not regulate the NrdD activity. The authors seemed to indicate the same conclusion, but did not describe it clearly.

      We agree that the oligomeric state most likely does not regulate the NrdD activity and hope to have explained this better in the revised version.

      • Figure 3 legend mentioned a and b, but the figure was not labeled.

      We have corrected this.

      • The authors should triplicate the analysis and report the errors.

      Five scans were added for each trace to increase the signal-to-noise level (included in figure legend).

      (8) EPR characterization of Gly radical

      • The amount of Gly radical must be quantified by EPR. The authors must report how much NrdD has Gly radical.

      The concentration of NrdD (1 µM) in the activity assays is too low to be quantified by EPR. For the EPR experiment, the glycyl radical content is given in the figure legend.

      • The authors claim that the Gly radical environment was similar based on the doublet feature. However, the doublet feature comes from the hyperfine splitting with α proton whose orientation relative to the radical p-orbital would not be affected by the conformation or the environment. Thus, this conclusion is incorrect and must be removed.

      We thank the reviewer for the clarifying comment and have removed our suggestion in the text.

      (9) Gly711 should be shown in Fig. 6e to help readers understand the last paragraph on page 12.

      The figure reference has been changed to Fig. 7, where this is shown more clearly. In Fig. 6e, inclusion of Gly711 would obscure other important information.

      (10) GRD structure with dATP

      The disorder of GRD in the presence of dATP does not agree with the formation of Gly radical under the same conditions. Gly radical is unlikely stable if it is extensively exposed to solvent. Most likely, the observed cryo-EM structures represent the conformation irrelevant to Gly radical formation.

      We agree that the glycyl radical is unlikely to be stable if exposed to solvent. We believe that the GRD is not completely disordered but most likely made more mobile through rigid body movements of the domain to an extent that makes it invisible in the cryo-EM maps. It is most likely still in the vicinity of the active site, shielding the glycyl radical. Our new HDX-MS results show a small but tangible increase in mobility of the GRD in the presence of dATP compared to ATP. Of course the differences in dynamics remain to be confirmed. It is worth noting that the group of Catherine Drennan at MIT published a conference abstract more than a year ago that suggested a similar pattern of ordered/dynamic GRDs, based on crystal structures, though the details have not yet been published (https://doi.org/10.1096/fasebj.2022.36.S1.R3407).

      We also agree that the cryo-EM structures do not show the GRD conformation relevant to Gly radical formation, as this has been shown spectroscopically for the GRE pyruvate formate lyase to require large conformational changes in the GRD and also the presence of the activase. However, revealing this conformation would be a completely different project. We postulate that inactivation proceeds by prevention of radical transfer to the substrate, not by prevention of its formation.

      We have altered the wording in several places in the revised manuscript, including the title, to avoid using the term “disorder”, as this may imply (partial) unfolding, and we certainly do not wish to imply that.

      (11) The difference between dATP and ATP binding

      From the presented structures, it was not clear how the absence of 2'-OH affects the oligomeric state and the structure of the GRD. The low resolution of the ATP-bound structure precluded the comparison between the ATP and dATP-bound structures.

      We agree that a detailed analysis of the differences between ATP- and dATP-bound structures requires higher resolution structures, particularly of the ATP-bound form. This will be the subject of future studies.

      (12) Conclusion about the disordered GRD.

      -The authors should describe the reason why the dATP binding affected the structure of GRD. The authors did not discuss why dATP binding affected the folding or mobility of GRD. Since this is the key conclusion of this manuscript and the authors are making this conclusion based on the absence of the ordered GRD structure (hence the negative results), the authors should carefully describe why the dATP binding does not allow the binding/folding of GRD in the position observed in the ATP-bound structure.

      As mentioned in our response to point 4 in this reviewer’s Public Review, it is difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. Our first hypotheses were that the ATP-cone might work by a steric occlusion mechanism, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      • The authors should test if the dATP inhibition is reversible for PcNrdD. If dATP binding induces dissociation of GRD from the active site and makes GRD flexible, Gly radical would most likely be quenched by formate or other components in the assay solution. If dATP inhibition is reversible, it is hard to believe that Gly radical dissociates completely from the active site.

      As-purified PcNrdD contains dATP and, after removal of bound nucleotides, can bind substrate in the presence of ATP. The as-purified PcNrdD protein contained 30% nucleotide contamination. After precipitation, HPLC analysis identified a major peak corresponding to dATP/dADP. Purification conditions were optimised to remove the nucleotides and we have added this information to the purification description.

      (13) Functional support for the observed structures.

      Similar to X-ray crystallography, cryo-EM is a highly selective method that requires the selection of particles that can be analyzed with sufficient resolution. This means that the analysis could be biased towards the protein conformations stable on the cryo-EM grid. Consequently, testing the structural observations by functional characterization of mutant enzymes is critical. However, the authors did not perform such functional characterizations and made conclusions purely based on the structural observations.

      We acknowledge this limitation. We constructed several mutations located at the tetrameric interface between the ATP-cone and the core protein based on the cryo-EM structure of dATP-loaded NrdD. Unfortunately, these mutant proteins were unstable and led to protein cleavage.

      (14) Other minor points:

      • In the introduction, the authors stated "The presence and function of the ATP-cone domain distinguish anaerobic RNRs from the other members of the large glycyl radical enzyme (GRE) family that are otherwise structurally and mechanistically related (Backman et al., 2017)." This statement is misleading because GREs are functionally diverse.

      We have removed the words “and mechanistically” to reduce ambiguity.

      • p. 12, e.g. should be removed.

      We are not sure what is meant here. Does the reviewer mean p. 21 “The interactions are mostly hydrophobic but are reinforced by several H-bonds, e.g. between Gln3D-Gln458A, Ser53D–Gln458A, Arg11D-Asp468A, the main chain amide of Ile12D and Tyr557A.”?

      Reviewer #3 (Recommendations For The Authors):

      Overall, the work presents an impressive and in-depth structural view of the conformational changes stemming from the interactions of (d)ATP allosteric effector molecules that are interrelated to RNR function. The manuscript is written clearly and provides a solid overview of RNR chemistry. The cryo-EM data show striking differences between ATP and dATP bound forms, though in select regions, the resolution is not good enough for strong interpretations of the finer details.

      (1) In cryo-EM structures, dATP appears to shift the oligomerization equilibrium from nearly all dimeric forms (absence of dATP) to a mixture of both dimeric and tetrameric species (presence of dATP). The examination of the oligomeric composition in solution using the GEMMA - a mass spectral technique - showed somewhat similar trends, though given the magnitude of the differences, it was less compelling. Have the authors considered a complementary solution technique, such as analytical SEC or dynamic light scattering that could provide further support for the change in oligomerization as observed in the cryo-EM?

      SEC analysis with 2 µM apoPcNrdD and higher nucleotide concentrations (1 mM) was performed, confirming the presence of both dimer and tetramer in solution at different ratios depending on the addition of ATP or dATP. The SEC analysis, included as Figure 3 - figure supplement 1, confirms the existence of an equilibrium in solution.

      (2) The protein as isolated from the final SEC shows a predominant peak corresponding to aggregate protein. It would be helpful if the authors ran an analytical SEC on the protein sample that is more refined to see how much soluble dimer/tetramer vs. aggregate protein there is. This could impact the kinetic and thermodynamic analysis of effector interactions. Further, the second major peak is labeled as 'monomer'. Is the protein isolated as a monomer and then forms dimer upon effector binding? It is unclear. The authors should consider presenting the SEC standards for the given column and buffer condition so that a reasonable estimate of the oligomerization status of the isolated protein can be assigned.

      Can the reviewer possibly have believed that Figure 1 - supplementary figure 2a shows PcNrdD rather than PcNrdG? The figure supplement corresponds to the as-isolated SEC analysis of the activase (PcNrdG), which shows the presence of two main peaks of aggregates and monomer. The monomeric peak was reinjected and showed no presence of further aggregation states. Currently it is not known which oligomeric state the activase harbours upon binding to PcNrdD and glycyl radical formation. None of the other SEC figures in the MS has any predominant peak corresponding to aggregated protein.

      (3) More details are needed for the ITC section. The ITC methods are not clear. What is the exact composition of the ligand solution being titrated into the protein solution? It is unclear how the less-than-unity binding stoichiometry was determined and what it means. Is the n value for the monomer, dimer, or tetramer forms? It is concerning that n < 1 is observed for dATP binding in the ITC whereas there are 3 dATP bound/subunit in the cryo-EM. For completeness, titration of a buffer into protein solution (no ligand) should be conducted and presented to demonstrate that the heats produced in Figure 2 correspond to the ligand only (and not a buffer mismatch).

      ITC experiments were performed in the presence of 1 mM GTP (c-site) and 1 mM dTTP (s-site). Unlike other parameters in ITC analyses, the N value is usually the least accurate of all fitted parameters and strongly depends on the concentration of the active protein in the sample. N values described in the current study are in the same range as values reported for ATP-cones in other RNRs and NrdR (Rozman Grinberg et al. 2018a, 2018b, 2022; McKethan and Spiro 2013). The results most likely reflect two high-affinity binding sites for dATP and one high-affinity binding site for ATP. Different nucleotide concentrations were used in the cryo-EM and ITC experiments.

      (4) It is intriguing that the binding of dATP doesn't quell the glycyl radical. In fact, it appears that, as the authors suggest, the amount of glycyl radical might be increased in these samples. However, the cryo-EM data indicates that the GRD is disordered. It is unclear how these would be correlated, as one would not expect a disordered structural element to maintain such a potent oxidant.

      As already written above, we do not wish to imply that the GRD is completely or even highly disordered, just that its dynamics increase in the presence of dATP. Otherwise we completely agree that a very exposed Gly radical is incompatible with its stability. It could be that the amount of disorder is exaggerated somewhat by the vitrification process in cryo-EM. We have tried to reword some of the text to emphasise higher mobility rather than disorder.

      It has been difficult to propose a direct structural mechanism for transmission of the allosteric signal from the a-site in the ATP-cone to the active site and GRD given that the ATP-cones and linker are disordered in the dATP-bound dimers and that the linker cannot be completely modelled even in the dATP-bound tetramers. We initially thought that a steric occlusion mechanism might be at play, but the reality appears more complex. Most likely dATP binding causes a change in the dynamics of the linker region and NxN flap that directly affects substrate binding and simultaneously causes higher mobility of the GRD, given that all are part of a connected system. The structures determined in the presence of dATP and CTP show that CTP cannot bind in the absence of an ordered NxN flap. We hope that future structural studies of NrdDs from other organisms may shed further light on this mechanism.

      (5) It is a bit difficult to keep track of the myriad of structural information and differences amongst the various nucleotide-dependent conditions. It would be useful for the authors to add a summary figure that depicts the various oligomers, orientations, and (dis)ordered structural elements with cartoon representations.

      Thank you for this suggestion. It has been added as Figure 11.

      (6) The mechanism by which (d)ATP binding changes the (dis)ordering of select loops based on the current cryo-EM data is unclear (even the authors agree). The addition of molecular dynamics (MD) simulations on two different structures to reveal the network or structural communication would be a great addition to the work and validate the structural data.

      We have discussed this with a colleague who is an expert in MD. Their advice was that such simulations would be very difficult given that some amino-acids are missing in both of the relevant starting structures (ATP-CTP and dATP-CTP dimer) and could give very variable results. Thus we chose to do complementary experiments with hydrogen-deuterium exchange mass spectrometry (HDX-MS) instead. The results are included in the revised manuscript.

      Minor points

      (1) There are some conflicting reports as to whether P. copri is considered a human 'pathogen'. According to Yeoh, et al Scientific Reports 2022, P. copri is one of the predominant microbes in the human gut and is linked to a positive impact on metabolism. Perhaps the addition of a citation that provides support for it as a pathogen would clarify the statement on p. 3.

      We have added a recent reference (Nii T, Maeda Y, Motooka D, et al. (2023) Genomic repertoires linked with pathogenic potency of arthritogenic Prevotella copri isolated from the gut of patients with rheumatoid arthritis. Ann Rheum Dis 82: 621-629. doi: 10.1136/annrheumdis-2022-222881).

      (2) In Figure 3, the number of dimers/tetramers for dATP (100 uM) does not add up to 100. What is the other 2%?

      Thank you for pointing this out - it has been corrected.

      (3) The data in Figures 5C and D do show slight changes that could be fit and interpreted as a 'weak' interaction. Thus, the statement on p 9 "where dATP-loaded PcNrdD could bind neither GTP nor CTP" should be changed to indicate that the interactions are weak (or that the nucleotides weakly associate).

      The text and the figure have been changed according to the reviewer’s suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Firstly, the authors place a great deal of emphasis on the impact of the Hif1-a inhibitor PX-478. The literature surrounding this inhibitor and its mode of action indicates that it is not a direct inhibitor of activity but that its greatest impact is on the production of Hif1-a. The authors do include another inhibitor as a control, Echinomycin, but it does not appear to be as biologically active and the panel of experiments conducted with this is extremely limited. I would be more comfortable with a full Seahorse experimental panel for Echinomycin, similar to SFig 2.G as performed with PX-478.

      We thank the reviewer for their comment highlighting the different mechanisms of action of the HIF-1α inhibitors used in this article. While echinomycin inhibits the binding of HIF-1α to the hypoxia response element (HRE), thereby blocking the DNA-binding capability of HIF-1α, PX-478 inhibits HIF-1α deubiquitination, decreases HIF-1α mRNA expression, and reduces HIF-1α translation. We have included a paragraph explaining this difference in the new version of the manuscript (page 9). In addition, we extended the panel of experiments performed with echinomycin, which confirmed a marked inhibition of the glycolytic pathway when DCs were stimulated with irradiated Mtb in the presence of echinomycin as assessed by SCENITH (new Figure S3H).

      Similarly, it would be of value to have Seahorse profiling that directly excludes FAO from the metabolic profile through the use of Etomoxir as an inhibitor of fatty acid oxidation, which one would assume would have no impact on the metabolic response.

      To estimate the contribution of FAO towards fueling protein synthesis in DCs stimulated with iMtb, the FAO inhibitor etomoxir was incorporated into the SCENITH method as previously described (Adamik et al., 2022). Overall, FAO dependence was found to be less than 10% in DCs, regardless of their activation state. While mitochondrial dependence is reduced after iMtb stimulation, there is no difference in FAO dependence, suggesting that OXPHOS is primarily driven by glucose in iMtb-stimulated cells. This is consistent with the HIF-1α-induced increase in glucose metabolism-related genes. We have adjusted the results section to include this new result (new Figure S1).

      Aside from these minor points, I believe this to be a rigorous study.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 1 and Fig. 2, the authors conclude that Mtb rewires the metabolism of Mo-DCs and induces both glycolysis and OXPHOS. The data shows that infection with iMtb or Mtb increases glucose uptake and lactate release, suggesting an increase in glycolysis. However, an increase in lactate is not a measure of glycolysis. Lactate is a byproduct of glycolysis; the end product of glycolysis is pyruvate.

      We are grateful for the reviewer's comment, as it gives us the opportunity to explain the conceptual framework on which we based our study. Traditionally, pyruvate has been considered to be the end product of glycolysis when oxygen is present and lactate the end product under hypoxic conditions. However, numerous studies have shown that lactate is produced even under aerobic conditions (Brooks, 2018). Therefore, we frame this work in accordance with the view that glycolysis begins with glucose as its substrate and terminates with the production of lactate as its main end product (Rogatzki, Ferguson, Goodwin, & Gladden, 2015; Schurr, 2023; Schurr & Schurr, 2017).

      Secondly, since the authors have access to the Agilent Extracellular Flux Analyzer, they should have performed detailed ECAR/OCR measurements to conclusively demonstrate that both glycolysis and OXPHOS are increased in Mo. This is especially important for OXPHOS because the only readout shown for OXPHOS is an increase in mitochondrial mass (figure 1 G, H), which is not acceptable. Overall, the data does not indicate that Mtb triggers OXPHOS in the dendritic cells. It only indicates dead iMtb increases the mass of mitochondria in DCs.

      The reviewer’s advice is well appreciated. However, we would like to clarify what may be a misunderstanding; that is, the assays alluded to by the reviewer were not performed on monocytes but on DCs. As advised by the reviewer, we now include the OCR measurements by Seahorse and describe the figures according to their order of appearance in the new version of the manuscript.

      What happens to the mitochondrial mass when infected with live Mtb?

      In response to the reviewer’s question, we determined the mitochondrial mass in infected DCs with live Mtb. In contrast to DCs treated with irradiated Mtb, those infected with live bacteria showed a clear reduction of their mitochondrial mass (modified Figure 1G). This result indicates that, although both Mtb-infected and irradiated Mtb-exposed DCs show a clear increase in their glycolytic activity, divergent responses are observed in terms of mitochondrial mass.

      It will be best if the authors indicate in the figure headings that dead Mtb was used.

      We agree with the reviewer. For figures 1-3, we applied the term “Mtb” in the figure headings since both irradiated and viable bacteria were used for the corresponding experiments. In figures 4-5, the term “iMtb” (alluding to irradiated Mtb) was used in the figure headings as suggested by the reviewer. For the remaining figures, the term “iMtb” was indicated in their legends when dead bacteria were used to stimulate DCs.

      E.g., Figure 1F; what does live Mtb do to GLUT1 levels etc etc?

      In response to the reviewer’s question, we included new data about Glut1 expression in DCs infected with live Mtb in the latest version of the manuscript. In line with the increase in glucose uptake shown in figure 1B, we observed an increase in the percentage of Glut1-positive DCs upon Mtb infection (new Figure 1F, lower panels). The increase in Glut1 expression strengthens the notion that DCs activate their glycolytic activity in response to the infection, as demonstrated by the elevated release of lactate, glucose consumption, HIF-1α expression, LDHA expression (Figure 1) and glycolytic activity (Figure 2, SCENITH results with viable Mtb). Therefore, these data strongly support the induction of glycolysis by Mtb (either viable or irradiated) in DCs.

      Also, we found that they were still able to activate CD4+ T cells from PPD+ donors in response to iMtb. This activation of CD4 T cells with iMtb in the presence of a HIF-1alpha inhibitor is expected, as iMtb is dead and not virulent. What happens when the cells are infected with live virulent Mtb?

      We would like to clarify the main purpose of the DC-T cell co-culture assays in the presence of the HIF-1α inhibitors. To characterize the impact of HIF-1α on DC functionality, we assessed the capacity of DCs to activate autologous CD4+ T cells when stimulated with iMtb in the presence of HIF-1α inhibitors. To this end, we used iMtb merely as a source of antigens to load DCs and evaluate the effect of HIF-1α inhibition on the activation of antigen-specific T cells. The use of viable Mtb may introduce confounding factors, such as pathogen-triggered inhibitory mechanisms (e.g., EsxH secretion by Mtb; Portal-Celhay et al., 2016), which would prevent us from reaching conclusions about the role of HIF-1α. Thus, we consider that the use of live bacteria for this experiment is out of the scope of this manuscript.

      The authors demonstrated that CD16+ monocytes from TB patients have higher glycolytic capacity than healthy controls Fig 7. The authors should differentiate TB patient monocytes into DCs and measure their bioenergetics to test if infection alters their glycolysis and OXPHOS.

      In agreement with the reviewer, the determination of metabolic pathways in DCs differentiated from monocytes of TB patients is a key aspect of this work. Accordingly, the bioenergetic determinations of DCs generated from monocytes from TB patients versus healthy subjects are now illustrated in Figures 6F (lactate release) and 6G (SCENITH profile).

      In the discussion, the authors state that "pathologically active glycolysis in monocytes from TB patients leads to poor glycolytic induction and migratory capacities of monocyte-derived DCs." However, the data from Fig. 1 and 2 show that treatment with iMtb or Mtb induces glycolysis in MoDCs. How do the authors explain these contrasting results?

      We thank the reviewer for pointing out this issue. Figures 1 and 2 show DCs differentiated from monocytes of healthy donors (HS). In this case, DCs from HS respond to Mtb by inducing a glycolytic and migratory profile. Yet, in the case of monocytes isolated from TB patients, these cells exhibit an early glycolytic profile from the beginning of differentiation, ultimately yielding DCs with low glycolytic capacity and low migratory activity in response to Mtb. We included this explanation in the discussion (page 18) to better clarify this issue.

      Also, the term "pathological" active glycolysis (Introduction and Discussion) is an inappropriate term.

      As requested by the reviewer, we excluded the term “pathological” to describe the phenomenon reported in this study.

      Lastly, it should be shown whether the DCs generated from CD16+ monocyte from TB patients generate tolerogenic and/or aberrant DCs, which have lower glycolytic and migration capacity compared to the CD16- monocyte population. In Figure 7B, the authors should discuss why the CD16+ monocyte population has lower glycolytic capacity compared to CD16- monocytes in healthy donors. Furthermore, in contrast to the TB patients, do DCs generated from CD16+ monocyte in healthy donors have increased glycolytic and migration capacity compared to CD16- monocyte (because these monocytes showed lower glycolytic capacity)? Furthermore, if there is no difference in glycolytic capacity among the three monocyte populations in TB patients, on what basis was it concluded that DCs generated only from the CD16+ monocyte population may be the cause of lower migration capacity? The authors state in Figure 7F that the DMOG pretreatment matches the situation where the Mo-DCs from TB patients showed reduced migration. Did the authors check the Hif-1alpha levels in monocytes obtained from TB patients?

      We appreciate this in-depth analysis by the reviewer because it allows us to clarify some interpretations of the SCENITH results in Figure 7B. It is important to keep in mind that with the SCENITH technique we can only infer the relative contributions of the metabolic pathways, without alluding to the absolute magnitudes of such contributions. In this regard, it is key to note that the amount of lactate released during the first hours of the TB monocyte culture is much higher than that released by monocytes from healthy subjects (HS, Figure 7A), even when most monocytes, which are CD14+ CD16-, have comparable glycolytic capacities between HS and TB. Another example to illustrate how to interpret SCENITH results can be found in Figure 2, where a lower mitochondrial dependence is observed in iMtb-stimulated DCs (Figure 2A), while the absolute ATP production associated with OXPHOS is indeed higher as measured by Seahorse (Figure 2D). Therefore, the glycolytic capacity is not a direct readout of the magnitude of glycolysis, but of its contribution to total metabolism. The low levels of lactate released from HS monocytes likely reflect their low activation state and low metabolic activity compared to TB monocytes. In this regard, we have previously demonstrated that monocytes from pulmonary TB patients display an activated phenotype (Balboa et al., 2011). The fact that there is no difference between the glycolytic capacities of TB and HS CD16- monocytes indicates that their proportional contributions to protein synthesis are comparable (again, without inferring anything about their absolute values, which may be very different).
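The distinction between relative and absolute contributions can be illustrated with a small numerical sketch. It assumes the SCENITH percentages are derived from the puromycin signal in four conditions (control, 2-deoxyglucose, oligomycin, and both inhibitors) using the formulas commonly given for the method. These formulas should be checked against the method's original description, and the function name `scenith_profile` and all input values are hypothetical, not data from the manuscript.

```python
def scenith_profile(co, dg, o, dgo):
    """Derive SCENITH-style percentages from puromycin signal (e.g. MFI).

    co  = control (no inhibitor)
    dg  = 2-deoxyglucose
    o   = oligomycin
    dgo = 2-deoxyglucose + oligomycin
    Formulas are an assumption based on the commonly cited definitions.
    """
    span = co - dgo  # signal attributable to inhibitable energy metabolism
    if span <= 0:
        raise ValueError("control signal must exceed the double-inhibitor signal")
    glucose_dependence = 100.0 * (co - dg) / span
    mitochondrial_dependence = 100.0 * (co - o) / span
    return {
        "glucose_dependence": glucose_dependence,
        "mitochondrial_dependence": mitochondrial_dependence,
        "glycolytic_capacity": 100.0 - mitochondrial_dependence,
        "faao_capacity": 100.0 - glucose_dependence,
    }


# Hypothetical cells: same proportions, five-fold different absolute signal.
quiescent = scenith_profile(co=1000, dg=400, o=700, dgo=200)
activated = scenith_profile(co=5000, dg=2000, o=3500, dgo=1000)

print(quiescent["glycolytic_capacity"], activated["glycolytic_capacity"])
```

In this made-up example the two populations have identical glycolytic capacity even though the second has a five-fold higher overall signal, which is the sense in which the percentages describe proportional contributions and not absolute flux.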

      Beyond the previous clarification, the reviewer's proposal to isolate subsets of monocytes is a very interesting idea. However, the experimental approach is very difficult based on the amount of blood we can obtain from patients. The cohort of patients included in this work comprises very severe patients and we are given up to 15-20 ml of peripheral blood from each. This volume of blood yields up to 10 million PBMC with approximately 1 million monocytes. If we separate the monocyte subsets, the recovered cells per condition will be insufficient to perform the intended assays.

      Nevertheless, we incorporate new evidence that TB disease is associated with an increased activation and glycolytic profile of circulating CD16+ monocytes.

      i) First, we show that the baseline glycolytic capacity of CD16+ monocytes correlates with time since the onset of TB-related symptoms (new Figure 7C).

      ii) Second, we performed high-throughput Gene Set Enrichment Analysis (GSEA) on transcriptomic data (GEO accession number: GSE185372) of CD14+CD16-, CD14+CD16+ and CD14dimCD16+ monocytes isolated from individuals with active TB, latent TB (IGRA+), as well as from TB-negative healthy controls (IGRA-). We found that glycolysis, unlike oxidative phosphorylation, tends to be enriched in active TB in both CD14+CD16+ and CD14dimCD16+ monocytes (new Figure 7D).

      iii) Third, we measured the expression of HIF-1α in monocyte subsets by FACS and found that this transcription factor is expressed at higher levels in CD16+ monocyte subsets from TB patients compared to their counterparts from healthy donors (new Figure 8A). We consider that this result justifies the assays shown in Figure 8B-C, in which we prematurely activated HIF-1α in healthy donor monocytes during early differentiation to DCs and measured its impact on the migration of the generated DCs.

      In the Discussion, the authors mention that circulating monocytes from TB patients differentiate from DCs with low immunogenic potential. However, the authors have not shown any immunological defect in any of their data with monocytes from TB patients. In the proxy model mentioned in Figure 7, they have in fact shown that these preconditioned DCs have higher CD86 expression. Can the authors explain/show data to justify the statement in the first paragraph of the Discussion?

      We agree with the reviewer on this observation. Our findings are limited to the generation of DCs with low migratory potential (low chemotactic activity towards CCL21 of DCs differentiated from TB patient monocytes shown in figure 6H and of DCs generated from pre-conditioned monocytes shown in figure 8C). We have modified that part of the discussion to better clarify this point, replacing “immunogenic” with “migratory”.

      The authors should note that oxamate is a competitive inhibitor of the enzyme lactate dehydrogenase and not glycolysis. Also, LDHA catalyzes the conversion from pyruvate to lactate and not the other way around (Results, page 6).

      This comment relates to the first one by the reviewer, in which the dogma of glycolysis was discussed. According to the new conception of glycolysis, it begins with glucose as its substrate and terminates with the production of lactate as its main end product.

      The following statements by the authors on page 6 are incorrect: "Because irradiated and viable Mtb induced comparable activation of glycolysis, we subsequently performed all our assays with irradiated Mtb only in the rest of the study due to biosafety reasons." and: "To our knowledge, this is the first study addressing the metabolic status and migratory activity of Mo-DCs from TB patients."

      We deleted the first sentence and reworded the second sentence as "To our knowledge, this is the first study to address how the metabolic status of monocytes from TB patients influences the migratory activity of further differentiated DCs".

      The Discussion reads as if live Mtb was used in the experiments, which is not the case. This should be corrected.

      We replaced "Mtb" with "iMtb" wherever applicable in the Discussion. In some cases, we used "Mtb stimulation" instead of "Mtb infection".

      Minor Comments:

      (1) In Figure 1F legend "Quantification of Glut1+ cells plotted to the right". The underlined part should be "plotted below".

      It was corrected.

      (2) In Figure 1H. Please describe the quantitation method and describe how many cells or the number/size of fields were used to quantitate mitochondria.

      For mitochondrial morphometric analysis, TEM images were quantified with the ImageJ “analyze particles” plugin in thresholded images, with size (μm²) settings from 0.001 to infinity. For quantification, 8–10 cells from random fields (1000x magnification) per condition were analyzed. We included this information in the Methods section of the new version of the manuscript.

      (3) Please mention the number of independent experimental repeats for each experimental data set and figure.

      In each figure, the number of independent experiments is indicated by individual dots.

      (4) In Figure 2A legend, "PER; left panel" should be PER; lower panel and "OCR; right panel" should be OCR; upper panel.

      It was corrected.

      References for reviewers

      Adamik, J., Munson, P. V., Hartmann, F. J., Combes, A. J., Pierre, P., Krummel, M. F., … Butterfield, L. H. (2022). Distinct metabolic states guide maturation of inflammatory and tolerogenic dendritic cells. Nature Communications, 13(1), 1–19. https://doi.org/10.1038/s41467-022-32849-1

      Balboa, L., Romero, M. M., Basile, J. I., Sabio y Garcia, C. A., Schierloh, P., Yokobori, N., … Aleman, M. (2011). Paradoxical role of CD16+CCR2+CCR5+ monocytes in tuberculosis: efficient APC in pleural effusion but also mark disease severity in blood. Journal of Leukocyte Biology. https://doi.org/10.1189/jlb.1010577

      Brooks, G. A. (2018). The Science and Translation of Lactate Shuttle Theory. Cell Metabolism. https://doi.org/10.1016/j.cmet.2018.03.008

      Portal-Celhay, C., Tufariello, J. M., Srivastava, S., Zahra, A., Klevorn, T., Grace, P. S., … Philips, J. A. (2016). Mycobacterium tuberculosis EsxH inhibits ESCRT-dependent CD4+ T-cell activation. Nature Microbiology, 2, 16232. https://doi.org/10.1038/NMICROBIOL.2016.232

      Rogatzki, M. J., Ferguson, B. S., Goodwin, M. L., & Gladden, L. B. (2015). Lactate is always the end product of glycolysis. Frontiers in Neuroscience, 9, 22. https://doi.org/10.3389/fnins.2015.00022

      Schurr, A. (2023). From rags to riches: Lactate ascension as a pivotal metabolite in neuroenergetics. Frontiers in Neuroscience, 17, 1145358. https://doi.org/10.3389/fnins.2023.1145358

      Schurr, A. (2017). Lactate, Not Pyruvate, Is the End Product of Glucose Metabolism via Glycolysis. Carbohydrate. https://doi.org/10.5772/66699

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Thank you for your continued review and for providing insightful suggestions. Below, I share some unpublished new findings related to the MYRF ChIP, comment on the potential interplay between myrf-1 and myrf-2, and describe the modifications we've implemented to address the reviewers' comments.

      (1) MYRF-1 ChIP

      Our collaboration with the modERN (Model Organism Encyclopedia of Regulatory Networks) project has recently yielded MYRF ChIP data. The results demonstrate clear and consistent MYRF binding across samples, notably on the lin-4 promoter. Given the significant detail and extensive description required to adequately present these findings, we have decided it is impractical to include them in the current paper. These results will be more suitably published in a separate ongoing study focused on MYRF's regulatory targets during larval development.

      (2) Inter-regulation between myrf-1 and myrf-2

      We acknowledge the interpretation that myrf-2 may act as a genetic antagonist to myrf-1, as suggested by the delayed arrest in myrf-1; myrf-2 double mutants and a trend towards increased lin-4 expression in myrf-2 mutants. Additionally, our unpublished data suggest an elevated myrf-2 expression peak in myrf-1 null mutants during the L1-L2 transition, indicating a potential mutual repressive interaction between myrf-1 and myrf-2.

      On the other hand, myrf-1 and myrf-2 exhibit functional redundancy in DD synaptic rewiring and lin-4 expression. A gain of function in myrf-2 promotes early DD synaptic rewiring. Furthermore, three independent co-immunoprecipitation analyses targeting myrf-1::gfp, myrf-2::gfp, and pan-1::gfp confirm a tight association between myrf-1 and myrf-2 in vivo. These findings challenge the notion of myrf-2 primarily antagonizing myrf-1, or vice versa.

      We propose a model where myrf-1 and myrf-2 collaborate and are functionally redundant, with compensatory elevated expression when one paralog is absent. For instance, the loss of myrf-1 triggers upregulation of myrf-2, which, though insufficient on its own, accelerates the transcriptional program and exacerbates system deterioration, leading to accelerated death. How exactly this takes place is currently unclear. We note that the MYRF ChIP data show MYRF binding at both the myrf-1 and myrf-2 genes.

      Given the complexity of these interactions, we have chosen not to delve deeply into this discussion in the paper without more direct evidence, which would require detailed analysis.

      (3) Revisions Addressing Reviewer Suggestions

      (a) We have revised our interpretation of the mScarlet signal changes in myrf-1(ybq6) and myrf-2(ybq42) mutants to reflect a more nuanced understanding of their potential genetic relationship, as highlighted in the main text.

      “The mScarlet signals exhibit a marked reduction in the putative null mutant myrf-1(ybq6) (Figure 1D, E). Intriguingly, in the putative null myrf-2(ybq42) mutants, there is a noticeable trend towards increased mScarlet signals, although this increase does not reach statistical significance (Figure 2C, D).”

      (b) In response to feedback on Figure 2 and the characterization of lin-4(umn84) mutants, we've included a new series of images showing lin-4(umn84)/+ and lin-4(umn84) signals through larval stages, presented as Figure 2 Figure Supplement 2. This addition clarifies the functional status of lin-4 nulls in our study.

      “Our observations revealed that mScarlet signals were not detected in early L1 larvae (Figure 2C-F; Figure 2 Figure Supplement 2).”

      (c) To improve the clarity of Fig 6, we've added indicator arrows in the red, green, and merge channels, enhancing the visualization of the signals.

      We appreciate the opportunity to clarify these points and hope that our revisions and additional data address the concerns raised.

    2. Reviewer #1 (Public Review):

      In this work, the authors set out to ask whether the MYRF family of transcription factors, represented by myrf-1 and myrf-2 in C. elegans, have a role in the temporally controlled expression of the miRNA lin-4. The precisely timed onset of lin-4 expression in the late L1 stage is known to be a critical step in the developmental timing ("heterochronic") pathway, allowing worms to move from the L1 to the L2 stage of development. Despite the importance of this step of the pathway, the mechanisms that control the onset of lin-4 expression are not well understood.

      Overall, the paper provides convincing evidence that MYRF factors have a key role in promoting lin-4 expression in young larvae. Using state-of-the-art techniques (knock-in reporters and conditional alleles), the authors show that MYRF factors are essential for lin-4 activation and act cell-autonomously. Results using some unusual gain-of-function alleles are supported by consistent results using other approaches. The authors also provide evidence supporting the idea that MYRF factors activate lin-4 by directly activating its promoter. Because these results are an indirect test of this idea, further experiments will be necessary to conclusively determine whether lin-4 is indeed a direct target of MYRF factors. myrf-1 and myrf-2 likely function redundantly to activate lin-4; potential complex interactions between these two genes will be an interesting area for future work.

      Overall, the paper's results are convincing. The important findings on miRNA regulation and the control of developmental timing will make this work of interest to a broad range of developmental biologists.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the reviewers' and editors' comments and suggestions on our manuscript "Transposable elements regulate thymus development and function." We performed additional analyses to validate our results and rephrased some manuscript sections according to the comments. We believe these changes significantly increase the solidity of our conclusions. Our point-by-point answer to the reviewers' and editors' comments is detailed below. New data and analyses are shown in Figure 1d, Figure 2g and h, Figure 5e and f, Figure 1 – figure supplement 1, Figure 2 – figure supplement 2, Figure 3 – figure supplement 1 and 2, Figure 4 – figure supplement 2, Figure 5 – figure supplement 1, as well as the corresponding text sections.

      Reviewer #1:

      (1) The authors sometimes made overstatements largely due to the lack or shortage of experimental evidence.

      For example, in Figure 4, the authors concluded that thymic pDCs produced higher copies of TE-derived RNAs to support the constitutive expression of type-I interferons in thymic pDCs, unlike peripheral pDCs. However, the data show only a correlation between the distinct TE expression pattern in pDCs and the abundance of dsRNAs. We are compelled to say that the evidence is far too weak to support a role for TEs in the production of interferon. Even if pDCs express a distinct type and amount of TE-derived transcripts, it may be a negligible amount compared to the total cellular RNAs. How many TE-derived RNAs potentially form dsRNAs? Are they over-expressed in pDCs?

      The data interpretation requires more caution to connect the distinct results of transcriptome data to the biological significance.

      We contend that our manuscript combines the attributes of a research article (novel concepts) and a resource article (datasets of TEs implicated in various aspects of thymus function). The critical strength of our work is that it opens entirely novel research perspectives. We are unaware of previous studies on the role of TEs in the human thymus. The drawback is that, as with all novel multi-omic systems biology studies, our work provides a roadmap for a multitude of future mechanistic studies that could not be realized at this stage. Indeed, we performed wet lab experiments to validate some but not all conclusions: i) presentation of TE-derived MAPs by TECs and ii) formation of dsRNAs in thymic pDCs. In response to Reviewer #1, we performed supplementary analyses to increase the robustness of our conclusions. Also, we indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations.

      Regarding the Reviewer's questions about TE-derived dsRNAs, LINE, LTR, and SINE elements all have the potential to generate dsRNAs, given their highly repetitive nature and bi-directional transcription (1). As ~32% of TE subfamilies are overexpressed in pDCs, we hypothesized that these TE sequences might form dsRNA structures in these cells. To address the Reviewer's concerns regarding the amount of TE-derived RNAs among total cellular RNAs, we also computed the percentage of reads assigned to TEs in the different subsets of thymic APCs (see Reviewer 1 comment #4).

      (2) Lack of generality of specific examples. This manuscript discusses the whole genomic picture of TE expression. In addition, one good way is to focus on the specific example to clearly discuss the biological significance of the acquisition of TEs for the thymic APC functions and the thymic selection.

      In figure 2, the authors focused on ETS-1 and its potential target genes ZNF26 and MTMR3, however, the significance of these genes in NK cell function or development is unclear. The authors should examine and discuss whether the distinct features of TEs can be found among the genomic loci that link to the fundamental function of the thymus, e.g., antigen processing/presentation.

      We thank the Reviewer for this highly relevant comment. We investigated the genomic loci associated with NK cell biology to determine if ETS1 peaks would overlap with TE sequences in protein-coding genes' promoter region. Figure 2h illustrates two examples of ETS1 significant peaks overlapping TE sequences upstream of PRF1 and KLRD1. PRF1 is a protein implicated in NK cell cytotoxicity, whereas KLRD1 (CD94) dimerizes with NKG2 and regulates NK cell activation via interaction with the nonclassical MHC-I molecule HLA-E (2, 3). Thus, we modified the section of the manuscript addressing these results to include these new analyses:

      "Finally, we analyzed publicly available ChIP-seq data of ETS1, an important TF for NK cell development (4), to confirm its ability to bind TE sequences. Indeed, 19% of ETS1 peaks overlap with TE sequences (Figure 2g). Notably, ETS1 peaks overlapped with TE sequences (Figure 2h, in red) in the promoter regions of PRF1 and KLRD1, two genes important for NK cells' effector functions (2, 3)."

      (3) Since the deep analysis of the dataset yielded many intriguing suggestions, why not add a discussion of the biological reasons and significance? For example, in Figure 1, why is TE expression negatively correlated with proliferation? cTEC-TE is mostly postnatal, while mTEC-TE is more embryonic. What does this mean?

      We thank the Reviewer for this comment. To our knowledge, the relationship between cell division and transcriptional activity of TEs has not been extensively studied in the literature. However, a recent study has shown that L1 expression is induced in senescent cells. We therefore added the following sentences to our Discussion:

      "The negative correlation between TE expression and cell cycle scores in the thymus is coherent with recent data showing that transcriptional activity of L1s is increased in senescent cells (5). A potential rationale for this could be to prevent deleterious transposition events during DNA replication and cell division."

      We also added several discussion points regarding the regulation of TEs by KZFPs to answer concerns raised by Reviewer 2 (see Reviewer 2 comment #1).

      (4) To consolidate the experimental evidence about pDCs and TE-derived dsRNAs, one option is to show the amount of TE-derived RNA copies among total RNAs. The immunohistochemistry analysis in figure 4 requires additional data to demonstrate that overlapped staining was not caused by technical biases (e.g. uneven fixation may cause the non-specifically stained regions/cells). To show this, authors should have confirmed not only the positive stainings but also the negative staining (e.g. CD3, etc.). Another possible staining control was showing that non-pDC (CD303- cell fractions in this case) cells were less stained by the ds-RNA probe.

      We thank the Reviewer for this suggestion. We computed the proportion of reads in each cell assigned to two groups of sequences known to generate dsRNAs: TEs and mitochondrial genes (1). These analyses showed that the proportion of reads assigned to TEs is higher in pDCs than other thymic APCs by several orders of magnitude (~20% of all reads). In contrast, reads derived from mitochondrial genes had a lower abundance in pDCs. We included these results in Figure 4 – figure supplement 2 and included the following text in the Results section entitled "TE expression in human pDCs is associated with dsRNA structures":

      "To evaluate if these dsRNAs arise from TE sequences, we analyzed in thymic APC subsets the proportion of the transcriptome assigned to two groups of genomic sequences known as important sources of dsRNAs, TEs and mitochondrial genes (1). Strikingly, whereas the percentage of reads from mitochondrial genes was typically lower in pDCs than in other thymic APCs, the proportion of the transcriptome originating from TEs was higher in pDCs (~22%) by several orders of magnitude (Figure 4 – figure supplement 2)."

      As a negative control for the immunofluorescence experiments, we used CD123- cells. Indeed, flow cytometry analysis showed that the magnetically enriched CD303+ fraction was around 90% pure, as revealed by double staining with CD123 and CD304 (two additional markers of pDCs): CD123- cells were also CD304-/lo, showing that these cells are non-pDCs. Thus, we decided to compare the dsRNA signal between CD123+ cells (pDCs) and CD123- cells (non-pDCs). The difference between CD123+ and CD123- cells was striking (Figure 4d).

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      It was sometimes difficult for me to recognize the dot plots representing low expression against the white background. e.g., figure 1 supplement 1.

      We thank the Reviewer for their comment, and we modified Figure 1 – figure supplement 1 as well as Figure 3 – figure supplement 2 to improve the contrast between dots and background.

      Reviewer #2:

      Reviewer #2 (Recommendations For The Authors):

      (1) In the abstract, results and discussion, the following conclusions are drawn that are not supported by the data: a) TEs interact with multiple transcription factors in thymic cells, b) TE expression leads to dsRNA formation, activation of RIG-I/MDA5 and secretion of IFN-alpha, c) TEs are regulated by cell proliferation and expression of KZFPs in the thymus. All these statements derive from correlations. Only one TF has ChIP-seq data associated with it, dsRNA formation and/or IFN-alpha secretion could be independent of TE expression, and whilst KZFPs most likely regulate TEs in the thymus, the data do not demonstrate it. The authors also seem to suggest that AIRE, FEZF2 and CHD4 regulate TEs directly, but binding is not shown. The manuscript needs a thorough revision to be absolutely clear about the correlative nature of the described associations.

      We agree with Reviewer #2 that some of the conclusions in our initial manuscript were not fully supported by experimental data. In the revised manuscript, we clearly indicated when conclusions relied strictly on correlative evidence and clarified the hypotheses drawn from our observations. Regarding the regulation of TE expression by AIRE, FEZF2, and CHD4, we reanalyzed publicly available ChIP-seq data of AIRE and FEZF2 in murine mTECs. For AIRE, we found that ~30% of AIRE's statistically significant peaks overlap with TE sequences (see Reviewer 2, comment #6 for more details on read alignment and peak calling), confirming its ability to bind to TE sequences directly. We added these results to the main figures (Figure 5f) and modified the section "AIRE, CHD4, and FEZF2 regulate distinct sets of TE sequences in murine mTECs" as follows:

      “[…]. As a proof of concept, we validated that 31.42% of AIRE peaks overlap with TE sequences by reanalyzing ChIP-seq data, confirming AIRE's potential to bind TE sequences (Figure 5f).”
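
      The overlap percentage quoted above is the fraction of peaks that share at least one base with any TE annotation. As a minimal sketch only (the authors used bedtools intersect; the function name, the toy coordinates, and the half-open BED-style interval convention below are assumptions for illustration), the same quantity could be computed as follows:

```python
def fraction_overlapping(peaks, elements):
    """Return the fraction of peaks overlapping at least one element.

    Both arguments are lists of (chrom, start, end) tuples using
    half-open, BED-style coordinates (an assumption of this sketch).
    """
    # Group the annotated elements by chromosome for faster lookup.
    by_chrom = {}
    for chrom, start, end in elements:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Hypothetical toy data: three peaks, only the first overlaps a TE.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_tes = [("chr1", 150, 160), ("chr1", 600, 700), ("chr3", 100, 200)]
print(fraction_overlapping(toy_peaks, toy_tes))
```

      This linear scan is adequate for a toy example; genome-scale peak sets are better served by a sorted or indexed interval structure, which is what dedicated tools provide.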

      A reanalysis of FEZF2's ChIP-seq data yielded no significant peaks while using stringent criteria. For this reason, we decided to exclude these data and only use AIRE as a proof of concept.

      Regarding KZFPs, we agree with Reviewer #2 that their impact on TE expression is probably significantly underestimated in our data. A potential reason for this is that KZFP expression is typically low; thus, transcriptomic signals from KZFPs could have been missed by the low depth of scRNA-seq. We mentioned this point in the Discussion:

      "On the other hand, the contribution of KZFPs to TE regulation in the thymus is likely underestimated due to their typically low expression (6) and scRNA-seq's limit of detection."

      (2) On the technical side, there are many dangers about analyzing RNA-seq data at the subfamily level and without stringent quality control checks. Outputs may be greatly confounded by pervasive transcription (see PMID 31425522), DNA contamination, and overlap of TEs with highly expressed genes. Whether TE transcripts are independent units or part of a gene also has important implications for the conclusions drawn. I would say that for most purposes of this work, an analysis restricted to independent TE transcripts, with appropriate controls for DNA contamination, would provide great reassurances that the results from subfamily-level analyses are sound. Showing examples from the genome browser throughout would also help.

      We agree with the Reviewer that contamination could have interfered with TE quantification. We used FastQ Screen (7) to evaluate the contamination of our human scRNA-seq data. As illustrated in the Figure below, most reads aligned with the human genome, and there were no reads uniquely assigned to another species analyzed, confirming the high purity of our dataset.

      Author response image 2.

      As stated by the Reviewer, pervasive expression is another factor that can lead to overestimation of TE expression. To evaluate if pervasive expression impacted the results of our differential expression analysis of TEs between APC subsets, we visualized read alignment to TE sequences using a genome browser. We selected two samples containing the highest numbers of mTEC(II) and pDCs (T07_TH_EPCAM and FCAImmP7277556, respectively) and used STAR to align reads to the human genome (GRCh38). We then visualized read alignment to randomly selected loci of two subfamilies identified as overexpressed by mTEC(II) or pDCs (HERVE-int and Harlequin-int, respectively). The examples below show that the signal detected is specific to the TE sequences located in introns. Even though this visualization cannot guarantee that pervasive expression did not affect TE quantification in any way, it increases the confidence that the signal detected by our analyses genuinely originates from TE expression.

      Author response image 3.

      Author response image 4.

      Author response image 5.

      Author response image 6.

      Author response image 7.

      (3) Related to the above, it would be useful to describe in the main text (and methods) how multi-mapping reads are being handled. It wasn't clear to me how kallisto handles this, and it has implications for the results. In the analysis suggested above, only uniquely mapped reads would have to be used, despite its limitations.

      We agree with the Reviewer that this information regarding assignment of multimapping reads is important. Kallisto uses an expectation-maximization (EM) algorithm to deal with multimapping reads, a strategy used by several algorithms developed to study TE expression (8). Briefly, the EM algorithm reassigns multimapping reads based on the number of uniquely mapped reads assigned to each sequence. Thus, we added the following details to the methods section:

      "Preprocessing of the scRNA-seq data was performed with the kallisto (9) and bustools workflow. Kallisto uses an expectation-maximization algorithm to reassign multimapping reads based on the frequency of unique mappers at each sequence."

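      The reassignment principle described in this response can be illustrated with a deliberately simplified sketch. It is not kallisto's implementation, which operates on equivalence classes and accounts for effective transcript lengths; the function name and the toy reads are hypothetical. Each read is represented only by the set of sequences it is compatible with:

```python
from collections import defaultdict


def em_abundances(reads, n_iter=100):
    """Estimate relative abundances from reads with ambiguous origins.

    `reads` is a list of tuples; each tuple lists the distinct sequences
    a read is compatible with (one entry for a uniquely mapped read).
    """
    targets = sorted({t for read in reads for t in read})
    # Start from a uniform guess.
    abundance = {t: 1.0 / len(targets) for t in targets}

    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read among its compatible sequences in
        # proportion to the current abundance estimates.
        for read in reads:
            total = sum(abundance[t] for t in read)
            for t in read:
                expected[t] += abundance[t] / total
        # M-step: renormalize the expected counts into abundances.
        abundance = {t: expected[t] / len(reads) for t in targets}
    return abundance


# Hypothetical toy data: 3 reads unique to TE_A, 1 unique to TE_B,
# and 2 reads compatible with both.
toy_reads = [("TE_A",)] * 3 + [("TE_B",)] + [("TE_A", "TE_B")] * 2
print(em_abundances(toy_reads))
```

      In this toy case the estimates converge towards approximately 0.75 for TE_A and 0.25 for TE_B: the two ambiguous reads are shared in the 3:1 ratio set by the uniquely mapped reads.
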
      (4) Whilst I liked the basic idea, I am not convinced that correlating TE and TF expression is a good strategy for identifying TE-TF associations at enhancers. Enhancers express very low levels of short transcripts, which I doubt would be detected in low-depth scRNA-seq data. The transcripts the authors are using to make such associations may therefore have nothing to do with the enhancer roles of TEs. I would limit these analyses to cell types for which there is histone modification data and correlate TF expression with that instead.

      We agree with the Reviewer that it would have been interesting to correlate the expression of TFs with signals of histone marks at TE sequences. However, we could not perform this analysis because we did not have matched data of histone marks throughout thymic development. Therefore, we adopted an alternative, well-suited strategy.

      Our strategy to identify TE enhancer candidates is depicted in Figure 2a: i) correlation between the expression of the TF and the TE subfamily, ii) presence of the TF binding motif in the sequence of the TE enhancer candidate, and iii) colocalization of the TE enhancer candidate with significant peaks of H3K27ac and H3K4me3 in the same cell type from the ENCODE Consortium ChIP-seq data. We limited our analyses to the eight cell types present both in our dataset and the ENCODE Consortium: B cells, CD4 Single Positive T cells (CD4 SP), CD8 Single Positive T cells (CD8 SP), dendritic cells (DC), monocytes and macrophages (Mono/Macro), NK cells, Th17, and Treg.

      (5) Figure 2G: binding of ETS1 is unconvincing. Were there statistically meaningful peaks called in these regions? It would be good to also show a metaplot/heatmap of ETS1 profile over all elements of relevant subfamilies. Showing histone marks on the genome browser snapshots would also be useful. Is there any transcriptional evidence that the specific Alus shown act as alternative promoters?

      We agree with the Reviewer that the examples provided were not particularly convincing. Thus, we reanalyzed the data to determine if statistically significant ETS1 peaks (see the answer to Reviewer 2's comment #6 for details on the methods) located near gene transcription start sites overlapped with TEs. We thereby provided examples of significant ETS1 peaks overlapping TE sequences in the promoter region of two prototypical NK cell protein-coding genes (Figure 2h).

      (6) Why was -k 10 used with bowtie2? This will map the same read to multiple locations in the genome, increasing read density at more repetitive (younger) TEs. The authors should use either default settings, being clear about the outcome (random assignment of multimapping reads to one location), or use only uniquely aligned reads.

      We thank the Reviewer for their comment and agree that using the -k 10 parameter with bowtie2 was not optimal for TE analysis. To improve the strength of our analyses, we reanalyzed all ChIP-seq data of our manuscript (Figure 2g and h, Figure 5e and f) using the following strategy: alignment with bowtie2 using default parameters except --very-sensitive, removal of multimapping reads with samtools view -q 10, removal of duplicate reads with samtools markdup -r, peak calling with macs2 with the -m 5 50 parameter, and removal of peaks overlapping ENCODE's blacklist regions with bedtools intersect.

      These new analyses strengthen our evidence that TEs interact with multiple genes that regulate thymic development and function. We updated the results sections concerning ChIP-seq data analyses and the Methods section to include this information:

      "ChIP-seq reads were aligned to the reference Homo sapiens genome (GRCh38) using bowtie2 (version 2.3.5) (10) with the --very-sensitive parameter. Multimapping reads were removed using the samtools view function with the -q 10 parameter, and duplicate reads were removed using the samtools markdup function with the -r parameter (11). Peak calling was performed with macs2 with the -m 5 50 parameter (12). Peaks overlapping with the ENCODE blacklist regions (13) were removed with bedtools intersect (14) with default parameters. Overlap of ETS1 peaks with TE sequences was determined using bedtools intersect with default parameters. BigWig files were generated using the bamCoverage function of deeptools2 (15), and genomic tracks were visualized in the UCSC Genome Browser (16)."

      (7) Figure 1d needs a y axis scale. Could the authors also provide details of how the random distribution of TE expression was generated?

      We agree with the Reviewer that Figure 1d was incomplete and made the appropriate modifications. Regarding the random distribution, we reproduced our dataset containing the expression of 809 TE subfamilies in 18 cell populations. For each combination of TE subfamily and cell type, we randomly assigned an "expression pattern" as identified by the hierarchical clustering of Figure 1b. Then, we computed the maximal occurrence of an expression pattern across cell types for each TE subfamily to generate the distribution curve in Figure 1d. We added the following details to the Methods section to clarify how the random distribution was generated:

      "As a control, a random distribution of the expression of 809 TE subfamilies in 18 cell populations was generated. A cluster (cluster 1, 2, or 3) was randomly attributed for each combination of TE subfamily and cell type, and the maximal occurrence of a given cluster across cell types was then computed for each TE subfamily. Finally, the distributions of LINE, LTR, and SINE elements were compared to the random distribution with Kolmogorov-Smirnov tests."

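      The control described in this response can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' code: clusters are drawn uniformly at random, and the function name and the fixed seed are hypothetical choices made for reproducibility.

```python
import random
from collections import Counter


def random_max_occurrence(n_subfamilies=809, n_cell_types=18,
                          clusters=(1, 2, 3), seed=0):
    """Return, for each simulated TE subfamily, the largest number of cell
    types that share the same randomly assigned cluster."""
    rng = random.Random(seed)  # fixed seed is an assumption of this sketch
    maxima = []
    for _ in range(n_subfamilies):
        # Assign one cluster at random to every cell type.
        assigned = [rng.choice(clusters) for _ in range(n_cell_types)]
        # Keep the count of the most frequent cluster for this subfamily.
        maxima.append(max(Counter(assigned).values()))
    return maxima


null_distribution = random_max_occurrence()
# Tabulate how many subfamilies reach each maximal occurrence.
print(sorted(Counter(null_distribution).items()))
```

      The observed distributions for LINE, LTR, and SINE elements would then be compared against `null_distribution` with a two-sample Kolmogorov-Smirnov test, for example with `scipy.stats.ks_2samp`, which is left out here to keep the sketch dependency-free.
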
      (8) The motif analysis requires a minimum of 1 locus from each TE subfamily containing it in order to be reported, but this seems like a really low threshold that will output a lot of noise. What is the rationale here?

      We agree with the Reviewer that this threshold might appear low. Nonetheless, these analyses ultimately aimed to identify TE promoter and enhancer candidates. Hence, we did not want to put an arbitrary threshold at a higher value (e.g., a certain number or percentage of all loci of a given TE subfamily), as this might create a bias based on the total number of loci of a given TE subfamily. Moreover, our rationale was that a TE locus might act as a promoter/enhancer even if it is the only locus of its subfamily containing a TF binding site.

      Even though this strategy might have created some noise in the analyses of interactions between TFs and TEs of Figure 2 (panels a-e), we are confident that our bootstrap strategy efficiently removed low-quality identifications based on low correlation values or expression of TF and TE in low percentages of cells. Additionally, the subsequent analyses on TE promoter and enhancer candidates were performed exclusively for the TE loci containing TF binding sites to avoid adding noise to these analyses.

      (9) Figure 4e: is this a log2 enrichment? If not, the enrichments for some of the gene sets are not so high.

      The enrichment values represented in Figure 4e are not log-transformed. It is essential to highlight that gene set enrichment values were computed for each possible pair of thymic APCs (e.g., pDC vs. cDC1, pDC vs. mTEC(II), etc.), and the values represented in Figure 4e are an average of each comparison pictured at the bottom of the UpSet plot.

      However, we agree with Reviewer 2 that the average enrichment value is not extremely high. We thus made the following modifications to the Results section ("TE expression in human pDCs is associated with dsRNA structures") to better represent it:

      "Notably, thymic pDCs harbored moderate yet significant enrichment of gene signatures of RIG-I and MDA5-mediated IFN ɑ/β signaling compared to all other thymic APCs (Figure 4e and Supplementary file 1 – Table 8)."

      (10) Please be clear on results subtitles when these refer to mouse.

      We apologize for the confusion and modified the subtitles to clarify if the results refer to mouse or human data.

      (11) Figure 1 - figure supplement 2: "assignation" should be 'assignment'.

      We thank the Reviewer for their keen eye and changed the title of Figure 1 – figure supplement 2.

      (1) Sadeq S, Al-Hashimi S, Cusack CM, Werner A. Endogenous Double-Stranded RNA. Noncoding RNA. 2021;7(1).

      (2) Kim N, Kim M, Yun S, Doh J, Greenberg PD, Kim TD, et al. MicroRNA-150 regulates the cytotoxicity of natural killers by targeting perforin-1. J Allergy Clin Immunol. 2014;134(1):195-203.

      (3) Gunturi A, Berg RE, Forman J. The role of CD94/NKG2 in innate and adaptive immunity. Immunol Res. 2004;30(1):29-34.

      (4) Taveirne S, Wahlen S, Van Loocke W, Kiekens L, Persyn E, Van Ammel E, et al. The transcription factor ETS1 is an important regulator of human NK cell development and terminal differentiation. Blood. 2020;136(3):288-98.

      (5) De Cecco M, Ito T, Petrashen AP, Elias AE, Skvir NJ, Criscione SW, et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019;566(7742):73-8.

      (6) Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 2006;16(5):669-77.

      (7) Wingett SW, Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control. F1000Res. 2018;7:1338.

      (8) Lanciano S, Cristofari G. Measuring and interpreting transposable element expression. Nat Rev Genet. 2020;21(12):721-36.

      (9) Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525-7.

      (10) Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.

      (11) Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2).

      (12) Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

      (13) Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019;9(1):9354.

      (14) Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841-2.

      (15) Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160-5.

      (16) Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996-1006.

    2. Reviewer #2 (Public Review):

      Summary:

      Larouche et al show that TEs are broadly expressed in thymic cells, especially in mTECs and pDCs. Their data suggest a possible involvement of TEs in thymic gene regulation and IFN-alpha secretion. They also show that at least some TE-derived peptides are presented by MHC-I in the thymus.

      Strengths:

      The idea of high/broad TE expression in the thymus as a mechanism for preventing TE-mediated autoimmunity is certainly an attractive one, as is their involvement in IFN-alpha secretion therein. The analyses and experiments presented here are therefore a very useful primer for more in-depth experiments, as the authors point out towards the end of the discussion.

      Weaknesses:

      There are many dangers about analysing RNA-seq data at the subfamily level. Outputs may be greatly confounded by pervasive transcription, DNA contamination, and overlap of TEs with highly expressed genes. Whether TE transcripts are independent units or part of a gene also has important implications for the conclusions drawn. The authors have tried to mitigate against some of these issues, but they have not been completely ruled out.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The very detailed insights gained by the authors into allosteric regulation require very specialized techniques in this study. This poses a challenge to communicate the methods, the results, and the meaning of the results to a broader audience. In some places, the authors overcome this challenge better than in others.

      Following this reviewer’s suggestions, we have extensively revised the text, making the text more understandable to a broader audience.

      The manuscript does not show up on BioRxiv.

      The manuscript is now deposited in bioRxiv (doi: 10.1101/2023.09.12.557419).

      Fig3: GS-ES2 transition: the changes appear minimal in the illustration.

      As suggested by this reviewer, we have re-examined the GS-ES2 transition and clearly defined the structural characteristics of the conformationally excited state 2 (ES2). As shown in the revised Fig. 3 of the main text, the ground state (GS) features a π-π packing between the aromatic rings of F100 and Y156, as well as a cation-π stacking between R308 and F102. In the ES2 state, these interactions are disrupted, while a new π-π packing interaction is formed between F100 and F102. We added new comments in the main text clarifying the structural interactions that characterize each state.

      GS-ES1 transition: how is the K72-E91 salt bridge disrupted? How do you define the formation/disruption of a salt bridge? The current figure does not make this very clear and the K72-E91 salt bridge appears to be intact in ES1. Maybe the authors could replace the dotted K72-E91 line with a dotted line and distance?

      As stated above, we revised Fig. 3 to highlight the differences between the two states. The K72-E91 salt bridge is considered formed when the distance between Nε of K72 and Oε of E91 is shorter than 4.0 Å (the typical cutoff for a salt bridge). In the ES1 state, the outward movement of the αC helix increases this distance to over 4.5 Å, disrupting the salt bridge.
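
      The distance criterion in this reply can be illustrated with a minimal sketch. The 4.0 Å cutoff is the one quoted in the response; the function names and the coordinates are hypothetical, and taking the minimum over several candidate atom pairs is an assumption for cases where more than one side-chain oxygen is considered.

```python
import math

SALT_BRIDGE_CUTOFF = 4.0  # Å, the cutoff quoted in the response


def min_distance(nitrogen_atoms, oxygen_atoms):
    """Shortest distance (Å) between any listed N atom and any listed O atom."""
    return min(math.dist(n, o) for n in nitrogen_atoms for o in oxygen_atoms)


def salt_bridge_formed(nitrogen_atoms, oxygen_atoms, cutoff=SALT_BRIDGE_CUTOFF):
    """True when the closest N-O pair is nearer than the cutoff."""
    return min_distance(nitrogen_atoms, oxygen_atoms) < cutoff


# Hypothetical coordinates (Å) for one frame, not taken from the simulations.
lys_n = [(0.0, 0.0, 0.0)]
glu_o = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
bridge_present = salt_bridge_formed(lys_n, glu_o)
```

      Applied frame by frame to a trajectory, the same test would give the fraction of frames in which the salt bridge is intact in each state.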

      L251: Could the authors remind the reader why they are only comparing V104 and I150? Could they give a little context as to why they consider the agreement to be good? It appears that they would be statistically different, so a little context for what comprises a good agreement in the literature may be helpful.

      Our mutagenesis studies show that V104 and I150 are key residues for allosteric communication and, if mutated, result in well-folded but inactive kinases (Sci Adv. doi: 10.1126/sciadv.1600663). Importantly, V104 and I150 show two distinct populations in the CEST experiments that can be directly related to the GS and ES states. Regarding the fitting of these residues, we obtained a good agreement with the direction of the chemical shifts, which supports the hypothesized GS -> ES structural transition. The lack of a quantitative agreement between the chemical shifts of the experimental and simulated excited state is not surprising for two reasons: (a) all state-of-the-art simulations fall short in sampling slow conformational interconversions, and (b) the uncertainty of the SHIFTX algorithm for the prediction of 13C chemical shifts of methyl groups is quite large. Finally, we would like to point out that most NMR relaxation-dispersion experiments (CEST and CPMG) are performed for the backbone 15N, 13Calpha and 1H resonances, which have been used to calculate the structures of the intermediate states (Neudecker, P. et al. Science, 2012, 336, doi: 10.1126/science.1214203) and yield reasonable agreement with the prediction for metastable states derived from Markov Models (Olsson, S. J. Am. Chem. Soc., 2017, 139, doi: 10.1021/jacs.6b09460). To the best of our knowledge, there is no literature reporting on calculations of the 13C CEST profiles for methyl groups from MD simulations, and remarkably, we found a reasonably good agreement between experimental and predicted chemical shifts (see Fig. 5C).

      Just to clarify: the calculated CS values are informed by experimental CS values that were used in the calculation?

      We used the backbone chemical shifts as the restraints only in the metadynamics simulations. We used the chemical shifts of the methyl groups and their corresponding excited states to verify the ES2 state.

      Figure 8: in its current form this potentially exciting result is lost on the average reader.

      We modified Fig. 8 of the main text, making the intra- and inter-residue correlations visible to the reader.

      Reviewer #2:

      While the alphaC-beta4 loop is a conserved feature of protein kinases, the residues within this loop vary across various kinase families and groups, enabling group and family-specific control of activity through cis and trans acting elements. F102 in PKA interacts with co-conserved residues in the C-tail, which has been proposed to function as a cis regulatory element. The authors should elaborate on the conformational changes in the C-tail, particularly in the arginine that packs against F102, in the results and discussion. This would further extend the impact and scope of the manuscript, which is currently confined to PKA.

      As suggested by this reviewer, we re-analyzed the time-dependent interactions between F102 and R308 at the C-tail. As this reviewer suspected, these interactions differentiate the ES2 from the GS state. In the GS state, there is a stable cation-π interaction between F102 and R308, which becomes transient in the ES2 state (Fig. 3). For the F100A mutant, the interactions between F102 and R308 have lower occurrence relative to the WT enzyme, i.e., a weaker interaction between the αC-β4 loop and the C-tail (see new Figure 6 - figure supplement 1). The latter supports our conclusion that the structural coupling between the C-tail and the two lobes of the enzyme decreases for the F100A mutant. We added more comments in the main text.

      FAIR standards of making the data accessible and reproducible are not directly addressed.

      We have deposited all our NMR data on the Data Repository Site at the University of Minnesota, DRUM (https://hdl.handle.net/11299/261043).

      The MD data and conformational states would be a valuable resource for the community and should be shared via some open-source repositories.

      Due to the large size of the simulations (>500 GB), we could not deposit them in the Data Repository Site at the University of Minnesota (DRUM). We are actively working with the personnel at DRUM to upload all the trajectories to an alternate site. In the meantime, these data will be available to the public immediately upon request.

      The authors state that ES1 and ES2 states are novel and not observed in previous crystal structures. The authors should quantify this through comparisons with PKA inactive states and with other AGC kinases.

      We apologize for the confusion. We now clarify that ES1 is a well-known inactivation pathway. As suggested by this reviewer, we now report a few examples of active and inactive conformations of PKA-C and other kinases (see new Figure 3 – figure supplement 2). Briefly, ES1 corresponds to the typical αC-out conformation found for PKA-C bound to inhibitors or in the R194A mutant. A similar conformation is present for Src, Abl, and CDK2. The αC-out conformation features a disrupted β3K-αCE salt bridge, which is key for active kinases. In contrast, the transition GS-ES2 is not present in the inactive conformations deposited in the PDB.

      Based on the results, can the authors speculate on the impact of oncogenic mutations in the alphaC-beta4 loop in PKA?

      We now include additional comments and another citation that further supports our findings. In short, the activation of a kinase is generated by mutation insertions that stabilize the αC-β4 loop as pointed out by Kannan and Zhang (see references 28, 30, and 68). In contrast, mutations that destabilize this allosteric site (e.g., F100A) are inactivating, disrupting the structural couplings of the two lobes (our work).

      Reviewer #3:

      The manuscript is somewhat difficult to read even for kinase experts, and even harder for the layman. The difficulty partially arises from mixing technical description of the simulations with structural interpretation of the results, which is more intuitive, and partially arises from the assumption that readers are familiar with kinase architecture and its key elements (the aC helix, the APE motif, etc).

      We revised the text and modified Fig. 1 in the main text to make the paper more accessible to the general audience.

      The authors haven't done a good job describing the ES2 state intuitively. From my examination of the figures, it appears that in the ES2 state, the kinase domain is more elongated and the N and the C lobes are relatively less engaged than in the ground state. This may or may not be exactly right, but a more intuitive description of the ES2 state is needed.

      As suggested by this reviewer, we include a better description of the ES2 state of the kinase and the structural details of the inactivation pathway. Also, we checked the radius of gyration of the two lobes for GS and ES2. ES2 is slightly more elongated with an Rg of 20.3 ± 0.1 Å as compared to the GS state (20.0 ± 0.2 Å). This marginal difference is consistent with our characterization of the local packing around the αC-β4 loop, in which the lack of stable interaction with αE and C-tail in the ES2 state makes the overall structure less compact.
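
      For readers unfamiliar with the quantity, the radius of gyration compared here is the mass-weighted root-mean-square distance of the atoms from their center of mass. The sketch below is a generic implementation of that definition; the function name and the toy coordinates are illustrative assumptions and do not reproduce the authors' analysis or atom selection.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.

    coords: sequence of (x, y, z) tuples; masses: optional weights
    (equal weights are assumed when omitted).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: two equal-mass points 2 Å apart.
rg_toy = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

      A more extended arrangement of the same atoms yields a larger value, which is the sense in which the reported 20.3 Å versus 20.0 Å indicates a slightly less compact ES2 state.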

      The authors need to introduce and give a brief description of technical terms such as CV (collective variable), PC (principal component) etc.

      We now specify both collective variables and principal components and include those definitions in the Methods section. Briefly, to characterize the complex conformational transitions of PKA-C, we utilize collective variables (Figure 2 – figure supplement 1). We chose these variables based on structural motifs described in the literature to define local and global structural transitions (Camilloni C., Vendruscolo, M, Biochemistry, 2015, 54, 7470; Kukic, P. et al. Structure, 2015, 23, 745). On the other hand, we utilized principal component analysis to compare the conformational changes of the kinase in the same two-dimensional space, revealing the two lowest frequencies that define the global motions of the enzyme (Figures 7C, D, and E).

      The following paper should be discussed as it discussed similar ATP/substrate binding of Src kinase based on an extensive network that largely overlaps with the discussed PKA network. Foda, et al. "A dynamically coupled allosteric network underlies binding cooperativity in Src kinase." Nature communications 6.1 (2015): 5939.

      We apologize for missing this citation. Indeed, it makes our finding more general as allosteric cooperativity is key in other kinases such as Src and ERK2. We included this in the Discussion section.

      The CHESCA analysis appears to be an add-on that doesn't add much value. It is difficult to digest. I'd suggest considering moving it to the SI.

      We understand this concern. We rewrote part of the paper to link the NMR analysis of the correlated chemical shifts, described by the CHESCA matrices, to the MD calculations.

    1. eLife assessment

      This is a useful study that identifies circadian changes in the gene expression profile of cultured mouse astrocytes. Mechanistic details linking circadian rhythmicity in HERP, a regulator of calcium signals in the endoplasmic reticulum, to altered phosphorylation of Connexin 43 remain currently incomplete. With improved manuscript clarity and statistical analysis, this work could be of interest to the field of astrocyte and circadian biology.

    2. Reviewer #1 (Public Review):

      Summary:

      In Ryu et al., the authors use a cortical mouse astrocyte culture system to address the functional contribution of astrocytes to circadian rhythms in the brain. The authors' starting point is transcriptional output from serum-shocked culture, comparative informatics with existing tools and existing datasets. After fairly routine pathway analyses, they focus on the calcium homeostasis machinery and one gene, Herp, in particular. They argue that Herp is rhythmic at both mRNA and protein levels in astrocytes. They then use a calcium reporter targeted to the ER, mitochondria, or cytosol and show that Herp modulates calcium signaling as a function of circadian time. They argue that this occurs through the regulation of inositol receptors. They claim that the signaling pathway is clock-controlled by a limited examination of Bmal1 knockout astrocytes. Finally, they switch to calcium-mediated phosphorylation of the gap junction protein Connexin 43 but do not directly connect HERP-mediated circadian signaling to these observations. While these experiments address very important questions related to the critical role of astrocytes in regulating circadian signaling, the mechanistic arguments for HERP function, its role in circadian signaling through inositol receptors, the connection to gap junctions, and ultimately, the functional relevance of these findings is only partially substantiated by experimental evidence.

      Strengths:

      - The paper provides useful datasets of astrocyte gene expression in circadian time.

      - Identifies HERP as a rhythmic output of the circadian clock.

      - Demonstrates the circadian-specific sensitivity of ATP -> calcium signaling.

      - Identifies possible rhythms in both Connexin 43 phosphorylation and rhythmic movement of calcium between cells.

      Weaknesses:

      - It is not immediately clear why the authors chose to focus on Ca2+ homeostasis or Herp from their initial screens as neither were the "most rhythmic" pathways in their primary analyses.

      - It would have been interesting (and potentially important) to know whether various methods of cellular synchronization would also render HERP rhythmic (e.g., temperature, forskolin, etc). If Herp is indeed relatively astrocyte-specific and rhythmic, it should be easy to assess its rhythmicity in vivo.

      - The authors show that Herp suppression reduces ATP-mediated suppression of calcium whereas it initially increases Ca2+ in the cytosol and mitochondria and then suppresses it. The dynamics of the mitochondrial and cytosolic responses are not discussed in any detail and it is unclear what their direct relationship is to Herp-mediated ER signaling. What is the explanation for Herp (which is thought to be ER-specific) affecting calcium signaling in other organelles?

      - What is the functional significance of promoting ATP-mediated suppression of calcium in ER?

      - The authors then nicely show that the effect of ATP is dependent on intrinsic circadian timing but do not explain why these effects are antiphase in cytosol or mitochondria. Moreover, the ∆F/F for calcium in mitochondria and cytosol both rise, cross the abscissa, and then diminish - strongly suggesting a biphasic signaling event. Therefore, one wonders whether measuring the area under the curve is the most functionally relevant measurement of the change.

      - Why are mitochondrial and cytosolic calcium not also demonstrated for Bmal1 KO astrocytes?

      - The authors claim that Herp acts by regulating the degradation of ITPRs but this hypothesis - rather central to the mechanisms proposed in this study - is not experimentally substantiated.

      - There is no clear demonstration of the functional relevance of the circadian rhythms of ATP-mediated calcium signaling.

    3. Reviewer #2 (Public Review):

      Summary:

      The article entitled "Circadian regulation of endoplasmic reticulum calcium response in mouse cultured astrocytes" submitted by Ryu and colleagues describes the circadian control of astrocytic intracellular calcium levels in vitro.

      Strengths:

      The authors used a variety of technical approaches that are appropriate.

      Weaknesses:

      Statistical analysis is poor and could lead to a misinterpretation of the data.

      Several conceptual issues have been identified.

      Overinterpretation of the data should be avoided. This is a mechanistic paper done completely in vitro; all references to the in vivo situation are speculative and should be avoided.

    4. Reviewer #3 (Public Review):

      Astrocyte biology is an active area of research and this study is timely and adds to a growing body of literature in the field. The RNA-seq, Herp expression, and Ca2+ release data across wild-type, Bmal1 knockout, and Herp knockdown cellular models are robust and lend considerable support to the study's conclusions, highlighting their importance. Despite these strengths, the manuscript presents a gap in elucidating the dynamics of HERP and the involvement of ITPR1/2 in modulating Ca2+ release patterns and their circadian variations, which remains insufficiently supported and characterized. While the Connexin data underscore the importance of rhythmic Ca2+ release triggered by ATP, the relationship here appears correlational and the role of HERP and ITPR in Cx function remains to be characterized. Moreover, enhancing the manuscript's clarity and readability could significantly benefit the presentation and comprehension of the findings.

    1. eLife assessment

      This fundamental work substantially advances our understanding of cell migration, especially that of the cranial neural crest. The additional evidence provided to support the conclusion is exceptional, with rigorous biochemical assays for materials used and with intensive genetic interventions. The work will be of broad interest to developmental biologists and cell biologists.

    1. eLife assessment

      This important study reports a novel approach to studying cerebellar function based on the idea of selective recruitment using fMRI. It provides convincing evidence for task-dependent gating of neocortical input to the cerebellum during a motor task and a working memory task. The study will be of interest to a broad cognitive neuroscience audience.

    2. Reviewer #1 (Public Review):

      This is an interesting and well-written paper reporting on a novel approach to studying cerebellar function based on the idea of selective recruitment using fMRI. The study is well-designed and executed. Analyses are sound and results are properly discussed. The paper makes a significant contribution to broadening our understanding of the role of the cerebellum in human behavior.

      - While the authors provide a compelling case for the link between BOLD and the cerebellar cortical input layer, there remains considerable unexplained variance. Perhaps the authors could elaborate a bit more on the assumption that BOLD signals mainly reflect the input side of the cerebellum (see for example King et al., eLife. 2023 Apr 21;12:e81511).

      - The current approach does not appear to take the non-linear relationships between BOLD and neural activity into account.

      - The authors may want to address a bit more the issue of closed loops as well as the underlying neuroanatomy, including the deep cerebellar nuclei and pontine nuclei, in the context of their current cerebello-cortical correlational approach, and also the contribution of other brain areas such as the basal ganglia and hippocampus.

      - What about the direct projections of mossy fibers to the DCN that actually bypass the cerebellar cortex?

    3. Reviewer #2 (Public Review):

      Summary:

      Shahshahani and colleagues used a combination of statistical modelling and whole-brain fMRI data in an attempt to separate the contributions of cortical and cerebellar regions in different cognitive contexts.

      Strengths:

      * The manuscript uses a sophisticated integration of statistical methods, cognitive neuroscience, and systems neurobiology.

      * The authors use multiple statistical approaches to ensure robustness in their conclusions.

      * The consideration of the cerebellum as not a purely 'motor' structure is excellent and important.

      Weaknesses:

      * Two of the foundational assumptions of the model - that cerebellar BOLD signals reflect granule cells > Purkinje neurons and that corticocerebellar connections are relatively invariant - are still open topics of investigation. It might be helpful for the reader if these ideas could be presented in a more nuanced light.

      * The assumption that cortical BOLD responses in cognitive tasks should be matched irrespective of cerebellar involvement does not cohere with the idea of 'forcing functions' introduced by Houk and Wise.

    1. Author Response

      Reviewer #1 (Public Review):

      Theoretical principles of viscous fluid mechanics are used here to assess likely mechanisms of transport in the ER. A set of candidate mechanisms is evaluated, making good use of imaging to represent ER network geometries. Evidence is provided that the contraction of peripheral sheets provides a much more credible mechanism than the contraction of individual tubules, junctions, or perinuclear sheets.

      The work has been conducted carefully and comprehensively, making good use of underlying physical principles. There is a good discussion of the role of slip; sensible approximations (low volume fraction, small particle size, slender geometries, pragmatic treatment of boundary conditions) allow tractable and transparent calculations; clear physical arguments provide useful bounds; stochastic and deterministic features of the problem are well integrated.

      We thank the reviewer for their positive assessment of our work.

      There are just a couple of areas where more discussion might be warranted, in my view.

      (1) The energetic cost of tubule contraction is estimated, but I did not see an equivalent estimate for the contraction of peripheral sheets. It might be helpful to estimate the energetic cost of viscous dissipation in generated flows at higher frequencies.

      This is a good point. We will also include an energetic cost estimate for the contractions of peripheral sheets in the revised manuscript.

      The mechanism of peripheral sheet contraction is unclear: do ATP-driven mechanisms somehow interact with thermal fluctuations of membranes?

      The new energetic estimates in the revision might help constrain possible hypotheses for the mechanism(s) driving peripheral sheet contraction, and suggest if a dedicated ATP-driven mechanism is required.

      (2) Mutations are mentioned in the abstract but not (as far as I could see) later in the manuscript. It would be helpful if any consequences for pathologies could be developed in the text.

      We are grateful for this suggestion. The need to rationalise pathology associated with the subtle effects of ER-morphogens’ mutations is indeed pointed out as one factor motivating the study of the interplay between ER structure and performance. In the revised manuscript, we plan to include a brief discussion potentially linking ER morphogens’ malfunction to luminal transport, integrating additional freshly published data.

      Reviewer #2 (Public Review):

      Summary:

      This study explores theoretically the consequences of structural fluctuations of the endoplasmic reticulum (ER) morphology called contractions on molecular transport. Most of the manuscript consists of the construction of an interesting theoretical flow field (physical model) under various hypothetical assumptions. The computational modeling is followed by some simulations.

      Strengths:

      The authors are focusing their attention on testing the hypothesis that a local flow in the tubule could be driven by tubular pinching. We recall that trafficking in the ER is considered to be mostly driven by diffusion at least at a spatial scale that is large enough to account for averaging of any random flow occurring from multiple directions [note that this is not the case for plants].

      We thank the reviewer. We have indeed explored here the possibilities of active transport, focusing especially on transport over the length scale of single tubules, as a result of structural fluctuations, and found tubular pinching to be ineffective compared to, e.g., peripheral sheet fluctuations. In the revised version we plan to add text mentioning what is known about the ER in plants.

      Weaknesses:

      The manuscript extensively details the construction of the theoretical model, occupying a significant portion of the manuscript. While this section contains interesting computations, its relevance and utility could be better emphasized, perhaps warranting a reorganization of the manuscript to foreground this critical aspect.

      Overall, the manuscript appears highly technical with limited conclusive insights, particularly lacking predictions confirmed by experimental validation. There is an absence of substantial conclusions regarding molecular trafficking within the ER.

      We sought to balance the theoretical/computational details of our model with the biophysical conclusions drawn from its predictions. Given the model's complexity and novelty, it was essential to elucidate the theoretical underpinnings comprehensively, in order to allow others to implement it in the future with additional, or different, parameters. To maintain clarity and focus in the main text, we have judiciously relegated extensive technical details to the methods section or supplementary materials, and divided the text into stand-alone section headings allowing the reader to skip through to conclusions.

      The primary focus of our manuscript is to introduce and explore, via our theoretical model, the interplay between ER structure dynamics and molecular transport. Our approach, while in silico, generates concrete predictions about the physical processes underpinning luminal motion within the ER. For instance, our findings challenge the previously postulated role of small tubular contractions in driving luminal flow, instead highlighting the potential significance of local flat ER areas—empirically documented entities—for facilitating such motion.

      Furthermore, by deducing what type of transport may or may not occur within the range of possible ER structural fluctuations, our model offers detailed predictions designed to bridge the gap between theoretical insight and experimental verification. These predictions detail the spatial and temporal parameters essential for effective transport, delineating plausible values for these parameters. We hope that the model’s predictions will invite experimentalists to devise innovative methodologies to test them. We plan to introduce text edits to the revised version to clarify these.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (it is not clear whether subjective reports about perturbation awareness were measured). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is true not only of our regional manifold eccentricity data, but also of the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity during the Late learning phase reverts towards this Baseline level.]

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques raised by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early learning can be explained simply by (i) the presence of reward feedback, (ii) the presence of the perturbation or (iii) the instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points for interpreting the data, and ones that we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have alternate views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early learning phase, more on this below), it seems clear that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness of the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case after about trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until the near end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (our computation of these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) shows average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).
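      The idea of reducing each learning curve to a single component score, as described above, can be illustrated with a minimal sketch. This is not the authors' fPCA pipeline: it runs ordinary PCA (via power iteration) on coarsely binned curves, without the basis-function smoothing that fPCA normally involves, and the four curves are hypothetical values invented purely for illustration.

```python
import math

def first_component(curves, iters=500):
    """Leading principal component of a set of equal-length curves.

    Returns (component, per-curve scores, fraction of variance explained).
    """
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean[j] for j in range(t)] for c in curves]
    # Sample covariance between time bins (t x t matrix).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(t)] for a in range(t)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(t)] * t
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the component so that positive scores mean above-average curves.
    if sum(v) < 0:
        v = [-x for x in v]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(t))
                     for a in range(t))
    total_variance = sum(cov[a][a] for a in range(t))
    scores = [sum(row[j] * v[j] for j in range(t)) for row in centered]
    return v, scores, eigenvalue / total_variance

# Hypothetical binned reward scores (invented for illustration only).
curves = [
    [40, 70, 75, 75, 75],  # fast learner
    [40, 50, 60, 70, 75],  # gradual learner
    [40, 42, 45, 55, 70],  # late learner
    [40, 40, 41, 42, 45],  # non-learner
]
component, scores, explained = first_component(curves)
```

      In this toy example the scores order the curves from fastest to slowest learner, which mirrors the logic of assigning higher ‘Learning score’ loadings to subjects who learn more quickly than average.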

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that even the highest performing subject (denoted in Fig. 6) still required approximately 20 trials to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5.

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022, PNAS; Areshenkoff et al., 2022, eLife; Standage et al., 2023, Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see Supplementary Figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
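      As a rough consistency check on the statistics quoted above, the sketch below converts a correlation coefficient into a t-statistic and a two-sided p-value. It assumes a standard two-sided test of a Pearson correlation and n = 36 participants (the sample size given in the trajectory figure caption); the response does not state which test was actually used, so both are assumptions, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_test(r, n, steps=2000):
    """Return (t, two-sided p) for a Pearson correlation r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
    The p-value is obtained by integrating the t density from 0 to |t|
    with Simpson's rule (steps must be even).
    """
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # probability of 0 < T < |t|
    return t, 1.0 - 2.0 * central_mass

t_stat, p_value = correlation_test(r=-0.2916, n=36)
```

      Under these assumptions the reported r corresponds to a t-statistic of about -1.78 on 34 degrees of freedom and a two-sided p-value near 0.084, in line with the non-significant trend described above.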

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed the (1) mean signal intensity, (2) standard deviation of the signal (Std) and (3) temporal signal to noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), in this case approximately 100% greater (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/Std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.
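
      The arithmetic behind this comparison can be reproduced with a short sketch. This is an illustration only, not the analysis code used in the study: the function name `tsnr` and the use of the sample standard deviation are assumptions, and the numeric inputs are simply the rounded values quoted above.

```python
import numpy as np

def tsnr(timeseries):
    """Temporal SNR: mean over time divided by standard deviation over time.

    Hypothetical helper; the sample standard deviation (ddof=1) is an assumption.
    """
    ts = np.asarray(timeseries, dtype=float)
    return ts.mean(axis=-1) / ts.std(axis=-1, ddof=1)

# Rounded values quoted in the text (MR arbitrary units).
cortex_mean, cortex_std = 625.0, 5.0
striatum_mean, striatum_std = 450.0, 2.5

# Relative differences, cortex versus striatum.
mean_gain = (cortex_mean - striatum_mean) / striatum_mean  # about 0.39 with these rounded values
std_gain = (cortex_std - striatum_std) / striatum_std      # 1.0, i.e. 100% greater

# tSNR computed directly from the summary values.
cortex_tsnr = cortex_mean / cortex_std        # 125.0
striatum_tsnr = striatum_mean / striatum_std  # 180.0
```

      Because the standard deviation grows proportionally more than the mean when moving from striatum to cortex, the ratio mean/Std ends up larger in the striatum.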

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equating the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.
Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level)(Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”
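
      The idea in the final sentence of the quoted passage (dimension reduction restricted to the striatum-by-cortex slice of the connectivity matrix) can be sketched as follows. This is a hedged illustration on toy data, not the pipeline used in the study: the function name, the toy region counts, the column centering, and the use of a plain SVD-based PCA are all assumptions.

```python
import numpy as np

def striatum_cortex_components(fc, striatal_idx, cortical_idx, n_components=3):
    """PCA on the striatum-by-cortex block of a connectivity matrix.

    Hypothetical helper: rows are striatal regions, columns are cortical regions.
    """
    fc = np.asarray(fc, dtype=float)
    block = fc[np.ix_(striatal_idx, cortical_idx)]     # striatal rows x cortical columns
    block = block - block.mean(axis=0, keepdims=True)  # center each cortical column
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    k = min(n_components, len(s))
    scores = u[:, :k] * s[:k]                 # striatal regions in component space
    explained = s[:k] ** 2 / np.sum(s ** 2)   # fraction of block variance per component
    return scores, explained

# Toy data: 40 "cortical" and 6 "striatal" regions, 200 time points.
rng = np.random.default_rng(0)
n_cortex, n_striatum = 40, 6
timeseries = rng.standard_normal((n_cortex + n_striatum, 200))
fc = np.corrcoef(timeseries)

cortical_idx = np.arange(n_cortex)
striatal_idx = np.arange(n_cortex, n_cortex + n_striatum)
scores, explained = striatum_cortex_components(fc, striatal_idx, cortical_idx)
```

      Restricting the decomposition to this block means the components describe only striatal-cortical coupling, so they cannot be dominated by the much larger cortex-by-cortex portion of the matrix.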

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. Such little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum, and why was it left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, by denying subjects this sensory error feedback they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum, see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and, (2) allow this to be a more targeted focus of future work (specifically we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that based on our statements about ‘whole-brain’ connectivity and vagueness about what we mean by ‘subcortex,’ that this may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain” — e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations, adapted directly to the geometry of the space of covariance matrices, outperform simpler methods that treat covariance matrices as arrays of real numbers (e.g., naive subtraction; see Dodero et al. and Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
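
      To make the notion of mean-centering in a non-Euclidean space concrete, here is a deliberately simplified sketch. It is not the procedure described in the Methods: it swaps in the log-Euclidean geometry (matrix logarithm, subtract the subject mean, add the grand mean, matrix exponential) as a stand-in, and all function names and the toy data are assumptions.

```python
import numpy as np

def _spd_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def spd_log(mat):
    return _spd_apply(mat, np.log)

def spd_exp(mat):
    return _spd_apply(mat, np.exp)

def center_subject(covs, grand_mean_log):
    """Re-center one subject's covariance matrices on the grand mean.

    covs: (n_epochs, n, n) symmetric positive definite matrices.
    Log-Euclidean stand-in, assumed for illustration only.
    """
    logs = np.stack([spd_log(c) for c in covs])
    subject_mean_log = logs.mean(axis=0)
    return np.stack([spd_exp(l - subject_mean_log + grand_mean_log) for l in logs])

def random_spd(n, rng):
    a = rng.standard_normal((n, 3 * n))
    return a @ a.T / (3 * n)

# Toy data: 4 subjects x 3 epochs of 5 x 5 covariance matrices.
rng = np.random.default_rng(1)
n_subjects, n_epochs, n = 4, 3, 5
data = np.array([[random_spd(n, rng) for _ in range(n_epochs)]
                 for _ in range(n_subjects)])

grand_mean_log = np.mean([spd_log(c) for subj in data for c in subj], axis=0)
centered = np.stack([center_subject(subj, grand_mean_log) for subj in data])
```

      After this step every subject shares the same (log-domain) mean, so what remains within a subject are the epoch-to-epoch differences, and each output is still a valid positive definite matrix, which plain elementwise subtraction would not guarantee.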

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curve and taking measures like learning rate etc., but in our experience we have found that these types of models don’t always fit well, or derive robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a pure data-driven approach and performed functional principal component analysis (fPCA; Shang, 2014) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
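
      As a rough illustration of how a single per-subject score can summarize whole learning curves, the sketch below runs ordinary PCA on discretized toy curves. This is a stand-in, not fPCA proper (which represents each curve with smooth basis functions before decomposition), and the function name, the exponential toy curves, and the sign convention are assumptions.

```python
import numpy as np

def learning_scores(curves):
    """First principal component score per subject.

    curves: (n_subjects, n_trials) array of performance over trials.
    Ordinary PCA via SVD, used here as a simplified stand-in for fPCA.
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    component = vt[0]
    explained = s[0] ** 2 / np.sum(s ** 2)
    # The sign of a principal component is arbitrary; orient it so that a
    # higher score corresponds to higher average performance.
    if np.corrcoef(scores, curves.mean(axis=1))[0, 1] < 0:
        scores, component = -scores, -component
    return scores, component, explained

# Toy learning curves: 10 subjects with different (hypothetical) learning rates.
trials = np.arange(50)
rates = np.linspace(0.01, 0.2, 10)
curves = 100.0 * (1.0 - np.exp(-np.outer(rates, trials)))

scores, component, explained = learning_scores(curves)
```

      In this toy example faster learners also end with higher performance, so one component captures both properties at once, which is the auto-correlation the quoted passage refers to.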

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
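
      The effect of this preprocessing step can be seen in a small sketch on simulated signals. The data and variable names are assumptions made for illustration; the point is only that z-scoring removes a region's mean level and amplitude, so neither can carry through to a correlation-based connectivity estimate.

```python
import numpy as np

def zscore(ts):
    """Scale a timeseries to zero mean and unit standard deviation over time."""
    ts = np.asarray(ts, dtype=float)
    return (ts - ts.mean(axis=-1, keepdims=True)) / ts.std(axis=-1, keepdims=True)

# Two simulated, correlated regional signals (hypothetical data).
rng = np.random.default_rng(2)
region_a = rng.standard_normal(300)
region_b = 0.6 * region_a + 0.8 * rng.standard_normal(300)

# The same region with a higher overall activity level and a larger amplitude.
region_a_shifted = 3.0 * region_a + 50.0

z_a = zscore(region_a)
z_a_shifted = zscore(region_a_shifted)
z_b = zscore(region_b)

# Connectivity with region_b is the same in both cases.
r_original = np.corrcoef(z_a, z_b)[0, 1]
r_shifted = np.corrcoef(z_a_shifted, z_b)[0, 1]
```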

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing these data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during Late learning, though it should be noted that our y-axis, which measures in the thousandths, really magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late>Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
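
      For readers unfamiliar with FDR correction across regions, a minimal sketch of one common procedure (Benjamini-Hochberg) follows. The response does not state which FDR procedure was used, so the choice of Benjamini-Hochberg, the function name, and the example p-values are all assumptions.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of tests rejected at false discovery rate q.

    Step-up rule: find the largest rank k with p_(k) <= q * k / m and
    reject the k smallest p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting the criterion
        reject[order[:k + 1]] = True
    return reject

# Hypothetical region-wise p-values from paired t-tests.
example_pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_pvals, q=0.05)
```

      With these example values only the two smallest p-values survive correction, even though five of them fall below an uncorrected threshold of 0.05.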

    1. eLife assessment

      This important study reports the fungal composition and its interaction with bacteria in the Caesarean section scar diverticulum. The data are solid and supportive of the conclusion. This work will be of interest to researchers and clinicians who work on women's health.

    2. Reviewer #2 (Public Review):

      Summary:

      Shotgun data have been analysed to obtain the abundance of fungal and bacterial organisms. From their metabolic functions and from co-occurrence networks, a functional relationship between the two types of organisms can be inferred. Using metabolomics, function-related metabolites are studied in order to explore the fungus-bacteria synergy in more depth.

      Strengths:

      Data obtained in bacteria correlate with data from other authors.

      The study of metabolic "interactions" between fungi and bacteria is quite new.

      The inclusion of metabolomics data to support the results is a great contribution.

      Weaknesses:

      All my concerns have been clarified.