  1. Feb 2026
    1. Reviewer #2 (Public review):

      Summary:

      The fascinating topic of the host range of arthropods, including insects, and the detoxification of host secondary metabolites has been elucidated through studies of the host specificity of two closely related species. The discovery that key genes were acquired from fungi through horizontal gene transfer (HGT) is particularly significant.

      Strengths:

      (1) The discovery that the TkDOG15 enzyme, acquired through HGT from fungi, plays a key role in the detoxification of green tea catechins in the Kanzawa mite, revealing a new mechanism of plant-herbivore interactions, is highly encouraging.

      (2) The verification of this finding through various experiments, including behavioral, toxicological, transcriptomic, and proteomic analyses, RNAi-based gene function analysis, and recombinant enzyme activity assays, is also highly commendable.

      (3) By proposing a two-step model in which amino acid substitutions and expression regulation of a specific enzyme gene (TkDOG15) enable host adaptive evolution, this study contributes significantly to our understanding of the evolutionary mechanisms of speciation and of how herbivores overcome plant defenses.

      Weaknesses:

      While transcriptome/proteome analyses reported changes in the expression of other detoxification-related enzymes, including CCEs, UGTs, ABC transporters, DOG1, DOG4, and DOG7, it is regrettable that the contribution of each enzyme, including its interaction with TkDOG15 and the functional analysis of each enzyme within the overall catechin detoxification system, was not investigated.

    1. eLife Assessment

      This convincing study examines a novel interaction of RAB5 with VPS34 complex II. Structural data are combined with site-directed mutagenesis, sequence analysis, biochemistry, yeast mutant analysis, and prior data on RAB1-VPS34 and RAB5-VPS34 interactions to provide a new perspective on how RAB GTPases recruit related but distinct VPS34 complexes to different organelles. Although minor revisions are recommended, the judgment is that this work represents a fundamental advance in our understanding of VPS34 localization and regulation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore, the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one, except that an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall, the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 was conserved in yeast, while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall, I found this to be an interesting and comprehensive study of the structural basis for the interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses were noted.

    3. Reviewer #2 (Public review):

      The work by Spokaite et al describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and cryo-EM. Extensive mutational and sequence analyses define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how VPS34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high-resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased affinity VPS34 mutant to determine the 3.2 Å structure of Rab5 bound to VPS34-CII. Clear density was seen for the original Rab5 interface, but an additional site was observed. Based on this structure, they mutated out the VPS34 interface, allowing for a high-resolution structure of the Rab5 bound at the VPS15 interface.

      They extensively validated the VPS15 interface in the yeast variant of VPS34, showing that the Vps15-Rab5 (Vps21) interface identified is critical in controlling complex II VPS34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously, and I have very few experimental suggestions.

      Here is what I recommend based on some very minor weaknesses I observed:

      (1) My main concern has to do with presentation, specifically how the authors describe mutants. They clearly indicate the mutant sequence in the human isoform (for example, see Figure 2A, VPS15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they shift to saying VPS15 mutant, but don't define the mutant (Figure 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species, when the exact mutant is included. This is particularly important in this paper, where we are jumping between different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      (2) The HDX data very clearly show that Rab5 is likely able to bind at both sites, which backs up the cryo-EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." This seems to be confusing as to why they required multiple timepoints, and before they also describe that they required a p-value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p-value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended from the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.
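
      The combined criterion suggested in point (3) can be sketched in a few lines of Python. This is only an illustration of one possible reading, not the authors' pipeline: it assumes that the 0.25 Da, 5%, and p < 0.05 thresholds are all applied per timepoint and that a peptide counts as significant when at least two timepoints pass. The names `TimepointDiff`, `timepoint_passes`, and `peptide_is_significant`, and the example values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TimepointDiff:
    """Deuterium uptake difference for one peptide at one exchange timepoint."""
    delta_da: float   # difference between the two states, in Da
    delta_pct: float  # the same difference, as a percentage
    p_value: float    # assumed to come from a test on the n=3 replicates


def timepoint_passes(tp: TimepointDiff, min_da: float = 0.25,
                     min_pct: float = 5.0, alpha: float = 0.05) -> bool:
    # Assumption: all three thresholds must hold at the same timepoint.
    return (abs(tp.delta_da) > min_da
            and abs(tp.delta_pct) > min_pct
            and tp.p_value < alpha)


def peptide_is_significant(timepoints: Iterable[TimepointDiff],
                           min_timepoints: int = 2) -> bool:
    # Count the timepoints that satisfy every threshold.
    passing = sum(timepoint_passes(tp) for tp in timepoints)
    return passing >= min_timepoints


# Made-up values for illustration only (not data from the manuscript).
peptide_a = [TimepointDiff(0.40, 7.1, 0.01),
             TimepointDiff(0.31, 5.6, 0.03),
             TimepointDiff(0.10, 1.8, 0.40)]
peptide_b = [TimepointDiff(0.40, 7.1, 0.01),
             TimepointDiff(0.30, 4.0, 0.02),   # fails the 5% threshold
             TimepointDiff(0.26, 5.2, 0.20)]   # fails the p-value threshold

print(peptide_is_significant(peptide_a))
print(peptide_is_significant(peptide_b))
```

      Under this reading, reporting the full table (including peptides that fail the filter) only requires keeping the per-timepoint values rather than discarding rows, which is what the request for a complete HDX data file in point (4) amounts to.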

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete. What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

    1. eLife Assessment

      Li et al. present an important and innovative study linking developmental changes in sleep to ecological context in Drosophila mojavensis, and propose that sleep at one stage of an animal's life might anticipate needs at a future stage. The results fit well with this model, but are correlative in nature. The work is convincing, scientifically rigorous, and effectively bridges sleep biology and evolutionary ecology, opening promising new directions for the field.

    2. Joint Public review:

      Summary

      This interesting work by Shuhao Li and colleagues suggests that developmental sleep and feeding behavior in larval flies are genetically programmed to prepare the animal for adult contingencies, as in the case of flies living in harsh ecological environments such as deserts. Thus, the work proposes that desert-dwelling flies such as Drosophila mojavensis sleep less and feed more than D. melanogaster as larvae, which allows them to feed less and sleep more as adults in the harsh desert conditions where they live. The authors argue that this is evidence for developmental sleep reallocation, which helps the adult flies survive in the desert. In general, their results support this compelling hypothesis, so this work provides a new perspective on how sleep might be differentially programmed across developmental stages according to the requirements of an ecological niche. This work is particularly innovative for several reasons. First, it extends the Drosophila sleep field beyond D. melanogaster and directly addresses questions about the evolution of sleep that remain largely unexplored. Second, it investigates the possibility that changes in sleep across development may be adaptive, rather than sleep being a static trait. Overall, this work opens new avenues of research, effectively bridges the fields of sleep biology and evolutionary ecology, and should be of broad interest to a general readership. The manuscript is scientifically sound and clearly written for a generalist audience.

      There are, however, two important weaknesses that should be addressed. The first is the implicit assumption that all observed behavioral differences are adaptive; this would benefit from a more cautious framing. Second, the manuscript would be strengthened by a more detailed discussion, and potentially additional data, regarding the ecological differences experienced by D. mojavensis and D. melanogaster at distinct life-cycle stages.

      Strengths:

      (1) The study astutely uses desert Drosophila species as models to understand how sleep is optimized in a challenging environment. The manuscript is rigorous, experiments are well controlled, the work is very clearly presented, and the results support the main conclusions, which are quite exciting.

      (2) The manuscript examines previously unexplored sleep differences in a non-melanogaster species.

      (3) The study provides evidence that selective pressure can be restricted to specific developmental stages.

      (4) This work offers evolutionary insights into the trade-offs between sleep and feeding across development.

      Weaknesses

      (1) The authors should soften interpretations so that it is not assumed that any observed difference between mojavensis and melanogaster is necessarily adaptive, or evolved due to food availability or temperature stress.

      (2) The study relies on comparisons and correlations. While it seems likely that the observed differences in sleep explain the increased food consumption and energy storage in the larvae of desert flies, demonstrating this through sleep manipulation would strengthen the authors' conclusions.

      (3) The question arises regarding whether transiently quiescent larvae are always really sleeping, and whether it is appropriate to treat sleep as a stochastic population-level phenomenon rather than as an individual trait.

      (4) The manuscript would benefit from comparative analysis beyond mojavensis and melanogaster.

      (5) A deeper discussion of the ecological differences between the 2 Drosophila species would place the results in a broader context.

      (6) The feeding parameters used in adults and larvae measure different aspects of feeding, confounding comparisons.

    1. eLife Assessment

      This work presents a brain-wide atlas of vasopressin (Avp) and vasopressin receptor 1A (Avpr1a) mRNA expression in mouse brains using high-resolution RNAscope in situ hybridization. The single-transcript approach provides precise localization and identifies additional brain regions expressing Avpr1a, creating a valuable resource for the field. The revised manuscript is clearer and more impactful, with improved figures, stronger data organization, and enhanced scholarship through added context and citations. Overall, the evidence is compelling, and the atlas should be broadly of use to researchers studying vasopressin signaling and related neural circuits.

    2. Reviewer #1 (Public review):

      Summary:

      Despite accumulating prior studies on the expression of AVP and AVPR1a in the brain, a detailed, gender-specific mapping of AVP/AVPR1a neuronal nodes has been lacking. Using RNAscope, a cutting-edge technology that detects single RNA transcripts, the authors created a comprehensive neuroanatomical atlas of Avp and Avpr1a in male and female brains.

      Strengths:

      This well-executed study provides valuable new insights into gender differences in the distribution of Avp and Avpr1a. The atlas is an important resource for the neuroscience community.

      The authors have adequately addressed all of my concerns. I have no further questions or concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors conducted a brain-wide survey of Avp (arginine vasopressin) and its Avpr1a gene expression in the mouse brain using RNAscope, a high-resolution in situ hybridization method. Overall, the findings are useful and important because they identify brain regions that express the Avpr1a transcript. A comprehensive overview of Avpr1a expression in the mouse brain could be highly informative and impactful. The authors used RNAscope (a proprietary in situ hybridization method) to assess transcript abundance of Avp and one of its receptors, Avpr1a. The finding of Avp-expressing cells outside the hypothalamus and the extended amygdala is novel and is nicely demonstrated by new photomicrographs in the revised manuscript. The Avpr1a data suggest expression in numerous brain regions. In the revised manuscript, reworked figures make the data easier to interpret.

      Strengths:

      A survey of Avpr1a expression in the mouse brain is an important tool for exploring vasopressin function in the mammalian brain and for developing hypotheses about cell- and circuit-level function.

      Future considerations:

      The work contained in the manuscript is substantial and informative. Some questions remain and could be addressed in the current manuscript. How many cells are impacted? Are transcripts spread across many cells or only present in a few cells? Is density evenly distributed through a brain region or compacted into a subfield?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for great suggestions.

      (1) The X-axis labels in some panels in Figure 2C and Supplementary Figure 2B overlap, making them difficult to read. Adjusting the label spacing or formatting would improve clarity.

      We thank the reviewer for the comment. All panels, including Figure 2C and Supplementary Figure 2B, have now been reorganized so that the X-axis labels are easy to read.

      (2) In the scatter dot plot bar diagrams, it appears that n=3 for most of the data. Does this represent the number of mice used or the number of tissue sections per sample? This should be clarified in the figure legends for better transparency. 

      Great suggestion. In Results (page 7, lines 135-136), we have now clarified that quantification was performed on every tenth section of the brain from 3 female and 3 male mice. Additionally, the legends for the scatter dot plot bar diagrams now state that n=3 represents the number of mice used.

      (3) In Supplemental Figure 2B, the positive signals are not clearly visible. Providing higher-magnification images is recommended.

      Great suggestion. The revised Supplemental Figure 2B and Figure 2A now provide higher-magnification inset images with distinctive positive signals.

      Reviewer #2:

      We thank the reviewer for great and critical suggestions.

      (1) Introduction:

      Line 58: References should be provided for this statement as it is based on a robust field of research, not on a new concept.

      We thank the reviewer for the comment. We have now included relevant references as suggested (page 4, line 58).

      (2) Lines 100-102: This sentence seems to present as new an idea that has been well-documented since the late 1970s. Posterior pituitary hormones oxytocin and vasopressin have long been known to have multiple peripheral targets, and at least a subset of vasopressin and oxytocin neurons have robust central projections. The central targets have been the focus of study for numerous labs. Reference 34 does not relate to posterior pituitary hormones and seems mis-cited.

      We have changed this sentence, excluded the reference that does not relate to posterior pituitary hormones and added 4 further references reporting other non-traditional roles of vasopressin and oxytocin (page 6, lines 100-102).

      (3) Lines 102-108: While the regulation of bone is an interesting example of an under-appreciated impact of vasopressin, the example does not build to the rationale for examining central Avp and Avpr1a expression.

      We mean no disrespect here, but we have recently reported neural brain-bone connections using the SNS-specific PRV152 virus (Ryu et al., 2024; PMID: 38963696) and submitted Single Transcript Level Atlas of Oxytocin and the Oxytocin Receptor in the Mouse Brain (doi: https://doi.org/10.1101/2024.02.15.580498). Surprisingly, we detected Avpr1a and Oxtr expression in certain brain areas (for example, PVH and MPOM) that connect to both bone and adipose tissue through the SNS—raising an important question regarding a central role of Avpr1a and Oxtr in bodily mass and fat regulation. 

      (4) Line 111: Avp expression and Avpr1a expression have both been studied using in situ hybridization. Thus, the overall concept is less novel than hinted at in the text. Avp expression has been studied quite extensively. Avpr1a expression has not been studied in an exhaustive fashion. 

      We thank the reviewer for this comment and absolutely agree that brain AVP expression has been studied extensively. As for Avpr1a, we believe that the RNAscope probe design and signal amplification system employed in our study allow for more specific and sensitive detection of individual RNA targets at the single transcript level, with much cleaner background noise compared with the in situ hybridization method.

      (5) Results:

      Line 143: RNAscope is indeed a powerful method of detecting mRNA at the single transcript level. However, using that single transcript resolution only to provide transcript per brain region analysis, losing all of the nuance of the individual transcript expression, seems like a poor use of the method potential.

      This is a good point and we did notice that Avpr1a transcript expression in several brain nuclei displayed individual patterns of expression versus more ubiquitous expression in most of the other brain areas. We noted this finding in Results (page 9, lines 164-168); however, because of the word limits in Discussion, we are not sure what would be dropped to make more room and whether this is truly necessary.

      (6 & 7) Line 135: Sections were coded from 3 males and 3 females. I would argue that there is not enough statistical power to make inferences regarding sex differences or regional differences. In fact, the authors did not provide any statistical analysis in the manuscript at all, even though they stated in the Methods that they had completed statistical tests.

      Lines 150-157: All statements regarding sex differences in expression are made without statistical analyses, which, if conducted, would be underpowered. Given the limitations of performing and analyzing RNAscope data en masse, a low n is understandable, but it requires a much more precise description of the data and a more careful look at how the results can be interpreted.

      We thank the reviewer for these comments. We mean no disrespect here, but while statistical analysis for main brain regions is relevant, it is not meaningful as far as nuclei, sub-nuclei and regions are concerned. It is noteworthy to mention that RNAscope data analysis in the whole mouse brain is an extremely drawn-out process requiring almost 2 months to conduct exhaustive manual counting of single Avpr1a transcripts in a single mouse brain—authors analyzed 6 brains. That said, statistical tests have been performed and exact P values are now shown in graphs.

      (8) Line 146: I am flagging this instance, but it should be corrected everywhere it occurs. Since we cannot know the gender of a given mouse, I would recommend referring to the mouse's "sex" rather than its "gender."

      Good suggestion. We made the appropriate changes throughout the manuscript.

      (9) Line 153: The authors switch to discussing cell numbers. Why is this data relegated to the supplemental material?

      The main figures in the manuscript report Avp and Avpr1a transcript density, which has more important biological significance in terms of signal efficiency and cellular response dynamics. Because of the large number of graphs in the main text, we included all graphs with Avp and Avpr1a transcript counts in the supplemental material.

      (10) Methods:

      Line 369: "For simplicity and clarity, exact test results and exact P values are not presented." Simplicity or clarity is not a scientific rationale not to provide accurate statistics.

      We now provide exact P values in the graphs and the sentence in line 369 has been corrected accordingly (page 18, lines 379-380).

      (11) Line 362: The description of how data were analyzed is inadequate. More detail is needed.

      Agreed. We have now included a detailed description of how the data were analyzed (page 18, lines 365-374).

      (12) Discussion:

      Line 321: "This contrasts the rudimentary attribution of a single function per brain area." While brain function is often taught in such rudimentary terms to make the information palatable to students, I do not think the scientific literature on vasopressin function published over the past 50 years would suggest that we are so naïve in interpreting the functional role of vasopressin in the brain. Clearly, vasopressin is involved in numerous brain functions that likely cross behavioral modalities.

      Agreed and we removed this sentence.

      (13) Line 322: "The approach of direct mapping of receptor expression in the brain and periphery provides the groundwork." On its face, this statement is true, but the present data build on the groundwork laid by others (multiple papers from Ostrowski et al. in the early 1990s).

      Agreed.

      (14) Figures:

      Figure 1: 1B, I do not know the purpose of creating graphs with single bars (3V, ic, pir-male, and pir-female); there are no comparisons made in the graph. In the graphs with many brain regions, very little data can be effectively represented with the scale as it is. I recommend using tables to provide the count/density data and making graphs of only the most robust areas. In addition, although there is no statistical comparison, combining males and females in the same graphs might be beneficial to make a visual comparison easier. Why were cell counts only included in the supplemental material? Is that data not relevant?

      We thank the reviewer for this comment. Now all figures are presented in a more effective and aesthetically pleasing way.

      (15) There is a real missed opportunity to highlight some of the findings. For example, cell counts and density measures are provided for regions in the hippocampus, thalamus, and cortex that are not typically reported to contain vasopressin-expressing cells. Photomicrographs of these locations showing the RNAscope staining would be far more impactful in reporting these data. Are there cells expressing Avp, or is the Avp mRNA in these areas contained in fibers projecting to these areas from hypothalamic and forebrain sources?

      Great suggestion. We have now added Figure 1D, which shows novel Avp transcript expression in the hippocampus, thalamus and cortex. Based on hematoxylin counterstaining, Avp mRNA transcripts were found in somata.

      (16) Figure 1C legend suggests images of the hippocampus and cortex, but all images are from the hypothalamus. Abbreviations are not defined.

      Thank you for the comment. We corrected Figure 1C legend and separately included Figure 1D showing novel Avp mRNA expression in the hippocampus and cortex.

      (17) Figure 2: The analysis of Avpr1a suffers from some of the same issues as the Avp analysis. In Figure 2A, the photomicrographs do not do a very good job of illustrating representative staining. The central canal image does not appear to have any obvious puncta, but the density of Avpr1a puncta suggests something different. The sex difference in the arcuate is also not clearly apparent in representative images. For a project that depends so heavily on the appearance of puncta in tissue, the minimal visualization of the data, coupled with the lack of clarity in the images provided, greatly diminished the overall enthusiasm for the data presentation. The figures in 2C would be more useful as tables with graphs used to highlight specific regions; as is, most of the data points are lost against the graph axis. Photomicrographs would also provide a better understanding of the data than graphs.

      Great suggestion. The revised Figure 2A and Supplemental Figure 2B now provide higher-magnification inset images with distinctive positive signals. As for Figure 2C, we arranged all graphs in a more effective and aesthetically pleasing manner.

      (18) Figure 3: Given the low number of animals and, therefore, low statistical power, I do not think that illustrating the ratios of male to female is a statistically valid comparison.

      Please see response to Point 6 & Point 7.

      (19) Figure 4: Pituitary is an interesting choice to analyze. However, why was only the posterior pituitary analyzed? Were Avp transcripts contained in terminals of vasopressin neuron axons or other cells? Was Avpr1a transcript present in blood vessel cells where Avp is released? A different cell type? Why not examine the anterior pituitary, which also expresses Avp receptors (although the literature suggests largely Avpr1b)?

      Thank you for the great comment. We included only the posterior pituitary because no positive Avp/Avpr1a transcripts were found in the anterior pituitary. Unfortunately, we have not performed cell type-specific staining, which would have allowed us to assess variation in the expression of AVP and its receptor across cell types.

    1. eLife Assessment

      This study provides useful insights regarding the alterations of sleep architecture in a knock-in mouse model of Alzheimer's Disease (AD). These include age-related hyperactivity that is typically associated with increased arousal, a normal homeostatic response to sleep loss, and a stronger AD-like phenotype in females. Although the analyses are robust, evidence for the proposed mechanisms underlying abnormal sleep architecture is incomplete. Overall, the study may have a focused impact on the sleep and AD fields.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript titled, "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease", is about a study of sleep/wake phenomena in a knockin mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and response to sleep loss are employed to provide evidence, "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations, including age-related hyperactivity onset that is typically associated with increased arousal, a normal response to loss of sleep and to multiple sleep latency testing, and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on state bout frequency and duration analyses, it is suggested that the AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.

      Strengths:

      The study appears to have been, technically, rigorously conducted with high quality, in-depth traditional assessment of both state and EEG characteristics, with the concordant addition of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex specific, "stronger" insomnia-like effect of the knockin in females.

      Weaknesses:

      The weaknesses stem from the study's impact being limited due to its being largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The rationale for this conclusion is stated by the authors as based on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition (it is not clear what mechanisms are referenced here) cannot be ruled out, other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure that is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.

      The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature; both effects predominate in the dark period. Interestingly, the sleep patterns do not quite follow these changes. Wake time is increased in NLGF mice, and there is no progression in increased wake over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature and on sleep. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer bout durations in the dark period, their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences, with females showing no gene difference in wake bout number, while males show a gene effect. Similarly, gene effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, there was normalization of sleep-wake activity after sleep deprivation.

      Strengths:

      Approach (model extent of sleep phenotyping), analysis.

      Weaknesses:

      The weaknesses are summarized below and are viewed as "addressable".

      (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The findings across baseline sleep in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue where NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period, and/or increased drive to hypocretin neurons from PVH.

      (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Tisdale et al. studied the sleep/wake patterns in the biological mouse model of Alzheimer's disease. The results in this study, together with the established literature on the relationship of sleep and Alzheimer's disease progression, guided the authors to propose this mouse model for the mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.

      Strengths:

      The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.

      Weaknesses:

      The study seems to lack a clear hypothesis-driven approach and relies mostly on explorative investigations. Moreover, lack of quantitative analytical methods as well as shaky logical conclusions, possibly not supported by data in its current form, leaves room for major improvement.

      Since this paper studied sleep states, the "Methods" section is quite unclear on what specific criteria were used to classify sleep states. There is no quantitative description of classifying sleep based on clear, reproducible procedures. There are many reasonably well-characterized sleep scoring systems used in rat electrophysiological literature, which could be useful here. The authors are generally expected to describe movement speed and/or EMG and/or EEG (theta/delta/gamma) criteria used to classify these epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.

      One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260s) and fewer short wake bouts (<60s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests the primary defect is not in the transition mechanism itself, but possibly in a pathological increase in arousal drive (hyper-arousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Also, a point to note is that this finding is not new.

      Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined and methods well-explained in the Methods section. Without these, the reader cannot discern if the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color (not clear what range and what the interpretation is) mean shorter sleep bout in female mice? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges in the plot, such that readers can draw their own conclusions.

      Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.

      In Figure 9, LFP power shown and compared in percentages is problematic, as LFP power distribution is known to be skewed (follows power law). This is particularly problematic here because all the frequencies above ~20 Hz seem to be totally flattened or nonexistent, which makes this comparison of power severely limited and biased towards the relative frequency in the highly skewed portion of the LFP power spectrum, i.e., very low frequency ranges like delta, theta, and possibly beta. This ignores low, mid, and high gamma as well as ripple band frequencies. NREM sleep is known to have relatively greater ripple band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled, "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease", is about a study of sleep/wake phenomena in a knockin mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and response to sleep loss are employed to provide evidence, "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations, including age-related hyperactivity onset that is typically associated with increased arousal, a normal response to loss of sleep and to multiple sleep latency testing, and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on state bout frequency and duration analyses, it is suggested that the AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.

      Strengths:

      The study appears to have been, technically, rigorously conducted with high quality, in-depth traditional assessment of both state and EEG characteristics, with the concordant addition of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex specific, "stronger" insomnia-like effect of the knockin in females.

      Weaknesses:

      The weaknesses stem from the study's impact being limited due to its being largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The rationale for this conclusion is stated by the authors as based on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition (it is not clear what mechanisms are referenced here) cannot be ruled out, other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure that is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.

      Reviewer 1 succinctly summarizes the advances of this study beyond the ground-breaking Maezono et al. (2020) study of this “humanized” mouse model exhibiting amyloid deposition. Whereas Maezono et al. conducted sleep/wake studies on male App<sup>NL-G-F</sup> mice at 6 and 12 months of age, we had the unusual opportunity to study both sexes of homozygous App<sup>NL-G-F</sup> mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months. In addition to baseline sleep/wake and EEG spectral analyses, we (1) measured subcutaneous body temperature and activity to obtain a broader picture of the physiology and behavior of this strain at advanced ages; (2) assessed baseline sleepiness in this strain using the murine version of the clinically-relevant Multiple Sleep Latency Test (MSLT); (3) evaluated the response of App<sup>NL-G-F</sup> mice and WT littermates to a perturbation of the sleep homeostat; (4) compared the sleep/wake characteristics of male vs. female App<sup>NL-G-F</sup> mice at 18-22 months; and (5) to assess the stability of the phenotypes, analyzed these data over a continuous 14-d recording rather than the conventional 24-h recordings typical of most sleep/wake studies, including Maezono et al. We found that a long wake/short sleep phenotype was characteristic of homozygous App<sup>NL-G-F</sup> mice at these advanced ages. This phenotype is also evident in the Maezono et al. (2020) study at 12 months of age (but not at 6 months), although those authors do not comment on it and instead focus on the reduced REM sleep, which is particularly evident in female App<sup>NL-G-F</sup> mice in our study. Remarkably, despite being awake ~20% longer per day, we find that App<sup>NL-G-F</sup> mice are no sleepier than WT mice as determined by the MSLT and that their sleep homeostat is intact when challenged by 6-h sleep deprivation.

      At both advanced ages, the long wake/short sleep phenotype is due primarily to longer Wake bouts and shorter bouts of both NREM and REM sleep during the dark phase. Moreover, hyperactivity develops in older App<sup>NL-G-F</sup> mice, particularly females, which contributes to this phenotype. We agree with Reviewer 1 that “hyperactivity would be expected to result in increased duration of W bouts during the active phase” and that this could result in more consolidated NREM bouts, and we will modify the manuscript to discuss this alternative. However, the suggestion of greater sleep pressure is not borne out by the MSLT studies, as we did not observe the shorter sleep latencies and increased sleep during the nap opportunities on the MSLT that we have observed in other mouse strains. Moreover, due to their short sleep phenotype, App<sup>NL-G-F</sup> mice would be entering the sleep deprivation study with a greater sleep debt than WT mice, yet we did not observe greater EEG Slow Wave Activity in this strain during recovery from sleep deprivation. Thus, we have suggested that App<sup>NL-G-F</sup> mice are unable to transition from Wake to sleep as readily as their WT littermates. Our observations summarized above set the stage for subsequent mechanistic studies in aged App<sup>NL-G-F</sup> mice, although realistically, mice of this age and genotype are a rare commodity.
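      The bout-level reasoning in this exchange (longer but fewer Wake bouts, judged against the <60 s and >260 s bins cited by Reviewer 3) can be illustrated with a minimal sketch. It is not the authors' scoring or analysis pipeline: the 10-s epoch length, the state labels "W" and "N", and the function names are assumptions made only for illustration.

```python
from itertools import groupby

EPOCH_S = 10  # assumed epoch length in seconds (hypothetical, not from the source)


def bout_durations(states, target, epoch_s=EPOCH_S):
    """Return the duration (s) of every consecutive run of `target` in `states`."""
    return [
        sum(1 for _ in run) * epoch_s
        for state, run in groupby(states)
        if state == target
    ]


def summarize_wake_bouts(states, short_s=60, long_s=260, epoch_s=EPOCH_S):
    """Count Wake bouts and how many fall below/above the two duration cut-offs."""
    durations = bout_durations(states, "W", epoch_s)
    return {
        "n_bouts": len(durations),
        "total_s": sum(durations),
        "mean_s": sum(durations) / len(durations) if durations else 0.0,
        "n_short": sum(d < short_s for d in durations),
        "n_long": sum(d > long_s for d in durations),
    }


# Two toy hypnograms with identical total Wake time but different structure.
fragmented = ["W", "W", "N", "N"] * 15      # many brief Wake bouts
consolidated = ["W"] * 30 + ["N"] * 30      # one long Wake bout

print(summarize_wake_bouts(fragmented))
print(summarize_wake_bouts(consolidated))
```

      Both toy recordings contain the same total Wake time, so only the bout counts and duration bins separate a fragmented pattern from a consolidated one. That is the distinction at issue between an unstable switch and an overly stable Wake state.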

      Reviewer #2 (Public review):

      Summary:

      The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.

      The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature; both effects predominate in the dark period. Interestingly, the sleep patterns do not quite follow these changes. Wake time is increased in NLGF mice, and there is no progression in increased wake over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature and on sleep. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer bout durations in the dark period, their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences, with females showing no gene difference in wake bout number, while males show a gene effect. Similarly, gene effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, there was normalization of sleep-wake activity after sleep deprivation.

      Strengths:

      Approach (model extent of sleep phenotyping), analysis.

      Weaknesses:

      The weaknesses are summarized below and are viewed as "addressable".

      (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The findings across baseline sleep in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue where NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period, and/or increased drive to hypocretin neurons from PVH.

      (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.

      Reviewer 2 suggests a provocative hypothesis to test. Curiously, although a recent Science paper suggests that hyperexcitable hypocretin/orexin neurons in aging mice result in greater sleep/wake fragmentation, hyperexcitability of this system could result in hyperactivity and longer wake bouts in aged App<sup>NL-G-F</sup> mice.

      Reviewer #3 (Public review):

      Summary:

      In this study, Tisdale et al. studied the sleep/wake patterns in the biological mouse model of Alzheimer's disease. The results in this study, together with the established literature on the relationship of sleep and Alzheimer's disease progression, guided the authors to propose this mouse model for the mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.

      Strengths:

      The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.

      Weaknesses:

      The study seems to lack a clear hypothesis-driven approach and relies mostly on explorative investigations. Moreover, lack of quantitative analytical methods as well as shaky logical conclusions, possibly not supported by data in its current form, leaves room for major improvement.

      Since this paper studied sleep states, the "Methods" section is quite unclear on what specific criteria were used to classify sleep states. There is no quantitative description of classifying sleep based on clear, reproducible procedures. There are many reasonably well-characterized sleep scoring systems used in rat electrophysiological literature, which could be useful here. The authors are generally expected to describe movement speed and/or EMG and/or EEG (theta/delta/gamma) criteria used to classify these epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.

      One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260s) and fewer short wake bouts (<60s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests the primary defect is not in the transition mechanism itself, but possibly in a pathological increase in arousal drive (hyper-arousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Also, a point to note is that this finding is not new.

      Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined and methods well-explained in the Methods section. Without these, the reader cannot discern if the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color (not clear what range and what the interpretation is) mean shorter sleep bout in female mice? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges in the plot, such that readers can draw their own conclusions.

      Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.

      In Figure 9, LFP power shown and compared in percentages is problematic, as LFP power distribution is known to be skewed (follows power law). This is particularly problematic here because all the frequencies above ~20 Hz seem to be totally flattened or nonexistent, which makes this comparison of power severely limited and biased towards the relative frequency in the highly skewed portion of the LFP power spectrum, i.e., very low frequency ranges like delta, theta, and possibly beta. This ignores low, mid, and high gamma as well as ripple band frequencies. NREM sleep is known to have relatively greater ripple band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.

      We agree with the reviewer that the “Classification of arousal states” section was missing the key description of how we scored the recordings into arousal states based on EEG, EMG and locomotor activity; this was an oversight, as the corresponding text exists in all our previous sleep/wake studies published over several decades. The reviewer also points out the alternative interpretation that “the wake state has become too stable.” However, I think we are using different words to say the same thing: that the transition from wake to sleep is impaired, whether it is due to hyperarousal or to a defect in the flip-flop switch that results in greater Wake stability. We will revise Fig. 3 (Reviewer 2 suggests combining with Fig. 14) but note that the X-axis is labelled 0-25 Hz and that this figure was intended to be descriptive -- illustrating how unusual the female App<sup>NL-G-F</sup> mice are relative to WT -- rather than a quantitative analysis of spectral power as in Fig. 14. Both Reviewers 2 and 3 suggest that we are using “insomnia” incorrectly; we have used the term simply to describe less sleep per 24h period. Reviewer 2 states that “Insomnia is defined as a subjective dissatisfaction with sleep” and Reviewer 3 suggests a narrow definition of insomnia as due only to “a failure of the homeostatic process or a debilitating accumulation of sleep debt.” In a revised manuscript, we will define “insomnia” as an operational term to succinctly mean “less sleep”. Regarding the problem of presenting spectral power in percentages, we completely agree with the reviewer. However, we intentionally presented spectral power density, a measure of relative power, as in Figure 3A and 3B of Maezono et al. (2020). At the risk of making Fig. 9 even more busy, we will revise Fig. 9 to add labels for all Y-axes.

      In addition to a revised Fig. 9, in the revised manuscript, we will reformat Tables 1-3, Figs. S1 and S2 for legibility and correct an error in Fig. 7.
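      The disagreement over presenting spectral power as percentages can be made concrete with a small sketch. It uses a synthetic spectrum that falls off as 1/f², which is an assumption standing in for the skewed distribution Reviewer 3 describes, and the band edges are hypothetical placeholders rather than the authors' definitions. No recorded data are involved.

```python
def relative_power(power):
    """Express each frequency bin as a percentage of total power."""
    total = sum(power)
    return [100.0 * p / total for p in power]


def band_share(freqs, rel, lo, hi):
    """Sum the relative power (%) of all bins with lo <= f < hi."""
    return sum(r for f, r in zip(freqs, rel) if lo <= f < hi)


# Synthetic spectrum: 0.5 Hz bins from 0.5 to 100 Hz, power ~ 1/f^2 (assumed shape).
freqs = [0.5 * k for k in range(1, 201)]
power = [1.0 / f ** 2 for f in freqs]
rel = relative_power(power)

# Hypothetical band edges in Hz, chosen only for illustration.
bands = {"delta": (0.5, 4.0), "theta": (4.0, 9.0), "gamma": (30.0, 100.0)}
shares = {name: band_share(freqs, rel, lo, hi) for name, (lo, hi) in bands.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}% of total power")
```

      With a spectrum this skewed, nearly all of the percentage scale is taken up by the lowest frequencies, so the higher bands look flat on a linear percentage axis whether or not they differ between groups. Plotting on a logarithmic axis or normalizing within band is a common way to make such differences visible.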

    1. eLife Assessment

      This important work employed a recent functional muscle network analysis to evaluate rehabilitation outcomes in post-stroke patients. While the research direction is relevant and suggests the need for further investigation, the strength of evidence supporting the claims is incomplete. Muscle interactions can serve as biomarkers, but improvements in function are not directly demonstrated, and the method's robustness is not benchmarked against existing approaches.

    2. Reviewer #1 (Public review):

      While the revised manuscript includes additional methodological details and a supplementary comparison with conventional NMF, it would be great if the authors could add the points below as limitations in the manuscript or change the title and abstract accordingly, since core issues remain:

      (1) The study claims to evaluate rehabilitation outcomes without demonstrating that patients actually improved functionally

      (2) The comparison with existing methods lacks the quantitative rigor needed to establish superiority

      (3) The added value of this complex framework over much simpler alternatives has not been demonstrated

      The strength of evidence supporting the main claims remains incomplete. I would encourage the authors to consider discussing these points:

      (1) including or adding a limitation section about functional outcome measures that go beyond clinical scale scores, (2) providing/discussing quantitative benchmarks showing their method outperforms alternatives on specific, predefined metrics, and (3) clarifying the clinical pathway by which these biomarkers would inform treatment decisions.

      There are specific, relatively minor points that require attention:

      The authors write: "we did not focus on such complementary evidence in this study." This is a weakness for a paper claiming to provide "biomarkers of therapeutic responsiveness." The FMA-UE threshold defines responders, but there's no independent validation that patients actually functioned better in daily life. Can you please clarify?

      Maybe I missed the exact point about this, but with the added NMF plot, the authors list 'lower dimensionality' among their framework's advantages (in the Supplementary Materials). The basis for this claim is not clear, given that 12 network components (5 redundant + 7 synergistic networks) were extracted compared to 11 "conventional" synergies from the NMF approach; this does not support a clinical or outcome advantage of the method. Please clarify.

    3. Reviewer #2 (Public review):

      This study presents an important analysis of how interactions between muscles can serve as biomarkers to quantify therapeutic responses in post-stroke patients. To do so, the authors employ an information-theoretical metric (co-information) to define muscle networks and perform cluster analysis.

      I thank the authors for improving the clarity of the Methods section; the newly added Figure 5 is very helpful.

      One minor suggestion is that the authors should avoid overloading the notation "m" for both the EEG measurement and the matrix of II values (Eq. 1.1), which I now realise was the source of some of my initial confusion. I suggest that the authors use separate notation for these two quantities.
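      For readers unfamiliar with the co-information metric mentioned in this review, the following minimal sketch shows how a three-variable co-information separates redundant from synergistic interactions. It is not the authors' implementation: the plug-in entropy estimator on discrete samples, the function names, and the sign convention I(X;Y) - I(X;Y|Z) (positive for redundancy, negative for synergy) are all assumptions, and the manuscript's estimator and sign convention may differ.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable observations."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def co_information(x, y, z):
    """Co-information I(X;Y) - I(X;Y|Z), written as a sum of joint entropies."""
    return (
        entropy(x) + entropy(y) + entropy(z)
        - entropy(list(zip(x, y)))
        - entropy(list(zip(x, z)))
        - entropy(list(zip(y, z)))
        + entropy(list(zip(x, y, z)))
    )


# Synergy: z is the XOR of x and y, so neither input alone says anything about z.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z_xor = [a ^ b for a, b in zip(x, y)]
print("XOR triplet:", co_information(x, y, z_xor))

# Redundancy: all three variables carry the same bit.
print("Copied bit:", co_information(x, x, x))
```

      Under the assumed convention the XOR triplet gives a negative value and the copied bit a positive one, which is the sense in which a network can be labelled synergistic or redundant.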

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses an important clinical challenge by proposing muscle network analysis as a tool to evaluate rehabilitation outcomes. The research direction is relevant, and the findings suggest further research. The strength of evidence supporting the claims is, however, limited: the improvements in function are not directly demonstrated, the robustness of the method is not benchmarked against already published approaches, and key terminology is not clearly defined, which reduces the clarity and impact of the work.

      Comments:

      There are several aspects of the current work that require clarification and improvement, both from a methodological and a conceptual standpoint.

      First, the actual improvements associated with the rehabilitation protocol remain unclear. While the authors report certain quantitative metrics, the study lacks more direct evidence of functional gains. Typically, rehabilitation interventions are strengthened by complementary material (e.g., videos or case examples) that clearly demonstrate improvements in activities of daily living. Including such evidence would make the findings more compelling.

      We thank the reviewer for their careful consideration of our work. We agree that direct evidence for the functional gains achieved by patients is important for establishing the efficacy of a clinical intervention and that this evidence should provide comprehensive insights for clinicians, from videos to case examples as suggested. Our aim here was to apply a novel computational framework to a cohort of patients undergoing rehabilitation and, in doing so, provide empirical support for its utility in standardised motor assessments. We have shown that our novel approach can identify distinct physiological responses to VR vs PT conditions across the post-stroke cohort (see Fig.2B and associated text). Hence, although the data contain virtual reality vs. conventional physical therapy experimental conditions, which likely hold important insights into the clinical use case of virtual reality interventions, we did not focus on such complementary evidence in this study. In future work, research groups (including our own) investigating the important question of clinical intervention efficacy will likely gain unique and useful mechanistic insights using our approach.

      Moreover, a threshold of 5 points on the FMA-UE was taken as the MCID to distinguish between responder and non-responder patients, which represents an acknowledged and applicable measure in the clinical field. From the perspective of expert clinicians, single cases provide weak evidence of change, raising concerns about the clinical meaningfulness of the reported results. Given all this, we chose to provide stronger evidence of clinical effect (i.e. a comparison between responders and non-responders), interpreted from the perspective of muscle synergies, rather than to support our results with selected single cases, which would introduce bias when translating findings to the population of people who have survived a stroke.

      Second, the claim that the proposed muscle network analysis is robust is not sufficiently substantiated. The method is introduced without adequate reference to, or comparison with, the extensive literature that has proposed alternative metrics. It is also not evident whether a simpler analysis (e.g., EMG amplitude) might produce similar results. To highlight the added value of the proposed method, it would be important to benchmark it against established approaches. This would help clarify its specific advantages and potential applications. Moreover, several studies have shown very good outcomes when using AI and latent manifold analyses in patients with neural lesions. Interpreting the latent space appears even easier than interpreting muscle networks, as the manifolds provide a simple encoding-decoding representation of what the patient can still perform and what they can no longer do.

      To address the reviewer's concerns regarding adequate evidence for the claims made about the presented framework, we have now included an application of the conventional muscle synergy analysis approach based on non-negative matrix factorisation to the post-stroke cohort (see Supplementary materials Fig.5 and associated text). We made efforts to make this comparison as fair as possible by also applying the conventional approach at the population level and clustering the activation coefficients using a similar yet more conventional approach, agglomerative clustering. Accompanying the output of this application, we have included several points on which our framework improves significantly upon conventional muscle synergy analysis:

      “Comparison with conventional approaches

      To more directly illustrate the advantages of the proposed framework, we carried out a standardised pre-processing of the EMG data in line with conventional muscle synergy analysis. This included rectification, low-pass filtration (cut-off: 20Hz) and smooth resampling of EMG waveforms to 50 timepoints. All data for each participant at each session was separately normalised by channel-wise variance, concatenated together and input into non-negative matrix factorisation (NMF) ('nnmf' Matlab function, 10 replications) to extract 11 muscle synergies (W1-11 of Supplementary Materials Fig.5(Left)) and their time-varying activations. The number of components to extract was determined in a conventional way as the number of components required to explain >75% of the data variance. The extracted muscle synergies included distinct shoulder- (e.g. W2), elbow (e.g. W8) and forearm-level (e.g. W1) muscle covariation patterns along with more isolated muscle contributions (e.g. UT in W3, TL in W10).

      To compare the clustering results of our framework with conventional approaches, we applied agglomerative clustering to the time-varying activation coefficients of all participants, trials and tasks, separately for pre- and post-sessions, and employed the 'evalclusters' Matlab function (Ward linkage clustering, Calinski-Harabasz criterion, Klist search = 2:21) for each session. We identified two clusters both at pre-session (Criterion = 1.69) and post-session (Criterion = 1.81) as optimal fits to the population data (see Supplementary Materials Fig.5(Right)). We found no associations between pre- or post-session cluster partitions and participants' FMA-UE scores. Nevertheless, we did identify significant associations between the pre-session clusterings and S_Pre (χ<sup>2</sup> = 7.08, p = 0.008) and between post-session clusterings and conventionally-defined treatment responders (χ<sup>2</sup> = 4.2, p = 0.04). These findings, along with the similar two-way clustering structure found using the NIF, highlight important commonalities between these approaches.

      To summarise the main advantages of our framework over this conventional approach:

      - Lower dimensionality and enhanced interpretability of extracted components.

      Our framework yields a lower number of population-level components that correspond more consistently to meaningful biomechanical and physiological functions.

      - Integration of pairwise muscle relationships.

      By incorporating muscle-pair level analysis, our framework captures coordinated interactions between primary and stabilising muscles—relationships that conventional NMF approaches overlook.

      - Separation of task-relevant and task-irrelevant activity.

      The NIF isolates task-relevant coordination patterns, distinguishing them from task-irrelevant interactions driven by biomechanical or task constraints. On the other hand, task-relevant and -irrelevant muscle contributions are intermixed in conventional muscle synergy analysis.

      - Ability to identify complementary functional roles.

      The NIF characterises whether muscle pairs act in similar or complementary ways, providing richer insight into motor control strategies.

      - Reduced dependence on variance-based optimisation.

      Unlike conventional methods that rely on maximising variance explained, our framework allows detection of subtle but functionally significant interactions that contribute less to total variance.

      - Improved detection of clinically relevant population structure.

      The clustering component of our framework revealed distinct post-stroke subgroups with important clinical relevance, distinguishing moderately and severely impaired cohorts and treatment responders and non-responders from pre-treatment data.”
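
      The component-number rule described in the quoted supplementary text (extract the smallest number of synergies explaining more than 75% of the data variance) can be illustrated with a minimal sketch. This is not the authors' Matlab 'nnmf' pipeline: the multiplicative-update NMF, the uncentred definition of variance accounted for, and the toy 4-muscle data below are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iters=300, seed=0):
    """Multiplicative-update NMF (Lee-Seung style) minimising ||V - WH||_F."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def variance_accounted_for(v, w, h):
    """Uncentred VAF = 1 - SSE / total sum of squares (one common convention)."""
    rec = matmul(w, h)
    sse = sum((a - b) ** 2 for rv, rr in zip(v, rec) for a, b in zip(rv, rr))
    sst = sum(a ** 2 for rv in v for a in rv)
    return 1.0 - sse / sst

def select_rank(v, max_rank, threshold=0.75):
    """Return the smallest rank whose VAF exceeds the threshold (else max_rank)."""
    for k in range(1, max_rank + 1):
        w, h = nmf(v, k)
        vaf = variance_accounted_for(v, w, h)
        if vaf > threshold:
            return k, vaf
    return max_rank, vaf

# Hypothetical toy data: 4 muscles x 6 timepoints built from 2 synergies.
true_w = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.2, 0.9]]
true_h = [[1.0, 0.5, 0.2, 0.0, 0.3, 0.8], [0.0, 0.4, 0.9, 1.0, 0.6, 0.1]]
emg = matmul(true_w, true_h)
k, vaf = select_rank(emg, max_rank=3)
```

      With real data, the matrix passed to such a routine would hold the rectified, filtered, resampled and variance-normalised EMG described in the quoted text, and the chosen number of components can depend on how variance explained is defined.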

      This supplementary analysis is referred to in the Methods section of the main text with reference to previous similar comparisons between our framework and conventional approaches:

      “Towards finding an effective approach to clustering participants in this data based on differences in impairment severity and therapeutic (non-)responsiveness, we found that conventional clustering algorithms (e.g. agglomerative, k-means etc.) could not provide substantive outputs (see Supplementary Materials Fig.5 and associated text for a direct comparison with conventional approaches), perhaps resulting from the complex interdependencies between the modular activations.”

      “To facilitate comparisons with existing approaches, we performed a conventional muscle synergy analysis on the post-stroke cohort (see Supplementary Materials Fig.5 and associated text). Further comparisons with conventional approaches can be found in our previous work (O’Reilly & Delis, 2022).”
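
      The chi-squared associations reported in the supplementary comparison can be sanity-checked with a short sketch. It assumes that each test was a Pearson chi-squared test on a 2×2 contingency table (two clusters against a binary grouping, hence 1 degree of freedom), which the response does not state explicitly; the example table is hypothetical and is not study data.

```python
import math

def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table (no continuity correction)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(chi2):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Reported statistics from the supplementary comparison.
p_pre = p_value_df1(7.08)   # close to the reported p = 0.008
p_post = p_value_df1(4.2)   # close to the reported p = 0.04

# Hypothetical 2x2 table (cluster membership x responder status), for illustration only.
example_stat = chi2_statistic([[10, 0], [0, 10]])
```

      Under the 1-degree-of-freedom assumption, both reported p-values are reproduced to the stated precision.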

      Further, we have also referred to a previous analysis of this post-stroke dataset using the conventional approach in the discussion section, where we point out how our approach can identify salient features of post-stroke physiological responses that conventional approaches cannot:

      “Further, the NIF demonstrated here an enhanced capability over traditional approaches to identify these crucial patterns, as earlier work on related versions of this dataset could not identify any differentiable fractionation events across the cohort (Pregnolato et al., 2025).”

      Overall, the utility of conventional muscle synergy analysis is well recognised across the field (Hong et al 2021). Our proposed approach builds on this conventional method by addressing key limitations to further enhance this clinical utility. We also agree that manifold learning approaches are an exciting area of research that we aim to incorporate into our framework in future research. Specifically, manifold learning methods like Laplacian eigenmaps can readily be applied to the co-membership matrix produced by our clustering algorithm, exploiting the geometry of this matrix to provide a continuous rather than discrete representation of population structure. We have highlighted this possibility in the discussion section:

      “Indeed, in future work, we aim to apply manifold learning approaches to the co-membership matrix derived from this clustering algorithm, providing a continuous representation of the population structure.”

      Third, the terminology used throughout the manuscript is sometimes ambiguous. A key example is the distinction made between "functional" and "redundant" synergies. The abstract states: "Notably, we identified a shift from redundancy to synergy in muscle coordination as a hallmark of effective rehabilitation-a transformation supported by a more precise quantification of treatment outcomes."

      However, in motor control research, redundancy is not typically seen as maladaptive. Rather, it is a fundamental property of the CNS, allowing the same motor task to be achieved through different patterns of muscle activity (e.g., alternative motor unit recruitment strategies). This redundancy provides flexibility and robustness, particularly under fatiguing conditions, where new synergies often emerge. Several studies have emphasized this adaptive role of redundancy. Thus, if the authors intend to use "redundancy" differently, it is essential to define the term explicitly and justify its use to avoid misinterpretation.

      We appreciate the reviewer's concerns regarding the terminology employed in this study. Indeed, we agree that redundancy is seen in the motor control literature as a positive feature of biological systems, which appears to contradict the interpretations of the redundancy-to-synergy information conversion result we have presented. We also wish to highlight that across the motor control literature and beyond, the idea of redundancy is often conflated with the related but distinct notion of degeneracy. Traditional motor control research has also recognised this difference; for example, Latash outlined it in the seminal work on motor abundance (https://doi.org/10.1007/s00221-012-3000-4). A key reference discussing this conflation and these two concepts in an information-theoretic way is found here: https://doi.org/10.1093/cercor/bhaa148. To summarise what their arguments mean for our work:

      - System degeneracy relates to the ability of different system components to contribute towards the same task in a context-specific way.

      - System redundancy corresponds to the degree of functional overlap among system components.

      Hence, conceptually speaking, informational redundancy as employed in our study (i.e. functionally-similar muscle interactions) links with system redundancy in that it quantifies the functional overlap of system components. This definition of system redundancy implies that it is an unavoidable by-product of degenerate systems (inefficient use of degrees of freedom) which should be minimised where possible. In our study and in related previous work, patients displayed increased informational redundancy as a result of stroke. This links, for example, with the abnormal co-activations they typically experience and with previous results from traditional muscle synergy analysis showing fewer components being extracted as a function of motor impairment post-stroke (i.e. higher informational redundancy) (Clark et al. 2010). Our novel contribution here is to convey how effective rehabilitation is underpinned by a redundancy-to-synergy information conversion across the muscle networks, relating loosely, in conceptual terms, to a reduction in system redundancy and an enhancement of system degeneracy (i.e. functionally differentiated system components contributing towards task performance).

      Taken together with the mathematical descriptions of redundant (functionally-similar) and synergistic (functionally-complementary) information and the types of functional relationships they capture, we believe the intuition behind this finding has clear links with previous research showing a) the merging of muscle synergies in response to post-stroke impairment (i.e. functional de-differentiation) and b) a reduction in abnormal couplings with effective rehabilitation (i.e. functional re-differentiation). To communicate this more clearly to readers, we have included the following in the corresponding discussion section:

      “Previous research has shown that functional redundancy increases post-stroke (Cheung et al., 2012; Clark et al., 2010), reflecting the characteristic loss of functional specificity (i.e. functional de-differentiation) of muscle interactions post-stroke. Enhanced synergy with treatment here thus reflects the functional re-differentiation of predominantly flexor-driven muscle networks towards different, complementary task-objectives across the seven upper-limb motor tasks performed (Kim et al., 2024b), leading to improved motor function among responders.”

      Finally, we have screened the updated manuscript for consistent use of terminology including functional/redundant/synergistic.

      References

      Clark DJ, Ting LH, Zajac FE, Neptune RR, Kautz SA. Merging of healthy motor modules predicts reduced locomotor performance and muscle coordination complexity post-stroke. Journal of neurophysiology. 2010 Feb;103(2):844-57.

      Hong YN, Ballekere AN, Fregly BJ, Roh J. Are muscle synergies useful for stroke rehabilitation?. Current Opinion in Biomedical Engineering. 2021 Sep 1;19:100315.

      Latash ML. The bliss (not the problem) of motor abundance (not redundancy). Experimental brain research. 2012 Mar;217(1):1-5.

      O'Reilly D, Delis I. Dissecting muscle synergies in the task space. Elife. 2024 Feb 26;12:RP87651.

      Sajid N, Parr T, Hope TM, Price CJ, Friston KJ. Degeneracy and redundancy in active inference. Cerebral Cortex. 2020 Nov;30(11):5750-66.

      Reviewer #2 (Public review):

      Summary:

      This study analyzes muscle interactions in post-stroke patients undergoing rehabilitation, using information-theoretic and network analysis tools applied to sEMG signals with task performance measurements. The authors identified patterns of muscle interaction that correlate well with therapeutic measures and could potentially be used to stratify patients and better evaluate the effectiveness of rehabilitation.

      However, I found that the Methods and Materials section, as it stands, lacks sufficient detail and clarity for me to fully understand and evaluate the quality of the method. Below, I outline my main points of concern, which I hope the authors will address in a revision to improve the quality of the Methods section. I would also like to note that the methods appear to be largely based on a previous paper by the authors (O'Reilly & Delis, 2024), but I was unable to resolve my questions after consulting that work.

      I understand the general procedure of the method to be: (1) defining a connectivity matrix, (2) refining that matrix using network analysis methods, and (3) applying a lower-dimensional decomposition to the refined matrix, which defines the sub-component of muscle interaction. However, there are a few steps not fully explained in the text.

      (1) The muscle network is defined as the connectivity matrix A. Is each entry in A defined by the co-information? Is this quantity estimated for each time point of the sEMG signal and task variable? Given that there are only 10 repetitions of the measurement for each task, I do not fully understand how this is sufficient for estimating a quantity involving mutual information.

      We acknowledge the confusion caused here regarding how many datapoints were incorporated into the estimation of II. The number of datapoints included in each variable was in fact no. of timepoints x 10 repetitions. Hence, for the EMGs employed in this analysis with a sampling rate of 2000Hz, the length of the variables involved could easily extend beyond 20,000 datapoints each. We have clarified this more specifically in the corresponding section of the methods:

      “We carried out this application in the spatial domain (i.e. interactions between muscles across time (Ó’Reilly & Delis, 2022)) by concatenating the 10 repetitions of each task executed on a particular side (i.e. variables of length no. of timepoints x 10 trials) and quantifying II with respect to this discrete task parameter codified to describe the motor task performed at each timepoint for each trial included.”

      In the previous paper (O'Reilly & Delis, 2024), the authors initially defined the co-information (Equation 1.3) but then referred to mutual information (MI) in the subsequent text, which I found confusing. In addition, while the matrix A is symmetrical, it should not be orthogonal (the authors wrote A<sup>T</sup>A = I) unless some additional constraint was imposed?

      We thank the reviewer for spotting this typo in the previous paper, which described a symmetric matrix as A<sup>T</sup>A = I; this expression in fact defines orthogonality. In the current study we have correctly described the symmetric matrix as A = A<sup>T</sup>:

      “We carried out this application in the spatial domain (i.e. interactions between muscles across time (Ó’Reilly & Delis, 2022)) by concatenating the 10 repetitions of each task executed on a particular side (i.e. variables of length no. of timepoints x 10 trials) and quantifying II with respect to this discrete task parameter codified to describe the motor task performed at each timepoint for each trial included. This computation was performed on all unique m<sub>x</sub> and m<sub>y</sub> pairings, generating symmetric matrices (A) (i.e. A = A<sup>T</sup>) composed separately of non-negative redundant and synergistic values (Fig.5).”
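
      The distinction between symmetry (A = A<sup>T</sup>) and orthogonality (A<sup>T</sup>A = I) raised by the reviewer can be checked numerically. The sketch below uses a hypothetical 3-muscle coupling matrix; the values are illustrative and not taken from the study.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_symmetric(a, tol=1e-12):
    """True if A equals its transpose within tolerance."""
    at = transpose(a)
    return all(abs(a[i][j] - at[i][j]) <= tol
               for i in range(len(a)) for j in range(len(a)))

def is_orthogonal(a, tol=1e-12):
    """True if A^T A equals the identity within tolerance."""
    ata = matmul(transpose(a), a)
    n = len(a)
    return all(abs(ata[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

# Hypothetical non-negative muscle-coupling matrix with a zero diagonal.
coupling = [[0.0, 0.3, 0.1],
            [0.3, 0.0, 0.5],
            [0.1, 0.5, 0.0]]

symmetric = is_symmetric(coupling)    # True: entry (x, y) equals entry (y, x)
orthogonal = is_orthogonal(coupling)  # False: A^T A is not the identity
```

      A matrix of pairwise couplings is symmetric by construction because the coupling of muscle x with muscle y equals that of y with x, whereas orthogonality would require an additional constraint, as the reviewer noted.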

      Regarding the reviewer's point about the reference to MI after equation 1.3 of the previous paper where co-information is defined, we were referring collectively to both the task-relevant and task-irrelevant estimates analysed there as ‘MI estimates’, as both are derived from mutual information: the task-irrelevant estimate is the MI between two muscles conditioned on a task variable (conditional mutual information), and the task-relevant estimate is the difference between two MI values (co-I is a higher-order MI estimate). This removed the need to continuously refer to each separately throughout the paper, which may in its own way cause some confusion. For clarity, in the results of that paper we also provided context on how each MI estimate was obtained (see the beginning of the “Task-irrelevant muscle couplings”, “Task-redundant muscle couplings” and “Task-synergistic muscle couplings” results sections), referring throughout to the Venn diagrams depicting them (see Fig.1 of previous paper). In the present study, however, for brevity and focus we did not analyse task-irrelevant muscle interactions and so decided to focus our terminology on co-I (II), a higher-order MI estimate. We acknowledge that this may have caused some confusion but highlight the efforts made to communicate each measure throughout the previous and present study. We have explicitly pointed out this specific focus on task-dependent muscle couplings at the end of the introduction of the updated manuscript:

      “To do so, here we focussed our analysis on quantifying task-dependent muscle couplings (collectively referred to as II), extracting functionally-similar (i.e. redundant) and -complementary (i.e. synergistic) modules…”
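
      To make the relationship between these quantities concrete, the sketch below computes co-information for discrete toy variables as I(m<sub>x</sub>; m<sub>y</sub>) minus I(m<sub>x</sub>; m<sub>y</sub> | task). Several things here are assumptions: the plug-in (histogram) entropy estimator stands in for whatever estimator the authors used on continuous EMG, the sign convention (positive for net redundancy, negative for net synergy) may be the opposite of the manuscript's, and the toy variables are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

def co_information(x, y, task):
    """Task-relevant coupling: I(X;Y) - I(X;Y|task).

    Assumed sign convention: > 0 net redundant, < 0 net synergistic.
    """
    return mutual_information(x, y) - conditional_mutual_information(x, y, task)

# Toy case 1: both "muscles" copy the task variable (purely redundant).
task_a = [0, 1, 0, 1]
redundant_ii = co_information(task_a, task_a, task_a)   # +1 bit

# Toy case 2: the task is the XOR of two independent "muscles" (purely synergistic).
mx = [0, 0, 1, 1]
my = [0, 1, 0, 1]
task_b = [a ^ b for a, b in zip(mx, my)]
synergistic_ii = co_information(mx, my, task_b)          # -1 bit
```

      In the XOR case neither muscle alone says anything about the task, yet the pair determines it completely, which is the sense in which functionally-complementary interactions are described as synergistic.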

      (2) The authors should clarify what the following statement means: "Where a muscle interaction was determined to be net redundant/synergistic, their corresponding network edge in the other muscle network was set to zero."

      We acknowledge this sentence was unclear/misleading and have now clarified this statement in the following way:

      “This computation was performed on all unique m<sub>x</sub> and m<sub>y</sub> pairings, generating sparse symmetric matrices (A) (i.e. A = A<sup>T</sup>) composed separately of non-negative redundant and synergistic values (Fig.5).” We have also added a new figure (Fig.5) that illustrates this text graphically.
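
      One way to read the clarified sentence is that each muscle pair contributes its net value to exactly one of two networks and a zero to the other. The sketch below illustrates that reading; the sign convention (positive for net redundancy, negative for net synergy) and the example values are assumptions rather than details given in the response.

```python
def split_net_couplings(ii):
    """Split a signed, symmetric co-information matrix into two non-negative networks.

    Assumed convention: positive entries are net redundant, negative entries
    are net synergistic. Each pair is non-zero in at most one output network.
    """
    n = len(ii)
    redundant = [[0.0] * n for _ in range(n)]
    synergistic = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-coupling
            if ii[i][j] > 0:
                redundant[i][j] = ii[i][j]
            elif ii[i][j] < 0:
                synergistic[i][j] = -ii[i][j]
    return redundant, synergistic

# Hypothetical signed co-information values for three muscles.
ii_matrix = [[0.0, 0.4, -0.2],
             [0.4, 0.0, 0.0],
             [-0.2, 0.0, 0.0]]
red_net, syn_net = split_net_couplings(ii_matrix)
```

      Both outputs remain symmetric and non-negative, and they become sparse because an edge present in one network is zero in the other.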

      (3) It should be clarified what the 'm' values are in Equation 1.1. Are these the co-information values after the sparsification and applying the Louvain algorithm to the matrix 'A'? Furthermore, since each task will yield a different co-information value, how is the information from different tasks (r) being combined here?

      We thank the reviewer for their attention to detail. For clarity, at the related section of Equation 1.1, we have clarified that the input matrix is composed of co-I estimates:

      “The input matrix for PNMF consisted of the sparsified A on both affected and unaffected sides from all participants at both pre- and post-sessions concatenated in their vectorised forms. More specifically, the input matrix composed of redundant or synergistic values was configured such that the set of unique muscle pairings (1 … K) on affected and unaffected sides (m<sub>aff</sub> and m<sub>unaff</sub> respectively)…”.

      The co-I estimates in this input matrix are indeed those that survived sparsification in previous steps. The Louvain algorithm, however, is used only to determine the number of modules to extract: it has no direct impact on, and applies no transformation to, the co-I estimates, and is simply employed to derive an empirical input parameter for dimensionality reduction. We refer the reviewer to the following part of this paragraph where this is described:

      “The number of muscle network modules identified in this final consensus partition was used as the input parameter for dimensionality reduction, namely projective non-negative matrix factorisation (PNMF) (Fig.1(D)) (Yang & Oja, 2010). The input matrix for PNMF consisted of the sparsified A on both affected and unaffected sides from all participants at both pre- and post-sessions concatenated together in their vectorised form.”
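
      The construction of the PNMF input described here can be sketched as follows. The layout is an assumption: each observation (a participant, session and task) is taken to contribute one column made of the K unique muscle pairings on the affected side followed by the K pairings on the unaffected side. The function names and example values are hypothetical.

```python
def vectorise_unique_pairs(a):
    """Return the K = n(n-1)/2 upper-triangular entries of a symmetric matrix."""
    n = len(a)
    return [a[i][j] for i in range(n) for j in range(i + 1, n)]

def build_input_matrix(networks):
    """Stack vectorised networks into a (2K x observations) matrix.

    networks: list of (A_affected, A_unaffected) pairs, one per observation.
    """
    columns = [vectorise_unique_pairs(aff) + vectorise_unique_pairs(unaff)
               for aff, unaff in networks]
    return [list(row) for row in zip(*columns)]  # rows: pairings, columns: observations

# Hypothetical sparsified networks for three muscles (K = 3 unique pairings).
a_aff = [[0.0, 0.4, 0.0], [0.4, 0.0, 0.1], [0.0, 0.1, 0.0]]
a_unaff = [[0.0, 0.2, 0.3], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]]
input_matrix = build_input_matrix([(a_aff, a_unaff), (a_unaff, a_aff)])
```

      Because the matrices are symmetric with an empty diagonal, the upper triangle carries all of the information, so vectorising it avoids counting each muscle pair twice.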

      Finally, as the reviewer has mentioned, the co-I estimates from the same muscle pairings but for different tasks, experimental sessions and participants are indeed different, reflecting their task-specific tuning, changes with rehabilitation and individual differences. To combine these representations into low-dimensional components, we employed projective non-negative matrix factorisation (PNMF). As outlined in the previous paper and earlier work on this framework (O’Reilly & Delis, 2022), application of dimensionality reduction here can generate highly generalisable motor components, highlighting their ability to effectively represent large populations of participants, tasks and sessions, while allowing the interesting individual differences mentioned by the reviewer to be buffered into the corresponding activation coefficients. For this reason, these activation coefficients are the focus of the cluster analyses used in the present study to characterise the post-stroke cohort. We have explicitly provided this reason in the methods section of the updated manuscript:

      “We focussed on $a$ here as the extraction of population-level functional modules enabled the buffering of individual differences into the space of modular activations, making them an ideal target for identifying population structure.”

      (4) In general, I recommend improving the clarity of the Methods section, particularly by being more precise in defining the quantities that are being calculated. For example, the adjacency matrix should be defined clearly using co-information at the beginning, and explain how it is changed/used throughout the rest of the section.

      We thank the reviewer for their constructive advice and have gone to lengths to improve the clarity of the methods section. Firstly, we have addressed all the reviewer's comments on various specific sections of the methods, explaining more clearly the ‘why’ and ‘how’ of what was performed. Secondly, we have now included an additional figure illustrating how co-information was quantified at the network level and separated into redundant and synergistic values (see Fig.5 of updated manuscript). Finally, we have re-structured several paragraphs of the methods section to enhance flow with additional subheadings for clarity.

      (5) In the previous paper (O'Reilly & Delis, 2024), the authors applied a tensor decomposition to the interaction matrix and extracted both the spatial and temporal factors. In the current work, the authors simply concatenated the temporal signals and only chose to extract the spatial mode instead. The authors should clarify this choice.

      The reviewer is correct in that a different dimensionality reduction approach was employed in the previous paper. In the present study, we instead chose to employ projective non-negative matrix factorisation, as was employed in a preliminary paper on this framework (O’Reilly & Delis, 2022). This decision was made simply to maintain brevity and simplicity in the analysis and presentation of results as we introduce other tools to the framework (i.e. the clustering algorithm). Indeed, we could have just as easily employed the tensor decomposition to extract both spatial and temporal components; however, we believed the main takeaway points of this paper could be more easily communicated using spatial networks only. To clarify this difference for readers we have included the following in the methods section:

      “The choice of PNMF here, in contrast to the space-time tensor decomposition employed in the parent study (O’Reilly & Delis, 2024), was chosen simply to maintain brevity by focussing subsequent analyses on the spatial domain.”

      References

      Ó’Reilly D, Delis I. A network information theoretic framework to characterise muscle synergies in space and time. Journal of Neural Engineering. 2022 Feb 18;19(1):016031.

      O'Reilly D, Delis I. Dissecting muscle synergies in the task space. Elife. 2024 Feb 26;12:RP87651.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Both reviewers are concerned with the manuscript in its current form. They questioned the relevance of the current approach in providing functional or mechanistic explanations about the rehabilitation process of post-stroke patients. Our eLife Assessment would change if you include comparisons between your current method and classical ones, in addition to improving the description of your method to strengthen the evidence of its robustness.

      Reviewer #1 (Recommendations for the authors):

      There is a minor typographical error in Figure 2 ("compononents" should be corrected).

      This error has been rectified.

      Reviewer #2 (Recommendations for the authors):

      The authors should be able to address most of my concerns by providing a substantially improved version of the Methods section.

      See above responses to the reviewer's comments regarding the methods section.

      However, I would like the authors to explain in full detail (potentially including a simulation or power analysis) the procedure for estimating the co-information quantity, and to clarify whether it is robust given the sample size used in this paper.

      We refer the reviewer to our previous responses outlining with greater clarity the number of samples included in the estimation of co-I. We would also like to mention here that our framework does not make inferences on the statistical significance of individual muscle couplings (i.e. co-I estimates). Instead, these estimates are employed collectively for the sole purpose of pattern recognition. Nevertheless, to generate reliable estimates of the muscle couplings, we have employed a substantial number of samples for each co-I estimate (>20k samples in each variable), addressing the reviewer's main concern here.

    1. eLife Assessment

      This important work introduces a splitGFP-based labeling tool with an analysis pipeline for the synaptic scaffold protein bruchpilot, with tests in the adult Drosophila mushroom bodies, a learning center in the Drosophila brain. The evidence supporting the conclusions is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (brp) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the C-terminus of brp, exactly where brp is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of brp clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of brp expression levels on site using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. Further validation was provided by extending analysis to R8 photoreceptors that reside in the fly medulla. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that in a specific time frame after conditioning, the new analysis solidly revealed changes in brp levels at specific synapses that are associated with aversive learning. Additionally, brp levels were assessed in R8 photoreceptor terminals upon extended exposure to light.

      Strengths:

      Expression of splitGFP bound to brp enables intensity analysis of brp expression levels as exactly one GFP molecule is expressed per brp. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus brp label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and compellingly show that brp assessment can be achieved even with such densely packed active zones. The result that brp levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise compelling and strongly suggests that the tool and pipeline developed here can be used in an in vivo context. Thus, the tool and its uses have the potential to fundamentally advance protein analysis not only at the synapse but especially there.

      Weaknesses:

      The weaknesses I perceived originally were satisfactorily explained and refuted.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type-specific fluorescence-tagging approach using a CRISPR/Cas9-induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters at presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process that allows synapse profiles to be compared within single cells, cell types and MB compartments, and between different individuals. The aim is to analyze in more detail neuronal connectivity and circuits in this center of associative learning, which is notoriously difficult to investigate due to the density of cells and structures within the cells. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. In addition to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and Rab3 knock-down and detect consequent AZ reorganization.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will serve as an entry point for many future studies of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logic. Therefore, this approach is of high importance to the scientific community and represents a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are conclusive and well supported by the data presented and appropriate controls. As a comment that could possibly aid and strengthen the manuscript (but not required for acceptance of the manuscript): The experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The authors clearly validate the new on-locus construct with a genomic GFP insertion (qPCR, confocal and STED imaging of the brain with anti-BRP (Nc82), MB morphology and memory formation). It would be important to comment on the significant overall intensity decrease of anti-BRP (Nc82) in Fig. S1B (R57C10>BRP::rGFP), and possibly a Western Blot with a correlative antibody staining against BRP might help to show that BRP protein levels are not affected. Additionally, it would be important to state, at least in the Materials and Methods section, that the flies are not homozygous viable (and to offer an explanation) and to state that all experiments were performed with heterozygous flies.

    4. Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4 expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appears strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation. It is not resolved whether the AZs visualised here are simply tagged, or whether the constructs are incorporated as an active, functional part of the AZ.

      Comments on revisions:

      Apologies, I should have thought of this in the first round of review. An experiment I would suggest (and it is not a difficult one) to address the functionality of the marker: It is mentioned that the genetically tagged half of the construct is homozygous lethal. Can this be placed in trans to a brp null, with a neuronal UAS-expression of the other half of Brp-GFP - Are the animals then 1) alive, and 2) able to fly (brp mutants can't fly, hence the name 'crashpilot') - a rescue would suggest (and that is all that would be needed here) that the reconstituted brp-GFP has function.

      On another note, the paper keeps switching between different DAN-GAL4 lines. In 1H, 2B and 4A, there are informative cartoons showing the extension of the neurons for PPL1, APL and DPM neurons - could these be incorporated into figures 5, 6 and 7, and the supplementary figures, to help orient the reader? Ideally they would refer to a figure (in Fig 1?) showing the groups of DANs in the adult brain that are known to innervate the MBs (e.g. Fig 1 in Mao and Davis, Front in Neural Circuits 2009). I suggest this because I feel that this tool will be widely used, and if non-MB aficionados can follow what's being done here I feel it will be more widely accepted.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of BRP clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of BRP expression levels on site, using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels were associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure profiling can be inferred from only assessing BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP10 construct that is tagged to BRP in all BRP-expressing cells, independent of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that plastic changes are made visible in this study after an associative learning experiment with BRP intensity and cluster volume as read-out, but I would be reassured by direct measurement of synaptic plasticity with splitGFP directly connected to BRP, maybe at a different synapse that is more accessible.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that Brp is an important, but not the only, player in the active zone. We have included new data to demonstrate that split-GFP tagging does not severely affect the localization and plasticity of Brp and the function of synapses by showing: (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs; (3) activity-dependent Brp remodeling in R8 photoreceptors; (4) no defect in memory performance when labeling Brp::rGFP in KCs. These four lines of additional evidence further corroborate our approach to characterize endogenous Brp as a proxy of active zone structure.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type-specific fluorescence-tagging approach using a CRISPR/Cas9-induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing for the comparison of synapse profiles within single cells, cell types, MB compartments, and between different individuals. The aim is to analyse in more detail neuronal connectivity and circuits in this centre of associative learning. These are notoriously difficult to investigate due to the density of cells and structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. Next to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not contain any cloning strategy (gRNA, primer, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP ORF, this line needs a general validation, including controls (Western Blot and correlative antibody staining against BRP) showing that overall BRP expression is not compromised by the GFP insertion and that the protein localizes as BRP does in wild-type flies, that flies are viable and have no defects in locomotion or in learning and memory formation, and that MB morphology is not affected compared to wild-type animals.

      We thank the reviewer for suggesting these important validations. We included details of the design of the construct and insertion site to the Methods section, performed several new experiments to validate the split-GFP tagging of Brp, and present the data in the revision.

      First, to examine whether the transcription of the brp gene is unaffected by the insertion of GFP<sub>11</sub>, we conducted qRT-PCR to compare the brp mRNA levels between flies carrying brp::GFP<sub>11</sub>, UAS-GFP1-10 and flies carrying UAS-GFP1-10 alone, and found no difference (Figure 1 - figure supplement 1A).

      To further verify the effect of GFP<sub>11</sub> tagging at the protein level, we performed anti-Brp (nc82) immunohistochemistry of brains where GFP is reconstituted pan-neuronally. We found unaltered neuropile localization of nc82 signals (Figure 1 - figure supplement 1C). In presynaptic terminals of the mushroom body calyx, we found integration of Brp::rGFP into nc82 accumulation (Figure 1D). We performed super-resolution microscopy to verify the configuration of Brp::rGFP and confirmed the donut-shaped arrangement of Brp::rGFP in the terminals of motor neurons (see Wu, Eno et al., 2025 PLOS Biology), corroborating the nanoscopic assembly of Brp::rGFP at active zones (Kittel et al., 2006 Science).

      Furthermore, co-expression of RFP-tagged voltage-gated calcium channel alpha subunit Cacophony (Cac) and Brp::rGFP in PAM-γ5 dopaminergic neurons revealed strong presynaptic colocalization of their punctate clusters (Figure 1E), suggesting that rGFP tagging of Brp did not damage key protein assembly at active zones (Kawasaki et al., 2004 J Neuroscience; Kittel et al., 2006 Science).

      These lines of evidence suggest that the localization of endogenous Brp is barely affected by the C-terminal GFP<sub>11</sub> insertion or GFP reconstitution therewith. This is in line with a large body of studies confirming that the N-terminal region and coiled-coil domains, but not the C-terminal region, of Brp are necessary and sufficient for active zone localization (Fouquet et al., 2009 J Cell Biol; Oswald et al., 2010 J Cell Biol; Mosca and Luo, 2014 eLife; Kiragasi et al., 2017 Cell Rep; Akbergenova et al., 2018 eLife; Nieratschker et al., 2009 PLoS Genet; Johnson et al., 2009 PLoS Biol; Hallermann et al., 2010 J Neurosci). We nevertheless report homozygous lethality and found decreased immunoreactive signals in flies carrying the GFP<sub>11</sub> insertion (Figure 1 - figure supplement 1B).

      For these reasons, we use heterozygotes for all the experiments; these flies show no conspicuous defect in locomotion of the kind reported in the original study (Wagh et al., 2005 Neuron). To functionally validate the heterozygotes, we measured the aversive olfactory memory performance of flies where GFP reconstitution was induced in Kenyon cells using R13F02-GAL4. We found that these transgenes did not alter mushroom body morphology (Figure 7 - figure supplement 1) or memory performance as compared to wild-type flies (Figure 7 - figure supplement 2), suggesting the synapse function required for short-term memory formation is not affected by split-GFP tagging of Brp.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      (a) For BRP cluster segmentation, it is stated in the Materials and Methods that intensity threshold and noise tolerance were "set" - this setting has a large effect on the quantification, so it should be specified and the setting criteria named and justified (if set manually, how and why; if set automatically, to what). Additionally, if Python was used for "Nearest Neighbor" analysis, the code should be made available within this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      (b) To better evaluate the quality of both the imaging analysis and image presentation, it would be important to state whether the presented and analysed images are deconvolved. If so, at least one proof-of-principle comparison of an original and a deconvolved file should be shown and quantified to show the impact of deconvolution on the output quality, as this is central to this study.

      We thank the reviewer for suggesting these clarifications. We have added more description to the revised manuscript to clarify the segmentation settings, which were manually adjusted to optimize the F-score (previous Figure 1D, now moved to Figure 1 - figure supplement 5). We have included the code used for analyzing nearest neighbor distance, AZ density and local Brp density in the revised manuscript (Supplementary file 1), together with a pre-processed sample data sheet (Supplementary file 2).

      Regarding image deconvolution, we have clarified the differential use of deconvolved and not-deconvolved images in the revised manuscript. We have also included a quantitative evaluation of Richardson-Lucy iterative deconvolution (Figure 1 - figure supplement 4). We used 20 iterations due to only marginal FWHM improvement beyond this point (Figure 1 - figure supplement 4).
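
      The F-score-based tuning of the segmentation settings mentioned in this response can be illustrated with a minimal sketch. It is not the code in Supplementary file 1. It assumes that detected cluster centroids are compared with manually annotated centroids, that a detection counts as a true positive when it lies within a tolerance `max_dist` of a still-unmatched annotation (greedy matching), and that candidate settings are supplied as a dictionary. The function names, the tolerance and the matching rule are all hypothetical.

```python
import math


def f_score(detected, truth, max_dist):
    """F1 score of detected centroids against annotated centroids.

    Assumption: greedy one-to-one matching within max_dist; the matching
    rule actually used for the manuscript is not stated in this response.
    """
    unmatched = list(truth)
    tp = 0
    for d in detected:
        # Closest annotation that has not been matched yet (None if none left).
        best = min(unmatched, key=lambda t: math.dist(d, t), default=None)
        if best is not None and math.dist(d, best) <= max_dist:
            unmatched.remove(best)
            tp += 1
    fp = len(detected) - tp
    fn = len(truth) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(detections_by_threshold, truth, max_dist):
    """Return (setting, F-score) for the candidate setting with the highest F-score."""
    scores = {
        thr: f_score(det, truth, max_dist)
        for thr, det in detections_by_threshold.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

      A low threshold typically adds false positives and a high threshold adds false negatives, so the F-score gives a single number for trading the two off when the settings are adjusted by hand.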

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      (a) It would be important for the understanding and validation of the learning induced changes, if not (only) a ratio (of AZ density/local intensity) would be presented, but both values on their own, especially to allow a comparison to the quoted, previous AZ remodelling analysis quantifying BRP intensities (ref. 17, 18). It should be elucidated in more detail why only the ratio was presented here.

      We thank the reviewer for the suggestion on the presentation of learning-induced Brp remodeling. The reported values in Figure 7C are the correlation coefficients between AZ density and local intensity in each compartment, not their ratio. These results suggest that subcompartment-sized clusters of AZs with high Brp accumulation (Figure 6) undergo local structural remodeling upon associative learning (Figure 7). For clarity, we have added a schematic of this correlation and an example scatter plot to Figure 6. Unlike the previous studies (refs 17 and 18), we did not observe robust learning-dependent changes in the Brp intensity, possibly due to confounding factors such as overall expression levels and conditioning protocols, as described in the previous and following points, respectively.
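
      The distinction between a correlation coefficient and a ratio can be made concrete with a small sketch. It assumes, purely for illustration, that local AZ density is the number of clusters within a fixed radius of each cluster and that local intensity is the mean intensity of those clusters; the definitions used for Figures 6 and 7 may differ, and the function names and radius are hypothetical.

```python
import math


def local_metrics(points, intensities, radius):
    """Per-cluster local density and local mean intensity.

    Assumption: a neighbourhood is every cluster within `radius`
    (the cluster itself included), which is only one possible definition.
    """
    density, local_intensity = [], []
    for p in points:
        nb = [j for j, q in enumerate(points) if math.dist(p, q) <= radius]
        density.append(len(nb))
        local_intensity.append(sum(intensities[j] for j in nb) / len(nb))
    return density, local_intensity


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def density_intensity_correlation(points, intensities, radius):
    """One coefficient per compartment: r(local density, local intensity)."""
    density, local_intensity = local_metrics(points, intensities, radius)
    return pearson_r(density, local_intensity)
```

      A coefficient near 1 in a compartment would mean that densely packed AZs also tend to carry more Brp, whatever the absolute values are. A ratio of the two quantities would not capture this.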

      (b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      (c) Additionally, "controls" for the unpaired values - that is, flies receiving neither shock nor odour - would help to evaluate the unpaired control values in the different MB compartments.

      We use single odor conditioning because it is the simplest way to examine the effect of odor-shock association by comparing the paired and unpaired group. Standard differential conditioning with two odors contains unpaired odor presentation (CS-) even in the ‘paired’ group. We now show that single-odor conditioning induces memory that lasts one day as in differential conditioning (Figure 7B; Tully and Quinn, J Comp Phys A 1985).

      (d) The temporal resolution of the effect is very interesting (Figure 7D), and more time points, especially between 90 and 270 min, might yield interesting results.

      The sampling time points after training were chosen at approximately logarithmic intervals, as the memory decay is roughly exponential (Figure 7B). This transient remodeling is consistent with previous studies reporting that Brp plasticity is short-lived (Zhang et al., 2018 Neuron; Turrel et al., 2022 Current Biol).

      (e) Additionally, it would be very interesting and rewarding to have at least one additional assay, relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging or on a functional level by additional learning assays.

      We thank the reviewer for raising this important point. We have performed calcium imaging of KC presynaptic terminals to correlate the structure and function in another study (see Figure 2 in Wu, Eno et al., 2025 PLOS Biology for more detail). The basal presynaptic calcium pattern along the γ compartments is strikingly similar to the compartmental heterogeneity of Brp accumulation (see also Figure 2 in this study). Considering colocalization of other active-zone components, such as Cac (Figure 1E), we propose that the learning-induced remodeling of local Brp clusters should transiently modulate synaptic properties.

      In response to the other reviewers’ interest, we used Brp::rGFP to measure different forms of Brp-based structural plasticity upon constant light exposure in the photoreceptors and upon silencing rab3 in KCs. Since these experiments nicely reproduced the results of previous studies (Sugie et al., 2015 Neuron; Graf et al., 2009 Neuron), we believe the learning-induced plasticity of Brp clustering in KCs has a transient nature.

      Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4-expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment-specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

      We performed additional validation on the tool, including (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs (Figure 1D-E); (3) activity-dependent active zone remodeling in R8 photoreceptors (Figure 1F). These will be detailed in our point-by-point response below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors keep stating, they profile or assess synaptic structure by analyzing BRP localization, cluster volume, and intensity. However, I do not think that BRP cluster volume and intensity warrant an educated statement about presynaptic structure as a whole. I do not challenge the usefulness of BRP cluster analysis for synapse evaluation, but as there are so many more players involved in synaptic function, BRP analysis certainly cannot explain it all. This should at least be discussed.

      It is correct that Brp is not the only player in the active zone. We have included more discussion on the specific role of Brp (line 84 to 89) and other synaptic markers (line 250) and edited potentially misleading text.

      (2) I do see that changes in BRP expression were observed following associative learning, but is it certain, that synaptic plasticity is generally unaffected by the large GFP fluorophore? BRP is grabbing onto other proteins, both with its C- and N-termini. As the GFP is right before the stop codon, it should be at the N-terminus. How far could BRP function be hampered by this? Is there still enough space for other proteins to interact?

      We thank the reviewer for sharing these concerns. Here we provide three lines of evidence to demonstrate that the Brp assembly at active zones required for synaptic plasticity is unaffected by split-GFP tagging.

      First, we assessed olfactory memory of flies that have Brp::rGFP labeled in Kenyon cells and found the performance comparable to wild-type (Figure 7 - figure supplement 2), suggesting the Brp function required for olfactory memory (Knapek et al., J Neurosci 2011) is unaffected by split-GFP tagging.

      Second, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors. We have included this result into the revised manuscript (Figure 1F).

      To further validate protein interaction of Brp::rGFP, we focused on Rab3, as it was previously shown to control Brp allocation at active zones (Graf et al., 2009 Neuron). To this end, we silenced rab3 expression in Kenyon cells using RNAi and measured the intensity of Brp::rGFP clusters in γ Kenyon cells. As previously reported in the neuromuscular junction, we found that rab3 knock-down increased Brp::rGFP accumulation at the active zones, suggesting that Brp::rGFP reflects the interaction with Rab3. We have included all the new data in the revised manuscript (Figure 1 - figure supplement 3).

      (3) It may well be that not only active-zone-associated BRP is labeled but possibly also BRP molecules elsewhere in the neuron. I would like to see more validation, e.g., the percentage of tagged endogenous BRP associated with other presynaptic proteins.

      To determine to what extent Brp::rGFP clusters represent active zones, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters showed co-localization with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting most Brp::rGFP clusters represent functional AZs.
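
      A percentage of this kind can be obtained in several ways. The following is a minimal sketch of one of them, assuming that cluster centroids have been extracted for both channels and that a Brp::rGFP cluster is scored as co-localized when at least one Cac::tdTomato centroid lies within a tolerance `max_dist`. This criterion and the function name are hypothetical and are not taken from the manuscript.

```python
import math


def colocalized_fraction(brp_centroids, cac_centroids, max_dist):
    """Fraction of Brp clusters with a Cac centroid within max_dist.

    Assumption: centroid distance is the co-localization criterion;
    overlap-based criteria would be an equally plausible choice.
    """
    if not brp_centroids:
        raise ValueError("no Brp clusters supplied")
    hits = sum(
        1
        for b in brp_centroids
        if any(math.dist(b, c) <= max_dist for c in cac_centroids)
    )
    return hits / len(brp_centroids)
```

      The result depends on the chosen tolerance, so the value of `max_dist` (in the same units as the centroids) would need to be reported alongside any percentage computed this way.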

      (4) Z-size is ~200 nm, while x/y pixel size is ~75 nm during acquisition. How far down does the resolution go after deconvolution?

      The Z-step was 370 nm and the XY pixel size was 79 nm for image acquisition. We performed 20 iterations of Richardson-Lucy deconvolution using an empirical point spread function (PSF). We found that the full-width at half maximum (FWHM) of Brp::rGFP clusters improves only marginally beyond 20 iterations, at which point the XY FWHM is around 200 nm and the XZ FWHM is around 450 nm (Figure 1 - figure supplement 4).
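
      The relation between iteration number and FWHM can be illustrated with a one-dimensional toy example. This is a sketch only: it uses a synthetic Gaussian PSF (sigma of 2 pixels) and a noise-free point source instead of the empirical PSF and real image stacks, and all function names are hypothetical.

```python
import math


def convolve_same(signal, kernel):
    """Same-size 1D convolution with zero padding (kernel length must be odd)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - (k - half)
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations):
    """Plain Richardson-Lucy iteration starting from a flat estimate."""
    psf_flipped = psf[::-1]
    estimate = [sum(observed) / len(observed)] * len(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


def fwhm(profile, pixel_size):
    """FWHM of a single-peaked profile, linearly interpolated at half maximum.

    Assumes the profile falls below half maximum on both sides of the peak.
    """
    peak = max(profile)
    half = peak / 2
    i_peak = profile.index(peak)
    left = i_peak
    while left > 0 and profile[left] > half:
        left -= 1
    right = i_peak
    while right < len(profile) - 1 and profile[right] > half:
        right += 1

    def crossing(i_out, i_in):
        # Interpolate between a sample at or below half max and one above it.
        y0, y1 = profile[i_out], profile[i_in]
        return i_out + (half - y0) / (y1 - y0) * (i_in - i_out)

    return (crossing(right, right - 1) - crossing(left, left + 1)) * pixel_size


# Synthetic demo data (hypothetical): a point source blurred by a Gaussian PSF.
sigma = 2.0
psf = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-6, 7)]
psf = [w / sum(psf) for w in psf]
point_source = [0.0] * 41
point_source[20] = 100.0
observed = convolve_same(point_source, psf)
restored = richardson_lucy(observed, psf, iterations=20)
```

      Calling `fwhm(restored, pixel_size=79)` for several iteration counts gives a curve of FWHM (in nm, given the 79 nm pixel size stated above) against iterations. A plateau in such a curve is the kind of criterion that could justify stopping at a fixed number of iterations.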

      (5) Figure Legend 7: What is a "cytoplasm membrane marker"? Does this mean membrane-bound tdTom is sticking into the cytoplasm?

      We apologize for the typo and have corrected it to “plasma membrane marker”.

      (6) At the end of the introduction: "characterizing multiple structural parameters..." - which were these parameters? I was under the assumption that BRP localization, cluster volume, and intensity were assessed. I do not see how these are structural parameters. Please define what exactly is meant by "structural parameters".

      We apologize for the confusion. By “structural parameters”, we indeed referred to the volume, intensity and molecular density of Brp::rGFP clusters. We have revised the sentence to “Characterizing the distinct parameters and localization of Brp::rGFP cluster.”

      (7) Next to last sentence of the introduction: "Characterizing multiple structural parameters revealed a significant synaptic heterogeneity within single neurons and AZ distribution stereotypy across individuals." What do the authors mean by "significant synaptic heterogeneity"?

      By “synaptic heterogeneity”, we refer to the intracellular variability of active zone cytomatrices reported by Brp clusters. For instance, the intensities of Brp::rGFP clusters within Kenyon cell subtypes were variable among compartments (Figure 2). Intracellular variability of the Brp concentration of individual active zones was higher in DPM and APL neurons than Kenyon cells (Figure 3). These variabilities demonstrate intracellular synaptic heterogeneity. We have revised the sentence to be more specific to the different characters of Brp clusters.

      (8) I do not understand the last sentence of the introduction. "These cell-type-specific synapse profiles suggest that AZs are organized at multiple scales, ranging from neighboring synapses to across individuals." What do the authors mean by "ranging from neighboring synapses to across individuals"? Does this mean that even neighboring synapses in the same cell can be different?

      We have revised the sentence to “These cell-type-specific synapse profiles suggest that AZs are spatially organized at multiple scales, ranging from interindividual stereotypy to neighboring synapses in the same cells.”

      By “neighboring synapses”, we refer to the nearest neighbor similarity in Brp levels in some cell types (Figure 6A-C), and also the sub-compartmental dense AZ clusters with high Brp level in Kenyon cells (Figure 6D-H). By “across individuals”, we refer to the individually conserved active zone distribution patterns in some neurons (Figure 5).
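
      Nearest neighbor similarity can be sketched as follows. The example assumes that each cluster has a centroid and an intensity value, and it compares the mean absolute intensity difference between nearest neighbors with the same quantity over all pairs of clusters. This comparison, the brute-force search and the function names are illustrative assumptions and do not reproduce the analysis behind Figure 6A-C.

```python
import math
from itertools import combinations


def nearest_neighbors(points):
    """Index of the nearest other point for each point (brute force, O(n^2)).

    Requires at least two points.
    """
    nn = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nn.append(min(others, key=lambda j: math.dist(p, points[j])))
    return nn


def nn_distances(points):
    """Distance from each point to its nearest neighbor."""
    return [math.dist(points[i], points[j]) for i, j in enumerate(nearest_neighbors(points))]


def mean_nn_intensity_difference(points, intensities):
    """Mean |intensity difference| between each cluster and its nearest neighbor."""
    nn = nearest_neighbors(points)
    diffs = [abs(intensities[i] - intensities[j]) for i, j in enumerate(nn)]
    return sum(diffs) / len(diffs)


def mean_pairwise_intensity_difference(intensities):
    """Mean |intensity difference| over all pairs, as a simple reference value."""
    diffs = [abs(a - b) for a, b in combinations(intensities, 2)]
    return sum(diffs) / len(diffs)
```

      If neighboring AZs carry similar Brp levels, the nearest neighbor value is clearly smaller than the all-pairs reference. A shuffled-intensity control would be a more rigorous reference than the all-pairs mean used here.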

      (9) The title talks about cell-type-specific spatial configurations. I do not understand what is meant by "spatial configurations"? Do you mean BRP cluster volume? I think the title is a little misleading.

      By “spatial configuration”, we refer to the arrangement of Brp clusters within individual mushroom body neurons. This statement is based on our findings on the intracellular synaptic heterogeneity (see also response to comment #7). We have streamlined the text description in the revised manuscript for clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 3A: two exemplary AZs are compared here; a histogram comparing more AZs would help make the point that, in general, AZs of similar size have different BRP levels (intensities), and would show how much variation exists.

      We have included histograms for Brp::rGFP intensity and cluster volumes to Figure 3 in the revised manuscript.

      (2) Line 52: "endogenous synapses" is a confusing term; it's probably meant that the protein levels within the synapse are endogenous and not overexpressed. 

      We apologize for the confusion and have revised the term to “endogenous synaptic proteins.”

      (3) It is not clear from the Materials and Methods section, whether and where deconvolved or not-deconvolved images were used for the quantification pipeline. Please comment on this. 

      We have now revised the Method section to clarify how deconvolved or not-deconvolved images were differently used in the pipeline.

      (4) Line 664 (C) not bold.

      We have corrected the error.

      (5) 725 "Files" should be Flies.

      We have corrected the error.

      (6) 727 two times "first".

      We have corrected the error.

      (7) Figure 7. All (A) etc., not bold - there should be consistent annotation. 

      We want to thank the reviewer for the detailed proof and have corrected all the errors spotted.

      Reviewer #3 (Recommendations for the authors):

      (1) Has there been an expression of the construct in a non-neuronal cell? Astrocyte-like cell? Any glia? As some sort of control for background and activity?

      As the reviewer suggested, we verified the neuronal expression specificity of Brp::rGFP. Using R86E01-GAL4 and Amon-GAL4, we compared Brp::rGFP in astrocyte-like glia and neuropeptide-releasing neurons. In contrast to neurons, astrocyte-like glia showed no Brp::rGFP puncta in the neuropils, suggesting Brp::rGFP is specific to neurons. We have included this new dataset in the revised manuscript (Figure 1 - figure supplement 2).

      (2) Similarly, expression of the construct co-expressed with a channelrhodopsin, and induction of a 'learning'-like regime of activity, similarly in a control type of experiment, expression of an inwardly rectifying channel (e.g. Kir2.1) to show that increases in size of the BRP puncta are truly activity dependent? The NMJ may be an optimal neuron to use to see the 'donut' structures of the AZs and their increase with activity. Also, are these truly AZs we are seeing here? Perhaps try co-expressing cacophony-dsRed? If the GFP Puncta are active zones, then they should be surrounded by cacophony.

      We would like to clarify that we did not find Brp::rGFP size increase upon learning. Instead, we demonstrated that associative training transiently remodelled sub-compartment-sized AZ “hot spots” in Kenyon cells, indicated by the correlation of local intensity and AZ density (Figure 6-7).

      To demonstrate that split-GFP tagging does not affect activity-dependent plasticity associated with Brp, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the number of Brp::rGFP clusters in R8 terminals in the medulla compared with the constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors (Figure 1F).

      As the reviewer suggested, we performed STED microscopy on larval motor neurons and confirmed the donut-shaped arrangement of Brp::rGFP (Wu, Eno et al., PLOS Biol 2025).

      Also following the reviewer’s suggestion, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters co-localized with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting that most Brp::rGFP clusters represent functional AZs.

      (3) In the introduction: Intro, a sentence about BRP - central organiser of the active zone, so a key regulator of activity.

      We have added a few more sentences about the role of Brp in active zones to the revised manuscript.

      (4) Figure 1 E, line 650 'cite the resource here'. 

      We thank the reviewer for pointing out the error and we have corrected it.

      (5) Many readers may not be MB aficionados, and to make the data more accessible, perhaps use a cartoon of an MB with the cell bodies of the neurons around the MB expressing the constructs highlighted so that the reader can have a wider idea of the anatomy in relation to the MB.

      We appreciate these comments and have appended cartoons of the MB to figures to help readers understand the anatomy.

    1. eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

    2. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. Two types of learned associations are characterized, one being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify time-resolved SC and SR correlates, which are used to test properties of their dynamics.

      The conclusion is reached that SC and SR associations can independently and simultaneously guide behavior. This conclusion is based on results showing SC and SR correlates are: (1) not entirely overlapping in cross-decoding; (2) simultaneously observed on average over trials in overlapping time bins; (3) independently correlate with RT; and (4) have a positive within-trial correlation.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      I still have my concern from the first round that the decoders are overfit to temporally structured noise. As I wrote before, the SC and SR classes are highly confounded with phase (chunk of session). I do not see how the control analyses conducted in the revision adequately deal with this issue.

      In the figures, there are several hints that these decoders are biased. Unfortunately, the figures are also constructed in such a way that hides or diminishes the salience of the clues of bias. This bias and lack of transparency discourage trust in the methods and results.

      I have two main suggestions:

      (1) Run a new experiment with a design that properly supports this question.

      I don't make this suggestion lightly, and I understand that it may not be feasible to implement given constraints; but I feel that this suggestion is warranted. The desired inferences rely on successful identification of SC and SR representations. Solidly identifying SC and SR representations necessitates an experimental design wherein these variables are sufficiently orthogonalized, within-subject, from temporally structured noise. The experimental design reported in this paper unfortunately does not meet this bar, in my opinion (and the opinion of a colleague I solicited).

      An adequate design would have enough phases to properly support "cross-phase" cross-validation. Deconfounding temporal noise is a basic requirement for decoding analyses of EEG and fMRI data (see e.g., leave-one-run-out CV that is effectively necessary in fMRI; in my experience, EEG is not much different, when the decoded classes are blocked in time, as here). In a journal with a typical acceptance-based review process, this would be grounds for rejection.
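
      The "cross-phase" cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names (`leave_one_phase_out`, `untrainable_classes`) and the toy phase and class labels are hypothetical. It holds out every trial of one phase and reports which test classes never occur in the training phases, which is the situation that arises when a class is confined to a single phase.

```python
from collections import defaultdict


def leave_one_phase_out(phase_labels):
    """Yield (held_out_phase, train_idx, test_idx), holding out one whole phase per fold."""
    by_phase = defaultdict(list)
    for idx, phase in enumerate(phase_labels):
        by_phase[phase].append(idx)
    for held_out in sorted(by_phase):
        test_idx = by_phase[held_out]
        train_idx = [i for p in sorted(by_phase) if p != held_out for i in by_phase[p]]
        yield held_out, train_idx, test_idx


def untrainable_classes(class_labels, train_idx, test_idx):
    """Return test classes that never appear in the training set of a fold."""
    train_classes = {class_labels[i] for i in train_idx}
    test_classes = {class_labels[i] for i in test_idx}
    return sorted(test_classes - train_classes)


# Hypothetical toy session: the ISPC assignment flips in phase 2 and flips back in phase 3.
phases = [1, 1, 2, 2, 3, 3]
classes = ["red-MC", "blue-MI", "red-MI", "blue-MC", "red-MC", "blue-MI"]

folds = list(leave_one_phase_out(phases))
for held_out, train_idx, test_idx in folds:
    missing = untrainable_classes(classes, train_idx, test_idx)
    print(f"hold out phase {held_out}: classes missing from training = {missing}")
```

      In this toy layout, holding out phase 2 leaves its classes without any training examples, so a design would need each class to recur in at least two phases for every fold to be usable.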

      Please note that this issue of decoder bias would seem to weaken the rest of the downstream analyses that are based on the decoded values. For instance, if the decoders are biased, in the within-trial correlation analysis, how can we be sure that co-fluctuations along certain dimensions within their projected values are driven by signal or noise? A similar issue clouds the LMM decoding-RT correlations.

      (2) Increase transparency in the reporting of results throughout main text.

      Please do not truncate stimulus-aligned timecourses at time=0. Displaying the baseline period is very useful to identify bias, that is, to verify that stimulus-dependent conditions cannot be decoded pre-stimulus. Bias is most expected to be revealed in the baseline interval when the data are NOT baseline-corrected, which is why I previously asked to see the results omitting baseline correction. (But also note that if the decoders are biased, baseline-correcting would not remove this bias; instead, it would spread it across the rest of the epoch, while the baseline interval would, on average, be centered at zero.)

      Please use a more standard p-value correction threshold, rather than Bonferroni-corrected p<0.001. This threshold is unusually conservative for this type of study. And yet, despite this conservativeness, stimulus-evoked information can be decoded from nearly every time bin, including at t=0. This does not encourage trust in the accuracy of these p-values. Instead, I suggest using permutation-based cluster correction, with corrected p<0.05. This is much more standard and would therefore allow for better comparison to many other studies.
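
      For readers unfamiliar with the suggested correction, a minimal sketch of a one-sided, sign-flip cluster-mass permutation test is given below. It assumes the input is each subject's decoding accuracy minus chance at each time point; the function names, the cluster-forming threshold, and the toy data are illustrative assumptions rather than anything taken from the manuscript.

```python
import math
import random


def t_values(data):
    """One-sample t-value against zero at each time point; data[subject][time]."""
    n = len(data)
    out = []
    for t in range(len(data[0])):
        col = [row[t] for row in data]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_masses(tvals, threshold):
    """Return (start, end, summed t) for each run of contiguous supra-threshold points."""
    clusters, start, mass = [], None, 0.0
    for i, t in enumerate(tvals):
        if t > threshold:
            if start is None:
                start, mass = i, 0.0
            mass += t
        elif start is not None:
            clusters.append((start, i - 1, mass))
            start = None
    if start is not None:
        clusters.append((start, len(tvals) - 1, mass))
    return clusters


def cluster_permutation_test(data, threshold, n_perm=1000, seed=0):
    """Return (start, end, mass, corrected p) for each observed cluster."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(data), threshold)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in data:
            sign = rng.choice((-1, 1))  # one sign per subject, applied to the whole time course
            flipped.append([x * sign for x in row])
        masses = [m for _, _, m in cluster_masses(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    results = []
    for start, end, mass in observed:
        exceed = sum(m >= mass for m in null_max)
        results.append((start, end, mass, (1 + exceed) / (n_perm + 1)))
    return results


# Hypothetical data: 8 subjects x 6 time points, with an effect only at time points 2 and 3.
data = [
    [(-1) ** i * 0.5, (-1) ** i * 0.3, 1.0 + 0.1 * i, 1.2 + 0.1 * i,
     (-1) ** (i + 1) * 0.4, (-1) ** i * 0.2]
    for i in range(8)
]
clusters = cluster_permutation_test(data, threshold=2.0, n_perm=200)
```

      Because the null distribution is built from the maximum cluster mass per permutation, each reported p-value is already corrected across time points, so no additional Bonferroni step is needed.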

      I don't think these things should be done as control analyses, tucked away in the supplemental materials, but instead should be done as a part of the figures in the main text -- including decoding, RSA, cross-trial correlations, and RT correlations.

      Other issues:

      Regarding the analysis of the within-trial correlation of RSA betas, and "Cai 2019" bias:

      The correction that authors perform in the revision -- estimating the correlation within the baseline time interval and subtracting this estimate from subsequent timepoints -- assumes that the "Cai 2019" bias is stationary. This is a fairly strong assumption, however, as this bias depends not only on the design matrix, but also on the structure of the noise (see the Cai paper), which can be non-stationary. No data were provided in support of stationarity. It seems safer and potentially more realistic to assume non-stationarity.

      This analysis was included in the supplemental material. However, given that the correlation analysis presented in the Results is subject to the "Cai 2019" bias, it would seem to be more appropriate to replace that analysis, rather than supplement it.

      Regardless, this seems to be a moot issue, given that the underlying decoders seem to be overfit to temporally structured noise (see point above regarding weakening of downstream analyses based on decoder bias).

      Outliers and t-values:

      More outliers with beta coefficients could be because the original SD estimates from the t-values are influenced more by extreme values. When you use a threshold on the median absolute deviation instead of mean +/-SD, do you still get more outliers with beta coefficients vs t-values?
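
      The comparison asked for here can be sketched as below. This is an illustrative example with made-up values, not the authors' data; the function names are hypothetical, and the factor 1.4826 is the usual constant that scales the median absolute deviation (MAD) to match the SD under normality.

```python
import statistics


def outlier_fraction_sd(values, k=5.0):
    """Fraction of values farther than k sample SDs from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mean) > k * sd for v in values) / len(values)


def outlier_fraction_mad(values, k=5.0):
    """Fraction of values farther than k scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scaled = 1.4826 * mad  # makes the MAD comparable to an SD for normal data
    return sum(abs(v - med) > k * scaled for v in values) / len(values)


# Hypothetical coefficients: 21 well-behaved values plus one extreme value.
values = [i / 10 for i in range(-10, 11)] + [8.0]
print(outlier_fraction_sd(values), outlier_fraction_mad(values))
```

      In this toy case the extreme value inflates the SD enough to escape the mean ± 5 SD rule, while the MAD-based rule still flags it, which is why the two thresholds can rank betas and t-values differently.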

      Random slopes:

      Were random slopes (by subject) for all within-subject variables included in the LMMs? If not, please include them, and report this in the Methods.

    3. Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA they showed that the SC and SR representations were concurrently present in voltage signals and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design and the analyses are appropriate for the research questions.

      Strength:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect.

      (2) Linking the strength of RSA scores with behavioural measure is critical to demonstrating the functional significance of the task representations in question.

      Weakness:

      (1) The distinction between Phase 2 and Phase 1&3 behavioral results, specifically the opposite effect of MC/MI in congruent trials raises some concerns with regard to the effectiveness of the ISPC manipulation. Why do RTs and error rates under MC congruent condition in Phase 2 seem to be worse than MI congruent? Could there be other factors at play here, e.g. order effect? How does this potentially affect the neural analyses where trials from different phases were combined? Also, the manuscript does not mention whether there is counterbalancing for the color groups across participants, so far as I can tell.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

      We thank the editors and the reviewers for their positive assessment of our work and for providing us with an opportunity to strengthen this manuscript. Please see below our responses to each comment raised in the reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations. Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Thank you for acknowledging the strength in EEG decoding and design. We have addressed all your concerns raised below point by point.

      Weaknesses:

      (1a) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during phase 2, and is MC primarily only during Phase 3 (since phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.

      Thank you for raising this important issue. To test whether decoding might be confounded by temporally structured noise, we performed a control decoding analysis. As the reviewer correctly pointed out, cross-phase decoding is not possible due to the experimental design. Alternatively, to maximize temporal separation between the training and test data, we divided the EEG data in phase 2 and phase 1&3 into the first and second half chronologically. Phase 1 and 3 were combined because they share the same MC and MI assignments. We then trained the decoders on one half and tested them on the other half. Finally, we averaged the decoding results across all possible assignments of training and test data. The similar patterns (Supplementary Fig.1) observed confirmed that the decoding results are unlikely to be driven by temporally structured noise in the EEG data. The clarification has been added to page 13 of the revised manuscript.
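
      One way to read the split-half control described above is sketched below. It is an assumption-laden illustration rather than the authors' code: the function names and trial indices are hypothetical, and "all possible assignments" is interpreted as letting either chronological half of each phase group serve as the training half.

```python
from itertools import product


def chronological_halves(trial_order):
    """Split trial indices (already sorted by time) into a first and a second half."""
    mid = len(trial_order) // 2
    return trial_order[:mid], trial_order[mid:]


def split_half_assignments(groups):
    """Yield (train, test) index lists for every choice of training half per phase group."""
    halves = {name: chronological_halves(trials) for name, trials in groups.items()}
    names = sorted(halves)
    for choice in product((0, 1), repeat=len(names)):
        train, test = [], []
        for name, pick in zip(names, choice):
            train += halves[name][pick]
            test += halves[name][1 - pick]
        yield train, test


# Hypothetical trial indices; phases 1 and 3 are pooled because they share MC/MI assignments.
groups = {"phase2": [10, 11, 12, 13], "phase1and3": [0, 1, 2, 3, 20, 21]}
assignments = list(split_half_assignments(groups))
```

      Decoding accuracy would then be computed once per assignment and averaged. Note that the training and test halves still come from the same phases, so this separates trials in time without separating classes from phase.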

      (1b) The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      As detailed in the response to your next comment, some new results using data without baseline correction show a narrower time window of above-chance decoding. We speculate that the remaining results of long-lasting above-chance decoding could be attributed to trials with slow responses (some responses were made near the response deadline of 1500 ms). Additionally, as shown in Figure 6a, the long-lasting above-chance decoding seems to be driven by color and congruency representations. Thus, it is also possible that the binding of color and congruency contributes to decoding. This interpretation has been added to page 17 of the revised manuscript.

      (1c) Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      As the reviewer mentioned, baseline-corrected data may introduce bias to the decoding results. Thus, we cited the van Driel et al. (2021) paper in the revised manuscript to justify the use of EEG data without baseline correction in the decoding analyses (Page 27 of the revised manuscript), and re-ran all decoding analyses accordingly. The new analyses yielded largely similar results (Fig. 2, 4, 6 and 8 in the revised manuscript) with the following exceptions: a narrower time window for separable SC and SR subspaces (Fig. 4b), a narrower time window for concurrent representations of SC and SR (Fig. 6a-b), and a wider time window for the correlations of SC/SR representations with RTs (Fig. 8).

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncracies could contribute: 10.1016/j.neuroimage.2013.03.039).

      Thank you for this question. To test what dimensions are shared between SC and SR subspaces, we first identified which factors can be shared across SC and SR subspaces. For SC, the eight conditions are the four colors × ISPC. Thus, the possible shared dimensions are color and ISPC. Additionally, because the four colors and words are divided into two groups (e.g., red-blue and green-yellow, counterbalanced across subjects, see Methods), the group is a third potential shared dimension. Similarly, for SR decoders, potential shared dimensions are word, ISPC and group. Note that each class in SC and SR decoders has both congruent and incongruent trials. Thus, congruency is not decodable from SC/SR decoders and hence unlikely to be a shared dimension in our analysis. To test the effect of sharing for each of the potential dimensions, we performed RSA on decoding results of the SC decoder trained on SR subspace (SR | SC) (Supplementary Fig. 4a) and the SR decoder trained on SC subspace (SC | SR) (Supplementary Fig. 4b), where the decoders indicated the decoding accuracy of shared SC and SR representations. In the SC classes of SR | SC, the words red and blue were mixed within the same class, as were the words yellow and green. The similarity matrix for “Group” of SR | SC (Supplementary Fig. 4a) shows the comparison between two word groups (red & blue vs. yellow & green). The similarity matrix for “Group” of SC | SR (Supplementary Fig. 4b) shows the comparison between two color groups (red & blue vs. yellow & green).

      The RSA results revealed that the contributions of group to the SC decoder (Supplementary Fig. 5a) and the SR decoder (Supplementary Fig. 5b) were significant. Meanwhile, a wider time window showed a significant effect of color on the SC decoder (approximately 100 - 1100 ms post-stimulus onset, Supplementary Fig. 5a) and a narrower time window showed a significant effect of word on the SR decoder (approximately 100 - 500 ms post-stimulus onset, Supplementary Fig. 5b). However, we found no significant effect of ISPC on either the SC or the SR decoder. We also performed the same analyses on response-locked data from the time window -800 to 200 ms. The results showed shared representation of color in the SC decoder (Supplementary Fig. 5c) and group in both decoders (Supplementary Fig. 5c-d). Overall, the above results demonstrated that color, word and group information are shared between SC and SR subspaces.

      Lastly, we would like to stress that our main hypothesis for the cross-subspace decoding analysis is that SR and SC subspaces are not identical. This hypothesis was supported by lower decoding accuracy for cross-subspace than within-subspace decoders, and it justifies the subsequent analyses that treated SC and SR as separate representations.

      We have added the interpretation to page 13-14 of the revised manuscript.

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.

      Thank you for raising this important issue. Because the bias comes from the covariance between the regressors and the same GLM was applied to all time points in our analysis, we assume that the inflation would be similar at different time points. Therefore, we calculated the correlation of SC and SR betas ranging from -200 to 0 ms relative to stimulus onset as a baseline (i.e., no SC or SR representation is expected before the stimulus onset) and compared the post-stimulus onset correlation coefficients against this baseline. We hypothesized that if the positive within-trial correlation of SC and SR betas resulted from simultaneous representation instead of inflation, we should observe a significantly higher correlation than the baseline. To examine this hypothesis, we first performed the linear discriminant analysis (Supplementary Fig. 7a) and RSA regression (Supplementary Fig. 7b) on the -200 - 0 ms window relative to stimulus onset. We then calculated the average r<sub>baseline</sub> of SC and SR betas in that time window for each participant (group results at each time point are shown in Supplementary Fig. 7c) and computed the relative correlation at each post-stimulus onset time point as Fisher-z (r) - Fisher-z (r<sub>baseline</sub>). Finally, we performed a one-sample t test at the group level on the baseline-corrected correlation coefficients with Bonferroni correction. The results (Fig. 6c) showed a significantly more positive correlation from 100 - 500 ms post-stimulus onset compared with baseline, supporting our hypothesis that the positive within-trial correlation of SC and SR betas arises from simultaneous representation rather than inflation. The related interpretation was added to page 17 of the revised manuscript.
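
      The baseline correction described above amounts to the following per-participant computation. This is a minimal sketch with hypothetical function names and made-up correlation values; it follows the stated order of operations (average r over the baseline window first, then Fisher-z transform and subtract).

```python
import math


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(r)


def baseline_corrected(r_post, r_baseline_window):
    """Return Fisher-z(r) - Fisher-z(mean baseline r) for each post-stimulus time point."""
    r_base = sum(r_baseline_window) / len(r_baseline_window)
    z_base = fisher_z(r_base)
    return [fisher_z(r) - z_base for r in r_post]


# Hypothetical values for one participant.
r_baseline_window = [0.10, 0.12, 0.08]   # SC-SR beta correlations from -200 to 0 ms
r_post = [0.10, 0.30]                    # correlations at two post-stimulus time points
corrected = baseline_corrected(r_post, r_baseline_window)
```

      The subtraction removes only a constant offset, so it corrects the bias only if that bias is the same before and after stimulus onset.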

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms poststim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      Thanks for this question. We now pair each stimulus-locked EEG analysis in the manuscript with a response-locked analysis. To control for RT variations among trial types, when using the linear mixed model (LMM) to predict RTs from trial-wise RSA results, we included a separate intercept for each of the eight trial types in SC or SR. Furthermore, at each time point, we only included trials that had not yet generated a response (for stimulus-locked analysis) or had already started (for response-locked analysis). All the results (Fig. 3, 5, 7, 9 in the revised manuscript) support our hypothesis. We added these details to page 31 of the revised manuscript.
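
      The trial-inclusion rule can be illustrated with the short sketch below. The function names and RT values are hypothetical, and whether a trial whose RT falls exactly on the time point is kept is an assumption made here for concreteness.

```python
def stimulus_locked_trials(rts_ms, t_ms):
    """Indices of trials with no response yet at t_ms after stimulus onset."""
    return [i for i, rt in enumerate(rts_ms) if rt > t_ms]


def response_locked_trials(rts_ms, t_ms):
    """Indices of trials whose stimulus was already on screen at t_ms relative to the response.

    In response-locked time the stimulus appears at -rt, so the trial has started once t_ms >= -rt.
    """
    return [i for i, rt in enumerate(rts_ms) if -rt <= t_ms]


# Hypothetical reaction times (ms) for three trials.
rts_ms = [450, 700, 1200]
print(stimulus_locked_trials(rts_ms, 500))    # trials still awaiting a response at +500 ms
print(response_locked_trials(rts_ms, -600))   # trials already under way 600 ms before the response
```

      Applying the mask per time point means the number of contributing trials shrinks at late stimulus-locked and early response-locked time points, which is worth keeping in mind when comparing effect sizes across the epoch.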

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trialtypes exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

      To improve clarity, we revised the first paragraph of the SC and SR association subspace analysis to list the conditions for each of the SC and SR decoders and to explain in more detail how separability can be tested by cross-decoding between SC and SR subspaces. The revised paragraph now reads:

      “Prior to testing whether controlled and non-controlled associations were represented simultaneously, we first tested whether the two representations were separable in the EEG data.

      In other words, we reorganized the 16 experimental conditions into 8 conditions for SC (4 colors × MC/MI, while collapsing across SR levels) and SR (4 words × 2 possible responses per word, while collapsing across SC levels) associations separately. If SC and SR associations are not separable, it follows that they encode the same information, such that both SC and SR associations can be represented in the same subspace (i.e., by the same information encoded in both associations). For example, because (1) the word can be determined by the color and congruency and (2) the most-likely response can be determined by color and ISPC, the SR association (i.e., association between word and most-likely response) can in theory be represented using the same information as the SC association. On the other hand, if SC and SR associations are separable, they are expected to be represented in different subspaces (i.e., the information used to encode the two associations is different). Notably, if some, but not all, information is shared between SC and SR associations, they are still separable by the unique information encoded. In this case, the SC and SR subspaces will partially overlap but still differ in some dimensions. To summarize, whether SC and SR associations are separable is operationalized as whether the associations are represented in the same subspace of EEG data. To test this, we leveraged the subspace created by the LDA (see Methods). Briefly, to capture the subspace that best distinguishes our experimental conditions, we trained SC and SR decoders using their respective aforementioned 8 experimental conditions. We then projected the EEG data onto the decoding weights of the LDA for each of the SC and SR decoders to obtain its respective subspace. We hypothesized that if SC and SR subspaces are identical (i.e., not separable), SC/SR decoding accuracy should not differ by which subspace (SC or SR) the decoder is trained on. 
      For example, SC decoders trained in SC subspace should show similar decoding performance as SC decoders trained in SR subspace. On the other hand, if SC and SR association representations are in different subspaces, the SC/SR subspace will not encode all information for SR/SC associations. As a result, decoding accuracy should be higher using its own subspace (e.g., decoding SC using the SC subspace) than using the other subspace (e.g., decoding SC using the SR subspace). We used cross-validation to avoid artificially higher decoding accuracy for decoders using their own subspace (see Methods).” (Page 11-12).

      We also explicitly tested what information is shared between SC and SR representations (see response to comment #2). Lastly, to help the readers navigate the EEG results, we added a section “Overview of EEG analysis” to summarize the EEG analysis and their relations in the following manner:

      “EEG analysis overview. We started by validating that the 16 experimental conditions (8 unique stimuli × MC/MI) were represented in the EEG data. Evidence of representation was provided by above-chance decoding of the experimental conditions (Fig. 2-3). We then examined whether the SC and SR associations were separable (i.e., whether SC and SR associations were different representations of equivalent information). As our results supported separable representations of SC and SR association (Fig. 4-5), we further estimated the temporal dynamics of each representation within a trial using RSA. This analysis revealed that the temporal dynamics of SC and SR association representations overlapped (Fig. 6a-b, Fig. 7a-b). To explore the potential reason behind the temporal overlap of the two representations, we investigated whether SC and SR associations were represented simultaneously as part of the task representation, independently from each other, or competitively/exclusively (e.g., on some trials only SC association was represented, while on other trials only SR association was represented). This was done by assessing the correlation between the strength of SC and SR representations across trials (Fig. 6c, Fig. 7c). Lastly, we tested how SC and SR representations facilitated performance (Fig.8-9).” (Page 8-9).

      Minor suggestions:

      (6) I'd suggest using single-trial RSA beta coefficients, not t-values, as they can be more stable (it's a t-value based on 16 observations against 9 or so regressors.... the SE can be tiny).

      Thank you for your suggestion. To choose between using betas and t-values, we calculated the proportion of outliers (defined as values beyond mean ± 5 SD) for each predictor of the design matrix and each subject. We found that outliers were less frequent for t-values than for beta coefficients (t-values: mean = 0.07%, SD = 0.009%; beta-values: mean = 0.19%, SD = 0.033%). Thus, we decided to stay with t-values.

      (7) Instead of prewhitening the RTs before the HLM with drift terms, try putting those in the HLM itself, to avoid two-stage regression bias.

      Thank you for your suggestion. Because our current LMM included each of the eight trial types in SC or SR as separate predictors with their own intercepts (as mentioned above), adding regressors of trial number and mini blocks (1-100 blocks) introduced collinearity (as ISPC flipped during the experiment). We therefore excluded these regressors from the current LMM (Page 31).

      (8) The text says classical MDS was performed on decoding *accuracy* - is this accurate?

      We now clarify in the manuscript that it is the decoders’ probabilistic classification results (Page 28).

      (9) At a few points, it was claimed that a negative correlation between SC and SR would be expected within single trials, if the two were temporally dissociable. Wouldn't it also be possible that they are not correlated/orthogonal?

      We agree with the reviewer and revised the null hypothesis in the cross-trial correlation analysis to include no correlation as SC and SR association representations may be independent from each other (Page 17, 22).

      Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Thank you for your positive feedback. We hope our responses below address your concerns.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations would be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other one in the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors will be valuable to raising confidence in the findings.

      Thank you for this important point. The proportions of variability explained, shown in Author response table 1 below, indicated relatively high correlations of SC/SR with Color and Identity. We agree that it is impossible to fully orthogonalize them. To address the issue of collinearity, we performed a control RSA by removing predictors highly correlated with others. Specifically, we calculated the variance inflation factor (VIF) for each predictor. The Identity predictor had a high VIF of 5 and was removed from the RSA. All other predictors had VIFs < 4 and were kept in the RSA. The results (Supplementary Fig. 6) showed patterns similar to the results with the Identity predictor, suggesting that the findings are not significantly influenced by collinearity. We have added the interpretation to page 17 of the revised manuscript.
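
      As a rough illustration of the VIF screening described above, the following Python sketch computes the VIF of one predictor as 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The function names and toy predictor vectors are hypothetical and are not taken from the authors' analysis code; in a real RSA each predictor would be a vectorized model RDM.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(columns, j):
    """VIF of predictor j, i.e. 1 / (1 - R^2) from regressing it on the other predictors.

    Fails with a division by zero if predictor j is perfectly collinear with the rest.
    """
    y = columns[j]
    n = len(y)
    # Design matrix (stored column-wise): intercept plus every other predictor.
    others = [[1.0] * n] + [c for k, c in enumerate(columns) if k != j]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in others] for ci in others]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in others]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, others)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_tot / ss_res  # equals 1 / (1 - R^2)


# Hypothetical toy predictors: an orthogonal pair and a strongly correlated pair.
orthogonal = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
correlated = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]]
```

      Orthogonal predictors give a VIF of 1, whereas the correlated pair gives a much larger value; which cutoff counts as "high" is a choice made by the analyst.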

      Author response table 1.

      Proportion of variability explained (r²) of RSA predictors.

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.

      Thank you for pointing out this important issue with how our operationalization captures the concept in question. In the revised manuscript, we clarified that the stimulus-response (SR) association is the link between the word and the most-likely response (i.e., not necessarily the actual response on the current trial). This association is likely to be encoded based on statistical learning over several trials. On each trial, the association is updated based on the stimulus and the actual response. Over multiple trials, the accumulated association will be driven towards the most-common (i.e., most-likely) response. In our ISPC manipulation, a color is presented in mostly congruent/incongruent (MC/MI) trials, which will also pair a word with a most-likely response. For example, if the color blue is MC, the color blue, which leads to the response blue, will co-occur with the word blue with high frequency. In other words, the SR association here is between the word blue and the response blue. As the actual response is not part of the SR association, in the RDM two trial types with different responses may share the same SR association, as long as they share the same word and the same ISPC manipulation, which, by the logic above, will produce the same most-likely response. These clarifications have been added to page 4 and 29 of the revised manuscript.

      In the revised manuscript (Page 17), we addressed how much the correlated SC and SR predictors in the RDM could affect the correlation analysis between SC and SR association representation strength. Specifically, we conducted the RSA using the same GLM on EEG data prior to stimulus onset (Supplementary Fig. 7a-b). As no SC and SR associations are expected to be present before stimulus onset, the correlation between SC and SR representation would serve as a baseline of inflation due to correlated predictors in the GLM (Supplementary Fig. 7c, also see comment #3 of R1). The SC-SR correlation coefficients following stimulus onset were then compared to the baseline to control for potential inflation (Fig. 6c). Significantly above-baseline correlation was still observed between ~100-500 ms post-stimulus onset, providing support for the hypothesis that SC and SR are encoded in the same task representation.

      Minor suggestions:

      (3) Overall, I find that calling SC-controlled and SR-uncontrolled representations unwarranted. How is the level controlledness defined? Both are essentially types of statistical expectation that provide contextual information for the block of tasks. Is one really more automatic and requires less conscious processing than the other? More background/justification could be provided if the authors would like to use these terms.

      Following your advice, we have added more discussion on how controlledness is conceptualized in this work and in the literature, which reads:

      “We consider SC and SR as controlled and uncontrolled respectively based on the literature investigating the mechanism of the ISPC effect. The SC account posits that the ISPC effect results from conflict and involves conflict adaptation, which requires the regulation of attention or control (Bugg & Hutchison, 2013; Bugg et al., 2011; Schmidt, 2018; Schmidt & Besner, 2008). On the other hand, the SR account argues that the ISPC effect does not require conflict adaptation but instead reflects contingency learning. That is, the response can be directly retrieved from the association between the stimulus and the most-likely response without top-down regulation of attention or control. As more empirical evidence emerged, researchers advocating the control view began to acknowledge the role of associative learning in cognitive control regarding the ISPC effect (Abrahamse et al., 2016). SC association has been thought to include both automatic processes that are fast and resource-saving and controlled processes that are flexible and generalizable (Chiu, 2019). Overall, we do not intend to claim that SC is entirely controlled or SR is completely automatic. We use SC-controlled and SR-uncontrolled representations to align with the original theoretical motivation and to highlight the conceptual difference between SC and SR associations.” (Page 24-25)

      (4) Figures 3c and d: the figures could benefit from more explanation of what they try to show to the readers. Also for 3d, the dimensions were aligned with color sets and congruencies, but word identities were not linearly separable, at least for the first 3 axes. Shouldn't one expect that words can be decoded in the SR subspace if word-response pairs were decodable (e.g., Figure 3b)?

      Thank you for the insightful observation. We have now clarified that Fig. 3c and d in the original manuscript (Fig. 4c and d in the current manuscript) aim to show how each of the 8 trial types is represented in the SC and SR subspaces. The MDS approach we used for visualization tries to preserve dissimilarity between trial types when projecting data from a high-dimensional to a low-dimensional space. However, such a projection may also make patterns that are linearly separable in the high-dimensional space not linearly separable in the low-dimensional space. For example, if the word blue has two points (-1, -1) and (1, 1) and the word red has two points (-1, 1) and (1, -1), they are not linearly separable in the 2D space. Yet, if they are projected from a 3D space with coordinates of (-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1) and (1, -1, 0.1), the two words are linearly separable using the third dimension. Thus, a better way to test whether words can be linearly separated in the SR subspace is to perform RSA in the original high-dimensional space. We performed the RSA with word (Supplementary Fig. 2) on the SR decoder trained on the SR subspace. Note that in Fig. 3c and d of the original manuscript (Fig. 4c and d in the current manuscript) there are two pairs of words that are not linearly separable: red-blue and yellow-green. Thus, we specifically tested the separability within the two pairs using one predictor for each pair, as shown in Supplementary Fig. 2. The results showed that within both word pairs individual words were represented above chance level (Supplementary Fig. 3). Considering that the decoders are linear, this finding indicates linear separability of the word pairs in the original SR subspace. The clarification has been added to page 13 (the end of the second paragraph) of the revised manuscript.
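
      The blue/red example above can be checked with a small Python sketch. It uses a plain perceptron as a stand-in for "a linear classifier"; this is an illustrative assumption and not the decoder used in the manuscript, and the function and variable names are hypothetical. The four points and their word labels are the ones given in the response.

```python
def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if a perceptron finds a separating hyperplane within max_epochs.

    Returning False only means no separator was found in the allotted epochs;
    for the 2D configuration below, no separator exists at all (XOR layout).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


# Coordinates from the example: the first two points are "blue", the last two "red".
points_3d = [(-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1), (1, -1, 0.1)]
labels = [-1, -1, 1, 1]

# Dropping the third coordinate mimics the projection to the 2D display space.
points_2d = [p[:2] for p in points_3d]

separable_in_3d = perceptron_separates(points_3d, labels)
separable_in_2d = perceptron_separates(points_2d, labels)
```

      In 3D the small offset on the third axis is enough for the perceptron to converge, whereas in 2D the two words sit on opposite diagonals and no line can split them.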

      References

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728. doi:10.1037/bul0000047.

      Bugg, J. M., & Hutchison, K. A. (2013). Converging evidence for control of color-word Stroop interference at the item level. Journal of Experimental Psychology: Human Perception and Performance, 39(2), 433-449. doi:10.1037/a0029145.

      Bugg, J. M., Jacoby, L. L., & Chanani, S. (2011). Why it is too early to lose control in accounts of item-specific proportion congruency effects. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 844-859. doi:10.1037/a0019957.

      Chiu, Y.-C. (2019). Automating adaptive control with item-specific learning. In Psychology of Learning and Motivation (Vol. 71, pp. 1-37).

      Schmidt, J. R. (2018). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin & Review, 26(3), 753-771. doi:10.3758/s13423-018-1520-z.

      Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(3), 514-523. doi:10.1037/0278-7393.34.3.514.

    1. eLife Assessment

      Stearns and Poletti present a technically impressive study that aims to uncover a deeper understanding of microsaccade function: their role in perceptual modulation and the associated temporal dynamics. The question is useful, and advances prior work by adding temporal granularity. However, the strength of the evidence is currently incomplete. Additional analysis is needed to control for the effects of endogenous attention and to demonstrate changes in perceptual performance.

    2. Reviewer #1 (Public review):

      Summary:

      Using high-precision eyetracking, the authors measure foveolar sensitivity modulations before, during, and after instructed microsaccades to a centrally cued orientation stimulus.

      Strengths:

      The article is clearly written, and the stimulus presentation method is sophisticated and well-established. The data provide interesting insights that will be useful for comparisons between trans-saccadic and trans-microsaccadic sensitivity modulations.

      Weaknesses:

      Nonetheless, I have major concerns regarding the interpretation of the measured time courses (in particular, inconsistencies in distinguishing enhancement from suppression), the attempt to disentangle these effects from endogenous attention shifts, and the overstatement of the findings' novelty.

      (1) Overstatement of novelty

      The authors motivate their study by stating that "the temporal dynamics of these pre-microsaccadic modulations remain unknown" (l. 55-56). However, Shelchkova & Poletti (2020) already report a microsaccade-aligned sensitivity time course. I understand that the present study uses shorter target durations and thus provides a more resolved estimate. Nonetheless, a fairer characterization of the study's novelty would be that observers' discrimination performance is continuously measured across the pre-, intra-, and post-movement interval, within the same observers and experimental design. Relatedly, the authors state that it is unclear whether pre-microsaccadic sensitivity modulations reflect "suppression at the non-foveated location, enhancement at the microsaccade target, or both" (l. 70). Guzhang et al. (2024) examined the spatial spread of pre-microsaccadic sensitivity modulations by measuring performance at the PRL, the movement target, and several other equidistant locations. They report that "whereas fine spatial vision is enhanced at the microsaccade goal location, it drops at the very center of gaze". The current authors' reasoning seems to be that performances at locations that are neither the target nor the PRL may behave differently. Why would that be the case? If my understanding is correct, I would recommend incorporating these clarifications into the motivation paragraph, so that readers less familiar with the literature do not overestimate the novelty of the findings. Moreover, and related to point 3, I am unsure if the current analyses provide decisive evidence to distinguish enhancement from suppression, as claimed by the authors.

      (2) Distinction from endogenous attention

      To "rule out the possible influence of covert attention" (l. 232), the authors compute a cue-aligned in addition to the movement-aligned performance time course. A difference in alignment cannot rule out the influence of a certain mechanism; it can only dilute it. Just like endogenous attention may contribute to the movement-aligned time course, movement preparation will necessarily contribute to the cue-aligned time course, since these timelines are intrinsically correlated: as the trial progresses, observers will be in later and later stages of saccade preparation. For this and several additional reasons, an effect in the cue-aligned time course is in fact expected and, in my view, clearly present (see below). As the authors themselves note, endogenous attention has been shown to operate within the foveola and should therefore be engaged in the present experiment in addition to movement-related attentional shifts (unless the authors believe that specific design features, e.g., stimulus timing, preclude its involvement?). Regardless of the theoretical considerations, the empirical data show a pronounced, near-linear increase in performance at the target location, with d′ doubling from approximately 1 to 2. Although the interaction between condition and time does not reach significance (p = 0.09), this result should not be taken as conclusive evidence against a plausible and perhaps expected contribution of endogenous attention. I suggest an additional analysis that could more directly address these issues. In previous work (Rolfs & Carrasco, 2012; Kroell & Rolfs, 2025; see Figure 3), the relative contributions of cue-aligned influences and pre-saccadic attention were disentangled by reweighting each data point according to its position on both the cue-locked and saccade-locked timelines. Applied to the present study, the authors could compute, for each cue-to-target offset bin, its proportional contribution to each pre-movement time bin. Microsaccade-locked sensitivities could then be reweighted based on these proportions. As a result, each movement-locked time bin would contain equal contributions from all cue-locked time bins, effectively isolating the effect of microsaccade preparation.
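
      One possible reading of the reweighting idea above is sketched below in Python. It is a simplification under stated assumptions: performance is proportion correct rather than d′, bins are pre-assigned integer labels, and the function name and trial format are hypothetical rather than taken from the cited studies. Each movement-locked bin is summarized as the unweighted mean of its per-cue-bin performances, so every cue-locked bin that occurs in it contributes equally regardless of how many trials it holds.

```python
from collections import defaultdict


def reweighted_performance(trials):
    """Equal-weight cue-locked bins within each movement-locked bin.

    trials: iterable of (cue_bin, move_bin, correct) with correct in {0, 1}.
    Returns {move_bin: mean over cue bins of the proportion correct}.
    """
    cells = defaultdict(list)
    for cue_bin, move_bin, correct in trials:
        cells[(move_bin, cue_bin)].append(correct)

    per_move = defaultdict(list)
    for (move_bin, _cue_bin), outcomes in cells.items():
        per_move[move_bin].append(sum(outcomes) / len(outcomes))

    return {move_bin: sum(vals) / len(vals) for move_bin, vals in per_move.items()}


# Hypothetical trials: movement bin 0 is dominated by cue bin 0 (three correct trials)
# and contains a single incorrect trial from cue bin 1.
example_trials = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (1, 0, 0)]
raw = sum(c for _, _, c in example_trials) / len(example_trials)
reweighted = reweighted_performance(example_trials)[0]
```

      In this toy case the raw proportion correct is 0.75 because cue bin 0 dominates, while the reweighted value is 0.5 because both cue bins count equally.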

      (3) Interpretation and analysis of the time course

      (3.1) Discrimination before microsaccade onset

      In lines 151-153, the authors state "While the enhancement at the target location did not reach significance relative to baseline, the impairment at the non-target location did", suggesting that pre-movement sensitivity advantages for information presented at the target location are due to a decrease in performance at the non-target location and not an enhancement at the target location per se. After analyzing the difference between the two locations, the authors state, "These results show that approximately 100 milliseconds before microsaccade onset, discrimination rapidly improved at the intended target location while decreasing at the non-target location." (l. 159-161). How is the statement that discrimination performance rapidly improved (which is repeated throughout the manuscript) justified by the results?

      More generally, the authors may benefit from applying bootstrapping or permutation-based analyses to their data. Such approaches would, for example, allow direct comparisons between congruent and incongruent conditions at every individual time point in Figure 3B and may be more sensitive to temporally confined sensitivity variations while requiring fewer assumptions than analyses based on manually segregated temporal bins and aggregate measures. If enhancement at the target location does not reach significance even in these analyses, all corresponding statements should be removed throughout the manuscript. The term "enhancement" should then be rephrased as "detection advantage" or "relative performance benefit" to emphasize the contrast to enhancement effects classically associated with pre-saccadic attention shifts.
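
      To make the permutation suggestion concrete, here is a minimal Python sketch of an exact sign-flip permutation test on per-observer differences (congruent minus incongruent) at a single time point. The function name and the example values are hypothetical, and the sketch does not handle correction across time points (for example, a cluster-based procedure), which a full time-resolved analysis would need.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip permutation p-value for a mean paired difference.

    differences: one value per observer (condition A minus condition B).
    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    observed = abs(sum(differences))
    total = 0
    at_least_as_extreme = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, differences)))
        if permuted >= observed - 1e-12:  # tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical d' differences for six observers at one time point.
example_differences = [0.5, 0.4, 0.6, 0.3, 0.7, 0.5]
p_value = sign_flip_p_value(example_differences)
```

      With six observers who all show an effect in the same direction, only the all-positive and all-negative sign assignments are as extreme as the data, which gives the smallest attainable two-sided p-value of 2/64.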

      Relatedly, the authors state that pre-microsaccadic enhancement peaks around 70 ms before microsaccade onset, which is earlier than sensitivity enhancements preceding large-scale saccades that often increase monotonically up until movement onset. The authors suggest potential reasons for this in the Discussion, yet an additional one seems conceivable based on Figure 3B. Performances at both the cue-congruent and incongruent location decrease leading up to the movement, reaching values even below their early baselines around 100 ms and 25 ms before movement onset for the incongruent and congruent location, respectively. A spatially non-specific decline that drives sensitivities toward a common absolute minimum may thus dictate the time course of detection advantages. In other words, a spatially widespread decrease in foveolar sensitivity likely contributes to both "suppression" at the non-target location and the decrease in "enhancement" at the target location. If this general decrease is due to saccadic suppression, as the authors suggest, it appears to exert a much more pronounced influence on sensitivity modulations than it does before large-scale saccades (which is interesting). Are there other findings suggesting an increased magnitude of micro-saccadic (as compared to saccadic) suppression? Another potentially related phenomenon is the decrease in pre-saccadic foveal detection performances reported twice before (Hanning & Deubel, 2022; Kroell & Rolfs, 2022). It is possible that whatever mechanism triggers this decrease is engaged by the preparation of microsaccadic and saccadic motor programs alike. In any case, I would ask the authors to acknowledge this general decrease early on to clarify that any currently significant advantage for the target location relies on varied degrees of suppression, and not on true enhancement similar to pre-saccadic attention shifts.

      Moreover, in Figure 3C, the final 25 ms before microsaccade onset are excluded from the aggregate measure, presumably since including this interval substantially reduces the effect size. Since the last 25 ms before movement onset is the interval most commonly associated with saccadic suppression, I think that this choice can be justified. Nonetheless, it should be mentioned explicitly in the main text. On a minor note, the authors state that "Performance (evaluated as percent of correct responses) was averaged within a 50 millisecond sliding window, advancing in 1 ms steps (with 24 ms overlap)". Why is the overlap not 49 ms?
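
      The arithmetic behind the overlap question can be spelled out in a few lines of Python. The window boundaries and helper names are hypothetical; only the 50 ms width, the 1 ms step, and the 24 ms overlap come from the quoted sentence. Windows are treated as half-open intervals.

```python
def windows(start_ms, stop_ms, width_ms, step_ms):
    """Half-open windows [t, t + width) that fit inside [start, stop)."""
    return [(t, t + width_ms) for t in range(start_ms, stop_ms - width_ms + 1, step_ms)]


def overlap_ms(a, b):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# 50 ms windows advancing in 1 ms steps: consecutive windows share width - step ms.
one_ms_steps = windows(-200, 0, 50, 1)
overlap_with_1ms_step = overlap_ms(one_ms_steps[0], one_ms_steps[1])

# A 24 ms overlap between consecutive 50 ms windows would instead imply a 26 ms step.
coarse_steps = windows(-200, 0, 50, 26)
overlap_with_26ms_step = overlap_ms(coarse_steps[0], coarse_steps[1])
```

      Under these definitions a 1 ms step yields a 49 ms overlap between consecutive windows, so the reported 24 ms figure is only consistent with a different step size or a different definition of overlap.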

      (3.2) Discrimination during the microsaccade:

      The authors state that "in the "during" trials the target must be presented during the peak speed of the microsaccade." Since the target was presented for 50 ms and the average microsaccade duration was around 60 ms, this implies that the intra-microsaccadic condition includes many trials in which the target overlapped with the pre- or post-movement fixation interval. Were there not enough trials to isolate purely intra-microsaccadic presentations? Are the results descriptively comparable?

      (4) Additional analyses

      Several additional analyses could strengthen the authors' conclusions. If there are enough trials in which observers erroneously saccaded to the uncued (i.e., wrong) location, these trials could experimentally isolate the influence of pre-microsaccadic attention, assuming that endogenous attention went to the cued location. In addition, the authors speculate whether differences in saccadic and microsaccadic movement latencies may underlie the differences in perceptual time courses. The latency distributions provided in the manuscript look sufficiently broad, such that intra-individual variation could be harnessed to explore this question. Do sensitivity time courses differ before microsaccades with shorter vs. longer latencies?

      (5) Clarifications regarding the design

      At 50 ms, the duration of the to-be discriminated stimulus, although shorter than in previous investigations, is still rather long. What is the reason for this? I would encourage the authors to state in the main text that the duration of the analyzed/plotted time bins is often shorter than the stimulus duration (i.e., there is some overlap between bins that likely introduces smoothing). In Figure 3A, it would be helpful to plot raw data points computed from non-overlapping bins on top of the moving-window estimates, to allow readers to assess the degree of smoothing and potential temporal delays introduced by this analysis. Moreover, I wonder whether the abrupt onset of the target unmasked by flickering noise masks might induce saccadic inhibition, which would manifest as a transient dip in saccade execution probability. The distributions shown in Figure 2B appear too smoothed or fitted to clearly reveal such a dip. How exactly are all distributions in the manuscript computed (e.g., binning, smoothing, fitting procedures)? Finally, on a minor note, explicitly stating on line 105 that two different orientations can be presented at the cued and non-cued location would help avoid potential confusion.

    3. Reviewer #2 (Public review):

      Summary and overall evaluation:

      The authors assessed how visual discrimination of stimuli in the foveola changes before, during, and after small instructed eye movements (in the "micro" range). Consistent with (and advancing) related prior work, their main finding regards a pre-saccadic modulation of visual performance at the saccade target vs. the opposite location. This pre-saccadic modulation in foveal vision peaks ~70 ms prior to the instructed small saccade.

      Strengths:

      The study uses an impressive, technically advanced set-up and zooms in on peri-saccadic modulations in visual acuity at the micro scale. The findings build on related prior findings from the literature on smaller and larger eye movements and add temporal granularity over prior work from the same lab. The writing is easy to follow, and the figures are clear.

      Weaknesses:

      At the same time, the findings remain relatively empirical in nature and do not profoundly advance theoretical understanding beyond adding valuable granularity to existing knowledge. Relevant prior literature could be better introduced and acknowledged. In addition, there remain concerns regarding potential cue-driven attentional influences that may confound the reported effects (leaving the possibility that the reported effects may be related to cue-driven attention, rather than saccade planning/execution per se). There are also some issues regarding specific statistical inferences. I detail these points below.

      Major Points:

      (1) Novelty framing and introduction of relevant prior literature

      At times, this study is introduced as if no prior study explored the time course of changes in visual perception surrounding small (micro) saccades. Yet, it appears that a prior study from the same lab, using a very similar task, already showed a time course (Figure 5 in Shelchkova & Poletti, 2020). While this study is discussed in the introduction, it is not mentioned that at least some pre-saccade time course was already reported there, albeit a more crude one than the one in the current article. Moreover, the 2013 study by Hafed also specifically looked at "peri-microsaccade modulation in visual perception" and also already showed a temporal modulation that peaked ~50 ms before microsaccade onset. I appreciate how the current study differs in a number of ways (focusing on visual acuity in the foveola), but I was nevertheless surprised to see the first reference to this relevant prior finding in the discussion (and without any elaboration). Though more recent, the same could be argued for the 2025 study by Bouhnik et al. on pre-microsaccade modulations in visual processing in V1, which, like the Hafed study, is first mentioned only in the discussion. Perhaps these studies could be introduced in the paragraph starting at line 48, or in the next paragraph, to do better justice to the existing literature on this topic when motivating the study. This would likely also help to better point out the major advances provided by the current study.

      Relatedly, in Shelchkova & Poletti (PNAS, 2020), an apparently similar congruency effect on performance was reported >200 ms before saccade onset, as evident from Fig 5 in that article. How should readers reconcile this with the current findings? Ideally, the authors would not only acknowledge that such a time course was already reported previously, but also discuss the discrepancies between these findings further: why may the performance effects appear much earlier in this prior study compared to in the current study, where the congruency effect emerges only ~100 ms prior to the instructed small saccade?

      (2) Saccade- or cue-driven? (assumption that attention is unaltered in failed saccade trials)

      Because the authors used a cue to instruct saccade direction, it remains a possibility that the reported modulations in visual performance may be driven directly by the spatial cue (cue-related attentional allocation), rather than the instructed small saccade per se. While the authors are clearly aware of this potential confound, questions remain regarding the convincingness of the presented control analyses. In my view, a more compelling control would require an additional experiment.

      The central argument against a cue-locked (purely attentional) modulation is the absence of a performance modulation in so-called "failed" saccade trials. However, a key assumption here is that putative cue-driven attention was unaltered in these trials. This is never verified and, in my opinion, highly unlikely. Rather, trials with failed microsaccades could very well be the result of failing to process the cue in the first place (indeed, if the task is to make a saccade to the cue, failure to make a saccade equates to failure to perform the task). In such trials, any putative cue-driven influences over spatial attention would also be expected to be substantially reduced. Accordingly, just because failed saccade trials show little performance modulation does not rule out cue-driven attention effects, because attention may also have "failed" in these failed saccade trials. The control for potential cue-driven attention effects would be more convincing if the authors included a condition with the same cues, where participants are simply not instructed to make any saccades to the cues. Unfortunately, such an experimental condition appears not to have been included here. The authors may still consider adding such a control experiment.

      Another argument against a cue-driven effect is that the authors found no interaction with time in the cue-locked data, whereas they did find such an interaction in the saccade-locked data. However, the lack of significance in the cue-locked data but significance in the saccade-locked data is not strong evidence against a cue-driven influence. Statistically, there is no direct comparison here, and more importantly, with longer delays, the cue-locked data may also start to show a dip (this could potentially be tested by the authors if they have enough trials available to extend their cue-locked analysis further in time). Indeed, exogenous attention, that may have been automatically evoked by the spatial cue, is known to be transient and to eventually even reverse after a brief initial facilitation (see e.g., Klein TiCS, 2000).

      Finally, the authors consistently refer to "endogenous" attention (starting at line 221) when addressing potential cue-driven attention confounds. However, because the cue is not predictive, but is a spatial cue that differs in a bottom-up manner between left and right cues, "exogenous" attention is a more likely confound here in my view. Specifically, the spatial cue may automatically trigger attention in the direction of the target location it points to (and such exogenous effects would be expected even for unpredictive cues).

      (3) Benefit and cost, or just cost?

      Line 151 states that no statistically significant benefit for the saccade target was found compared to the neutral baseline. Yet, the claim throughout the article is distinct, such as in line 159: "These results show that approximately 100 milliseconds before microsaccade onset, discrimination rapidly improved at the intended target location". I do not question the robustness of the congruency effect, but the authors should be more careful when inferring "improved" perception at the target location because, as far as I could tell (as well as in the authors' own writing in line 151), this is not substantiated statistically when compared to the neutral baseline.

      Related to this point, in Figure 3B, it would be informative to also see the average performance in the neutral cue condition (for example, as a straight line as in some other figures). This would help to better appreciate the relative benefits and/or costs compared to the neutral condition, also in the time-resolved data.

      (4) Statistical inference for the comparison between failed and non-failed trials

      Currently, the lack of modulation in the failed saccade trials hinges on a null effect. It would be stronger to support the claims with a significant difference in the congruency effect between failed and non-failed trials. Indeed, lack of significance in failed saccade trials does by itself not constitute valid evidence that the congruency effect is larger in saccade compared to failed saccade trials. For this, a significant interaction between saccade-trial-type (failed/non-failed) and congruency (congruent/incongruent) should be established (see e.g., Nieuwenhuis et al., Nat Neurosci, 2011).
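
      The interaction contrast requested above can be sketched in Python as a difference of differences per observer, tested against zero with a one-sample t statistic. The data layout, function name, and example numbers are hypothetical; the sketch returns only the t value and degrees of freedom, and looking up the p-value (or using a permutation test instead) is left to the analyst.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_t(executed, failed):
    """t statistic for the trial-type x congruency interaction.

    executed, failed: per-observer (congruent, incongruent) performance pairs,
    in the same observer order. The contrast for each observer is the congruency
    effect in executed-saccade trials minus that in failed-saccade trials.
    Returns (t, degrees_of_freedom).
    """
    contrast = [(c_exec - i_exec) - (c_fail - i_fail)
                for (c_exec, i_exec), (c_fail, i_fail) in zip(executed, failed)]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / sqrt(n))
    return t, n - 1


# Hypothetical per-observer performance (e.g., d') for four observers.
executed_trials = [(2.0, 1.0), (2.2, 1.0), (1.8, 1.0), (2.1, 0.9)]
failed_trials = [(1.5, 1.4), (1.4, 1.5), (1.6, 1.5), (1.5, 1.5)]
t_value, dof = interaction_t(executed_trials, failed_trials)
```

      A reliable positive contrast would indicate that the congruency effect is larger in executed than in failed saccade trials, which is the comparison that a null result within the failed trials alone cannot establish.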

      (5) Time window justification

      While the authors nicely depict their data across the full time axis, all statistics are currently performed on data extracted from specific time windows. How exactly were these time windows determined and justified? Likewise, how were the specific times picked for visualizing and statistically quantifying the data in e.g., Figures 3D and E? It would be reassuring to add justification for these specific time windows and/or to verify (using follow-up analyses) that the presented results are robust when different time windows are chosen.

      (6) Microsaccade definition

      Microsaccades are explicitly defined as being below half a degree. This appears rather arbitrary and rigid. Does the size of saccades not ultimately depend on the task and stimulus (e.g., Otero-Millan et al., PNAS, 2013) rather than being a fixed biological property? Perhaps this could be stated less rigidly, such as by stating how microsaccades are often observed below 0.5 degrees.

      (Relatedly, one may wonder whether the type of instructed saccades that the authors studied here involves the same type of eye movements as the type of fixational microsaccades that have been the focus of ample prior studies. However, I recognize that this specific reflection may open a debate that is beyond the scope of this article.)

    1. eLife Assessment

      This important study identifies a novel role for Hes5+ astrocytes in modulating the activity of descending pain-inhibitory noradrenergic neurons from the locus coeruleus during stress-induced pain facilitation. The role of glia in modulating neurological circuits including pain is poorly understood, and in that light, the role of Hes5+ astrocytes in this circuit is a key finding with broader potential impacts. This work is supported by convincing evidence, albeit somewhat limited by the indirect nature of the evidence linking adenosine to nearby neuronal modulation, and possible questions on the population specificity of the transgenic approach.

    2. Reviewer #1 (Public review):

      Review of the revised submission:

      I thank the authors for their detailed consideration of my comments and for the additional data, analyses, and clarifications they have incorporated. The new behavioral experiments, quantification of targeted manipulations, and expanded methodological details strengthen the manuscript and address many of my initial concerns. While some questions remain for future work, the authors' careful responses and the additional evidence provided help resolve the main issues I raised, and I am generally satisfied with the revisions.

      Review of original submission:

      Summary

      In this article, Kawanabe-Kobayashi et al. aim to examine the mechanisms by which stress can modulate pain in mice. They focus on the contribution of noradrenergic (NA) neurons of the locus coeruleus (LC). The authors use acute restraint as a stress paradigm and find that, following one hour of restraint stress, mice display mechanical hypersensitivity. They show that restraint stress causes the activation of LC NA neurons and the release of NA in the spinal cord dorsal horn (SDH). They then examine the spinal mechanisms by which LC→SDH NA produces mechanical hypersensitivity. The authors provide evidence that NA can act on alpha1ARs expressed by a class of astrocytes defined by the expression of Hes5 (Hes+). Furthermore, they find that NA, presumably through astrocytic release of ATP following NA action on alpha1ARs of Hes+ astrocytes, can cause an adenosine-mediated inhibition of SDH inhibitory interneurons. They propose that this disinhibition mechanism could explain how restraint stress can cause the mechanical hypersensitivity they measured in their behavioral experiments.

      Strengths:

      (1) Significance. Stress profoundly influences pain perception; resolving the mechanisms by which stress alters nociception in rodents may explain the well-known phenomenon of stress-induced analgesia and/or facilitate the development of therapies to mitigate the negative consequences of chronic stress on chronic pain.

      (2) Novelty. The authors' findings reveal a crucial contribution of Hes+ spinal astrocytes in the modulation of pain thresholds during stress.

      (3) Techniques. This study combines multiple approaches to dissect circuit, cellular, and molecular mechanisms including optical recordings of neural and astrocytic Ca2+ activity in behaving mice, intersectional genetic strategies, cell ablation, optogenetics, chemogenetics, CRISPR-based gene knockdown, slice electrophysiology, and behavior.

      Weaknesses:

      (1) Mouse model of stress. Although chronic stress can increase sensitivity to somatosensory stimuli and contribute to hyperalgesia and anhedonia, particularly in the context of chronic pain states, acute stress is well known to produce analgesia in humans and rodents. The experimental design used by the authors consists of a single one-hour session of restraint stress followed by 30 min to one hour of habituation and measurement of cutaneous mechanical sensitivity with von Frey filaments. This acute stress behavioral paradigm corresponds to the conditions in which the clinical phenomenon of stress-induced analgesia is observed in humans, as well as in animal models. Surprisingly, however, the authors measured that this acute stressor produced hypersensitivity rather than antinociception. This discrepancy is significant and requires further investigation.

      (2) Specifically, is the hypersensitivity to mechanical stimulation also observed in response to heat or cold on a hotplate or coldplate?

      (3) Using other stress models, such as a forced swim, do the authors also observe acute stress-induced hypersensitivity instead of stress-induced antinociception?

      (4) Measurement of stress hormones in blood would provide an objective measure of the stress of the animals.

      (5) Results:

      (a) Optical recordings of Ca2+ activity in behaving rodents are particularly useful to investigate the relationship between Ca2+ dynamics and the behaviors displayed by rodents.

      (b) The authors report an increase in Ca2+ events in LC NA neurons during restraint stress: Did mice display specific behaviors at the time these Ca2+ events were observed such as movements to escape or orofacial behaviors including head movements or whisking?

      (c) Additionally, are similar increases in Ca2+ events in LC NA neurons observed during other stressful behavioral paradigms versus non-stressful paradigms?

      (d) Neuronal ablation to reveal the function of a cell population.

      (e) The proportion of LC NA neurons and LC→SDH NA neurons expressing DTR-GFP and ablated should be quantified (Figures 1G and J) to validate the methods and permit interpretation of the behavioral data (Figures 1H and K). Importantly, the nocifensive responses and behavior of these mice in other pain assays in the absence of stress (e.g., hotplate) and a few standard assays (open field, rotarod, elevated plus maze) would help determine the consequences of cell ablation on processing of nociceptive information and general behavior.

      (f) Confirmation of LC NA neuron function with other methods that alter neuronal excitability or neurotransmission instead of destroying the circuit investigated, such as optogenetics or chemogenetics, would greatly strengthen the findings. Optogenetics is used in Figure 1M, N but excitation of LC→SDH NA neuron terminals is tested instead of inhibition (to mimic ablation), and in naïve mice instead of stressed mice.

      (g) Alpha1Ars. The authors noted that "Adra1a mRNA is also expressed in INs in the SDH".

      (h) The authors should comprehensively indicate what other cell types present in the spinal cord and neurons projecting to the spinal cord express alpha1Ars and what is the relative expression level of alpha1Ars in these different cell types.

      (i) The conditional KO of alpha1Ars specifically in Hes5+ astrocytes and not in other cell types expressing alpha1Ars should be quantified and validated (Figure 2H).

      (j) Depolarization of SDH inhibitory interneurons by NA (Figure 3). The authors bath-applied NA, which presumably activates all NA receptors present in the preparation.

      (k) The authors' model (Figure 4H) implies that NA released by LC→SDH NA neurons leads to the inhibition of SDH inhibitory interneurons by NA. In other experiments (Figure 1L, Figure 2A), the authors used optogenetics to promote the release of endogenous NA in SDH by LC→SDH NA neurons. This approach would investigate the function of NA endogenously released by LC NA neurons at presynaptic terminals in the SDH and at physiological concentrations and would test the model more convincingly compared to the bath application of NA.

      (l) As for other experiments, the proportion of Hes+ astrocytes that express hM3Dq, and the absence of expression in other cells, should be quantified and validated to interpret behavioral data.

      (m) Showing that the effect of CNO is dose-dependent would strengthen the authors' findings.

      (n) The proportion of SG neurons for which CNO bath application resulted in a reduction in recorded sIPSCs is not clear.

      (o) A1Rs. The specific expression of Cas9 and guide RNAs, and the specific KD of A1Rs, in inhibitory interneurons but not in other cell types expressing A1Rs should be quantified and validated.

      (6) Methods:

      It is unclear how fiber photometry is performed using an "optic cannula" during restraint stress while mice are in a 50 ml Falcon tube (as shown in Figure 1A).

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of spinal astrocytes in mediating stress-induced pain hypersensitivity, focusing on the LC (locus coeruleus)-to-SDH (spinal dorsal horn) circuit and its mechanisms. The authors aimed to delineate how LC activity contributes to spinal astrocytic activation under stress conditions, explore the role of noradrenaline (NA) signaling in this process, and identify the downstream astrocytic mechanisms that influence pain hypersensitivity.

      The authors provide strong evidence that 1-hour restraint stress-induced pain hypersensitivity involves the LC-to-SDH circuit, where NA triggers astrocytic calcium activity via alpha1a adrenoceptors (alpha1aRs). Blockade of alpha1aRs on astrocytes-but not on Vgat-positive SDH neurons-reduced stress-induced pain hypersensitivity. These findings are rigorously supported by well-established behavioral models and advanced genetic techniques, uncovering the critical role of spinal astrocytes in modulating stress-induced pain.

      However, the study's third aim-to establish a pathway from astrocyte alpha1aRs to adenosine-mediated inhibition of SDH-Vgat neurons-is less compelling. While pharmacological and behavioral evidence is intriguing, the ex vivo findings are indirect and lack a clear connection to the stress-induced pain model. Despite these limitations, the study advances our understanding of astrocyte-neuron interactions in stress-pain contexts and provides a strong foundation for future research into glial mechanisms in pain hypersensitivity.

      Strengths:

      The study is built on a robust experimental design using a validated 1-hour restraint stress model, providing a reliable framework to investigate stress-induced pain hypersensitivity. The authors utilized advanced genetic tools, including retrograde AAVs, optogenetics, chemogenetics, and subpopulation-specific knockouts, allowing precise manipulation and interrogation of the LC-SDH circuit and astrocytic roles in pain modulation. Clear evidence demonstrates that NA triggers astrocytic calcium activity via alpha1aRs, and blocking these receptors effectively reduces stress-induced pain hypersensitivity.

      Weaknesses:

      The study offers mainly indirect evidence for astrocyte-released adenosine acting on SDH-VGAT neurons. The potential contributions of astrocyte-derived D-serine and adenosine to different spinal neuron subtypes, as well as the transient "dip" in astrocytic calcium following LC optostimulation, merit further clarification in future work once appropriate tools become available.

      Comments on revisions:

      The authors have thoroughly addressed my previous comments, resolving most of the points I raised except those noted in the "Weaknesses" section above. I understand that some of these aspects will require future tool development.

    4. Reviewer #3 (Public review):

      Summary

      This is an exciting and timely study addressing the role of descending noradrenergic systems in nocifensive responses. While it is well-established that spinally released noradrenaline (aka norepinephrine) generally acts as an inhibitory factor in spinal sensory processing, this system is highly complex. Descending projections from the A6 (locus coeruleus, LC) and the A5 regions typically modulate spinal sensory processing and reduce pain behaviours, but certain subpopulations of LC neurons have been shown to mediate pronociceptive effects, such as those projecting to the prefrontal cortex (Hirshberg et al., PMID: 29027903).

      The study proposes that descending cerulean noradrenergic neurons potentiate touch sensation via alpha-1 adrenoceptors on Hes5+ spinal astrocytes, contributing to mechanical hyperalgesia. This finding is consistent with prior work from the same group (dd et al., PMID:). However, caution is needed when generalising about LC projections, as the locus coeruleus is functionally diverse, with differences in targets, neurotransmitter co-release, and behavioural effects. Specifying the subpopulations of LC neurons involved would significantly enhance the impact and interpretability of the findings.

      Strengths

      The study employs state-of-the-art molecular, genetic, and neurophysiological methods, including precise CRISPR and optogenetic targeting, to investigate the role of Hes5+ astrocytes. This approach is elegant and highlights the often-overlooked contribution of astrocytes in spinal sensory gating. The data convincingly support the role of Hes5+ astrocytes as regulators of touch sensation, coordinated by brain-derived noradrenaline in the spinal dorsal horn, opening new avenues for research into pain and touch modulation.

      Furthermore, the data support a model in which superficial dorsal horn (SDH) Hes5+ astrocytes act as non-neuronal gating cells for brain-derived noradrenergic (NA) signalling through their interaction with substantia gelatinosa inhibitory interneurons. Locally released adenosine from NA-stimulated Hes5+ astrocytes, following acute restraint stress, may suppress the function of SDH-Vgat+ inhibitory interneurons, resulting in mechanical pain hypersensitivity. However, the spatially restricted neuron-astrocyte communication underlying this mechanism requires further investigation in future studies.

      Comments on revisions:

      One important point remains insufficiently resolved. In Figure S4C, two of the three visible neurons in the A5 example appear to show a white "halo" at the cell border, suggesting a merge of eGFP (green) and TH (magenta) and therefore possible transgene positivity. To draw a confident conclusion about the specificity of the approach for the A6 (LC) population, the authors are kindly asked to provide high-resolution images of several representative A5 sections, presented both as merged and as separate colour channels. Ideally, quantification across multiple rostrocaudal sections of A5, A6 and A7 should be provided. This is essential for determining whether any transgene expression occurs within the A5 nucleus, particularly given its several-millimetre rostrocaudal extent. As the behavioural phenotype arises from manipulation of only a small subset of A6 neurons, ruling out any contribution from A5 (or A7) is critical for validating pathway specificity, especially in light of prior reports showing that similar approaches can label A5 fibres.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, Kawanabe-Kobayashi et al. aim to examine the mechanisms by which stress can modulate pain in mice. They focus on the contribution of noradrenergic (NA) neurons of the locus coeruleus (LC). The authors use acute restraint as a stress paradigm and find that, following one hour of restraint stress, mice display mechanical hypersensitivity. They show that restraint stress causes the activation of LC NA neurons and the release of NA in the spinal cord dorsal horn (SDH). They then examine the spinal mechanisms by which LC→SDH NA produces mechanical hypersensitivity. The authors provide evidence that NA can act on alpha1ARs expressed by a class of astrocytes defined by the expression of Hes5 (Hes+). Furthermore, they find that NA, presumably through astrocytic release of ATP following NA action on alpha1ARs of Hes+ astrocytes, can cause an adenosine-mediated inhibition of SDH inhibitory interneurons. They propose that this disinhibition mechanism could explain how restraint stress can cause the mechanical hypersensitivity they measured in their behavioral experiments.

      Strengths:

      (1) Significance. Stress profoundly influences pain perception; resolving the mechanisms by which stress alters nociception in rodents may explain the well-known phenomenon of stress-induced analgesia and/or facilitate the development of therapies to mitigate the negative consequences of chronic stress on chronic pain.

      (2) Novelty. The authors' findings reveal a crucial contribution of Hes+ spinal astrocytes in the modulation of pain thresholds during stress.

      (3) Techniques. This study combines multiple approaches to dissect circuit, cellular, and molecular mechanisms including optical recordings of neural and astrocytic Ca2+ activity in behaving mice, intersectional genetic strategies, cell ablation, optogenetics, chemogenetics, CRISPR-based gene knockdown, slice electrophysiology, and behavior.

      Weaknesses:

      (1) Mouse model of stress. Although chronic stress can increase sensitivity to somatosensory stimuli and contribute to hyperalgesia and anhedonia, particularly in the context of chronic pain states, acute stress is well known to produce analgesia in humans and rodents. The experimental design used by the authors consists of a single one-hour session of restraint stress followed by 30 min to one hour of habituation and measurement of cutaneous mechanical sensitivity with von Frey filaments. This acute stress behavioral paradigm corresponds to the conditions in which the clinical phenomenon of stress-induced analgesia is observed in humans, as well as in animal models. Surprisingly, however, the authors measured that this acute stressor produced hypersensitivity rather than antinociception. This discrepancy is significant and requires further investigation.

      We thank the reviewer for evaluating our work and for highlighting both its strengths and weaknesses. As stated by the reviewer, numerous studies have reported acute stress-induced antinociception. However, as shown in a new additional table (Table S1) in which we have summarized previously published data using the acute restraint stress model employed in our present study, most studies reporting antinociceptive effects of acute restraint stress assessed behavioral responses to heat stimuli or formalin. This observation is consistent with the findings from our previous study (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)). The present study also confirms that acute restraint stress reduces behavioral responses to noxious heat (see also our response to Comment #2 below). In contrast to the robust and consistent antinociceptive effects observed with thermal stimuli, some studies evaluating behavioral responses to mechanical stimuli have reported stress-induced hypersensitivity (see Table S1), which aligns with our current findings. Taken together, these data support our original notion that the effects of acute stress on pain-related behaviors depend on several factors, including the nature, duration, and intensity of the stressor, as well as the sensory modality assessed in behavioral tests. We have incorporated this discussion and Table S1 into the revised manuscript (lines 344-353). Furthermore, we have slightly modified the text including the title, replacing "pain facilitation" with "mechanical pain hypersensitivity" to more accurately reflect our research focus and the conclusion of this study that LC<sup>→SDH</sup> NAergic signaling to spinal astrocytes is required for stress-induced mechanical pain hypersensitivity. Finally, while mouse models of stress could provide valuable insights, the clinical relevance of stress-induced mechanical pain hypersensitivity remains to be elucidated and requires further investigation. 
      We hope these clarifications address your concerns.

      (2) Specifically, is the hypersensitivity to mechanical stimulation also observed in response to heat or cold on a hotplate or coldplate?

      Thank you for your important comment. We have now conducted additional behavioral experiments to assess responses to heat using the hot-plate test. We found that mice subjected to restraint stress did not exhibit behavioral hypersensitivity to heat stimuli; instead, they displayed antinociceptive responses (Figure S2; lines 95-98). These results are consistent with our previous findings (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)) as well as numerous other reports (Table S1).

      (3) Using other stress models, such as a forced swim, do the authors also observe acute stress-induced hypersensitivity instead of stress-induced antinociception?

      As suggested by the reviewer, we conducted a forced swim test. We found that mice subjected to forced swimming, which has been reported to produce analgesic effects on thermal stimuli (Contet et al., Neuropsychopharmacology, 2006 (PMID: 16237385)), did not exhibit any changes in mechanical pain hypersensitivity (Figure S2; lines 98-99). Furthermore, a previous study demonstrated that mechanical pain sensitivity is enhanced by other stress models, such as exposure to an elevated open platform for 30 min (Kawabata et al., Neuroscience, 2023 (PMID: 37211084)). However, considering our data showing that changes in mechanosensory behavior induced by restraint stress depend on the duration of exposure (Figure S1), and that restraint stress also produced an antinociceptive effect on heat stimuli (Figure S2), stress-induced modulation of pain is a complex phenomenon influenced by multiple factors, including the stress model, intensity, and duration, as well as the sensory modality used for behavioral testing (lines 100-103).

      (4) Measurement of stress hormones in blood would provide an objective measure of the stress of the animals.

      A previous study has demonstrated that plasma levels of corticosterone, a stress hormone, are elevated following a 1-hour exposure to restraint stress in mice (Kim et al., Sci Rep, 2018 (PMID: 30104581)), using a stress protocol similar to that employed in our current study. We have included this information and cited this paper (lines 104-105).

      (5) Results:

      (a) Optical recordings of Ca2+ activity in behaving rodents are particularly useful to investigate the relationship between Ca2+ dynamics and the behaviors displayed by rodents.

      In the optical recordings of Ca<sup>2+</sup> activity in LC neurons, we monitored mouse behavior during stress exposure. We have now included a video of this in the revised manuscript (video; lines 111-114).

      (b) The authors report an increase in Ca2+ events in LC NA neurons during restraint stress: Did mice display specific behaviors at the time these Ca2+ events were observed such as movements to escape or orofacial behaviors including head movements or whisking?

      By reanalyzing the temporal relationship between Ca<sup>2+</sup> events and mouse behavior during stress exposure, we found that the Ca<sup>2+</sup> transients and escape behaviors (struggling) occurred almost simultaneously (video). A similar temporal correlation is also observed in Ca<sup>2+</sup> responses in the bed nucleus of the stria terminalis (Luchsinger et al., Nat Commun, 2021 (PMID: 34117229)). The video file has been included in the revised manuscript (video; lines 111-113, 552-553, 573-575).

      Additionally, as described in the Methods section and shown in Figure S2 of the initial version (now Figure S3), non-specific signals or artifacts—such as those caused by head movements—were corrected (although such responses were minimal in our recordings).

      (c) Additionally, are similar increases in Ca2+ events in LC NA neurons observed during other stressful behavioral paradigms versus non-stressful paradigms?

      We appreciate the reviewer's valuable suggestion. Because the present study focused on acute restraint stress, we did not measure Ca<sup>2+</sup> events in LC-NA neurons in other stress models. However, a recent study has shown an increase in Ca<sup>2+</sup> responses in LC-NA neurons during social defeat stress (Seiriki et al., BioRxiv, https://www.biorxiv.org/content/10.1101/2025.03.07.641347v1).

      (d) Neuronal ablation to reveal the function of a cell population.

      This method has been widely used in numerous previous studies as an effective experimental approach to investigate the role of specific neuronal populations—including SDH-projecting LC-NA neurons (Ma et al., Brain Res, 2022 (PMID: 34929182); Kawanabe et al., Mol Brain, 2021 (PMID: 33971918))—in CNS function.

      (e) The proportion of LC NA neurons and LC→SDH NA neurons expressing DTR-GFP and ablated should be quantified (Figures 1G and J) to validate the methods and permit interpretation of the behavioral data (Figures 1H and K). Importantly, the nocifensive responses and behavior of these mice in other pain assays in the absence of stress (e.g., hotplate) and a few standard assays (open field, rotarod, elevated plus maze) would help determine the consequences of cell ablation on processing of nociceptive information and general behavior.

      As suggested, we conducted additional experiments to quantitatively analyze the number of LC<sup>→SDH</sup>-NA neurons. We used WT mice injected with AAVretro-Cre into the SDH (L4 segment) and AAV-FLEx[DTR-EGFP] into the LC. In these mice, 4.4% of total LC-NA neurons [positive for tyrosine hydroxylase (TH)] expressed DTR-GFP, representing the LC<sup>→SDH</sup>-NA neuronal population (Figure S4; lines 126-127). Furthermore, treatment with DTX successfully ablated the DTR-expressing LC<sup>→SDH</sup>-NA neurons. Importantly, the neurons quantified in this analysis were specifically those projecting to the L4 segment of the SDH; therefore, the total number of SDH-projecting LC-NA neurons across all spinal segments is expected to be much higher.

      We also performed the rotarod and paw-flick tests to assess motor function and thermal sensitivity following ablation of LC<sup>→SDH</sup>-NA neurons. No significant differences were observed between the ablated and control groups (Figure S5; lines 131-134), indicating that ablation of these neurons does not produce non-specific behavioral deficits in motor function or other sensory modalities.

      (f) Confirmation of LC NA neuron function with other methods that alter neuronal excitability or neurotransmission instead of destroying the circuit investigated, such as optogenetics or chemogenetics, would greatly strengthen the findings. Optogenetics is used in Figure 1M, N but excitation of LC<sup>→SDH</sup> NA neuron terminals is tested instead of inhibition (to mimic ablation), and in naïve mice instead of stressed mice.

      We appreciate the reviewer’s comment. The optogenetic approach is useful for manipulating neuronal excitability; however, prolonged light illumination (> tens of seconds) can lead to undesirable tissue heating, ionic imbalance, and rebound spikes (Wiegert et al., Neuron, 2017 (PMID: 28772120)), making it difficult to apply in our experiments, in which mice are exposed to stress for 60 min. For this reason, we decided to employ the cell-ablation approach in stress experiments, as it is more suitable than optogenetic inhibition. In addition, as described in our response to weakness (1)-a) by Reviewer 3 (Public review), we have now demonstrated the specific expression of DTRs in NA neurons in the LC, but not in A5 or A7 (Figure S4; lines 127-128), confirming the specificity of LC<sup>→SDH</sup>-NAergic pathway targeting in our study. Chemogenetics represents another promising approach to further strengthen our findings on the role of LC<sup>→SDH</sup>-NA neurons, but this will be an important subject for future studies, as it will require extensive experiments to assess, for example, the effectiveness of chemogenetic inhibition of these neurons during 60 min of restraint stress, as well as optimization of key parameters (e.g., systemic DCZ doses).

      (g) Alpha1Ars. The authors noted that "Adra1a mRNA is also expressed in INs in the SDH".

      The expression of α<sub>1A</sub>Rs in inhibitory interneurons in the SDH is consistent with our previous findings (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)) as well as with scRNA-seq data (http://linnarssonlab.org/dorsalhorn/, Häring et al., Nat Neurosci, 2018 (PMID: 29686262)).

      (h) The authors should comprehensively indicate what other cell types present in the spinal cord and neurons projecting to the spinal cord express alpha1Ars and what is the relative expression level of alpha1Ars in these different cell types.

      According to the scRNA-seq data (https://seqseek.ninds.nih.gov/genes, Russ et al., Nat Commun, 2021 (PMID: 34588430); http://linnarssonlab.org/dorsalhorn/, Häring et al., Nat Neurosci, 2018 (PMID: 29686262)), we confirmed that α<sub>1A</sub>Rs are predominantly expressed in astrocytes and inhibitory interneurons in the spinal cord. Also, an α<sub>1A</sub>R-expressing excitatory neuron population (Glut14) expresses Tacr1, GPR83, and Tac1 mRNAs, markers that are known to be enriched in projection neurons of the SDH. This raises the possibility that α<sub>1A</sub>Rs may also be expressed in a subset of projection neurons, although further experiments are required to confirm this. In DRG neurons, α<sub>1A</sub>R expression was detected to some extent, but its level seems to be much lower than in the spinal cord (http://linnarssonlab.org/drg/ Usoskin et al., Nat Neurosci, 2015 (PMID: 25420068)). Consistent with this, primary afferent glutamatergic synaptic transmission has been shown to be unaffected by α<sub>1A</sub>R agonists (Kawasaki et al., Anesthesiology, 2003 (PMID: 12606912); Li and Eisenach, JPET, 2001 (PMID: 11714880)). This information has been incorporated into the Discussion section (lines 317-319).

      (i) The conditional KO of alpha1Ars specifically in Hes5+ astrocytes and not in other cell types expressing alpha1Ars should be quantified and validated (Figure 2H).

      We have previously shown a selective KO of α<sub>1A</sub>R in Hes5<sup>+</sup> astrocytes in the same mouse line (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This information has been included in the revised text (lines 166-167).

      (j) Depolarization of SDH inhibitory interneurons by NA (Figure 3). The authors bath-applied NA, which presumably activates all NA receptors present in the preparation.

      We believe that the reviewer’s concern may pertain to the possibility that NA acts on non-Vgat<sup>+</sup> neurons, thereby indirectly causing depolarization of Vgat<sup>+</sup> neurons. As described in the Methods section of the initial version, in our electrophysiological experiments, we added four antagonists for excitatory and inhibitory neurotransmitter receptors, namely CNQX (AMPA receptor), MK-801 (NMDA receptor), bicuculline (GABA<sub>A</sub> receptor), and strychnine (glycine receptor), to the artificial cerebrospinal fluid to block synaptic inputs from other neurons to the recorded Vgat<sup>+</sup> neurons. Since this method is widely used for this purpose in many previous studies (Wu et al., J Neurosci, 2004 (PMID: 15140934); Liu et al., Nat Neurosci, 2010 (PMID: 20835251)), it is reasonable to conclude that NA directly acts on the recorded SDH Vgat<sup>+</sup> interneurons to produce excitation (lines 193-196).

      (k) The authors' model (Figure 4H) implies that NA released by LC→SDH NA neurons leads to the inhibition of SDH inhibitory interneurons by NA. In other experiments (Figure 1L, Figure 2A), the authors used optogenetics to promote the release of endogenous NA in SDH by LC→SDH NA neurons. This approach would investigate the function of NA endogenously released by LC NA neurons at presynaptic terminals in the SDH and at physiological concentrations and would test the model more convincingly compared to the bath application of NA.

      We appreciate the reviewer’s valuable comment. As noted, optogenetic stimulation of LC<sup>→SDH</sup>-NA neurons would indeed be useful to test this model. However, in our case, it is technically difficult to investigate the responses of Vgat<sup>+</sup> inhibitory neurons and Hes5<sup>+</sup> astrocytes to NA endogenously released from LC<sup>→SDH</sup>-NA neurons. This would require the use of Vgat-Cre or Hes5-CreERT2 mice, but employing these lines precludes the use of NET-Cre mice, which are necessary for specific and efficient expression of ChrimsonR in LC<sup>→SDH</sup>-NA neurons. Nevertheless, all of our experimental data consistently support the proposed model, and we hope the reviewer will agree that the model is supported without additional experiments that are difficult to conduct because of these technical limitations (lines 382-388).

      (l) As for other experiments, the proportion of Hes+ astrocytes that express hM3Dq, and the absence of expression in other cells, should be quantified and validated to interpret behavioral data.

      We thank the reviewer for raising this point. In our experiments, we used an HA-tag (fused with hM3Dq) to confirm hM3Dq expression. However, it is difficult to precisely analyze individual astrocytes because, as shown in Figure 3J, the boundaries of many HA-tag<sup>+</sup> astrocytes are indistinguishable. This seems to be due to the membrane localization of HA-tag, the complex morphology of astrocytes, and their tile-like distribution pattern (Baldwin et al., Trends Cell Biol, 2024 (PMID: 38180380)). Nevertheless, our previous study demonstrated that ~90% of astrocytes in the superficial laminae are Hes5<sup>+</sup> (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), and intra-SDH injection of AAV-hM3Dq labeled the majority of superficial astrocytes (Figure 3J). Thus, AAV-FLEx[hM3Dq] injection into Hes5-CreERT2 mice allows efficient expression of hM3Dq in Hes5<sup>+</sup> astrocytes in the SDH. Importantly, our previous studies using Hes5-CreERT2 mice have confirmed that hM3Dq is not expressed in other cell types (neurons, oligodendrocytes, or microglia) (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Kagiyama et al., Mol Brain, 2025 (PMID: 40289116)). This information regarding the cell-type specificity has now been briefly described in the revised version (lines 218-219).

      (m) Showing that the effect of CNO is dose-dependent would strengthen the authors' findings.

      Thank you for your comment. We have now demonstrated a dose-dependent effect of CNO on Ca<sup>2+</sup> responses in SDH astrocytes (Figure S7; lines 225-228; please see our response to Major Point (4) from Reviewer #2 (Recommendations for the authors)). We also confirmed that the effect of CNO is not nonspecific, as CNO application did not alter sIPSCs in spinal cord slices prepared from mice lacking hM3Dq expression in astrocytes (Figure S7; lines 225-228).

      (n) The proportion of SG neurons for which CNO bath application resulted in a reduction in recorded sIPSCs is not clear.

      We have included individual data points in each bar graph to more clearly illustrate the effect of CNO on each neuron (Figure 3L, N).

      (o) A1Rs. The specific expression of Cas9 and guide RNAs, and the specific KD of A1Rs, in inhibitory interneurons but not in other cell types expressing A1Rs should be quantified and validated.

      In addition to the data demonstrating the specific expression of SaCas9 and sgAdora1 in Vgat<sup>+</sup> inhibitory neurons shown in Figure 3G of the initial version, we have now conducted the same experiments with a different sample and confirmed this specificity: SaCas9 (detected via HA-tag) and sgAdora1 (detected via mCherry) were expressed in PAX2<sup>+</sup> inhibitory neurons (Author response image 1). Furthermore, as shown in Figure 3H and I in the initial version, the functional reduction of A<sub>1</sub>Rs in inhibitory neurons was validated by electrophysiological recordings. Together, these results support the successful deletion of A<sub>1</sub>Rs in inhibitory neurons.

      Author response image 1.

      Expression of HA-tag and mCherry in inhibitory neurons (a different sample from Figure 3G). SaCas9 (yellow, detected by HA-tag) and mCherry (magenta) expression in the PAX2<sup>+</sup> inhibitory neurons (cyan) at 3 weeks after intra-SDH injection of AAV-FLEx[SaCas9-HA] and AAV-FLEx[mCherry]-U6-sgAdora1 in Vgat-Cre mice. Arrowheads indicate genome-editing Vgat<sup>+</sup> cells. Scale bar, 25 µm.

      (6) Methods:

      It is unclear how fiber photometry is performed using "optic cannula" during restraint stress while mice are in a 50 mL Falcon tube (as shown in Figure 1A).

      We apologize for the omission of this detail in the Methods section. To monitor Ca<sup>2+</sup> events in LC-NA neurons during restraint stress, we created a narrow slit on the top of the conical tube, allowing mice to undergo restraint stress while connected to the optic fiber (see video). This information has now been added to the Methods section (lines 552-553).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Scientific rigor:

      It is unclear if the normal distribution of the data was determined before selecting statistical tests.

      We apologize for omitting this description. For all statistical analyses in this study, we first assessed the normality of the data and then selected appropriate statistical tests accordingly. We have added this information to the revised manuscript (lines 711-712).

      (2) Nomenclature:

      (a) Mouse Genome Informatics (MGI) nomenclature should be used to describe mouse genotypes (i.e., gene name in italic, only first letter is capitalized, alleles in superscript).

      (b) FLEx should be used instead of flex.

      Thank you for the suggestion. We have corrected these terms (including FLEx) according to MGI nomenclature.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the role of spinal astrocytes in mediating stress-induced pain hypersensitivity, focusing on the LC (locus coeruleus)-to-SDH (spinal dorsal horn) circuit and its mechanisms. The authors aimed to delineate how LC activity contributes to spinal astrocytic activation under stress conditions, explore the role of noradrenaline (NA) signaling in this process, and identify the downstream astrocytic mechanisms that influence pain hypersensitivity.

      The authors provide strong evidence that 1-hour restraint stress-induced pain hypersensitivity involves the LC-to-SDH circuit, where NA triggers astrocytic calcium activity via alpha1a adrenoceptors (alpha1aRs). Blockade of alpha1aRs on astrocytes - but not on Vgat-positive SDH neurons - reduced stress-induced pain hypersensitivity. These findings are rigorously supported by well-established behavioral models and advanced genetic techniques, uncovering the critical role of spinal astrocytes in modulating stress-induced pain.

      However, the study's third aim - to establish a pathway from astrocyte alpha1aRs to adenosine-mediated inhibition of SDH-Vgat neurons - is less compelling. While pharmacological and behavioral evidence is intriguing, the ex vivo findings are indirect and lack a clear connection to the stress-induced pain model. Despite these limitations, the study advances our understanding of astrocyte-neuron interactions in stress-pain contexts and provides a strong foundation for future research into glial mechanisms in pain hypersensitivity.

      Strengths:

      The study is built on a robust experimental design using a validated 1-hour restraint stress model, providing a reliable framework to investigate stress-induced pain hypersensitivity. The authors utilized advanced genetic tools, including retrograde AAVs, optogenetics, chemogenetics, and subpopulation-specific knockouts, allowing precise manipulation and interrogation of the LC-SDH circuit and astrocytic roles in pain modulation. Clear evidence demonstrates that NA triggers astrocytic calcium activity via alpha1aRs, and blocking these receptors effectively reduces stress-induced pain hypersensitivity.

      Weaknesses:

      Despite its strengths, the study presents indirect evidence for the proposed NA-to-astrocyte(alpha1aRs)-to-adenosine-to-SDH-Vgat neurons pathway, as the link between astrocytic adenosine release and stress-induced pain remains unclear. The ex vivo experiments, including NA-induced depolarization of Vgat neurons and chemogenetic stimulation of astrocytes, are challenging to interpret in the stress context, with the high CNO concentration raising concerns about specificity. Additionally, the role of astrocyte-derived D-serine is tangential and lacks clarity regarding its effects on SDH Vgat neurons. The astrocyte calcium signal "dip" after LC optostimulation-induced elevation is presented without any interpretation.

      We appreciate the reviewer's careful reading of our paper. In response to the reviewer's comments, we have performed additional experiments and added discussion to the revised manuscript (please see the point-by-point responses below).

      Reviewer #2 (Recommendations for the authors):

      The astrocyte-mediated pathway of NA-to-astrocyte (alpha1aRs)-to-adenosine-to-SDH Vgat neurons (A1R) in the context of stress-induced pain hypersensitivity requires more direct evidence. While the data showing that the A1R agonist CPT inhibits stress-induced hypersensitivity and that stress combined with Aβ fiber stimulation increases pERK in the SDH are intriguing, these findings primarily support the involvement of A1R on Vgat neurons and are only behaviorally consistent with SDH-Vgat neuronal A1R knockdown. The role of astrocytes in this pathway in vivo remains indirect. The ex vivo chemogenetic Gq-DREADD stimulation of SDH astrocytes, which reduced sIPSCs in Vgat neurons in a CPT-dependent manner, needs revision with non-DREADD+CNO controls to validate specificity. Furthermore, the ex vivo bath application of NA causing depolarization in Vgat neurons, blocked by CPT, adds complexity to the data leaving me wondering how astrocytes are involved in such processes, and it does not directly connect to stress-induced pain hypersensitivity. These findings are potentially useful but require additional refinement to establish their relevance to the stress model.

      We thank the reviewer for the insightful feedback. First, regarding the role of astrocytes in this pathway in vivo, we showed in the initial version that mechanical pain hypersensitivities induced by intrathecal NA injection and by acute restraint stress were attenuated by both pharmacological blockade and Vgat<sup>+</sup> neuron-specific knockdown of A<sub>1</sub>Rs (Figure 4A, B). Given that NA- and stress-induced pain hypersensitivity is mediated by α<sub>1A</sub>R-dependent signaling in Hes5<sup>+</sup> astrocytes (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); this study), these findings provide in vivo evidence supporting the involvement of the NA → Hes5<sup>+</sup> astrocyte (via α<sub>1A</sub>Rs) → adenosine → Vgat<sup>+</sup> neuron (via A<sub>1</sub>Rs) pathway. As noted in the reviewer’s major comment (2), in vivo monitoring of adenosine dynamics in the SDH during stress exposure would further substantiate the astrocyte-to-neuron signaling pathway. However, we did not detect clear signals, potentially due to several technical limitations (see our response below). Acknowledging this limitation, we have now added a new paragraph at the end of the Discussion section to address this issue. Second, the specificity of the effect of CNO has now been validated by additional experiments (see our response to major point (4)). Third, the reviewer’s concern regarding the action of NA on Vgat<sup>+</sup> neurons has also been addressed (see our response to major point (3) below).

      Major points:

      (1) The in vivo pharmacology using DCK to antagonize D-serine signaling from alpha1a-activated astrocytes is tangential, as there is limited evidence on how Vgat neurons (among many others) respond to D-serine. This aspect requires more focused exploration to substantiate its relevance.

      We propose that the site of action of D-serine in our neural circuit model is the NMDA receptors (NMDARs) on excitatory neurons, a notion supported by our previous findings (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Kagiyama et al., Mol Brain, 2025 (PMID: 40289116)). However, we cannot exclude the possibility that D-serine also acts on NMDARs expressed by Vgat<sup>+</sup> inhibitory neurons. Nevertheless, given that intrathecal injection of D-serine in naïve mice induces mechanical pain hypersensitivity (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), it appears that the pronociceptive effect of D-serine in the SDH is primarily associated with enhanced pain processing and transmission, presumably via NMDARs on excitatory neurons. We have added this point to the Discussion section in the revised manuscript (lines 325-330).

      (2) Additionally, employing GRAB-Ado sensors to monitor adenosine dynamics in SDH astrocytes during NA signaling would significantly strengthen conclusions about astrocyte-derived adenosine's role in the stress model.

      We agree with the reviewer’s comment. Following this suggestion, we attempted to visualize NA-induced adenosine (and ATP) dynamics using GRAB-ATP and GRAB-Ado sensors (Wu et al., Neuron, 2022 (PMID: 34942116); Peng et al., Science, 2020 (PMID: 32883833)) in acutely isolated spinal cord slices from mice after intra-SDH injection of AAV-hSyn-GRABATP<sub>1.0</sub> and -GRABAdo<sub>1.0</sub>. We confirmed expression of these sensors in the SDH (Author response image 2a) and observed increased signals after bath application of ATP (0.1 or 1 µM) or adenosine (1 µM) (Author response image 2b, c). However, we were unable to detect clear signals following NA stimulation (Author response image 2b, c). The reason for this lack of detectable changes remains unclear. If the release of adenosine from astrocytes is a highly localized phenomenon, it may be measurable using high-resolution microscopy capable of detecting adenosine levels at the synaptic level and more sensitive sensors. Further investigation will therefore be required (lines 340-341).

      Author response image 2.

      Ex vivo imaging of GRAB-ATP and GRAB-Ado sensors. (a) Representative images of GRAB<sub>ATP1.0</sub> (left, green) or GRAB<sub>Ado1.0</sub> (right, green) expression in the SDH at 3 weeks after SDH injection of AAV-hSyn-GRAB<sub>ATP1.0</sub> or AAV-hSyn-GRAB<sub>Ado1.0</sub> in Hes5-CreERT2 mice. Scale bar, 200 µm. (b) Left: Representative fluorescence images showing GRAB<sub>ATP1.0</sub> responses before and after perfusion with NA or ATP. Right: Representative traces showing responses to ATP (0.1 and 1 µM) or NA (10 µM). (c) Left: Representative fluorescence images showing GRAB<sub>Ado1.0</sub> responses before and after perfusion with NA or adenosine (Ado). Right: Representative traces showing responses to Ado (0.01, 0.1, and 1 µM), NA (10 µM), or no application (negative control).

      (3) The interpretation of Figure 3D is challenging. The manuscript implies that 20 μM NA acts on Adra1a receptors on Vgat neurons to depolarize them, but this concentration should also activate Adra1a on astrocytes, leading to adenosine release and potential inhibition of depolarization. The observation of depolarization despite these opposing mechanisms requires explanation, as does the inhibition of depolarization by bath-applied A1R agonist. Of note, 20 μM NA is a high concentration for Adra1a activation, typically responsive at nanomolar levels. The discussion should reconcile this with prior studies indicating dose-dependent effects of NA on pain sensitivity (e.g., Reference 22).

      Like the reviewer, we also considered that bath-applied NA could activate α<sub>1A</sub>Rs expressed on Hes5<sup>+</sup> astrocytes. To clarify this point, we performed additional patch-clamp recordings and found that knockdown of A<sub>1</sub>Rs in Vgat<sup>+</sup> neurons tended to increase the proportion of Vgat<sup>+</sup> neurons with NA-induced depolarizing responses (Figure S8). Therefore, it is conceivable that the net NA-induced response of Vgat<sup>+</sup> neurons reflects both a direct excitatory effect of NA on α<sub>1A</sub>Rs in Vgat<sup>+</sup> neurons and indirect inhibitory signaling from NA-stimulated Hes5<sup>+</sup> astrocytes via adenosine (lines 298-300).

      The concentration of NA used in our ex vivo experiments is higher than that typically used in vitro with α<sub>1A</sub>R-expressing cell lines or primary culture cells, but is comparable to concentrations used in other studies employing spinal cord slices (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Baba et al., Anesthesiology, 2000 (PMID: 10691236); Lefton et al., Science, 2025 (PMID: 40373122)). In slice experiments, drugs must diffuse through the tissue to reach target cells, resulting in a concentration gradient. Therefore, higher drug concentrations are generally necessary in slice experiments, in contrast to cultured cell experiments, where drugs are directly applied to target cells. Importantly, we have previously shown that the pharmacological effects of 20 μM NA on Vgat<sup>+</sup> neurons and Hes5<sup>+</sup> astrocytes are abolished by loss of α<sub>1A</sub>Rs in these cells (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215); Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), confirming the specificity of these NA actions.

      Regarding the dose-dependent effect of NA on pain sensitivity, NA-induced pain hypersensitivity is abolished in Hes5<sup>+</sup> astrocyte-specific α<sub>1A</sub>R-KO mice (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), indicating that this behavior is mediated by α<sub>1A</sub>Rs expressed on Hes5<sup>+</sup> astrocytes. In contrast, the suppression of pain sensitivity by high doses of NA was unaffected in the KO mice (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), suggesting that other adrenergic receptors may contribute to this phenomenon. Clarifying the responsible receptors will require future investigation.

      (4) In Figure 3K-M, the CNO concentration used (100 μM) is unusually high compared to standard doses (1 to a few μM), raising concerns about potential off-target effects. Including non-hM3Dq controls and using lower CNO concentrations are essential to validate the specificity of the observed effects. Similarly, the study should clarify whether astrocyte hM3Dq stimulation alone (without NA) would induce hyperpolarization in Vgat neurons and how this interacts with NA-induced depolarization.

      We acknowledge that the concentration of CNO used in our experiments is relatively high compared to that used in other reports. However, in our experiments, application of CNO at 1, 10, and 100 μM induced Ca<sup>2+</sup> increases in GCaMP6-expressing astrocytes in spinal cord slices in a concentration-dependent manner (Figure S7). Among these, 100 μM CNO most effectively replicated the NA-induced Ca<sup>2+</sup> signals in astrocytes. Based on these findings, we selected this concentration for use in both the current and previous studies (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). Importantly, to rule out non-specific effects, we conducted control experiments using spinal cord slices from mice that did not express hM3Dq in astrocytes and confirmed that CNO had no effect on Ca<sup>2+</sup> responses in astrocytes or on sIPSCs in substantia gelatinosa (SG) neurons (Figure S7; lines 223-228). Thus, although the CNO concentration used is relatively high, the observed effects of CNO are not non-specific but result from the chemogenetic activation of hM3Dq-expressing astrocytes.

      In this study, we used Hes5-CreERT2 and Vgat-Cre mice to manipulate gene expression in Hes5<sup>+</sup> astrocytes and Vgat<sup>+</sup> neurons, respectively. Fully addressing the reviewer’s comment would require using both Cre lines in the same animal. However, simultaneous and independent genetic manipulation of each cell type using Cre activity alone is not feasible with the current genetic tools. We have mentioned this as a technical limitation in the Discussion section (lines 382-388).

      (5) The role of D-serine released by hM3Dq-stimulated astrocytes in (separately) modulating sub-types of neurons including excitatory neurons and Vgat positives needs more detailed discussion. If no effect of D-serine on Vgat neurons is observed, this should be explicitly stated, and the discussion should address why this might be the case.

      As mentioned in our response to Major Point (1) above, we have added a discussion of this point in the revised manuscript (lines 325-330).

      (6) Finally, the observed "dip" in astrocyte calcium signals below baseline following the large peaks with LC optostimulation should be discussed further, as understanding this phenomenon could provide valuable insights into astrocytic signaling dynamics in the context of single acute or repetitive chronic stress.

      Thank you for your comment. We found that this phenomenon was not affected by pretreatment with the α<sub>1A</sub>R-specific antagonist silodosin (Author response image 3), which effectively suppressed Ca<sup>2+</sup> elevations evoked by stimulation of LC-NA neurons (Figure 2F). This implies that the phenomenon is independent of α<sub>1A</sub>R signaling. Elucidating the detailed underlying mechanism remains an important direction for future investigation.

      Author response image 3.

      The observed "dip" in astrocyte Ca<sup>2+</sup> signals was not affected by pretreatment with the α<sub>1A</sub>R-specific antagonist silodosin. Representative traces of astrocytic GCaMP6m signals in response to optogenetic stimulation of LC<sup>→SDH</sup>-NAergic axons/terminals in a spinal cord slice. Each trace shows the GCaMP6m signal before and after optogenetic stimulation (625 nm, 1 mW, 10 Hz, 5 ms pulse duration, 10 s). Slices were pretreated with silodosin (40 nM) for 5 min prior to stimulation.

      Reviewer #3 (Public review):

      Summary:

      This is an exciting and timely study addressing the role of descending noradrenergic systems in nocifensive responses. While it is well-established that spinally released noradrenaline (aka norepinephrine) generally acts as an inhibitory factor in spinal sensory processing, this system is highly complex. Descending projections from the A6 (locus coeruleus, LC) and the A5 regions typically modulate spinal sensory processing and reduce pain behaviours, but certain subpopulations of LC neurons have been shown to mediate pronociceptive effects, such as those projecting to the prefrontal cortex (Hirshberg et al., PMID: 29027903).

      The study proposes that descending cerulean noradrenergic neurons potentiate touch sensation via alpha-1 adrenoceptors on Hes5+ spinal astrocytes, contributing to mechanical hyperalgesia. This finding is consistent with prior work from the same group (dd et al., PMID:). However, caution is needed when generalising about LC projections, as the locus coeruleus is functionally diverse, with differences in targets, neurotransmitter co-release, and behavioural effects. Specifying the subpopulations of LC neurons involved would significantly enhance the impact and interpretability of the findings.

      Strengths:

      The study employs state-of-the-art molecular, genetic, and neurophysiological methods, including precise CRISPR and optogenetic targeting, to investigate the role of Hes5+ astrocytes. This approach is elegant and highlights the often-overlooked contribution of astrocytes in spinal sensory gating. The data convincingly support the role of Hes5+ astrocytes as regulators of touch sensation, coordinated by brain-derived noradrenaline in the spinal dorsal horn, opening new avenues for research into pain and touch modulation.

      Furthermore, the data support a model in which superficial dorsal horn (SDH) Hes5+ astrocytes act as non-neuronal gating cells for brain-derived noradrenergic (NA) signalling through their interaction with substantia gelatinosa inhibitory interneurons. Locally released adenosine from NA-stimulated Hes5+ astrocytes, following acute restraint stress, may suppress the function of SDH-Vgat+ inhibitory interneurons, resulting in mechanical pain hypersensitivity. However, the spatially restricted neuron-astrocyte communication underlying this mechanism requires further investigation in future studies.

      Weaknesses:

      (1) Specificity of the LC Pathway targeting

      The main concern lies with how definitively the LC pathway was targeted. Were other descending noradrenergic nuclei, such as A5 or A7, also labelled in the experiments? The authors must convincingly demonstrate that the observed effects are mediated exclusively by LC noradrenergic terminals to substantiate their claims (i.e. "we identified a circuit, the descending LC→SDH-NA neurons").

      (a) For instance, the direct vector injection into the LC likely results in unspecific effects due to the extreme heterogeneity of this nucleus and retrograde labelling of the A5 and A7 nuclei from the LC (i.e., Li et al., PMID: 26903420).

      We appreciate the reviewer's valuable comments. To address this point, we performed additional experiments and demonstrated that intra-SDH injection of AAVretro-Cre followed by intra-LC injection of AAV2/9-EF1α-FLEx[DTR-EGFP] specifically results in DTR expression in NA neurons of the LC, but not of the A5 or A7 regions (Figure S4; lines 127-128). These results confirm the specificity of targeting the LC<sup>→SDH</sup>-NAergic pathway in our study.

      (b) It is difficult to believe that the intersectional approach described in the study successfully targeted LC→SDH-NA neurons using AAVrg vectors. Previous studies (e.g., PMID: 34344259 or PMID: 36625030) demonstrated that similar strategies were ineffective for spinal-LC projections. The authors should provide detailed quantification of the efficiency of retrograde labelling and specificity of transgene expression in LC neurons projecting to the SDH.

      Thank you for your comment. As described in our response to weakness (5)-e) of Reviewer #1 (Public review), our additional analysis showed that, under our experimental conditions, transgene expression (for example, DTR) was observed in 4.4% of NA (TH<sup>+</sup>) neurons in the LC (Figure S4; lines 126-127).

      The reason for the difference between the previous studies and our current study is unclear; however, it is likely attributable to methodological differences, including the type of viral vectors employed, species differences (mouse (PMID: 34344259, our study) vs. rat (PMID: 36625030)), the amount of AAV injected into the SDH (300 nL at three sites (PMID: 34344259), and 300 nL at a single site (our study)) and LC (500 nL at a single site (PMID: 34344259), and 300 nL at a single site (our study)), as well as the depth of AAV injection in the SDH (200–300 µm from the dorsal surface of the spinal cord (PMID: 34344259), and 120–150 µm in depth from the surface of the dorsal root entry zone (our study)).

      (c) Furthermore, it is striking that the authors observed a comparably strong phenotypical change in Figure 1K despite fewer neurons being labelled, compared to Figure 1H and 1N with substantially more neurons being targeted. Interestingly, the effect in Figure 1K appears more pronounced but shorter-lasting than in the comparable experiment shown in Figure 1H. This discrepancy requires further explanation.

      Although only a representative section of the LC was shown in the initial version, LC<sup>→SDH</sup>-NA neurons are distributed rostrocaudally throughout the LC, as previously reported (Llorca-Torralba et al., Brain, 2022 (PMID: 34373893)). Our additional experiments analyzing multiple sections of the anterior and posterior regions of the LC have now revealed that approximately sixty LC<sup>→SDH</sup>-NA neurons express DTR, and these neurons are eliminated following DTX treatment (Figure S4; lines 126-128) (it should be noted that these neurons specifically project to the L4 segment of the SDH, and the total number of LC<sup>→SDH</sup>-NA neurons is likely much higher). Considering the specificity of LC<sup>→SDH</sup>-NAergic pathway targeting demonstrated in our study (as described above), together with the fact that primary afferent sensory fibers from the plantar skin of the hindpaw predominantly project to the L4 segment of the SDH, these data suggest that the observed behavioral changes are attributable to the loss of these neurons and that ablation of even a relatively small number of NA neurons in the LC can have a significant impact on behavior. We have added this hypothesis in the Discussion section (lines 373-382).

      Regarding the data in Figures 1H and 1K, as the reviewer pointed out, a statistically significant difference was observed at 90 min in mice with ablation of LC-NA neurons, but not in those with LC<sup>→SDH</sup>-NA neuron ablation. This is likely due to a slightly higher threshold in the control group at this time point (Figure 1K), and it remains unclear whether there is a mechanistic difference between the two groups at this specific time point.

      (d) A valuable addition would be staining for noradrenergic terminals in the spinal cord for the intersectional approach (Figure 1J), as done in Figures 1F/G. LC projections terminate preferentially in the SDH, whereas A5 projections terminate in the deep dorsal horn (DDH). Staining could clarify whether circuits beyond the LC are being ablated.

      As suggested, we performed DTR immunostaining in the SDH; however, we did not detect any DTR immunofluorescence there. A similar result was also observed in the spinal terminals of DTR-expressing primary afferent fibers (our unpublished data). The reason for this is unclear, but to the best of our knowledge, no studies have clearly shown DTR expression at presynaptic terminals, which may be because the action of DTX on the neuronal cell body is necessary for cell ablation. Nevertheless, as described in our response to weakness (5)-f) of Reviewer #1 (Public review), we have now confirmed the specific expression of DTR in the LC, but not in the A5 and A7 regions (Figure S4; lines 127-128).

      (e) Furthermore, different LC neurons often mediate opposite physiological outcomes depending on their projection targets; for example, dorsal LC neurons projecting to the prefrontal cortex (PFCx) are pronociceptive, while ventral LC neurons projecting to the SC are antinociceptive (PMIDs: 29027903, 34344259, 36625030). Given this functional diversity, direct injection into the LC is likely to result in nonspecific effects.

      To avoid behavioral outcomes resulting from a mixture of facilitatory and inhibitory effects caused by activating the entire population of LC-NA neurons, we employed a specific manipulation targeting LC<sup>→SDH</sup>-NA neurons using AAV vectors. The specificity of this manipulation was confirmed in our previous study (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)) and in the current study (Figure S4). Using this approach, we previously demonstrated that LC neurons can exert pronociceptive effects via astrocytes in the SDH (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This pronociceptive role is further supported by the current study, which uses a more selective manipulation of LC<sup>→SDH</sup>-NA neurons through a NET-Cre mouse line. In addition, intrathecal administration of relatively low doses of NA in naïve mice clearly induces mechanical pain hypersensitivity. Nevertheless, we have also acknowledged that several recent studies have reported an inhibitory role of LC<sup>→SDH</sup>-NA neurons in spinal nociceptive signaling. The reason for these differing behavioral outcomes remains unclear, but several methodological differences may underlie the discrepancy. First, the degree of LC<sup>→SDH</sup>-NA neuronal activity may play a role. Although direct comparisons between studies reporting pro- and anti-nociceptive effects are difficult, our previous studies demonstrated that intrathecal administration of high doses of NA in naïve mice does not induce mechanical pain hypersensitivity (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). Second, the sensory modality used in behavioral testing may be a contributing factor as the pronociceptive effect of NA appears to be selectively observed in responses to mechanical, but not thermal, stimuli (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This sensory modality-selective effect is also evident in mice subjected to acute restraint stress (Table S1). 
      Therefore, the role of LC<sup>→SDH</sup>-NA neurons in modulating nociceptive signaling in the SDH is more complex than previously appreciated, and their contribution to pain regulation should be reconsidered in light of factors such as NA levels, sensory modality, and experimental context. In revising the manuscript, we have included some points described above in the Discussion (lines 282-291).

      Conclusion on Specificity: The authors are strongly encouraged to address these limitations directly, as they significantly affect the validity of the conclusions regarding the LC pathway. Providing more robust evidence, acknowledging experimental limitations, and incorporating complementary analyses would greatly strengthen the manuscript.

      We appreciate the reviewer’s comments. We fully acknowledge the limitations raised and agree that addressing them directly is important for the rigor of our conclusions on the LC pathway. To this end, we have performed additional experiments (e.g., Figure A and S4), which are now included in the revised manuscript. Furthermore, we have added a new paragraph on experimental limitations at the end of the Discussion section (lines 373-408). We believe these new data substantially strengthen the validity of our findings and have clarified these points in the Discussion section.

      (2) Discrepancies in Data

      (a) Figures 1B and 1E: The behavioural effect of stress on PWT (Figure 1E) persists for 120 minutes, whereas Ca2+ imaging changes (Figure 1B) are only observed in the first 20 minutes, with signal attenuation starting at 30 minutes. This discrepancy requires clarification, as it impacts the proposed mechanism.

      Thank you for your important comment. As pointed out by the reviewer, there is a difference between the duration of behavioral responses and Ca<sup>2+</sup> events, although the exact time point at which the PWT begins to decline remains undetermined (as behavioral testing cannot be conducted during stress exposure). A similar temporal difference was also observed following intraplantar injection of capsaicin (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)); while LC<sup>→SDH</sup>-NA neuron-mediated astrocytic Ca<sup>2+</sup> responses in SDH astrocytes last for 5–10 min after injection, behavioral hypersensitivity peaks around 60 min post-injection and gradually returns to baseline over the subsequent 60–120 min. These findings raise the possibility that astrocyte-mediated pain hypersensitivity in the SDH may involve a sustained alteration in spinal neural function, such as central sensitization. We have added this hypothesis to the Discussion section of the revised manuscript (lines 399-408), as it represents an important direction for future investigation.

      (b) Figure 4E: The effect is barely visible, and the tissue resembles "Swiss cheese," suggesting poor staining quality. This is insufficient for such an important conclusion. Improved staining and/or complementary staining (e.g., cFOS) are needed. Additionally, no clear difference is observed between Stress+Ab stim. and Stress+Ab stim.+CPT, raising doubts about the robustness of the data.

      As suggested, we performed c-FOS immunostaining and obtained clearer results (Figure 4E,F; lines 243-252). We also quantitatively analyzed the number of c-FOS<sup>+</sup> cells in the superficial laminae, and the results are consistent with those obtained from the pERK experiments.

      (c) Discrepancy with Existing Evidence: The claim regarding the pronociceptive effect of LC→SDH-NAergic signalling on mechanical hypersensitivity contrasts with findings by Kucharczyk et al. (PMID: 35245374), who reported no facilitation of spinal convergent (wide-dynamic range) neuron responses to tactile mechanical stimuli, but potent inhibition to noxious mechanical von Frey stimulation. This discrepancy suggests alternative mechanisms may be at play and raises the question of why noxious stimuli were not tested.

      In our experiments, ChrimsonR expression was observed in the superficial and deeper laminae of the spinal cord (Figure S6). Due to the technical limitations of the optical fibers used for optogenetics, the light stimulation could only reach the superficial laminae; therefore, it may not have affected the activity of neurons (including WDR neurons) located in the deeper laminae. Furthermore, the study by Kucharczyk et al. (Brain, 2022 (PMID: 35245374)) employed a stimulation protocol that differed from ours, applying continuous stimulation over several minutes. Given that the levels of NA released from LC<sup>→SDH</sup>-NAergic terminals in the SDH increase with the duration of terminal stimulation (as shown in Figure 2B), longer stimulation may result in higher levels of NA in the SDH. Considering also our data indicating that the pro- and anti-nociceptive effects of NA are dose dependent (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), these differences may be related to LC<sup>→SDH</sup>-NA neuron activity, NA levels in the SDH, and the differential responses of SDH neurons in the superficial versus deeper laminae (lines 388-395).

      (3) Sole reliance on Von Frey testing

      The exclusive use of von Frey as a behavioural readout for mechanical sensitisation is a significant limitation. This assay is highly variable, and without additional supporting measures, the conclusions lack robustness. Incorporating other behavioural measures, such as the adhesive tape removal test to evaluate tactile discomfort, the needle floor walk corridor to assess sensitivity to uneven or noxious surfaces, or the kinetic weight-bearing test to measure changes in limb loading during movement, could provide complementary insights. Physiological tests, such as the Randall-Selitto test for noxious pressure thresholds or CatWalk gait analysis to evaluate changes in weight distribution and gait dynamics, would further strengthen the findings and allow for a more comprehensive assessment of mechanical sensitisation.

      Thank you for your suggestion. Based on our previous findings that Hes5<sup>+</sup> astrocytes in the SDH selectively modulate mechanosensory signaling (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), the present study focused on behavioral responses to mechanical stimuli using von Frey filaments. As we have not previously conducted most of the behavioral tests suggested by the reviewers, and as we currently lack the necessary equipment for these tests (e.g., Randall–Selitto test, CatWalk gait analysis, and weight-bearing test), we were unable to include them in this study. However, it will be of great interest in future research to investigate whether activation of the LC<sup>→SDH</sup>-NA neuron-to-SDH Hes5<sup>+</sup> astrocyte signaling pathway similarly sensitizes behavioral responses to other types of mechanical stimuli and also to investigate the sensory modality-selective pro- and antinociceptive role of LC<sup>→SDH</sup>-NAergic signaling in the SDH (lines 396-399).

      Overall Conclusion

      This study addresses an important and complex topic with innovative methods and compelling data. However, the conclusions rely on several assumptions that require more robust evidence. Specificity of the LC pathway, experimental discrepancies, and methodological limitations (e.g., sole reliance on von Frey) must be addressed to substantiate the claims. With these issues resolved, this work could significantly advance our understanding of astrocytic and noradrenergic contributions to pain modulation.

      We have made every effort to address the reviewer’s concerns through additional experiments and analyses. Based on the new control data presented, we believe that our explanation is reasonable and acceptable. Although additional data cannot be provided on some points due to methodological constraints and limitations of the techniques currently available in our laboratory, we respectfully submit that the evidence presented sufficiently supports our conclusions.

      Reviewer #3 (Recommendations for the authors):

      A lot of beautiful and challenging-to-collect data is presented. Sincere congratulations to all the authors on this achievement!

      Notwithstanding, please carefully reconsider the conclusions regarding the LC pathway, as additional evidence is required to ensure their specificity and robustness.

      We thank the reviewer for the kind comments and for raising an important point regarding the LC pathway. The reviewer’s feedback prompted us to conduct additional investigations to further strengthen the validity of our conclusions. We have incorporated these new data and analyses into the revised manuscript, and we believe that these revisions substantially enhance the robustness and reliability of our findings.

    1. eLife Assessment

      This important study provides evidence for dynamic coupling between translation initiation and elongation that can help maintain low ribosome density and translational homeostasis. The authors combine single-molecule imaging with a new approach to analyze mRNA translation kinetics using Bayesian modeling. This work is overall solid and will be of interest to those studying translational regulation.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Lamberti et al. investigate how translation initiation and elongation are coordinated at the single-mRNA level in mammalian cells. The authors aim to uncover whether and how cells dynamically adjust initiation rates in response to elongation dynamics, with the overarching goal of understanding how translational homeostasis is maintained. To this end, the study combines single-molecule live-cell imaging using the SunTag system with a kinetic modeling framework grounded in the Totally Asymmetric Simple Exclusion Process (TASEP). By applying this approach to custom reporter constructs with different coding sequences, and under perturbations of the initiation/elongation factor eIF5A, the authors infer initiation and elongation rates from individual mRNAs and examine how these rates covary.

      The central finding is that initiation and elongation rates are strongly correlated across a range of coding sequences, resulting in consistently low ribosome density (≤12% of the coding sequence occupied). This coupling is preserved under partial pharmacological inhibition of eIF5A, which slows elongation but is matched by a proportional decrease in initiation, thereby maintaining ribosome density. However, a complete genetic knockout of eIF5A disrupts this coordination, leading to reduced ribosome density, potentially due to changes in ribosome stalling resolution or degradation.
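
      As a rough illustration of why a proportional change in both rates leaves density unchanged, the sketch below uses the standard low-density approximation (ribosomes per codon ≈ initiation rate / elongation rate). It is not the authors' TASEP-HMM inference. The function name, the 10-codon footprint, and the specific rate values are assumptions chosen for illustration only.

```python
def ribosome_coverage(init_per_min, elong_aa_per_s, footprint_codons=10):
    """Approximate fraction of the coding sequence covered by ribosomes.

    Low-density approximation: ribosomes rarely block each other, so the
    flux equals the initiation rate and density = initiation / elongation.
    footprint_codons is an assumed ribosome footprint, not a measured value.
    """
    if elong_aa_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    init_per_s = init_per_min / 60.0                   # convert to events per second
    ribosomes_per_codon = init_per_s / elong_aa_per_s  # density along the mRNA
    return ribosomes_per_codon * footprint_codons


# Hypothetical rates, for illustration only.
baseline = ribosome_coverage(init_per_min=2.0, elong_aa_per_s=4.0)
coupled = ribosome_coverage(init_per_min=1.0, elong_aa_per_s=2.0)    # both rates halved
uncoupled = ribosome_coverage(init_per_min=2.0, elong_aa_per_s=2.0)  # only elongation halved

print(f"baseline:  {baseline:.3f}")
print(f"coupled:   {coupled:.3f}")
print(f"uncoupled: {uncoupled:.3f}")
```

      Under this approximation, halving both rates gives the same coverage as the baseline, whereas halving elongation alone doubles it. The approximation ignores ribosome collisions and pausing, so it should only be read as a first-order check.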

      Strengths:

      A key strength of this work is its methodological innovation. The authors develop and validate a TASEP-based Hidden Markov Model (HMM) to infer translation kinetics at single-mRNA resolution. This approach provides a substantial advance over previous population-level or averaged models and enables dynamic reconstruction of ribosome behavior from experimental traces. The model is carefully benchmarked against simulated data and appropriately applied. The experimental design is also strong. The authors construct matched SunTag reporters differing only in codon composition in a defined region of the coding sequence, allowing them to isolate the effects of elongation-related features while controlling for other regulatory elements. The use of both pharmacological and genetic perturbations of eIF5A adds robustness and depth to the biological conclusions. The results are compelling: across all constructs and conditions, ribosome density remains low, and initiation and elongation appear tightly coordinated, suggesting an intrinsic feedback mechanism in translational regulation. These findings challenge the classical view of translation initiation as the sole rate-limiting step and provide new insights into how cells may dynamically maintain translation efficiency and avoid ribosome collisions.

      Assessment of Goals and Conclusions:

      The authors successfully achieve their stated aims: they quantify translation initiation and elongation at the single-mRNA level and show that these processes are dynamically coupled to maintain low ribosome density. The modeling framework is well suited to this task, and the conclusions are supported by multiple lines of evidence, including inferred kinetic parameters, independent ribosome counts, and consistent behavior under perturbation.

      Impact and Utility:

      This work makes a significant conceptual and technical contribution to the field of translation biology. The modeling framework developed here opens the door to more detailed and quantitative studies of ribosome dynamics on single mRNAs and could be adapted to other imaging systems or perturbations. The discovery of initiation-elongation coupling as a general feature of translation in mammalian cells will likely influence how researchers think about translational regulation under homeostatic and stress conditions.

      The data, models, and tools developed in this study will be of broad utility to the community, particularly for researchers studying translation dynamics, ribosome behavior, or the effects of codon usage and mRNA structure on protein synthesis.

      Context and Interpretation:

      This study contributes to a growing body of evidence that translation is not merely controlled at initiation but involves feedback between elongation and initiation. It supports the emerging view that ribosome collisions, stalling, and quality control pathways play active roles in regulating initiation rates in cis. The findings are consistent with recent studies in yeast and metazoans showing translation initiation repression following stalling events. However, the mechanistic details of this feedback remain incompletely understood and merit further investigation, particularly in physiological or stress contexts.

      In summary, this is a thoughtfully executed and timely study that provides valuable insights into the dynamic regulation of translation and introduces a modeling framework with broad applicability. It will be of interest to a wide audience in molecular biology, systems biology, and quantitative imaging.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      - Ribosomal elongation rates and initiation rates are strongly coordinated.

      - Elongation rates were estimated between 1 and 4.5 aa/sec. Initiation rates were estimated between 1 and 2 ribosomes/min. These values agree with previously reported ones.

      - Ribosomal density was determined to be below 12% for all constructs and conditions.

      - eIF5A-perturbations (GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      - eIF5A perturbations affected both elongation and initiation rates.
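
      For readers unfamiliar with the TASEP, the following is a minimal sketch of a homogeneous TASEP with extended particles, simulated with a Gillespie-style algorithm. It is an assumption-laden toy, not the authors' implementation: the function name, the 10-codon exclusion distance, the 300-codon reporter length, and the rate values are all hypothetical, and the HMM layer that links ribosome positions to fluorescence is omitted.

```python
import random


def simulate_tasep(n_codons, alpha, hop, footprint=10, t_end=3600.0, seed=0):
    """Gillespie simulation of a homogeneous TASEP with extended particles.

    alpha: initiation rate (1/s); hop: elongation rate (codons/s).
    Ribosomes must stay at least `footprint` codons apart.
    Returns (final positions, time-averaged fraction of codons covered).
    """
    rng = random.Random(seed)
    ribosomes = []        # codon positions, leading (3'-most) ribosome first
    t = 0.0
    occupancy_time = 0.0  # integral of ribosome count over time
    while True:
        # Collect every move that exclusion currently allows.
        moves = []
        if not ribosomes or ribosomes[-1] >= footprint:
            moves.append(("init", alpha))
        for i, pos in enumerate(ribosomes):
            if i == 0 or ribosomes[i - 1] - pos > footprint:
                moves.append((i, hop))
        total_rate = sum(rate for _, rate in moves)
        wait = rng.expovariate(total_rate)
        if t + wait >= t_end:
            occupancy_time += len(ribosomes) * (t_end - t)
            break
        occupancy_time += len(ribosomes) * wait
        t += wait
        # Choose one allowed move with probability proportional to its rate.
        r = rng.random() * total_rate
        for move, rate in moves:
            r -= rate
            if r < 0:
                break
        if move == "init":
            ribosomes.append(0)        # new ribosome enters at the start codon
        elif ribosomes[move] == n_codons - 1:
            ribosomes.pop(move)        # termination at the last codon
        else:
            ribosomes[move] += 1       # elongation by one codon
    mean_ribosomes = occupancy_time / t_end
    return ribosomes, mean_ribosomes * footprint / n_codons


# Hypothetical parameters: 2 initiations/min, 4 codons/s, 300-codon reporter.
final_positions, coverage = simulate_tasep(n_codons=300, alpha=2.0 / 60.0, hop=4.0)
print(f"time-averaged coverage: {coverage:.3f}")
```

      In this toy model every codon is decoded at the same rate, which is the homogeneity assumption discussed in the reviews; sequence-dependent pausing would require per-codon hop rates.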

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      Comments on revisions:

      The authors have addressed my concerns. Specifically, they have expanded the discussion on unexpected eIF5A perturbation results, calculated CAI values for all constructs, and made code and data publicly available via GitHub and Zenodo. The mathematical notation is now consistent, and all variables are properly defined.

    4. Reviewer #3 (Public review):

      Disclaimer:

      My expertise is in live single-molecule imaging of RNA and transcription, as well as associated data analysis and modeling. While this aligns well with the technical aspects of the manuscript, my background in translation is more limited, and I am not best positioned to assess the novelty of the biological conclusions.

      Summary:

      This study combines live-cell imaging of nascent proteins on single mRNAs with time-series analysis to investigate the kinetics of mRNA translation. The authors (i) used a calibration method for estimating absolute ribosome counts, and (ii) developed a new Bayesian approach to infer ribosome counts over time from run-off experiments, enabling estimation of elongation rates and ribosome density across conditions.

      They report (i) translational bursting at the single-mRNA level, (ii) low ribosome density (~10% occupancy ± a few percent), (iii) that ribosome density is minimally affected by perturbations of elongation (using a drug and/or different coding sequences in the reporter), suggesting a homeostatic mechanism potentially involving a feedback of elongation onto initiation, although (iv) this coupling breaks down upon knockout of elongation factor eIF5A.

      Strengths:

      (1) The manuscript is well written and the conclusions are in general appropriately cautious (besides the few improvements I suggest below).

      (2) The time-series inference method is interesting and promising for broader application.

      (3) Simulations provide convincing support for the modeling (though some improvements are possible).

      (4) The reported homeostatic effect on ribosome density is surprising and carefully validated with multiple perturbations.

      (5) Imaging quality and corrections (e.g., flat-fielding, laser power measurements) are robust.

      (6) Mathematical modeling is clearly described and precise; a few clarifications could improve it further.

      Weaknesses:

      (1) The absolute quantification of ribosome numbers (via the measurement of $i_{MP}$) should be improved. This only affects the finding that ribosome density is low, not that it appears to be under homeostatic control. However, if $i_{MP}$ turns out to be substantially overestimated (hence ribosome density underestimated), then "ribosomes queuing up to the initiation site and physically blocking initiation" could become a relevant hypothesis. In my first review of this work, I made recommendations, which the authors did not follow. In my view, the robustness of this particular aspect of this study remains moderate.

      (2) The proposed initiation-elongation coupling is plausible, but alternative explanations such as changes in abortive elongation frequency should be considered. In their response to my previous comments, the authors indicate that this is "beyond the scope of the present work".

      (3) More an opportunity for improvement than a weakness: It is unclear what the single-mRNA nature of the inference method is bringing since it is only used here to report _average_ ribosome elongation rate and density (averaged across mRNAs and across time during the run-off experiments, although the method, in principle, has the power to resolve these two aspects). In response to my previous comment, the authors note that such analyses could be incorporated in future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this study, Lamberti et al. investigate how translation initiation and elongation are coordinated at the single-mRNA level in mammalian cells. The authors aim to uncover whether and how cells dynamically adjust initiation rates in response to elongation dynamics, with the overarching goal of understanding how translational homeostasis is maintained. To this end, the study combines single-molecule live-cell imaging using the SunTag system with a kinetic modeling framework grounded in the Totally Asymmetric Simple Exclusion Process (TASEP). By applying this approach to custom reporter constructs with different coding sequences, and under perturbations of the initiation/elongation factor eIF5A, the authors infer initiation and elongation rates from individual mRNAs and examine how these rates covary.

      The central finding is that initiation and elongation rates are strongly correlated across a range of coding sequences, resulting in consistently low ribosome density (≤12% of the coding sequence occupied). This coupling is preserved under partial pharmacological inhibition of eIF5A, which slows elongation but is matched by a proportional decrease in initiation, thereby maintaining ribosome density. However, a complete genetic knockout of eIF5A disrupts this coordination, leading to reduced ribosome density, potentially due to changes in ribosome stalling resolution or degradation.

      Strengths:

      A key strength of this work is its methodological innovation. The authors develop and validate a TASEP-based Hidden Markov Model (HMM) to infer translation kinetics at single-mRNA resolution. This approach provides a substantial advance over previous population-level or averaged models and enables dynamic reconstruction of ribosome behavior from experimental traces. The model is carefully benchmarked against simulated data and appropriately applied. The experimental design is also strong. The authors construct matched SunTag reporters differing only in codon composition in a defined region of the coding sequence, allowing them to isolate the effects of elongation-related features while controlling for other regulatory elements. The use of both pharmacological and genetic perturbations of eIF5A adds robustness and depth to the biological conclusions. The results are compelling: across all constructs and conditions, ribosome density remains low, and initiation and elongation appear tightly coordinated, suggesting an intrinsic feedback mechanism in translational regulation. These findings challenge the classical view of translation initiation as the sole rate-limiting step and provide new insights into how cells may dynamically maintain translation efficiency and avoid ribosome collisions.

      We thank the reviewer for their constructive assessment of our work, and for recognizing the methodological innovation and experimental rigor of our study.

      Weaknesses:

      A limitation of the study is its reliance on exogenous reporter mRNAs in HeLa cells, which may not fully capture the complexity of endogenous translation regulation. While the authors acknowledge this, it remains unclear how generalizable the observed coupling is to native mRNAs or in different cellular contexts.

      We agree that the use of exogenous reporters is a limitation inherent to the SunTag system, for which there is currently no simple alternative for single-mRNA translation imaging. However, we believe our findings are likely generalizable for several reasons.

      As discussed in our introduction and discussion, there is growing mechanistic evidence in the literature for coupling between elongation (ribosome collisions) and initiation via pathways such as the GIGYF2-4EHP axis (Amaya et al. 2018, Hickey et al. 2020, Juszkiewicz et al. 2020), which might operate on both exogenous and endogenous mRNAs.

      As already acknowledged in our limitations section, our exogenous reporters may not fully recapitulate certain aspects of endogenous translation (e.g., ER-coupled collagen processing), yet the observed initiation-elongation coupling was robust across all tested constructs and conditions.

      We have now expanded the Discussion (L393-395) to cite complementary evidence from Dufourt et al. (2021), who used a CRISPR-based approach in Drosophila embryos to measure translation of endogenous genes. We also added a reference to Choi et al. 2025, who use an ER-specific SunTag reporter to visualize translation at the ER (L395-397).

      Additionally, the model assumes homogeneous elongation rates and does not explicitly account for ribosome pausing or collisions, which could affect inference accuracy, particularly in constructs designed to induce stalling. While the model is validated under low-density assumptions, more work may be needed to understand how deviations from these assumptions affect parameter estimates in real data.

      We agree with the reviewer that the assumption of homogeneous elongation rates is a simplification, and that our work represents a first step towards rigorous single-trace analysis of translation dynamics. We have explicitly tested the robustness of our model to violations of the low-density assumption through simulations (Figure 2 - figure supplement 2). These show that while parameter inference remains accurate at low ribosome densities, accuracy slightly deteriorates at higher densities, as expected. In fact, our experimental data do provide evidence for heterogeneous elongation: the waiting times between termination events deviate significantly from an exponential distribution (Figure 3 - figure supplement 2C), indicating the presence of ribosome stalling and/or bursting, consistent with the reviewer's concern. We acknowledge in the Limitations section (L402-406) that extending the model to explicitly capture transcript-dependent elongation rates and ribosome interactions remains challenging. The TASEP is difficult to solve analytically under these conditions, but we note that simulation-based inference approaches, such as particle filters to replace HMMs, could provide a path forward for future work to capture this complexity at the single-trace level.

      Furthermore, although the study observes translation "bursting" behavior, this is not explicitly modeled. Given the growing recognition of translational bursting as a regulatory feature, incorporating or quantifying this behavior more rigorously could strengthen the work's impact.

      While we do not explicitly model the bursting dynamics in the HMM framework, we have quantified bursting behavior directly from the data. Specifically, we measure the duration of translated (ON) and untranslated (OFF) periods across all reporters and conditions (Figure 1G for control conditions and Figure 4G-H for perturbed conditions), finding that active translation typically lasts 10-15 minutes interspersed with shorter silent periods of 5-10 minutes. This empirical characterization demonstrates that bursting is a consistent feature of translation across our experimental conditions. The average duration of silent periods is similar to what was inferred by Livingston et al. 2023 for a similar SunTag reporter, whereas the average duration of active periods is substantially shorter (~15 min instead of ~40 min), which is consistent with the shorter trace duration in our system compared to theirs (~15 min compared to ~80 min, on average). Incorporating an explicit two-state or multi-state bursting model into the TASEP-HMM framework would indeed be computationally intensive and represents an important direction for future work, as it would enable inference of switching rates alongside initiation and elongation parameters. We have added this point to the Discussion (L415-417).
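
      The empirical ON/OFF measurement described above can be sketched as a run-length calculation on a binarized trace. This is only a sketch under stated assumptions: the function name, the example trace, and the 2.5-minute frame interval are hypothetical, and the thresholding that turns fluorescence intensity into ON/OFF calls is assumed to have been done beforehand.

```python
from itertools import groupby


def on_off_durations(is_translating, frame_interval_min):
    """Split a binary translation trace into ON and OFF period durations.

    is_translating: sequence of 0/1 (or False/True) values, one per frame.
    frame_interval_min: time between frames in minutes (assumed constant).
    Returns (on_durations, off_durations) in minutes, in order of occurrence.
    """
    on, off = [], []
    for state, run in groupby(is_translating):
        duration = sum(1 for _ in run) * frame_interval_min
        (on if state else off).append(duration)
    return on, off


# Hypothetical trace sampled every 2.5 minutes.
trace = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
on_periods, off_periods = on_off_durations(trace, frame_interval_min=2.5)
print("ON periods (min):", on_periods)
print("OFF periods (min):", off_periods)
```

      Periods that touch the start or end of a trace are truncated by the observation window, which is one reason shorter traces yield shorter apparent ON durations.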

      Assessment of Goals and Conclusions:

      The authors successfully achieve their stated aims: they quantify translation initiation and elongation at the single-mRNA level and show that these processes are dynamically coupled to maintain low ribosome density. The modeling framework is well suited to this task, and the conclusions are supported by multiple lines of evidence, including inferred kinetic parameters, independent ribosome counts, and consistent behavior under perturbation.

      Impact and Utility:

      This work makes a significant conceptual and technical contribution to the field of translation biology. The modeling framework developed here opens the door to more detailed and quantitative studies of ribosome dynamics on single mRNAs and could be adapted to other imaging systems or perturbations. The discovery of initiation-elongation coupling as a general feature of translation in mammalian cells will likely influence how researchers think about translational regulation under homeostatic and stress conditions.

      The data, models, and tools developed in this study will be of broad utility to the community, particularly for researchers studying translation dynamics, ribosome behavior, or the effects of codon usage and mRNA structure on protein synthesis.

      Context and Interpretation:

      This study contributes to a growing body of evidence that translation is not merely controlled at initiation but involves feedback between elongation and initiation. It supports the emerging view that ribosome collisions, stalling, and quality control pathways play active roles in regulating initiation rates in cis. The findings are consistent with recent studies in yeast and metazoans showing translation initiation repression following stalling events. However, the mechanistic details of this feedback remain incompletely understood and merit further investigation, particularly in physiological or stress contexts. 

      In summary, this is a thoughtfully executed and timely study that provides valuable insights into the dynamic regulation of translation and introduces a modeling framework with broad applicability. It will be of interest to a wide audience in molecular biology, systems biology, and quantitative imaging.

      We appreciate the reviewer's thorough and positive assessment of our work, and that they recognize both the technical innovation of our modeling framework and its potential broad utility to the translation biology community. We agree that further mechanistic investigation of initiation-elongation feedback under various physiological contexts represents an important direction for future research.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      (1) Ribosomal elongation rates and initiation rates are strongly coordinated.

      (2) Elongation rates were estimated between 1-4.5 aa/sec. Initiation rates were estimated between 0.5-2.5 events/min. These values agree with previously reported values.

      (3) Ribosomal density was determined below 12% for all constructs and conditions.

      (4) eIF5A-perturbations (KO and GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      (5) eIF5A perturbations resulted in increases in elongation and decreases in initiation rates.

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is highly relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      We thank the reviewer for their careful evaluation of our work and for recognizing the novelty of our quantitative methodology to extract both initiation and elongation rates from harringtonine run-off assays, extending beyond the traditional use of these experiments.

      Weaknesses:

      The conclusion of the strong coordination between initiation and elongation rates is interesting, but some results are unexpected, and further experimental validation is needed to ensure this coordination is valid. 

      We agree that some of our findings need further experimental investigation in future studies. However, we believe that the coordination between initiation and elongation is supported by multiple results in our current work: (1) the strong correlation observed across all reporters and conditions (Figure 3E), and (2) the consistent maintenance of low ribosome density despite varying elongation rates. While additional experimental validation would be valuable, we note that directly manipulating initiation or elongation independently in mammalian cells remains technically challenging. Nevertheless, our findings are consistent with emerging mechanistic understanding of collision-sensing pathways (GIGYF2-4EHP) that could mediate such coupling, as discussed in our manuscript.

      (1) eIF5a perturbations resulted in a non-significant effect on the fraction of translating mRNA, translation duration, and bursting periods. Given the central role of eIF5a, I would have expected a different outcome. I would recommend that the authors expand the discussion and review more literature to justify these findings.

      We appreciate this comment. This finding is indeed discussed in detail in our manuscript (Discussion, paragraphs 6-7). As we note there, while eIF5A plays a critical role in elongation, the maintenance of bursting dynamics and ribosome density upon perturbation can be explained by compensatory feedback mechanisms. Specifically, a coordinated decrease in initiation rates counterbalances slower elongation to maintain homeostatic ribosome density. We also discuss several factors that complicate interpretation: (1) potential RQC-mediated degradation masking stronger effects in proline-rich constructs, (2) differences between GC7 treatment and genetic knockout suggesting altered stalling resolution kinetics, and (3) the limitations of using exogenous reporters that lack ER-coupled processing, which may be critical for eIF5A function in endogenous collagen translation (as suggested by Rossi et al., 2014; Mandal et al., 2016; Barba-Aliaga et al., 2021). The mechanistic complexity and tissue-specific nature of eIF5A function in mammals, which differs substantially from the better-characterized yeast system, likely contribute to the nuanced phenotype we observe. We believe our Discussion adequately addresses these points.

      (2) The AAG construct leading to slow elongation is very surprising. It is the opposite of the field consensus, where codon-optimized gene sequences are expected to elongate faster. More information about each construct should be provided. I would recommend more bioinformatic analysis on this, for example, calculating CAI for all constructs, or predicting the structures of the proteins.

      We agree that the slow elongation of the AAG construct is counterintuitive and indeed surprising. Following the reviewer's suggestion, we have now calculated the Codon Adaptation Index (CAI) for all constructs (Renilla 0.89, Col1a1 0.78, Col1a1 mutated 0.74). It is therefore unlikely that codon bias explains the slow translation, particularly since we designed the mutated Col1a1 construct with alanine codons selected to respect human codon usage bias, thereby minimizing changes in codon optimality. As we discuss in the manuscript, we hypothesize that the proline-to-alanine substitutions disrupted co-translational folding of the collagen-derived sequence. Prolines are critical for collagen triple-helix formation (Shoulders and Raines, 2009), and their replacement with alanines likely generates misfolded intermediates that cause ribosome stalling (Barba-Aliaga et al., 2021; Komar et al., 2024). This interpretation is supported by the high frequency (>30%) of incomplete run-off traces for AAG, suggesting persistent stalling events. Our findings thus illustrate an important potential caveat: "optimizing" a sequence based solely on codon usage can be detrimental when it disrupts functionally important structural features or co-translational folding pathways.

      This highlights that elongation rates depend not only on codon optimality but also on the interplay between nascent chain properties and ribosome progression.
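
      For readers unfamiliar with the Codon Adaptation Index mentioned above, the following is a minimal sketch of the standard definition (the geometric mean of per-codon relative adaptiveness weights). The codon counts in `TOY_USAGE` are invented for illustration and are not real human codon usage; the function names are hypothetical, and this is not the code the authors used to obtain the values quoted above.

```python
import math

# Hypothetical codon counts for a reference gene set (illustrative numbers only,
# NOT real human codon usage). Keys are amino acids, values map codon -> count.
TOY_USAGE = {
    "A": {"GCC": 40, "GCT": 26, "GCA": 23, "GCG": 11},
    "P": {"CCC": 32, "CCT": 29, "CCA": 28, "CCG": 11},
    "G": {"GGC": 34, "GGA": 25, "GGG": 25, "GGT": 16},
}

def relative_adaptiveness(usage):
    """Return w[codon] = count / max count among its synonymous codons."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights

def cai(sequence, weights):
    """Geometric mean of the weights of all scorable codons in `sequence`."""
    usable = len(sequence) - len(sequence) % 3   # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

WEIGHTS = relative_adaptiveness(TOY_USAGE)
optimal = cai("GCCCCCGGC", WEIGHTS)      # every codon is the most frequent synonym
suboptimal = cai("GCGCCGGGT", WEIGHTS)   # every codon is the rarest synonym
```

      With these toy weights, a sequence built only from the most frequent synonymous codons scores 1, and any substitution toward rarer codons lowers the score. A CAI that barely changes between the wild-type and mutated constructs is therefore what one would expect if the replacement codons were chosen to follow the reference usage.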

      (3) The authors should consider using their methodology to study the effects of modifying the 5'UTR, resulting in changes in initiation rate and bursting, such as previously shown in reference Livingston et al., 2023. This may be outside of the scope of this project, but the authors could add this as a future direction and discuss if this may corroborate their conclusions. 

      We thank the reviewer for this excellent suggestion. We agree that applying our methodology to 5'-UTR variants would provide a complementary test of initiation-elongation coupling, and we have now added this as a future direction in the Discussion (L417-420).

      (4) The mathematical model and parameter inference routines are central to the conclusions of this manuscript. In order to support reproducibility, the computational code should be made available and well-documented, with a requirements file indicating the dependencies and their versions. 

      We have added the Github link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Reviewer #3 (Public review):

      Disclaimer:

      My expertise is in live single-molecule imaging of RNA and transcription, as well as associated data analysis and modeling. While this aligns well with the technical aspects of the manuscript, my background in translation is more limited, and I am not best positioned to assess the novelty of the biological conclusions.

      Summary:

      This study combines live-cell imaging of nascent proteins on single mRNAs with time-series analysis to investigate the kinetics of mRNA translation.

      The authors (i) used a calibration method for estimating absolute ribosome counts, and (ii) developed a new Bayesian approach to infer ribosome counts over time from run-off experiments, enabling estimation of elongation rates and ribosome density across conditions.

      They report (i) translational bursting at the single-mRNA level, (ii) low ribosome density (~10% occupancy ± a few percent), (iii) that ribosome density is minimally affected by perturbations of elongation (using a drug and/or different coding sequences in the reporter), suggesting a homeostatic mechanism potentially involving a feedback of elongation onto initiation, although (iv) this coupling breaks down upon knockout of elongation factor eIF5A.

      Strengths:

      (1) The manuscript is well written, and the conclusions are, in general, appropriately cautious (besides the few improvements I suggest below).

      (2) The time-series inference method is interesting and promising for broader applications. 

      (3) Simulations provide convincing support for the modeling (though some improvements are possible). 

      (4) The reported homeostatic effect on ribosome density is surprising and carefully validated with multiple perturbations.

      (5) Imaging quality and corrections (e.g., flat-fielding, laser power measurements) are robust.

      (6) Mathematical modeling is clearly described and precise; a few clarifications could improve it further.

      We thank the reviewer for recognizing the novelty of the approach and its rigour, and for providing suggestions to improve it further.

      Weaknesses:

      (1) The absolute quantification of ribosome numbers (via the measurement of $i_{MP}$) should be improved. This only affects the finding that ribosome density is low, not that it appears to be under homeostatic control. However, if $i_{MP}$ turns out to be substantially overestimated (hence ribosome density underestimated), then "ribosomes queuing up to the initiation site and physically blocking initiation" could become a relevant hypothesis. In my detailed recommendations to the authors, I list points that need clarification in their quantifications and suggest an independent validation experiment (measuring the intensity of an object with a known number of GFP molecules, e.g., MS2-GFP-labeled RNAs, or individual GEMs).

      We agree with the reviewer that the estimation of the number of ribosomes is central to our finding that translation happens at low density on our reporters. This result derives from our measurement of the intensity of one mature protein (i<sub>MP</sub>), which we obtained using a SunTag reporter with an RH1 domain at the C terminus of the mature protein, allowing us to stabilise mature proteins via actin-tethering. In addition, as suggested by the reviewer, we already validated this result with an independent estimate of the mature protein intensity (Figure 5 - figure supplement 2B), which was obtained by adding the mature protein intensity directly as a free parameter of the HMM. The inferred value of mature protein intensity for each construct (10-15 a.u.) was remarkably close to the experimental calibration result (14 ± 2 a.u.). Therefore, we have confidence that our absolute quantification of ribosome numbers is accurate.

      (2) The proposed initiation-elongation coupling is plausible, but alternative explanations, such as changes in abortive elongation frequency, should be considered more carefully. The authors mention this possibility, but should test or rule it out quantitatively. 

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new perturbation experiments is beyond the scope of the present work.

      (3) The observation of translational bursting is presented as novel, but similar findings were reported by Livingston et al. (2023) using a similar SunTag-MS2 system. This prior work should be acknowledged, and the added value of the current approach clarified.

      We did cite Livingston et al. (2023) in several places, but we recognize that a few additional citations in key places would make clear that the observation of bursting is not novel but agrees with previous results. We have now added them in the Results and Discussion sections.

      (4) It is unclear what the single-mRNA nature of the inference method is bringing since it is only used here to report _average_ ribosome elongation rate and density (averaged across mRNAs and across time during the run-off experiments - although the method, in principle, has the power to resolve these two aspects).

      While decoding individual traces, our model infers shared (population-level) rates. Inferring transcript-specific parameters would be more informative, but it is highly challenging due to the uncertainty on the initial ribosome distribution on single transcripts. Pooling multiple transcripts together allows us to use some assumptions on the initial distribution and infer average elongation and initiation-rate parameters, while revealing substantial mRNA-to-mRNA variability in the posterior decoding (e.g. Figure 3 - figure supplement 2C). Indeed, the inference still informs on the single-trace run-off time distribution (Figure 3A) and the waiting time between termination events (Figure 3 - figure supplement 2C), suggesting the presence of stalling and bursting. In addition, the transcript-to-transcript heterogeneity is likely accounted for by our model better than previous methods (linear fit of the average run-off intensity), as suggested by their comparison (Figure 3 - figure supplement 2A). In the future the model could be refined by introducing transcript-specific parameters, possibly in a hierarchical way, alongside shared parameters.

      (5) I did not find any statement about data availability. The data should be made available. Their absence limits the ability to fully assess and reproduce the findings.

      We have added the Github link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Major Comments:

      (1) Lack of Explicit Bursting Model

      Although translation "bursts" are observed, the current framework does not explicitly model initiation as a stochastic ON/OFF process. This limits insight into regulatory mechanisms controlling burst frequency or duration. The authors should either incorporate a two-state/more-state (bursting) model of initiation or perform statistical analysis (e.g., dwell-time distributions) to quantify bursting dynamics. They should clarify how bursting influences the interpretation of initiation rate estimates.

      We agree with the reviewer that an explicit bursting model (e.g., a two-state telegraph model) would be the ideal theoretical framework. However, integrating such a model into the TASEP-HMM inference framework is computationally intensive and complex. As a robust first step, we have opted to quantify bursting empirically based on the decoded single-mRNA traces. As shown in Figure 1G (control) and Figure 4G (perturbed conditions), we explicitly measured the duration of "ON" (translated) and "OFF" (untranslated) periods. This statistical analysis provides a quantitative description of the bursting dynamics without relying on the specific assumptions of a telegraph model. We have clarified this in the text (L123-125) and, as suggested, added a discussion (L415-417) on the potential extensions of the model to include explicit switching kinetics in the Outlook section.
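
      To make the empirical quantification described above concrete, here is a minimal sketch of how ON and OFF durations could be extracted from a decoded binary trace. It is an illustration under stated assumptions, not the authors' implementation: the function name, the 20 s frame interval, and the choice to discard the first and last runs (whose true start or end falls outside the observation window) are all assumptions.

```python
from itertools import groupby

def state_durations(states, frame_interval_s):
    """Split a binary ON/OFF trace into runs and return their durations in seconds.

    The first and last runs are dropped because their true start or end
    lies outside the observation window (they are censored).
    """
    runs = [(bool(state), sum(1 for _ in group)) for state, group in groupby(states)]
    interior = runs[1:-1]
    on = [n * frame_interval_s for state, n in interior if state]
    off = [n * frame_interval_s for state, n in interior if not state]
    return on, off

# Toy decoded trace: 1 = translated (ON), 0 = untranslated (OFF).
trace = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
on_durations, off_durations = state_durations(trace, frame_interval_s=20.0)
```

      Pooling such durations across traces gives the dwell-time distributions the reviewer asks about. Keeping the censored edge runs instead would bias the estimates toward shorter durations, which is why they are excluded in this sketch.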

      (2) Assumption of Uniform Elongation Rates

      The model assumes homogeneous elongation across coding sequences, which may not hold for stalling-prone inserts (e.g., PPG). This simplification could bias inference, particularly in cases of sequence-specific pausing. The authors should add simulations or a sensitivity analysis to assess how non-uniform elongation affects the accuracy of inferred parameters. They should also explicitly discuss how ribosome stalling, collisions, or heterogeneity might skew model outputs (see point 4).

      A strong stalling sequence that affects all ribosomes equally should not deteriorate the inference of the initiation rate, provided that the low-density assumption holds. The scenario where stalling events lead to higher density, and thus increased ribosome-ribosome interactions, is comparable to the conditions explored in Figure 2E. In those simulations, we tested the inference on data generated with varying initiation and elongation rates, resulting in ribosome densities ranging from low to high. We demonstrated that the inference remains robust at low ribosome densities (<10%). At higher densities, the accuracy of the initiation rate estimate decreases, whereas the elongation rate estimate remains comparatively robust. Additionally, the model tends to overestimate ribosome density under high-density conditions, likely because it neglects ribosome interference at the initiation site (Figure 2 - figure supplement 2C). We agree that a deeper investigation into the consequences of stochastic stalling and bursting would be beneficial, and we have explicitly acknowledged this in the Limitations section.

      (3) Interpretation of eIF5A Knockout Phenotype

      The observation that eIF5A KO reduces initiation more than elongation, leading to decreased ribosome density, is biologically intriguing. However, the explanation invoking altered RQC kinetics is speculative and not directly tested. The authors should consider validating the RQC hypothesis by monitoring reporter mRNA stability, ribosome collision markers, or translation termination intermediates.

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new experiments is beyond the scope of the present work.

      (4) To strengthen the manuscript, the authors should incorporate insights from three studies.

      - Livingston et al. (PMC10330622) found that translation occurs in bursts, influenced by mRNA features and initiation factors, supporting the coupling of initiation and elongation.

      - Madern et al. (PMID: 39892379) demonstrated that ribosome cooperativity enhances translational efficiency, highlighting coordinated ribosome behavior.

      - Dufourt et al. (PMID: 33927056) observed that high initiation rates correlate with high elongation rates, suggesting a conserved mechanism across cell cultures and organisms.

      Integrating these studies could enrich the manuscript's interpretation and stimulate new avenues of thought.

      We thank the reviewer for the valuable comment. We added citations of Livingston et al. in the context of translational bursting. We already cited Madern et al. in multiple places and, although its observations of ribosome cooperativity are very compelling, they cannot be linked with our observations of a feedback between initiation and elongation, and it would be very challenging to see a similar effect on our reporters. This is why we did not explicitly discuss cooperativity. We also integrated Dufourt et al. in the Discussion about the possibility of designing genetically encoded reporters. We also added a sentence about the possibility of using an ER-specific SunTag reporter, as done recently in Choi et al., Nature (2025) (https://doi.org/10.1038/s41586-025-09718-0).

      Minor Comments:

      (1) Use consistent naming for SunTag reporters (e.g., "PPG" vs "proline-rich") throughout.

      Thank you for the comment. However, the term proline-rich always appears together with PPG, so we believe that the naming is clear and consistent.

      (2) Consider a schematic overview of the experimental design and modeling pipeline for accessibility.

      Thank you for the suggestion. We consider that the experimental design and modeling are now described clearly enough that an additional scheme is not needed.

      (3) Clarify how incomplete run-off traces are handled in the HMM inference.

      Incomplete run-off traces are treated identically to complete traces in our HMM inference. This is possible because our model relies on the probability of transitions occurring per time step to infer rates. It does not require observing the final "empty" state to estimate the kinetic parameters α and λ. The loss of signal (e.g., mRNA moving out of the focal volume or photobleaching) does not invalidate the kinetic information contained in the portion of the trace that was observed. We have clarified this in the Methods section.

      Reviewer #2 (Recommendations for the authors):

      (1) Reproducibility:

      (1.1) The authors should use a GitHub repository with a timestamp for the release version.

      The code is available on GitHub (https://github.com/naef-lab/suntag-analysis).

      (1.2) Make raw images and data available in a figure repository like Figshare.

      The raw images (.ome.tif) are now available on Zenodo (https://zenodo.org/records/17669332).

      (2) Paper reorganization and expansion of the intensity and ribosome quantification:

      (2.1) Given the relevance of the initiation and elongation rates for the conclusions of this study, and the fact that the authors inferred these rates from the spot intensities, I recommend that the authors move Figure 1 Supplement 2 to the main text and expand the description of the process to relate spot intensity and number of ribosomes. Please also expand the figure caption for this image.

      We agree with the importance of this validation. We have expanded the description of the calibration experiment in the main text and in the figure caption.

      (2.2) I suggest the authors explicitly mention the use of HMM in the abstract.

      We have now explicitly mentioned the TASEP-based HMM in the abstract.

      (2.3) In line 492, please add the frame rate used to acquire the images for the run-off assays.

      We have added the specific frame rate (one frame every 20 seconds) to the relevant section.

      (3) Figures and captions:

      (3.1) Figure 1, Supplement 2. Please add a description of the colors used in plots B, C. 

      We have expanded the caption and added the color description.

      (3.2) In the Figure 2 caption. It is not clear what the authors mean by "traceseLife". Please ensure it is not a typo.

      Thank you for spotting this. We have corrected the typo.

      (3.3) Figure 1 A, in the cartoon N(alpha)->N-1, shouldn't the transition also depend on lambda?

      The transition probability was explicitly derived in the “Bayesian modeling of run-off traces” section (Eqs. 17-18), and does not depend on λ, but only on the initiation rate under the low-density assumption.

      (3.4) Figure 3, Supplement 2. "presence of bursting and stalling.." has a typo.

      Corrected.

      (3.5) Figure 5, panel C, the y-axis label should be "run-off time (min)."

      Corrected.

      (3.6) For most figures, add significance bars.

      (3.7) In the figure captions, please add the total number of cells used for each condition.

      We have systematically indicated the number of traces (n<sub>t</sub>) and the number of independent experiments (n<sub>e</sub>) in the captions in this format (n<sub>t</sub>, n<sub>e</sub>).

      (4) Mathematical Methods:

      We greatly thank the reviewer for their detailed attention to the mathematical notation. We have addressed all points below.

      (4.1) In lines 555, Materials and Methods, subsection, Quantification of Intensity Traces, multiple equations are not numbered. For example, after Equation (4), no numbers are provided for the rest of the equations. Please keep consistency throughout the whole document.

      We have ensured that all equations are now consistently numbered throughout the document.

      (4.2) In line 588, the authors mention "$X$ is a standard normal random variable with mean $\mu$ and standard deviation $s_0$". Please ensure this is correct. A standard normal random variable has a 0 mean and std 1. 

      Thank you for the suggestion; we have corrected the text (L678).

      (4.3) Line 546, Equation 2. The authors use mu(x,y) to describe a 2d Gaussian function. But later in line 587, the authors reuse the same variable name in equation 5 to redefine the intensity as mu = b_0 + I.

      We have renamed the 2D Gaussian function to $\mu_{2D}(x,y)$ in the spot tracking section.

      (4.4) For the complete document, it could be beneficial to the reader if the authors expand the definition of the relationship between the signal "y" and the spot intensity "I". Please note how the paragraph in lines 582-587 does not properly introduce "y".

      We have added an explicit definition of y and its relationship to the underlying spot intensity I in the text to improve readability and clarity.

      (4.5) Please ensure consistency in variable names. For example, "I" is used in line 587 for the experimental spot intensity, then line 763 redefines I(t) as the total intensity obtained from the TASEP model; please use "I_sim(t)" for simulated intensities. Please note that reusing the variable "I" for different contexts makes it hard for the reader to follow the text. 

      We agree that this was confusing. We have implemented the suggestion and now distinguish simulated intensities using the notation I<sub>S</sub>.

      (4.6) Line 555 "The prior on the total intensity I is an "uninformative" prior" I ~ half_normal(1000). Please ensure it is not "I_0 ~ half_normal(1000)."? 

      We confirm that “I” is the correct variable representing the total intensity in this context; we do not use an “I<sub>0</sub>” variable here.

      (4.7) In lines 595, equation 6. Ensure that the equation is correct. Shouldn't it be: s_0^2 = ln ( 1 + (sigma_meas^2 / ⟨y⟩^2) )? Please ensure that this is correct and it is not affecting the calculated values given in lines 598.

      Thank you for catching this typo. We have corrected the equation in the manuscript. We confirm that the calculations performed in the code used the correct formula, so the reported values remain unchanged.
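
      The corrected form of Equation 6 quoted by the reviewer is the usual moment-matching relation for a log-normal variable: if $\ln y \sim \mathcal{N}(m, s_0^2)$, then $\langle y \rangle = e^{m + s_0^2/2}$ and $\mathrm{Var}(y) = (e^{s_0^2} - 1)\langle y \rangle^2$, which rearranges to $s_0^2 = \ln(1 + \sigma_{meas}^2 / \langle y \rangle^2)$. The short check below assumes the observation noise is log-normal, as the form of the equation suggests; the function names and numerical values are placeholders, not taken from the manuscript or its code.

```python
import math

def lognormal_s0(mean_y, sigma_meas):
    """Log-scale standard deviation s_0 from the linear-scale mean and std."""
    return math.sqrt(math.log(1.0 + (sigma_meas / mean_y) ** 2))

def lognormal_moments(m, s0):
    """Mean and standard deviation of exp(X) with X ~ Normal(m, s0^2)."""
    mean = math.exp(m + 0.5 * s0 ** 2)
    std = mean * math.sqrt(math.exp(s0 ** 2) - 1.0)
    return mean, std

# Placeholder numbers, not values from the manuscript.
mean_y, sigma_meas = 100.0, 20.0
s0 = lognormal_s0(mean_y, sigma_meas)
m = math.log(mean_y) - 0.5 * s0 ** 2   # log-scale location that reproduces mean_y
recovered_mean, recovered_std = lognormal_moments(m, s0)
```

      Feeding the computed `s0` back through the log-normal moments recovers the input mean and standard deviation, which is a quick way to confirm that a given implementation uses the corrected expression.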

      (4.8) In line 597, "the mean intensity square ^2". Please ensure it is not "the square of the temporal mean intensity."

      We have corrected the text to "the square of the temporal mean intensity."

      (4.9) In lines 602-619, Bayesian modeling of run-off traces, please ensure to introduce the constant "\ell". Used to define the ribosomal footprint?

      We have added the explicit definition of 𝓁 as the ribosome footprint size (length of transcript occupied by one ribosome) in the "Bayesian modeling of run-off traces" section.

      (4.10) Line 687 has a minor typo "[...] ribosome distribution.. Then, [...]"

      We have corrected the punctuation.

      (4.11) In line 678, Equation 19 introduces the constant "L_S", Please ensure that it is defined in the text.

      We have added the explicit definition of L<sub>S</sub> (the length of the SunTag) to the text surrounding Equation 19.

      (4.12) In line 695, Equation 22, please consider using a subscript to differentiate the variance due to ribosome configuration. For example, instead of "sigma (...)^2" use something like "sigma_c ^2 (...)". Ensure that this change is correctly applied to Equation 24 and all other affected equations.

      Thank you, we have implemented the suggestions.

      (4.13) In line 696, please double-check equations 26 and 27. Specifically, the denominator ^2. Given the previous text, it is hard to follow the meaning of this variable. 

      We have revised the notation in Equations 26 and 27 to ensure the denominator is consistent with the definitions provided in the text.

      (4.14) In lines 726, the authors mention "[...], but for the purposes of this dissertation [...]", it should be "[...], but for the purposes of this study [...]"

      Thank you for spotting this. We have replaced "dissertation" with "study."

      (4.15) Equations 5, 28, 37, and the unnumbered equation between Equations 16 and 17 are similar, but in some, "y" does not explicitly depend on time. Please ensure this is correct. 

      We have verified these equations and believe they are correct.

      (4.16) Please review the complete document and ensure that variables and constants used in the equations are defined in the text. Please ensure that the same variable names are not reused for different concepts. To improve readability and flow in the text, please review the complete Materials and Methods sections and evaluate if the modeling section can be written more clearly and concisely. For example, Equation 28 is repeated in the text.

      We have performed a comprehensive review of the Materials and Methods section. To improve conciseness and flow, we have merged the subsection “Observation model and estimation of observation parameters” with the “Bayesian modeling of run-off traces” section. This allowed us to remove redundant definitions and repeated equations (such as the previous Equation 28). We have also checked that all variables and constants are defined upon first use and that variable names remain consistent throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Data Presentation

      (1.1) In main Figures 1D and 4E, the traces appear to show frequent on-off-on transitions ("bursting"), but in supplementary figures (1-S1A and 4-S1A), this behavior is seen in only ~8 of 54 traces. Are the main figure examples truly representative?

      We acknowledge the reviewer's point. In Figure 1D, we selected some of the longest and most illustrative traces to highlight the bursting dynamics. We agree that the term "representative" might be misleading if interpreted as "average." We have updated the text to state "we show bursting traces" to more accurately reflect the selection.

      (1.2) There are 8 videos, but I could not identify which is which.

      Thank you for pointing this out. We have renamed the video files to clearly correspond to the figures and conditions they represent.

      (2) Data Availability:

      As noted above, the data should be shared. This is in accordance with eLife's policy: "Authors must make all original data used to support the claims of the paper, or that are required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). [...] eLife considers works to be published when they are posted as preprints, and expects preprints we review to meet the standards outlined here." Access to the time traces would have been helpful for reviewers.

      We have now added the Github link for the code (https://github.com/naef-lab/suntag-analysis) and deposited the raw data (.ome.tif files) on Zenodo (10.5281/zenodo.17669332).

      (3) Model Assumptions:

      (3.1) The broad range of run-off times (Figure 3A) suggests stalling, which may be incompatible with the 'low-density' assumption used on the TASEP model, which essentially assumes that ribosomes do not bump into each other. This could impact the validity of the assumptions that ribosomes behave independently, elongate at constant speed (necessary for the continuum-limit approximation), and that the rate-limiting step is the initiation. How robust are the inferences to this assumption?

      We agree that the deviation of waiting times from an exponential distribution (Figure 3 - figure supplement 2C) suggests the presence of stalling, which challenges the strict low-density assumption and constant elongation speed. We explicitly explored the robustness of our model to higher ribosome densities in simulations. As shown in Figure 2 - figure supplement 2, while the model accuracy for single parameters deteriorates at very high densities (overestimating density due to neglected interference), it remains robust for estimating global rates in the regime relevant to our data. We have expanded the discussion on the limitations of the low density and homogeneous elongation rate assumptions in the text (L404-408).

      (3.2) Since all constructs share the same SunTag region, elongation rates should be identical there and diverge only in the variable region. This would affect $\gamma (t)$ and hence possibly affect the results. A brief discussion would be helpful.

      This is a valid point. Currently, our model infers a single average elongation rate that effectively averages the behavior over the SunTag and the variable CDS regions. Modeling distinct rates for these regions would be a valuable extension but adds significant complexity. While our current "effective rate" approach might underestimate the magnitude of differences between reporters, it captures the global kinetic trend. We have added a brief discussion acknowledging this simplification (L408-412).

      (3.3) A similar point applies to the Gillespie simulations: modeling the SunTag region with a shared elongation rate would be more accurate.

      We agree. Simulating distinct rates for the SunTag and CDS would increase realism, though our current homogeneous simulations serve primarily to benchmark the inference framework itself. We have noted this as a potential future improvement (L413-414).
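
      As an illustration of what region-specific rates would look like in a Gillespie simulation, the sketch below implements a TASEP with extended particles in which ribosomes hop at one rate inside the SunTag and at another rate in the downstream coding sequence. It is not the authors' simulation code: the function name, the 10-codon footprint, the region lengths, and the rate values are assumptions chosen only to fall within the ranges quoted in this review.

```python
import random

def simulate_tasep(alpha, rate_suntag, rate_cds, len_suntag, len_cds,
                   footprint, t_end, seed=0):
    """Gillespie simulation of a TASEP with extended particles.

    Positions are codon indices (0 = start codon). Ribosomes hop at
    `rate_suntag` inside the SunTag and at `rate_cds` downstream of it.
    Returns the ribosome positions at t_end (3'-most first) and the number
    of termination events.
    """
    rng = random.Random(seed)
    length = len_suntag + len_cds
    positions = []            # ordered from the 3'-most to the 5'-most ribosome
    t, terminated = 0.0, 0
    while True:
        # Collect every allowed event together with its rate.
        events = []
        if not positions or positions[-1] >= footprint:
            events.append((alpha, "init", None))   # initiation site is free
        for i, x in enumerate(positions):
            blocked = i > 0 and positions[i - 1] - x <= footprint
            if not blocked:
                rate = rate_suntag if x < len_suntag else rate_cds
                events.append((rate, "hop", i))
        total = sum(rate for rate, _, _ in events)
        t += rng.expovariate(total)                # waiting time to next event
        if t > t_end:
            return positions, terminated
        # Pick one event with probability proportional to its rate.
        threshold, cumulative = rng.random() * total, 0.0
        for rate, kind, i in events:
            cumulative += rate
            if cumulative >= threshold:
                break
        if kind == "init":
            positions.append(0)
        elif positions[i] == length - 1:
            positions.pop(i)                       # termination at the last codon
            terminated += 1
        else:
            positions[i] += 1

# Placeholder parameters: rates in 1/s, lengths and footprint in codons.
final_positions, n_terminated = simulate_tasep(
    alpha=0.02, rate_suntag=3.0, rate_cds=1.5,
    len_suntag=100, len_cds=50, footprint=10, t_end=2000.0, seed=1)
```

      Setting `rate_suntag` equal to `rate_cds` recovers the homogeneous case, so the same function can be used to compare inference on homogeneous and two-region synthetic data.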

      (3.4) Equation (13) assumes that switching between bursting and non-bursting states is much slower than the elongation time. First, this should be made explicit. Second, this is not quite true (~5 min elongation time on Figure 3-s2A vs ~5-15min switching times on Figure 1). It would be useful to show the intensity distribution at t=0 and compare it to the expected mixture distribution (i.e., a Poisson distribution + some extra 'N=0' cells). 

      We thank the reviewer for this insightful comment. We have added a sentence to the text explicitly stating the assumption that switching dynamics are slower than the translation time. While the timescales are indeed closer than ideal (5 min vs. 5-15 min), this assumption allows for a tractable approximation of the initial conditions for the run-off inference. Comparing the intensity distribution at t=0 to a zero-inflated Poisson distribution is an excellent suggestion for validation, which we will consider for future iterations of the model.
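      For concreteness, the mixture distribution suggested by the reviewer can be sketched as follows. This is a minimal illustration rather than part of the published analysis: it assumes that a fraction `p_off` of mRNAs carries no ribosomes and that the remainder carry a Poisson-distributed number of ribosomes with mean `mean`. Both parameter names and the example values are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Probability of n ribosomes under a Poisson distribution."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def zip_pmf(n, mean, p_off):
    """Zero-inflated Poisson: silent mRNAs (N = 0) mixed with Poisson loading."""
    base = (1.0 - p_off) * poisson_pmf(n, mean)
    return p_off + base if n == 0 else base


# Hypothetical parameters: 30% silent mRNAs, 5 ribosomes on average otherwise.
MEAN, P_OFF = 5.0, 0.3
pmf = [zip_pmf(n, MEAN, P_OFF) for n in range(61)]

total = sum(pmf)
mixture_mean = sum(n * p for n, p in enumerate(pmf))
print(f"P(N=0) = {pmf[0]:.3f}, mean ribosome number = {mixture_mean:.2f}")
```

      Comparing with measured intensities at t=0 would additionally require converting ribosome number to fluorescence (for example through $i_{MP}$ and the position-dependent SunTag signal), which this sketch does not attempt.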

      (4) Microscopy Quantifications:

      (4.1) Figure 1-S2A shows variable scFv-GFP expression across cells. Were cells selected for uniform expression in the analysis? Or is the SunTag assumed to be saturated, which would then need to be demonstrated?

      All cell lines used are monoclonal, and cells were selected via FACS for consistent average cytoplasmic GFP signal. We assume the SunTag is saturated based on the established characterization of the system by Tanenbaum et al. (2014), where the high affinity of the scFv-GFP ensures saturation at expression levels similar to ours.

      (4.2) As translation proceeds, free scFv-GFP may become limiting due to the accumulation of mature SunTag-containing proteins. This would be difficult to detect (since mature proteins stay in the cytoplasm) and could affect intensity measurements (newly synthesized SunTag proteins getting dimmer over time).

      This effect can occur with very long induction times. To mitigate this, we optimized the Doxycycline (Dox) incubation time for our harringtonine experiments to prevent excessive accumulation of mature protein. We also monitor the cytoplasmic background for granularity, which would indicate aggregation or accumulation.

      (4.3) The statements "for some traces, the mRNA signal was lost before the run-off completion" (line 195) and "we observed relatively consistent fractions of translated transcripts and trace duration distributions across reporters" (line 340) should be supported by a supplementary figure.

      The first statement is supported by Figure 2 - figure supplement 1, which shows representative run-off traces for all constructs, including incomplete ones.

      The second statement regarding consistency is supported by the quantitative data in Figure 1E and G, which summarize the fraction of translated transcripts and trace durations across conditions.

      (4.4) Measurements of single mature protein intensity $i_{MP}$:

      (4.4.1) Since puromycin is used to disassemble elongating ribosomes, calibration may be biased by incomplete translation products (likely a substantial fraction, since the Dox induction is only 20min and RNAs need several minutes to be transcribed, exported, and then fully translated).

      As mentioned in the “Live-cell imaging” paragraph, the imaging takes place 40 min after the end of Dox incubation. This provides ample time for mRNA export and full translation of the synthesized proteins. Consequently, the fraction of incomplete products generated by the final puromycin addition is negligible compared to the pool of fully synthesized mature proteins accumulated during the preceding hour.

      (4.4.2) Line 519: "The intensity of each spot is averaged over the 100 frames". Do I understand correctly that you are looking at immobile proteins? What immobilizes these proteins? Are these small aggregates? It would be surprising that these aggregates have really only 1, 2, or 3 proteins, as suggested by Figure 1-S2A.

      We are visualizing mature proteins that are specifically tethered to the actin cytoskeleton. This is achieved using a reporter where the RH1 domain is fused directly to the C-terminus of the Renilla protein (SunTag-Renilla-RH1). The RH1 domain recruits the endogenous Myosin Va motor, which anchors the protein to actin filaments, rendering it immobile. Since each Myosin Va motor interacts with one RH1 domain (and thus one mature protein), the resulting spots represent individual immobilized proteins rather than aggregates. We have now revised the text and Methods section to make this calibration strategy and the construct design clearer (L130-140).

      (4.4.3) The estimate of the average intensity $i_{MP}$ of single proteins rests entirely on seeing discrete modes in the histogram of Figure 1-S2B, which is not very convincing. A complementary experiment, measuring *on the same microscope* the intensity of an object with a known number of GFP molecules (e.g., MS2-GFP labeled RNAs, or individual GEMs https://doi.org/10.1016/j.cell.2018.05.042 (only requiring a single transfection)) would reassure the reader that we are not off by an order of magnitude.

      While a complementary calibration experiment would be valuable, we believe our current estimate is robust because it is independently validated by our model. When we inferred $i_{MP}$ as a free parameter in the HMM (Figure 5 - figure supplement 2B), the resulting value (10-15 a.u.) was remarkably consistent with our experimental calibration (14 ± 2 a.u.). We have clarified this independent validation in the text to strengthen the confidence in our quantification (L264-272).

      (4.4.4) Further on the histogram in Figure 1-S2B:

      - The gap between the first two modes is unexpectedly sharp. Can you double-check? It means that we have a completely empty bin between two of the most populated bins.

      We have double-checked the data; the plot is correct, though the sharp gap is likely due to the small sample size (n=29).

      - I am surprised not to see 3 modes or more, given that panel A shows three levels of intensity (the three colors of the arrows).

      As noted below, brighter foci exist but fall outside the displayed range of the histogram.

      - It is unclear what the statistical test is and what it is supposed to demonstrate.

      The Student's t-test compares the means of the two identified populations to confirm they are statistically distinct intensity groups.

      - I count n = 29, not 31. (The sample is small enough that the bars of the histogram show clear discrete heights, proportional to 1, 2, 3, 4, and 5 --adding up all the counts, I get 29). Is there a mistake somewhere? Or are some points falling outside of the displayed x-range?

      You are correct. Two brighter data points fell outside the displayed range. The total number of foci in the histogram is 29. We have corrected the figure caption and the text accordingly.

      (5) Miscellaneous Points: 

      (5.1) Panel B in Figure 2-s1 appears to be missing.

      The figure contains only one panel.

      (5.2) In Equation (7), $l$ is not defined (presumably ribosome footprint length?). Instead, $J$ is defined right after eq (7), as if it were used in this equation.

      Thank you for pointing this out, we have corrected it.

      (5.3) Line 703, did you mean to write something else than "Equation 26" (since equation 26 is defined after)?

      Yes, this was a typo. We have corrected the cross-reference.

    1. eLife Assessment

      This manuscript reports important findings indicating that cell cycle progression and cytokinesis both contribute to the transition from early to late neural stem cell fates. Loss-of-function experimental evidence convincingly shows that disrupting the cell cycle or cytokinesis can alter cell fate. This work sets the stage for future investigations into the underlying mechanisms linking the cell cycle to the expression of temporal factors controlling cell fate.

    2. Reviewer #1 (Public review):

      Summary:

      Drosophila larval type II neuroblasts generate diverse types of neurons by sequentially expressing different temporal identity genes during development. Previous studies have shown that transition from early temporal identity genes (such as Chinmo and Imp) to late temporal identity genes (such as Syp and Broad) depends on the activation of the expression of EcR by Seven-up (Svp) and progression through the G1/S transition of the cell cycle. In this study, Chaya and Syed examined if the expression of Syp and EcR is regulated by cell cycle and cytokinesis by knocking down CDK1 or Pav, respectively, throughout development or at specific developmental stages. They find that knocking down CDK1 or Pav either in all type II neuroblasts throughout the development or in single type neuroblast clones after larval hatching consistently leads to failure to activate late temporal identity genes Syp and EcR. To determine whether the failure of the activation of Syp and EcR is due to impaired Svp expression, they also examined Svp expression using a Svp-lacZ reporter line. They find that Svp is expressed normally in CDK1 RNAi neuroblasts. Further, knocking down CDK1 or Pav after Svp activation still leads to loss of Syp and EcR expression. Finally, they also extended their analysis to type I neuroblasts. They find that knocking down CDK1 or Pav, either at 0 hours or at 42 hours after larval hatching, also results in loss of Syp and EcR expression in type I neuroblasts. Based on these findings, the authors conclude that the cell cycle and cytokinesis are required for the transition from early to late temporal identity genes in both types of neuroblasts. These findings add mechanistic details to our understanding of the temporal patterning of Drosophila larval neuroblasts.

      Strengths:

      The data presented in the paper are solid and largely support their conclusion. Images are of high quality. The manuscript is well-written and clear.

      Weaknesses:

      The authors have addressed all the weaknesses in this revision.

    3. Reviewer #2 (Public review):

      Summary:

      Neural stem cells produce a wide variety of neurons during development. The regulatory mechanisms of neural diversity are based on the spatial and temporal patterning of neural stem cells. Although the molecular basis of spatial patterning is well-understood, the temporal patterning mechanism remains unclear. In this manuscript, the authors focused on the roles of cell cycle progression and cytokinesis in temporal patterning and found that both are involved in this process.

      Strengths:

      They conducted RNAi-mediated disruption on cell cycle progression and cytokinesis. As they expected, both disruptions affected temporal patterning in NSCs.

      Weaknesses:

      Although the authors showed clear results, they needed to provide additional data to support their conclusion sufficiently.

      For example, they can examine the effects of cell cycle acceleration on the temporal patterning.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chaya and Syed focuses on understanding the link between cell cycle and temporal patterning in central brain type II neural stem cells (NSCs). To investigate this, the authors perturb the progression of the cell cycle by delaying the entry into M phase and preventing cytokinesis. Their results convincingly show that temporal factor expression requires progression of the cell cycle in both Type 1 and Type 2 NSCs in the Drosophila central brain. Overall, this study establishes an important link between the two timing mechanisms of neurogenesis.

      Strengths:

      The authors provide solid experimental evidence for the coupling of cell cycle and temporal factor progression in Type 2 NSCs. The quantified phenotype shows an all-or-none effect of cell cycle block on the emergence of subsequent temporal factors in the NSCs, strongly suggesting that both nuclear division and cytokinesis are required for temporal progression. The authors also extend this phenotype to Type 1 NSCs in the central brain, providing a generalizable characterization of the relationship between cell cycle and temporal patterning.

      Weaknesses:

      One major weakness of the study is that the authors do not explore the mechanistic relationship between cell cycle and temporal factor expression. Although their results are quite convincing, they do not provide an explanation as to why Cdk1 depletion affects Syp and EcR expression but not the onset of svp. This result suggests that at least a part of the temporal cascade in NSCs is cell-cycle independent which isn't addressed or sufficiently discussed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Drosophila larval type II neuroblasts generate diverse types of neurons by sequentially expressing different temporal identity genes during development. Previous studies have shown that the transition from early temporal identity genes (such as Chinmo and Imp) to late temporal identity genes (such as Syp and Broad) depends on the activation of the expression of EcR by Seven-up (Svp) and progression through the G1/S transition of the cell cycle. In this study, Chaya and Syed examined whether the expression of Syp and EcR is regulated by cell cycle and cytokinesis by knocking down CDK1 or Pav, respectively, throughout development or at specific developmental stages. They find that knocking down CDK1 or Pav either in all type II neuroblasts throughout development or in single-type neuroblast clones after larval hatching consistently leads to failure to activate late temporal identity genes Syp and EcR. To determine whether the failure of the activation of Syp and EcR is due to impaired Svp expression, they also examined Svp expression using a Svp-lacZ reporter line. They find that Svp is expressed normally in CDK1 RNAi neuroblasts. Further, knocking down CDK1 or Pav after Svp activation still leads to loss of Syp and EcR expression. Finally, they also extended their analysis to type I neuroblasts. They find that knocking down CDK1 or Pav, either at 0 hours or at 42 hours after larval hatching, also results in loss of Syp and EcR expression in type I neuroblasts. Based on these findings, the authors conclude that the cell cycle and cytokinesis are required for the transition from early to late temporal identity genes in both types of neuroblasts. These findings add mechanistic details to our understanding of the temporal patterning of Drosophila larval neuroblasts.

      Strengths:

      The data presented in the paper are solid and largely support their conclusion. Images are of high quality. The manuscript is well-written and clear.

      We appreciate the reviewer’s detailed summary and recognition of the study’s strengths.

      Weaknesses:

      The quantifications of the expression of temporal identity genes and the interpretation of some of the data could be more rigorous.

      (1) Expression of temporal identity genes may not be just positive or negative. Therefore, it would be more rigorous to quantify the expression of Imp, Syp, and EcR based on the staining intensity rather than simply counting the number of neuroblasts that are positive for these genes, which can be very subjective. Or the authors should define clearly what qualifies as "positive" (e.g., a staining intensity at least 2x background).

      We thank the reviewer for this helpful suggestion. In the new version, we have now clarified how positive expression was defined and added more details of our quantification strategy to the Methods section (page 11, lines 380-388; lines 426-434 in tracked changes file). Fluorescence intensity for each neuroblast was normalized to the mean intensity of neighboring wild-type neuroblasts imaged in the same field. A neuroblast was considered positive for a given factor when its normalized nuclear intensity was at least 2× the local background. This scoring criterion was applied uniformly across all genotypes and time points. All quantifications were performed on the raw LSM files in Fiji prior to assembling the figure panels.
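      The scoring criterion described above can be sketched as follows. This is a minimal illustration under one possible reading of the description, not the authors' actual Fiji workflow: it assumes that both the nuclear signal and the local background are divided by the mean intensity of neighboring wild-type neuroblasts, and that a cell is scored positive when its normalized signal is at least `fold` times the normalized background. The function name, argument names, and intensity values are hypothetical.

```python
from statistics import mean


def score_neuroblast(nuclear, neighbor_wt, background, fold=2.0):
    """Return (normalized intensity, is_positive) for one neuroblast.

    nuclear     -- raw nuclear intensity of the neuroblast being scored
    neighbor_wt -- raw intensities of wild-type neuroblasts in the same field
    background  -- raw local background intensity
    fold        -- threshold relative to background (2x in the text)
    """
    reference = mean(neighbor_wt)
    normalized = nuclear / reference
    normalized_background = background / reference
    return normalized, normalized >= fold * normalized_background


# Hypothetical raw intensities (arbitrary units) from a single field.
neighbors = [200.0, 180.0, 220.0]
bright = score_neuroblast(120.0, neighbors, background=50.0)
dim = score_neuroblast(80.0, neighbors, background=50.0)

print(f"bright cell: normalized = {bright[0]:.2f}, positive = {bright[1]}")
print(f"dim cell: normalized = {dim[0]:.2f}, positive = {dim[1]}")
```

      Making the threshold an explicit parameter keeps the criterion identical across genotypes and time points, which is the property the response emphasizes.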

      (2) The finding that inhibiting cytokinesis without affecting nuclear divisions by knocking down Pav leads to the loss of expression of Syp and EcR does not support their conclusion that nuclear division is also essential for the early-late gene expression switch in type II NSCs (at the bottom of the left column on page 5). No experiments were done in this study to specifically block nuclear division. This conclusion should be revised.

      We blocked both cell cycle progression and cytokinesis, and both these manipulations affected temporal gene transitions, suggesting that both cell cycle and cytokinesis are essential. To our knowledge, no mechanism/tool exists that selectively blocks nuclear division while leaving cell cycle progression intact. We have added more clarification on page 4, line 123 onwards (lines 126 onwards in tracked changes file).

      (3) Knocking down CDK1 in single random neuroblast clones does not make the CDK1 knockdown neuroblast develop in the same environment (except still in the same brain) as wild-type neuroblast lineages. It does not help address the concern whether "type 2 NSCS with cell cycle arrest failed to undergo normal temporal progression is indirectly due to a lack of feedback signaling from their progeny", as discussed (from the bottom of the right column on page 9 to the top of the left column on page 10). The CDK1 knockdown neuroblasts do not divide to produce progeny and thus do not receive a feedback signal from their progeny as wild-type neuroblasts do. Therefore, it cannot be ruled out that the loss of Syp and EcR expression in CDK1 knockdown neuroblasts is due to the lack of the feedback signal from their progeny. This part of the discussion needs clarification.

      We thank the reviewer for raising this critical point. We agree and have clarified our interpretations and the limitations of our study in the revised text on page 8, lines 278-282 (lines 296-300 in tracked changes file).

      (4) In Figure 2I, there is a clear EcR staining signal in the clone, which contradicts the quantification data in Figure 2J that EcR is absent in Pav RNAi neuroblasts. The authors should verify that the image and quantification data are consistent and correct.

      When cytokinesis is blocked using pav-RNAi, the neuroblasts become extremely large and multinucleated. In some large pav RNAi clones, we observed a weak EcR signal near the cell membrane. However, more importantly, none of the nuclear compartments showed detectable EcR staining, where EcR is typically localized. We selected a representative nuclear image for the figure panel. To clarify this observation, we have now added an explanatory note to the discussion section on page 8, lines 283-291 (lines 301-309 in tracked changes file).

      Reviewer #2 (Public review):

      Summary:

      Neural stem cells produce a wide variety of neurons during development. The regulatory mechanisms of neural diversity are based on the spatial and temporal patterning of neural stem cells. Although the molecular basis of spatial patterning is well-understood, the temporal patterning mechanism remains unclear. In this manuscript, the authors focused on the roles of cell cycle progression and cytokinesis in temporal patterning and found that both are involved in this process.

      Strengths:

      They conducted RNAi-mediated disruption on cell cycle progression and cytokinesis. As they expected, both disruptions affected temporal patterning in NSCs.

      We appreciate the reviewer’s positive assessment of our experimental results.

      Weaknesses:

      Although the authors showed clear results, they needed to provide additional data to support their conclusion sufficiently.

      For example, they need to identify type II NSCs using molecular markers (Ase/Dpn). The authors are encouraged to provide a more detailed explanation of each experiment. The current version of the manuscript is difficult for non-expert readers to understand.

      Thanks for your feedback. We have now included a detailed description of how we identify type II NSCs in both wild-type and mutant clones. We have also added a representative Asense staining to clearly distinguish type 1 (Ase<sup>+</sup>) from type 2 (Ase<sup>-</sup>) NSCs (see Figure S1). In addition, we have added a resources table explaining the genotypes associated with each figure, which was omitted due to an error in the previous version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chaya and Syed focuses on understanding the link between cell cycle and temporal patterning in central brain type II neural stem cells (NSCs). To investigate this, the authors perturb the progression of the cell cycle by delaying the entry into M phase and preventing cytokinesis. Their results convincingly show that temporal factor expression requires progression of the cell cycle in both Type 1 and Type 2 NSCs in the Drosophila central brain. Overall, this study establishes an important link between the two timing mechanisms of neurogenesis.

      Strengths:

      The authors provide solid experimental evidence for the coupling of cell cycle and temporal factor progression in Type 2 NSCs. The quantified phenotype shows an all-or-none effect of cell cycle block on the emergence of subsequent temporal factors in the NSCs, strongly suggesting that both nuclear division and cytokinesis are required for temporal progression. The authors also extend this phenotype to Type 1 NSCs in the central brain, providing a generalizable characterization of the relationship between cell cycle and temporal patterning.

      We thank the reviewer for recognizing the robustness of our data linking the cell cycle to temporal progression.

      Weaknesses:

      One major weakness of the study is that the authors do not explore the mechanistic relationship between the cell cycle and temporal factor expression. Although their results are quite convincing, they do not provide an explanation as to why Cdk1 depletion affects Syp and EcR expression but not the onset of svp. This result suggests that at least a part of the temporal cascade in NSCs is cell-cycle independent, which isn't addressed or sufficiently discussed.

      Thank you for bringing up this important point. We are equally interested in uncovering the mechanism by which the cell cycle regulates temporal gene transitions; however, such mechanistic exploration is beyond the scope of the present study. Interestingly, while the temporal switching factor Svp is expressed independently of the cell cycle, the subsequent temporal transitions are not. We have expanded our discussion on this intriguing finding (page 9, line 307-315; lines 345-355 in tracked changes file). Specifically, we propose that svp activation marks a cell-cycle–independent phase, whereas EcR/Syp induction likely depends on cell-cycle–coupled mechanisms, such as mitosis-dependent chromatin remodeling or daughter-cell feedback. Although further dissection of this mechanism lies beyond the current study, our findings establish a foundation for future work aimed at identifying how developmental timekeeping is molecularly coupled to cell-cycle progression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 1 C and D, it would be better to put a question mark to indicate that these are hypotheses to be tested. 

      We appreciate this suggestion and have added question marks in Figure 1C and 1D to clearly indicate that these panels represent hypotheses under investigation.

      (2) Figure 2A-I, Figure 4A-I, Figure 5A-I and K-S, in addition to enlarged views of single type II neuroblasts, it would be more convincing to include zoomed-out images of the entire larval brain or at least a portion of the brain to include neighboring wild-type type II neuroblasts as internal controls. Also, it would be ideal to show EcR staining from the same neuroblasts as IMP and Syp staining. 

      We thank the reviewer for this valuable input. In our imaging setup, the number of available antibody channels was limited to four (anti-Ase, anti-GFP, anti-Syp, and anti-Imp). Adding EcR to the same sample was therefore not technically possible, so we performed EcR staining separately.

      (3) The authors cited "Syed et al., 2024" (in the middle of the right column on page 5), but this reference is missing in the "References" section and should be added. 

      The missing citation has been added to the reference section.  

      (4) It would be better to include Ase staining in the relevant figure to indicate neuroblast identity as type I or type II. 

      We agree and now include representative Ase staining for both type 1 and type 2 NSC clones in Figure S1, along with corresponding text updates that describe these markers.

      Reviewer #2 (Recommendations for the authors): 

      Major comments 

      (1) The present conclusion relies on the results using Cdk1 RNAi and pav RNAi. It is still possible that Cdk1 and Pav are involved in the regulation of temporal patterning independent of the regulation of cell cycle or cytokinesis, respectively. To avoid this possibility, the authors need to inhibit cell cycle progression or cytokinesis in another alternative manner. 

      We thank the reviewer for raising this important point. We observe consistent phenotypes across several independent manipulations that slow or block the cell cycle. In addition, earlier studies using orthogonal approaches that delay G1/S (Dacapo/Rbf) or impair mitochondrial OxPhos (which lengthens G1/S; van den Ameele & Brand, 2019) reported similar temporal delays. These concordant phenotypes strongly support the interpretation that altered cell-cycle progression, rather than a specific role of a single gene, is the primary cause of the defect. While we cannot exclude additional gene-specific, cell-cycle-independent effects of Cdk1 or Pav, the cell-cycle disruption model is the most parsimonious interpretation. We have clarified this reasoning in the discussion section on pages 8-9, lines 293-305 (lines 311-343 in tracked changes file).

      (2) To reach the present conclusion, the authors need to address the effects of acceleration of cell cycle progression or cytokinesis on temporal patterning. 

      We thank the reviewer for this insightful suggestion. To our knowledge, there are currently no established genetic tools that can specifically accelerate cell-cycle progression in Drosophila neuroblasts. However, our results demonstrate that blocking the cell cycle impairs the transition from early to late temporal gene expression. These findings suggest that proper cell-cycle progression is essential for the transition from early to late temporal identity in neuroblasts.

      Minor comments 

      (3) P3L2 (right), ... we blocked the NSC cell cycle...

      How did they do it? 

      Which fly lines were used?

      Why did they use the line? 

      These details are now included in the Materials and Methods and the Resource Table (pages 11-13). We used Wor-Gal4, Ase-Gal80 to drive UAS-Cdk1RNAi and UAS-pavRNAi in type 2 NSCs.

      (4) P5L1(left), ... we used the flip-out approach...

      Why did they conduct it? 

      Probably, the authors have reasons other than "to further ensure." 

      We have clarified in the text on page 4, lines 137-139, that the flip-out approach was used to generate random single-cell clones, enabling quantitative analysis of type 2 NSCs within an otherwise wild-type brain. 

      (5) P5L8(left), ... type 2 hits were confirmed by lack of the type 1 Asense...  The authors must examine Deadpan (Dpn) expression as well, because there are many Asense (Ase)-negative cells in the brain (neurons, glial cells, and neuroepithelial cells).

      Type II NSCs can be identified as Dpn+/Ase- cells.

      We agree that Dpn is a helpful marker. However, we reliably distinguished type II NSCs by their lack of Ase and larger cell size relative to surrounding neurons and glia, which are smaller in size and located deeper within the clone. These differences, together with established lineage patterns, allow unambiguous identification of type 2 NSCs across all genotypes. We have now added representative type I and type 2 NSC clones to the supplemental figure S1 (E-G’) with Asense stains to demonstrate how we differentiate type I from type II NSCs. 

      (6) P5L32(left), To do this, we induced... 

      This sentence should be made more concise.

      Please rephrase it. 

      The sentence has been rewritten for clarity and concision.

      (7)  P5L42(left), ...lack of EcR/Syp expression (Figure 2).  However, EcR expression is still present (Figure 2I). 

      In some large pavRNAi clones, a weak EcR signal can be observed near the cell membrane; however, none of the nuclear compartments—where EcR is typically localized—show detectable staining. We selected a representative nuclear image for the figure and addressed this observation on page 8, lines 283-291 (lines 301-309 in tracked changes file).

      (8) P7L29(left), ......had persistent Imp expression...

      Imp expression is faint compared to that in Figure 2G.

      The differences between Figures 2G and 3G should be discussed. 

      We thank the reviewer for this comment. We have added a note in the Methods section clarifying that brightness and contrast were adjusted per panel for optimal visualization; thus, apparent differences in signal intensity do not reflect biological variation. Fluorescence intensity for each neuroblast was normalized to the mean intensity of neighboring wild-type neuroblasts imaged in the same field. A neuroblast was considered Imp-positive when its normalized nuclear intensity was at least 2× the local background. This scoring criterion was applied uniformly across all genotypes and time points. All quantifications were performed on the raw LSM files in Fiji prior to assembling the figure panels.

      (9) P8 (Figure 5)

      The Imp expression is faint compared to that in Figure 5Q.

      The difference between Figure 5G and 5Q should be discussed further. 

      As mentioned above, we have clarified our image processing approach in the Methods section to explain any differences in signal appearance between these figures.

      (10) P10 Materials and Methods

      The authors did not mention the fly lines used. This is very important for the readers. 

      We thank the reviewer for bringing this oversight to our attention. The Resource Table was inadvertently omitted from the initial submission. The complete list of fly lines and reagents used in this study is now provided in the updated Resource Table.

      Reviewer #3 (Recommendations for the authors): 

      Major points 

      (1) The authors mention that the heat-shock induction at 42ALH is well after svp temporal window and therefore the cell cycle block independently affects Syp and EcR expression. However, Figure 3 shows svp-LacZ expression at 48ALH. If svp expression is indeed transient in Type 2 NSCs, then this must be validated using an immunostaining of the svp-LacZ line with svp antibody. This is crucial as the authors claim that cell cycle block doesn't affect svp expression and is required independently.

      We thank the reviewer for bringing this important issue to our attention. As noted, Svp protein is expressed transiently and stochastically in type 2 NSCs (Syed et al., 2017), making direct antibody quantification challenging upon cell cycle block. Consistent with previous work (Syed et al., 2017), we used the svp-LacZ reporter line to visualize stabilized Svp expression, which reliably captures Svp expression in type 2 NSCs (Syed et al., 2017 https://doi.org/10.7554/eLife.26287, and Dhilon et al., 2024 https://doi.org/10.1242/dev.202504).

      (2) The authors have successfully slowed down the cell cycle and shown that it affects temporal progression. However, a converse experiment where the cell cycle is sped up in NSCs would be an important test for the direct coupling of temporal factor expression and cell cycle, wherein the expectation would be the precocious expression of late temporal factors in faster-cycling NSCs. 

      We agree that such an experiment would be ideal. However, as noted above (Reviewer #2 comment 2), to our knowledge, no suitable tools currently exist to accelerate neuroblast cell-cycle progression without pleiotropic effects.

      Minor point 

      The authors must include Ray and Li (https://doi.org/10.7554/eLife.75879) in the references when describing that "...cell cycle has been shown to influence temporal patterning in some systems,...".  

      We thank the reviewer for this helpful suggestion. The cited reference (Ray and Li, eLife, 2022) has now been included and appropriately referenced in the revised manuscript.

    1. eLife Assessment

      This valuable study investigates the computational role of top-down feedback - a property found in biological circuits - in artificial neural network (ANN) models of the neocortex. Using hierarchical recurrent ANNs in an audiovisual integration task, the authors show that an anatomically inspired feedback motif induces a stable visual bias consistent with human perception and yields modest but meaningful benefits for learning dynamics and robustness. The strength of evidence is solid: the modeling, analyses, and controls mostly support the central claim that top-down feedback motifs impose persistent inductive biases that shape functional specialization and behavior. But the evidence for a broad, general framework that predicts behavior remains only partially supported, and the Methods would benefit from a compact, reproducible summary of hyperparameters and architectural details.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors aim to investigate the potential improvements of ANNs when used to explain brain data using top-down feedback connections found in the neocortex. To do so, they use a retinotopic and tonotopic organization to model each subregion of the ventral visual (V1, V2, V4, and IT) and ventral auditory (A1, Belt, A4) regions using Convolutional Gated Recurrent Units. The top-down feedback connections are inspired by the apical tree of pyramidal neurons, modeled either with a multiplicative effect (change of gain of the activation function) or a composite effect (change of gain and threshold of the activation function).
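
      The difference between the two feedback types summarized above can be illustrated with a single-unit sketch. This is not the authors' implementation: the gain function `1 + tanh(feedback)`, the `shift_weight` parameter, and the ReLU activation are assumptions chosen only to show the distinction between changing the gain alone and changing both gain and threshold.

```python
import math


def relu(x):
    return max(0.0, x)


def gain(feedback):
    # Hypothetical gain function: always positive, equal to 1 with no feedback.
    return 1.0 + math.tanh(feedback)


def multiplicative_unit(drive, feedback):
    """Feedback only rescales the feedforward drive (gain change)."""
    return relu(gain(feedback) * drive)


def composite_unit(drive, feedback, shift_weight=0.5):
    """Feedback rescales the drive and also shifts the activation
    threshold (gain change plus threshold change)."""
    return relu(gain(feedback) * drive + shift_weight * feedback)
```

      In this toy version, multiplicative feedback cannot activate a unit that receives no feedforward drive, whereas composite feedback can, because the threshold shift acts as a weak driving term. With zero feedback both units reduce to a plain ReLU of the drive.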

      To assess the functional impact of the top-down connections, the authors compare three architectures: a brain-like architecture derived directly from brain data analysis, a reversed architecture where all feedforward connections become feedback connections and vice versa, and a random connectivity architecture. More specifically, in the brain-like model the visual regions provide feedforward input to all auditory areas, whereas auditory areas provide feedback to visual regions.
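
      One way to picture the reversed architecture is as a relabeling of an unchanged set of connections. The sketch below is a hypothetical illustration: the edge list is a small invented fragment, not the model's actual wiring, and the assumption is that reversal swaps each connection's role while leaving its endpoints in place.

```python
SWAP = {"feedforward": "feedback", "feedback": "feedforward"}


def reverse_architecture(connections):
    """Swap the role of every connection while keeping its endpoints."""
    return [(src, dst, SWAP[kind]) for src, dst, kind in connections]


# Hypothetical fragment of a brain-like wiring (not the full model).
brain_like = [
    ("V1", "V2", "feedforward"),
    ("V2", "V1", "feedback"),
    ("IT", "A1", "feedforward"),
    ("A1", "IT", "feedback"),
]
reversed_model = reverse_architecture(brain_like)
```

      Because only the labels change, the number of connections is identical in both variants, and applying the reversal twice returns the original wiring.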

      First, the authors found that top-down feedback influences audiovisual processing and that the brain-like model exhibits a visual bias in multimodal visual and auditory tasks. Second, they discovered that in the brain-like model, the composite integration of top-down feedback, similar to that found in the neocortex, leads to an inductive bias toward visual stimuli, which is not observed in the feedforward-only model. Furthermore, the authors found that the brain-like model learns to utilize relevant stimuli more quickly while ignoring distractors. Finally, by analyzing the activations of all hidden layers (brain regions), they found that the feedforward and feedback connectivity of a region could determine its functional specializations during the given tasks.

      Strengths:

      The study introduces a novel methodology for designing connectivity between regions in deep learning models. The authors also employ several tasks based on audiovisual stimuli to support their conclusions. Additionally, the model utilizes backpropagation of error as a learning algorithm, making it applicable across a range of tasks, from various supervised learning scenarios to reinforcement learning agents. Conversely, the presented framework offers a valuable tool for studying top-down feedback connections in cortical models. Thus, it is a very nice study that can also give inspiration to other fields (machine learning) to start exploring new architectures.

    3. Reviewer #2 (Public review):

      Summary:

      This work addresses the question of whether artificial deep neural network models of the brain could be improved by incorporating top-down feedback, inspired by the architecture of neocortex.

      In line with known biological features of cortical top-down feedback, the authors model such feedback connections with both a typical driving effect and a purely modulatory effect on the activation of units in the network.

      To assess the functional impact of these top-down connections, they compare different architectures of feedforward and feedback connections in a model that mimics the ventral visual and auditory pathways in the cortex on an audiovisual integration task.

      Notably, one architecture is inspired by human anatomical data, where higher visual and auditory layers possess modulatory top-down connections to all lower-level layers of the same modality, and visual areas provide feedforward input to auditory layers, whereas auditory areas provide modulatory feedback to visual areas.

      First, the authors find that this brain-like architecture imparts the models with a light visual bias similar to what is seen in human data, which is the opposite in a reversed architecture, where auditory areas provide feedforward drive to the visual areas.

      Second, they find that, in their model, modulatory feedback should be complemented by a driving component to enable effective audiovisual integration, similar to what is observed in neural data.

      Overall, the study shows some possible functional implications when adding feedback connections in a deep artificial neural network that mimic some functional aspects of visual perception in humans.

      Strengths:

      The study contains innovative ideas, such as incorporating an anatomically inspired architecture into a deep ANN, and comparing its impact on a relevant task to alternative architectures.

      Moreover, the simplicity of the model allows it to draw conclusions on how features of the architecture and functional aspects of the top-down feedback affect the performance of the network.

      This could be a helpful resource for future studies of the impact of top-down connections in deep artificial neural network models of neocortex.

      Weaknesses:

      Some claims not yet supported.

      The problem is that results are phrased quite generally in the abstract and discussion, while the actual results shown in the paper are very specific to certain implementations of top-down feedback and architectures. This could lead to misunderstanding and requires some revisions of the claims in the abstract and discussion (see below).

      "Altogether our findings demonstrate that modulatory top-down feedback is a computationally relevant feature of biological brain..."

      This claim is not supported, since no performance increase is demonstrated for modulatory feedback. So far, only the second half of the sentence is supported: "...and that incorporating it into ANNs affects their behavior and constrains the solutions it's likely to discover."

      "This bias does not impair performance on the audiovisual tasks."

      This is only true for the composite top-down feedback that combines driving and modulatory effects, whereas modulatory feedback alone can impair the performance (e.g., in the visual tasks VS1 and VS2). The fact that modulatory feedback alone is insufficient in ANNs to enable effective cross-modal integration and requires some driving component is actually very interesting, but it is not stressed enough in the abstract. This is hinted at in the following sentence, but should be made more explicit:

      "The results further suggest that different configurations of top-down feedback make otherwise identically connected models functionally distinct from each other, and from traditional feedforward and laterally recurrent models."

      "Here we develop a deep neural network model that captures the core functional properties of top-down feedback in the neocortex" -> this is too strong, take out "the", because very likely there are other important properties that are not yet incorporated.

      "Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature."

      This claim is still not substantiated by evidence provided in the paper. First, the wording is a bit imprecise, because mechanistically, the relevant distinction is not really feedforward versus feedback (a purely feedforward model is not considered at all in the paper), but modulatory versus driving. Moreover, the second part of the sentence is problematic: The results imply that, computationally/functionally, driving connections are doing the job, while modulatory feedback does not really seem to improve performance (best case, it does not do any harm). It is true that it is a feature that is inspired by biology, but I don't see why the results imply that (modulatory) top-down feedback should be considered in ANN models of the brain. This would require showing that such models either improve performance or improve the ability to fit neural data, both of which are beyond the scope of the paper.

      The same argument holds for the following sentence, which is not supported by the results of the paper:

      "More broadly, our work supports the conclusion that both the cellular neurophysiology and structure of feed-back inputs have critical functional implications that need to be considered by computational models of brain function."

      Additional supplementary material required

      Although the second version checked the influence of processing time, this was not done for the most important figure of the paper, Figure 4. A central claim in the abstract, "This bias does not impair performance on the audiovisual tasks", relies on this figure, because only with composite feedback is the performance comparable between the "drive-only" and "brain-like" models. Thus, the supplementary Figure 3 should also include the composite networks and the drive-only network to check the robustness of the claim with respect to process time. This robustness analysis should then also be mentioned in the text. For example, it should be mentioned whether results in these networks are robust or not with respect to process time, whether there are differences between network architectures or types of feedback in general, etc.

      Moreover, the current analysis for networks with modulatory feedback is a bit confusing. Why is the performance so low for the reverse model for a process time of 3 and 10? This is a very strong effect that warrants explanation. More details should be added in the caption as well. For example, are the models separately trained for the output after 3 and 10 processing steps for the comparison, or just evaluated at these times? Not training these networks separately might explain the low performance for some networks, so ideally networks are trained for each choice of processing steps.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the computational role of top-down feedback in artificial neural networks (ANNs), a feature that is prevalent in the brain but largely absent in standard ANN architectures. The authors construct hierarchical recurrent ANN models that incorporate key properties of top-down feedback in the neocortex. Using these models in an audiovisual integration task, they find that hierarchical structures introduce a mild visual bias, akin to that observed in human perception, not always compromising task performance.

      Strengths:

      The study investigates a relevant and current topic of considering top-down feedback in deep neural networks. In designing their brain-like model, they use neurophysiological data, such as externopyramidisation and hierarchical connectivity. Their brain-like model exhibits a visual bias that qualitatively matches human perception.

      Weaknesses:

      While the model is brain-inspired, it has limited bioplausibility. The model assumes a simplified and fixed hierarchy. The authors acknowledge this limitation in the discussion.

      While the brain-like model showed an advantage in ignoring distracting auditory inputs, it struggled when visual information had to be ignored. This suggests that its rigid bias toward visual processing could make it less adaptive in tasks requiring flexible multimodal integration. It hence does not necessarily constitute an improvement over existing ANNs. The study does not evaluate whether the top-down feedback architecture scales well to more complex problems or larger datasets. A valuable future contribution would be to evaluate how the network's behaviour fits to human data.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors aim to investigate the potential improvements of ANNs when used to explain brain data using top-down feedback connections found in the neocortex. To do so, they use a retinotopic and tonotopic organization to model each subregion of the ventral visual (V1, V2, V4, and IT) and ventral auditory (A1, Belt, A4) regions using Convolutional Gated Recurrent Units. The top-down feedback connections are inspired by the apical tree of pyramidal neurons, modeled either with a multiplicative effect (change of gain of the activation function) or a composite effect (change of gain and threshold of the activation function).

      To assess the functional impact of the top-down connections, the authors compare three architectures: a brain-like architecture derived directly from brain data analysis, a reversed architecture where all feedforward connections become feedback connections and vice versa, and a random connectivity architecture. More specifically, in the brain-like model the visual regions provide feedforward input to all auditory areas, whereas auditory areas provide feedback to visual regions.

      First, the authors found that top-down feedback influences audiovisual processing and that the brain-like model exhibits a visual bias in multimodal visual and auditory tasks. Second, they discovered that in the brain-like model, the composite integration of top-down feedback, similar to that found in the neocortex, leads to an inductive bias toward visual stimuli, which is not observed in the feedforward-only model. Furthermore, the authors found that the brain-like model learns to utilize relevant stimuli more quickly while ignoring distractors. Finally, by analyzing the activations of all hidden layers (brain regions), they found that the feedforward and feedback connectivity of a region could determine its functional specializations during the given tasks.

      Strengths:

      The study introduces a novel methodology for designing connectivity between regions in deep learning models. The authors also employ several tasks based on audiovisual stimuli to support their conclusions. Additionally, the model utilizes backpropagation of error as a learning algorithm, making it applicable across a range of tasks, from various supervised learning scenarios to reinforcement learning agents. Conversely, the presented framework offers a valuable tool for studying top-down feedback connections in cortical models. Thus, it is a very nice study that also can give inspiration to other fields (machine learning) to start exploring new architectures.

      We thank the reviewer for their accurate summary of our work and their kind assessment of its strengths.

      Weaknesses:

      Although the study explores some novel ideas on how to study the feedback connections of the neocortex, the data presented here are not complete in order to propose a concrete theory of the role of top-down feedback inputs in such models of the brain.

      (1) The gap in the literature that the paper tries to fill is the ability of DL algorithms to predict behavior: "However, there are still significant gaps in most deep neural networks' ability to predict behavior, particularly when presented with ambiguous, challenging stimuli." and "[...] to accurately model the brain."

      It is unclear to me how the presented work addresses this gap, as the only facts provided are derived from a simple categorization task that could also be solved by the feedforward-only model (see Figures 4 and 5). In my opinion, this statement is somewhat far-fetched, and there is insufficient data throughout the manuscript to support this claim.

      We can see now that the way the introduction was initially written led to some confusion about our goal in this study. Our goal here was not to demonstrate that top-down feedback can enable superior matches to human behaviour. Rather, our goal was to determine if top-down feedback had any real implications for processing ambiguous stimuli. The sentence that the reviewer has highlighted was intended as an explanation for why top-down feedback, and its impact on ambiguous stimuli, might be something one would want to examine for deep neural networks. But, here, we simply wanted to (1) provide an overview of the code base we have created, and (2) demonstrate that top-down feedback does impact the processing of ambiguous stimuli.

      We agree with the reviewer that if our goal was to improve our ability to predict behaviour, then there was a big gap in the evidence we provided here. But, this was not our goal, and we believe that the data we provide here do convincingly show that top-down feedback has an impact on processing of ambiguous stimuli. We have updated the text in the introduction to make our goals clearer for the reader and avoid this misunderstanding of what we were trying to accomplish here. Specifically, the end of the introduction is changed to:

      “To study the effect of top-down feedback on such tasks, we built a freely available code base for creating deep neural networks with an algorithmic approximation of top-down feedback. Specifically, top-down feedback was designed to modulate ongoing activity in recurrent, convolutional neural networks. We explored different architectural configurations of connectivity, including a configuration based on the human brain, where all visual areas send feedforward inputs to, and receive top-down feedback from, the auditory areas. The human brain-based model performed well on all audiovisual tasks, but displayed a unique and persistent visual bias compared to models with only driving connectivity and models with different hierarchies. This qualitatively matches the reported visual bias of humans engaged in audio-visual tasks. Our results confirm that distinct configurations of feedforward/feedback connectivity have an important functional impact on a model's behavior. Therefore, top-down feedback captures behaviors and perceptual preferences that do not manifest reliably in feedforward-only networks. Further experiments are needed to clarify whether top-down feedback helps an ANN fit better to neural data, but the results show that top-down feedback affects the processing of stimuli and is thus a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly.”

      (2) It is not clear what the advantages are between the brain-like model and a feedforward-only model in terms of performance in solving the task. Given Figures 4 and 5, it is evident that the feedforward-only model reaches almost the same performance as the brain-like model (when the latter uses the modulatory feedback with the composite function) on almost all tasks tested. The speed of learning is nearly the same: for some tested tasks the brain-like model learns faster, while for others it learns slower. Thus, it is hard to attribute a functional implication to the feedback connections given the presented figures and therefore the strong claims in the Discussion should be rephrased or toned down.

      Again, we believe that there has been a misunderstanding regarding the goals of this study, as we are not trying to claim here that there are performance advantages conferred by top-down feedback in this case. Indeed, we share the reviewer’s assessment that the feedforward only model seems to be capable of solving this task well. To reiterate: our goal here was to demonstrate that top-down feedback alters the computations in the network and, thus, has distinct effects on behaviour that need to be considered by researchers who use deep networks to model the brain. But we make no claims of “superiority” of the brain-like model.

      In line with this, we’re not completely sure which claims in the discussion the reviewer is referring to. We note that we were quite careful in our claims. For example, in the first section of the discussion we say:

      “Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature.”

      And later on:

      “In summary, our study shows that modulatory top-down feedback and the architectural diversity enabled by it can have important functional implications for computational models of the brain. We believe that future work examining brain function with deep neural networks should therefore consider incorporating top-down modulatory feedback into model architectures when appropriate.”

      If we have missed a claim in the discussion that implies superiority of the brain-like model in terms of task performance we would be happy to change it.

      (3) The Methods section lacks sufficient detail. There is no explanation provided for the choice of hyperparameters nor for the structure of the networks (number of trainable parameters, number of nodes per layer, etc). Clarifying the rationale behind these decisions would enhance understanding. Moreover, since the authors draw conclusions based on the performance of the networks on specific tasks, it is unclear whether the comparisons are fair, particularly concerning the number of trainable parameters. Furthermore, it is not clear if the visual bias observed in the brain-like model is an emerging property of the network or has been created because of the asymmetries in the visual vs. auditory pathway (size of the layer, number of layers, etc).

      We thank the reviewer for raising this issue, and want to provide some clarifications: First, the number of trainable parameters is roughly equal across models, since we were only switching the direction of connectivity (top-down versus bottom-up), not the number of connections. We confirmed that the biggest difference in size is between models with composite and multiplicative feedback; models with composite feedback have roughly 1K more parameters, and all models are within the 280K parameter range. We now state this in the methods.

      Second, because superior performance was not the goal of this study, as stated above, we conducted limited hyperparameter tuning. Given the reviewer’s comment, we wondered whether this may have impacted our results. Therefore, we explored different hyperparameters for the model during the multimodal auditory tasks, which show the clearest example of the visual dominance in the brain-like model (Figure 3).

      We explored different hidden state sizes, learning rates and processing times, and examined whether the core results were different. We found that extremely high learning rates (0.1) destabilize all models and that some models perform poorly under different processing times. But overall, the core results, i.e., the different behaviors of models with different connectivities and the visual dominance observed in the brain-like model, are evident across all hyperparameters where the models learn. We now provide these results in supplementary figures (Fig. S2, showing larger models trained with different learning rates, and Fig. S3, which shows the effect of processing time on AS task performance).

      Reviewer #2 (Public review):

      Summary:

      This work addresses the question of whether artificial deep neural network models of the brain could be improved by incorporating top-down feedback, inspired by the architecture of the neocortex.

      In line with known biological features of cortical top-down feedback, the authors model such feedback connections with both a typical driving effect and a purely modulatory effect on the activation of units in the network.

      To assess the functional impact of these top-down connections, they compare different architectures of feedforward and feedback connections in a model that mimics the ventral visual and auditory pathways in the cortex on an audiovisual integration task.

      Notably, one architecture is inspired by human anatomical data, where higher visual and auditory layers possess modulatory top-down connections to all lower-level layers of the same modality, and visual areas provide feedforward input to auditory layers, whereas auditory areas provide modulatory feedback to visual areas.

      First, the authors find that this brain-like architecture imparts the models with a light visual bias similar to what is seen in human data, which is the opposite in a reversed architecture, where auditory areas provide a feedforward drive to the visual areas.

      Second, they find that, in their model, modulatory feedback should be complemented by a driving component to enable effective audiovisual integration, similar to what is observed in neural data.

      Last, they find that the brain-like architecture with modulatory feedback learns a bit faster in some audiovisual switching tasks compared to a feedforward-only model.

      Overall, the study shows some possible functional implications when adding feedback connections in a deep artificial neural network that mimics some functional aspects of visual perception in humans.

      Strengths:

      The study contains innovative ideas, such as incorporating an anatomically inspired architecture into a deep ANN, and comparing its impact on a relevant task to alternative architectures.

      Moreover, the simplicity of the model allows it to draw conclusions on how features of the architecture and functional aspects of the top-down feedback affect the performance of the network.

      This could be a helpful resource for future studies of the impact of top-down connections in deep artificial neural network models of the neocortex.

      We thank the reviewer for their summary and their recognition of the innovative components and helpful resources therein.

      Weaknesses:

      Overall, the study appears to be a bit premature, as several parts need to be worked out more to support the claims of the paper and to increase its impact.

      First, the functional implication of modulatory feedback is not really clear. The "only feedforward" model (is a drive-only model meant?) attains the same performance as the composite model (with modulatory feedback) on virtually all tasks tested, it just takes a bit longer to learn for some tasks, but then is also faster at others. It even reproduces the visual bias on the audiovisual switching task. Therefore, the claims "Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature." and "More broadly, our work supports the conclusion that both the cellular neurophysiology and structure of feed-back inputs have critical functional implications that need to be considered by computational models of brain function" are not sufficiently supported by the results of the study. Moreover, the latter points would require showing that this model describes neural data better, e.g., by comparing representations in the model with and without top-down feedback to recorded neural activity.

      To emphasize again our specific claims, we believe that our data shows that top-down feedback has functional implications for deep neural network behaviour, not increased performance or neural alignment. Indeed, our results demonstrate that top-down feedback alters the behaviour of the networks, as shown by the differences in responses to various combinations of ambiguous stimuli. We agree with the reviewer that if our goal was to claim either superior performance on these tasks, or better fit to neural data, we would need to actually provide data supporting that claim.

      Given the comments from the reviewer, we have tried to provide more clarity in the introduction and discussion regarding our claims. In particular, we now highlight that we are not trying to demonstrate that the models with top-down feedback exhibit superior performance or better fit to neural data.

      As one final note, yes, the reviewer understood correctly that the “only feedforward” model is a model with only driving inputs. We have renamed the feedforward-only models to drive-only models and added additional emphasis in the text to ensure that the distinction is clear for all readers.

      Second, the analyses are not supported by supplementary material, hence it is difficult to evaluate parts of the claims. For example, it would be helpful to investigate the impact of the process time after which the output is taken for evaluation of the model. This is especially important because in recurrent and feedback models the convergence should be checked, and if the network does not converge, then it should be discussed why, and at which point in time the network is evaluated.

      This is an excellent point, and we thank the reviewer for raising it. We allowed the network to process the stimuli for seven time-steps, which was enough for information from any one region to be transmitted to any other. We found in some initial investigations that if we shortened the processing time some seeds would fail to solve the task. But, based on the reviewer’s comment, we have now also run additional tests with longer processing times for the auditory tasks where we see the clearest visual bias (Figure 3). We find that different process times do not change the behavioral biases observed in our models, but may introduce difficulties ignoring visual stimuli for some models. Thus, while process time is an important hyperparameter for optimal performance of the model, the central claim of the paper remains. We include this new data in a supplementary figure S3.
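
      The statement that seven time-steps suffice for information to travel between any two regions can be checked with a simple graph search. The sketch below is an illustration under stated assumptions rather than the authors' code: the wiring in `build_graph` is a hypothetical reading of the brain-like architecture described in the reviews (feedforward to the next region in each stream, feedback to all lower regions of the same stream, and reciprocal connections between all visual and all auditory regions), and one connection is assumed to be crossed per time-step.

```python
from collections import deque

VISUAL = ["V1", "V2", "V4", "IT"]
AUDITORY = ["A1", "Belt", "A4"]


def build_graph():
    """Hypothetical directed wiring between regions (assumption, see lead-in)."""
    graph = {region: set() for region in VISUAL + AUDITORY}
    for stream in (VISUAL, AUDITORY):
        for i, region in enumerate(stream):
            if i + 1 < len(stream):
                graph[region].add(stream[i + 1])  # feedforward to the next region
            graph[region].update(stream[:i])      # feedback to all lower regions
    for v in VISUAL:
        for a in AUDITORY:
            graph[v].add(a)  # visual to auditory
            graph[a].add(v)  # auditory to visual
    return graph


def hops_from(graph, start):
    """Breadth-first search: fewest connections from `start` to each region."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def max_hops(graph):
    """Largest shortest-path length over all ordered pairs, or None if
    some region cannot be reached from another."""
    worst = 0
    for src in graph:
        dist = hops_from(graph, src)
        if len(dist) < len(graph):
            return None
        worst = max(worst, max(dist.values()))
    return worst


PROCESS_STEPS = 7
longest_path = max_hops(build_graph())
enough_time = longest_path is not None and longest_path <= PROCESS_STEPS
```

      For this assumed wiring the longest shortest path is well below seven, so the chosen processing time leaves room for several rounds of recurrent exchange; a sparser wiring would raise the bound.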

      Third, the descriptions of the models in the methods are hard to understand, i.e., parameters are not described and equations are explained by referring to multiple other studies. Since the implications of the results heavily rely on the model, a more detailed description of the model seems necessary.

      We agree with the reviewer that the methods could have been more thorough. Therefore, we have greatly expanded the methods section. We hope the model details are now clearer.

      Lastly, the discussion and testable predictions are not very well worked out and need more details. For example, the point "This represents another testable prediction flowing from our study, which could be studied in humans by examining the optical flow (Pines et al., 2023) between auditory and visual regions during an audiovisual task" needs to be made more precise to be useful as a prediction. What did the model predict in terms of "optic flow", how can modulatory effects be distinguished from simple driving effects, etc.?

      We see that the original wording of this prediction was ambiguous; thank you for pointing this out. In the study highlighted (Pines et al., 2023) the authors use an analysis technique for measuring information flow between brain regions, which is related to analysis of optical flow in images, but applied to fMRI scans. This is confusing given the current study, though. Therefore, we have changed this sentence to make clear that we are speaking of information flow here.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the computational role of top-down feedback in artificial neural networks (ANNs), a feature that is prevalent in the brain but largely absent in standard ANN architectures. The authors construct hierarchical recurrent ANN models that incorporate key properties of top-down feedback in the neocortex. Using these models in an audiovisual integration task, they find that hierarchical structures introduce a mild visual bias, akin to that observed in human perception, not always compromising task performance.

      Strengths:

      The study investigates a relevant and current topic of considering top-down feedback in deep neural networks. In designing their brain-like model, they use neurophysiological data, such as externopyramidisation and hierarchical connectivity. Their brain-like model exhibits a visual bias that qualitatively matches human perception.

      We thank the reviewer for their summary and evaluation of our paper’s strengths.

      Weaknesses:

      While the model is brain-inspired, it has limited bioplausibility. The model assumes a simplified and fixed hierarchy. In the brain with additional neuromodulation, the hierarchy could be more flexible and more task-dependent.

      We agree, there are still many facets of top-down feedback that we have not captured here, and the modulation of hierarchy is an interesting example. We have added some consideration of this point to the limitations section of the discussion.

      While the brain-like model showed an advantage in ignoring distracting auditory inputs, it struggled when visual information had to be ignored. This suggests that its rigid bias toward visual processing could make it less adaptive in tasks requiring flexible multimodal integration. It hence does not necessarily constitute an improvement over existing ANNs. It is unclear whether this aspect of the model also matches human data. In general, there is no direct comparison to human data. The study does not evaluate whether the top-down feedback architecture scales well to more complex problems or larger datasets. The model is not well enough specified in the methods and some definitions are missing.

      We agree with the reviewer that we have not demonstrated anything like superior performance (since the brain-like network is quite rigid, as noted) nor have we shown better match to human data with the brain-like network. This was not our intended claim. Rather, we demonstrated here simply that top-down feedback impacts behavior of the networks in response to ambiguous stimuli. We have now added statements to the introduction and discussion to make our specific claims (which are supported by our data, we believe) clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I believe that the work is very nice but not so mature at this stage. Below, you can find some comments that eventually could improve your manuscript.

      (1) Intro, last sentence: "Therefore, top-down feedback is a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly." I don't understand what the authors refer to with this sentence. There are numerous models (deep ANNs) that have been used to model the neural activity and are much simpler than the one proposed here which contains very complex models and connectivity. Although I do agree that the top-down connections are very important there is no data to support their importance for modeling the brain.

      Respectfully, we disagree with the reviewer that we don’t provide data to demonstrate the importance of top-down feedback for modelling. Indeed, we provided a great deal of data to show that top-down feedback in the networks has real functional implications for behaviour, e.g., it can induce a human-like visual bias. Thus, top-down feedback is a factor that one should care about when modelling the brain. But, we agree with the reviewer that more demonstration of the utility of using top-down feedback for achieving better fits to neural data would be an important next step. 

      (2) I suggest adding some extra supplementary simulations where, for example, the number of data for visual and auditory pathways is equal in size (i.e., the same number of examples), the number of layers is identical (3 per pathway), and also the number of parameters. Doing this would help strengthen the claims presented in the paper.

      In fact, all of the hyperparameters the reviewer mentions here were identical for the different networks, so the experiments the reviewer is requesting here were already part of the paper. We now clarify this in the text.

      (3) Results: I suggest adding Tables with quantifications of the presented results. For example, best performance, epochs to converge, etc. As it is now, it is very hard to follow the evidence shown in Figures.

      This is a good suggestion; we have now added this table to the start of the supplemental figures.

      (4) Figure 2e, 3e: Although VS3, and AS3 have been used only for testing, the plot shows alignments with respect to training epochs. The authors should clarify in the Methods if they tested the network with all intermediate weights during VS1/VS2 or AS1/AS2 training.

      Testing scenarios in this context meant that the model was never shown the scenario/task during training, but the models were indeed evaluated on the VS3 and AS3 after each training epoch. We have added clarifications to the figure legends.

      (5) Methods: It would be beneficial to discuss how specific hyperparameters were selected based on prior research, empirical testing, or theoretical considerations. Also, it is not clear how the alignment (visual or audio) is calculated. Do the authors use the examples that have been classified correctly for both stimuli or do they exclude those from the analysis (maybe I have missed it).

      As noted above, because superior performance was not the goal of this study, we conducted limited hyperparameter tuning. But we have extended the results with additional hyperparameter tuning in a supplementary figure, and describe the hyperparameter choices more thoroughly in the methods. As well, all data includes all model responses, regardless of whether they were correct or not. We now clarify this in the methods.

      (6) Code: The code repository lacks straightforward examples demonstrating how to utilize the modeling approach. Given that it is referred to as a "framework", one would expect it to facilitate easy integration into various models and tasks. Including detailed instructions or clear examples would significantly improve usability and help users effectively apply the proposed methodology.

      We agree with the reviewer, this would be beneficial. We have revised the README of the codebase to explain the model and its usage more clearly and included an interactive jupyter notebook with example training on MNIST.

      Some minor comments are given below. Generally speaking, the Figures need to be more carefully checked for consistent labels, colors, etc.

      (1) Page 4, 1st paragraph - grammar correction: "a larger infragranular layer" or "larger infragranular layers"

      Thank you for catching this, we have fixed the text.

      (2) Page 4, 2nd para - rephrase: "In three additional control ANNs" → "In the third additional control ANN"

      In fact, we did mean three additional control ANNs, each one representing a different randomized connectivity profile. We now clarify this in the text and provide the connectivity of the two other random graphs in the supplemental figures.

      (3) Page 4, VAE acronym needs to be defined before its first use

      The variational autoencoder is introduced by its full name in the text now.

      (4) Page 4: Fig. 2c reference should be Fig. 2b, Fig. 2d should be Fig. 2c, Fig. 2b should be Fig. 2d, VS4; Fig. 2b, bottom should be VS4; Fig. 2f, Fig. 2f to Fig. 2g. Double check the Figure references in the text. This is very confusing for the reader.

      We have now fixed this, thank you for catching it.

      (5) Page 5, 1st para: "Altogether, our results demonstrated both" → "Altogether, our results demonstrated that both"

      This has been updated.

      (6) Figure 2: In the e and g panels the x label is missing.

      This was actually because the x-axes were the same across the panels, but we see how this was unclear, so we have updated the figure.

      (7) Figure 3: There is no panel g (the title is missing); In panels b, c, e, and g the y label is missing, and in panels e and g the x label is missing. Also, the Feedforward model is shown in panel g but it is introduced later in the text. Please remove it from Figure 3. Also in legend: "AV Reverse graph" → "Reverse graph". Also, "Accuracy" and "Alignment" should be presented as percentages (as in Figure 2).

      This has been corrected.

      (8) Figure 4; x labels are missing.

      As with point (6), this was actually because the x-axes were the same across the panels, but we see how this was unclear, so we have updated the figure.

      (9) Page 7; I can’t find the cited Figure S1.

      Apologies, we have added the supplemental figure (now as S4). It shows the results of models with multiplicative feedback on the task in Fig 5 (as opposed to models with composite feedback shown in the main figure).

      Reviewer #2 (Recommendations for the authors):

      (1) Discussion Section 3.1 is only a literature review, and does not really add any value.

      Respectfully, we think it is important to relate our work to other computational work on the role of top-down feedback, and to make clear what our specific contribution is. But, we have updated the text to try to place additional emphasis on our study’s contribution, so that this section is more than just a literature review.

      “Our study adds to this previous work by incorporating modulatory top-down feedback into deep, convolutional, recurrent networks that can be matched to real brain anatomy. Importantly, using this framework we could demonstrate that the specific architecture of top-down feedback in a neural network has important computational implications, endowing networks with different inductive biases.”

      (2) Including ipython notebooks and some examples would be great to make it easier to use the code.

      We now provide a demo of how to use the code base in a jupyter notebook.

      (3) The description of the model is hard to comprehend. Please name and describe all parameters. Also, a figure would be great to understand the different model equations.

      We have added definitions of all model terms and parameters.

      (4) The terminology is not really clear to me. For example "The results further suggest that different configurations of top-down feedback make otherwise identically connected models functionally distinct from each other and from traditional feedforward only recurrent models." The feedforward and only recurrent seem to contradict each other. Would maybe driving and modulatory be a better term here? I also saw in the code that you differentiate between three types of inputs, modulatory, threshold offset and basal (like feedforward). How about you only classify connections based on these three types? I was also confused about the feedforward only model, because I was unsure whether it is still feedback connections but with "basal" quality, or whether feedback connections between modalities and higher-to-lower level layers were omitted altogether.

      We take the reviewer’s point here. To clarify this, we have updated the text to refer to “driving only” rather than “feedforward only”, to make it obvious that what we change in these models is simply whether the connection has any modulatory impact on the activity. 

      (5) "incorporating it into ANNs can affect their behavior and help determine the solutions that the network can discover." -> Do you mean constrain? Overall, I did not really get this point.

      Yes, we mean that it constrains the solutions that the network is likely to discover.

      (6) "ignore the auditory inputs when they visual inputs were unambiguous" -> the not they

      This has been fixed. Thank you for catching it.

      (7) xlabel in Figure 4 is missing.

      This has been fixed, thank you for catching it.

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) How alignment is computed is not defined. In addition to a proper definition in the methods section, it would be nice to briefly define it when it first appears in the results section.

      We’ve added an explicit definition of how alignment is calculated in the methods and emphasized the calculation when it is first explained in the results.

      (2) A connectivity matrix for the feedforward-only model is missing and could be added.

      We have added this to Figure 1.

      (3) The connectivity matrix for each random model should also be shown.

      We’ve shown each of the random model configurations in the new supplemental figure S1.

      (4) Initial parameters are not defined, such as W, b etc. A table with all model parameters would be great.

      We have added a table to the methods listing all of the parameters.

      (5) Would be nice to show the t-sne plots (not just the NH score) for each model and each task in the appendix.

      We can provide these figures on request. They massively increase the file size of the paper pdf, as there are 49 of them for each task and each model, 980 in total. An example t-SNE plot is provided in figure 6.

      Minor:

      (1) Page 4:

      "we refer to this as Visual-dominant Stimulus case 1, or VS1; Fig. 1a, top)." This should be Fig. 2a.

      (2) "In stimulus condition VS1, all of the models were able to learn to use the auditory clues to disambiguate the images (Fig. 2c)."

      This should be Fig. 2b.

      (3) "In comparison, in VS2, we found that the brainlike model learned to ignore distracting audio inputs quickly and consistently compared to the random models, and a bit more rapidly than the auditory information (Fig 2d)."

      This should be Fig. 2c.

      (4) "VS3; Fig. 2b, top"

      This should be Fig. 2d

      (5) "while all other models had to learn to do so further along in training (Fig. 2e)."

      It is not stated explicitly, but this suggests that the image-aligned target was considered correct, and that weight updates were happening.

      (6) "VS4; Fig. 2b, bottom"

      This should be Fig. 2f

      (7) "adept at learning (Fig. 2f)."

      This should be Fig. 2g

      (8) Figure 3:b,c,e y-labels are missing

      3f: both x and y labels are missing

      (9) Figure labeling in the text is not consistent (Fig. 1A versus Fig. 2a)

      (10) Doubled "the" in "This shows that the inductive bias towards vision in the brainlike model depended on the presence of the multiplicative component of the the feedback"

      (11) Page 9 Figure 6: The caption says b shows the latent spaces for the VS2 task, whereas the main text refers to 6b as showing the latent space for the AS2 task. Please correct which task it is.

      (12) Methods 4.1 page 13

      "which is derived from the feedback input (h_{l−1})"

      This should be h_{l+1}

      (13) It is not stated which aspects of the model r_l, u_l, u and c refer to.

      Even though this is based on a previous model, the methods section should completely describe the model.

      Equations 1,2,3: the notation [x;y] is unclear and should be defined.

      Equation 5: u should probably be u_l.

      (14) Page 14 typo: externopyrmidisation.

      (15) It is confusing to use different names for the same thing: the all-feedforward model, the all feedforward network, the feedforward network, and the feedforward-only model are probably all the same? Consistent naming would help here.

      Thank you for the detailed comments! We’ve fixed the minor errors and renamed the feedforward models to drive-only models.

    1. eLife Assessment

      This study investigates the temporal dynamics of neural activity preceding self-initiated movements and makes a valuable contribution to this field. The authors identify key methodological and analytical limitations in previous work and introduce a novel approach to overcome the shortcomings in assessing how predictive neural activity is of an upcoming event. Applying generally solid methods and analyses, they show that a late-stage neural event, ~100 ms before movement execution, is most predictive of upcoming movements, whereas earlier neural activity is less informative. Although interesting, additional analyses are needed to strengthen confidence in this central claim.

    2. Reviewer #1 (Public review):

      Summary:

      Jeay-Bizot and colleagues investigate the neural correlates of the preparation of, and commitment to, a self-initiated motor action. In their introduction, they differentiate between theoretical proposals relating to the timing of such neural correlates relative to the time of a recorded motor action (e.g., a keypress). These are categorised into 'early' and 'late' timing accounts. The authors advocate for 'late' accounts based on several arguments that align well with contemporary models of decision-making in other domains (for example, evidence accumulation models applied to perceptual decisions). They also clearly describe prevalent methodological issues related to the measurement of event-related potentials (ERPs) and time-frequency power to gauge the timing of the commitment to making a motor action. These methodological insights are communicated clearly and denote potentially important limitations on the inferences that can be drawn from a large body of existing work.

      To attempt to account for such methodological concerns, the authors devise an innovative experiment that includes an experimental condition whereby participants make a motor action (a right-hand keypress) to make an image disappear. They also include a condition whereby the stimulus presentation program automatically proceeds at a set time that is matched to the response timing in a previous trial. In this latter condition, no motor action is required by the participant. The authors then attempt to determine the times at which they can differentiate between these two conditions (motor action vs no motor action) based on EEG and MEG data, using event-related potential analyses, time-frequency analyses, and multivariate classifiers. They also apply analysis techniques based on comparing M/EEG amplitudes at different time windows (as used in previous work) to compare these results to those of their key analyses.

      When using multivariate classifiers to discriminate between conditions, they observed very high classification performance at around -100ms from the time of the motor response or computer-initiated image transition, but lower classification performance and a lack of statistically significant effects across analyses for earlier time points. Based on this, they make the key claim that measured M/EEG responses at the earlier time points (i.e., earlier than around -100ms from the motor action) do not reliably correlate with the execution of a motor action (as opposed to no such action being prepared or made). This is argued to favour 'late' accounts of motor action commitment, aligning with the well-made theoretical arguments in favour of these accounts in the introduction. Although the exact time window related to 'late' accounts is not concretely specified, an effect that occurs around -100ms from response onset is assumed here to fall within that window.

      Importantly, this claim relies on accepting the null hypothesis of zero effect for the time points preceding around -100ms based on a somewhat small sample of n=15 and some additional analyses of individual participant datasets. Although the authors argue that their classifiers are sensitive to detecting relevant effects, and the study appears well-powered to detect the (likely to be large magnitude) M/EEG signal differences occurring around the time of the response or computer-initiated image transition, there is no guarantee that the study is adequately sensitive to detect earlier differences in M/EEG signals. These earlier effects are likely to be more subtle and exhibit lower signal-to-noise ratios, but would still be relevant to the 'early' vs 'late' debate framed in the manuscript. This, along with some observed patterns in the data, may substantially reduce the confidence one may have in the key claim about the onset timing of M/EEG signal differences.

      Notably, there is some indication of above-chance (above 0.5 AUC) classification performance at time points earlier than -100ms from the response, as visible in Figure 3A for the task-based EEG analyses (EEG OC dataset, blue line). While this was not statistically significantly above chance for their n=15 sample, these results do not appear to be clear evidence in favour of a zero-effect null-hypothesis. In Figures 2A-B, there are also visible differences in the ERPs across conditions, from around the time that motor action-related components have been previously observed (around -500ms from the response). The plotted standard errors in the data are large enough to indicate that the study may not have been adequately powered to differentiate between the conditions.
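
      As a reference point for the chance level mentioned above, AUC can be read as the probability that a randomly chosen trial from one condition receives a higher classifier score than a randomly chosen trial from the other, so 0.5 corresponds to no discrimination. The following is a minimal sketch with made-up scores; the function name is hypothetical and this is not the authors' analysis pipeline.

```python
def pairwise_auc(scores_active, scores_passive):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    if not scores_active or not scores_passive:
        raise ValueError("both conditions need at least one score")
    wins = 0.0
    for a in scores_active:
        for p in scores_passive:
            if a > p:
                wins += 1.0
            elif a == p:
                wins += 0.5
    return wins / (len(scores_active) * len(scores_passive))


# Made-up classifier scores for illustration only.
perfect = pairwise_auc([0.9, 0.8], [0.1, 0.2])   # every active trial outranks every passive trial
chance = pairwise_auc([0.6, 0.4], [0.5, 0.5])    # active trials win half of the comparisons
```

      A group-level AUC slightly above 0.5 that fails to reach significance is therefore compatible with both a true null and a small effect that the sample cannot resolve.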

      Although the authors acknowledge this limitation in the discussion section of their manuscript, their counter-argument is that the classifiers could reliably differentiate between conditions at time points very close to the motor response, and in the time-based analyses where substantive confounds are likely to be present, as demonstrated in a set of analyses. Based on this data, the authors imply that the study is sufficiently powered to detect effects across the range of time points used in the analyses. While it's commendable that these extra analyses were run, they do not provide convincing evidence that the study is necessarily sensitive to detecting more subtle effects that may occur at earlier time points. In other words, the ability of classifiers (or other analysis methods) to detect what are likely to be very prominent, large effects around the time of the motor response does not guarantee that such analyses will detect smaller magnitude effects at other time points.

      In summary, the authors develop some very important lines of argument for why existing work may have misestimated the timing of neural signals that precede motor actions. This in itself is an important contribution to the field. However, their attempt to better estimate the timing of such signals is limited by a reliance on accepting the null hypothesis based on non-statistically significant results, and arguably a limited degree of sensitivity to detect subtle but meaningful effects.

      Strengths:

      This manuscript provides compelling reasons why existing studies may have misestimated the timing of the neural correlates of motor action preparation and execution. The authors provide additional analyses as evidence of the relevant confounds and provide simulations to back up their claims. This will be important to consider for many in the field. They also endeavoured to collect large numbers of trials per participant so that effects could be examined in individuals, which is commendable and arguably better aligned with contemporary theory (which pertains to how individuals make decisions to act, rather than groups of people).

      The innovative control condition in their experiment may also be very useful for providing complementary evidence that can better characterise the neural correlates of motor action preparation and commitment. The method for matching image durations across active and passive conditions is particularly well thought-out and provides a nice control for a range of potential confounding factors.

      Weaknesses:

      There is a mismatch between the stated theoretical phenomenon of interest (commitment to making a motor action) and what is actually tested in the study (differences in neural responses when an action is prepared and made compared to when no action is required). The assumed link between these concepts could be made more explicit for readers, particularly because it is argued in the manuscript that neural correlates of motor action preparation are not necessarily correlates of motor action commitment.

      As mentioned in the summary, the main issue is the strong reliance on accepting the null hypothesis of no differences between motor action and computer initiation conditions based on a lack of statistically significant results from the modest (n=15) sample. Although a larger sample will increase measurement precision at the group level, there are some EEG data processing changes that could increase the signal-to-noise ratio of the analysed data and produce more precise estimates of effects, which may improve the ability to detect more subtle effects, or at least provide more confidence in the claims of null effects.

      First, it is stated in the EEG acquisition and preprocessing section that the 64-channel Biosemi EEG data were recorded with a common average reference applied. Unless some non-standard acquisition software was used (we are not aware that any exists), Biosemi systems do not actually apply this reference at recording (it is for display purposes only, but often mistaken to be the actual reference applied). As stated in the Biosemi online documentation, a reference should be subsequently applied offline; otherwise, there is a substantial decrease in the signal-to-noise ratio of the EEG data, and a large portion of ambient alternating current noise is retained in the recordings. This can be easily fixed by applying a referencing scheme (e.g., the common average reference) offline as one of the first steps of data processing. If this was, in fact, done offline, it should be clearly communicated in the manuscript.
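
      The offline referencing step described above can be sketched as follows. This is a minimal, dependency-free illustration of a common average reference, written with plain Python lists; the function name and the toy data are hypothetical, and a real pipeline would typically use an EEG toolbox and exclude bad channels before averaging.

```python
def common_average_reference(data):
    """Subtract the across-channel mean from every channel at each sample.

    `data` is a list of channels, each a list of samples of equal length.
    """
    n_channels = len(data)
    n_samples = len(data[0])
    if any(len(channel) != n_samples for channel in data):
        raise ValueError("all channels must have the same number of samples")
    means = [sum(channel[t] for channel in data) / n_channels for t in range(n_samples)]
    return [[channel[t] - means[t] for t in range(n_samples)] for channel in data]


# Hypothetical three-channel recording: small signals plus a large term shared by all channels.
shared_noise = [50.0, -50.0, 50.0, -50.0]
signals = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0], [-1.0, -2.0, -3.0, -4.0]]
raw = [[s + n for s, n in zip(channel, shared_noise)] for channel in signals]
rereferenced = common_average_reference(raw)
```

      In this toy case the shared term is identical on every channel, so subtracting the channel mean removes it entirely; in real recordings the removal is only partial.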

      In addition, the data is downsampled using a non-integer divisor of the original sampling rate (a 2,048 Hz dataset is downsampled to 500 Hz rather than 512 Hz). Downsampling using a non-integer divisor is not recommended and can lead to substantial artefacts in raw data as a result, as personally observed by this Reviewer in Biosemi data. Finally, although a 30 Hz low-pass filter is applied for visualisation purposes of ERPs, no such filter is applied prior to analyses, and no method is used to account for alternating current noise that is likely to be in the data. As noted above, much of the alternating current noise will be retained when an offline reference is not applied, and this is likely to further degrade the quality of the data and reduce one's ability to identify subtle patterns in EEG signals. Changes in data processing to address these issues would likely lead to more precise estimates of EEG signals (and by extension differences across conditions).
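
      The arithmetic behind the downsampling concern can be checked with a short sketch. It only expresses the ratio between the two sampling rates in lowest terms; the function name is hypothetical and no filtering or resampling is performed here.

```python
from fractions import Fraction


def resampling_ratio(original_hz, target_hz):
    """Return (up, down) in lowest terms such that target_hz == original_hz * up / down."""
    ratio = Fraction(target_hz, original_hz)
    return ratio.numerator, ratio.denominator


to_512 = resampling_ratio(2048, 512)  # integer divisor: keep every 4th sample after low-pass filtering
to_500 = resampling_ratio(2048, 500)  # non-integer divisor: needs rational resampling (interpolation)
```

      When `up` is 1 the target rate is an integer divisor of the original rate and plain decimation is possible; otherwise new sample values have to be interpolated between the recorded ones.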

      With regard to possible effects extending hundreds of milliseconds before the response, it would be helpful for the authors to more precisely clarify the time windows associated with 'early' and 'late' theories in this case. The EEG data that would be required to support 'early' theories is also not made sufficiently clear. For example, even quite early neural correlates of motor actions in this task (e.g., around -500ms from the response, or earlier) could still be taken as evidence for the 'late' theories if these correlates simply reflect the accumulation of evidence toward making a decision and associated motor action, as implied by the Leaky Stochastic Accumulator model described by the authors. In other words, even observations of neural correlates of motor action preparation that occur much earlier than the response would not constitute clear evidence against the 'late' account if this neural activity represents an antecedent to a decision and action (rather than commitment to the action), as the authors point out in the introduction.

      In addition, there is some discrepancy regarding the data that is used by the classifiers to differentiate between the conditions in the EEG data and the claims about the timing of neural responses that differentiate between conditions. Unless we reviewers are mistaken, the Sliding Window section of the methods states that the AUC scores in Figure 3 are based on windows of EEG data that extend from the plotted time point until 0.5 seconds into the past. In other words, an AUC value at -100ms from the response is based on classifiers applied to data ranging from -600 to -100 milliseconds relative to the response. In this case, the range of data used by the classifiers extends much earlier than the time points indicated by Figure 3, and it is difficult to know whether the data at these earlier time points may have contributed (even in subtle ways) to the success of the classifiers. This may undermine the claim that neural responses only become differentiable from around -100ms from response onset. The spans of these windows used for classification could be made more explicit in Figure 3, and classification windows that are narrower could be included in a subset of analyses to ensure that classifiers only using data in a narrow window around the response show the high degree of classification performance in the dataset. If we are mistaken, then perhaps these details could be clarified in the method and results sections.
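
      The relation between a plotted time point and the data span feeding the classifier can be made concrete with a small sketch. It assumes, as we read the Sliding Window section, a trailing window of 0.5 seconds; the function names and the alternative window length are hypothetical.

```python
def window_span(end_ms, length_ms=500):
    """Return (start_ms, end_ms) of the trailing window whose result is plotted at end_ms."""
    if length_ms <= 0:
        raise ValueError("length_ms must be positive")
    return (end_ms - length_ms, end_ms)


def uses_data_before(end_ms, cutoff_ms, length_ms=500):
    """True if the window plotted at end_ms includes samples earlier than cutoff_ms."""
    start_ms, _ = window_span(end_ms, length_ms)
    return start_ms < cutoff_ms


span_at_minus_100 = window_span(-100)                         # 500 ms trailing window
narrow_span = window_span(-100, length_ms=50)                 # hypothetical narrower window
mixes_early_data = uses_data_before(-100, cutoff_ms=-150)     # with the 500 ms window
narrow_mixes_early = uses_data_before(-100, cutoff_ms=-150, length_ms=50)
```

      With the 500 ms window, the value plotted at -100ms draws on data back to -600ms, whereas the narrower window stays within the last 50 ms before that time point.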

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to investigate how well the onset of a self-initiated movement could be predicted at different times prior to action onset. To do so, they collected EEG and MEG data across 15 human participants who watched natural landscape images on a screen. These participants performed active self-initiated movements or observed passive actions to have a new image appear. By comparing the neural activity prior to active and time-matched passive actions, the authors found that even though a build-up of neural activity is visible close to 1s prior to action, action onset could only be reliably predicted around 100ms prior to action. These results confirm what was already suggested in previous literature: the commitment to action is only clear from the late stages in the visible neural ramp-up to action onset.

      Strengths:

      (1) The paper presents a well-thought-out methodology to assess the predictive value of neural activity prior to a self-initiated movement and passively observed action, while keeping all other experimental factors identical. This methodology can be applied outside the specific scope of this paper as well, in efforts to assess the correspondence of a neural signature with an observed behavior.

      (2) The results are a strong confirmation of what was suggested less clearly in previous research (Trevena & Miller, 2010, Consciousness & Cognition; Schmidt et al., 2016, Neuroscience & Biobehavioral Reviews; Travers et al., 2020, NeuroImage).

      Weaknesses:

      (1) Although the authors conducted a solid confirmatory study, the importance of this confirmation is less clear to me. How do the current results change our interpretation of the relation between conscious intention and neural preparation for action? Do these results affect our interpretation of free will? Why does it matter at all whether we see neural preparatory activity prior to the report of a conscious intention to act, or prior to action observation? This study does not clarify the relationship between the observed neural phenomenon, the action or the experienced intention. It does not explain whether this relation is causal, correlational or something else.

      (2) Whereas Derchi et al. (2023, Scientific Reports) were able to keep the entire experimental context similar across intended and unintended conditions, Jeay-Bizot et al. have one big difference between their passive and active conditions: the presence of a movement. Therefore, the present results explain the presence or absence of a movement rather than the presence or absence of an intention to act.

    1. eLife Assessment

      This fundamental study reports the effects of the psychedelic drug psilocin on iPSC-derived human cortical neurons, analyzing different aspects of structural and functional neuronal plasticity. The evidence is convincing and supports the value of using iPSC-derived human cortical neurons for testing the potentially translational effects of psilocin and other psychedelic-related compounds.

    2. Reviewer #1 (Public review):

      Summary:

      This study reports the effects of psilocin on iPSC-derived human cortical neurons.

      Strengths:

      The characterization was comprehensive, involving immunohistochemistry of various markers, 5-HT2A receptors, BDNF, and TrkB, transcriptomics analyses, morphological determination, electrophysiology, and finally synaptic protein measurements. The results are in close agreement with prior work (PMID 29898390) on rat cultured cortical neurons. Nevertheless, there is value in confirming those earlier findings and furthermore to demonstrate the effects in human neurons, which are important for translation. The genetic, proteomics, and cell structure analyses used in this paper are its major strength. The study supports the value of using iPSC-derived human cortical neurons for drug development involving psychedelics-related compounds.

      Weaknesses:

      (1) Line 140: 5-HT2A receptor expression was found via immunocytochemistry to reside in the somatodendritic and axonal compartments. However, prior work from ex vivo tissue using electron microscopy has found predominantly 5-HT2A receptor expression in the somatodendritic compartment (PMID: 12535944). Was this antibody validated to be 5-HT2A receptor-specific? Can the authors reason why the discrepancy may arise, and if the axonal expression is specific to the cultured neurons?

      (2) Line 143: It would be helpful to specify the dose of psilocin tested, and describe how this dose was chosen.

      (3) Figure 1: The interpretation is that the differential internalization in the axonal and somatodendritic compartments is time-dependent. However, given that only one dose is tested, it is also possible that this reflects dose dependence, with the longer time exposure leading to higher dose exposure, so these variables are related. That is, if a higher dose is given, internalization may also be observed after 10 minutes in the dendritic compartment.

      (4) Figure 3 & 4: What is the 'control' here? A more appropriate control for the 24 hours after psilocin application would be 24 hours after vehicle application. Here the authors are looking at before and after, but the factor of time elapsed and perturbation via application is not controlled for.

      (5) The sample size was not clearly described. In the figure legend, N = the number of neurites is provided, but it is unclear how many cells have been analyzed, and then how many of those cells belong to the same culture. This is important sample size information that should be provided. Relatedly, statistical analyses should consider that the neurites from the same cells are not independent. If the neurites indeed come from the same cells, then the sample size is much smaller and a statistical analysis considering the nested nature of the data should be used.
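
      One simple way to respect the nesting raised in point (5) is to collapse neurite-level values to one summary per cell before testing, so that N reflects cells rather than neurites. The sketch below is only an illustration under that assumption: the identifiers and values are hypothetical, and a mixed-effects model with cell and culture as grouping factors would be an alternative.

```python
from collections import defaultdict
from statistics import mean


def per_cell_means(measurements):
    """Collapse neurite-level values to one mean per (culture, cell)."""
    grouped = defaultdict(list)
    for culture, cell, value in measurements:
        grouped[(culture, cell)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical illustrative data: (culture_id, cell_id, neurite measurement).
example = [
    ("culture_A", "cell_1", 4.0),
    ("culture_A", "cell_1", 6.0),
    ("culture_A", "cell_2", 5.0),
    ("culture_B", "cell_3", 7.0),
    ("culture_B", "cell_3", 9.0),
    ("culture_B", "cell_3", 8.0),
]

cell_means = per_cell_means(example)
print(len(example), "neurites collapse to", len(cell_means), "cells")
```

      In this toy example six neurite measurements reduce to three cell-level values, which is the effective sample size that a test treating cells as the unit of analysis would use.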

      Comments on revisions:

      The authors performed substantial experiments to check the validity of the HTR2A antibody for the revision. Briefly, they found that western blot shows a single band, abolished by a blocking peptide, in neural progenitors and iPSC-derived neurons, suggesting positive results. However, they also detected immunofluorescence signals in HEK293 and HeLa cells, which do not express 5-HT2A receptors, as scRNAseq analysis of these cells shows complete absence of the transcript. Therefore the antibody has epitope-selective binding but also has some non-specific binding, precluding its use. The authors rightfully removed the data related to the antibody in the revised manuscript. The account is repeated here for anyone who may find the information helpful. Overall, the additional results added rigor to the study.

    3. Reviewer #2 (Public review):

      In this article, Schmidt et al. use iPSC-derived human cortical neurons to test the effects of the psychedelic psilocin in different models of neuroplasticity.

      Using human iPSC-derived cortical neurons, the authors test the expression and subcellular distribution of 5-HT2A, as well as the effect of different times of exposure to psilocin on 5-HT2A expression. The authors evaluated the effect of the 5-HT2 antagonist ketanserin, as well as the inhibition of dynamin-dependent endocytic pathways with dynasore. Gene expression and plasticity (structural and functional) were also evaluated after different times of exposure to psilocin.

      In general, results are interesting since they use the iPSC to evaluate the potentially translationally relevant effects of psilocin (the active metabolite of the psychedelic psilocybin).

      Comments on revisions:

      The authors have addressed all of my previous concerns. A particular strength of the rebuttal is that the authors corroborated the lack of selectivity/specificity of the anti-5-HT2A antibody used in earlier versions of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment 1: 5-HT2A Antibody Specificity

      Was this antibody validated to be 5-HT2A receptor-specific? Can the authors reason why the discrepancy may arise, and if the axonal expression is specific to the cultured neurons?

      We performed extensive validation of the anti-5-HT2A receptor antibody (Alomone #ASR-033), which is summarized in the accompanying Author response images:

      Positive findings (Author response image 1c-e, Author response image 2a): (1) Western blot showed a single band at the expected molecular weight (~50 kDa) in neural progenitors and iPSC-derived neurons. (2) The blocking peptide (#BLP-SR033) abolished Western blot bands and markedly reduced immunofluorescence signals in neurons, confirming epitope-specific binding.

      Negative findings (Author response image 1a-b, Author response image 2a-b, Author response image 3): (1) We detected positive immunofluorescence signals in HEK293 and HeLa cells (Author response image 1a-b), which do not express 5-HT2AR. (2) Western blot also showed bands in HEK293 and HeLa cells (Author response image 2a-b). (3) Single-cell RNA-seq analysis of HEK293T cells confirmed complete absence of HTR2A expression (Author response image 3a). (4) qPCR showed no detectable HTR2A transcripts in iPSCs or HeLa cells (Ct > 36), while neural progenitors and neurons showed clear expression (Author response image 3b). (5) siRNA knockdown experiments failed to produce a corresponding decrease in immunofluorescence or Western blot signals, despite reduced HTR2A transcript levels (data not shown).

      BLAST analysis: Protein BLAST analysis of the 13-amino acid immunogenic peptide sequence identified the human 5-HT2A receptor as the top hit (9/13 amino acids overlap). However, shorter sequence similarities were also found with other proteins, including APPBP1 (6/9 amino acids), Immunoglobulin Heavy Chain (6/7 amino acids), and Interleukin-31 receptor (6/8 amino acids). While these partial homologies do not provide a definitive mechanistic explanation for the observed off-target binding, they illustrate that the epitope sequence is not entirely unique to the 5-HT2A receptor.

      Conclusion: While our validation confirmed epitope-specific binding (blocking peptide effective in neurons), the antibody clearly detects something in cells that demonstrably lack HTR2A gene expression. This indicates off-target binding to other proteins sharing the epitope sequence. We have therefore removed all antibody-based 5-HT2A receptor experiments from the revised manuscript. This includes the receptor internalization data from Figure 1. The remaining findings (BDNF upregulation, gene expression changes, morphological effects, electrophysiology) are supported by independent methods including pharmacological blockade with ketanserin.

      Comment 2: Psilocin Dose Selection

      It would be helpful to specify the dose of psilocin tested, and describe how this dose was chosen.

      We used 10 µM psilocin based on: (1) The seminal study by Ly et al. (2018), which demonstrated neuroplasticity effects at this concentration in rat cortical neurons. (2) Our own dose-response experiments (Figure S2B) showing maximal BDNF increase at 10 µM compared to lower concentrations (10 nM, 100 nM, 1 µM). We have clarified this in the revised Methods section.

      Comment 3: Dose vs. Time Dependence

      Given that only one dose is tested, it is also possible that this reflects dose dependence, with the longer time exposure leading to higher dose exposure.

      We agree that dose dependence cannot be excluded with our current experimental design. This point is now moot as we have removed the 5-HT2A receptor internalization experiments from the manuscript. Future studies in our group will address dose-dependent effects on other readouts.

      Comment 4: Control Conditions

      What is the 'control' here? A more appropriate control would be 24 hours after vehicle application.

      The control condition is indeed a vehicle (DMSO) control collected at the same time point as the experimental condition (i.e., 24 hrs post-treatment). We have clarified this in the revised figure legends and Methods section to avoid confusion.

      Comment 5: Sample Size Description

      The sample size was not clearly described. Statistical analyses should consider that neurites from the same cells are not independent.

      We have expanded the sample size descriptions in the figure legends. Analyses were performed using 5-10 microscope images per condition, with 15 ROIs per image, across at least two independent differentiations from two genetic backgrounds. Regarding independence: each neurite segment exists within a distinct microenvironment and can be considered an independent measurement unit, consistent with established practices in the field (Paul et al., 2021, CNS Neurosci Ther). We acknowledge this increases statistical power and have noted this in the Methods.

      Reviewer #2:

      Comment 1: 5-HT2A Antibody Validation

      Without validation (using for example knockdown techniques to decrease expression of 5HT2A), the experiments using this antibody should be excluded from the manuscript.

      We agree with this assessment. As detailed in our response to Reviewer 1 (Comment 1) and documented in the Response to Reviewer Figure, our extensive validation attempts—including siRNA knockdown—could not conclusively demonstrate antibody specificity. We have removed all antibody-based 5-HT2A receptor experiments from the revised manuscript.

      Comment 2: Serotonin in Cell Media

      Did the authors evaluate whether 5-HT is present in the cell media?

      The cell culture media used in our experiments does not contain serotonin. We have explicitly stated this in the revised Methods section.

      Comment 3: Statistical Analysis of Figure S1F

      Some of the datasets are not statistically analyzed, such as Figure S1F.

      Figure S1F related to the 5-HT2A receptor experiments and has been removed from the revised manuscript along with the associated data.

      Comment 4: Translational Validity of Prolonged Exposure

      The authors continuously exposed cells to psilocin for hours or days. Since this is not the model of what occurs in vivo, the findings lack translational validity.

      We acknowledge this limitation. Most experiments (BDNF, gene expression, branching) were conducted 24–48 hrs after a brief 10-minute exposure, which better reflects the in vivo situation. Prolonged exposures (96 hrs) were used specifically for synaptogenesis experiments based on literature showing that repeated LSD administration enhances spine density (Inserra et al., 2022; De Gregorio et al., 2022). Our in vitro system lacks metabolizing enzymes and glial cells, which may introduce temporal biases. We have added a discussion of these limitations in the revised manuscript.

      Comment 5: Ketanserin Effect on BDNF

      In Figure 2E, ketanserin by itself seems to reduce BDNF density. How do the authors conclude that ketanserin blocks psi-induced effects?

      We identified that one cell line (Ctrl 1) with inherently higher BDNF density was inadvertently excluded from the ketanserin-only condition. After removing Ctrl 1 from all conditions and reanalyzing, the difference between Ctrl and Ket alone is no longer significant. The significant difference between Psi+Ket and Ket alone demonstrates that psilocin exerts effects that ketanserin can block, consistent with 5-HT2A receptor mediation. The revised figure and statistical analysis are included in the updated manuscript.

      Comment 6: mCherry Localization

      mCherry (Fig 4A) seems to be retained in the nucleus.

      The CamKII promoter drives expression of cytoplasmic mCherry, which fills the entire neuron including soma, dendrites, and axons. The apparent nuclear signal reflects mCherry accumulation in the soma, which surrounds the nucleus. The images clearly show mCherry extending into neurites, which was essential for our Sholl analysis of neuronal complexity.

      Comment 7: Reference 36

      Reference 36 is a review article that does not mention psilocin.

      Our statement refers broadly to serotonergic psychedelics increasing neurotrophic factors. Reference 36 (Colaço et al., 2020) examines ayahuasca, which contains the serotonergic psychedelic DMT. We have revised the text to clarify this point.

      Summary of Major Revisions

      (1) Removed all 5-HT2A receptor antibody-based experiments from Figure 1 and supplementary figures due to inconclusive specificity validation. An Author response image documenting our validation attempts is provided.

      (2) Clarified control conditions (vehicle controls at matched time points) in figure legends.

      (3) Expanded sample size descriptions in Methods and figure legends.

      (4) Re-analyzed ketanserin experiments with consistent cell line inclusion.

      (5) Added discussion of translational limitations.

      (6) Added new Figure S5 summarizing proposed signaling pathways.

      (7) Expanded discussion on the relevance of iPSC-derived neurons for drug development.

      Author response image 1.

      Immunostaining for 5-HT2A receptor across cell types and peptide-blocking control. (a) HEK293 cells display a positive immunofluorescent signal despite not endogenously expressing 5-HT2AR, indicating nonspecific antibody reactivity. (b) HeLa cells also exhibit a positive signal despite lacking endogenous 5-HT2AR expression, further demonstrating nonspecific antibody binding in non-expressing cell types. (c) Neural progenitor cells show clear positive 5-HT2AR staining. (d) iPSC-derived neurons exhibit robust and well-defined 5-HT2AR staining. (e) Application of the Alomone 5-HT2AR blocking peptide (#BLP-SR033) markedly reduces neuronal signal intensity, supporting epitope-specific binding.

      Author response image 2.

      Western blot analysis of 5-HT2A receptor abundance and peptide-blocking control. (a-b) In line with the immunofluorescence a single band is detected in iPSCs, HEK cells, neural progenitors, iPSC-derived neurons and (b) HeLa cells. (a) Preincubation of the primary antibody with the corresponding blocking peptide abolishes this band across all samples, consistent with specific binding of the antibody to its intended epitope.

      Author response image 3.

      Lack of detectable 5-HT2AR expression in HEK and HeLa cells. (a) Analysis of a human-only HEK293T single-cell RNA-seq dataset (10x Genomics; https://www.10xgenomics.com/datasets/293-t-cells-1-standard-1-1-0, accessed 2025-11-25) shows no meaningful HTR2A expression, whereas other genes such as GAPDH, TP53, MYC, and ACTB are robustly detected. Consistently, evaluation of a “Barnyard” dataset, an equal mixture of human HEK293T and mouse NIH3T3 cells (10x Genomics; https://www.10xgenomics.com/datasets/20-k-1-1mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-ht-v-3-1-3-1-high-6-1-0, accessed 2025-11-25), reveals only ~4 of ~10,000 droplets with minimal HTR2A signal, confirming the absence of meaningful expression. (b) qPCR analysis further demonstrates no detectable HTR2A transcripts in iPSCs or HeLa cells (Ct > 36), while neural progenitors and iPSC-derived cortical neurons show expression when normalized to housekeeping genes GAPDH and TBP.

    1. eLife Assessment

      This study provides valuable insight into stress biology by showing that yeast populations can rapidly evolve a trehalose-producing resting state that substantially improves survival and rapid regrowth after freeze-thaw. This finding is consistent with the role of trehalose metabolism as a biophysical adaptation that is broadly relevant to the community working on environmental resilience and dormancy. The evidence is convincing: the authors integrate experimental evolution, cell-level biophysical measurements, and modelling in a mutually reinforcing manner.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      I have commented on the response of the authors, in-line, below. This is to maintain the conversation thread with the authors.

      Comment 1:

      Earlier papers have shown that loss of ribosomal proteins, that slow growth, leads to better stress tolerance in S. cerevisiae. Given this, isn't it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria/or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      Response:

      We agree that the link between slower growth and higher stress tolerance has been well studied. What is distinctive here is that repeated, near-lethal freeze-thaw selected not only for a tolerant/quiescent-like state but also for a shorter lag on re-entry. In this regime of freeze-thaw-regrowth, cells that are tolerant but slow to restart would be outcompeted by naive fast growers. Our quiescence-based selection simulations reproduce exactly this constraint. We have added this explanation to the Results to make clear that the novelty is the co-evolution of a tolerant, trehalose-rich state together with rapid regrowth under an alternating regime.

      Comment to Response: I get the point. I believe that the outcome is highly dependent on how selection pressure is administered. So, generalizing this over all stresses (as done in the abstract) may not be accurate.

      Comment 2:

      Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. According to me, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example in this case it would be - what guides the over-accumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      Response:

      Our surprise was precisely that we did not see the classical pattern of "phenotypic convergence + repeated mutations in the same locus/module." All independently evolved lines converged on a trehalose-rich, mechanically reinforced, quiescence-like phenotype, but population sequencing across lines did not reveal a single repeatedly hit gene or small shared pathway, even when we increased selection stringency (1-3 freeze-thaw cycles per round). We have now stated in the manuscript that this decoupling (strong phenotypic convergence, non-overlapping genetic routes) is the central inference: selection is acting on a physiologically defined state that multiple genotypes can reach.

      Comment to Response: You indeed saw a case of phenotypic convergence. Trehalose-rich, mechanically reinforced, and quiescence-like are the phenotypes that have converged. This is what prevented lysis. The same locus need not be mutated over and over again: if the trehalose pathway is controlled by many processes (it is, and many are still unknown, as I point out in the next comment), many different mutations at different loci can result in the same regulation! I do not see the decoupling of phenotypic convergence from the underlying genetic mutations as surprising or novel; molecular and cellular biology is replete with examples where deletion (mutation) of hundreds of different genes can have the same phenotypic outcome (yeast deletion library screening, indirect effects, etc.). If this was a specific question unsolved in evolutionary biology, then the matter is different.

      A minor point: Here I would also like to point out that the three phenotypes you measure may be linked to each other, so their independent evolution may just be a cause-effect relationship. For example, trehalose accumulation may drive the other two. This has not been deconvoluted in this manuscript.

      Comment 3:

      Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well-known in literature to allow freeze-tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of "non-genetic" adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome - up-regulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines - the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, the cells fall back to being similarly only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states. This is a point in continuation of point #2, and I would like to understand what I have missed.

      Response:

      We agree, and we have removed the wording "non-genetic adaptation." The evolved populations retain high survival even after regrowth for ≥25 generations without freeze-thaw, so the adaptation is clearly genetically maintained. What our data show is that there is no single genetic route to the shared phenotype; different mutations can all drive cells into the same trehalose-rich, quiescence-like, mechanochemically reinforced state. We now describe this as "genetic diversification with phenotypic convergence."

      Comment to Response: While the last term does explain what is going on, isn't it an outcome that is routine in cell biology (as pointed out in my previous comment to your response)? I apologize for not understanding the punchline that is provided in the last few sentences of the abstract.

      Comment 4:

      To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      Response:

      We indeed evolved four independent lines and maintained two independent controls. We have added this information at the start of the Results so that the level of replication is immediately clear.

      Comment to Response: Previous large-scale studies have sequenced hundreds of samples to oversample the pathway and figure out reproducible loci. With pooled sequencing (as mentioned below) and only four evolved lines, I am not sure that your study has the power to conclude whether the loci are repeatedly sampled or not! If there were 10 gene LOFs that control trehalose levels (which you can find from the published deletion screening experiment), then four of the experiments are likely to go through one of these routes; how likely is it that you would identify the same route in two pools? It is unlikely, and therefore, sequencing of 4 pools cannot tell you if the mutation path is repeatedly sampled or not.
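
      The chance of seeing a shared route can be estimated with a rough sketch. It assumes, purely for illustration, that each line adapts through exactly one of n equally likely routes, chosen independently, and that any shared route would be detected; the function name is hypothetical and the 10-route figure is only the example used above. The answer is quite sensitive to the assumed number of routes.

```python
from math import perm


def prob_shared_route(n_routes, n_lines):
    """P(at least two lines use the same route).

    Assumption: every line picks one route uniformly and independently.
    """
    if n_lines > n_routes:
        return 1.0
    return 1 - perm(n_routes, n_lines) / n_routes ** n_lines


def prob_specific_pair_shares(n_routes):
    """P(two given lines use the same route) under the same assumption."""
    return 1 / n_routes


for n_routes in (10, 100):
    print(
        n_routes,
        "routes:",
        round(prob_specific_pair_shares(n_routes), 3),
        "for a given pair,",
        round(prob_shared_route(n_routes, 4), 3),
        "for any pair among 4 lines",
    )
```

      Under these assumptions, any two given lines share a route with probability 0.1 when there are 10 routes, and at least one shared route among four lines occurs with probability of about 0.5; with 100 candidate routes the latter falls to about 0.06. Limited detection power in pooled sequencing would lower these figures further.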

      Comment 5:

      For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

      Response:

      We sequenced population samples from the evolved lines. Our specific question was whether independently evolved lines would show the same high-frequency genetic solution, as is often seen in parallel evolution. Pool sequencing may under-sample rare/private variants, but it is appropriate for detecting such shared, high-frequency routes - and we do not find any. We have clarified this rationale in the Methods/Results.

      Comment to Response: Please provide the average sequencing depth of each sequencing run. It is essential to understand the power of this study in identifying mutations. What coverage was used, expressed as a multiple of genome size (X)?
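
      For reference, mean coverage as a multiple of genome size can be estimated as sketched below. The read count, read length, and variant frequency are hypothetical placeholders rather than values from the manuscript, and the S. cerevisiae genome size of roughly 12.1 Mb is an approximate haploid figure.

```python
S_CEREVISIAE_GENOME_BP = 12_100_000  # approximate haploid genome size


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Mean depth (X) = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def expected_variant_reads(coverage, allele_frequency):
    """Expected number of reads carrying a variant at a given pool frequency."""
    return coverage * allele_frequency


# Hypothetical run: 8 million reads of 150 bp each.
depth = mean_coverage(8_000_000, 150, S_CEREVISIAE_GENOME_BP)
print(f"mean coverage: {depth:.1f}X")
print(f"reads expected for a 5% variant: {expected_variant_reads(depth, 0.05):.1f}")
```

      Reporting depth in this form makes it possible to judge which allele frequencies in a pooled sample could have been detected at all.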

    3. Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles, while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2 % to ~70 % in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about three-fold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with shorter lags, traits that together protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      Experimental design, data presentation and interpretation, writing

      Weaknesses:

      None

      Comments on revisions:

      The revised manuscript is improved and addresses the reviewers' concerns adequately.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for the detailed and constructive comments. In revising the manuscript we have: (i) clarified what is new relative to prior stress tolerance work, (ii) made explicit that we observe phenotypic convergence without a shared genetic route, (iii) stated upfront that we evolved four independent lines plus two controls, and (iv) corrected figure legends, statistics, and the missing citations. Below we respond point-by-point.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      Weaknesses:

      I have questions regarding the overall novelty of the proposal, which I would like the authors to explain.

      (1) Earlier papers have shown that loss of ribosomal proteins, that slow growth, leads to better stress tolerance in S. cerevisiae. Given this, isn’t it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria/or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      We agree that the link between slower growth and higher stress tolerance has been well studied. What is distinctive here is that repeated, near-lethal freeze–thaw selected not only for a tolerant/quiescent-like state but also for a shorter lag on re-entry. In this regime of freeze–thaw–regrowth, cells that are tolerant but slow to restart would be outcompeted by naive fast growers. Our quiescence-based selection simulations reproduce exactly this constraint. We have added this explanation to the Results to make clear that the novelty is the co-evolution of a tolerant, trehaloserich state together with rapid regrowth under an alternating regime.

      (2) Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. According to me, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example, in this case it would be: what guides the overaccumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      Our surprise was precisely that we did not see the classical pattern of “phenotypic convergence + repeated mutations in the same locus/module.” All independently evolved lines converged on a trehalose-rich, mechanically reinforced, quiescence-like phenotype, but population sequencing across lines did not reveal a single repeatedly hit gene or small shared pathway, even when we increased selection stringency (1–3 freeze–thaw cycles per round). We have now stated in the manuscript that this decoupling (strong phenotypic convergence, non-overlapping genetic routes) is the central inference: selection is acting on a physiologically defined state that multiple genotypes can reach.

      (3) Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well-known in literature to allow freeze-tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of “non-genetic” adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome - up-regulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines - the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, the cells fall back to being similarly only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states.

      This is a point in continuation of point #2, and I would like to understand what I have missed.

      We agree, and we have removed the wording “non-genetic adaptation.” The evolved populations retain high survival even after regrowth for ≥25 generations without freeze–thaw, so the adaptation is clearly genetically maintained. What our data show is that there is no single genetic route to the shared phenotype; different mutations can all drive cells into the same trehalose-rich, quiescence-like, mechanochemically reinforced state. We now describe this as “genetic diversification with phenotypic convergence.”

      (4) To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      We indeed evolved four independent lines and maintained two independent controls. We have added this information at the start of the Results so that the level of replication is immediately clear.

      (5) For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

      We sequenced population samples from the evolved lines. Our specific question was whether independently evolved lines would show the same high-frequency genetic solution, as is often seen in parallel evolution. Pool sequencing may under-sample rare/private variants, but it is appropriate for detecting such shared, high-frequency routes — and we do not find any. We have clarified this rationale in the Methods/Results.
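
      The logic of this response, namely that pooled sequencing suffices to detect a shared high-frequency route even if it under-samples rare variants, can be sketched as follows. The gene names, allele frequencies, and thresholds are made-up placeholders, not data from the study, and the function name is hypothetical.

```python
from collections import Counter

def shared_high_frequency_genes(lines, min_freq=0.5, min_lines=2):
    """Return genes with a variant at >= min_freq in at least min_lines lines.

    lines: dict mapping line name -> dict of gene -> pooled allele frequency
    """
    hits = Counter()
    for variants in lines.values():
        for gene, freq in variants.items():
            if freq >= min_freq:
                hits[gene] += 1
    return sorted(gene for gene, n in hits.items() if n >= min_lines)

# Placeholder pooled calls for four hypothetical evolved lines.
pooled_calls = {
    "line1": {"geneA": 0.92, "geneB": 0.08},
    "line2": {"geneC": 0.85},
    "line3": {"geneD": 0.77, "geneB": 0.05},
    "line4": {"geneE": 0.64},
}

shared = shared_high_frequency_genes(pooled_calls)
shared_low_threshold = shared_high_frequency_genes(pooled_calls, min_freq=0.05)
```

      In this toy input each line carries a different high-frequency variant, so no gene is shared at the default threshold; a gene recurs across lines only when the frequency cutoff is lowered to a level where pooled sequencing would in practice be unreliable.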

      Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2 % to ~70 % in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about threefold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with shorter lag, traits that together protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      The strengths of the paper are the experimental design, data presentation and interpretation, and that it is well-written.

      (1) While the phenotyping is thorough, a few more growth curves would be quite revealing to determine the extent of cross-stress protection. For example, comparing growth rates under YPD vs. YPEG (EtOH/glycerol), and measuring growth at 37 °C or in the presence of 0.8 M KCl.

      We thank the referee for the interesting suggestions. However, growth rates alone may be difficult to interpret since WT strains also show different growth rates under these conditions. Therefore, comparing the relative fitness or survival of the evolved strains versus the WT under these stresses would be more informative. In the present study we limited growth/survival measurements to what was needed to parameterize the adaptation model in YPD under the freeze–thaw regime. We have now added a statement in the Discussion that, given the shared trehalose/mechanical mechanism, such cross-stress assays are an expected and straightforward follow-up.

      (2) Were GEMs integrated prior to evolution? Are the evolved cells transformable?

      Yes. GEMs were integrated prior to evolution, because the non-integrated evolved population showed low transformation efficiency, likely due to altered cell-wall properties.

      (3) From the table, it looks like strains either have mutations in Ras1/2 or Vac8. Given the known requirements of Ras/PKA signaling for the G1/S checkpoint (to make sure there are enough nutrients for S phase), this seems like a pathway worth mentioning and referencing. Regarding Vac8, its emerging roles in NVJ and autophagy suggest another nutrient checkpoint, perhaps through TORC1. The common theme is rewired metabolism, which is probably influencing the carbon shuttling to trehalose synthesis.

      We appreciate the reviewer’s suggestion to consider pathways like Ras/PKA (linked to Ras1/2) and autophagy/TORC1 (linked to Vac8) as potential upstream modulators. While these pathways are involved in nutrient sensing and metabolic regulation, we choose not to emphasize them specifically. This is because (i) some evolved lines lack Ras1/2 or Vac8 variants, and (ii) none of the variants lies directly in trehalose synthesis/degradation pathways. Furthermore, direct links to trehalose accumulation are not well established for these specific variants in this context, and pathways like Ras are global regulators with broad effects. Together with the strongly convergent phenotype, this supports our main inference that multiple genetic/metabolic routes can feed into the same trehalose-rich, mechanochemically reinforced, quiescence-like state. We have added a note in the discussion regarding metabolic rewiring and trehalose.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Generally, the results sections should have more details. The figures should be corrected, and the legends should be checked for correctness. The manuscript seems to have been assembled in haste?

      We have expanded the relevant Results subsections with one-sentence motivations (why each measurement was performed) and we have corrected the figure legends for ordering and consistency.

      Figure 3: It will be good to have the correct p-values on the figure itself. P-values are typically less than 1, unless there is some special method (here the values presented are , etc). Please explain how the P-values were obtained in the figure legend itself.

      Figure 3 now shows the actual p-values. The legend specifies the details and the sample sizes used.

      Figure 5: It is not clear what the error bars show in 5B, E (different evolved population/ clones/ cells?). All the figure legends are mixed up, please correct them. It is difficult to follow the paper.

      Figure 5 legends now state clearly what the error bars represent (biological replicates) and which panels are from single-cell measurements. We have checked the panel lettering and legend order for consistency with the flow of the main text.

      Reviewer #3 (Recommendations for the authors):

      Overall, the paper is outstanding, well-written, and insightful.

      A point to address is that there are missing citations on lines 60, 91.

      We have added the missing citations at both locations. We apologize for the omission, which was due to a compilation error. This error has been fixed, and the bibliography has been corrected (now containing 74 references).

    1. eLife Assessment

      The authors present an important set of data implicating ETFDH as an epigenetically suppressed gene in cancer with tumor suppressive functions. The evidence is convincing, with the authors demonstrating that suppression of ETFDH activity results in accumulation of amino acids that impact metabolism via hyperactive mTORC1.

    2. Reviewer #1 (Public review):

      In their manuscript, Papadopoli et al explore the role of ETFDH in transformation. They note that ETFDH protein levels are decreased in cancer, and that deletion of ETFDH in cancer cell lines results in increased tumorigenesis, elevated OXPHOS and glycolysis, and a reduction in lipid and amino acid oxidation. The authors attribute these effects to increased amino acid levels stimulating mTORC1 signaling and driving alterations in BCL6 and EIF4EBP1. They conclude that ETFDH is epigenetically silenced in a proportion of neoplasms, suggesting a tumor-suppressive function. Overall, the authors logically present clear data and perform appropriate experiments to support their hypotheses.

    3. Reviewer #2 (Public review):

      Summary:

      The altered metabolism of tumors enables their growth and survival. Classically, tumor metabolism often involves increased activity of a given pathway in intermediary metabolism to provide energy or substrates needed for growth. Papadopoli et al. investigate the converse - the role of mitochondrial electron transfer flavoprotein dehydrogenase (ETFDH) in cancer metabolism and growth. The authors present compelling evidence that ETFDH insufficiency, which is detrimental in non-malignant tissues, paradoxically enhances bioenergetic capacity and accelerates neoplastic growth in cancer cells in spite of the decreased metabolic fuel flexibility that this affords tumor cells. This is achieved through the retrograde activation of the mTORC1/BCL-6/4E-BP1 axis, leading to metabolic and signaling reprogramming that favors tumor progression.

      Strengths:

      This review focuses primarily on the cancer metabolism aspects of the manuscript.

      The study provides robust evidence linking ETFDH insufficiency to enhanced cancer cell bioenergetics and tumor growth.

      The use of multiple cancer cell lines and in vivo models strengthens the generalizability of the findings.

      The mechanistic insights into the mTORC1/BCL-6/4E-BP1 axis and its role in metabolic reprogramming are of general interest within and outside the immediate field of tumor metabolism.

      Conclusion:

      This manuscript provides significant insights into the role of ETFDH insufficiency in cancer metabolism and growth. The findings highlight the potential of targeting the mTORC1/BCL-6/4E-BP1 axis in ETFDH-deficient cancers. The compelling data support the conclusions presented in the manuscript, which will be valuable to the cancer metabolism community.

      [Editors' note: The authors have addressed each of the two weaknesses previously listed in the public review, providing new experimental data on nucleotides and showing that the catalytic activity is required via the suggested addback experiment.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Authors state, "we identified ETF dehydrogenase (ETFDH) as one of the most dispensable metabolic genes in neoplasia." Surely there are thousands of genes that are dispensable for neoplasia. Perhaps the authors can revise this sentence and similar sentiments in the text.

      We agree with the reviewer and have corrected the text accordingly. Specifically, we rephrased the sentence: “Surprisingly, we observed that in contrast to muscle, ETFDH is one of the most non-essential metabolic genes in cancer cells.” to “Surprisingly, we observed that in contrast to muscle, ETFDH is a non-essential gene in acute lymphoblastic leukemia NALM-6 cells”

      Authors state, "These findings show that ETFDH loss elevates glutamine utilization in the CAC to support mitochondrial metabolism." While elevated glutamine-to-CAC flux is consistent with this statement, the authors have not measured the effect of restoring glutamine utilization to baseline on mitochondrial metabolism. Thus, the causality implied by the authors can only be inferred based on the data presented. Indeed, the increased glutamine consumption may be linked to the increase in ROS, as glutamate efflux via system xCT is a major determinant of glutamine catabolism in vitro.

      Indeed. We changed the statement "These findings show that ETFDH loss elevates glutamine utilization in the CAC to support mitochondrial metabolism." to "Collectively, these data demonstrate that ETF insufficiency in cancer cells remodels mitochondrial metabolism and increases the glutamine consumption and anaplerosis."

      Authors state that the mechanism described is an example of "retrograde signaling". However, the mechanism seems to be related to a reduction in BCAA catabolism, suggesting that the observed effects may be a consequence of altered metabolic flux rather than a direct signaling pathway. The data presented do not delineate whether the observed effects stem from disrupted mitochondrial communication or from shifts in nutrient availability and metabolic regulation.

      Notwithstanding that the term “retrograde” was used to refer to signaling from mitochondria to mTORC1, rather than from mTORC1 to mitochondria [1], we have removed the term “retrograde signaling” throughout the manuscript.

      The authors should discuss which amino acids that are ETFDH substrates might affect mTORC1 activity or consider whether other ETFDH substrates might also affect mTORC1 in their discussion. Along these lines, the authors might consider discussing why amino acids that are not ETFDH substrates are increased upon ETFDH loss.

      Based on the literature, we expect that branched-chain amino acids that are ETFDH substrates (e.g., leucine) are likely to play a major role in activating mTORC1 upon ETFDH abrogation. As expected, the aforementioned amino acids are among those that are the most highly upregulated in ETFDH-deficient cells (Fig 3A). We have, however, never formally tested the role of branched-chain amino acids in activating mTORC1 in the context of ETFDH disruption. The increase in amino acids that are not metabolized via ETFDH is likely to stem from global metabolic rewiring of ETFDH-deficient cells and observed alterations in amino acid uptake (e.g., glutamine; Fig 2F). We discuss this in the revised version of the paper as follows:

      “Several metabolites can be sensed via signaling partners upstream of mTORC1, including leucine, arginine, methionine/SAM, and threonine [2]. Branched-chain amino acids (leucine, isoleucine, and valine), which are among the highest upregulated metabolites in ETFDH deficient cells (Fig 3A) serve as ETFDH substrates, and have been described to display strong activation capabilities towards mTORC1 in the literature [3,4]. Glutamine can also activate mTORC1 through Arf family of GTPases [5]. Indeed, glutamine can supplement the non-essential amino acid (NEAA) pool through transamination [6] and amino acid uptake [7]. Accordingly, the maintenance of NEAA that are non-ETFDH substrates may be supported by the global metabolic rewiring fueled by enhanced glutamine metabolism in ETFDH-deficient cells. Deciphering the mechanisms leading to accumulation of specific amino acids and their role in ETFDH-dependent mTORC1 modulation is warranted.”

      Reviewer #2 (Public review):

      The authors would strengthen the paper considerably by adding back catalytically inactive ETFDH to show that the activity of this enzyme is responsible for the increased growth phenotypes and changes in labeling that they observe.

      Based on the Reviewers’ suggestions, we performed these experiments. We took advantage of the Y304A/G306E ETFDH mutant, which impairs electron transfer from ETF and cannot substitute for the wild type (WT) gene function in ETFDH-deficient myoblasts [8]. We expressed WT ETFDH and the Y304A/G306E ETFDH mutant in ETFDH KO HCT116 colorectal cancer cells and confirmed that they are expressed to a comparable level (Supplementary Figure 6C). Re-expression of WT ETFDH decreased proliferation, while suppressing mTORC1 signaling and increasing 4E-BP1 levels relative to control (vector infected) ETFDH KO EV HCT116 cells (Supplementary Figure 6D). In contrast, proliferation rates, mTORC1 signaling and 4E-BP1 levels remained largely unchanged upon expression of the Y304A/G306E ETFDH mutant in ETFDH KO HCT116 cells (Supplementary Figure 6D). Similarly, re-expression of WT ETFDH disrupted the bioenergetic phenotype associated with ETFDH loss, in contrast to re-expression of the Y304A/G306E ETFDH mutant, which exhibited bioenergetic profiles similar to those of the ETFDH KO control (Supplementary Figure 6E-F). Collectively, these findings argue that ETFDH activity is required for its tumor suppressive effects.

      If nucleotide pool and labeling data are available, or can be obtained readily, this would significantly strengthen the tracing data already obtained.

      We followed the Reviewer’s suggestion and measured nucleotide levels. This revealed that loss of ETFDH results in an increase in steady-state nucleotide pools (Supplementary Figure 2K), consistent with increased aspartate labelling and accelerated tumor growth.

      References

      (1) Morita, M. et al. mTORC1 controls mitochondrial activity and biogenesis through 4EBP-dependent translational regulation. Cell Metab 18, 698-711 (2013). https://doi.org/10.1016/j.cmet.2013.10.001

      (2) Valenstein, M. L. et al. Structural basis for the dynamic regulation of mTORC1 by amino acids. Nature 646, 493-500 (2025). https://doi.org/10.1038/s41586-025-09428-7

      (3) Appuhamy, J. A., Knoebel, N. A., Nayananjalie, W. A., Escobar, J., & Hanigan, M. D. Isoleucine and leucine independently regulate mTOR signaling and protein synthesis in MAC-T cells and bovine mammary tissue slices. J Nutr 142, 484-491 (2012). https://doi.org/10.3945/jn.111.152595

      (4) Herningtyas, E. H. et al. Branched-chain amino acids and arginine suppress MaFbx/atrogin-1 mRNA expression via mTOR pathway in C2C12 cell line. Biochim Biophys Acta 1780, 1115-1120 (2008). https://doi.org/10.1016/j.bbagen.2008.06.004

      (5) Jewell, J. L. et al. Metabolism. Differential regulation of mTORC1 by leucine and glutamine. Science 347, 194-198 (2015). https://doi.org/10.1126/science.1259472

      (6) Tan, H. W. S., Sim, A. Y. L. & Long, Y. C. Glutamine metabolism regulates autophagy-dependent mTORC1 reactivation during amino acid starvation. Nat Commun 8, 338 (2017). https://doi.org/10.1038/s41467-017-00369-y

      (7) Chen, R. et al. The general amino acid control pathway regulates mTOR and autophagy during serum/glutamine starvation. J Cell Biol 206, 173-182 (2014). https://doi.org/10.1083/jcb.201403009

      (8) Herrero Martin, J. C. et al. An ETFDH-driven metabolon supports OXPHOS efficiency in skeletal muscle by regulating coenzyme Q homeostasis. Nat Metab 6, 209-225 (2024). https://doi.org/10.1038/s42255-023-00956-y

    1. eLife Assessment

      This manuscript investigates the extremely interesting and important claim that the human hippocampus represents interactions with multiple social interaction partners on two relatively abstract social dimensions - and that this ability correlates with the social network size of the participant. This research potentially demonstrates the intricate role of the hippocampus in navigating our social world. While most of the results are solid, the paper requires some further clarification.

    2. Reviewer #1 (Public review):

      Schafer et al. tested whether the hippocampus tracks social interactions as sequences of neural states within an abstract social space defined by the dimensions of affiliation and power, using a narrative-based task in which participants engaged in dynamic social interactions. The study showed that individual social relationships were represented as distinct trajectories of hippocampal activity patterns. These neural trajectories systematically reflected trial-by-trial changes in affiliation and power between the participant and each character, suggesting that the hippocampus encodes sequences of socially relevant events and their relational structure, extending its well-established role beyond spatial representations.

      A major strength of this study is the use of a richly structured, narrative-based task that allows social relationships to evolve dynamically over time. The use of representational similarity analysis provides a principled framework for linking behavioral trajectories in social space to neural pattern dynamics.

      One potential limitation concerns temporal autocorrelation in the neural data, as nearby trials are inherently related both behaviorally and temporally within a continuous narrative. Although the authors carefully attempted to control for temporal distance and related confounds, fully disentangling representational similarity driven by social structure from similarity driven by temporal proximity remains challenging within a single-session task design.

      While the finding of a two-dimensional representational structure is an important contribution, it remains an open question whether such a representation reflects an inherent property of how the human brain encodes social relationships, or whether it is partly driven by task constraints in which social interactions were limited to changes along two dimensions (affiliation and power). Future studies that allow social relationships to vary along richer or higher-dimensional feature spaces will be necessary to determine the generality of low-dimensional representations.

    3. Reviewer #2 (Public review):

      The substantially revised paper has increased in clarity and is much more accessible and straightforward than the first version. The analyses are now clearer and support the conclusions better. There are, however, some remaining methodological weaknesses, which in my mind still render the evidence not entirely convincing.

      (1) The temporal autocorrelation concern is not fully convincingly addressed. The temporal autocorrelation curves supplied in the supplements are really helpful, but linearly regressing out the temporal distance from the neural distance clearly does not work, as one can see from the right panel of supplementary Figure 1. If the method had worked correctly the line should have been flat. The analysis however shows that decision trials with a lag > 2 are basically independent - so a simple way to address this is to restrict the RSA analysis to trials with a decision lag of > 2. This analysis would strengthen the paper a lot.
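
      The restriction suggested in this point can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes Euclidean distances for both neural patterns and social-space locations, a Spearman correlation as the RSA statistic, and "lag" defined as the difference in decision-trial index; all function names are hypothetical.

```python
from itertools import combinations
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ranks(values):
    """Average ranks, so tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lag_restricted_rsa(patterns, locations, min_lag=3):
    """Spearman correlation between neural and behavioral distances,
    using only trial pairs at least min_lag decision trials apart
    (min_lag=3 corresponds to a lag greater than 2)."""
    pairs = [(i, j) for i, j in combinations(range(len(patterns)), 2)
             if j - i >= min_lag]
    neural = [euclidean(patterns[i], patterns[j]) for i, j in pairs]
    behav = [euclidean(locations[i], locations[j]) for i, j in pairs]
    return pearson(ranks(neural), ranks(behav)), len(pairs)

# Toy input: six trials whose neural patterns scale with social location.
toy_locations = [(float(t), 0.0) for t in range(6)]
toy_patterns = [(2.0 * t, 0.0, 0.0) for t in range(6)]
rho, n_pairs = lag_restricted_rsa(toy_patterns, toy_locations)
```

      The cost of the restriction is visible in `n_pairs`: with six trials, only 6 of the 15 possible pairs survive, so in real data the test trades statistical power for independence from temporal autocorrelation.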

      (2) In the final analysis, the authors use all the trials to make the claim that the hippocampus represents the characters in a shared social space. However, as within-character distances are still included in the analysis, this result could still be driven by the effects of within-character representations that are not shared across characters. A simple way of addressing this concern would be to only include between-character distances in this analysis, making it truly complementary to the previous within-character analysis. It would also be very interesting to compare the within- and between-character analyses in the hippocampus directly.
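
      Separating the two kinds of trial pairs is simple bookkeeping, sketched here under the assumption that each decision trial carries a single character label. The labels and the function name are hypothetical.

```python
from itertools import combinations

def split_pairs_by_character(characters):
    """Split all trial pairs into within- and between-character sets.

    characters: sequence giving the character label of each decision trial
    """
    within, between = [], []
    for i, j in combinations(range(len(characters)), 2):
        (within if characters[i] == characters[j] else between).append((i, j))
    return within, between

# Placeholder labels for five decision trials.
labels = ["A", "B", "A", "C", "B"]
within, between = split_pairs_by_character(labels)
```

      Running the same distance-based test separately on `within` and on `between` would give the direct comparison the reviewer asks for, since the two sets are disjoint and together cover every pair.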

      (3) Overall, the correction for multiple comparisons in the fMRI and the resulting corrected p-values are not sufficiently explained and documented in the paper. What was exactly permuted in the tests? Was correction applied in a voxel-wise or cluster-wise fashion? If cluster-wise, the cluster-wise p-values need to be reported.
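
      For readers unfamiliar with the distinction raised in this point, one common voxel-wise option is a sign-flip permutation test with a max-statistic correction. The sketch below shows what is permuted (the sign of each subject's effect map) and how the corrected p-value is formed. It is an assumption-laden illustration: the reviews do not state which procedure the authors used, and the input values are placeholders.

```python
import random

def max_stat_sign_flip(data, n_perm=1000, seed=0):
    """Family-wise corrected p-values via sign flipping and the max statistic.

    data: list of subjects, each a list of per-voxel effect values
    Returns one corrected p-value per voxel.
    """
    rng = random.Random(seed)
    n_sub, n_vox = len(data), len(data[0])

    def abs_means(signs):
        # Absolute group mean per voxel after applying one sign per subject.
        return [abs(sum(s * row[v] for s, row in zip(signs, data)) / n_sub)
                for v in range(n_vox)]

    observed = abs_means([1] * n_sub)
    exceed = [0] * n_vox
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in range(n_sub)]
        null_max = max(abs_means(signs))  # largest statistic anywhere in the map
        for v in range(n_vox):
            if null_max >= observed[v]:
                exceed[v] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]

# Placeholder data: voxel 0 has a consistent effect, voxel 1 averages to zero.
toy = [[1.0, 1.0 if s % 2 == 0 else -1.0] for s in range(12)]
p_corrected = max_stat_sign_flip(toy)
```

      A cluster-wise correction would instead record the largest suprathreshold cluster in each permutation, which is why the reviewer asks for cluster-level p-values if that route was taken.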

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      Schafer et al. tested whether the hippocampus tracks social interactions as sequences of neural states within an abstract social space defined by dimensions of affiliation and power, using a task in which participants engaged in narrative-based social interactions. The findings of this study revealed that individual social relationships are represented by unique sequences of hippocampal activity patterns. These neural trajectories corresponded to the history of trial-to-trial affiliation and power dynamics between participants and each character, suggesting an extended role of the hippocampus in encoding sequences of events beyond spatial relationships.

      The current version provides limited detail on the decoding and clustering analyses, which could be improved in a future revision.

      Strengths:

      (1) Robust Analysis: The research combined representational similarity analysis with manifold analyses, enhancing the robustness of the findings and the interpretation of the hippocampus's role in social cognition.

      (2) Replicability: The study included two independent samples, which strengthens the generalizability and reliability of the results.

      Weaknesses:

      I appreciate the authors for utilizing contemporary machine-learning techniques to analyze neuroimaging data and examine the intricacies of human cognition. However, the manuscript would benefit from a more detailed explanation of the rationale behind the selection of each method and a thorough description of the validation procedures. Such clarifications are essential to understand the true impact of the research. Moreover, refining these areas will broaden the manuscript's accessibility to a diverse audience.

      We thank the reviewer for these comments and have addressed them in various ways.

      First, we removed the spline-based decoding and spectral clustering analyses. As we detail in our response to the recommendations, these approaches were complex and raised legitimate interpretational concerns, making it unclear how they supported our core claims. The revised manuscript now focuses on a set of representational similarity analyses to show representations consistent with social dimension similarity (affiliation vs. power decision trials) and social location similarity (trajectory/map-like coding based on participant choices).

      Second, we expanded the Methods and Results to more clearly explain the analyses, the questions they address, and associated controls and robustness tests. The dimension similarity analysis tests whether hippocampal patterns differentiate affiliation and power decisions in a way consistent with an abstract dimension representation. The location similarity RSAs test whether within-character neural pattern distances scale with Euclidean distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized, consistent with a shared map-like coordinate system.

      Third, we emphasize new controls. For the dimension similarity RSA, we test for potential confounds such as word count, text sentiment, and reaction time differences between affiliation and power trials. For the location similarity RSA, we control for temporal distance between trials and show (in the Supplement) that the reported effects cannot be explained by temporal autocorrelation in the fMRI data or by the relationship between temporal distance and behavioral location distance.

      We believe that these changes address the reviewer’s request for clearer rationale and validation.

      Reviewer #2 (Public review):

      Summary:

      Using an innovative task design and analysis approach, the authors set out to show that the activity patterns in the hippocampus related to the development of social relationships with multiple partners in a virtual game. While I found the paper highly interesting (and would be thrilled if the claims made in the paper turned out to be true), I found many of the analyses presented either unconvincing or slightly unconnected to the claims that they were supposed to support. I very much hope the authors can alleviate these concerns in a revision of the paper.

      Strengths & Weaknesses:

      (1) The innovative task design and analyses, and the two independent samples of participants are clear strengths of the paper.

      We thank the reviewer for this comment.

      (2) The RSA analysis is not what I expected after I read the abstract and title of the result section "The hippocampus represents abstract dimensions of affiliation and power". To me, the title suggests that the hippocampus has voxel patterns, which could be read out by a downstream area to infer the affiliation and power value, independent of the exact identity of the character in the current trial. The presented RSA analysis however presents something entirely different - namely that the affiliation trials and power trials elicit different activity patterns in the area indicated in Figure 3. What is the meaning of this analysis? It is not clear to me what is being "decoded" here and alternative explanations have not been considered. How do affiliation and power trials differ in terms of the length of sentences, complexity of the statements, and reaction time? Can the subsequent decision be decoded from these areas? I hope in the revision the authors can test these ideas - and also explain how the current RSA analysis relates to a representation of the "dimensions of affiliation and power".

      We agree that this analysis needed to be better justified and explained. We have revised the text to clarify that by “represents the interaction decision trials along abstract social dimensions” we mean that hippocampal multivoxel patterns differentiate affiliation and power decisions in a way consistent with the conceptual framework of underlying latent dimensions. The analysis tests one simple prediction of this view – that on average these trial types are separable in the neural patterns. We have added details to the Methods, showing how the affiliation and power trials do not differ in word count or in sentiment, but do differ in their semantics, as assessed by a Large Language Model, as we expect from our task assumptions. Thanks to the reviewer’s comment, we also tested for and found a reaction time difference between affiliation and power trials, that we now control for.

      (3) Overall, I found that the paper was missing some more fundamental and simpler RSA analyses that would provide a necessary backdrop for the more complicated analyses that followed. Can you decode character identity from the regions in question? If you trained a simple decoder for power and affiliation values (using the LLE, but without consideration of the sequential position as used in the spline analysis), could you predict left-out trials? Are affiliation and power represented in a way that is consistent across participants - i.e. could you train a model that predicts affiliation and power from N-1 subjects and then predict the Nth subject? Even if the answer to these questions is "no", I believe that they are important to report for the reader to get a full understanding of the nature of the neural representations in these areas. If the claim is that the hippocampus represents an "abstract" relationship space, then I think it is important to show that these representations hold across relationships. Otherwise, the claim needs to be adjusted to say that it is a representation of a relationship-specific trajectory, but not an abstract social space.

      We appreciate this comment and agree on the value of clear, conceptually simple analyses. To address this concern, we have simplified our main analysis significantly by removing the spline-based analysis and substituting it with a multiple regression representational similarity analysis approach. We test whether within-character neural pattern distances scale with distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized. We find evidence for both, consistent with a shared map-like coordinate system.

      We agree that decoding character identity and an across-participant decoding approach could be informative. However, our current task is not well designed for such analyses and as such would complicate the paper. Although we agree that these questions are interesting, they would test questions that are outside the scope of this paper. 

      (4) To determine that the location of a specific character can be decoded from the hippocampal activity patterns, the authors use a sequential analysis in a low-dimensional space (using local linear embedding). In essence, each trial is decoded by finding the pair of two temporally sequential trials that is closest to this pattern, and then interpolating the power/affiliation values linearly between these two points. The obvious problem with this analysis is that fMRI patterns will have temporal autocorrelation and the power and affiliation values have temporal autocorrelation. Successful decoding could just reflect this smoothness in both time series. The authors present a series of control analyses, but I found most of them to not be incisive or convincing and I believe that they (and their explanation of their rationale) need to be improved. For example, the circular shifting of the patterns preserves some of the autocorrelation of the time series - but not entirely. In the shifted patterns, the first and last items are considered to be neighboring and used in the evaluation, which alone could explain the poor performance. The simplest way that I can see is to also connect the first and last item in a circular fashion, even when evaluating the veridical ordering. The only really convincing control condition I found was the generation of new sequences for every character by shuffling the sequence of choices and re-creating new artificial trajectories with the same start and endpoint. This analysis performs much better than chance (circular shuffling), suggesting to me that a lot of the observed decoding accuracy is indeed simply caused by the temporal smoothness of both time series.

      We thank the reviewer for emphasizing this important concern; we agree that we did not sufficiently address this in the initial submission. This concern is one main reason we removed the spline-based analysis and now use regression-based representational similarity analyses in its place. In the revision, we report autocorrelation-related analyses in the supplement, and via controls and additional analysis show that temporal distance (or its square) cannot explain the location-like effects. This substantially improves our ability to interpret the findings.

      (5) Overall, I found the analysis of the brain-behavior correlation presented in Figure 5 unconvincing. First, the correlation is mostly driven by one individual with a large network size and a 6.5 cluster. I suspect that the exclusion of this individual would lead to the correlation losing significance. Secondly, the neural measure used for this analysis (determining the number of optimal clusters that maximize the overlap between neural clustering and behavioral clustering) is new, non-validated, and disconnected from all the analyses that had been reported previously. The authors need to forgive me for saying so, but at this point of the paper, would it not be much more obvious to use the decoding accuracy for power and affiliation from the main model used in the paper thus far? Does this correlate? Another obvious candidate would be the decoding accuracy for character identity or the size of the region that encodes affiliation and power. Given the plethora of candidate neural measures, I would appreciate if the authors reported the other neural measures that were tried (and that did not correlate). One way to address this would have been to select the method on the initial sample and then test it on the validation sample - unfortunately, the measure was not pre-registered before the validation sample was collected. It seems that the correlation was only found and reported on the validation sample?

      We agree that this analysis was too complicated and under constrained, and thus not convincing. We think that removing this cluster-based analysis is the most conservative response to the reviewer’s concerns and have removed it from the revised paper.

      Recommendations to the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript's description of the shuffling analysis performed during decoding is currently ambiguous, particularly concerning the control variables. This ambiguity is present only in the Figure 4 legends and requires a more detailed explanation within the methods section. It is essential to clarify whether the permutation process was conducted within each character's data set or across multiple characters' data sets. If permutations were confined to within-character data, the conclusion would be that the hippocampus encodes context-specific information rather than providing a two-dimensional common space.

      We thank the reviewer for this comment. We have now removed the spline analysis due to these and other problems and have replaced it with representational similarity analyses that are both more rigorous and easier to interpret. We think these analyses allow us to make the claim that the characters are represented in a common space. 

      In the methods, we explain the analyses (page 23-24, lines 475-500):

      “We also expected the hippocampus to represent the different characters’ changing social locations, which are implicit in the participant’s choices. We used multiple regression searchlight RSA to test whether hippocampal pattern dissimilarity increases with social location distance, based on participant-specific trial-wise beta images where boxcar regressors spanned each trial’s reaction time.”

      “We ran two complementary regression analyses to address two related questions. First, we asked whether the hippocampus represents how a specific relationship changes over time. For this analysis, for each participant and each searchlight, we computed character-specific (i.e., only for same character trial pairs) correlation distances between trial-wise beta patterns and Euclidean distances between the social location behavioral coordinates. Distances were z-scored within character trial pairs to isolate character-specific changes. The second analysis asked whether there is a common map-like representation, where all trials, regardless of relationship, are represented in a shared coordinate system. Here, we included all trial pairs and z-scored the distances globally. For both regression analyses, we included control distances to control for possible confounds. To account for generic time-related changes, we controlled for absolute scan-time difference, as this correlated with location distance across participants (see Temporal autocorrelation of hippocampal beta patterns in the supplement). Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes. As such, we report the main analysis only. We included binary dimension difference (0 = trial pairs of different dimension, 1 = trial pairs of the same dimension), to ensure effects could not be explained by dimension-related effects. In the group-level model, we controlled for sample and the average reaction time between affiliation and power decisions.”
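
      The two regressions quoted above can be illustrated with a minimal single-region sketch. It is not the authors' searchlight code: the function name `location_rsa_beta`, its arguments, and the synthetic data are hypothetical, and standardizing the neural, location, and temporal distances together within each group of trial pairs is an assumption about how the z-scoring was applied.

```python
import numpy as np

def zscore(x):
    """Standardize a 1-D array; return zeros if it has no variance."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def location_rsa_beta(patterns, locations, onsets, dims, chars, within_character):
    """Regression weight of social-location distance on neural pattern distance."""
    i, j = np.triu_indices(len(onsets), k=1)               # all unique trial pairs
    neural = 1.0 - np.corrcoef(patterns)[i, j]             # correlation distance
    loc = np.linalg.norm(locations[i] - locations[j], axis=1)  # Euclidean distance
    tdist = np.abs(onsets[i] - onsets[j])                  # temporal-distance control
    same_dim = (dims[i] == dims[j]).astype(float)          # 1 = same dimension

    if within_character:
        keep = chars[i] == chars[j]                        # same-character pairs only
        neural, loc, tdist, same_dim = neural[keep], loc[keep], tdist[keep], same_dim[keep]
        groups = chars[i][keep]                            # z-score per character
    else:
        groups = np.zeros(len(neural), dtype=int)          # z-score globally

    for g in np.unique(groups):
        m = groups == g
        neural[m], loc[m], tdist[m] = zscore(neural[m]), zscore(loc[m]), zscore(tdist[m])

    X = np.column_stack([np.ones_like(loc), loc, tdist, same_dim])
    betas, *_ = np.linalg.lstsq(X, neural, rcond=None)
    return betas[1]                                        # weight on location distance

# Synthetic demo: random stand-in data, not values from the study.
rng = np.random.default_rng(0)
n_trials, n_voxels = 60, 50
chars = np.arange(n_trials) % 6
dims = rng.integers(0, 2, n_trials)
onsets = np.arange(n_trials) * 12.0
locations = rng.normal(size=(n_trials, 2))
patterns = locations @ rng.normal(size=(2, n_voxels)) + 0.5 * rng.normal(size=(n_trials, n_voxels))

beta_within = location_rsa_beta(patterns, locations, onsets, dims, chars, within_character=True)
beta_global = location_rsa_beta(patterns, locations, onsets, dims, chars, within_character=False)
```

      In a searchlight setting, the same function would be applied to the voxels in each sphere, and the resulting participant-level weights would enter the group-level model with the sample and reaction-time covariates described in the quoted text.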

      In the results, we describe the results and our interpretation (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to test whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs from the same character and standardized the distances within character (see Figure 3BD). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimensionality similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      We also describe the results in the discussion (page 12, lines 220-226): 

      “Then, we show that the hippocampus tracks the changing social locations (affiliation and power coordinates), above and beyond the effects of dimension or time; the hippocampus seemed to reflect both the changing within-character locations, tracking their locations over time, and locations across characters, as if in a shared map. Thus, these results suggest that the hippocampus does not just encode static character-related representations but rather tracks relationship changes in terms of underlying affiliation and power.”

      The manuscript's description of the decoding analysis is unclear regarding the variability of the decoded positions. The authors appear to decode the position of a character along a spline, which raises the question of whether this position correlates with time, since characters are more likely to be located further from the center in later trials. There is a concern that the decoded position may not solely reflect the hippocampal encoding of spatial location, but could also be influenced by an inherent temporal association. Given that a character's position at time t is likely to be similar to its positions at t−1 and t+1, it is crucial that the authors clearly articulate their approach to separating spatial representation from temporal autocorrelation. While this issue may have been addressed in the construction of the test set, the manuscript does not seem to adequately explain how such biases were mitigated in the training set.

      We agree that temporal confounding needs to be better accounted for, as our claims depend on space-like signals being separable from time-like ones. We address this in several ways in the revised manuscript.

      First, we emphasize that this is a narrative-based task, where temporal structure is relevant. As such, our analyses aim to demonstrate that effects go beyond simple temporal confounds, like trial order or time elapsed.

      Despite the temporal structure to the task, the decisions for the same character are spaced in time, and interleaved with other characters’ decisions, reducing the chance that a simple temporal confound could explain trajectory-related effects. We now describe the task better in the revised methods (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      To address temporal autocorrelation in the fMRI time series, we used SPM’s FAST algorithm. Briefly, FAST models temporal autocorrelation as a weighted combination of candidate correlation functions, using the best estimate to remove autocorrelated signal.

      We also now report the temporal autocorrelation profile of the hippocampal beta series in the supplement, including (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, like the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region-of-interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxel-wise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude (see Supplemental figure 1) that by three trials apart was approximately 0.”

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 having greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 trials apart reduced across participants, settling into low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, to mirror the effects of controlling for these temporal distances in regressions. After removing this temporal distance component, the short lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”
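
      The three measures of temporal dependence described in the quoted supplement can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, `betas` stands for one participant's trial-by-voxel matrix from a single ROI, and the temporal-distance step is assumed to be an ordinary least-squares fit on absolute onset differences.

```python
import numpy as np

def voxel_autocorrelation(betas, max_lag):
    """Mean over voxels of each voxel's lagged Pearson autocorrelation across trials."""
    out = []
    for lag in range(1, max_lag + 1):
        a = betas[:-lag] - betas[:-lag].mean(axis=0)   # earlier trials, centered
        b = betas[lag:] - betas[lag:].mean(axis=0)     # later trials, centered
        r = (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
        out.append(r.mean())
    return np.array(out)

def lag_profile(corr, max_lag):
    """Average the trial-by-trial correlation matrix along each off-diagonal (lag)."""
    return np.array([np.diagonal(corr, offset=lag).mean() for lag in range(1, max_lag + 1)])

def residualize_on_time(corr, onsets):
    """Remove the linear effect of absolute onset difference from pairwise correlations."""
    i, j = np.triu_indices(len(onsets), k=1)
    X = np.column_stack([np.ones(len(i)), np.abs(onsets[i] - onsets[j])])
    coef, *_ = np.linalg.lstsq(X, corr[i, j], rcond=None)
    resid = np.zeros_like(corr)
    resid[i, j] = corr[i, j] - X @ coef                # residual pairwise correlations
    return resid + resid.T                             # symmetric, zero diagonal

def pattern_autocorrelation(betas, onsets, max_lag):
    """Lagged multivoxel pattern correlation, before and after removing temporal distance."""
    corr = np.corrcoef(betas)                          # trials x trials pattern similarity
    return lag_profile(corr, max_lag), lag_profile(residualize_on_time(corr, onsets), max_lag)
```

      Comparing the raw and residualized profiles for the hippocampus against those of a control region such as V1 mirrors the comparison the authors report.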

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t(49) = 12.24, p < 0.001), whereas the quadratic relationship was not (t(49) = -0.55, p = 0.586). Similarly, in participant specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t(49) = 5.69, p < 0.001) whereas the quadratic effect was not (t(49) = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”
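
      The covariate check quoted above amounts to a per-participant correlation followed by a one-sample t-test across participants. A minimal sketch, with hypothetical function names: it uses raw Pearson correlations with no Fisher transform (whether the authors transformed them is not stated) and returns only the t statistic and degrees of freedom, not a p-value.

```python
import numpy as np

def distance_time_correlations(locations, onsets):
    """Correlate pairwise location distance with linear and squared temporal distance."""
    i, j = np.triu_indices(len(onsets), k=1)                   # all unique trial pairs
    loc = np.linalg.norm(locations[i] - locations[j], axis=1)  # Euclidean distance
    tdist = np.abs(onsets[i] - onsets[j])                      # linear temporal distance
    linear = np.corrcoef(loc, tdist)[0, 1]
    quadratic = np.corrcoef(loc, tdist ** 2)[0, 1]
    return linear, quadratic

def one_sample_t(values):
    """t statistic and degrees of freedom for testing mean(values) against zero."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    t = values.mean() / (values.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```

      Calling `distance_time_correlations` once per participant and passing the resulting list of linear (or quadratic) correlations to `one_sample_t` gives a group-level statistic of the kind reported in the quoted paragraph; with 50 participants the degrees of freedom are 49.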

      How was the free parameter of spectral clustering determined, if there is one?

      The interpretation of the number of hippocampal activity clusters is ambiguous. It is suggested that this number could fluctuate due to unique activity patterns or the fit to behaviorally defined trajectories. A lower number of clusters might indicate either a noisier or less distinct representation, raising the question of the necessity and interpretability of such a complex analysis. This concern is compounded by the potential sensitivity of the clustering to the variance in Euclidean distances of each trial's position relative to the center. If a character's position is consistently near the center, this could artificially reduce the perceived number of clusters. Furthermore, the manuscript should address whether there is any correlation between the number of clusters and behavioral performance. Specifically, what are the implications if participants are able to perform the task adequately with a smaller number of distinct hippocampal representation states?

      The rationale for conducting both cluster analysis and position decoding as separate analyses remains unclear. While cluster analysis can corroborate the findings of position decoding, it is not apparent why the authors chose to include trials across characters for cluster analysis but not for decoding analysis. An explanation of the reasoning behind this methodological divergence would help in understanding the distinct contributions of each analysis to the study's findings.

      The paper by Cohen et al. (1997), which provides the questionnaire for measuring the social network index, is not cited in the references. Upon reviewing the questionnaire that the author may have used, it appears that the term "social network size" does not refer to the actual size but to a score or index derived from the questionnaire responses. It may be more appropriate to replace the term "size" with a different term to more accurately reflect this distinction.

      Thank you for seeking these clarifications. Given the complexity of this analysis, we have decided to drop it to focus instead on our dimension and location representational similarity analysis results.

      Reviewer #2 (Recommendations for the authors):

      How did the participants' decisions on previous trials influence the future trials that the subjects saw? If the different participants were faced with different decision trials, then how did you compare their decision? If two participants made the same decisions, would they have seen exactly the same sequence of trials (see point X on how the trial sequence was randomized).

      All participants experience the same narrative, with the same decisions (i.e., the same available options); their choices (i.e., the options they select) are what implicitly shape each character’s affiliation and power locations, and thus each character’s trajectory. In other words, the narrative is fixed; what changes is the social coordinates assigned to each trial’s outcome depending on the participant’s choice of how to interact from the two narrative options. This means that we can meaningfully compare participants' neural patterns, given that every participant received the same text and images throughout.

      We have now added details on the narrative structure, replacing more ambiguous statements with a clearer description (page 16, lines 309-318):

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; all that differs are the choices that the participants make. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another, and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      Figure 2B: I assume that "count" is "count of participants"? It would be good to indicate this on the axis/caption.

      Thank you for noting this. We have now removed this figure to improve the clarity of our figures. 

      "We have shown that the hippocampus represents the interaction decision trials along abstract social dimensions, but does it track each relationship's unique sequence of abstract social coordinates?". Please clarify what you mean by "represents the interaction decision trials".

      By “represents the interaction decision trials along abstract social dimensions”, we mean that when the participant makes a choice during the social interactions the hippocampal patterns represent the current social dimension of the choice (affiliation vs power). In other words, the hippocampal BOLD patterns differentiate affiliation and power decisions, consistent with our hypothesis of abstract social dimension representation in the hippocampus. We have clarified this (page 11, lines 185-187):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation.”

      Page 8: "Hippocampal sequences are ordered like trajectories": It is not entirely clear to me what is meant by the split midpoint. Is this the midpoint of the piece-wise linear interpolation between two points, or simply the mean of all piecewise splines from one character? If the latter, is the null model the same as simply predicting the mean affiliation and power value for this character? If yes, please clarify and simplify this for the reader.

      Page 8: "Hippocampal sequences track relationship-specific paths". First, I was misled by the "relationship-specific". I first understood this to mean that you wanted to test whether two relationships (i.e. the identity of the partner) had different representations in Hippocampus, even if the power/affiliation trajectories are the same. I suggest changing the title of this section.

      The analysis in this section also breaks any temporal autocorrelation of measured patterns - so I am not sure if this is a strong analysis that should be interpreted at all. This analysis seems to not address the claim and conclusion that is drawn from it. I assume that the random trajectories have different choices and different affiliation/power values than the true trajectories. So the fact that the true trajectories can be better decoded simply shows that either choices or affiliation and power (or both) are represented in the neural code - but not necessarily anything beyond this.

      Page 9: "Neural trajectories reflect social locations, not just choices". The motivation of this analysis is not clear to me. As I understand this analysis, both social location and choices are changed from the real trajectories. How can it then show that it reflects social locations, not just the choices?

      Figure 4 caption: "on the -based approximation" Is there a missing "point"-[based] here?

      We agree with the reviewer that this analysis is hard to interpret and does not adequately address concerns regarding temporal autocorrelation, and as such we have removed it from the manuscript. We describe the new results that include controlling for temporal distance between trials (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to test whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs from the same character and standardized the distances within character (see Figure 3BD). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimensionality similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      We emphasize that the results are robust to the inclusion of temporal distance squared, in the methods (pages 23-24, lines 493-496):

      “Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes.”

      Page 8: last paragraph: The text sounds like you have already shown that you can decode character identity from the patterns - but I do not believe you have at this point. I think this would be an interesting addition to the paper, though.

      This section has been removed, and we have been careful to not imply this in the current version of the manuscript. While we agree a character identity decoding would enrich our argument, we do not believe our task is well-suited to capture a character identity effect. Each character only has 12 decision trials, and these trials are partially clustered in time - this is one problem of temporal autocorrelation that we thank the reviewers for pushing us to consider in more detail. Dimension and location patterns, on the other hand, are more natural to analyze in our task, especially in representational similarity analyses that test whether the relevant differences scale with neural distances.

      Page 14ff: Why is "Analysis section" not part of "Materials and Methods"? I believe adding the analysis after a careful description of the methods would improve the clarity of this section.

      We agree with the reviewer and have now consolidated these two sections.

      Two or three examples of Affiliation and Power decision trials should be provided, so the reader can form a more thorough understanding of how these dimensions were operationalized. For the RSA analysis, it is important to consider other differences between these two types of trials.

      We agree that adding examples will clarify the operationalization of these dimensions. We now include example affiliation and power trials in a table (page 17-18).

      We thank the reviewer for noting the need to rule out alternative hypotheses; we have added several such tests. Affiliation and power trials were not different in word count (page 17, lines 329-332):

      “To ensure that any observed neural or behavioral differences were not confounded by trivial features of the text, we tested for differences between the affiliation and power trials (where the two options are concatenated). There were no differences in word count (affiliation average = 26.6, power average = 25.6; t-test p = 0.56).”

      They were also not different in their sentiment, as assessed by a Large Language Model (LLM) analysis (page 17, lines 332-335): 

      “The text’s sentiment also did not differ between these trial types (t-test p = 0.72), as quantified by comparing sentiment compound scores (from most negative, −1, to most positive, +1), using a Large Language Model (LLM) specialized for sentiment analysis [26].”

      The affiliation and power trials were different in terms of semantic content, consistent with our assumptions (page 17, lines 337-347):

      “Our framework assumes that affiliation and power trials differ in their semantic content–that is, in the conceptual meaning of the text, beyond word count or sentiment. To test this assumption, we used an LLM-based semantic embedding analysis. Each decision trial was embedded into a semantic vector. We then measured the cosine similarity between pairs of trials and calculated the difference between average within-dimension similarity (affiliation-affiliation and power-power comparisons) and average between-dimension similarity (affiliation-power comparisons) and assessed its statistical significance with permutation testing (1,000 shuffles of trial labels). As expected, decision trials of the same dimension were more similar to each other than trials of different dimension, across multiple LLMs (OpenAI’s text-embedding-3-small [27]: similarity difference = 0.041, p < 0.001; all-MiniLM-L12-v2 [28]: similarity difference = 0.032, p < 0.001).”
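      The permutation test described in this passage can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the one-sided p-value formula, and the assumption that embeddings arrive as a trials-by-features NumPy array are all hypothetical.

```python
import numpy as np

def similarity_difference(embeddings, labels):
    """Mean within-label minus mean between-label cosine similarity.

    embeddings: (n_trials, n_features) array (assumed layout).
    labels: array of dimension labels, e.g. 'affiliation' or 'power'.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T                      # cosine similarity matrix
    iu = np.triu_indices(len(labels), k=1)   # each trial pair counted once
    same = (labels[:, None] == labels[None, :])[iu]
    pair_sims = sim[iu]
    return pair_sims[same].mean() - pair_sims[~same].mean()

def permutation_test(embeddings, labels, n_perm=1000, seed=0):
    """One-sided permutation p-value obtained by shuffling trial labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = similarity_difference(embeddings, labels)
    null = np.array([
        similarity_difference(embeddings, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    # Add-one correction (an assumption) so the p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

      Shuffling the labels keeps the similarity matrix fixed and only reassigns which pairs count as within-dimension, so the null distribution reflects the same trial texts under random dimension membership.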

      The affiliation and power trials were different in average reaction time. To control for this difference in the dimension RSA analysis, we added each participant’s absolute value reaction time difference between the trial types as a covariate. The results were nearly identical to what they were before. We updated the text to reflect this new control (page 23, lines 471-474):

      “However, there was a significant difference in the average reaction time between affiliation and power decisions across participants (t<sub>49</sub> = 6.92, p < 0.001; affiliation mean = 4.92 seconds (s), power mean = 4.51 s), so we controlled for this in the group-level analysis.”

      The exact implementation and timing of the behavioral tasks should be described better. How many narrative trials were intermixed with the decision trials? Which characters were they assigned to? How was the sequence of trials determined? Was it fixed across participants, or randomized?

      We agree that additional details are helpful. In the Methods, we now describe this with more detail (page 16, lines 301-318):

      “There are two types of trials: “narrative” trials where background information is provided or characters talk or take actions (a total of 154 trials), and “decision” trials where the participant makes decisions in one-on-one interactions with a character that can change the relationship with that character (a total of 63 trials). On each decision, participants used a button response box to select between the two options. The options (1 or 2, assigned to the index and middle fingers) and choice directions (+/-1 arbitrary unit on the current dimension) were counterbalanced.”

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; all that differs are the choices that the participants make. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another, and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      What is the exact timing of trials during fMRI acquisition - i.e. how long were the trials, what was the ITI, were there long phases of rest to determine the resting baseline? These are all important factors that will determine the covariance between regressors and should be reported carefully. Ideally, I would like to see the trial-by-trial temporal auto-correlation structure across beta-weights to be reported.

      We thank the reviewer for asking for this clarification. We have added the following text to clarify the trial timing (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      We now describe the temporal autocorrelation patterns in the supplement, including how we decided on how to control for temporal distance in representational similarity analyses (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, like the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region-of-interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxel-wise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude (see Supplemental Figure 1) that by three trials apart was approximately 0.”

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 having greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 trials apart reduced across participants, settling into low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, to mirror the effects of controlling for these temporal distances in regressions. After removing this temporal distance component, the short lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”
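      The three kinds of temporal dependence described above can be sketched roughly as follows. This is an illustrative reading of the text, not the authors' implementation: the function names are hypothetical, `betas` is assumed to be a trials-by-voxels array for one participant and ROI, and `onsets` is assumed to hold trial onset times in seconds.

```python
import numpy as np

def beta_autocorrelation(betas, lag):
    """Average per-voxel Pearson correlation between trial t and trial t - lag."""
    later, earlier = betas[lag:], betas[:-lag]
    later = later - later.mean(axis=0)
    earlier = earlier - earlier.mean(axis=0)
    denom = np.sqrt((later ** 2).sum(axis=0) * (earlier ** 2).sum(axis=0))
    return np.mean((later * earlier).sum(axis=0) / denom)

def pattern_autocorrelation(betas, lag):
    """Average Pearson correlation between multivoxel patterns `lag` trials apart."""
    corr = np.corrcoef(betas)  # trial-by-trial pattern correlation matrix
    return np.mean(np.diagonal(corr, offset=lag))

def residualized_pattern_autocorrelation(betas, onsets, max_lag):
    """Pattern autocorrelation per lag after regressing out |onset difference|."""
    corr = np.corrcoef(betas)
    i, j = np.triu_indices(len(onsets), k=1)       # all trial pairs, i < j
    dt = np.abs(onsets[i] - onsets[j]).astype(float)
    design = np.column_stack([np.ones_like(dt), dt])
    coef, *_ = np.linalg.lstsq(design, corr[i, j], rcond=None)
    resid = corr[i, j] - design @ coef             # temporal distance removed
    return {lag: resid[(j - i) == lag].mean() for lag in range(1, max_lag + 1)}
```

      Fitting the regression over all trial pairs, rather than lag by lag, mirrors the idea of including temporal distance as a covariate in the pairwise similarity regressions.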

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t<sub>49</sub> = 12.24, p < 0.001), whereas the quadratic relationship was not (t<sub>49</sub> = -0.55, p = 0.586). Similarly, in participant specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t<sub>49</sub> = 5.69, p < 0.001) whereas the quadratic effect was not (t<sub>49</sub> = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”
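      A minimal sketch of the per-participant correlation step and the group-level summary is given below. The names are hypothetical, `locations` is assumed to be a trials-by-2 array of affiliation and power coordinates, and the t statistic is computed by hand; a p-value could be obtained with `scipy.stats.ttest_1samp` if SciPy is available.

```python
import numpy as np

def pairwise_distances(locations, onsets):
    """Euclidean location distance and absolute temporal distance per trial pair."""
    i, j = np.triu_indices(len(onsets), k=1)
    loc_dist = np.linalg.norm(locations[i] - locations[j], axis=1)
    time_dist = np.abs(onsets[i] - onsets[j])
    return loc_dist, time_dist

def participant_correlations(locations, onsets):
    """Pearson r of location distance with temporal distance and with its square."""
    loc_dist, time_dist = pairwise_distances(locations, onsets)
    r_linear = np.corrcoef(loc_dist, time_dist)[0, 1]
    r_quadratic = np.corrcoef(loc_dist, time_dist ** 2)[0, 1]
    return r_linear, r_quadratic

def one_sample_t(values):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t_stat = values.mean() / (values.std(ddof=1) / np.sqrt(n))
    return t_stat, n - 1
```

      Collecting one correlation per participant and testing the set against zero treats participants as the unit of inference, which matches the 49 degrees of freedom reported for 50 participants.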

    1. eLife Assessment

      This study presents a potentially valuable exploration of the role of thalamic nuclei in language processing. The results will be of interest to researchers interested in the neurobiology of language. However, the evidence is incomplete to support robust conclusions at this point.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Mengxing et al., reports an assessment of three first-order thalamic nuclei (auditory, visual, somatosensory) in a 3 x 2 factorial design to test for specificity of responses in first-order thalamic nuclei to linguistic processing particularly in the left hemisphere. The conditions are reading, speech production, and speech comprehension and their respective control conditions. The authors report the following results:

      (1) BOLD-response analyses: left MGB linguistic vs non-linguistic significant; left LGN linguistic vs non-linguistic significant. There is no hemisphere x stimulus interaction.

      (2) MVPA: left MGB linguistic vs. non-linguistic significant; bilateral VLN linguistic vs. non-linguistic significant; significant lateralisation in MGB (left MGB responses better classified linguistic vs. non-linguistic in contrast to right).

      (3) Functional connectivity: there is, in general, connectivity between the thalamic ROIs and the respective primary cortices independent of linguistics.

      Strengths:

      The study has a clear and comprehensive design and addresses a timely topic. First-order thalamic nuclei and their interaction with the respective cerebral cortex area are likely key to understanding how perception works in a world where one has to compute highly dynamic stimuli often in an instant. Speech is a prime example of an ecologically important, extremely dynamic, and complex stimulus. The field of the contribution of cerebral cortex-thalamic loops is wide open, and the study presents a solid approach to address their role in different speech modalities (i.e., reading, comprehension, production).

      Weaknesses:

      I see two major overall weaknesses in the manuscript in its current form:

      (1) Statistics:

      Unfortunately, I have doubts about the solidity of the statistics. In the analyses of the BOLD responses, the authors do not find significant hemisphere x stimulus interactions. In my view, such results would pre-empt doing a post-hoc t-test. Nevertheless, the authors motivate their post-hoc t-test by 'trends' in the interaction and prior hypotheses. I see two difficulties with that. First, the origin of the prior hypotheses is somewhat unclear (see also the comment below on hypotheses). Second, the post-hoc t-test is not corrected for multiple comparisons. I find that it is a pity that the authors did not derive more specific hypotheses grounded in the literature to guide the statistical testing, as I think these would have been available, and the response properties of the MGB and LGN also make sense in light of them. In addition, I was wondering whether the MVPA results would also need to be corrected for the three tests, i.e., the three ROIs.
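      For illustration only, a correction across three ROI-wise tests could look like the sketch below. The choice of Bonferroni and Holm procedures is an assumption (the review does not name a method), the function names are hypothetical, and any p-values passed in would be placeholders rather than values from the manuscript.

```python
import numpy as np

def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(1.0, p * p.size)

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original test order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        # The smallest p-value is scaled by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

      Holm's procedure controls the same family-wise error rate as Bonferroni while being uniformly at least as powerful, which matters when only three tests are involved and effects are modest.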

      (2) Hypotheses:

      In my view, it is relatively unclear where the hypotheses precisely come from. For example, the paragraph on the hypotheses in the introduction (p. 6-7) is devoid of references. I also have the impression that the hypotheses are partly not taking into account previous reports on first-order thalamic nuclei involvement in linguistic vs. non-linguistic processing. For example, the authors test for lateralisation of linguistic vs. non-linguistic responses in all nuclei. However, from previous literature, one could derive the hypothesis that the lateralisation in MGB for speech might be there - previous work shows, for example, that speech recognition abilities consistently correlate with left MGB only (von Kriegstein et al., 2008 Curr Biol; Mihai et al., 2019 eLife). In addition, the involvement of the MGB in speech in noise processing is present in the left MGB (Mihai et al., 2021, J Neuroscience). Developmental dyslexia, which is supposed to be based on imprecise phonological processing (Ramus et al., 2004 TiCS), has alterations in left MGB (Diaz et al., 2012 PNAS; Galaburda et al., 1994 PNAS) and left MGB connections to planum temporale (Tschentscher et al., 2019 J Neurosci) as well as altered lateralisation (Müller-Axt et al., 2025 Brain). Conversely, in the LGN, I'm not aware of any studies showing lateralisation for speech. See, for example, Diaz et al., 2018, Neuroimage, where there are correlations of LGN task-dependent modulation with visual speech recognition behaviour in both LGNs. Thus, based on this literature, one could have predicted the result pattern displayed, for example, in Figure 3A at least for MGB and LGN.

      In summary, the motivation for the different hypotheses needs to be carved out more and couched in previous literature that is directly relevant to the topic. The above paragraph is, of course, my view on the topic, but currently, the paper lacks the references needed to fully understand where the hypotheses are derived from.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the involvement of first-order thalamic nuclei in language-related tasks using task-based fMRI in a 3 × 2 design contrasting linguistic and non-linguistic versions of reading, speech comprehension, and speech production. By focusing on the LGN, MGN, and VLN and combining activation, connectivity, lateralization, and multivariate pattern analyses, the authors aim to characterize modality-specific and language-related thalamic contributions.

      Strength:

      A major strength of the work is its hypothesis-driven and multimodal analytical approach, and the modality-specific engagement of first-order thalamic nuclei is robust and consistent with known thalamocortical organization. This is a very sound study overall.

      Weaknesses:

      However, several conceptual issues complicate the interpretation of the results as evidence for linguistic modulation per se. A central concern relates to the operationalization of the linguistic versus non-linguistic contrast. In the present design, linguistic and non-linguistic stimuli differ along multiple dimensions beyond linguistic content. For example, written words and scrambled images differ in spatial frequency structure, edge composition, contrast regularities, and familiarity, while intelligible speech and acoustically scrambled sounds differ substantially in temporal and spectral statistics. This is particularly relevant given that first-order thalamic nuclei such as the LGN are known to be highly sensitive to low-level sensory properties. As a result, observed differences in thalamic responses may reflect sensitivity to stimulus properties rather than linguistic processing per se, and this limits the specificity of claims regarding linguistic modulation.

      Relatedly, although the manuscript frequently refers to effects "depending on the linguistic nature of the stimuli," the statistical evidence for linguistic versus non-linguistic modulation is uneven across analyses. Whole-brain contrasts collapse across stimulus type and primarily test modality effects. Similarly, the primary ROI analyses of activation amplitude are collapsed across linguistic and non-linguistic conditions and convincingly demonstrate modality-specific engagement of thalamic nuclei, but do not in themselves provide evidence for linguistic modulation. Linguistic effects emerge only in later, more targeted analyses focusing on hemispheric lateralization and multivariate pattern classification, and these effects are nucleus-, modality-, and analysis-specific rather than general. Taken together, these results suggest that linguistic modulation constitutes a secondary and selective finding, whereas modality-specific task engagement represents the primary and most robust outcome of the study.

      An additional interpretational issue concerns task engagement and attention. The tasks differ substantially in cognitive demands (e.g., passive reading and listening versus overt speech production), and linguistic and non-linguistic blocks may differ systematically in salience or engagement. This is particularly important given prior evidence, cited by the authors, that LGN and MGN activity can be modulated by task demands and attention. In the absence of behavioral measures indexing task engagement or compliance, it is difficult to determine whether differences between linguistic and non-linguistic conditions reflect linguistic processing per se or are mediated by attentional factors.

      Finally, while the manuscript emphasizes the novelty of evaluating thalamic involvement in language, thalamic contributions to language have been documented previously in both lesion and functional imaging studies. The contribution of the present work, therefore, lies less in establishing thalamic involvement in language per se, and more in its focus on specific first-order nuclei, its multimodal design, and its combination of univariate, connectivity, and multivariate analyses. Moderating claims of novelty would help place the findings more clearly within the existing literature.

    4. Author response:

      We acknowledge the concerns raised by both reviewers and plan to address them in our revision:

      Regarding Reviewer #1's comments: We will strengthen the statistical framework and address the concerns about multiple comparison corrections. We will also expand our literature review to better motivate our hypotheses, particularly incorporating the work on lateralization patterns in MGN/LGN and the existing evidence on first-order thalamic nuclei in linguistic processing.

      Regarding Reviewer #2's comments: We acknowledge the valid concern that linguistic and non-linguistic stimuli differ beyond linguistic content, including some low-level sensory properties. We will elaborate on the creation and properties of these stimuli in the Methods section and upload stimuli examples to an online repository to provide transparency about differences. We will also add a discussion of this limitation in the Discussion section, acknowledging that disentangling effects of linguistic processing from low-level stimulus properties will require further testing in future research. Additionally, we will moderate part of our claims and reorganize the presentation of results as suggested, and clarify our contribution relative to existing literature.

    1. eLife Assessment

      This manuscript reports high-resolution cryo-EM structures of a trimethylamine N-oxide demethylase and advances the intriguing hypothesis that the enzyme is bifunctional, coupling TMAO demethylation to formaldehyde capture at a distal tetrahydrofolate-binding site via an enclosed intramolecular tunnel. Supported by biochemical assays and molecular dynamics simulations, the structural findings are valuable and potentially of broad interest, particularly the unusual oligomeric architecture and the proposed conduit for a reactive intermediate. However, the mechanistic framework is considered incomplete, raising substantial concerns regarding the proposed catalytic mechanism, metal/cofactor requirements, and the interpretation of biochemical data supporting formaldehyde channelling.

    2. Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn²⁺-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn²⁺), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution - for example, using interferometric scattering microscopy (iSCAT) - could help clarify the oligomeric state of the complex in solution.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      In this context, the proposed catalytic role of Zn²⁺ raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn²⁺-saturating conditions, as no Zn²⁺ appears to be added to the buffers? It would be helpful to clarify whether Zn²⁺ occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The divalent ion Zn²⁺ is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn2+ cofactor, there was a dependence upon non-heme Fe2+, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn2+ is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn2+ cannot fulfil that role. Fe2+ could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt conditions. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower (50 mM) or higher (>150 mM) NaCl concentrations, the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution, for example using interferometric scattering microscopy (iSCAT), could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised our text on page 2 as below.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn<sup>2+</sup> cofactor, there was a dependence upon non-heme Fe<sup>2+</sup>, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn<sup>2+</sup> is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn<sup>2+</sup> cannot fulfil that role. Fe<sup>2+</sup> could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS 2016, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS 2016, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to a second iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis detected only zinc; iron was undetectable (Figure S5).

      (4) The enzyme kinetics of our iron-free TDM are comparable to those of MsTDM (Figure 1A). We propose that the differences in Km and Vmax are due to differences in the overall sequences of the two enzymes. Please also see the comment at the end on a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also added a comparison with the value previously reported for the TDM enzyme (page 2). See also the note on a newly published manuscript and its comparison.

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description to avoid overinterpretation and limit our conclusions to the observed changes in HCHO concentration (page 2, lines 18-19).

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C, which shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We revised our text on page 9, ITC section.
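      The arithmetic behind the reviewer's query and the corrected values can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis: it assumes additive volumes with no overflow from the cell, and because the number of injections is not stated in the response, `n_injections` is a hypothetical parameter to be set by the reader.

```python
def itc_final_state(n_injections, injection_ul, syringe_um, cell_ul, cell_um):
    """Return (ligand_uM, protein_uM, molar_ratio) after all injections.

    Simplifying assumption: volumes are additive and nothing overflows
    the cell, so both species are diluted by the total added volume.
    """
    added_ul = n_injections * injection_ul
    total_ul = cell_ul + added_ul
    ligand_pmol = syringe_um * added_ul   # uM * uL = pmol
    protein_pmol = cell_um * cell_ul      # uM * uL = pmol
    ligand_um = ligand_pmol / total_ul
    protein_um = protein_pmol / total_ul
    return ligand_um, protein_um, ligand_pmol / protein_pmol


if __name__ == "__main__":
    # Reviewer's reading of the original Methods text:
    # a total of 4.75 uL of 20 uM THF into 250 uL of 5 uM TDM.
    _, _, ratio = itc_final_state(1, 4.75, 20.0, 250.0, 5.0)
    print(f"original description: ligand/protein = {ratio:.3f}")

    # Corrected values from the response: 2.45 uL injections of 80 uM THF.
    # The injection count is NOT given in the response; these are hypothetical.
    for n in (16, 19, 20):
        lig, prot, ratio = itc_final_state(n, 2.45, 80.0, 250.0, 5.0)
        print(f"n={n}: ligand={lig:.2f} uM, protein={prot:.2f} uM, ratio={ratio:.2f}")
```

      Under the reviewer's reading of the original text, the ligand-to-protein ratio comes out at 0.076, far below 3, which is why the concentrations needed correcting. With the corrected concentrations, the final ratio depends on the injection count and on whether dilution of the cell contents is taken into account.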

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have revised Figure S6 by adding an additional panel (C; now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italic (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - doublecheck units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling on page 2 (~line 10), page 5 (~line 34), and page 6 (~line 40). The Prism version was added. The arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in units consistent with the previous report and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364). It further supports our findings. First, the reported kinetic parameters using ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to those we observed (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed for our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> functions primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).
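      To make the kinetic comparison above concrete, the two parameter sets can be placed on the same Michaelis-Menten curve. This is an illustrative sketch only: it assumes simple Michaelis-Menten behavior for both enzymes, uses the Vmax and Km values quoted in the response, and evaluates them at hypothetical TMAO concentrations that are not taken from either study.

```python
def mm_rate(s_mm, vmax, km_mm):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * s_mm / (km_mm + s_mm)


# Values quoted in the response: Vmax in nmol/min/mg, Km in mM.
ENZYMES = {
    "MsTDM (Cappa et al.)": (240.0, 0.866),
    "Paracoccus TDM (this work)": (156.0, 1.33),
}

if __name__ == "__main__":
    # Hypothetical substrate concentrations (mM), chosen to span both Km values.
    for s_mm in (0.1, 1.0, 10.0):
        for name, (vmax, km_mm) in ENZYMES.items():
            v = mm_rate(s_mm, vmax, km_mm)
            print(f"[TMAO]={s_mm:5.1f} mM  {name:28s} v={v:6.1f} nmol/min/mg")
```

      Both parameter sets are of the same order of magnitude, so the predicted rates differ by a modest factor at any substrate concentration rather than by orders of magnitude.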

    1. eLife Assessment

      This study provides a useful contribution to understanding how wearable augmentation devices interact with human proprioception, using a longitudinal design over a single session. Results demonstrate that the perceptual representation of the biological finger and augmentation device changes across different phases of device exposure and use. The evidence supporting a representational change over time is solid, although it is still not clear whether these changes reflect three distinct phases of sensorimotor plasticity, as argued, versus 'washout' or adaptation effects. This work will be of interest to researchers studying body representation, sensorimotor learning, and human-technology interaction.

    2. Reviewer #1 (Public review):

      This study by Radziun and colleagues investigates the effects of using a hand-augmentation device on mental body representations. The authors use a proprioceptive localisation task to measure metric representations of finger length before and after participants wear the device, and then before and after they learn to use the device, which extends the lengths of the fingers by 10 cm. The authors find changes between different time points, which they interpret as evidence for three distinct forms of plasticity: one related to simply wearing the device, one related to learning to use it, and an aftereffect after taking the device off. A control experiment with a similar device, which does not lengthen the fingers, showed the first and third of these forms of plasticity, but not the second.

      This study takes an interesting approach to a timely and theoretically significant issue. The study appears to be appropriately designed and conducted. There are, however, some points which require clarification.

      (1) The nature of the localization task is unclear. On its face, the task appears to involve localization of each landmark within the 2-dimensional surface of the touchscreen. However, the regression analysis presupposes that localization is made in a 1-dimensional space. Figure S2 shows that three lines are presented on the screen above the index, middle, and ring fingers, which I imagine the participant is meant to use as a guide. But it is at least conceivable that the perceived location or orientation of the finger might not correspond exactly to these lines. While the method can deal gracefully with proximal-distal translations of the fingers (i.e., with the intercept parameter of the regression), it isn't clear how the participant is supposed to respond if their proprioceptive perception of finger location is translated left-right or rotated relative to the lines on the screen. I also worry that presenting a long, thin line to represent each finger on the screen may not be a neutral method and may prime participants to represent the finger as long and thin.

      (2) The task used here fits within a wider family of tasks in the literature using localization judgments of multiple landmarks to map body representations. I feel that some discussion of this broader set of tasks and their use to measure body representation and plasticity is notably absent from the paper. It is also striking to me that some of the present authors have themselves recently criticized the use of landmark localization methods as a measure of represented body size and shape (Peviani et al, 2024, Current Biology). It is therefore surprising to see them use this task here as a measure of represented finger length without commenting on this issue.

      (3) 18 participants strikes me as a relatively small sample size for this type of study. It weakens the manuscript that the authors do not provide any justification, or even comment on, the sample size. This is especially true as participants are excluded from the entire sample, and from specific analyses, on rather post-hoc grounds.

      (4) I have some concerns about the interpretation of contraction in stage 2. The authors claim that wearing the finger extended produces "a contraction", i.e., an "under-representation" (page 12). But in both experiments, regression slopes in stage 2 were not significantly different from 1 (i.e., 0.98 [SE: 0.07] in Exp 1a and 1.04 [SE: 0.09] in Experiment 1b). So how can that be interpreted as "under-representation"?
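
      As a rough check of this point, the reported slopes and standard errors can be compared against a slope of 1. The sketch below assumes a normal approximation (critical value 1.96), since the degrees of freedom are not given here; the function name is hypothetical and this is not the authors' analysis.

```python
def slope_vs_unity(slope, se, crit=1.96):
    """Return the z statistic against a slope of 1 and an approximate 95% CI.

    Assumes a normal approximation; a t-based interval would be slightly wider.
    """
    z = (slope - 1.0) / se
    ci = (slope - crit * se, slope + crit * se)
    return z, ci


# Stage 2 slopes and standard errors as quoted in the review.
z_1a, ci_1a = slope_vs_unity(0.98, 0.07)
z_1b, ci_1b = slope_vs_unity(1.04, 0.09)

for label, z, ci in [("Exp 1a", z_1a, ci_1a), ("Exp 1b", z_1b, ci_1b)]:
    print(f"{label}: z = {z:.2f}, approx. 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

      Under this approximation both intervals contain 1, which is the basis of the concern that stage 2 does not show an absolute under-representation.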

      (5) I also have concerns about the interpretation of the stretch that is claimed to occur following training. In Exp 1a, regression slopes in stage 3 are on average 1.15. That is LESS than in the pretest at stage 1 (mean: 1.16). The idea of stretch only comes about because of the lower slopes in stage 2, which the authors have interpreted as reflecting contraction. So what the authors call stretch and a 2nd form of plasticity could just be the contraction from stage 2 wearing off or dissipating, since perceived finger length in stage 3 just appears to return to the baseline level seen in stage 1. While the authors describe their results in terms of three distinct forms of plasticity, these are not in fact statistically independent. The dip in regression slopes in stage 2 is interpreted as evidence for two distinct plasticity effects, which I do not find convincing.

      (6) The distinction between plasticity at stage 3 (which appears specific to augmentation) and plasticity at stage 4 (which does not appear specific, as it also occurs in Experiment 1b) feels strained. This feels like a very subtle distinction, and the theoretical significance of it is not convincingly developed.

      (7) The reporting of statistics is not always consistent. For example, 95%CIs are presented for regression slopes in stages 1, 3, and 4, but not for stage 2. Statistics are performed on regression slopes, except for one t-test on page 7 comparing lengths in cm. Estimates of effect size would be nice additions to statistical tests.

      (8) Minor point: On page 4, the authors write, "These included sorting colored blocks, stacking a Jenga tower, and sorting pegs into holes; the latter task required fine-grained manipulation and was used as our outcome measure of motor learning." This suggests that peg sorting was the outcome measure, but in Figure 1D, Jenga is presented as the outcome measure.

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to explore dynamic changes in the somatosensory representation of both the body and artificial body parts. The study investigated how proprioceptive localisation along the finger changes when participants wear, actively use, and then remove a hand augmentation device - a rigid finger-extension. By mapping perceived target locations along the biological finger and the extension across multiple stages, the authors aim to characterise how the somatosensory system updates our spatial body representation during and after interaction with body augmentation technology.

      Strengths:

      The manuscript addresses an interesting question of how augmentation devices alter proprioceptive localisation abilities. Conceptually, the work moves beyond classic tool-use paradigms by focusing on a device that is used with the hand to extend the fingers' abilities (versus a tool that is simply used by the hand), and by attempting to map perceived spatial structure across both biological and artificial segments within the same framework.

      A major strength is the multi-stage design, which samples localisation abilities at baseline, the beginning of device wear, post-training, and immediately post-removal. This provides a richer characterisation of short-term adaptation compared to a simple pre/post comparison. The dense sampling across stages and target locations generates a rich behavioural dataset that will be valuable to readers interested in somatosensory body representation. The within-subject, counterbalanced control session further strengthens interpretability, providing a useful comparison for interpreting stage-dependent effects, and to probe how functional training shapes changes in the perceptual representations. Finally, the augmentation device itself appears carefully engineered, with thoughtful design decisions regarding wearability, including comfort and customised fit. The manuscript is also communicated clearly, with transparent reporting of analyses and succinct figures that make the pattern changes across stages straightforward to evaluate.

      Weaknesses:

      There is conceptual ambiguity in how the regression outcomes are interpreted in relation to perceived length and spatial integration. The manuscript treats regression slope as a proxy for "length perception" and discards the intercept as "spatial bias," but in this localisation task translation (intercept) and scaling (slope) are coupled: changes in anchoring at the proximal baseline (intercept) or distal endpoint can generate slope differences without uniform rescaling across the mapped surface. Relatedly, the analyses do not establish whether the reported effects are global across targets or disproportionately driven by the most distal locations. This limits the strength of inferences about "partitioning" or "reallocation" of representational space across biological and artificial segments. Some interpretive statements also appear stronger than the evidence supports (e.g., describing the stage 2 bio-extension map as "geometrically accurate", despite Bayes factors that provide only anecdotal support for no difference from true length). Extensive repeated judgements to a fixed set of locations may additionally stabilise response strategies or anchoring even without feedback, complicating the separation of body-representation change from task-specific calibration.
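
      The coupling between slope and anchoring can be illustrated with a small sketch. The landmark positions below are hypothetical and the fit is an ordinary least-squares line, which is an assumption about the analysis rather than a description of the authors' pipeline. A 1 cm outward shift of only the most distal target produces the same slope as a uniform 10% stretch of every target.

```python
def ols(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical true landmark positions along the finger (cm).
true_pos = [0.0, 2.0, 4.0, 6.0, 8.0]

# Case A: every judgement is stretched uniformly by 10%.
uniform = [1.1 * p for p in true_pos]

# Case B: judgements are veridical except the most distal one, shifted by 1 cm.
distal_only = true_pos[:-1] + [true_pos[-1] + 1.0]

slope_a, intercept_a = ols(true_pos, uniform)
slope_b, intercept_b = ols(true_pos, distal_only)

# Residuals separate the two cases even though the slopes agree.
max_resid_a = max(abs(y - (slope_a * x + intercept_a)) for x, y in zip(true_pos, uniform))
max_resid_b = max(abs(y - (slope_b * x + intercept_b)) for x, y in zip(true_pos, distal_only))

print(slope_a, slope_b, max_resid_a, max_resid_b)
```

      In this toy case the slopes are indistinguishable, and only the intercept and the per-target residuals reveal that the change is confined to the distal endpoint, which is why a per-target analysis would help here.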

      The manuscript would also benefit from clearer conceptual framing of what the device is and what its training probes are. The device is described variably as an "artificial finger" versus a rigid "finger extension," with different implications for perception and function. In addition, the training tasks appear to emphasise manipulation and dexterity more than scenarios requiring an extended reachable workspace (indeed, participants appear to have performed at least as well, if not better, in the control training), which brings into question whether participants explored the device's intended functionality and possible proprioceptive consequences. The control experiment is thoughtfully designed to test whether functional training contributes to the stage 3 changes, but because localisation is not performed while wearing the short device, the design does not resolve whether the stage 2 change and the post-removal aftereffect are specific to the augmentative extension versus more general consequences of wearing a device on the finger (and the following possible distorted distal cues).

      Finally, the immediate post-removal aftereffects are intriguing, but the mechanistic interpretation remains underspecified. As presented within the internal model framework, the magnitude and consistency of the aftereffect following brief exposure are difficult to reconcile with the stability expected from a lifetime biological finger model, and because the aftereffect is assessed only immediately after removal, its time course and functional significance remain unclear.

    4. Reviewer #3 (Public review):

      Summary:

      The study aims to investigate sensorimotor plasticity mechanisms by exposing a cohort of 20 subjects to manipulation activities while using wearable finger extensions. With a series of experiments involving localization and motor tasks, the authors provide evidence that the finger extensions are integrated into the body representation of the subjects.

      Strengths:

      The study deserves attention: the psychophysical protocols are carefully designed and the statistical analyses are solid.

      Weaknesses:

      However, the current version of the manuscript, in my opinion, makes an exaggerated use of the term plasticity, and this should be amended. This is because the authors support the plasticity claims with psychophysical experiments, without providing evidence of neural-plasticity mechanisms (e.g., neuroimaging methods are not used).

      The authors are advised to revise the wording of the manuscript and, if possible, to perform additional experiments with brain imaging methods (e.g., EEG or fMRI).

    1. eLife Assessment

      This paper investigates the Achilles' heel of an aggressive pediatric bone cancer known as Ewing sarcoma. The authors aimed to better understand how its previously undruggable drivers mediate oncogenic mechanisms using several omics approaches. Transcriptomic changes aligned with their findings provide convincing evidence for the role of a short alpha helix in the DNA binding domain of FLI1 in modulating binding to GGAA microsatellites and promoting enhancer activity. The study provides valuable new insights into the underlying oncogenic mechanisms in Ewing sarcoma.

    2. Reviewer #1 (Public review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult and better understanding how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-𝛼4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediate EWS-FLI cis activity (at msat) and to generate the formation of specific TADs. Furthermore, cells expressing DBD deficient EWS-FLI display very poor colony forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      They aim at better understanding how EWS-FLI mediates its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      They use the most recent state-of-the-art omics methods to investigate the transcriptome, epigenome, and genome conformation. In particular, Micro-C enables achieving up to 1kb resolved 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TADs structures where EWS-FLI1 mediates its oncogenic activity.

      They performed all their experiments in an Ewing sarcoma genetic background (A673 cells) which circumvents bias from previously reported approaches when working in non-orthologous cell models using similar approaches.

      Weaknesses:

      The main weakness stems from the poor reproducibility of the Micro-C data. Indeed, the distances and clustering observed between replicates appear to be similar to, or even greater than, those observed between biological conditions. For instance, in Figure 1B, we do not observe any clear clustering among DBD1, DBD2, DBD+1, and DBD+2. Although no further experiments were performed, the authors tempered their claims by rephrasing aspects related to this issue and the reviewer also acknowledged that the transcriptomic data are convincing and support their findings.

      Regarding DBD stability and the cycloheximide experiments requested to rule out any half-life bias of DBD (as higher stability of the re-expressed DBD+ could also partially explain the results independently of a 3D conformational change), the reviewer acknowledged that the WB, RNA-seq data and agar assays presented by the authors appear reproducible across experiments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult and better understanding how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-𝛼4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediate EWS-FLI cis activity (at msat) and to generate the formation of specific TADs. Furthermore, cells expressing DBD deficient EWS-FLI display very poor colony forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      This new version of the study comprises as requested new data from an additional cell line. The new data has strengthened the manuscript. Nevertheless, some of the arguments of the authors pertaining to the limitations of immunoblots to assess stability of the DBD constructs or the poor reproducibility of the Micro C data remain problematic. While the effort to repeat MicroC in a different cell line is appreciated, the data are as heterogeneous as those in A673 and no real conclusion can be drawn. The authors should tone down their conclusions. If DBD has a strong effect on chromatin organization, it should be reproducible and detectable. The transcriptomic and cut and tag data are more consistent and provide robust evidence for their findings at these levels. 

      We agree that the Micro-C data have more apparent heterogeneity within and across cell lines as compared to other analyses such as our included CUT&Tag and RNA-seq. We addressed the possible limitations of the technique as well as inherent biology that might be driving these findings in our previous responses. Despite the poor clustering on the PCA plots, our analyses of differential interacting regions, TADs and loops remain consistent across both cell lines. We are confident that these findings reflect the context of transcriptional regulation by the constructs, and therefore the role of the alpha-helix in modulating chromatin organization. To address the concerns raised by the editors and reviewers about the strength of the conclusions we drew from the Micro-C findings, we have made changes to the language used to describe them throughout the manuscript. These changes are outlined below.

      • On lines 70-71, "is required to restructure" was changed to "is implicated in restructuring of"

      • On line 91, "is required for" was changed to "participates in"

      • On line 98, "is required for" changed to "is potentially required for"

      • On line 360-361, "is required for restructuring" changed to "participates in restructuring"

      Concerning the issue of stability of the DBD and DBD+ constructs, a simple protein half-life assay (e.g. cycloheximide chase assay) could rule out any bias here and satisfactorily address the issue.

      While we generally agree that a cycloheximide assay is a relatively simple approach to look at protein half-life, as we discussed last time, the assays included in this paper are performed at equilibrium and rely on the concentration of protein at the time of the assay. This is particularly true for assays involving crosslinking, like Micro-C. As discussed in our prior response, western blots are semi-quantitative at best, even when normalized to a housekeeping protein. In analyzing the relative protein concentration of DBD vs. DBD+ with relative protein intensities first normalized to tubulin and using the wildtype EWSR1::FLI1 rescue as a reference point, we find that there is no statistical difference in the samples used for micro-C here (Author response image 1A) or across all of the samples that we have used for publication (Author response image 1B). This does show that DBD generally has more variable expression levels relative to wildtype EWSR1::FLI1, and this is consistent with our experience in the lab.
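
      The two-step normalization described above can be written out as a minimal sketch. The function name and the densitometry readings are hypothetical and serve only to show the arithmetic: each band is first divided by its tubulin loading control, and the result is then expressed relative to the wildtype rescue lane.

```python
def relative_expression(band, tubulin, wt_band, wt_tubulin):
    """Tubulin-normalized band intensity, relative to the wildtype rescue lane."""
    normalized = band / tubulin
    wt_normalized = wt_band / wt_tubulin
    return normalized / wt_normalized


# Hypothetical readings in arbitrary units, for illustration only.
dbd_relative = relative_expression(band=50.0, tubulin=100.0, wt_band=80.0, wt_tubulin=100.0)
print(round(dbd_relative, 3))
```

      Ratios computed this way for each replicate would then be the values compared between constructs, for example with an unpaired t-test.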

      Nonetheless, we did attempt to perform the requested cycloheximide chase experiment to determine protein stability. Unfortunately, despite an extensive number of troubleshooting attempts, we have not been able to get good expression of DBD for these experiments. The first author who performed this work has left the lab and we have moved to a new lab space since the benchwork was performed. We continue to try to troubleshoot to get this experimental system for DBD and DBD+ to work again. When we tried to look at stability of DBD+ following cycloheximide treatment, there did appear to be some difference in protein stability (Author response image 2). However, these conditions are not the same conditions as those we published, they do not meet our quality control standards for publication, and we are concerned about being close to the limit of detection for DBD throughout the later timepoints. Additional studies will be needed with more comparable expression levels between DBD and DBD+ to satisfactorily address the reviewer concerns.

      Author response image 1.

      Expression Levels of DBD and DBD+ Across Experiments. Expression levels of DBD and DBD+ protein based on western blot band intensity normalized by tubulin band intensity. Expression levels are relative to wildtype EWSR1::FLI1 rescue levels and are calculated for (A) A673 samples used for micro-C and (B) all published studies of DBD and DBD+. P-values were calculated with an unpaired t-test.

      Author response image 2.

      CHX chase assay to determine the stability of DBD and DBD+. (A) Knock-down of endogenous EWSR1::FLI1 detected with FLI1 ab and rescue with DBD and DBD+ detected with FLAG ab. (B) CHX chase assay to determine the stability of DBD and DBD+ in A-673 cells with quantification of the protein levels (n=3). Error bars represent standard deviation. The half-lives (t1/2) of DBD and DBD+ were listed in the table.
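
      For reference, one common way to estimate a half-life from chase data assumes first-order decay and fits a straight line to the log-transformed protein levels. The sketch below uses synthetic values generated from a known half-life; the function name is hypothetical, and this is not necessarily how the half-lives in the table were obtained.

```python
import math


def half_life_from_chase(times_h, levels):
    """Fit ln(level) = ln(level0) - k * t by least squares and return ln(2) / k."""
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs)) / sum(
        (t - mean_t) ** 2 for t in times_h
    )
    return math.log(2) / -slope


# Synthetic, noise-free chase generated from a known 4 h half-life.
true_half_life = 4.0
times = [0, 2, 4, 6, 8]
levels = [0.5 ** (t / true_half_life) for t in times]

estimated = half_life_from_chase(times, levels)
print(round(estimated, 3))
```

      Because the fit operates on log-transformed levels, readings close to the limit of detection at late timepoints can dominate the estimate, which is consistent with the caution expressed above.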

      Suggestions:

      The Reviewing Editor and a referee have considered the revised version and the responses of the referees. While the additional data included in the new version has consolidated many conclusions of the study, the MicroC data in the new cell line are also heterogeneous and as the authors argue, this may be an inherent limitation of the technique. In this situation, the best would be for the authors to avoid drawing robust conclusions from this data and to acknowledge its current limitations.

      As discussed above, we have changed the language regarding our conclusions from micro-C data to soften the conclusions we draw per the Editor’s suggestion.

      The referee and Reviewing Editor also felt that the arguments of the authors concerning a lack of firm conclusions on the stability of EWS-FLI1 under +/-DBD conditions could be better addressed. We would urge the authors to perform a cycloheximide chase type assay to assess protein half-life. These types of experiments are relatively simple to perform and should address this issue in a satisfactory manner.

      As discussed above, we do not feel that differences in protein stability would affect the results here because the assays performed required similar levels of protein at equilibrium. Our additional analyses in this response show that there are not significant differences between DBD and DBD+ levels in samples that pass quality control and are used in published studies. However, we attempted to address the reviewer and editor comments with a cycloheximide chase assay and were unable to get samples that would have passed our internal quality control standards. These data may suggest differences in protein stability, but it is unclear that these conditions accurately reflect the conditions of the published experiments, or that this would matter with equal protein levels at equilibrium.

    1. eLife Assessment

      This fundamental manuscript describes how the posterolateral cortical amygdala (plCoA) generates appetitive or aversive behaviors in response to odors. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      Previously described weaknesses in the study's methods and interpretation were fully addressed during revision. Locomotor behavior of the mice during head-fixed imaging experiments was added and analysis of the correlation of locomotion with neural activity was also added.

      This work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - The manuscript is well constructed and clearly presented, in spite of the abundance of data sets and experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and post-hoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthens the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      The weaknesses noted in the primary review have all been addressed adequately.

    4. Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. Moreover, the intermingling of neurons in the plCoA is consistent with prior observations. The presence of a gradient rather than a distinct separation of the cells fits the model being proposed. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single-cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      The weaknesses noted in primary review have all been addressed adequately.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      There are some weaknesses in the study's methods and interpretation. The lack of clarity regarding the behavior of the mice during head-fixed imaging experiments raises the possibility that restricted behavior could explain the absence of valence encoding at the population level.

      We agree with the idea that head-fixation may alter the state of the animal and the neural encoding of odor. To address this, we have provided further analysis of walking behavior during the imaging sessions, which is provided in Figure S2. Overall, we could not identify any clear patterns in locomotor behavior that are odor-specific. Moreover, when neural activity was sorted depending on the behavioral state (walking, pausing or fleeing) we didn’t observe any apparent patterns in odor-evoked neural activity. This is now discussed in the Results and Limitations sections of the manuscript.

      Furthermore, while the authors employ chemogenetic inhibition of specific pathways, the rationale for this choice over optogenetic inhibition is not fully addressed, and this could potentially affect the interpretation of the results.

      The rationale was logistical. First, optogenetic inhibition over a timescale of minutes is problematic because of heat generation during prolonged optical stimulation. Second, our behavioral apparatus has a narrow height between the ceiling and floor, making tethering difficult. This is now explained in the Results section. The trade-off of using chemogenetics is that we are silencing neurons and not specific projections. However, because we find that NAc- and MeA-projecting neurons have little shared collateralization, we believe the conclusion of divergent pathways still stands. This is now discussed in the Limitations section.

      Additionally, the choice of the mplCoA for manipulation, rather than the more directly implicated anterior and posterior subregions, is not well-explained, which could undermine the conclusions drawn about the topographic organization of plCoA.

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have now clarified this in the text.

      Despite these concerns, the work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - For a first submission the manuscript is well constructed, containing lots of data sets and clearly presented, in spite of the abundance of experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and posthoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthen the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      - Interpretation of the calcium imaging data is very limited and requires additional analysis; behavioral responses specific to odors should be considered. If there are neural responses, the corresponding behavioral epochs and the behavioral responses associated with those neuronal responses should be displayed and analyzed.

      We have now considered this, see response above.

      - The effect of odor habituation is not considered.

      We considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Optogenetic data in the two subregions relies on very careful viral spread and fiber placement. The current anatomy results provided should be clear about the spread of virus along the A-P and D-V axes, providing coordinates for this, to assure readers that the specificity of each sub-zone is real.

      We were careful to exclude animals for improper targeting. The spread of virus is detailed in Figures S3, S8 & S9.

      - The choice of behavioral assays across the two regions doesn't seem balanced and would benefit from more congruency.

      The 4-quadrant assay was chosen because this study builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. Moreover, the approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. We did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways.

      - Rationale for some of the choices of photo-stimulation experiment parameters isn't well defined.

      The parameters for photo-stimulation were based on those used in our past work (Root et al., 2014). We used a gradient of frequency from 1-10 Hz based on the idea that odor likely exists in a gradient and this was meant to mimic a potential gradient, though we don’t know if it exists. The range in stimulation frequencies appears to align with the actual rate of firing of plCoA neurons (Iurilli et al., 2017).

      Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      There are a couple of inconsistencies that may be addressed by additional experiments and careful interpretation of the data.

      Stimulating aplCoA or slc17a6 neurons results in spatial avoidance, and stimulating pplCoA or slc17a7 neurons drives approach behaviors. On the other hand, the authors and others in the field also show that there is no apparent spatial bias in odor-driven responses associated with odor valence. This discrepancy may be addressed better. A possibility is that odor-evoked responses are recorded from populations outside of those defined by slc17a6/a7. This may be addressed by marking activated cells and identifying their molecular markers. A second possibility is that optogenetic stimulation activates a broad set of neurons and does not recapitulate the sparseness of odor responses. It is not known whether sparse activation by optogenetic stimulation can still drive approach or avoidance behaviors.

      We agree that marking specific genetically defined or projection-defined neurons could help to clarify whether some neurons have more selective valence responses. However, we are not able to perform these experiments at the moment. We have included new data demonstrating that sparser optogenetic activation evokes behaviors similar in magnitude to those evoked by broader activation (see Figure S4).

      The authors show that inhibiting slc17a7 neurons blocks approaching behaviors toward 2-PE. Consistent with this result, inhibiting NAc projection neurons also inhibits approach responses. However, inhibiting aplCoA or slc17a6 neurons does not reduce the aversive response to TMT, but blocking MEA projection neurons does. The latter two pieces of evidence are not consistent with each other. One possibility is that the MEA-projecting neurons may not be expressing slc17a6. It is not clear from the retrograde labeling experiments what percentage of MEA- and NAC-projecting neurons express slc17a6 and slc17a7. It is possible that neurons expressing neither VGluT1 nor VGluT2 could drive aversive or appetitive responses. This possibility may also explain why silencing slc17a6 neurons does not block avoidance.

      We have now performed RNAscope staining on retrograde tracing to better define this relationship. Although the VGluT2 and VGluT1 neurons have biased projections to the MeA and NAc, respectively, there is some nuance detailed in Figure S10. Generally, MeA-projecting neurons are predominantly VGluT2+, whereas about 20% of NAc-projecting neurons express both. Some (less than 35%) retrogradely labeled neurons were not detected as VGluT1 or VGluT2 positive, suggesting that other populations could also contribute. We agree that the discrepancy between MeA-projection and VGluT2 silencing is likely due to incomplete targeting of the MeA-projecting population with the VGluT2-cre line. This is included in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main:

      (1) For the head-fixed imaging experiments, what is the behavior of the mice during odor exposure? Could the weak reliability of individual neurons be due to a lack of approach or avoidance behavior? Could restricted behavior also explain the lack of valence encoding at the population level?

      We agree that this is a limitation of head-fixed recordings. In the revised manuscript we did attempt to characterize their behavioral response, and look for correlations in odor representation. Although we did find different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). For example, one might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors as all odors evoked a mixture of responses (Figure S2A-D, text lines 208-232). We then examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any apparent patterns in odor responses (Figure S2E,F). Lastly, we acknowledge in the text that the lack of valence encoding may be an artifact of head-fixation (see lines 849-857).

      (2) For the optogenetic manipulations of Vglut1 and Vglut2 neurons, why was the injection and fiber targeted to the medial portion of the plCoA, if the hypothesis was that these glutamatergic neuron populations in different regions (anterior or posterior) are responsible for approach and avoidance? 

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have clarified this in the text (Lines 417-419).

      Could this explain the lack of necessity with the DREADD experiments? 

      For the loss of function experiments, a larger volume of virus was injected to cover a larger area and we did confirm targeting of the appropriate areas. Though, it is always possible that the lack of necessity is due to incomplete silencing.

      Further, why was an optogenetic inhibition approach not utilized? 

      Although optogenetic inhibition could have plausibly been used instead, we chose chemogenetic inhibition for two reasons: First, for minutes-long periods of inhibition, optical illumination poses the risk of introducing heat-related effects (Owen et al., 2019). In fact, we first tried optical inhibition but controls exhibited unusually large variance. Second, it is more feasible in our assay as it has a narrow height between the floor and lid that complicates tethering to an optic fiber. Past experiments overcame this with a motorized fiber retraction system (Root et al., 2014), but this is highly variable with user-dependent effects, so we found chemogenetics to be a more practical strategy. We have added a sentence to explain the rationale (see lines 561-563).

      (3) The specific subregion of the nucleus accumbens that was targeted should be named, as distinct parts of the nucleus accumbens can have very different functions. 

      We attempted to define specific subregions of the nucleus accumbens and found that the plCoA projection is not specific to the shell or core, anterior or posterior; rather, it broadly innervates the entire structure. We have added a note about this in the manuscript (see lines 470-471). Given that we did not find notable subregion-specific outputs within the NAc, targeting was directed to the middle region of NAc, with coordinates stated in the methods.

      (4) Why was an intersectional DREADD approach used to inhibit the projection pathways, as opposed to optogenetic inhibition? The DREADD approach could potentially affect all projection targets, and the authors might want to address how this could influence the interpretation of the results.

      This is partly addressed above in point 2. As for interpretation, we acknowledge that the intersectional approach silences the neurons projecting to a given target and not the specific projection and we have been careful with the wording. Although this may complicate the conclusion, we did map the collaterals for NAc- and MeA-projecting neurons and find that neurons do not appreciably project to both targets and have minimal projections to other targets. We have now taken care to state that we silence the neurons projecting to a structure, not silencing the projection, and we acknowledge this caveat. However, since the MeA- and NAc-projecting neurons appear to be distinct from each other (largely not collateralizing to each other), the conclusion that these divergent pathways are required still stands. We have added discussion of this in the Limitations section (see lines 859-863).

      Minor:

      (1) Line 402 needs a reference.

      We have added the missing reference (now line 441).

      (2) The Supplemental Figure labeling in the main text should be checked carefully.

      Thank you for pointing this out. We have fixed the prior errors.

      (3) Panel letter D is missing from Figure 2.

      This has been fixed.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns, additional experiments:

      - In the calcium imaging experiments mice were presented with the same odor many times. Overall responses to odor presentations were quite variable and appear to habituate dramatically (Figure S1F). The general conclusion from these experiments is a lack of consistent valence-specific responses of individual neurons, but I wonder if this conclusion is slightly premature. A few potential explanatory factors may need additional attention. First, despite recording video of the mouse's face during experiments, no behavioral response to any odor is described. Is it possible these odors when presented in head-fixed conditions do not have the same valence?

      Yes, we agree that this is a possibility. We have added a discussion in the Limitations section (see lines 849-857). We have also added additional behavioral analysis discussed below.

      On trials with neural responses are there behavioral responses that could be quantified? 

      We have now added data in which we attempt to characterize their behavioral response, to look for correlations in odor representation (see lines 208-228). Although we did observe different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). One might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors (Figure S2A-D). Next, we examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any meaningful differences in odor responses (Figure S2E,F). Lastly, we acknowledge that the odor representation may be different in freely moving animals that exhibit dynamic responses to odor (see lines 849-857).

      - Habituation seems to play a prominent role in the neural signals, is there a larger contribution of valence if you look only at the first delivery (or some subset of the 20 presentations) of an odor type for a given trial? 

      Indeed, we considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Is it reasonable to exclude valence encoding as a possibility when largely neurons were unresponsive to the positive valence odors (2PE and peanut) chosen when looking at the average cluster response (Figure 1F)? 

      It is true that we see fewer neurons responding to the appetitive odors (Figure 1H) and smaller average responses within the cluster, but some neurons do respond robustly. If these were valence responses, we would predict that neural responses should be similarly selective, but we do not observe any such selectivity. The sparseness of responses to appetitive odors does cause the average cluster analysis (Figure 1F) to show muted responses to these odors, consistent with the decreased responsivity to appetitive odors. Moreover, single neuron response analysis reveals that a given neuron is not more likely to respond to appetitive or aversive odors with any selectivity greater than chance. For these reasons, we think it is reasonable to conclude an absence of valence responses, which is consistent with the conclusion from another report (Iurilli et al., 2017).

      - While the preference and aversion assay with 4 corners is an interesting set-up and provides a lot of data for this particular manuscript, it would be helpful to test additional behaviors to determine whether these circuits are more conserved. As it stands the current manuscript relies on very broad claims using a single behavioral readout. Some attempt to use head-fixed approaches with more defined odor delivery timelines and/or additional valenced behavioral readouts is warranted.

      We appreciate the suggestion, but are not able to perform these experiments at the moment. The 4-quadrant assay was chosen because it builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. The approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. Moreover, we did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways. Lastly, we have tried to define parameters for head-fixed behavior that would permit correlation of neural responses with behavior, including longer stimulations and closed-loop locomotion control of odor concentration, but were unsuccessful at establishing parameters that generated reliable behavioral responses. We acknowledge that one limitation of the study is the limited behavioral tests with two odors and whether the circuits are more broadly necessary for other odors.

      Minor comments:

      • Please define PID in the Results when it is first introduced.

      Done (see line 154)

      • Line 412 Figure S5C-N should be Figure S6C-N.

      Fixed. Now Figure S8C-N due to additional figures (see line 451).

      • Throughout the Discussion it would be helpful if the authors referred to specific Figure panels that support their statements (e.g. lines 654-656 "[...] which is supported by other findings presented here showing that both VGluT2+ and VGluT1+ neurons project to MeA, while the projection to NAc is almost entirely composed of VGluT1+ neurons").

      Thank you for the suggestion. We have added figure references in the discussion.

      • Line 778 "producing" should be "produce".

      Corrected (see line 840)

      • The figures are very busy, especially all the manipulations. The authors are commended for including each data point, but they might consider a more subtle design (translucent lines only for each animal, and one mean dot for the SEM), just to reduce the overall clutter of an already overwhelming figure set. But this is ultimately left to the authors to resolve and style to their liking. 

      Thank you for the suggestion. We have tried some different styles but like the original best.

      Reviewer #3 (Recommendations for the authors):

      If within reach, I suggest that the authors determine the percentage of neurons retrogradely labeled from NAc or MEA that express VGluT1 and VGluT2.

      We have done this for the middle region of plCoA, which has the greatest mixture of cell types (see Figure S10, lines 504-517). We find that the MeA-projecting neurons are mostly VGluT2+ with a minority that express both VGluT1 and VGluT2. NAc-projecting neurons are primarily VGluT1+ with about 20% expressing VGluT2 as well.

      It would also be nice to sparsely label aplCoA and pplCoA using ChR2 to see if sparse activation drives approach or avoidance.

      We agree that it would be useful to vary the sparseness of the ChR2 expression, to see if it produces similar results. We examined this using sparsely labeled odor ensembles, as previously done (Root et al., 2014). Briefly, we used the Arc-CreER mouse to label TMT-responsive neurons with a cre-dependent ChR2 AAV vector targeted to the anterior or posterior regions, while previously we had broadly targeted the entirety of plCoA. We had established that this labeling method captures about half of the active cells detected by Arc expression, which is on the order of hundreds of neurons rather than the thousands labeled by broad cre-independent expression. Remarkably, we get effects that are similar in magnitude to, and not significantly different from, those obtained with broader activation of the anterior or posterior domains (see new Figure S4, lines 267-288). It still remains possible that there is a threshold number of neurons that are necessary to elicit behavior, but that is beyond the scope of the current study. However, these data indicate that the effect of activating anterior and posterior domains is not an artifact of broad stimulation.

    1. eLife Assessment

      This is an important study with direct implications for the rational selection of antimalarial drug combinations. The authors present data demonstrating antagonism between 4-aminoquinoline antimalarials and peroxide drugs under physiologically relevant conditions, including robust effects at the trophozoite stage and for chloroquine at the ring stage. While the conclusions are based on in vitro assays and further work will be needed to fully resolve the underlying mechanism, the findings are convincing and provide a strong rationale for evaluating drug combinations in relevant preclinical models prior to clinical testing.

    2. Reviewer #1 (Public review):

      Summary:

      This study set out to investigate potential pharmacological drug-drug interactions between the two most common antimalarial classes, the artemisinins and quinolines. There is strong rationale for this aim, because drugs from these classes are already widely-used in Artemisinin Combination Therapies (ACTs) in the clinic, and drug combinations are an important consideration in the development of new medicines. Furthermore, whilst there is ample literature proposing many diverse mechanisms of action and resistance for the artemisinins and quinolines, it is generally accepted that the mechanisms for both classes involve heme metabolism in the parasite, and that artemisinin activity is dependent on activation by reduced heme. The study was designed to measure drug-drug interactions associated with a short pulse exposure (4 h) that is reminiscent of the short duration of artemisinin exposure obtained after in vivo dosing. Clear antagonism was observed between dihydroartemisinin (DHA) and chloroquine, which became even more extensive in chloroquine-resistant parasites. Antagonism was also observed in this assay for the more clinically-relevant ACT partner drugs piperaquine and amodiaquine, but not for other ACT partners mefloquine and lumefantrine, which don't share the 4-aminoquinoline structure or mode of action. Interestingly, chloroquine induced an artemisinin resistance phenotype in the standard in vitro Ring-stage Survival Assay, whereas this effect was not as extensive for piperaquine.

      The authors also utilised a heme-reactive probe to demonstrate that the 4-aminoquinolines can inhibit heme-mediated activation of the probe within parasites, which suggests that the mechanism of antagonism involves the inactivation of heme, rendering it unable to activate the artemisinins. Measurement of protein ubiquitination showed reduced DHA-induced protein damage in the presence of chloroquine, which is also consistent with decreased heme-mediated activation, and/or with decreased DHA activity more generally.

      Overall, the study clearly demonstrates a mechanistic antagonism between DHA and 4-aminoquinoline antimalarials in vitro. It is interesting that this combination is successfully used to treat millions of malaria cases every year, which may raise questions about the clinical relevance of this finding. However, the conclusions in this paper are supported by multiple lines of evidence and the data is clearly and transparently presented, leaving no doubt that DHA activity is compromised by the presence of chloroquine in vitro. It is perhaps fortunate that the clinical dosing regimens of 4-aminoquinoline-based ACTs have been sufficient to maintain clinical efficacy despite the non-optimal combination. Nevertheless, optimisation of antimalarial combinations and dosing regimens is becoming more important in the current era of increasing resistance to artemisinins and 4-aminoquinolines. Therefore, these findings should be considered when proposing new treatment regimens (including Triple-ACTs) and the assays described in this study should be performed on new drug combinations that are proposed for new or existing antimalarial medicines.

      Strengths:

      This manuscript is clearly written and the data presented is clear and complete. The key conclusions are supported by multiple lines of evidence, and most findings are replicated with multiple drugs within a class, and across multiple parasite strains, thus providing more confidence in the generalisability of these findings across the 4-aminoquinoline and peroxide drug classes.

      A key strength of this study was the focus on short pulse exposures to DHA (4 h in trophs and 3 h in rings), which is relevant to the in vivo exposure of artemisinins. Artemisinin resistance has had a significant impact on treatment outcomes in South-East Asia, and is now emerging in Africa, but is not detected using a 'standard' 48 or 72 h in vitro growth inhibition assay. It is only in the RSA (a short pulse of 3-6 h treatment of early ring stage parasites) that the resistance phenotype can be detected in vitro. Therefore, assays based on this short pulse exposure provide the most relevant approach to determine whether drug-drug interactions are likely to have a clinically-relevant impact on DHA activity. These assays clearly showed antagonism between DHA and 4-aminoquinolines (chloroquine, piperaquine, amodiaquine and ferroquine) in trophozoite stages. Interestingly, whilst chloroquine clearly induced an artemisinin-resistant phenotype in the RSA, piperaquine only had a minor impact on the early ring stage activity of DHA, which may be fortunate considering that piperaquine is a currently recommended DHA partner drug in ACTs, whereas chloroquine is not.

      The evaluation of additional drug combinations at the end of this paper is a valuable addition, which increases the potential impact of this work. The finding of antagonism between piperaquine and OZ439 in trophozoites is consistent with the general interactions observed between peroxides and 4-aminoquinolines, and it may be interesting to see whether piperaquine impacts the ring-stage activity of OZ439.

      The evaluation of reactive heme in parasites using a fluorescent sensor, combined with the measurement of K48-linked ubiquitin, further supports the findings of this study, providing independent read-outs for the chloroquine-induced antagonism.

      The in-depth discussion of the interpretation and implications of the results is an additional strength of this manuscript. Whilst the discussion section is rather lengthy, there are important caveats to the interpretation of some of these results, and clear relevance to the future management of malaria that require these detailed explanations.

      Overall, this is a high quality manuscript describing an important study that has implications for the selection of antimalarial combinations for new and existing malaria medicines.

      Weaknesses:

      This study is an in vitro study of parasite cultures, and therefore caution should be taken when applying these findings to decisions about clinical combinations. The drug concentrations and exposure durations in these assays are intended to represent clinically relevant exposures, although it is recognised that the in vitro system is somewhat simplified and there may be additional factors that influence in vivo activity. This limitation is reasonably well acknowledged in the manuscript.

      It is also important to recognise that the majority of the key findings regarding antagonism are based on trophozoite-stage parasites, and one must show caution when generalising these findings to other stages or scenarios. For example, piperaquine showed clear antagonism in trophozoite stages, but minimal impact in ring stages under these assay conditions.

      A key limitation is the interpretation of the mechanistic studies that implicate heme-mediated artemisinin activation as the mechanism underpinning antagonism by chloroquine. This study did not directly measure the activation of artemisinins. The data obtained from the activation of the fluorescent probe are generally supportive of chloroquine suppressing the heme-mediated activation of artemisinins, and I think this is the most likely explanation, but there are significant caveats to consider. Primarily, the inconsistency between the fluorescence profile in the chemical reactions and the cell-based assay raises questions about the accuracy of this readout. In the chemical reaction, mefloquine and chloroquine showed identical inhibition of fluorescence, whereas piperaquine had minimal impact. In contrast, in the cell, chloroquine and piperaquine had similar impacts on fluorescence, but mefloquine had minimal impact. This inconsistency indicates that the cellular fluorescence based on this sensor does not give a simple direct readout of the reactivity of ferrous heme, and therefore, these results should be interpreted with caution. Indeed, the correlation between fluorescence and antagonism for the tested drugs is a correlation, not causation. There could be several reasons for the disconnect between the chemical and biological results, either via additional mechanisms that quench fluorescence, or the presence of biomolecules that alter the oxidation state or coordination chemistry of heme or other potential catalysts of this sensor. It is possible that another factor that influences the H-FluNox fluorescence in cells also influences the DHA activity in cells, leading to the correlation with activity. It should be noted that H-FluNox is not a chemical analogue of artemisinins. Its activation relies on Fenton-like chemistry, but with an N-O rather than O-O bond, and it possesses very different steric and electronic substituents around the reactive centre, which are known to alter reactivity to different iron sources. Despite these limitations, the authors have provided reasonable justification for the use of this probe to directly visualise heme reactivity in cells, and the results are still informative.

      Another interesting finding that was not elaborated by the authors is the impact of chloroquine on the DHA dose-response curves from the ring stage assays. Detection of artemisinin resistance in the RSA generally focuses on the % survival at high DHA concentrations (700 nM), as there is minimal shift in the IC50 (see Fig 2). Chloroquine, however, clearly induces a shift in the IC50 (~5-fold), where the whole curve is shifted to the right, whereas the increase in % survival is relatively small. This different profile suggests that the mechanism of chloroquine-induced antagonism may be different to the mechanism of artemisinin resistance. Current evidence regarding the mechanism of artemisinin resistance generally points towards decreased heme-mediated drug activation due to a decrease in hemoglobin uptake, which should be analogous to the decrease in heme-mediated drug activation caused by chloroquine. However, these different dose response curves suggest different mechanisms are primarily responsible. Additional mechanisms have been proposed for artemisinin resistance, involving redox or heat stress responses, proteostatic responses, mitochondrial function, dormancy and PI3K signalling among others. Whilst the H-FluNox probe generally supports the idea that chloroquine suppresses heme-mediated DHA activation, it remains plausible that chloroquine could induce these, or other, cellular responses that suppress DHA activity.
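
      The distinction drawn above between a rightward IC50 shift and a change in % survival at a high dose can be illustrated with a simple Hill model. This is only a sketch: the function name `hill_survival`, the Hill slope, the IC50 values, and the survival floor are all hypothetical and are not taken from the manuscript. It shows that a 5-fold IC50 shift can leave survival at 700 nM very low, whereas a drug-insensitive plateau raises survival at 700 nM without moving the IC50.

```python
def hill_survival(dose_nm, ic50_nm, hill=2.0, floor=0.0):
    """Fraction surviving under a simple Hill model (illustrative only).

    floor is a hypothetical drug-insensitive fraction, i.e. a survival plateau.
    """
    if dose_nm < 0 or ic50_nm <= 0:
        raise ValueError("dose must be >= 0 and IC50 must be > 0")
    response = 1.0 / (1.0 + (dose_nm / ic50_nm) ** hill)
    return floor + (1.0 - floor) * response


# All parameter values below are hypothetical, chosen only for illustration.
baseline = hill_survival(700, ic50_nm=5)             # drug alone
shifted = hill_survival(700, ic50_nm=25)             # 5-fold rightward IC50 shift
plateau = hill_survival(700, ic50_nm=5, floor=0.10)  # unchanged IC50, 10% plateau

print(f"baseline: {baseline:.5f}, shifted: {shifted:.5f}, plateau: {plateau:.5f}")
```

      Under these assumed parameters, the shifted curve still predicts well under 1% survival at 700 nM, while the plateau model predicts just over 10%. That is the kind of difference in curve profile the paragraph above describes.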

      Impact:

      This study has important implications for the selection of drugs to form combinations for the treatment of malaria. The overall findings of antagonism between peroxide antimalarials and 4-aminoquinolines in the trophozoite stage are robust, and this carries across to the ring stage for chloroquine.

      The manuscript also provides a plausible mechanism to explain the antagonism, although future work will be required to further explore the details of this mechanism and to rule out alternative factors that may contribute.

      Overall, this is an important contribution to the field and provides a clear justification for the evaluation of potential drug combinations in relevant in vitro assays before clinical testing.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Rosenthal and Goldberg investigates interactions between artemisinins and their quinoline partner drugs currently used for treating uncomplicated Plasmodium falciparum malaria. The authors show that chloroquine (CQ), piperaquine, and amodiaquine antagonize dihydroartemisinin (DHA) activity, and in CQ-resistant parasites, the interaction is described as "superantagonism," linked to the pfcrt genotype. Mechanistically, application of the heme-reactive probe H-FluNox indicates that quinolines render cytosolic heme chemically inert, thereby reducing peroxide activation. The work is further extended to triple ACTs and ozonide-quinoline combinations, with implications for artemisinin-based combination therapy (ACT) design, including triple ACTs.

      Strengths:

      The manuscript is clearly written, methodologically careful, and addresses a clinically relevant question. The pulsing assay format more accurately models in vivo artemisinin exposure than conventional 72-hour assays, and the use of H-FluNox and Ac-H-FluNox probes provides mechanistic depth by distinguishing chemically active versus inert heme. These elements represent important refinements beyond prior studies, adding nuance to our understanding of artemisinin-quinoline interactions.

      Weaknesses:

      Several points warrant consideration. The novelty of the work is somewhat incremental, as antagonism between artemisinins and quinolines is well established. Multiple prior studies using standard fixed-ratio isobologram assays have shown that DHA exhibits indifferent or antagonistic interactions with chloroquine, piperaquine, and amodiaquine (e.g., Davis et al., 2006; Fivelman et al., 2007; Muangnoicharoen et al., 2009), with recent work highlighting the role of parasite genetic background, including pfcrt and pfmdr1, in modulating these interactions (Eastman et al., 2016). High-throughput drug screens likewise identify quinoline-artemisinin combinations as mostly antagonistic. The present manuscript adds refinement by applying pulsed-exposure assays and heme probes rather than establishing antagonism de novo.

      The dataset focuses on several parasite lines assayed in vitro, so claims about broad clinical implications should be tempered, and the discussion could more clearly address how in vitro antagonism may or may not translate to clinical outcomes. The conclusion that artemisinins are predominantly activated in the cytoplasm is intriguing but relies heavily on Ac-H-FluNox data, which may have limitations in accessing the digestive vacuole and should be acknowledged explicitly. The term "superantagonism" is striking but may appear rhetorical; clarifying its reproducibility across replicates and providing a mechanistic definition would strengthen the framing. Finally, some discussion points, such as questioning the clinical utility of DHA-PPQ, should be moderated to better align conclusions with the presented data while acknowledging the complexity of in vivo pharmacology and clinical outcomes.

      Despite these mild reservations, the data are interesting and of high quality and provide important new information for the field.

      Editor's Review of the Revision: The authors have provided a well-reasoned rebuttal to the comments of the three reviewers. Most of the changes were incorporated in their revised Discussion. Their data with the active heme probe H-FluNox are novel and the authors reveal interesting interactions between peroxide and 4-aminoquinoline-based antimalarials that open new avenues of research especially when considering antimalarial combinations that combine these chemical scaffolds. This study will be of broad interest to investigators studying and developing antimalarial drugs and combinations and the impact of Plasmodium falciparum resistance mechanisms. A minor recommendation would be that the authors state H-FluNox when referring to their small molecule probe in the abstract, so that it is captured in PubMed searches.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present an in vitro evaluation of drug-drug interactions between artemisinins and quinoline antimalarials, as an important aspect for screening the current artemisinin-based combination therapies for Plasmodium falciparum. Using a revised pulsing assay, they report antagonism between dihydroartemisinin (DHA) and several quinolines, including chloroquine, piperaquine (PPQ), and amodiaquine. This antagonism is increased in CQ-resistant strains in isobologram analyses. Moreover, CQ co-treatment was found to induce artemisinin resistance even in parasites lacking K13 mutations during the ring-stage survival assay. This implies that drug-drug interactions, not just genetic mutations, can influence resistance phenotypes. By using a chemical probe for reactive heme, the authors demonstrate that quinolines inhibit artemisinin activation by rendering cytosolic heme chemically inert, thereby impairing the cytotoxic effects of DHA. The study also observed negative interactions in triple-drug regimens (e.g., DHA-PPQ-Mefloquine) and in combinations involving OZ439, a next-generation peroxide antimalarial. Taken together, these findings raise significant concerns regarding the compatibility of artemisinin and quinoline combinations, which may promote resistance or reduce efficacy.

      With the additive profile as the comparison and a lack of synergistic effect in any of the comparisons, it is hard to contextualize the observed antagonism. Including a known synergistic pair (e.g., artemisinin + lumefantrine) would have provided a useful benchmark to assess the relative impact of the drug interactions described.

      Strengths:

      This study demonstrates the following strengths:

      • The use of a pulsed in vitro assay that is more physiologically relevant than the traditional 48h or 72h assays

      • Small molecule probes, H-FluNox, and Ac-H-FluNox to detect reactive cytosolic heme, demonstrating that quinolines render heme inert and thereby block DHA activation.

      • Evaluates not only traditional combinations but also triple-drug combinations and next-generation artemisinins like OZ439. This broad scope increases the study's relevance to current treatment strategies and future drug development.

      • By using the K13 wild-type parasites, the study suggests that resistance phenotypes can emerge from drug-drug interactions alone, without requiring genetic resistance markers.

      Weaknesses:

      • The study would benefit from a future characterization of the molecular basis for the observed heme inactivation by quinolines to support this hypothesis - while the probe experiments are valuable, they do not fully elucidate how quinolines specifically alter heme chemistry at the molecular level.

      • Suggestion of alternative combinations that show synergy could have improved the significance of the work. The in vitro study did not include pharmacokinetic/pharmacodynamic modeling; hence it leaves questions about how the observed antagonism would manifest under real-world dosing conditions, necessitating future work based on these findings.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      We appreciate the positive assessment. We recognize that since all of the work in this manuscript was done in vitro, there are reasonable concerns about the translatability of these data to clinical settings. These results should not directly inform malaria policy, but we hope that these data bring new considerations to the approach for choosing strategic antimalarial combinations. We have modified the manuscript to clarify this distinction.

      Public Reviews

      Reviewer #1 (Public Review):

      We thank the reviewer for their thoughtful summary of this manuscript. It is important to note that DHA-PPQ did show antagonism in RSAs. In this modified RSA, 200 nM PPQ alone inhibited growth of PPQ-sensitive parasites approximately 20%. If DHA and PPQ were additive, then we would expect that addition of 200 nM PPQ would shift the DHA dose response curve to the left and result in a lower DHA IC50. Please refer to Figure 4a and b as examples of additive relationships in dose-response assays. We observed no significant shift in IC50 values between DHA alone and DHA + PPQ. This suggests antagonism, albeit not to the extent seen with CQ. We have modified the manuscript to emphasize this point. As the reviewer pointed out, it is fortunate that despite being antagonistic, clinically used artemisinin-4-aminoquinoline combinations are effective, provided that parasites are sensitive to the 4-aminoquinoline. It is possible that superantagonism is required to observe a noticeable effect on treatment efficacy (Sutherland et al. 2003 and Kofoed et al. 2003), but that classical antagonism may still have silent consequences. For example, if PPQ blocks some DHA activation, this might result in DHA-PPQ acting more like a pseudo-monotherapy. However, as the reviewer pointed out, while our data suggest that DHA-PPQ and AS-ADQ are “non-optimal” combinations, the clinical consequences of these interactions are unclear. We have modified the manuscript to emphasize the latter point.

      While the Ac-H-FluNox and ubiquitin data point to a likely mechanism for DHA-quinoline antagonism, we agree that there are other possible mechanisms to explain this interaction. We have addressed this limitation in the discussion section. Though we tried to measure DHA activation in parasites directly, these attempts were unsuccessful. We acknowledge that the chemistry of DHA and Ac-H-FluNox activation is not identical and that caution should be taken when interpreting these data. Nevertheless, we believe that Ac-H-FluNox is the best currently available tool to measure “active heme” in live parasites and is the best available proxy to assess DHA activation in live parasites. These points are now addressed in the discussion section. Both in vitro and in parasite studies point to a role for CQ in modulating heme, though an exact mechanism will require further examination. Similar to the reviewer, we were perplexed by the differences observed between in vitro and in parasite assays with PPQ and MFQ. We proposed possible hypotheses to explain these discrepancies in the discussion section. Interestingly, our data correlate well with hemozoin inhibition assays in which all three antimalarials inhibit hemozoin formation in solution, but only CQ and PPQ inhibit hemozoin formation in parasites. In both assays, in-parasite experiments are likely to be more informative for mechanistic assessment.

      It remains unclear why K13 genotype influences RSA values, but not early ring DHA IC50 values. In K13<sup>WT</sup> parasites, both RSA values and DHA IC50 values were increased 3-5 fold upon addition of CQ. This suggests that CQ-mediated resistance is more robust than that conferred by K13 genotype. However, this does not necessarily suggest a different resistance mechanism. We acknowledge that in addition to modulating heme, it is possible that CQ may enhance DHA survival by promoting parasite stress responses. Future studies will be needed to test this alternative hypothesis. This limitation has been acknowledged in the manuscript. We have also addressed the reviewer’s point that other factors, including poor pharmacokinetic exposure, contributed to OZ439-PPQ treatment failure.

      Reviewer #2 (Public Review):

      We appreciate the positive feedback. We agree that there have been previous studies, many of which we cited, assessing interactions of these antimalarials. We also acknowledge that previous work, including our own, has shown that parasite genetics can alter drug-drug interactions. We have included the author’s recommended citations to the list of references that we cited. Importantly, our work was unique not only for utilizing a pulsing format, but also for revealing a superantagonistic phenotype, assessing interactions in an RSA format, and investigating a mechanism to explain these interactions. We agree with the reviewer that implications from this in vitro work should be cautious, but hope that this work contributes another dimension to critical thinking about drug-drug interactions for future combination therapies. We have modified the manuscript to temper any unintended recommendations or implications.

      The reviewer notes that we conclude “artemisinins are predominantly activated in the cytoplasm”. We recognize that the site of artemisinin activation is contentious. We were very clear to state that our data combined with others suggest that artemisinins can be activated in the parasite cytoplasm. We did not state that this is the primary site of activation. We were clear to point out that technical limitations may prevent Ac-H-FluNox signal in the digestive vacuole, but determined that low pH alone could not explain the absence of a digestive vacuole signal.

      With regard to the “reproducibility” and “mechanistic definition” of superantagonism, we observed what we defined as a one-sided superantagonistic relationship for three different parasites (Dd2, Dd2 PfCRT<sup>Dd2</sup>, and Dd2 K13<sup>R539T</sup>) for a total of nine independent replicates. In the text, we define that these isoboles are unique in that they had mean ΣFIC50 values > 2.4 and peak ΣFIC50 values >4 with points extending upward instead of curving back to the axis. As further evidence of the reproducibility of this relationship, we show that CQ has a significant rescuing effect on parasite survival to DHA as assessed by RSAs and IC50 values in early rings.
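
      As a rough illustration of the criteria described above, the sketch below computes ΣFIC50 for each fixed-ratio point using the standard definition (the IC50 of each drug in combination divided by its IC50 alone, summed over both drugs) and then applies the stated superantagonism thresholds (mean ΣFIC50 > 2.4 and peak ΣFIC50 > 4). The function names, the example values, and the `antagonism_cutoff` default of 1.25 are assumptions made for illustration. The sketch also ignores isobole shape, which the authors state they considered alongside the numerical values.

```python
from statistics import mean


def sum_fic50(ic50_a_combo, ic50_a_alone, ic50_b_combo, ic50_b_alone):
    """Sum of fractional inhibitory concentrations at the 50% effect level."""
    return ic50_a_combo / ic50_a_alone + ic50_b_combo / ic50_b_alone


def classify_isobole(sum_fics, antagonism_cutoff=1.25,
                     super_mean=2.4, super_peak=4.0):
    """Label one isobologram from its per-ratio sum-FIC50 values.

    antagonism_cutoff is an assumed, adjustable threshold; super_mean and
    super_peak follow the thresholds quoted in the response above.
    """
    avg, peak = mean(sum_fics), max(sum_fics)
    if avg > super_mean and peak > super_peak:
        return "superantagonistic"
    if avg > antagonism_cutoff:
        return "antagonistic"
    return "not antagonistic"


# Hypothetical values, for illustration only.
print(sum_fic50(10, 5, 50, 100))                               # 2.5
print(classify_isobole([1.5, 1.8, 1.6]))                       # antagonistic
print(classify_isobole([1.5, 1.8, 1.6], antagonism_cutoff=2))  # not antagonistic
print(classify_isobole([2.0, 4.5, 3.0]))                       # superantagonistic
```

      Keeping the antagonism cutoff as a parameter makes it easy to see how a given isobole would be labelled under the different thresholds (1.25, 1.5, or 2) used across studies.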

      Reviewer #3 (Public Review):

      We thank the reviewer for their positive feedback. We acknowledge that no combinations tested in this manuscript were synergistic. However, two combinations, DHA-MFQ and DHA-LM, were additive, which provides a reference point for interpreting antagonistic relationships. We have previously reported synergistic and additive isobolograms for peroxide-proteasome inhibitor combinations using this same pulsing format (Rosenthal and Ng 2021). These published results are now cited in the manuscript.

      We believe that these findings are specific to 4-aminoquinoline-peroxide combinations, and that these findings cannot be generalized to antimalarials with different mechanisms of action. Note that the aryl amino alcohols, MFQ and LM, were additive with DHA. Since the mechanism of action of MFQ and LM are poorly understood, it is difficult to speculate on a mechanism underlying these interactions.

      We agree with the reviewer that while the heme probe may provide some mechanistic insight to explain DHA-quinoline interactions, there is much more to learn about CQ-heme chemistry, particularly within parasites.

      The focus of this manuscript was to add a new dimension to considerations about pairings for combination therapies. It is outside the scope of this manuscript to suggest alternative combinations. However, we agree that synergistic combinations would likely be more strategic clinically.

      An in vitro setup allows us to eliminate many confounding variables in order to directly assess the impact of partner drugs on DHA activity. However, we agree that in vivo conditions are far more complex, and explicitly state this.

      We agree that in the future, modeling studies could provide insight into how antagonism may contribute to real-world efficacy. This is outside the scope of our studies.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      The key weaknesses identified in this manuscript are described in the 'weaknesses' section of the public review. The major one is the inconsistency around the H-FluNox response in the chemical vs biological experiments. I can't think of a simple experiment to resolve this issue, but it is good that this data is openly provided in the manuscript. I believe there could be more discussion to clarify this limitation with the current study, and the conclusions, and particularly the title, should be softened regarding the mechanism of antagonism being based on heme reactivity.

      We have softened the title and conclusions to take into account the limitations of our studies.

      (1) Please double-check the definitions for isobologram interpretation. In most antimicrobial interaction studies, I see the threshold for antagonism at sumFIC50 of 1.5, or even 2. 1.25 is often interpreted as additive in many studies.

      We acknowledge that different studies use various cutoff values. Our interpretations for additive versus antagonistic versus superantagonistic were based not only on mean ΣFIC50 values, but also isobologram shape. For example, the flat isoboles for MFQ-DHA were clearly distinct from the curved isoboles of PPQ-DHA. It is unclear what cutoff value(s) would be most clinically relevant.

      (2) For the MFQ-PPQ interaction study, please make it clear that these drugs have very long half-lives (weeks), so the 4 h pulse assay isn't really relevant to their overall activity. It probably shows a slower onset of action, but there is plenty of drug remaining for many days in the clinical scenario, so perhaps the data from the traditional 48h assay is more relevant. The same consideration applies to OZ439, which may impact the interpretation of that data.

      We have now included the half-lives of these compounds in the discussion section. Our intent was to use a pulsing format to make these isobolograms comparable with the other assays. It is important to note that pulses can reveal stronger phenotypes that might be missed with traditional methods. Thus, while 48 h assays may better mimic in vivo conditions, they could also mask important phenotypes.

      Reviewer #3 (Recommendations for the Authors):

      I have included most of my concerns in the public review. Below are some additional specific points for consideration:

      (1) It is expected to include a synergistic combination as a control (e.g., artemisinin + lumefantrine) to contextualize the degree of antagonism observed. The experimental design should show some synergistic profiles in comparison. Adding a few experiments by including a synergistic control is needed.

      Both MFQ-DHA and LM-DHA combinations were additive, which provides context for antagonistic combinations. This is now stated in the results section pertaining to Figure 1. We have also included a reference to our previous publication in which we demonstrated that proteasome inhibitor-peroxide combinations are synergistic to additive using this same pulsing format.

      (2) Consider in vivo validation or pharmacokinetic/pharmacodynamic modeling to strengthen the translational relevance of the findings when it comes to doses and the IC50 correlations.

      We agree that this would be useful to do in future, but it is outside the scope of the current study.

      (3) It would be beneficial to include a discussion section on how the findings are generalizable to different Plasmodium falciparum genotypes (3D7, Dd2, MRA-1284) and their relevance.

      Findings were consistent across three parasite backgrounds depending on PfCRT genotype. This point has been included in the discussion section. The background of these parasites is also provided in Table 1.

      (4) Potential evaluation criteria to understand where certain combinations should be reconsidered can be included as a suggestion for the wider audience.

      Our in vitro studies suggest that pulsing isobolograms would be a useful assay to include when evaluating combination therapies. While we believe that synergistic combinations would be more strategic than antagonistic combinations, we cannot provide evaluation criteria or make recommendations for reconsidering currently used combinations.

      (5) Further elaborate on the mechanistic basis of heme inactivation by quinolines. If data are available, please include more data on the specificity of the process.

      Despite our best efforts, we were unable to evaluate quinoline-heme interactions in parasites. Even in vitro, this interaction has remained elusive for decades. We agree that this would be an important future step towards supporting a specific mechanism for quinoline-DHA antagonism.

    1. eLife Assessment

      In this study, the authors identify EOLA1 as a novel mitochondrial protein required for mitochondrial translation and normal cardiac function. The characterization of the molecular role of EOLA1 is still incomplete, and additional controls will be necessary. Nevertheless, the identification of a novel factor critical for mitochondrial gene expression and oxidative phosphorylation will be useful for cell biologists working on mitochondrial dysfunction.

    2. Reviewer #1 (Public review):

      Summary:

      Mitochondria encode a small set of proteins that are made inside the organelle by specialized ribosomes. When this mitochondrial translation system fails, oxidative phosphorylation is impaired, an outcome that is particularly harmful to energy-demanding tissues such as the heart. In this manuscript, the authors use a targeted CRISPR/Cas9 screen in cultured cells grown on galactose (a condition that forces reliance on oxidative phosphorylation) to identify genes required for mitochondrial activity. They highlight EOLA1, previously studied mainly in inflammatory contexts, as a top candidate.

      Strengths:

      The authors present data suggesting that EOLA1 is imported into mitochondria via an N-terminal targeting sequence and resides in the mitochondrial matrix. Loss of EOLA1 reduces oxygen consumption and is associated with altered mitochondrial ultrastructure. Mechanistically, affinity purification suggests interaction with the mitochondrial elongation factor TUFM (mtEF-Tu), and RNA immunoprecipitation experiments enrich 12S mt-rRNA, consistent with a relationship to the small ribosomal subunit. Multiple assays, including sucrose-gradient profiling, reduced abundance of selected mtDNA-encoded proteins, and a click-chemistry labeling approach, support the conclusion that mitochondrial protein synthesis is decreased in EOLA1-deficient cells. Finally, whole-body Eola1 knockout mice show echocardiographic findings consistent with dilated cardiomyopathy and reduced levels of representative mitochondrially encoded proteins in cardiac tissue.

      How to interpret the work:

      The data support a role for EOLA1 in maintaining mitochondrial gene expression and oxidative phosphorylation capacity, and they plausibly implicate mitochondrial translation.

      Weaknesses:

      The main caveat is that the study does not yet establish how EOLA1 acts, whether it directly modulates translation elongation through TUFM, whether it is primarily required for mitoribosome biogenesis/rRNA stability, or whether it influences translation indirectly through mitochondrial stress pathways. The in vivo phenotype is intriguing, but without tissue-specific deletion/rescue and deeper cardiac pathology/mitochondrial functional measurements, it remains uncertain how directly the heart phenotype reflects a cardiomyocyte-autonomous defect in mitochondrial translation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify a previously uncharacterised regulator of mitochondrial function using a genetic screen and propose a role for this protein in supporting mitochondrial protein production. They provide evidence that the protein localises to mitochondria, interacts with components of the mitochondrial translation machinery, and is required for normal heart function in an animal model.

      Strengths:

      A major strength of the work is the use of multiple independent approaches to assess mitochondrial activity and protein production, which together provide support for the central conclusions. The in vivo data linking loss of this factor to impaired heart function are particularly compelling and elevate the relevance of the study beyond a purely cell-based context.

      Weaknesses:

      Given prior reports placing this protein outside mitochondria, its mitochondrial localisation would benefit from more rigorous and quantitative validation, and the proposed mechanism of the interaction with the mitochondrial translation machinery remains only partially explored. In addition, the physiological analysis is largely limited to the heart, leaving open questions about how broadly this pathway operates across tissues.

      Major comments:

      (1) Evidence for mitochondrial localization of EOLA1
      EOLA1 has previously been reported as a nuclear and cytosolic protein and is not annotated in MitoCarta 3.0, making rigorous validation of its mitochondrial localization particularly important. Although the authors provide several lines of evidence, interpretation is complicated by the use of different cell lines across localization, interaction, and functional experiments. Greater consistency in the cellular models used would strengthen the conclusions. The immunofluorescence analysis of tagged EOLA1 would also benefit from quantification across more cells and the inclusion of an additional mitochondrial marker (e.g., an outer membrane marker such as TOM20), as HSP60 staining can vary with mitochondrial state.

      (2) Normalization of OCR measurements
      Clarification of how Seahorse oxygen consumption rate measurements were normalized (e.g., cell number or protein content) would aid interpretation, particularly given potential effects of Eola1 loss on cell growth.

      (3) Linking interaction data to functional phenotypes
      Loss-of-function analyses are performed in mouse cell lines, whereas localization and interactome studies are conducted in human HEK293T cells. The absence of a human EOLA1 knockout model makes it difficult to directly connect the interaction data to the observed functional phenotypes. Additional validation or discussion of species conservation would improve clarity.

      (4) Mechanistic interpretation of the EOLA1-TUFM-12S rRNA interaction
      The identification of TUFM and 12S mt-rRNA as EOLA1 interactors is an interesting finding; however, the basis for prioritizing TUFM among the many mitochondrial proteins identified in the interactome is not fully explained. Providing enrichment statistics and functional categorization of mitochondrial interactors would increase transparency. In addition, the proposed role of the ASCH domain in RNA binding would be strengthened by structure-informed or mutational analysis of the conserved RNA-binding motif.

      (5) Interpretation of mitochondrial translation and protein abundance data
      Several assays supporting impaired mitochondrial translation would benefit from additional controls and quantification. The de novo mitochondrial translation assay (Fig. 3h) is not quantified, making it difficult to assess the magnitude and reproducibility of the effect. In addition, western blots showing reduced levels of mitochondrially encoded OXPHOS subunits (Figure 3g) lack a mitochondrial loading control (e.g., TOM20 or VDAC). Since loss of EOLA1 may affect mitochondrial mass, normalization to a mitochondrial marker is necessary. Relatedly, it would be informative to assess whether steady-state levels of mitoribosomal proteins (e.g., MRPS15, MRPL37) and nuclear-encoded OXPHOS subunits are altered upon Eola1 loss, both in knockout cell lines and in the knockout mouse.

      (6) Physiological scope of the in vivo analysis
      The cardiac phenotype observed in the whole-body Eola1 knockout mouse is compelling, but the focus on a single tissue limits interpretation of EOLA1's broader physiological role. Examination of additional high-energy-demand tissues would help clarify whether the observed effects are heart-specific or more general. In addition, the presence of residual EOLA1 protein bands in western blots (Figure 4a) and remaining Eola1 transcripts in qRT-PCR analyses (Extended Figure 4e) from knockout tissues should be addressed. The authors should clarify whether these signals reflect incomplete knockout, alternative isoforms, antibody cross-reactivity, or technical background.

      (7) Relationship to previously reported MT2A interaction
      Given prior reports of EOLA1 interaction with MT2A, a brief comment on whether MT2A was detected in the authors' co-immunoprecipitation experiments and how this relates to the proposed mitochondrial role would be useful.
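
      To make the concern in comment (2) concrete, the sketch below shows how an apparent difference in raw oxygen consumption rate (OCR) can disappear once readings are normalized to protein content. The function name `normalize_ocr`, the units, and every number are hypothetical and are not taken from the manuscript. The same arithmetic applies if cell number is used as the denominator instead.

```python
def normalize_ocr(ocr_pmol_per_min, protein_ug):
    """Return OCR per microgram of protein for one well (illustrative only)."""
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return ocr_pmol_per_min / protein_ug


# Hypothetical wells: the knockout well has fewer cells, hence less protein.
wt_raw, wt_protein = 200.0, 20.0
ko_raw, ko_protein = 120.0, 12.0

wt_norm = normalize_ocr(wt_raw, wt_protein)
ko_norm = normalize_ocr(ko_raw, ko_protein)

print(f"raw OCR ratio KO/WT: {ko_raw / wt_raw:.2f}")
print(f"normalized OCR ratio KO/WT: {ko_norm / wt_norm:.2f}")
```

      In this made-up case, the raw readings suggest a 40% reduction in the knockout, but the normalized values are identical. This is why the normalization method matters when a knockout also affects cell growth.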

    4. Reviewer #3 (Public review):

      The authors identified EOLA1 in a CRISPR/Cas9 screen for essential mitochondrial genes in a mouse B16-F10 cell line; however, no information on the library used for this screen or the list of all identified essential genes is provided. What was the p-value for EOLA1 in Figure 1b?

      The authors show that EOLA1 is indeed a mitochondrial protein (using both mouse and human cell lines). It is valuable that the authors use different cell lines to investigate the function of this protein; however, this also presents a challenge, as four different cell lines (two mouse and two human) are used across individual experiments, with no consistency between them. Knock-out (KO) experiments were performed in mouse cell lines only, and human cell lines were used in overexpression experiments, in which EOLA1 was tagged with FLAG-HA. It would be beneficial if a knock-out were also generated in a human cell line to confirm the effect on the expression of mitochondria-encoded proteins, along with a rescue experiment in which the EOLA1 protein is reintroduced into KO cells.

      Functional analysis of EOLA1: The authors performed affinity immunoprecipitation of FLAG-HA-tagged EOLA1 from stably overexpressing cells, and identified 202 co-immunoprecipitating proteins, of which 71 were known mitochondrial proteins; however, no list of these proteins is provided. Why did the authors choose TUFM? Were any mitochondrial ribosomal proteins co-immunoprecipitated, if EOLA1 is suggested to regulate translation? Were levels of TUFM affected in EOLA1-KO cells?

      The authors continued to analyze mitochondrial ribosomes using sucrose gradient fractionation and in vitro mitochondrial translation. However, there are several technical problems with the presented data: It has been established that mitochondrial ribosomes do not form polysomes in mammalian cells but rather perform translation as monosomes. The authors indirectly confirm this: almost no 12S or 16S rRNA (Fig. 3f) or MRP proteins (Extended data 3c) are present in "polysome" fractions. Although 12S and 16S rRNAs are indeed decreased in monosome fractions, the levels of mRNAs are not different between KO and WT cells, and neither is the migration of mitochondrial ribosomal proteins. As there is no loading control provided for the sucrose gradient blots (such as SDHA, VDAC), it is not possible to assess the overall levels of mitochondrial ribosomes. The gel presented for mitochondrial translation is of poor quality, as it is impossible to identify any of the expected 13 polypeptides. Although the intensity of the signal is weaker for KO, so is the intensity in the portion of the Coomassie-stained gel. A better-quality gel and quantification need to be provided to support the claims.

      What is the difference between endogenous and exogenous RIP-qPCR? EOLA1 pulled down 12S rRNA without cross-linking (Figure 3d) or with UV-crosslinking (Figure 3e); however, both 12S and 16S rRNAs were enriched in UV-crosslinked cells (Figure 3c) and by UV-RIP-seq (Extended data 3b; although no control is provided here). No discussion is offered for this observation. Is it possible that EOLA1 plays a role in the maturation of the mito-ribosome, rather than translation? Does EOLA1 co-migrate with the mito-ribosome on sucrose gradients?

      Altogether, there is insufficient evidence to support the conclusion that EOLA1 plays a role in mitochondrial translation.

      To investigate the biological function of EOLA1, the authors created a whole-body EOLA1-/- mouse that exhibited no overall developmental abnormalities but presented with abnormal cardiac function. This is an ideal model to confirm prior observations in cellular models; however, apart from one western blot for three mitochondria-encoded subunits, no other experiments were provided (such as measurements of 12S or 16S rRNA levels, TUFM levels, ribosome profiles, mitochondrial translation, OXPHOS assembly, or respirometry).

      In Figure 2 g-i: TEM images are presented, but the method is not described, nor is any information on the cells used provided, nor is it clear how the circularity was determined. KO cells certainly look abnormal; however, are the authors sure that the indicated structures are mitochondria? They rather resemble autophagosomes/lysosomes with lamellar inclusions.

    1. Reviewer #3 (Public review):

      This work by Du et al. addresses a critical problem in cryo-electron microscopy. To date, there are few ways of generating phase contrast during cryo-EM imaging while remaining in focus. Cryo-EM practitioners today must generate contrast by collecting out-of-focus exposures, a process that introduces aberrations in the resulting image data. Recent work has shown that standing wave lasers are capable of using the ponderomotive effect to shift the phase of electrons in transmission electron microscopy to generate in-focus phase contrast imaging for cryo-EM. A limitation of this 'laser phase plate' is the high laser power required, which can damage optical mirrors and necessitate stringent laser safety measures. Thus, alternative approaches are needed for phase contrast imaging in cryo-EM.

      In this manuscript, Du et al. exploit their expertise in ultrafast electron microscopy to explore the ability to shift the phase of electrons using pulsed electrons and lasers. The motivation for exploring pulsed laser phase plates stems from the fact that femtosecond pulses from 9 W lasers can generate extremely high peak power (as much as the standing-wave laser phase plate, > 1 gigawatt) at the back focal plane. If successful, this type of instrument will likely be much more affordable and easier to deploy worldwide.
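
      To illustrate why a modest average power can correspond to a gigawatt-scale peak power, the sketch below divides the average power by the repetition rate to get the pulse energy, then divides that energy by the pulse duration. Only the 9 W figure comes from the text above. The 100 kHz repetition rate and 50 fs pulse duration are hypothetical values chosen for illustration, and the rectangular-pulse approximation is an assumption as well; none of these are parameters reported by the authors.

```python
def pulse_energy_j(average_power_w: float, repetition_rate_hz: float) -> float:
    """Energy carried by each pulse: average power spread over the pulses per second."""
    return average_power_w / repetition_rate_hz


def peak_power_w(energy_j: float, pulse_duration_s: float) -> float:
    """Peak power under a rectangular-pulse approximation (energy / duration)."""
    return energy_j / pulse_duration_s


average_power_w = 9.0        # from the review text
repetition_rate_hz = 100e3   # ASSUMPTION: hypothetical repetition rate
pulse_duration_s = 50e-15    # ASSUMPTION: hypothetical pulse duration

energy = pulse_energy_j(average_power_w, repetition_rate_hz)
peak = peak_power_w(energy, pulse_duration_s)

print(f"pulse energy: {energy * 1e6:.1f} uJ")
print(f"peak power:   {peak / 1e9:.2f} GW")
```

      With these assumed values the peak power lands above 1 GW; a different repetition rate or pulse duration would scale the result proportionally.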

      The work outlined here shows a proof of principle, demonstrating that electron packets in an ultrafast scanning electron microscopy beam at 30 kV can be phase-shifted by 430 radians (about 24,637 degrees), far more than the approximately 1.57 radians (90 degrees) needed for phase contrast imaging. The data presented do not use any biological samples; instead, the authors measure the spread of the electron beam on a test sample to assess their ability to target pulsed lasers onto electron packets and to quantify the electron spread (which relates to the phase shift). They also took their system a step further to measure how changes in laser power affect performance, and they show that the system can be stable for 10+ hours.
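
      The unit conversions quoted above can be checked with a short sketch. It is illustrative only, and the function and variable names are assumptions. Note that a 90-degree shift corresponds to π/2, or roughly 1.57 radians.

```python
import math


def radians_to_degrees(phase_rad: float) -> float:
    """Convert a phase shift from radians to degrees."""
    return phase_rad * 180.0 / math.pi


reported_shift_rad = 430.0        # peak phase shift quoted in the review
target_shift_rad = math.pi / 2    # quarter-wave (90-degree) shift for phase contrast

print(f"{reported_shift_rad:.0f} rad = {radians_to_degrees(reported_shift_rad):.0f} degrees")
print(f"{target_shift_rad:.3f} rad = {radians_to_degrees(target_shift_rad):.0f} degrees")
print(f"reported / target = {reported_shift_rad / target_shift_rad:.0f}x")
```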

      The only weaknesses relate to the broad readability of the text. Improved textual clarity will help ensure a wider readership.

      Overall, this work is an important step toward developing lower-cost alternatives to the standing-wave laser phase plate.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors present the development and characterization of a pulsed ponderomotive phase plate for transmission electron microscopy (TEM). The primary goal is to overcome the long-standing challenge of generating stable, tunable phase contrast for weakly scattering biological specimens - a capability that has remained elusive despite decades of development. While the commercially available Volta Phase Plate offers phase enhancement, it suffers from a lack of control and stability. More recent efforts have focused on continuous-wave (CW) laser phase plates; however, these systems face significant practical hurdles, including extreme optical power requirements, thermal instability of mirrors, and the necessity for high-finesse optical cavities that act as diffraction gratings for the electron beam. The authors aim to demonstrate that a pulsed, free-space laser interaction can circumvent these limitations, offering a more robust path toward practically usable phase plates.

      Strengths:

      The most significant strength of this work is the elegant use of a free-space pulsed interaction, which fundamentally simplifies the hardware requirements compared to cavity-based designs. By utilizing a high-intensity pulsed laser focus rather than a standing wave inside a resonator, the authors eliminate the need for complex locking feedback loops and avoid the thermal mirror deformation that currently limits CW systems.

      Furthermore, this approach provides a critical theoretical advantage regarding image quality. Current CW cavity-based designs must grapple with the Kapitza-Dirac effect, where the standing wave creates a diffraction grating that generates unwanted "ghost images," delocalizing the signal. Recent proposals have had to resort to complex crossed-beam geometries to mitigate these artifacts. In contrast, the traveling-wave nature of the pulsed interaction described here inherently avoids the creation of a standing wave grating, thereby eliminating ghost images entirely without requiring elaborate compensation strategies.

      The authors successfully demonstrate a proof-of-concept implementation, reporting a pronounced peak phase shift of approximately 430 radians and a stable angular deflection of the electron beam. The stability data, covering a 10-hour period, suggests that this approach is robust enough for data collection sessions typical in structural biology.

      Weaknesses:

      However, the strength of the evidence is modestly tempered by limitations in data presentation and analysis. The agreement between the experimental data and the theoretical simulation in Figure 2b is imperfect; the simulation underestimates the depth of the central signal trough. While the authors acknowledge this "muted" prediction, the discrepancy suggests that the theoretical model or the estimation of experimental parameters (such as electron beam size or laser intensity) requires refinement to fully describe the interaction.

      While the authors claim stability over many hours, the data in Figure 3c reveal a significant drift in the baseline reference signal. Although attributed to a weakening electron beam, this drift complicates the reader's ability to assess the true stability of the laser-induced phase shift. A drift-corrected analysis would have provided more compelling evidence of the "stable angular kick" described.

      Despite these specific weaknesses in data presentation, the work represents a fundamental step forward. The authors have effectively demonstrated that the trade-off between beam current and spatiotemporal resolution (driven by space-charge effects) can be managed to achieve significant phase modulation. By moving the field away from the tight constraints of optical cavities and toward free-space pulsed interactions, this work establishes a potentially more viable route for integrating laser phase plates into routine biological imaging workflows. This study will be of high value to biophysicists and microscopists seeking to push the boundaries of contrast in cryo-EM.

    3. Reviewer #1 (Public review):

      Summary:

      Du et al. studied the interaction of ultrashort electron and laser pulses inside a scanning electron microscope (SEM), aiming to build a foundation for pulsed laser phase plate electron microscopy, in which the contrast of cryo samples can be significantly increased. The authors modified a commercial SEM to accommodate optics that introduce a laser beam into the instrument to overlap with the electron beam, and performed multiple experiments to characterize the electron-light interaction, notably reaching an extremely high phase shift of >400 rad. Moreover, the authors built a theoretical model for this interaction and estimated the laser beam parameters needed to reach a 90-degree phase shift in transmission electron microscopy (TEM).

      Strengths:

      The conclusion on the interaction of the electron pulses and laser pulses is well described and supported by the experiment.

      The presented instrument can serve as a great tool for studying fundamental interactions of electrons with extremely intense light pulses.

      Weaknesses:

      The authors motivate the project by using the pulsed electron beam with a phase shift for improving the contrast in cryo-EM, and while they indicate the low current in UEM, they do not discuss the limitations of the laser beam properties.

      Thus, even for 1 ps electron pulses with a repetition rate of 100 GHz (duty cycle of 10%), they will need to use 100 GHz laser pulses with pulse energies of at least ~1 µJ (the lowest pulse energy reported in the simulations in Figure 4), which would mean that ~10 kW of optical power needs to enter the electron microscope and be dumped somewhere after leaving the instrument. This significantly complicates the system and, in my view, makes it harder to use a pulsed laser phase plate in cryo-EM, due to either a low acquisition rate at lower repetition rates or the extreme difficulty of operating a multi-kW ultrafast laser system.
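
      As a rough check of the scaling in this estimate, average optical power is pulse energy times repetition rate, and duty cycle is pulse duration times repetition rate. The sketch below is illustrative only: it uses the figures quoted above, and the function and variable names are assumptions rather than anything taken from the manuscript. Taken at face value, 1 µJ per pulse at 100 GHz works out to roughly 100 kW, while ~10 kW would correspond to about 0.1 µJ per pulse at the same rate. Either way, the requirement sits in the multi-kilowatt regime that the comment describes.

```python
def duty_cycle(pulse_duration_s: float, repetition_rate_hz: float) -> float:
    """Fraction of time the beam is 'on': pulse duration times pulses per second."""
    return pulse_duration_s * repetition_rate_hz


def average_power_w(pulse_energy_j: float, repetition_rate_hz: float) -> float:
    """Average optical power: energy per pulse times pulses per second."""
    return pulse_energy_j * repetition_rate_hz


pulse_duration_s = 1e-12      # 1 ps electron pulse (from the comment)
repetition_rate_hz = 100e9    # 100 GHz (from the comment)
pulse_energy_j = 1e-6         # ~1 uJ (from the comment)

dc = duty_cycle(pulse_duration_s, repetition_rate_hz)
p_avg = average_power_w(pulse_energy_j, repetition_rate_hz)

# Pulse energy that would give 10 kW of average power at the same repetition rate.
energy_for_10kw = 10e3 / repetition_rate_hz

print(f"duty cycle:            {dc:.0%}")
print(f"average power:         {p_avg / 1e3:.0f} kW")
print(f"pulse energy for 10 kW: {energy_for_10kw * 1e6:.2f} uJ")
```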

      I would also expect the unscattered electron beam diameter to be <1 micron, which would significantly change the plot in Figure 4b for the 300 keV electron beam.

      Adding experimental parameters for a typical cryo-EM experiment with the pulsed phase plate, including the repetition rate, electron pulse duration, number of electrons per pulse, electron beam size, and the parameters of the laser beam (wavelength, laser pulse duration, pulse energy), will help readers better understand technical requirements for the proposed cryo-EM experiments.

    4. eLife Assessment

      This important study introduces a pulsed laser phase plate that generates stable phase contrast in electron microscopy, offering a practical alternative to continuous-wave designs that suffer from optical instabilities and diffraction artifacts. The experimental results demonstrate a controllable and stable electron phase shift, and the evidence supporting the feasibility of this approach for phase-contrast electron microscopy is convincing. Clarifying the agreement between experiment and theory and further elaborating on possible applications would strengthen the manuscript.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

      [Editors' note: the authors have responded to the previous round of recommendations.]

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

      Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study generated 3D cell constructs (i.e., assembloids) that were treated with hormones to induce a 'window of implantation' (WOI) state. While the authors have made large efforts to address the reviewers' feedback, the study's findings remain unconvincing and incomplete.

      (1) The authors have appropriately revised the terminology from 'organoids' to 'assembloids' in several parts of the manuscript. However, this revision remains incomplete, as the main title, figure legends, and figure titles still contain the incorrect term. A thorough review of the entire manuscript is recommended to ensure consistent and accurate use of terminology.

      Thank you for your meticulous review. We have now conducted a full check and confirmed that terminology is used consistently and accurately throughout the text.

      (1) Previous comments raised concerns about the feasibility of robustly passaging assembloid structures - comprising epithelial, stromal and immune cells - under epithelial growth conditions. The authors responded by stating that they optimized the expansion medium with a stromal cell-promoting factor. Additionally, rather than conducting scRNA-seq on both early and late passages (P6-P10) as suggested, they performed immunofluorescence staining, which confirmed the persistence of stromal cells at passage 6. However, the presence of immune cells was not addressed. Confirmation of their presence is essential for all further claims. Moreover, a more zoomed-out view of the immunostaining would help clarify the overall cellular composition across the entire well and facilitate comparison with corresponding brightfield images.

      Whole-mount immunofluorescence of the 6th-generation assembloids revealed that CD45<sup>+</sup> immune cells surrounded FOXA2<sup>+</sup> glands, with a more zoomed-out view provided.

      Author response image 1.

      Whole-mount immunofluorescence showed that CD45<sup>+</sup> cells (immune cells) were arranged around the glandular spheres that were FOXA2<sup>+</sup>. Scale bar = 50 μm (left) and 30 μm (right).

      In their response, the authors mention using the first three passages to ensure optimal cell diversity and viability. However, the manuscript states that 'assembloids derived from the first generation are used for experiments' (line 106). This discrepancy must be clarified.

      Thank you for your suggestion. We have revised the relevant content to “The assembloids derived from the first three generation are used for experiments” (Line 90-91).

      (2) The authors have made a commendable effort to bring more focus to the manuscript, which has improved readability.

      We thank you for your insightful suggestions, which have greatly improved the quality of our manuscript.

      (3) The "embryo implantation" part remains very unconvincing. How did authors define "the blastoids could grow within the endometrial assembloids and interact with them"? What did they mean with "grow"? Did blastoids further differentiate? Normally, blastoids cannot further "grow". "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified. Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images. In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      (1) What did they mean with "grow"? Did blastoids further differentiate?

      On the one hand, volume and morphology undergo continuous dynamic changes; on the other hand, only the inner cell mass and trophectoderm exist at the blastocyst stage, with the ICM further differentiating into OCT4<sup>+</sup> epiblast and GATA6<sup>+</sup> hypoblast.

      (2) "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified.

      The definition of "survival rate" is as follows: morphologically, the blastocoel remains non-collapsed and the cell boundaries are distinct (with no obvious cell detachment); molecularly, the markers of epiblast, hypoblast and trophectoderm are expressed. The survival rate is calculated as the ratio of viable embryoids to the total number of embryoids.

      (3) Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images.

      The criteria for determining interaction include not only attachment between the blastoids and assembloids observed via brightfield images, but also their sustained tight adhesion against external mechanical perturbations (e.g., medium replacement, immunostaining procedures).

      (4) In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      You are absolutely correct. In vivo, the embryo indeed makes initial contact with the apical side of the epithelial cells. The introduction of the blastoid co-culture model herein is intended to demonstrate that these receptive endometrial assembloids can better support blastoid growth and development.
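
      The survival-rate definition given in response (2) above can be sketched as a simple calculation. This is a minimal illustration, not the authors' analysis code: the class, field, and function names are hypothetical, and the example observations are made up purely to show the arithmetic. An embryoid counts as viable only if it meets both morphological criteria and expresses all three lineage markers.

```python
from dataclasses import dataclass


@dataclass
class BlastoidObservation:
    """Hypothetical per-embryoid scoring record (field names are assumptions)."""
    blastocoel_non_collapsed: bool
    cell_boundaries_distinct: bool
    epiblast_marker: bool
    hypoblast_marker: bool
    trophectoderm_marker: bool


def is_viable(obs: BlastoidObservation) -> bool:
    """Viable only if every morphological and molecular criterion is met."""
    return all((
        obs.blastocoel_non_collapsed,
        obs.cell_boundaries_distinct,
        obs.epiblast_marker,
        obs.hypoblast_marker,
        obs.trophectoderm_marker,
    ))


def survival_rate(observations: list[BlastoidObservation]) -> float:
    """Ratio of viable embryoids to the total number of embryoids."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(is_viable(o) for o in observations) / len(observations)


# Made-up example: three embryoids meet all criteria, one has a collapsed blastocoel.
example = [
    BlastoidObservation(True, True, True, True, True),
    BlastoidObservation(True, True, True, True, True),
    BlastoidObservation(False, True, True, True, True),
    BlastoidObservation(True, True, True, True, True),
]
print(f"survival rate: {survival_rate(example):.2f}")
```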

      (4) Previous comments highlighted the absence of distinct shifts in gene expression profiles between SEC assembloids and WOI assembloids, which contrasts with findings from primary endometrial tissue reported by Wang et al. (2020). While the authors have expanded their analysis using the Mfuzz algorithm and identified changes in mitochondria- and cilia-associated genes, the manuscript still lacks evidence of significant transcriptional changes in key WOI marker genes, as described in Wang et al. This discrepancy must be addressed and discussed in greater depth to clarify the biological relevance of their model.

      The endometrium in vivo involves complex crosstalk among multiple cell types and is tightly regulated by the hypothalamic-pituitary-ovarian (HPO) axis, thus exhibiting distinct shifts in gene expression during the peri-implantation period.

      In our in vitro model, alterations in mitochondria- and cilia-related genes were observed, which to a certain extent demonstrates that these window of implantation (WOI) assembloids possess receptive-phase characteristics and can be employed to investigate WOI-associated scientific questions or conduct in vitro drug screening.

      However, substantial efforts are still required to optimize the current model for fully recapitulating the dynamic changes in endometrial gene expression across different phases in vivo, and this aspect is further addressed in the Limitations section of our discussion (Line 342-353).

      “However, our WOI endometrial assembloids also exhibit some limitations. It is undeniable that the assembloids cannot perfectly replicate the in vivo endometrium, which comprises functional and basal layers with a greater abundance of cell subtypes, under superior regulation by hypothalamic-pituitary-ovarian (HPO) axis. Specifically, stromal and immune cells are challenging to stably passage, and their proportion is lower than in the in vivo endometrium. While the in vivo peri-implantation period exhibits intricate gene expression dynamics driven by systemic regulation, our models only partially recapitulate these changes, primarily in mitochondria- and cilia-associated genes. Nevertheless, to some extent, these WOI assembloids possess receptivity characteristics and can be utilized for investigating receptivity-related scientific questions or conducting in vitro drug screening. Further refinements are required to fully simulate the dynamic endometrial gene expression patterns across all menstrual cycle stages. We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.”

      (5) In the authors' response document, they present data integrating their results with those of Garcia Alonso et al. (2021). However, these integrated analyses are not included in the revised manuscript (which should be, if answering a major concern).

      Thanks for your valuable suggestions. We have now integrated the findings of Garcia Alonso et al. (2021) into the revised manuscript (Line 132) and Figure S2E–F.

      (8) Fig 2D: The authors have clarified that CD45+ staining is used. However, they have not yet adapted the typo in the figure legend of the right picture.

      Thanks for your thorough review. The left panel of Figure 2D is stained with CD45 to label immune cells, while the right panel is stained with CD44. These details have been clearly indicated in both the manuscript and the figure legend.  

      (9) All quantification analyses (as described in the authors' response document) should be clearly described in the Materials & Methods section.  

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104, Line 110-111, Line 241-244).

      (10) The authors have provided clarification regarding their method for quantifying immunofluorescence staining (e.g., OLFM4 expression in Fig. 3C) in their response document. However, these methodological details are not included in the revised manuscript. It is important that such information is incorporated into the manuscript itself to ensure transparency and reproducibility for others.

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104).

      (13) The authors' response to the comment about literature showing the opposite of an increased number of cilia during the WOI should be included in the Discussion section of the paper.

      We appreciate your suggestions. The relevant content has now been added to the Discussion section (Lines 319–323).

      (14) In the authors' response, they explain the difference between pinopodes and microvilli. They should include this explanation briefly in the manuscript. Moreover, Fig. 3F lacks a picture of cilia structure in CTRL condition. In addition, the structures that are indicated as cilia with an orange arrow seem to not be attached to the endometrial cells (anymore). It would be useful to show another more representative picture for the cilia.

      (1) Thank you for your valuable suggestions. The distinction between pinopodes and microvilli has now been added to the Supporting Materials and Methods section (Line 230-236).

      (2) You are probably referring to Figure 2F—we did not observe ciliary structures in the CTRL group.

      (3) The cilia structure was visualized via transmission electron microscopy (TEM), which requires ultrathin sectioning. Thus, the cilia shown in the image correspond to a single cross-section of the captured assembloids. Owing to technical limitations, three-dimensional visualization of cilia on the cells cannot be achieved.

      (17) The results on co-culturing blastoids with the WOI assembloids is not convincing. The blastoids are exposed to the basolateral side of the endometrial epithelial cells, while in vivo, blastocysts interact with the apical side of the endometrial epithelial cells first (apposition and attachment), followed by invasion into the endometrium. This means that the interaction shown here is not physiological. Therefore, it is not justified to say that this platform holds promise to investigate maternal-fetal interactions.

      We agree with your perspective that discrepancies exist between this model and the physiological processes in vivo. However, such differences do not negate the scientific value of the model.

      The core merit of this study lies in the successful establishment of co-culture systems for blastoids and WOI assembloids. Notably, genuine cross-talk occurs between the two components, thereby providing a practical and operational tool for subsequent research.

      Although the current contact orientation differs from that observed in vivo, future optimization of the cell culture protocol (via modulation of cell polarity) will enable the model to better recapitulate physiological conditions. Therefore, the innovation and operability of this model within specific research contexts still render it a robust platform for investigating maternal-fetal interactions.

      Overall, it is highly recommended that the authors carefully review the manuscript for grammatical errors, inconsistencies and issues with scientific phrasing. The language throughout the text requires substantial editing to improve clarity, readability and precision. 

      We appreciate your suggestions. A full manuscript check was performed to rectify grammatical errors, inconsistencies, and inappropriate scientific phrasing, with further language refinement by a native English-speaking specialist.

      Fig 1A: This overview is unclear. How many days do the assembloids grow before being stimulated with hormones? Are CTRL assembloids only kept in culture until day 2 and SEC and WOI assembloids until day 8? This is also not clear from the Materials and Methods section. Should be clarified.

      Thanks for your valuable suggestions. We have now updated the overview (Figure 1A) and Materials and Methods section (Lines 370-371 and 379-381).

      “Hormonal treatment was initiated following the assembly of the endometrial assembloids (about 7-day growth period).”

      “The CTRL group was cultured in ExM without hormone supplementation and subjected to parallel culture for 8 days along with the two aforementioned groups.”

      Fig 1B: From these brightfield images, it appears that the size of the assembloids remains relatively consistent from Day 0 to Day 3 and up to Day 11 (especially in CTRL). However, in Fig S1A, the assembloids on Day 11 appear significantly larger compared to those on Day 2 (or Day 4). Authors should clarify this discrepancy (since both of the figures are shown as "brightfield of endometrial assembloids").

      You are probably referring to the observation that the assembloids at Day 11 in Fig. S1A are smaller in size than those at Day 2 (or Day 4) in Fig. 1B. This discrepancy arises because the time points in Fig. 1B are calculated starting from the initiation of hormone treatment for the SEC and WOI groups, rather than from the beginning of the overall culture as in Fig. S1A. In addition, assembloids exhibit size variability during the same culture period due to individual heterogeneity.

      To eliminate ambiguity, we have now labeled “Hormone Day 0, Day 2, Day 8” in Fig. 1B and revised the corresponding figure legend to read: “Endometrial assembloids from the CTRL, SEC, and WOI groups, which were subjected to hormone treatment on Days 0, 2, and 8, exhibited comparable growth patterns throughout the culture period.”

      Fig 2G: authors still used the description "organoids" here instead of "assembloids".

      We appreciate your careful review. Corrections have been made accordingly.

      Fig. 3C: For the OLFM4 staining quantification, in the Y-axis authors wrote "proportion of OLFM4 (+) cells (OLFM4 (+)/total", but in the rebuttal letter they mention "its fluorescence intensity (quantified as mean grey value) was significantly stronger in both the SEC and WOI groups compared to the CTRL group". This is confounding and should be clarified.

      We apologize for incorrectly writing "fluorescence intensity" in the rebuttal letter; the correct term should be the "proportion of OLFM4 (+) cells (OLFM4 (+)/total)" as shown in Fig. 3C.

      Fig 5D: Acetyl-α-tubulin is the marker of ciliated cells and should be expressed in the cilia instead of the whole cells. It is very strange to quantify as "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" to assess the cilia. Please clarify.

      Thank you for your insightful comment. To clarify, the ratio "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" was calculated within individual acetyl-α-tubulin⁺ ciliated cells. Acetyl-α-tubulin fluorescence was normalized to the DAPI signal of the same cell nucleus, not the whole cell population. This corrected for variations in cell number and staining efficiency to ensure data accuracy.
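
      The per-cell normalization described in this response can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function name `normalized_intensity`, the input layout (one pair of mean intensities per ciliated cell), and the example values are all assumptions made for the sketch.

```python
from statistics import mean


def normalized_intensity(cells):
    """Return the acetyl-alpha-tubulin / DAPI ratio for each ciliated cell.

    `cells` is a list of (tubulin_mean, dapi_mean) pairs, one per cell.
    This input layout is a hypothetical choice for the sketch.
    """
    ratios = []
    for tubulin_mean, dapi_mean in cells:
        if dapi_mean <= 0:
            continue  # skip cells without a usable nuclear signal
        ratios.append(tubulin_mean / dapi_mean)
    return ratios


# Made-up intensities for three ciliated cells (illustrative values only).
cells = [(120.0, 60.0), (90.0, 45.0), (150.0, 50.0)]
ratios = normalized_intensity(cells)
group_mean = mean(ratios)  # the per-group value is the mean of per-cell ratios
```

      Taking the ratio inside each cell before averaging means that a field with more nuclei does not, by itself, raise the reported value.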

      Fig 5F: it is very bizarre that unciliated epithelium was transformed from ciliated epithelium, and CTRL was transformed from SEC and WOI. Should be clarified and discussed.

      Pseudotime analysis sorts discrete cells along a "pseudotime axis" based on similarities and differences in cellular gene expression, thereby simulating cell state transitions.

      Ciliated epithelium → unciliated epithelium: During the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase. Here, we demonstrate the transition of ciliated cells to unciliated cells from the SEC and WOI stages to the CTRL stage.

      Notably, the two cell types coexist, and what is presented here merely reflects a transformation trend. Related content has been incorporated into the Discussion section (Lines 319-321).

      “Throughout the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase.”

      Fig 5H: To show "enhanced invasion ability", authors must provide some quantification and statistical analysis. It is very hard to see the difference between the CTRL and SEC regarding ROR2Wnt5A.

      We appreciate your suggestion. Quantification and statistical analysis have been added to Figure 5H.

      Fig 6A: please elaborate the "mIVC1" and "mIVC2" in the figure legends.

      Additions have been made to the figure legends accordingly, as follows: "mIVC1: modified In Vitro Culture Medium 1; mIVC2: modified In Vitro Culture Medium 2."

      Fig S1D: Is the PAS staining also done in CTRL assembloids? In addition, it is stated that the assembloids secrete glycogen because of a positive PAS staining, while it could also be neutral mucins, glycoproteins, etc, which are all detected by PAS staining. So, the authors should be more careful in stating that it is glycogen, or a PAS staining with diastase digestion should be done.

      The PAS staining results for the CTRL group are presented in Fig. S1I. In addition, results of PAS staining with diastase digestion are included in Figure S1.

      Line 120: references?

      The reference has been added accordingly.

      Line 178: The term 'Endometrial Receptivity Test (ERT)' is used. Do the authors mean Endometrial Receptivity Analysis (ERA) test? ERA is the commonly used abbreviation for this test. Moreover, the authors describe ERA as 'a kind of gene analysis-based test.' This should be rephrased more scientifically correct.

      Thank you for your valuable suggestion. We have revised the term to ERA, and modified the phrase "a kind of gene analysis-based test" to "gene expression profiling-based diagnostic assay" (Lines 160–163).

      “We performed Endometrial Receptivity Analysis (ERA), a gene expression profiling-based diagnostic assay that integrates high-throughput sequencing and machine learning to quantify the expression of endometrial receptivity-associated genes.”

      Line 83: assemblies → assembloids

      We appreciate your suggestion. The text has been updated to “the endometrial assembloids progressed from epithelial organoids, to assemblies of epithelial and stromal cells and then to stem cell-laden 3D artificial endometrium”.

      The Materials and Methods section currently lacks the needed details. Authors should substantially expand this section to clearly describe all experimental and analytical procedures, including, among others, immunofluorescence staining, quantification methods, bioinformatics analyses and statistical approaches. Providing comprehensive methodological information is essential.

      A detailed description of these methods is provided in the Supporting Materials and Methods section.

      Reviewer #2 (Recommendations for the authors): 

      The revised manuscript is much improved in clarity, focus, and experimental support. The authors have thoughtfully addressed the major concerns from the previous review. In particular, the logic and flow of the paper are clearer: it now guides the reader through the rationale (constructing a WOI model), the comparative analysis against in vivo tissue and simpler organoids, and the key features that distinguish the WOI assembloid. The added functional validation (especially the blastoid co-culture experiment) significantly strengthens the work by showing a tangible outcome of "receptivity" beyond molecular profiling. The distinction between the standard secretory-phase organoid and the WOI assembloid is now more convincing, as the authors highlight several specific differences in morphology (more cilia, pinopodes), metabolism, and implantation success that favor the WOI model. The manuscript also reads more cleanly, with the bioinformatic sections condensed to the most important findings (excess detail was trimmed or moved to supplements) and the rationale for gene/pathway selection explicitly stated.

      The manuscript has been significantly strengthened through the addition of functional assays (like the blastoid co-culture), clearer transcriptomic and proteomic data, and detailed analyses of hormone treatments, cilia biology, and stromal and immune cell behavior in early passages. These updates confirm that the WOI assembloid supports embryo attachment and outperforms standard secretory organoids, while integrating external references and clarifications on terminology. Minor suggestions remain, such as clarifying statistical significance and adding functional interpretations for certain observations, but overall, the manuscript is now more robust and biologically convincing.

      Remaining points for clarification: There are a few minor points that still merit attention:

      - Use of the Endometrial Receptivity Test (ERT): As previously mentioned, if the authors have ERT data for the SEC organoid group, including that information would further support the claim that the WOI assembloid is uniquely receptive. If not, it would be helpful to add a statement clarifying that the ERT was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.

      Thank you for your valuable suggestion. We have now supplemented the description in the Supporting Materials and Methods section (Lines 160–162) as follows: “ERA was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.”

      - Because the assembloids are created from primary tissue samples, it would be helpful to briefly comment on how consistent the findings were across different patient-derived samples. For example, did all biological replicates show similar expression of receptivity markers and comparable capacity to support blastoid attachment? Although this seems implied, including a sentence in the Methods or Results sections that specifies the number of donor lines tested would help readers assess the model's variability and reproducibility.

      We appreciate your advice. The relevant statement has been added to the Supporting Materials and Methods section (Lines 312-313).

      “All biological replicates (fourteen individuals) of endometrial assembloids show similar expression of receptivity markers and comparable capacity to support blastoid attachment.”

      - The authors mention promising future directions, such as integrating 3D printing and microfluidics to further enhance the model, which is an excellent forward-looking statement. It would also be valuable to suggest the inclusion of additional cell types, like more robust immune cell populations or endothelial components, as future improvements to create an even more comprehensive model of the endometrial lining.

      Thank you for your valuable suggestion. 3D printing and microfluidics serve as approaches for introducing multiple cell types. We have supplemented the following statement in the manuscript: “We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.” (Lines 352-353).

      We are grateful for your valuable feedback and constructive criticism, which have helped us improve the quality of our work in terms of content and presentation. We have diligently revised the manuscript and made necessary changes. Here, we have attached the revised manuscript, figures, and all supplementary materials for your re-evaluation. Thank you again for your continued support and look forward to your favorable decision.

    1. eLife Assessment

      The authors developed a fundamental computational method, which is intended to automatically process bioluminescence imaging-derived tumour images across anatomical regions and over time. This allows quantitative analysis of such data, and the authors applied it to describe the spatiotemporal distribution of tumour cells in response to CD19-targeted CAR-T cells that contained either CD28 or 4-1BB costimulatory domains. Some operational limitations were identified, which relate to the pipeline's reliance on predefined regions of interest instead of aligning signal sites with anatomical information, scaling, and limitations in taking animal pose into account. Overall, the authors provide compelling evidence for the functionality of their computational approach towards automated analysis of bioluminescence imaging data, while applying it to a current topic of wide interest in cell therapy research.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Comments on revisions:

      (1) Clarification of 2D Analysis. We strongly recommend that the authors explicitly define maRQup as a 2D spatiotemporal analysis technique. Since optical imaging quantification is inherently dependent on tissue type and signal depth, characterizing this as a 3D or volumetric method without tomographic correction is inaccurate. Please precede "spatiotemporal" with "2D" throughout the text to ensure precision regarding the method's capabilities.

      (2) Data Validation and Scaling. Supplemental Figure g currently lacks the units necessary to support the assertion.

      Non-Uniform Growth: The authors' method implies that mouse growth is linear and uniform in all directions (isotropic). However, murine growth is not akin to the inflation of a balloon; animals elongate and widen at different rates. The current scaling does not account for these physiological non-linearities.

      Pose Variability: The scaling approach appears to neglect significant variability in animal positioning. Even under anesthesia, animal pose is rarely identical across subjects or time points.

      Requirement for Evidence: Without quantitative data, there appear to be significant differences between the individual images and the merged image. If the authors assert that this is a "classical setting" where mouse positioning is 100% consistent and growth curves are identical in multiple dimensions, please provide specific references that validate these assumptions. Otherwise, the scaling must be corrected to account for anisotropic growth and pose differences, or it must be stated that scaling was based on only one dimension.

      (3) Methodology of Spatial Regions. The manuscript does not currently indicate how the nine distinct spatial regions were determined. Please expand the methods section to include the specific segmentation algorithms or anatomical criteria used to define these regions, as this is critical for reproducibility.

    3. Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights on preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification.

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Comments on revisions:

      The critiques have been taken care of appropriately.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.
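
      The regional quantification summarized in point (2), average radiance per pixel within fixed regions of a size-normalized mouse image, can be sketched as follows. This is an illustration under stated assumptions and not the maRQup implementation: the function name `regional_mean_radiance` is hypothetical, and the 3 × 3 grid is only a stand-in, since the criteria used to define the nine anatomical regions are not given in this text.

```python
import numpy as np


def regional_mean_radiance(radiance, mouse_mask, n_rows=3, n_cols=3):
    """Mean radiance per mouse pixel in each cell of an n_rows x n_cols grid.

    radiance   : 2D float array (luminescent image, already size-normalized)
    mouse_mask : 2D bool array, True where a pixel lies on the mouse
    The regular grid is an assumption made for this sketch; regions that
    contain no mouse pixels are reported as NaN.
    """
    height, width = radiance.shape
    row_edges = np.linspace(0, height, n_rows + 1).astype(int)
    col_edges = np.linspace(0, width, n_cols + 1).astype(int)
    out = np.full((n_rows, n_cols), np.nan)
    for i in range(n_rows):
        for j in range(n_cols):
            rows = slice(row_edges[i], row_edges[i + 1])
            cols = slice(col_edges[j], col_edges[j + 1])
            block_mask = mouse_mask[rows, cols]
            if block_mask.any():
                # Average only over pixels that belong to the mouse.
                out[i, j] = radiance[rows, cols][block_mask].mean()
    return out
```

      Dividing by the number of mouse pixels in each region, rather than by the region area, keeps regions that are only partly covered by the animal comparable with fully covered ones.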

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      Our answer to this comment is included in the Supplemental Methods. The standard deviation of the mouse pixels was calculated to ensure that the image processing steps did not alter the shape or size of the mice. Such consistency is particularly striking because our dataset was accrued by nine lab members over the last five years, before we conceived and carried out our analysis (cf. answer to point #2). In fact, it is the very consistency of this IVIS measurement that led us to conceive our pipeline. As seen from Supplemental Figure 4G, there is minimal difference in the shape or size of the mice across 7,534 images. A total of 99 images were removed either due to being too slanted (91/7633, 1.2%) or due to processing errors (8/7633, 0.1%). Also, the vertical scaling was conducted while keeping the aspect ratio unchanged to prevent any non-anatomical scaling. Hence, we did not record any nonlinear growth of the mice that would warrant more convoluted alignment and/or batch correction for our images.
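
      The two operations mentioned in this response, vertical scaling with the aspect ratio held fixed and a pixel-wise standard deviation across the aligned mouse silhouettes, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the nearest-neighbour resampling, and the use of binary masks as input to the standard deviation are assumptions made for the sketch.

```python
import numpy as np


def scale_to_height(image, target_height):
    """Resize a 2D image to target_height, keeping the aspect ratio fixed.

    Nearest-neighbour sampling is used here only for simplicity; the
    interpolation used in the actual pipeline is not stated in this text.
    """
    height, width = image.shape
    factor = target_height / height  # the same factor is applied to both axes
    target_width = max(1, round(width * factor))
    rows = np.minimum((np.arange(target_height) / factor).astype(int), height - 1)
    cols = np.minimum((np.arange(target_width) / factor).astype(int), width - 1)
    return image[np.ix_(rows, cols)]


def pixelwise_std(masks):
    """Per-pixel standard deviation across a stack of same-sized binary masks.

    Values near 0 mean that every image agrees on whether the pixel is
    mouse or background; larger values mark where the silhouettes differ.
    """
    stack = np.stack(masks).astype(float)
    return stack.std(axis=0)
```

      Because one factor is applied to both axes, this kind of scaling cannot correct for animals that differ in width relative to length; it only brings them to a common height.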

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Reviewer #1 is correct that different mouse postures would be an issue when aligning the images and normalizing for size. However, all experiments are conducted for luminescence measurements in the IVIS system (i.e., this requires anesthesia and long integration time for imaging). In our experience and in our 1000+ mouse dataset, we noticed that all experiments (n=37) did place the anesthetized mice in a stretched/elongated position. Of note, these experiments were conducted by nine different researchers who were not instructed on how to place the mice on the machine for ideal image processing, thus showing that the standard protocol of imaging mice on IVIS does not introduce large variations in animal pose during image acquisition. We think the issue raised by Reviewer #1 is moot in the context of classical settings for mouse luminescence imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells, comprising CD28 or 4-1BB costimulatory domains. The conclusion highlights the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where initial is intended from the first day of therapy). The authors also determined the spatiotemporal analysis of tumor response to CAR T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates. 

      Weaknesses:

      The study does not specify how a) differences in mouse positioning (and whether they excluded not-aligned mice) and b) tumor spread at the start of therapy influenced their data. The study does not take into account the potential heterogeneity of CAR T cells in terms of CAR T expression or T cell immunophenotype (differentiation, exhaustion, fitness...).

      See answer #2 to Reviewer #1.

      Author response image 1.

      Author response image 1 shows the average tumor radiance on day zero (when CAR-T cell therapy was administered) for all mice. While there is some spread, most mice had tumor localized to the liver or bone marrow.

      Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification. 

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses: 

      No weaknesses were identified by this Reviewer. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this paper, the authors propose a significant advancement in optical image data analysis by employing automation. They effectively demonstrate the valuable insights that can be gained from analyzing extensive datasets with a more unbiased methodology. At present, I do not have any specific suggestions for improvement.

      However, it is important to note that this work is limited in its operational scope. Specifically, it relies on predefined ROIs rather than aligning the signal site with anatomical systems. The scaling model and image cropping are simplistic, animal pose is not taken into account, and the data output needs to be called semi-quantitative or qualitative, and would have been stronger utilizing an AI agent. Nevertheless, this work underscores the potential of automated systems in preclinical image analysis, which is a crucial step towards developing more sophisticated approaches to optical image data analysis.

      While our analysis used predefined ROIs, the maRQup pipeline allows users to manually draw ROIs on the mouse image.

      Reviewer #2 (Recommendations for the authors):

      The writing and presentation of data are clear and accurate, but some additional information should be added regarding the imaging protocol used to acquire the original data. 

      The authors mention fluorescence in Figure 1. I expected all the data to be generated from bioluminescent NALM-6 tumors, since bioluminescence is indeed measured in average radiance and can be per pixel (p/sec/cm2/sr/pixel). Fluorescence should be measured using radiance efficiency (p/sec/cm2/sr)/(µW/cm2), a unit that compensates for non-uniform excitation light pattern in the instrument. Would the author find different results if fluorescence data were analyzed separately?

      Reviewer #2 is correct that the unit for fluorescence would be radiance efficiency. The word “fluorescent” was included in the label of Figure 1a to highlight that our workflow could be applied to other types of light-generating methods (i.e., fluorescence vs. bioluminescence). However, in this study, measurements of bioluminescent tumors only were analyzed. If fluorescence measurements are to be analyzed, our methods of image acquisition and processing would be directly applicable.

      Did the author ever check the signal of the snout in mice with no tumor?

      In mice with no tumor, there is no detectable signal in the snout (or anywhere else, for that matter).

      The urine of mice contains phosphor, and might give a background signal, especially if longer exposure is used at the end of the study.

      For the mice with no tumor injection, the luminescence signal was below background (<10² p/sec/cm²/sr/pixel). In particular, we do not detect any signal in the bladder/urine. Additionally, as described in the Supplemental Methods and Figure 1b, only pixels that were on the mouse as determined from the brightfield image were used to calculate the tumor burden from the radiance of the luminescent image. This method ensures that any background signal (e.g., from phosphor in mouse urine) would be excluded in the radiance quantification and not bias the results.

      Additionally, as described in the Methods, the exposure time was held constant at 30 seconds for each IVIS measurement across all 37 experiments.
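
      The masking step described in this response, where only pixels identified as mouse in the brightfield image contribute to the radiance measurement, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the simple intensity threshold (mouse assumed brighter than the stage) is a stand-in for whatever segmentation the Supplemental Methods describe.

```python
import numpy as np


def mouse_mask_from_brightfield(brightfield, threshold):
    """Boolean mask of pixels taken to be on the mouse.

    Assumption for this sketch: the mouse appears brighter than the stage,
    so pixels above `threshold` are treated as mouse.
    """
    return brightfield > threshold


def mean_radiance_on_mouse(radiance, mask):
    """Average radiance over mouse pixels only; NaN if the mask is empty."""
    if not mask.any():
        return float("nan")
    return float(radiance[mask].mean())


# Tiny made-up example: two mouse pixels, two background pixels.
brightfield = np.array([[10.0, 200.0], [220.0, 5.0]])
radiance = np.array([[1e6, 2e3], [4e3, 9e9]])
mask = mouse_mask_from_brightfield(brightfield, threshold=100.0)
tumor_burden = mean_radiance_on_mouse(radiance, mask)
```

      In this toy example the two very bright off-mouse pixels do not enter the average, which is the behaviour the response relies on to exclude signal that lies outside the animal.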

      The data using more than 2 million cells comes from only 10 mice, and maybe the biological relevance of this group is limited since it will not be achievable and translatable in humans (PMID: 33653113).

      We appreciate Reviewer #2’s attention to this issue. The effect observed in our study is large enough to reach statistical significance despite the small number of mice. Note that the dosing regimen used was optimized for the murine NSG model and would require appropriate scaling before clinical application. Nonetheless, NSG mice remain the gold standard for pre‑clinical in vivo evaluation and their use is generally required by regulatory agencies, such as the FDA, for assessing novel CAR‑T cell therapies; thus these findings are relevant for advancing such treatments.

    1. eLife Assessment

      This valuable study presents a technically sophisticated intravital two-photon calcium imaging approach to characterize Ca²⁺ dynamics in distinct populations of meningeal macrophages in awake, freely behaving mice. These data are solid and suggest that meningeal macrophage calcium activity is tightly linked to anatomical sub-compartments, with potential implications for migraine and neuroinflammatory processes. Despite these strengths and broad relevance to neuroimmunology, several technical and interpretational issues limit the study, which could be addressed to strengthen this manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a technically sophisticated intravital two-photon calcium imaging approach to characterize meningeal macrophage Ca²⁺ dynamics in awake mice. The development of a Pf4Cre:GCaMP6s reporter line and the integration of event-based Ca²⁺ analysis represent clear methodological strengths. The findings reveal niche-specific Ca²⁺ signaling patterns and heterogeneous macrophage responses to cortical spreading depolarization (CSD), with potential relevance to migraine and neuroinflammatory conditions. Despite these strengths, several conceptual, technical, and interpretational issues limit the impact and mechanistic depth of the study. Addressing the points below would substantially strengthen the manuscript.

      Strengths:

      The use of chronic two-photon Ca²⁺ imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca²⁺ dynamics in the meninges.

      The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca²⁺ signaling properties. The identification of macrophage Ca²⁺ activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      By linking macrophage Ca²⁺ responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Weaknesses:

      The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca²⁺ signal interpretation.

      The manuscript offers an extensive characterization of Ca²⁺ event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca²⁺ activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca²⁺ dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      The GLM analysis revealing coupling between dural perivascular macrophage Ca²⁺ activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca²⁺ manipulation).

      The authors conclude that synchronous Ca²⁺ events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca²⁺ activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca²⁺ suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca²⁺ increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

    3. Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      I have only minor comments.

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this report wish to show that distinct populations of meningeal macrophages respond to cortical spreading depolarization (CSD) via unique calcium activity patterns depending on their location in the meningeal sub-compartments. Perivascular macrophages display calcium signaling properties that are sometimes in opposition to non-perivascular macrophages. Many of the meningeal macrophages also displayed synchronous activity at variable distances from one another. Other macrophages were found to display calcium signals in response to dural vasomotion. CSD could induce variable calcium responses in both perivascular and non-perivascular macrophages in the meninges, in part due to RAMP1-dependent effects. Results will inform future research on the calcium responses displayed by macrophages in the meninges under both normal and pathological conditions.

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Weaknesses:

      The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses: 

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, Pf4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the Pf4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, while we already discussed PF4 expression in meningeal megakaryocytes, in a revised version, we plan to discuss the possibility that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In a revised version, we plan to further acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation). 

      In the results section, we indicated that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, in our discussion, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling may regulate dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved. 

      Thank you for this suggestion. In the revision, we will indicate that the source of synchrony remains unresolved.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      While we propose that the decrease in macrophage calcium signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised version, we plan to elaborate on the possible implications of this finding.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions. 

      We plan to acknowledge these limitations.

      Reviewer #2 (Public review): 

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments. 

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text. 

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (5). We plan to provide this information in the revised manuscript.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We plan to add the method for inducing CSD (i.e., a pinprick in the frontal cortex) to the Results section and provide more background in the Introduction section.

      Reviewer #3 (Public review):

      Strengths: 

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included. 

      We plan to address these issues in the revision.

      References

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. eLife 9 (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv (2025).

      (5) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221 (2024).

    1. eLife Assessment

      This study provides valuable evidence that hepatic DHHC7-dependent palmitoylation is a physiologically relevant regulator of systemic metabolism, and that loss of DHHC7 disrupts Gαi palmitoylation, activates cAMP-PKA-CREB signaling, and increases hepatic transcription and secretion of Prg4. The identification of Prg4 as a hepatokine that is elevated in vivo, together with some in vitro evidence for its interaction with GPR146, represents a conceptually novel contribution to the field. However, the evidence linking these mechanisms to systemic lipolysis, liver-adipose tissue crosstalk, and whole-body metabolic physiology remains incomplete, as the phenotypic analyses rely on a limited set of experiments and do not yet fully support claims regarding adipose tissue dysfunction or altered lipid flux.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors' aim was to determine whether hepatic palmitoylation is a physiologically relevant regulator of systemic metabolism. The data demonstrate that loss of DHHC7 in hepatocytes disrupts Gαi palmitoylation, enhances cAMP-PKA-CREB signaling, and drives transcriptional upregulation and secretion of Prg4. The KO mice display increased body weight, fat mass, and plasma cholesterol, but at 12 weeks on HFD, do not exhibit insulin resistance. The potential mechanism underlying the metabolic phenotype was examined by assessing adipocyte signaling and by exploring whether Prg4 acts through GPR146. Through this pathway, the authors intend to link DHHC7-dependent palmitoylation to the regulation of hepatokines that exert systemic metabolic effects.

      Strengths:

      (1) Hepatic palmitoylation in systemic metabolic regulation is largely unexplored. The authors demonstrate the role of DHHC7 in vivo using a successful liver-specific knockout mouse model that causes HFD-dependent obesity without insulin resistance.

      (2) Several studies were performed on chow and HFD, as well as male and female mice.

      (3) Plasma proteomics identified Prg4 as a circulating factor elevated in KO mice. Prg4 overexpression phenocopied the KO mice.

      (4) There is solid mechanistic data supporting the hypothesis that hepatic DHHC7 loss selectively increases Prg4 secretion as a hepatokine.

      (5) There is convincing evidence for the DHHC7 mechanism in liver: DHHC7 controls cAMP-PKA-CREB via Gαi palmitoylation. The authors recognize that the palmitoylation change is causative rather than correlated, and this needs to be more fully explored in the future.

      (6) Strong in vitro data support that Prg4 acts through adipocyte GPR146 via its SMB domain.

      Weaknesses:

      (1) The assessment of liver and adipose tissue responses to DHHC7 loss is insufficient to support claims that it alters systemic lipolysis. In this new mouse model, liver histology is necessary, especially given the cholesterol increase in the KO. As this is a newly established mouse line, common assessments of the liver during HFD feeding would be important for interpreting the phenotype.

      (2) The data show DHHC7 loss causes adipose tissue dysfunction and alterations in lipid metabolism. Beyond that, I suggest not stating more regarding the phenotype of the DHHC7 mice for this work. A thorough analysis would be needed to determine which factor drives the obesity and changes in energy balance in the mice. For example, the KO mice had lower oxygen consumption (but no change in CO2 production, which is also usually similarly altered), suggesting a CNS component could drive obesity. However, since the data are not normalized for lean mass and there is no information about locomotor activity, this analysis is incomplete. RER may be informative if available. A broad conservative description of the KO phenotype would be more accurate since Prg4 has many paracrine targets and likely has autocrine signaling in the liver.

      (3) Most references to lipolysis or lipolysis flux systemically would be inaccurate. To suggest a suppression of lipolysis, serum NEFA would need to be measured, and in vivo or in vitro lipolysis assays performed to test the effect of DHHC7 loss or the specificity of PRG4 action on adipocytes in vivo. To demonstrate adipose tissue dysfunction, analysis of lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction should be performed/measured.

      (4) Line 179: The experiment showing that Prg4 does not affect p-CREB (Figure S8) was performed in brown adipocytes, yet it appears under the heading: "DHHC7 controls hepatic PKA-CREB activity through Gαi palmitoylation to regulate Prg4 transcription." Unless repeated using liver lysate, the conclusions stated in the text throughout the paper should be revised.

      (5) It appears that the serum and liver proteomics were only assessed for factors that increased in KO mice. Were proteins that were significantly decreased analyzed?

      (6) The beige adipocyte culture method is unclear. The methods do not describe the fat pad used, and the protocol suggests the cells would be differentiated into mature white adipocytes. If they are beige cells, a reference for the method, gene expression, and cell images could support that claim.

      (7) The use of tamoxifen can confound adipocyte studies, as it increases beigeing and weight gain even after a brief initiation period. Both groups were treated with Tam, but another way to induce Cre would be ideal.

      (8) Evidence for the lack of the glucose phenotype is incomplete. One reason could be due to the IP route of glucose administration, which has a large impact on glucose handling during a GTT. To confirm the absence of a glucose tolerance phenotype, an OGTT should be performed, as it is more physiological. In addition, the mice should be fed for 16 weeks. Prg4 affects immune cells, changing how adipose tissue expands, and 12 weeks of HFD feeding is often not long enough to see the effects of adipose tissue inflammation spilling over into the system.

      (9) There may be liver-adipose tissue crosstalk in KO mice, but this was not fully assessed in this study and would be difficult to determine in any setting, given the diverse cell types that are targets of Prg4. The crosstalk claim is unnecessary to share the basic premises; there is the DHHC7 mechanism/phenotype and the Prg4 mechanism/phenotype, and while there is no direct Prg4 adipose mechanism, the paper can be successfully reframed.

      (10) Although the DHHC7 loss on the chow diet did not result in a phenotype, did Prg4 increase in the KO mice on chow? This would determine whether either i) the expression of Prg4 is dependent on HFD/obesity, or ii) circulating Prg4 has effects only in an HFD condition. The receptors may also change on HFD, especially in adipocytes.

      Impact:

      This work would significantly contribute to the study of liver metabolism, provided it includes data describing the liver. The role of Prg4 in adipocytes and other cell types is of substantial value to the field of metabolism. By reframing the paper and conducting some key experiments, its quality and impact can be increased.

    3. Reviewer #2 (Public review):

      In the current report, Sun and colleagues sought to determine the liver-specific role that DHHC7, a DHHC palmitoyltransferase protein, plays in regulating whole-body energy balance and hepatic crosstalk with adipose tissues. The authors generated an inducible, liver-specific DHHC7 knockout mouse to determine how altered palmitoylation in hepatocytes alters hepatokine production/secretion, and in turn, systemic metabolism. The ablation of DHHC7 was found to alter the production of proteoglycan 4 (Prg4), a hepatokine previously linked to metabolic regulation. The authors propose that the change in Prg4 production is mediated by the loss of Gαi palmitoylation, due to DHHC7 ablation, thereby augmenting cAMP-PKA-CREB signaling in hepatocytes, which alleviates the 'brake' on Prg4 production. The authors further propose that Prg4 overexpression leads to excessive binding to GPR146 on adipocytes, which in turn suppresses PKA-mediated HSL activation, promoting impairments in lipolysis, leading to obesity. The report is interesting and generally well-written, but it appears to have some clear gaps in additional data that would aid in interpretation. The addition of confirmatory culture studies would be incredibly helpful for testing the hypotheses being explored. My comments, concerns, and/or suggestions are outlined below in no particular order.

      (1) Figures: All data should be presented in dot-boxplot format so the reader knows how many samples were analyzed for each assay and group. n=3 for some assays/experiments is incredibly low, particularly when considering the heterogeneity in responsiveness to HFD, food intake, etc.

      (2) Figure 1E-F: It is unclear when the food intake measure was performed. Mice can alter their feeding behavior based on a myriad of environmental and biological cues. It would also be interesting to show food intake data normalized to body mass over time. Mice can counterregulate anorexigenic cues by altering neuropeptide production over time. It is not clear if this is occurring in these mice, but the timing of measuring food intake is important. Additionally, the VO2 measure appears to be presented as being normalized to total body mass, when in fact, it would probably be more accurate to normalize this to lean body mass. Normalizing to total body mass provides a denominator effect due to excessive adiposity, but white fat is not as metabolically active as other high-glucose-consuming tissues. If my memory serves me right, several reports have discussed appropriate normalizations in circumstances such as this.

      (3) Figure 1J-N: It is not all that surprising that fasting glucose and/or TGs were found to be similar between groups. It is well-established that mice have an incredible ability to become hyperinsulinemic in an effort to maintain euglycemia and lipid metabolism dynamics. A few relatively easy assays can be performed to glean better insights into the metabolic status of the authors' model. First, fasting insulin concentrations will be incredibly helpful. Secondly, if the authors want to tease out which adipose depot is most adversely affected by ablation, they could take an additional set of CON and KO mice, fast them for 5-6 hours, provide a bolus injection of insulin (similar to that provided during an insulin tolerance test), and then quickly harvest the animals ~15 minutes after the insulin injection, followed by evaluation of AKT phosphorylation. This will really tell them if these tissues have impairments in insulin signaling. The gold-standard approach would be to perform a hyperinsulinemic-euglycemic clamp in the CON and KO mice. I now see GTT and ITT data, but the aforementioned assays could help provide insight.

      (4) Figure 3A: This looks overexposed to me.

      (5) Figures 3-4: It appears that several of these assays could be complemented with culture-based models, which would almost certainly be cleaner. The conditioned media could then be used from hepatocyte cultures to treat differentiated adipocytes.

      (6) Figure 4: It is unclear how to interpret the phospho-HSL data because the fasting state can affect this readout. It needs to be made clear how the harvest was done. Moreover, insulin and glucagon were never measured, and these hormones have a significant influence over HSL activity. I suspect the KO mice have established hyperinsulinemia, which would likely affect HSL activity. This provides an example of why performing some of these experiments in a dish would make for cleaner outcomes that are easier to interpret.

    4. Reviewer #3 (Public review):

      Summary:

      In the current manuscript, Sun et al. aimed to determine the metabolic function of hepatocyte DHHC7, one of the key enzymes in protein palmitoylation. They generated inducible liver-specific Dhhc7 knockout mice and discovered that Dhhc7-LKO mice are more prone to gain weight and develop adipose expansion and obesity. Via unbiased proteomic analysis, they identified PRG4 as one of the top secreted factors in the liver of Dhhc7-LKO mice. Hepatic overexpression of PRG4 recapitulates the obesity phenotype observed in Dhhc7-LKO mice. At the mechanistic level, PRG4, once secreted from the liver, can bind to GPR146 on adipocytes and inhibit PKA-HSL signaling and lipolysis. Taken together, their findings suggest a novel pathway by which the liver communicates with adipose tissue and impacts systemic metabolism.

      Strengths:

      (1) The systemic metabolic homeostasis depends on coordination among metabolically active tissues. Thus, active communication between the liver and adipose tissue when facing nutritional challenges (such as high-fat diet feeding) is crucial for achieving metabolic health. The concept that the liver can communicate with adipose tissue and impact the lipolysis process via secreted hepatokines is quite significant but remains poorly understood.

      (2) Hepatocyte Dhhc7 knockout mice developed a significant obesity phenotype, which is associated with adipose expansion.

      (3) Unbiased proteomic analysis identified PRG4 as one of the top secreted factors in the liver of Dhhc7-LKO mice. Hepatic overexpression of PRG4 recapitulates the obesity phenotype observed in Dhhc7-LKO mice.

      (4) In vitro cell-based assay showed that PRG4 can bind to adipocyte GPR146, inhibit PKA-mediated HSL phosphorylation, and subsequently, the lipolysis process.

      Weaknesses:

      (1) Lack of a causal-effect study to generate evidence directly linking hepatocyte DHHC7 and PRG4 in driving adipose expansion and obesity upon HFD feeding.

      (2) Lack of direct evidence to support that PRG4 inhibits adipocyte lipolysis via GPR146. A functional assay demonstrating adipocyte lipolysis is required.

      (3) The conclusion is largely based on correlative evidence.

    5. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The assessment of liver and adipose tissue responses to DHHC7 loss is insufficient to support claims that it alters systemic lipolysis. In this new mouse model, liver histology is necessary, especially given the cholesterol increase in the KO. As this is a newly established mouse line, common assessments of the liver during HFD feeding would be important for interpreting the phenotype.

      We will add liver histology data in the revised version.

      (2) The data show DHHC7 loss causes adipose tissue dysfunction and alterations in lipid metabolism. Beyond that, I suggest not stating more regarding the phenotype of the DHHC7 mice for this work. A thorough analysis would be needed to determine which factor drives the obesity and changes in energy balance in the mice. For example, the KO mice had lower oxygen consumption (but no change in CO2 production, which is also usually similarly altered), suggesting a CNS component could drive obesity. However, since the data are not normalized for lean mass and there is no information about locomotor activity, this analysis is incomplete. RER may be informative if available. A broad conservative description of the KO phenotype would be more accurate since Prg4 has many paracrine targets and likely has autocrine signaling in the liver.

      We will add CO2 production, locomotor activity, and RER data in the revised version.

      (3) Most references to lipolysis or lipolysis flux systemically would be inaccurate. To suggest a suppression of lipolysis, serum NEFA would need to be measured, and in vivo or in vitro lipolysis assays performed to test the effect of DHHC7 loss or the specificity of PRG4 action on adipocytes in vivo. To demonstrate adipose tissue dysfunction, analysis of lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction should be performed/measured.

      We will measure serum NEFA to test the effect of DHHC7 loss, and we will analyze lipogenesis markers, canonical markers of insulin sensitivity, and markers of mitochondrial dysfunction.

      (4) Line 179: The experiment was performed in brown adipocytes to show that Prg4 does not affect p-CREB (Figure S8) under the heading: "DHHC7 controls hepatic PKA-CREB activity through Gαi palmitoylation to regulate Prg4 transcription." Unless repeated using liver lysate, the conclusions stated in the text throughout the paper should be revised.

      Figure S8 demonstrates that Prg4 has no impact on forskolin-induced CREB phosphorylation at Ser133, providing evidence that Prg4 acts upstream of adenylyl cyclase. We will revise the description.

      (5) It appears that the serum and liver proteomics were only assessed for factors that increased in KO mice? Were proteins that were significantly decreased analyzed?

      We are analyzing the decreased proteins in a follow-up project.

      (6) The beige adipocyte culture method is unclear. The methods do not describe the fat pad used, and the protocol suggests the cells would be differentiated into mature white adipocytes. If they are beige cells, a reference for the method, gene expression, and cell images could support that claim.

      We will add a reference for the method, gene expression data, and cell images.

      (7) The use of tamoxifen can confound adipocyte studies, as it increases beigeing and weight gain even after a brief initiation period. Both groups were treated with Tam, but another way to induce Cre would be ideal.

      We will use a doxycycline-inducible system in the future.

      (8) Evidence for the lack of the glucose phenotype is incomplete. One reason could be due to the IP route of glucose administration, which has a large impact on glucose handling during a GTT. To confirm the absence of a glucose tolerance phenotype, an OGTT should be performed, as it is more physiological. In addition, the mice should be fed for 16 weeks. Prg4 affects immune cells, changing how adipose tissue expands, and 12 weeks of HFD feeding is often not long enough to see the effects of adipose tissue inflammation spilling over into the system.

      We will perform the OGTT and feed the mice for 16 weeks in the future.

      (9) There may be liver-adipose tissue crosstalk in KO mice, but this was not fully assessed in this study and would be difficult to determine in any setting, given the diverse cell types that are targets of Prg4. The crosstalk claim is unnecessary to share the basic premises; there is the DHHC7 mechanism/phenotype and the Prg4 mechanism/phenotype, and while there is no Prg4 adipose direct mechanism, the paper can be successfully reframed.

      We will reframe the paper.

      (10) Although DHHC7 loss on the chow diet did not result in a phenotype, did Prg4 increase in the KO mice on chow? This would determine whether either i) the expression of Prg4 is dependent on HFD/obesity, or ii) circulating Prg4 has effects only in an HFD condition. The receptors may also change on HFD, especially in adipocytes.

      We will measure Prg4 in the KO mice on a chow diet.

      Reviewer #2 (Public review):

      (1) Figures: All data should be presented in dot-boxplot format so the reader knows how many samples were analyzed for each assay and group. n=3 for some assays/experiments is incredibly low, particularly when considering the heterogeneity in responsiveness to HFD, food intake, etc.

      We will present the data in dot-boxplot format.

      (2) Figure 1E-F: It is unclear when the food intake measure was performed. Mice can alter their feeding behavior based on a myriad of environmental and biological cues. It would also be interesting to show food intake data normalized to body mass over time. Mice can counterregulate anorexigenic cues by altering neuropeptide production over time. It is not clear if this is occurring in these mice, but the timing of measuring food intake is important. Additionally, the VO2 measure appears to be presented as being normalized to total body mass, when in fact, it would probably be more accurate to normalize this to lean body mass. Normalizing to total body mass provides a denominator effect due to excessive adiposity, but white fat is not as metabolically active as other high-glucose-consuming tissues. If my memory serves me right, several reports have discussed appropriate normalizations in circumstances such as this.

      We will determine a more accurate normalization approach.
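      The denominator effect raised in this comment can be illustrated with a minimal sketch. The two mice and all of their values below are hypothetical, chosen only so that whole-animal VO2 and lean mass are identical while fat mass differs; they are not taken from the manuscript.

```python
def vo2_per_gram(vo2_ml_per_h, mass_g):
    """Oxygen consumption divided by the chosen mass denominator."""
    return vo2_ml_per_h / mass_g

# Hypothetical animals: same whole-animal VO2 and lean mass, different fat mass.
lean_mouse = {"vo2_ml_per_h": 100.0, "lean_g": 20.0, "fat_g": 5.0}
obese_mouse = {"vo2_ml_per_h": 100.0, "lean_g": 20.0, "fat_g": 20.0}

def normalized_to_total_mass(mouse):
    # Dividing by lean + fat mass penalizes the animal with more adipose tissue.
    return vo2_per_gram(mouse["vo2_ml_per_h"], mouse["lean_g"] + mouse["fat_g"])

def normalized_to_lean_mass(mouse):
    # Dividing by lean mass only removes the contribution of added fat.
    return vo2_per_gram(mouse["vo2_ml_per_h"], mouse["lean_g"])

for label, mouse in (("lean", lean_mouse), ("obese", obese_mouse)):
    print(label, normalized_to_total_mass(mouse), normalized_to_lean_mass(mouse))
```

      In this toy case the per-total-mass values differ (4.0 versus 2.5 ml/h/g) even though the per-lean-mass values are the same (5.0 ml/h/g), so an apparent drop in VO2 can arise from adiposity alone. Regression-based adjustment with lean mass as a covariate is another commonly discussed option and is not shown here.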

      (3) Figure 1J-N: It is not all that surprising that fasting glucose and/or TGs were found to be similar between groups. It is well-established that mice have an incredible ability to become hyperinsulinemic in an effort to maintain euglycemia and lipid metabolism dynamics. A few relatively easy assays can be performed to glean better insights into the metabolic status of the authors' model. First, fasting insulin concentrations will be incredibly helpful. Secondly, if the authors want to tease out which adipose depot is most adversely affected by ablation, they could take an additional set of CON and KO mice, fast them for 5-6 hours, provide a bolus injection of insulin (similar to that provided during an insulin tolerance test), and then quickly harvest the animals ~15 minutes after insulin injection, followed by evaluation of AKT phosphorylation. This will really tell them if these tissues have impairments in insulin signaling. The gold-standard approach would be to perform a hyperinsulinemic-euglycemic clamp in the CON and KO mice. I now see GTT and ITT data, but the aforementioned assays could help provide insight.

      We have AKT phosphorylation data and will add them in the revised version.

      (4) Figure 3A: This looks overexposed to me.

      We will replace it with a shorter exposure.

      (5) Figures 3-4: It appears that several of these assays could be complemented with culture-based models, which would almost certainly be cleaner. The conditioned media could then be used from hepatocyte cultures to treat differentiated adipocytes.

      We will perform cell culture experiments for Figures 3-4.

      (6) Figure 4: It is unclear how to interpret the phospho-HSL data because the fasting state can affect this readout. It needs to be made clear how the harvest was done. Moreover, insulin and glucagon were never measured, and these hormones have a significant influence over HSL activity. I suspect the KO mice have established hyperinsulinemia, which would likely affect HSL activity. This provides an example of why performing some of these experiments in a dish would make for cleaner outcomes that are easier to interpret.

      We will perform some of these experiments in cell culture.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) Lack of a causal-effect study to generate evidence directly linking hepatocyte DHHC7 and PRG4 in driving adipose expansion and obesity upon HFD feeding.

      We will perform a causal-effect study to test the hypothesis.

      (2) Lack of direct evidence to support that PRG4 inhibits adipocyte lipolysis via GPR146. A functional assay demonstrating adipocyte lipolysis is required.

      We will add the direct evidence in the revised version.

      (3) The conclusion is largely based on correlative evidence.

      We will perform experiments to strengthen the conclusion, based on a causal-effect study.

    1. eLife assessment

      The manuscript presents important findings with theoretical or practical implications beyond a single subfield. The work is overall solid, and the methods, data, and analyses broadly support the claims. Although the novelty of this study and the work put into it are appreciated, there are also clearly some weaknesses that should be addressed.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNciI). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      (e) The authors tested organoids at weeks 3, 4, 8, 15, and 28 in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNA-seq.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      (g) Suggested fixes/experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers)

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes/experiments

      Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.
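      The replicate structure and small-sample statistics requested in this section can be sketched as follows. This is only an illustration under stated assumptions: the readout values and differentiation labels are hypothetical, the exact permutation test stands in for whichever non-parametric test the authors choose, and none of the function names come from the manuscript.

```python
from itertools import combinations
from statistics import mean, stdev

def per_differentiation_means(measurements):
    """Collapse organoid-level values to one mean per independent differentiation."""
    return [mean(values) for values in measurements.values()]

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the absolute difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    total = extreme = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical readout: organoids grouped by differentiation batch.
wt = {"diff1": [1.0, 1.2, 1.1], "diff2": [0.9, 1.0], "diff3": [1.1, 1.3]}
ngd = {"diff1": [0.3, 0.4], "diff2": [0.2, 0.3, 0.25], "diff3": [0.35, 0.45]}

wt_means = per_differentiation_means(wt)
ngd_means = per_differentiation_means(ngd)
p_value = exact_permutation_p(wt_means, ngd_means)  # 2 of 20 splits, i.e. 0.1
effect = cohens_d(wt_means, ngd_means)
print(p_value, effect)
```

      Treating the differentiation, not the organoid or field of view, as the unit of analysis keeps N honest. With three differentiations per group, the smallest two-sided p-value an exact permutation test can return is 2/20 = 0.1 even when the groups are completely separated, which is one reason to report effect sizes and confidence intervals alongside p-values.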

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

    3. Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH⁺ neurons, GFAP⁺ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNciI). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript, so our study includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNciI</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate levels, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least three independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified the numbers of biological replicates and differentiations in the figure legends.

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified the numbers of biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, 2G; Fig. 3F, 3G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitor to mature, functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, satisfying the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because a pseudogene (PGBA) located 16 kb downstream of GBA1 shares 96-98% sequence similarity with it (Refs. #1, #2), which complicates precise editing. GBA1 (~7.6 kb) is longer than PGBA (~5.7 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation, and L444P was corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi:10.3389/fgene.2021.684067. PMCID: PMC8255797.
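
      The pseudogene problem described in the response above can be illustrated with a minimal sketch. The sequences below are invented toy strings, not real GBA1 or PGBA sequence, and the helper name `find_target_sites` is hypothetical. The sketch only checks the forward strand for an exact protospacer match followed by an NGG PAM, which is far simpler than a real guide-design tool.

```python
def find_target_sites(sequence: str, protospacer: str) -> list[int]:
    """Return 0-based start positions where the protospacer is followed by an NGG PAM.

    Forward strand and exact matches only (a deliberate simplification).
    """
    sites = []
    start = sequence.find(protospacer)
    while start != -1:
        pam_start = start + len(protospacer)
        pam = sequence[pam_start:pam_start + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(start)
        start = sequence.find(protospacer, start + 1)
    return sites


# Hypothetical 20-nt guide target and toy fragments (assumptions for illustration).
protospacer = "GACCTGAATCCGTACGATCA"
gene_fragment = "TTAC" + protospacer + "TGG" + "ACGT"
pseudogene_fragment = "CCGA" + protospacer + "AGG" + "TTGA"
mismatched_fragment = "TTAC" + "GACCTGAATCCGTACGATCT" + "TGG"  # last base differs

sites = {
    "gene": find_target_sites(gene_fragment, protospacer),
    "pseudogene": find_target_sites(pseudogene_fragment, protospacer),
    "mismatched": find_target_sites(mismatched_fragment, protospacer),
}
print(sites)
```

      When the same protospacer and PAM occur in both fragments, a guide cannot distinguish the gene from the pseudogene, which is the situation the authors describe for certain mutations.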

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c]. 

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes/experiments

      Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. In vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging, where CellVue Maroon fluorescence in the tumor persisted for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells, with flow cytometry quantification showing that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated into the cells for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C.-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 17. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead. 

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects. 

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects. 

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to address the incorporation of microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported once it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified biological replicates and differentiation in the figure legend [see response to Weaknesses (1)-b, (1)-c]. 

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated Statistical analysis in the methods as described below:

      “For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cut-offs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.”
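
      The decision rules quoted above can be written out as a short sketch. This is an illustration under stated assumptions, not the authors' analysis code (they report using GraphPad Prism 10). The function names `choose_test` and `exclude_outliers` are hypothetical, and the `all_normal` flag stands in for the outcome of a Shapiro-Wilk test run elsewhere.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=2.0):
    """Drop values lying outside mean +/- n_sd sample standard deviations."""
    if len(values) < 3:
        return list(values)  # too few points to estimate a spread
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def choose_test(group_sizes, all_normal, min_n=6):
    """Return the name of the test implied by the quoted Methods text.

    group_sizes: number of observations in each group being compared.
    all_normal:  True if every group passed a normality test (assumed to be
                 Shapiro-Wilk, performed outside this function).
    """
    if len(group_sizes) < 2:
        raise ValueError("at least two groups are required")
    if len(group_sizes) == 2:
        if min(group_sizes) >= min_n and all_normal:
            return "unpaired two-tailed Student's t-test"
        return "Mann-Whitney U test"
    if all_normal:
        return "one-way ANOVA with Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


# Hypothetical usage with made-up sample sizes and values.
print(choose_test([4, 4], all_normal=True))
print(choose_test([6, 7], all_normal=True))
print(choose_test([5, 5, 5], all_normal=False))
print(exclude_outliers([10, 11, 9, 10, 12, 10, 11, 30]))
```

      Keeping the selection logic in one place makes it straightforward to record, per figure panel, which test was applied and why.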

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      “Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.”
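
      The multilevel averaging described above (FOVs within an organoid, organoids within a differentiation, then differentiations as the biological replicates) can be sketched as follows. The counts, the dictionary layout, and the name `group_mean_sem` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev

# counts[differentiation][organoid] = per-FOV values (invented numbers)
counts = {
    "diff_1": {"org_a": [10, 12, 14], "org_b": [20, 22, 24], "org_c": [30, 32, 34]},
    "diff_2": {"org_a": [16, 18, 20], "org_b": [18, 20, 22], "org_c": [20, 22, 24]},
    "diff_3": {"org_a": [22, 24, 26], "org_b": [24, 26, 28], "org_c": [26, 28, 30]},
}


def group_mean_sem(data):
    """Average FOVs per organoid, organoids per differentiation, then
    report the mean and SEM across differentiations (n = differentiations)."""
    differentiation_means = []
    for organoids in data.values():
        organoid_means = [mean(fovs) for fovs in organoids.values()]
        differentiation_means.append(mean(organoid_means))
    n = len(differentiation_means)
    sem = stdev(differentiation_means) / sqrt(n)
    return mean(differentiation_means), sem, n


group_mean, group_sem, n_replicates = group_mean_sem(counts)
print(f"mean = {group_mean:.2f}, SEM = {group_sem:.2f}, n = {n_replicates}")
```

      Because the SEM is computed across differentiation-level means, the reported n reflects independent differentiations rather than the much larger number of imaged fields.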

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      RNA-seq data are from the same batch. The mapping rate is >90%. GEO accession will be active upon publication. These were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include the replicates and statistical tests for each figure [see response to Weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      The statistical analysis method is provided in the revision [see response to Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated MLO identity using two markers: 1) FOXG1 and 2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but abundant in age-matched cerebral organoids (COs), suggesting that our culturing method is oriented toward the midbrain region. In nGD MLOs, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). EN1 reached 34-fold and 373-fold higher levels than in WT-75.1 iPSCs at weeks 3 and 8, respectively, in WT-75.1 MLOs. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and are presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.
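
      Fold changes relative to a calibrator sample, such as those reported for EN1 above, are commonly computed with the 2^(-ΔΔCt) method. The response does not state which method was used, so the sketch below is an assumption, and the Ct values, the reference gene, and the function name `fold_change` are all hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(sample) - dCt(calibrator)
    Assumes roughly 100% amplification efficiency for both assays.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


# Invented Ct values: the calibrator plays the role of the undifferentiated line.
example = fold_change(
    ct_target_sample=25.0, ct_ref_sample=18.0,
    ct_target_calibrator=30.0, ct_ref_calibrator=18.0,
)
print(example)
```

      In this toy case the target Ct drops by five cycles relative to the calibrator at equal reference-gene Ct, which corresponds to a 32-fold increase.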

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH⁺ neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      The off-target effect was analyzed during gene editing, and the likelihood of editing at other sites is low given the low off-target scores ranked by the MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other Off-targets is low due to low Off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      The normalization was to the protein of the organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipid analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”
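
      The two normalizations quoted above amount to simple ratios, sketched here for clarity. The function names and the input numbers are hypothetical. The only facts taken from the text are the units, pmol/mg protein for organoid lysates and pmol/mL for culture medium.

```python
def normalize_to_protein(lipid_pmol: float, protein_mg: float) -> float:
    """Lipid amount per mg of total lysate protein (pmol/mg protein)."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return lipid_pmol / protein_mg


def normalize_to_volume(lipid_pmol: float, medium_ml: float) -> float:
    """Lipid amount per mL of culture medium (pmol/mL)."""
    if medium_ml <= 0:
        raise ValueError("medium volume must be positive")
    return lipid_pmol / medium_ml


# Invented measurements for one sample.
lysate_level = normalize_to_protein(lipid_pmol=150.0, protein_mg=0.5)
medium_level = normalize_to_volume(lipid_pmol=12.0, medium_ml=2.0)
print(lysate_level, "pmol/mg protein;", medium_level, "pmol/mL")
```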

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work. 

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced expression of the dopaminergic neuron marker TH in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E and are summarized in the new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH⁺ neurons, GFAP⁺ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage (https://www.proteinatlas.org/ENSG00000177628-GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

    1. eLife Assessment

      This modelling study tests several hypotheses describing how seasonality and migration drive the epidemiology of Rift Valley Fever Virus among transhumant cattle in The Gambia. The work is methodologically solid, and the findings offer valuable insights into how the movement of cattle in and out of the Gambia River and Sahel ecoregions could lead to source-sink transmission dynamics among cattle subpopulations, sustaining endemic transmission.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses data from a recent RVFV serosurvey among transhumant cattle in The Gambia to inform the development of an RVFV transmission model. The model incorporates several hypotheses that capture the seasonal nature of both vector-borne RVFV transmission and cattle migration. These natural phenomena are driven by contrasting wet and dry seasons in The Gambia's two main ecoregions and are purported to drive cyclical source-sink transmission dynamics. Although the Sahel is hypothesized to be unsuitable for year-long RVFV transmission, findings suggest that cattle returning from the Gambia River to the Sahel at the beginning of the wet season could drive repeated RVFV introductions and ensuing seasonal outbreaks. Upon review, the authors have removed an additional analysis evaluating the potential impacts of cattle movement bans on transmission dynamics, which was poorly supported by the methodological approach.

      Strengths:

      Like most infectious diseases in animal systems in low- and middle-income countries, the transmission dynamics of RVFV in cattle in The Gambia are poorly understood. This study harnesses important data on RVFV seroepidemiology to develop and parameterize a novel transmission model, providing plausible estimates of several epidemiological parameters and transmission dynamic patterns.

      This study is well written and easy to follow.

      The authors consider both deterministic and stochastic formulations of their model, demonstrating potential impacts of random events (e.g. extinctions) and providing confidence regarding model robustness.

      The authors use well-established Bayesian estimation techniques for model fitting and confront their transmission model with a seroepidemiological model to assess model fit.

      Elasticity analyses help to understand the relative importance of competing demographic and epidemiological drivers of transmission in this system.

      Weaknesses:

      The model does not include an impact of infection on cattle birth rates, but the authors justify that this parameter should have limited impact on dynamics given predicted low-level circulation patterns, as opposed to explosive outbreaks, in this region.

      The importance of the RVFV positivity decay rate is highlighted, but loss of immunity is not considered in the SIR model. The authors do discuss uncertainty regarding model structure and a need for future data collection to begin to answer this question.

      The model's structure, including homogenous mixing within each ecoregion and step-change seasonality, allows for estimation of generalized transmission rates at a macro scale. However, it greatly simplifies the movement process itself and assumes that transhumant cattle movement is the only mechanism for RVF reintroduction into the Sahel region. The authors discuss that integration of more finely-scaled movement and contact data may help to address this limitation in future work.

      This model seems well suited to future work exploring, for example, the impacts of cattle vaccination and the potential differential efficiency of targeting T herds relative to M or L herds.

      Comments on revisions:

      I thank the authors for thoughtfully and thoroughly addressing my concerns. I have no further comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public reviews:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In this revision, we have:

      - clarified that while epidemics occur in other parts of sub-Saharan Africa, our results are consistent with the epidemiological narrative of RVF in The Gambia, characterised by sustained, moderate transmission without resulting in substantial outbreaks (hyperendemicity).

      - discussed how model assumptions (e.g. seasonality, homogenous mixing) may bias our results toward an endemic quasi-equilibrium dynamic.

      - highlighted the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In this revision we have:

      - clarified this distinction in the manuscript to avoid overinterpretation.

      - emphasized the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause substantial abortions and neonatal deaths, these events occur during sporadic epidemics. In the Gambian context, where we’re not observing large outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process; because we are not explicitly modelling outbreak-scale reproductive losses, it is reasonable to assume a stable population. This approach is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we have acknowledged this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for the M subpopulation in the dry season were not included in the appendix due to an oversight, though demographic turnover was implemented in the model code. We have now added the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay) is an important consideration in RVFV serology; however, whether this decline reflects a true loss of protective immunity following natural infection remains unknown. Available evidence suggests that infected cattle likely develop long-lasting immunity, and findings in humans further support this assumption, although longitudinal field data regarding RVFV-specific antibody durability in animals are not available to the best of our knowledge. From a modelling perspective, our objective was to estimate FOI and use it to predict an age-seroprevalence curve consistent with the observed cross-sectional age-seroprevalence patterns. We therefore adopted a parsimonious SIR framework, interpreting loss of seropositivity as a potential explanation for discrepancies between observed and predicted age-seroprevalence rather than explicitly modelling waning immunity. We have now:

      - clarified this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      - highlighted that while an SEIS/SIRS framework could theoretically generate different long-term dynamics, evaluating this approach requires stronger evidence for true immunity loss.
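
      As a minimal sketch of the reasoning in point (5), and not the authors' fitted model: under the simplifying assumption of a constant force of infection, the expected age-seroprevalence curve can be written in closed form with or without seropositivity decay. The function name, parameter names, and numerical values below are hypothetical and chosen only for illustration.

```python
import math


def seroprevalence(age_years, foi, decay_rate=0.0):
    """Expected seropositive fraction at a given age.

    Assumes a constant force of infection `foi` (per year) and an optional
    constant rate `decay_rate` (per year) at which detectable antibody is lost.
    Both are illustrative assumptions, not estimates from the study.
    """
    total = foi + decay_rate
    if total == 0.0:
        return 0.0
    # With decay_rate = 0 this reduces to the familiar 1 - exp(-foi * age).
    return (foi / total) * (1.0 - math.exp(-total * age_years))


if __name__ == "__main__":
    for age in (1, 3, 5, 10):
        no_decay = seroprevalence(age, foi=0.2)
        with_decay = seroprevalence(age, foi=0.2, decay_rate=0.1)
        print(f"age {age:>2}: no decay {no_decay:.3f}, with decay {with_decay:.3f}")
```

      With `decay_rate = 0` the curve rises towards 1 with age; with `decay_rate > 0` it plateaus at `foi / (foi + decay_rate)`, which is one way seropositivity decay could account for observed seroprevalence in older animals falling below an SIR-based prediction.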

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment and for raising an important conceptual point. However, the force of infection in our study is not estimated using a serocatalytic framework. Instead, FOI is estimated mechanistically within the transmission model as a function of the number of infectious cattle, rather than from age-stratified seroprevalence data.

      RVF-induced mortality is accounted for through its effect on the infectious compartment, where increased mortality reduces the number and duration of infectious cattle and therefore indirectly reduces FOI. Consequently, RVF-related cattle death does not need to be explicitly incorporated into the FOI expression itself. Seroreversion similarly does not influence FOI estimation under this modelling framework. We have clarified this distinction in the Methods section to avoid confusion between mechanistic transmission models and serocatalytic approaches.
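
      The distinction drawn in point (6) can be illustrated with a minimal sketch, assuming a single well-mixed population, step-change seasonality in the transmission rate, and births that exactly replace all deaths. None of this is taken from the authors' code; the function names, parameter names, and values are hypothetical. The force of infection is computed from the infectious compartment, and RVF-induced mortality (`alpha`) enters only through the removal rate of infectious animals.

```python
def mean_infectious_period(gamma, mu, alpha):
    """Mean time spent infectious (days) given recovery, background death,
    and disease-induced death rates (all per day)."""
    return 1.0 / (gamma + mu + alpha)


def simulate_sir_foi(beta_wet, beta_dry, gamma, mu, alpha,
                     wet_days=120, years=5, dt=0.1, i0=1e-3):
    """Forward-Euler SIR simulation returning the FOI time series.

    Assumptions (illustrative only): population fractions, homogeneous mixing,
    a wet season covering the first `wet_days` of each 365-day year, and
    births that replace every death so the total population stays constant.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    foi_series = []
    steps = int(years * 365 / dt)
    for k in range(steps):
        day_of_year = (k * dt) % 365
        beta = beta_wet if day_of_year < wet_days else beta_dry
        n = s + i + r
        foi = beta * i / n                # mechanistic FOI: depends on I, not on serology
        births = mu * n + alpha * i       # replace background and disease-induced deaths
        ds = births - foi * s - mu * s
        di = foi * s - (gamma + mu + alpha) * i   # mortality shortens the infectious period
        dr = gamma * i - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        foi_series.append(foi)
    return foi_series, (s, i, r)


if __name__ == "__main__":
    params = dict(beta_wet=0.4, beta_dry=0.1, gamma=1 / 7, mu=1 / (5 * 365))
    for alpha in (0.0, 0.02):
        foi, _ = simulate_sir_foi(alpha=alpha, **params)
        print(f"alpha={alpha}: mean FOI per day = {sum(foi) / len(foi):.5f}")
```

      Running the script with and without `alpha` shows how mortality feeds into FOI only through the size and duration of the infectious compartment, so it does not need its own term in the FOI expression.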

      (7) Clarifying previous vs. current study components

      We have revised the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We have expanded the limitations section to identify the sparse household movement data as contributing most to uncertainty. We have outlined how these limitations may have implications for our conclusions, and may lead to under- or over-estimation of periods of heightened transmission risk.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not ideally be suited to exploring a movement ban. In this revised manuscript, we have removed this analysis. We are currently developing separate work focused on RVF vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

      Reviewer #1 (Recommendations for the authors):

      We thank the reviewer for the recommendations regarding the Introduction, Methods, Results, and Supplementary Figures. We have addressed these points below and revised the manuscript accordingly.

      (1) Introduction: Should avoid describing as "inaccessible" the regions that are inhabited by nomadic and transhumant pastoralists.

      We have revised the wording to “hard-to-reach” regions.

      (2) Methods: Can the authors state what share of the animals included in the household survey data were cattle as opposed to other small ruminants? It would be helpful to understand what share of the data is "excluded"

      We have now included the total number of cattle sampled, providing clarity on the proportion of data used in the analyses.

      (3) Methods: When introducing the deterministic model, it seems unnecessary to mention the initialization conditions (i.e., introduction of a single infected individual at time 0) when this is later repeated in the Estimation of model parameters section, where it seems simulations were first conducted.

      We have removed the redundant description.

      (4) Results: Could the negative correlation between geographic distance of connected herds and mean seroprevalence simply indicate proximal exposure rather than common risk factors?

      We acknowledge that both mechanisms are plausible. RVFV transmission is strongly influenced by shared environmental factors that shape mosquito dynamics; however, direct transmission between proximal cattle herds may also occur through close contact with infectious tissues, bodily fluids, or contaminated materials. We have clarified this interpretation in the Results section.

      (5) Figure S5: inconsistent notation for the scaling factor parameter (tau), which is expressed in equations and tables as psi.

      We thank the reviewer for identifying this issue and have corrected all instances to ensure consistent use of tau throughout the manuscript.

      (6) Figure S6: Why a density plot? Isn't the number of temporary extinctions (x-axis) discrete?

      We have replaced the density plot with a bar plot in Figure S6.

    1. eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this is currently incomplete. The central claim - that trehalose specifically regulates an E2F/Dp-driven myogenic program - is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Mohite et al. use transcriptomic and metabolic profiling of H. armigera and S. frugiperda to link trehalose metabolism and muscle development. They further use several bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism of metabolite-regulated transcriptional programming for muscle development. The authors have also done rescue experiments in which trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, there are several concerns and gaps that leave the current results purely speculative. Although it is difficult to perform genetic experiments in non-model insects, the authors seem to suggest that a similar mechanism could also apply in systems like Drosophila, where it might be possible to perform experiments to fill in some missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find several genes, including enzymes of trehalose metabolism, that change. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that the NPP treatment leads to reduced muscle development, so the proportion of muscle-derived material in the bulk RNA sequencing samples will be relatively lower, which might have led to the results. A confirmatory experiment therefore has to be performed in which the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further, to overcome the off-target effects of NPP, trehalose rescue experiments could be useful.
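
      The composition caveat raised above can be made concrete with a small worked example, offered only as an illustrative sketch: all tissue fractions and expression values below are hypothetical and are not taken from the manuscript. If bulk expression is treated as a mixture of per-tissue expression weighted by tissue fraction, a muscle-specific gene appears downregulated whenever the muscle fraction shrinks, even if expression per unit of muscle is unchanged.

```python
import math


def bulk_expression(tissue_fractions, per_tissue_expression):
    """Bulk signal as a fraction-weighted sum of per-tissue expression."""
    return sum(tissue_fractions[t] * per_tissue_expression[t]
               for t in tissue_fractions)


def apparent_log2_fold_change(control_fractions, treated_fractions,
                              per_tissue_expression):
    """Apparent log2 fold change in bulk data when only tissue composition
    changes and per-tissue expression is held constant."""
    control = bulk_expression(control_fractions, per_tissue_expression)
    treated = bulk_expression(treated_fractions, per_tissue_expression)
    return math.log2(treated / control)


if __name__ == "__main__":
    # Hypothetical muscle-specific gene: expressed in muscle only.
    expression = {"muscle": 100.0, "other": 0.0}
    control = {"muscle": 0.30, "other": 0.70}
    treated = {"muscle": 0.15, "other": 0.85}   # muscle fraction halved
    print(apparent_log2_fold_change(control, treated, expression))
```

      In this toy case, halving the muscle fraction produces an apparent two-fold decrease in the bulk signal with no change in per-tissue expression, which is why dissected-tissue sequencing or qRT-PCR is needed to separate the two effects.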

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. It therefore becomes difficult to judge whether the reduction in these energy molecules is a cause or an effect.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      The work seems to connect trehalose metabolism with gene expression changes. Though this is an interesting idea, no experiments in the current version of the manuscript are conclusive. The authors could search for domains in the E2F family of transcription factors that can bind the metabolite; if none are found, ChIP-seq is essential to conclusively suggest a role for E2F in regulating gene expression tuned by the metabolites.

      Some of the above concerns are partially addressed in experiments where silencing of E2F/Dp shows similar phenotypes as with NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, and delayed pupation and lethality could not be definitely linked to trehalose-dependent regulation.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      No experiments are performed with non-target control dsRNA. All the experiments are done with an empty vector. But an appropriate control should be a non-target control.

    3. Reviewer #2 (Public review):

      Summary:

      This study shows that the effects of TPS/TPP knockdown in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolism equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidopterans and may be muscle-specific, the study largely relies on one species and perturbation methods that are not muscle-specific. The technical limitations arising from investigations outside model systems, where solid methods are available, limit the specificity of inferences that may be drawn from the data.

    4. Reviewer #3 (Public review):

      The hypothesis is that Trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors have identified the transcription factor E2F, which was validated through RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counterintuitively for this hypothesis, the investigators perform EMSAs with the E2F protein and the promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show a direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype observed as a result of Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp - they show that paramyosin regulates E2F/Dp and that E2F/Dp regulates paramyosin. The only plausible way to explain the results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp. But the authors have mentioned nothing along these lines. Additionally, I think trehalose metabolism impacts amino acid content in insects, and that will have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies have demonstrated enrichment of several amino acid biosynthetic genes. Yet the authors make no effort to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development without considering the above points will be incomplete.

      The result section of the manuscript is quite concise, to my understanding (especially the initial few sections), which misses out on mentioning details that would help readers understand the paper better. While technical details of the methods should be in the Materials and Methods section, the overall experimental strategy for the experiments performed should be explained in adequate detail in the results section itself or in figure legends. I would request authors to include more details in the results section. As an extension of the comment above, many times, abbreviations have been used without introducing them. A thorough check of the manuscript is required regarding this.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. DP is the cofactor of E2F, and they both heterodimerise and sit on the promoter of target genes to regulate them. I would request authors to revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, then further experiments should be conducted to come to a conclusion convincingly. Additionally, these results would need thorough discussion with citations of similar results observed for other transcription factor-cofactor complexes.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking is a clear statement of why each particular dry-lab experiment was conducted.

      In my judgement, the EMSA analysis presented is technically poor in quality. It lacks positive and negative controls, does not show mutation analysis or super shifts. Also, it lacks any competition assays that are important to prove the binding beyond doubt. I am not sure why protein is not detected at all in lower concentrations. Overall, the EMSA assays need to be redone; I find the current results to be unacceptable.

      GSEA studies clearly indicate enrichment of amino acid synthesis genes in TPP knockdown samples. This supports the plausible theory that a lack of trehalose means a lack of nutrients, so less is converted to amino acids and muscle development is compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle might also be possible.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Expression of genes, as understood from sequencing analysis in Figure 1D, Figure 2F, and Figure 3D, appears to be binary in nature. This result is extremely surprising given that the qRT-PCR of these genes has revealed a checker and graded expression.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      Finally, I would add that trehalose metabolism regulates cell cycle genes, and muscle development genes establish correlation and causation. The authors should ensure that any comments they make are backed by evidence.

    5. Author response:

      eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this is currently incomplete. The central claim - that trehalose specifically regulates an E2F/Dp-driven myogenic program - is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

      We thank the reviewer for the thoughtful and constructive evaluation of our work and for recognizing its potential relevance to researchers working on insect metabolism and development. We fully agree that our current evidence is preliminary and that the mechanistic link between trehalose and the E2F/Dp‑driven myogenic program needs to be strengthened.

      Our intention was to present trehalose-E2F/Dp coupling as a working model emerging from our data, rather than as a fully established pathway. We agree that systemic manipulations of trehalose and whole‑larval RNA‑seq cannot fully differentiate global metabolic stress from specific effects on myogenic programs. In the revision, we plan to include additional metabolic readouts (e.g., ATP/AMP ratio, key amino acids where available) to better discuss the overall energetic and nutritional state. We will reanalyze our RNA‑seq data to more clearly distinguish broad stress/metabolic signatures from cell‑cycle/myogenic signatures. Furthermore, we will reframe our discussion to explicitly state that we cannot completely rule out a contribution of general energy or amino‑acid scarcity at this stage.

      We acknowledge that, with our current experiments, the specificity for an E2F/Dp‑driven program is inferred mainly from enrichment of E2F targets among differentially expressed genes, and from expression changes in canonical E2F partners and downstream cell‑cycle/myogenic regulators. To address this more rigorously, we are performing targeted qRT-PCR for a panel of well‑characterized E2F/Dp target genes and myogenic markers in larval muscle versus non‑muscle tissues, following trehalose perturbation. Where technically feasible, we will also test whether partial knockdown of HaE2F or HaDp modifies the effect of trehalose manipulation on selected myogenic markers. These data, even if limited, will help to provide a more direct functional link, and we will include them in the manuscript if completed in time. In parallel, we will soften statements that imply a fully established, trehalose‑specific regulation of E2F/Dp and instead present this as a strong candidate pathway suggested by the current data.

      We fully agree that tissue‑resolved analyses are essential to move from systemic correlations to causality in muscle. We are in the process of standardizing larval muscle dissections and isolating thoracic/abdominal body wall muscle for trehalose, glycogen, and expression assays. We will also compare expression of key metabolic and myogenic genes in muscle versus fat body and midgut under trehalose manipulation. These tissue‑resolved data will directly address whether the transcriptional changes we report are preferentially localized to muscle.

      We are grateful for the reviewer’s critical but encouraging comments. We will moderate our central claims and explicitly consider and discuss alternative explanations. Further, we will add tissue‑resolved and more focused mechanistic data as far as possible within the current revision. We believe these changes will substantially strengthen the manuscript and better align our conclusions with the evidence we presently have.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Mohite et al. use transcriptomic and metabolic profiling of H. armigera and S. frugiperda to link trehalose metabolism and muscle development. They further use several bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism of metabolite-regulated transcriptional programming for muscle development. The authors have also done rescue experiments in which trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, there are several concerns and gaps that leave the current results purely speculative. Although it is difficult to perform genetic experiments in non-model insects, the authors seem to suggest that a similar mechanism could also apply in systems like Drosophila, where it might be possible to perform experiments to fill in some missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find several genes, including enzymes of trehalose metabolism, that change. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that the NPP treatment leads to reduced muscle development, so the proportion of muscle-derived material in the bulk RNA sequencing samples will be relatively lower, which might have led to the results. A confirmatory experiment therefore has to be performed in which the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further, to overcome the off-target effects of NPP, trehalose rescue experiments could be useful.

      Thank you for this valuable comment. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. This will help determine whether the gene expression patterns observed in the RNA-seq data are muscle-specific or systemic.

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. It therefore becomes difficult to judge whether the reduction in these energy molecules is a cause or an effect.

      We thank the reviewer for this thoughtful comment and agree that reduced levels of ADP, NAD, NADH, and NMN could arise either from a disturbance of energy metabolism or from loss of mitochondria‑rich muscles. Our current data cannot fully separate these two possibilities. Still, several studies support the interpretation that perturbing trehalose metabolism causes a primary systemic energy deficit that is coupled to mitochondrial function, not merely a passive consequence of tissue loss.

      For example:

      (1) Our previous study in H. armigera showed that chemical inhibition of trehalose synthesis results in depletion of trehalose, glucose, glucose‑6‑phosphate, and suppression of the TCA cycle, indicating reduced energy levels and dysregulated fatty‑acid oxidation (Tellis et al., 2023).

      (2) Chang et al. (2022) showed that trehalose catabolism and mitochondrial ATP production are mechanistically linked. HaTreh1 localizes to mitochondria and physically interacts with ATP synthase subunit α. 20‑hydroxyecdysone increases HaTreh1 expression, enhances its binding to ATP synthase, and elevates ATP content, while knockdown of HaTreh1 or HaATPs‑α reduces ATP levels.

      (3) Similarly, our previous study showed that inhibition of Treh activity in H. armigera generates an “energy‑deficient condition” characterized by deregulation of carbohydrate, protein, fatty‑acid, and mitochondria‑related pathways, and a concomitant reduction in key energy metabolites (Tellis et al., 2024).

      (4) The starvation study in H. armigera has shown that reduced hemolymph trehalose is associated with respiratory depression and large‑scale reprogramming of glycolysis and fatty‑acid metabolism (Jiang et al., 2019).

      These findings support a direct coupling between trehalose availability and systemic energy/redox state. Therefore, the coordinated decrease in ADP, NAD, NADH, and NMN following TPS/TPP silencing is consistent with a primary disturbance of systemic energy and mitochondrial metabolism rather than exclusively a secondary consequence of muscle loss. We agree, however, that the present whole‑larva metabolite measurements do not allow a quantitative partitioning between changes due to altered muscle mass and those due to intrinsic metabolic impairment at the cellular level. Thus, tissue-specific quantification of these metabolites would allow us to directly test whether altered energy metabolites are a cause or consequence of muscle loss.

      References:

      (1) Tellis, M.B., Mohite, S.D., Nair, V.S., Chaudhari, B.Y., Ahmed, S., Kotkar, H.M. and Joshi, R.S., 2024. Inhibition of trehalose synthesis in Lepidoptera reduces larval fitness. Advanced Biology, 8(2), p.2300404.

      (2) Chang, Y., Zhang, B., Du, M., Geng, Z., Wei, J., Guan, R., An, S. and Zhao, W., 2022. The vital hormone 20-hydroxyecdysone controls ATP production by upregulating the binding of trehalase 1 with ATP synthase subunit α in Helicoverpa armigera. Journal of Biological Chemistry, 298(2).

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Jiang, T., Ma, L., Liu, X.Y., Xiao, H.J. and Zhang, W.N., 2019. Effects of starvation on respiratory metabolism and energy metabolism in the cotton bollworm Helicoverpa armigera (Hübner)(Lepidoptera: Noctuidae). Journal of Insect Physiology, 119, p.103951.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      We thank the reviewer for this important comment and fully agree that EMSA alone, without appropriate competition and control reactions, cannot establish the specificity or functional relevance of a transcription factor-DNA interaction. In our study, we identified the E2F family through GRN analysis of the RNA-seq data obtained upon HaTPS/TPP silencing, which suggested a potential regulatory connection. We then predicted E2F binding sites in the HaTPS/TPP promoter. The EMSA experiments were intended as preliminary evidence that E2F can associate with the HaTPS/TPP promoter in vitro. We will clarify this in the manuscript by softening our conclusion to indicate that our data support a “possible E2F-HaTPS/TPP interaction”. We will also perform EMSA with specific and non‑specific competitors to confirm E2F binding to the HaTPS/TPP promoter.

      The work seems to have connected trehalose metabolism with gene expression changes. Though this is an interesting idea, no experiments in the current version of the manuscript are conclusive. The authors could search for domains in the E2F family of transcription factors that can bind to the metabolite; if none are found, ChIP-seq is essential to conclusively suggest a role for E2F in regulating gene expression tuned by the metabolites.

      A previous study in D. melanogaster (Zappia et al., 2016) showed a vital role for E2F in skeletal muscle that is required for animal viability. The authors showed that Dp knockdown reduced the expression of genes encoding structural and contractile proteins, such as Myosin heavy chain (Mhc), fln, Tropomyosin 1 (Tm1), Tropomyosin 2 (Tm2), Myosin light chain 2 (Mlc2), sarcomere length short (sals) and Act88F, and of myogenic regulators, such as held out wings (how), Limpet (Lmpt), Myocyte enhancer factor 2 (Mef2) and spalt major (salm). In addition, ChIP-qRT-PCR showed that upstream regions of myogenic genes, such as how, fln, Lmpt, sals, Tm1 and Mef2, were specifically enriched with E2f1, E2f2, and Dp antibodies in comparison with a nonspecific antibody. Further, Zappia et al. (2019) reported a ChIP-seq dataset suggesting that E2F/Dp directly activates the expression of glycolytic and mitochondrial genes during muscle development. Zappia et al. (2023) showed that E2F regulates one of the glycolytic genes, Phosphoglycerate kinase (Pgk), during Drosophila development.

      However, the regulation of trehalose metabolic genes by E2F/Dp and vice versa was not studied previously. So here in our study, we tried to understand the correlation of trehalose metabolism and E2F/Dp in the muscle development of H. armigera.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Rogers, A., Islam, A.B. and Frolov, M.V., 2019. Rbf activates the myogenic transcriptional program to promote skeletal muscle differentiation. Cell Reports, 26(3), pp.702-719.

      (3) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      Some of the above concerns are partially addressed in experiments where silencing of E2F/Dp shows similar phenotypes as with NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, and delayed pupation and lethality could not be definitely linked to trehalose-dependent regulation.

      Yes. It’s true that silencing of any key transcription factor can have several indirect effects. Our intention was not to argue that delayed pupation and lethality are exclusively due to trehalose-dependent regulation, but that E2F/Dp and HaTPS/TPP silencing showed a consistent set of phenotypes and molecular changes, such as (i) transcriptomic enrichment of E2F targets upon trehalose perturbation, (ii) reduced HaTPS/TPP expression following E2F/Dp silencing, (iii) reduced myogenic gene expression that parallels the phenotypes observed with HaTPS/TPP silencing and (iv) restoration of E2F and Dp expression in E2F/Dp‑silenced insects upon trehalose feeding in the rescue assay. Together, these findings support a functional association between E2F/Dp and trehalose homeostasis. At the same time, we fully acknowledge that these results do not exclude additional, trehalose‑independent roles of E2F/Dp in development.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      Yes, it’s possible that, to some extent, trehalose is metabolized in the gut. Even though trehalase is present in the insect gut, some of the trehalose will be absorbed via trehalose transporters on the gut lining. The phenotype was not rescued in insects fed the control diet (empty vector and dsHaTPP), which contains chickpea powder and therefore ample amino acids and carbohydrates. Only insects fed the trehalose-containing diet were rescued; insects on the control diet, which contains other carbohydrates, were not. We agree that direct measurement of trehalose in target tissues will provide important confirmation. For the revised manuscript, we will measure trehalose levels in muscle, gut, and haemolymph after trehalose feeding.

      No experiments are performed with non-target control dsRNA. All the experiments are done with an empty vector. But an appropriate control should be a non-target control.

      Yes, there was no experiment with non-target dsRNA. We previously optimized a protocol for dsRNA delivery and assessed its effectiveness in target knockdown (concentration and time), and we have published several research articles using a similar protocol:

      (1) Chaudhari, B.Y., Nichit, V.J., Barvkar, V.T. and Joshi, R.S., 2025. Mechanistic insights in the role of trehalose transporter in metabolic homeostasis in response to dietary trehalose. G3: Genes, Genomes, Genetics, p. jkaf303.

      (2) Barbole, R.S., Sharma, S., Patil, Y., Giri, A.P. and Joshi, R.S., 2024. Chitinase inhibition induces transcriptional dysregulation altering ecdysteroid-mediated control of Spodoptera frugiperda development. Iscience, 27(3).

      (3) Patil, Y.P., Wagh, D.S., Barvkar, V.T., Gawari, S.K., Pisalwar, P.D., Ahmed, S. and Joshi, R.S., 2025. Altered Octopamine synthesis impairs tyrosine metabolism affecting Helicoverpa armigera vitality. Pesticide Biochemistry and Physiology, 208, p.106323.

      (4) Tellis, M.B., Chaudhari, B.Y., Deshpande, S.V., Nikam, S.V., Barvkar, V.T., Kotkar, H.M. and Joshi, R.S., 2023. Trehalose transporter-like gene diversity and dynamics enhances stress response and recovery in Helicoverpa armigera. Gene, 862, p.147259.

      (5) Joshi, K.S., Barvkar, V.T., Hadapad, A.B., Hire, R.S. and Joshi, R.S., 2025. LDH-dsRNA nanocarrier-mediated spray-induced silencing of juvenile hormone degradation pathway genes for targeted control of Helicoverpa armigera. International Journal of Biological Macromolecules, p.148673.

      The same vector backbone and preparation procedures were used for both control and experimental constructs, allowing us to specifically compare the effects of the target dsRNA. The phenotypes and gene expression changes we observed were specific to the target genes and were not seen in the empty vector controls, suggesting that the effects are not due to nonspecific responses to dsRNA delivery or vector components.

      We acknowledge your suggestions, and in future studies, we will include non-target dsRNA as a control in silencing assays.

      Reviewer #2 (Public review):

      Summary:

      This study shows that the effects of TPS/TPP knockdown in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolism equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidopterans and may be muscle-specific, the study largely relies on one species and perturbation methods that are not muscle-specific. The technical limitations arising from investigations outside model systems, where solid methods are available, limit the specificity of inferences that may be drawn from the data.

      Thank you for pointing out this experimental weakness. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. We will also perform metabolite analysis on muscle samples. This will help determine whether the observed gene expression patterns and metabolite changes are muscle-specific or systemic.

      Reviewer #3 (Public review):

      The hypothesis is that Trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of Trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors have identified transcription factor E2F, which was validated through RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counterintuitive to this hypothesis, the investigators perform EMSAs with the E2F protein and promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype observed as a result of Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp - they show that paramyosin regulates E2F/Dp and E2F/Dp regulated paramyosin. The only plausible way to explain the results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp. But the authors have mentioned nothing in this line. Additionally, I think trehalose metabolism impacts amino acid content in insects, and that will have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies have demonstrated enrichment of several amino acid biosynthetic genes. Yet authors make no efforts to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development and not considering the above points will be incomplete.

      We appreciate the reviewer’s careful evaluation of this manuscript and the constructive comments. Based on our data and earlier studies, we find it difficult to support a linear pathway “trehalose → E2F → muscle”; instead, the data suggest a regulatory module in which trehalose metabolism and E2F/Dp form an interdependent circuit controlling myogenic genes. E2F/Dp binds and activates trehalose metabolism genes (TPS/TPP, Treh1) and myogenic structural genes, consistent with EMSA (TPS/TPP-E2F) and with predicted E2F binding sites on the metabolic genes Treh1 and Pgk and on myogenic genes such as Act88F, Prm, Tm1, Fln, etc. At the same time, perturbing trehalose synthesis reduces E2F/Dp expression and myogenic gene expression, and trehalose feeding partially restores all three. This bidirectional influence is similar to E2F‑dependent control of carbohydrate metabolism and systemic sugar homeostasis described in D. melanogaster, where E2F/Dp both regulates metabolic genes and is itself constrained by metabolic state (Zappia et al., 2023a; Zappia et al., 2021).

      The reciprocal regulation between Prm and E2F/Dp is indeed intriguing. Rather than a paradox, we interpret this as evidence that E2F/Dp couples metabolic genes and structural muscle genes within a shared module, and that key sarcomeric components (such as paramyosin) feed back on this transcriptional program. Similar cross‑talk between E2F‑controlled metabolic programs and tissue function has been documented in D. melanogaster muscle and fat body, where E2F loss in one tissue elicits systemic changes in the other (Zappia et al., 2021). For further confirmation of E2F-regulated Prm, we will perform EMSA on the Prm promoter with appropriate controls.

      We fully agree that amino‑acid metabolism is a critical missing piece. For the revised manuscript, we will quantify amino acid levels and include the results: “Amino acid levels changed differentially: cysteine, leucine, histidine, valine, and proline showed significant reductions, while isoleucine and lysine showed non-significant reductions upon trehalose metabolism perturbation. These results are consistent with previous reports by Tellis et al. (2024) and Shi et al. (2016)”. We will reframe our conclusions more cautiously as establishing a trehalose-E2F/Dp-muscle development link, while stating that “definitive causal links via amino‑acid metabolism remain to be demonstrated”.

      References:

      (1) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. eLife, 10, p.e67753.

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Shi, J.F., Xu, Q.Y., Sun, Q.K., Meng, Q.W., Mu, L.L., Guo, W.C. and Li, G.Q., 2016. Physiological roles of trehalose in Leptinotarsa larvae revealed by RNA interference of trehalose-6-phosphate synthase and trehalase genes. Insect Biochemistry and Molecular Biology, 77, pp.52-68.

      Author response image 1.

      The result section of the manuscript is quite concise, to my understanding (especially the initial few sections), which misses out on mentioning details that would help readers understand the paper better. While technical details of the methods should be in the Materials and Methods section, the overall experimental strategy for the experiments performed should be explained in adequate detail in the results section itself or in figure legends. I would request authors to include more details in the results section. As an extension of the comment above, many times, abbreviations have been used without introducing them. A thorough check of the manuscript is required regarding this.

      Thank you very much for pointing out this issue. We will revise the manuscript content according to these suggestions.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      We thank the reviewer for this helpful comment. We agree that, in this current version of the manuscript, the S. frugiperda experiments are not sufficiently systematic to support strong claims about conservation beyond H. armigera. Our primary focus in this study is indeed on H. armigera, and the addition of the S. frugiperda data was intended only as preliminary, supportive evidence rather than a central component of our conclusions. To avoid over‑interpretation and to keep the manuscript focused and coherent, we will remove all S. frugiperda data from the revised version, including the corresponding text and figures. We will also adjust the title, abstract, and conclusion to clearly state that our findings are limited to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Thank you for this thoughtful comment and the specific suggestion. We agree that directly targeting E2F would, in principle, be an informative complementary approach. In our study, however, we prioritized Dp knockdown for two main reasons. First, E2F is a large family, and E2F-Dp functions as an obligate heterodimer. Previous work in D. melanogaster has shown that depletion of Dp is sufficient to disrupt E2F-dependent transcription broadly, often with more efficient loss of complex activity than targeting individual E2F isoforms (Zappia et al., 2021; Zappia et al., 2016). Second, in our preliminary trials, we performed a dsRNA feeding assay with dsHaE2F, dsHaDp, and combined dsHaE2F plus dsHaDp. In that assay, feeding dsRNA targeting HaE2F (dsHaE2F) alone did not silence E2F. Because E2F is a large family, other E2F isoforms may compensate for silencing of the targeted HaE2F. However, HaE2F showed significantly reduced expression upon dsHaDp feeding and upon combined dsHaE2F plus dsHaDp feeding (Figure A), whereas HaDp showed a significant reduction in expression in all three conditions (Figure B). Because combined feeding of dsHaE2F and dsHaDp reduced the expression of both HaE2F and HaDp, we further performed a rescue assay by exogenous feeding of trehalose. We observed significant upregulation of HaE2F, HaDp, trehalose metabolic genes (HaTPS/TPP and HaTreh1), and myogenic genes (HaPrm and HaTm2) (Figure C). For these reasons, we focused on Dp silencing as a more reliable way to impair E2F/Dp complex function in H. armigera.

      Author response image 2.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. eLife, 10, p.e67753.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. DP is the cofactor of E2F, and they both heterodimerise and sit on the promoter of target genes to regulate them. I would request authors to revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, then further experiments should be conducted to come to a conclusion convincingly. Additionally, these results would need thorough discussion with citations of similar results observed for other transcription factor-cofactor complexes.

      Thank you for highlighting this point and for prompting us to examine these data more carefully. A reduction in HaE2F mRNA after HaDp silencing is indeed unexpected if one only considers the canonical view of E2F/Dp as a heterodimer that co-occupies target promoters without strongly regulating each other’s expression. However, several lines of work suggest that transcription factor-cofactor networks frequently include feedback loops in which cofactors influence the expression of their partner TFs. First, in multiple systems, transcription factors and their cofactors are known to regulate each other’s transcription, forming positive or negative feedback loops. For example, in hematopoietic cells, the transcription factor Foxp3 controls the expression of many of its own cofactors, and some of these cofactors in turn facilitate or stabilize Foxp3 expression, forming an interconnected regulatory network rather than a simple one‑way interaction (Rudra et al., 2012). Second, E2F/Dp complexes exhibit non‑canonical regulatory mechanisms and can regulate broad sets of targets, including other transcriptional regulators. Several studies show that E2F/Dp proteins not only control classical cell‑cycle genes but also participate in diverse processes such as DNA damage signaling, mitochondrial function, and differentiation (Guarner et al., 2017; Ambrus et al., 2013; Sánchez-Camargo et al., 2021). In D. melanogaster, complete loss of dDP alters the expression of direct E2F/DP targets, including dATM (Guarner et al., 2017).

      All these reports indicate that the E2F-Dp complex sits at the top of multi‑layer regulatory hierarchies. Such architectures make it plausible that Dp silencing in H. armigera could modulate HaE2F expression in a non-canonical way.

      References:

      (1) Rudra, D., DeRoos, P., Chaudhry, A., Niec, R.E., Arvey, A., Samstein, R.M., Leslie, C., Shaffer, S.A., Goodlett, D.R. and Rudensky, A.Y., 2012. Transcription factor Foxp3 and its protein partners form a complex regulatory network. Nature immunology, 13(10), pp.1010-1019.

      (2) Guarner, A., Morris, R., Korenjak, M., Boukhali, M., Zappia, M.P., Van Rechem, C., Whetstine, J.R., Ramaswamy, S., Zou, L., Frolov, M.V. and Haas, W., 2017. E2F/DP prevents cell-cycle progression in endocycling fat body cells by suppressing dATM expression. Developmental cell, 43(6), pp.689-703.

      (3) Ambrus, A.M., Islam, A.B., Holmes, K.B., Moon, N.S., Lopez-Bigas, N., Benevolenskaya, E.V. and Frolov, M.V., 2013. Loss of dE2F compromises mitochondrial function. Developmental cell, 27(4), pp.438-451.

      (4) Sánchez-Camargo, V.A., Romero-Rodríguez, S. and Vázquez-Ramos, J.M., 2021. Non-canonical functions of the E2F/DP pathway with emphasis in plants. Phyton, 90(2), p.307.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking is a clear statement of why each particular dry-lab experiment was conducted.

      We again thank the reviewer for advising us to give a biological context/motivation for every bioinformatics analysis performed. The bioinformatics analyses devised here aim to characterize the systems-level perturbations caused by HaTPS/TPP silencing, in order to explain the observed phenotype and to discover transcription factors potentially modulating the gene regulatory changes induced by HaTPS/TPP silencing.

      (1) Gene set enrichment analyses:

      Differential gene expression analyses of the bulk RNA sequencing data followed by qRT-PCR confirmed the transcriptional changes in myogenic genes and gene expression alterations in metabolic and cell cycle-related genes. These perturbations merely confirmed the effect induced by HaTPS/TPP silencing in obviously expected genes. We wanted to see whether an “unbiased” system-level statistical analysis such as gene set enrichment analysis (GSEA) could reveal both expected and novel biological processes that underlie HaTPS/TPP silencing. GSEA revealed large-scale transcriptional changes in 11 enriched processes, including amino acid metabolism, energy metabolism, developmental regulatory processes, and motor protein activity. GSEA not only revealed the overall transcriptionally enriched pathways but also identified the genes undergoing synchronized pathway-level transcriptional change upon HaTPS/TPP silencing.

      (2) Gene regulatory network analysis:

      Although GSEA uncovered potential pathway-level changes, we were also interested in identifying the gene regulatory network associated with such large-scale process-level transcriptional perturbations. Interestingly, the biological processes undergoing perturbations were also heterogeneous (e.g., motor protein activity, energy metabolism, amino acid metabolism, etc.). We hypothesized that inferring a causal gene regulatory network for the genes in the GSEA-enriched biological processes should predict core/master transcription factors that might synchronously regulate metabolic and non-metabolic processes related to HaTPS/TPP silencing, thereby providing a broad understanding of the perturbed phenotype. The gene regulatory network analysis statistically inferred an “active” gene regulatory network corresponding to the GSEA-enriched KEGG gene sets. When the transcription factors (TFs) were ranked by their number of outgoing connections (outdegree centrality) within the active gene regulatory network, E2F family TFs emerged as the top-ranking, highly connected transcription factors associated with the transcriptionally enriched processes. This suggests that E2F family TFs are central to controlling the flow of regulatory information within this network. Intriguingly, E2F has been previously implicated in muscle development in insects (Zappia et al., 2016). Further extracting the regulated targets of E2F family TFs within this network revealed the mechanistic connection with the 11 enriched processes. This GRN analysis was crucial in discovering and prioritizing E2F TFs as central transcription factors mediating HaTPS/TPP silencing effects, which was not apparent from simpler analyses such as differential gene expression analysis.
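
      The ranking step described above, ordering transcription factors by outdegree centrality, can be illustrated with a minimal sketch. The function name `rank_by_outdegree`, the toy edge list, and the placeholder regulators `TF_B` and `TF_C` are hypothetical and chosen only for illustration; they are not the inferred network or the software used in the study.

```python
from collections import defaultdict


def rank_by_outdegree(edges, transcription_factors):
    """Rank TFs by outdegree: the number of distinct targets each regulates.

    edges: iterable of (regulator, target) pairs from a directed network.
    transcription_factors: set of node names to be ranked.
    Returns a list of (tf, outdegree), highest outdegree first.
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        # Count each distinct target once and ignore self-loops.
        if regulator in transcription_factors and regulator != target:
            targets[regulator].add(target)
    ranking = [(tf, len(targets[tf])) for tf in transcription_factors]
    # Sort by descending outdegree, then by name for a stable order.
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


# Toy, hypothetical edge list (NOT data from the manuscript).
toy_edges = [
    ("E2F", "TPS/TPP"), ("E2F", "Treh1"), ("E2F", "Pgk"),
    ("E2F", "Act88F"), ("E2F", "Prm"), ("E2F", "Tm1"),
    ("TF_B", "Pgk"), ("TF_B", "Tm1"),
    ("TF_C", "Prm"),
]
ranking = rank_by_outdegree(toy_edges, {"E2F", "TF_B", "TF_C"})
for tf, outdegree in ranking:
    print(tf, outdegree)
```

      Outdegree counts only direct outgoing edges, so it reflects how many targets a regulator has in the inferred network, not the strength or sign of each regulatory interaction.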

      As per the reviewer’s suggestions, we will add these outlined points in the text of the manuscript (Results section) to further give context and clarity to the bioinformatics analyses conducted in this study.
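
      For readers unfamiliar with the enrichment step in (1), the core running-sum statistic behind GSEA can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `enrichment_score`, the toy gene names, and the scores are hypothetical, and the sketch omits the permutation-based significance testing and normalization that a full GSEA performs. It is not the pipeline used in the study.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated.
    scores: dict mapping gene -> ranking metric (e.g. a signed statistic).
    gene_set: set of genes belonging to the pathway being tested.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(
        abs(scores[gene]) ** weight
        for gene, is_hit in zip(ranked_genes, hits)
        if is_hit
    )
    # Simplification: return 0.0 when the set has no usable hits or no misses.
    if hit_norm == 0 or n_miss == 0:
        return 0.0

    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            running += abs(scores[gene]) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy, hypothetical ranking (NOT data from the manuscript).
toy_scores = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": -1.0, "g5": -2.0, "g6": -3.0}
toy_ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
es_top = enrichment_score(toy_ranked, toy_scores, {"g1", "g2"})
es_bottom = enrichment_score(toy_ranked, toy_scores, {"g5", "g6"})
print(es_top, es_bottom)
```

      A positive score indicates that the gene set is concentrated at the top of the ranked list, and a negative score that it is concentrated at the bottom.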

      In my judgement, the EMSA analysis presented is technically poor in quality. It lacks positive and negative controls, does not show mutation analysis or super shifts. Also, it lacks any competition assays that are important to prove the binding beyond doubt. I am not sure why protein is not detected at all in lower concentrations. Overall, the EMSA assays need to be redone; I find the current results to be unacceptable.

      Thank you for pointing out this issue. We will repeat the EMSA analysis with appropriate controls. Although the gel image was not clear, a faint protein band (indicated by the white square) was observed in well No. 8, where we used 8 μg of E2F protein and 75 ng of HaTPS/TPP promoter, when the gel was stained with SYPRO Ruby protein stain, suggesting weak HaTPS/TPP-E2F complex formation.

      GSEA studies clearly indicate enrichment of amino acid synthesis genes in TPP knockdown samples. This supports the plausible theory that a lack of trehalose means a lack of sufficient nutrients, so less is converted to amino acids, and muscle development is therefore compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle might also be possible.

      We quantified amino acid levels as per the suggestion, and we observed differential levels of amino acids upon trehalose metabolism perturbation.

      However, we observed that insects were not rescued when fed a control chickpea-based artificial diet that contained the nutrients required for normal growth and development. Based on this observation, we conclude that trehalose deficiency is the only possible cause for the defect in muscle development.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Thank you for the comment. We will revise the color palette as per the suggestion.

      Expression of genes, as understood from sequencing analysis in Figure 1D, Figure 2F, and Figure 3D, appears to be binary in nature. This result is extremely surprising given that the qRT-PCR of these genes have revealed a checker and graded expression.

      Thank you for pointing out this issue. We will revise the scale range for these figures to get more insights about gene expression levels and include figures as per the suggestion.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      We will revise the analyses and include exact statistics as per the suggestion.

      Finally, I would add that trehalose metabolism regulates cell cycle genes, and muscle development genes establish correlation and causation. The authors should ensure that any comments they make are backed by evidence.

      We thank the reviewer for this insightful comment. Although direct evidence in insects is currently lacking, multiple independent studies in yeast, plants and mammalian systems support a regulatory link between trehalose metabolism and the cell cycle. In budding yeast Saccharomyces cerevisiae, neutral Treh (Nth1) is directly phosphorylated and activated by the major cyclin‑dependent kinase Cdk1 at G1/S, routing stored trehalose into glycolysis to fuel DNA replication and mitosis (Ewald et al., 2016). CDK‑dependent regulation of trehalase activity has also been reported in plants, where CDC28‑mediated phosphorylation channels glucose into biosynthetic pathways necessary for cell proliferation (Lara-Núñez et al., 2025). Furthermore, budding yeast cells accumulate trehalose and glycogen upon entry into quiescence and subsequently mobilize these stores to generate a metabolic “finishing kick” that supports re‑entry into the cell cycle (Silljé et al., 1999; Shi et al., 2010). Exogenous trehalose that perturbs the trehalose cycle impairs glycolysis, reduces ATP, and delays cell cycle progression in S. cerevisiae, highlighting a dose‑ and context‑dependent control of growth versus arrest (Zhang, Zhang and Li, 2020). In mammalian systems, trehalose similarly modulates proliferation-differentiation decisions. In rat airway smooth muscle cells, low trehalose concentrations promote autophagy, whereas higher doses induce S/G2–M arrest, downregulate Cyclin A1/B1, and trigger apoptosis, indicating a shift from controlled growth to cell elimination at higher exposure (Xiao et al., 2021). In human iPSC‑derived neural stem/progenitor cells, low‑dose trehalose enhances neuronal differentiation and VEGF secretion, while higher doses are cytotoxic, again highlighting a tunable impact on cell‑fate outcomes (Roose et al., 2025).
      In wheat, exogenous trehalose under heat stress reduces growth, lowers auxin, gibberellin, abscisic acid and cytokinin levels, and represses CycD2 and CDC2 expression, suggesting that trehalose signalling integrates with hormone pathways and core cell‑cycle regulators to restrain proliferation during stress (Luo, Liu, and Li, 2021). Together, these studies show the importance of trehalose metabolism in cell‑cycle regulation, which determines whether cells and tissues proliferate, differentiate, or remain quiescent.

      With respect to muscle development, previous work has implicated glycolytic metabolism in myogenesis and muscle growth. Tixier et al. (2013) showed that loss of key glycolytic genes results in abnormally thin muscles, while Bawa et al. (2020) demonstrated that loss of TRIM32 decreases glycolytic flux and reduces muscle tissue size. These findings indicate that carbohydrate and energy metabolism pathways are important determinants of muscle structure and growth. However, there are no previous studies about the role of trehalose metabolism in muscle development, other than as an energy source, so here we specifically set out to establish the involvement of trehalose metabolism in muscle development.

      References:

      (1) Ewald, J.C. et al. (2016) “The yeast cyclin-dependent kinase routes carbon fluxes to fuel cell cycle progression,” Molecular Cell, 62(4), pp. 532–545.

      (2) Lara-Núñez, A. et al. (2025) “The Cyclin-Dependent Kinase activity modulates the central carbon metabolism in maize during germination,” (January), pp. 1–16.

      (3) Silljé, H.H.W. et al. (1999) “Function of trehalose and glycogen in cell cycle progression and cell viability in Saccharomyces cerevisiae,” Journal of Bacteriology, 181(2), pp. 396–400.

      (4) Shi, L. et al. (2010) “Trehalose Is a Key Determinant of the Quiescent Metabolic State That Fuels Cell Cycle Progression upon Return to Growth,” 21, pp. 1982–1990.

      (5) Zhang, X., Zhang, Y. and Li, H. (2020) “Regulation of trehalose, a typical stress protectant, on central metabolisms, cell growth and division of Saccharomyces cerevisiae CEN.PK113-7D,” Food Microbiology, 89, p. 103459.

      (6) Xiao, B. et al. (2021) “Trehalose inhibits proliferation while activates apoptosis and autophagy in rat airway smooth muscle cells,” Acta Histochemica, 123(8), p. 151810.

      (7) Roose, S.K. et al. (2025) “Trehalose enhances neuronal differentiation with VEGF secretion in human iPSC-derived neural stem / progenitor cells,” Regenerative Therapy, 30, pp. 268–277.

      (8) Luo, Y., Liu, X. and Li, W. (2021) “Exogenously-supplied trehalose inhibits the growth of wheat seedlings under high temperature by affecting plant hormone levels and cell cycle processes,” Plant Signaling & Behavior, 16(6).

      (9) Tixier, V., Bataillé, L., Etard, C., Jagla, T., Weger, M., DaPonte, J.P., Strähle, U., Dickmeis, T. and Jagla, K., 2013. Glycolysis supports embryonic muscle growth by promoting myoblast fusion. Proceedings of the National Academy of Sciences, 110(47), pp.18982-18987.

      (10) Bawa, S., Brooks, D.S., Neville, K.E., Tipping, M., Sagar, M.A., Kollhoff, J.A., Chawla, G., Geisbrecht, B.V., Tennessen, J.M., Eliceiri, K.W. and Geisbrecht, E.R., 2020. Drosophila TRIM32 cooperates with glycolytic enzymes to promote cell growth. eLife, 9, p.e52358.

      Finally, we appreciate the meticulous review of this manuscript and the constructive comments. We will perform the recommended experiments and data analysis, and revise the manuscript accordingly.

    1. eLife Assessment

      This study offers important methodological advances for CRISPR-based mutagenesis in mice, highlighting the potential of founder animals for early phenotypic characterization. The authors present convincing evidence, supported by rigorous experimental design, that "crispant" (F0) analysis in mice, despite prior concerns about genetic mosaicism, can be utilized to assess protein function.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates the feasibility of using crispant founder mice, first-generation animals directly edited by CRISPR/Cas9, for initial phenotypic assessments. The authors target seven genes known to produce visible recessive traits to test whether mosaicism in founder animals prevents meaningful phenotype-genotype interpretation. Remarkably, they observe clear null phenotypes in founders for six of the seven genes, with high editing efficiencies. These results demonstrate that crispant mice can, under specific conditions, display recessive phenotypes that are readily interpretable. However, this conclusion should be moderated, as the study addresses only one biological context, visible Mendelian traits, and may not generalize to quantitative, subtle, or late-onset phenotypes. The report also examines attempts at multiplex CRISPR targeting, which reduce viability, underscoring limits for concurrent gene disruptions. Finally, the detailed description of diverse alleles generated by CRISPR provides valuable insight into how allelic series can be exploited to investigate protein function.

      Strengths:

      (1) The manuscript provides a comprehensive and technically rigorous description of CRISPR/Cas9‑induced mutations across several loci. The accompanying genotyping, sequencing, and analytical approaches are sound, complementary, and well-detailed, providing a resource that will be valuable to researchers using genome editing beyond the specific application of genetic screening.

      (2) By documenting a wide diversity of alleles and mutation types, the study contributes to understanding how allelic series generated by CRISPR can be leveraged for dissecting protein function, a perspective that has been less systematically presented in prior literature and could be compared to targeted strategies such as those described by Cassidy et al. (2022, DOI: 10.1016/bs.mie.2022.03.053) or other relevant studies addressing CRISPR-based allelic series generation.

      (3) The work demonstrates technically solid editing and validation workflows, setting a methodological reference point for similar projects across species or trait categories.

      Weaknesses:

      (1) There is a disconnect between the abstract/introduction and the discussion. While both the abstract and introduction focus on the potential use of crispant founders for phenotypic assessment in the context of genetic screening, with the introduction notably emphasizing this framework through a detailed section on ENU-based screens, the discussion devotes relatively little attention to this aspect. Instead, it primarily examines CRISPR mutagenesis outcomes, mutation detection, and allele characterization. Overall, the study's aims are not clearly or explicitly defined, which contributes to the lack of alignment across sections.

      (2) Important limitations of the approach are not sufficiently discussed. For instance, the paper does not address how applicable the findings are beyond visible Mendelian traits, such as for quantitative, late-onset, or more subtle phenotypes, including behavioral ones, or how to interpret wild-type appearing founders. There is little consideration of appropriate experimental controls (e.g., wild-type or mock-edited animals) or of how many animals might be required to robustly establish genotype-phenotype associations.

      (3) The conclusion that this strategy will "dramatically reduce time, resources, and animal numbers" is not quantitatively supported by the data presented and should be expressed more cautiously.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to validate the use of genetic screening pipelines that assess phenotypes in founders (F0, referred to as "crispants") obtained from CRISPR/Cas9 gene editing in 1-cell zygotes. The application of this approach in mice has generally been avoided due to concerns that results would be confounded by genetic mosaicism, but benefits to this approach include reducing animal numbers needed to achieve goals of identifying knockout phenotypes, as well as improved efficiency in the use of time and resources. The authors targeted seven genes associated with visible recessive phenotypes and observed the expected null phenotype in up to 100% of founders for each gene. Although mosaicism was common in the crispants, the various alleles were generally all functional null alleles and, in fact, some in-frame deletions with null phenotypes revealed critical functional motifs within the gene products. The rigorous data presented support using crispants to assess knockout phenotypes when guide RNAs with strong on-target and low off-target scores are used for gene editing in 1-cell mouse embryos.

      Strengths:

      By targeting multiple genes with existing, well-characterized mutations, the authors established a robust system for validating the analysis of crispants to assess gene function.

      Cutting-edge technologies were used to carefully assess the spectrum of mutations generated.

      Weaknesses:

      There could have been some discussion regarding how this approach would be impacted if mutations are dominant or embryonic lethal (for the latter, for example, F0 can be examined as embryos).

    4. Reviewer #3 (Public review):

      Summary:

      The study assesses whether CRISPR-generated founder (F0) "crispant" mice can be reliably used for initial phenotypic assessment and screening. By targeting seven genes with known visible recessive phenotypes, the authors show that, despite genetic mosaicism, the expected null phenotypes were observed in all targeted genes. These findings demonstrate that the phenotyping and screening of F0 "crispant" mice is a valid (and efficient) approach to selecting candidate alleles for follow-up studies, thereby streamlining mouse breeding and animal husbandry-related costs.

      Strengths:

      The study is comprehensive, carefully executed, and provides deep insight into the utility of F0 "crispant" mice for phenotypic screening. The authors evaluated the CRISPR/Cas9 editing outcomes across seven genes using multiple sequencing modalities, providing a robust framework for determining and interpreting complex founder genotypes. Importantly, the study examines/highlights the biological insight gained from compound heterozygous founders and naturally arising allelic series, enabling genotype-phenotype associations and functional inferences about protein domains.

      More broadly, the authors' thorough evaluation of the CRISPR/Cas9-based gene editing events in the founders can serve as a benchmark for others in the field, engineering their own mouse "crispants."

      Weaknesses:

      The relationship between the sgRNA/Cas9 concentrations delivered to the zygotes and the resulting editing efficiencies is not explicitly investigated.

    5. Author response:

      We would like to thank the reviewers for their detailed reading of our manuscript and for the constructive comments they have provided.

      We plan to make structural changes to the introduction and the discussion. Reviewer #1 describes the “disconnect between the abstract/introduction and the discussion”. We agree that “the study's aims are not clearly or explicitly defined”. We will edit the introduction to state our aim of investigating the factors that affect using “crispants” in mouse functional genomics. In the discussion, we described how our findings inform sgRNA choice to ensure biallelic gene disruption in founders and how our extensive genotyping methods enabled us to determine the molecular basis for the observed phenotype (explaining why some founders showed the expected recessive trait and why it was partial or absent in others). We also concluded from our attempts at multiplexing that this had too great an impact on viability to be useful. We will edit the discussion to better address our aim and to elaborate on several points raised by the reviewers (discussed in more detail below). Specifically, we will provide examples of screening situations where generating crispant mice may be useful, e.g. preliminary in vivo studies to follow up candidates identified in large-scale cellular screens. We will also provide more context about our assumptions underlying our statement that the use of crispants will “dramatically reduce time, resources, and animal numbers” compared to ENU mutagenesis (where recessive traits require breeding of G2 females with G1 males to achieve homozygosity of de novo mutations in G3 offspring) and the work needed to validate this. We will more clearly acknowledge that our proof-of-principle study used visible phenotypes that can be assessed in individual animals and then discuss how the use of crispants could be extended to the investigation of quantitative or late-onset traits using cohorts of crispants (discussed further below). We will also discuss the assessment of non-null alleles to dissect protein function, building on our unexpected finding that a single round of CRISPR/Cas9-mediated mutagenesis can generate an allelic series.

      Reviewer #1 asked us to address “how to interpret wild-type appearing founders”. We have discussed the mechanisms underlying the wild-type appearing founders generated in this study. This is linked with concerns in the field that incomplete editing, transcripts escaping nonsense-mediated decay, and/or the presence of in-frame mutations that don’t disrupt protein function may lead to founders that appear wild-type or have a partial phenotype. We have shown that our electroporation protocol results in very high levels of editing, but that this must always be assessed during genotyping. We found that using an sgRNA that targets a critical protein domain can ensure that short in-frame indels also disrupt protein function. In future studies that determine how strain background modifies a phenotype that has been established on one strain (e.g. C57BL/6J), wild-type appearing founders would suggest that the new strain background rescues the null phenotype. In future studies that determine the consequence of targeting a second gene on a mutant background, wild-type appearing founders would indicate that the second mutation suppresses the phenotype associated with the mutant background. We will add this to the discussion section where we describe possible screening situations in which crispant mice would be useful.

      Reviewer #3 states that “the relationship between the sgRNA/Cas9 concentrations delivered to the zygotes and the resulting editing efficiencies is not explicitly investigated.” Members of The Centre for Phenogenomics (TCP) Transgenic Production Core who co-author this study (Lauryl Nutter, Marina Gertsenstein and Lauri Lintott) have published detailed protocols on mouse model production, which we cite in this paper (PMID: 30040228; PMID: 33524495; PMID: 39999224). In PMID: 33524495, they tested a two-fold difference in Cas9 RNP concentrations for generating knock-out alleles. Using their optimised protocols for electroporation of one-cell zygotes with RNPs, we achieved an extremely high editing rate. We did not vary the sgRNA/Cas9 concentrations as part of this study as our goal was to assess the ability to generate “complete” null animals. We do note, however, that by targeting two genes simultaneously whilst keeping the total RNP concentration constant (to avoid reagent toxicity), we halved the amount of each sgRNA and this did not lead to a decrease in editing efficiency. We will highlight this in the results/discussion section (as appropriate).

      Reviewer #1 asks whether the use of crispants is applicable for “quantitative, late-onset, or more subtle phenotypes, including behavioral ones”. We are hopeful that this is possible and it is a priority for future studies. Crucially, cohorts of crispants can be generated in a single round of mutagenesis. Starting an experiment with ten donor females will produce ~100 zygotes, resulting in ~40 crispants. Power calculations must be performed to determine the size of the cohort required for the effect size and variability of the phenotype being studied, but many neurobehavioural studies use ~10 mutants vs ~10 controls. We note that sex and/or background genotype may mean that only some of the ~40 crispants produced can be used for phenotypic testing. This reviewer also raises the point about whether wild-type animals or mock-edited animals serve as the best controls. From work carried out by Lauryl Nutter and her colleagues from the IMPC (PMID: 37301944), we know that “wild-type” controls should ideally be from the same embryo pool as the crispants to avoid differences due to genetic drift within inbred colonies. This study also found that possible off-target mutations from CRISPR/Cas9-mediated mutagenesis are not an issue (despite a lot of attention in the literature). The suggestion of using mock-edited controls, resulting from zygotes that have gone through electroporation without RNP, addresses a possible need to control for the stress of undergoing the electroporation process. Our study shows that additional stress is caused by inducing and repairing a break in a neutral locus (EGFP). Controlling for these stressors may be particularly important when assessing behavioural phenotypes in crispants vs controls.
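
      As a rough illustration of the power calculation mentioned above, the following sketch estimates the number of animals per group for a two-sided, two-sample comparison using the standard normal approximation. The function names, the default significance level and power, and the use of a standardized effect size are assumptions made for illustration; they are not taken from the manuscript. The yield figures (ten zygotes per donor female, about 40 crispants per 100 zygotes) are the approximate numbers quoted in the response.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size (difference in means / SD).
    alpha and power defaults are conventional choices, not values from the study.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


def donor_females_needed(total_crispants, zygotes_per_female=10,
                         crispants_per_100_zygotes=40):
    """Donor females required, using the approximate yields quoted in the response.

    Hypothetical helper: it ignores losses due to sex or background genotype,
    which the response notes can reduce the number of usable crispants.
    """
    zygotes = math.ceil(total_crispants * 100 / crispants_per_100_zygotes)
    return math.ceil(zygotes / zygotes_per_female)


if __name__ == "__main__":
    for d in (0.5, 1.0, 1.3):
        print(f"d = {d}: {animals_per_group(d)} animals per group")
    print("Donor females for 40 crispants:", donor_females_needed(40))
```

      Under these assumptions, a cohort of about ten animals per group corresponds to a large standardized effect (roughly 1.3 SD), whereas subtler phenotypes would call for several rounds of mutagenesis or larger donor pools. The approximation slightly underestimates the sample size a t-test would require for small groups.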

      Reviewer #2 states that “there could have been some discussion regarding how this approach would be impacted if mutations are dominant or embryonic lethal (for the latter, for example, F0 can be examined as embryos).” Our manuscript discusses how crispants could help with the study of genes that may be essential. Specifically, we stated that when CRISPR/Cas9-mediated mutagenesis fails to produce live pups, phenotypic assessment of crispant embryos could reveal whether targeting the gene impacts embryogenesis. Crispants can only be used to screen for recessive traits since both alleles are edited. The assessment of dominant traits is not addressed in our study and remains a challenge in the field. We note that CRISPRi screens in cultured cells reveal candidates that, when partially downregulated, lead to the desired phenotype. One possibility is to employ this setup in vivo using dCas9-KRAB transgenic mice (JAX stock #030000). We could add this point to the discussion section.

    1. eLife Assessment

      This study presents valuable data suggesting that ATP-induced modulation of alveolar macrophage (AM) functions is associated with NLRP3 inflammasome activation and enhanced phagocytic capacity. While the in vivo and in vitro data reveal an interesting phenotype, the evidence provided is incomplete and does not fully support the paper's conclusions. Additional investigations would be of value in complementing the data and strengthening the interpretation of the results. This study should be of interest to immunologists and the mucosal immunity community.

    2. Reviewer #1 (Public review):

      Summary:

      Alveolar macrophages (AMs) are key sentinel cells in the lungs, representing the first line of defense against infections. There is growing interest within the scientific community in the metabolic and epigenetic reprogramming of innate immune cells following an initial stress, which alters their response upon exposure to a heterologous challenge. In this study, the authors show that exposure to extracellular ATP can shape AM functions by activating the P2X7 receptor. This activation triggers the relocation of the potassium channel TWIK2 to the cell surface, placing macrophages in a heightened state of responsiveness. This leads to the activation of the NLRP3 inflammasome and, upon bacterial internalization, to the translocation of TWIK2 to the phagosomal membrane, enhancing bacterial killing through pH modulation. Through these findings, the authors propose a mechanism by which ATP acts as a danger signal to boost the antimicrobial capacity of AMs.

      Strengths:

      This is a fundamental study in a field of great interest to the scientific community. A growing body of evidence has highlighted the importance of metabolic and epigenetic reprogramming in innate immune cells, which can have long-term effects on their responses to various inflammatory contexts. Exploring the role of ATP in this process represents an important and timely question in basic research. The study combines both in vitro and in vivo investigations and proposes a mechanistic hypothesis to explain the observed phenotype.

      Weaknesses:

      The authors have revised the manuscript to address the comments raised during the first round of review. However, several figures, figure legends, and methodological sections still require additional adjustments and clarification.

      The interpretation of CFU from lysates as 'killing' is unclear; lysate CFUs typically reflect intracellular surviving bacteria and are confounded by differences in uptake. Please include an uptake control (early time point) or time-course to distinguish phagocytosis from intracellular killing. Also, clarify how bacterial burden was calculated (supernatant vs cell-associated vs total). Supernatant alone may not capture adherent bacteria. The normalization as 'fold killing' (mean negative control / sample) is non-standard; please report absolute CFU (log scale) and specify the exact definition of killing/survival.
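
      The distinction requested here can be sketched numerically. The example below assumes a design with an early (uptake) time point and a late time point, both measured as cell-associated CFU from lysates, and reports log10 CFU together with percent intracellular survival. The function names and the CFU values are hypothetical and are not taken from the manuscript.

```python
import math


def log10_cfu(cfu):
    """Return log10 of a CFU count, the scale requested for reporting."""
    if cfu <= 0:
        raise ValueError("CFU must be positive to take a logarithm")
    return math.log10(cfu)


def summarize_killing(cfu_uptake, cfu_late):
    """Separate uptake from intracellular killing.

    cfu_uptake: lysate CFU at an early time point (internalized bacteria).
    cfu_late:   lysate CFU at a later time point (surviving bacteria).
    Survival is expressed relative to uptake in the same condition, so that
    differences in phagocytosis do not masquerade as differences in killing.
    """
    if cfu_uptake <= 0:
        raise ValueError("cfu_uptake must be positive")
    survival = cfu_late / cfu_uptake
    return {
        "log10_uptake": log10_cfu(cfu_uptake),
        "log10_late": log10_cfu(cfu_late),
        "percent_survival": 100 * survival,
        "percent_killed": 100 * (1 - survival),
    }


if __name__ == "__main__":
    # Hypothetical numbers: both conditions end with the same lysate CFU,
    # yet killing differs because uptake differs.
    control = summarize_killing(cfu_uptake=2e5, cfu_late=1e5)
    trained = summarize_killing(cfu_uptake=4e5, cfu_late=1e5)
    print("control:", control)
    print("trained:", trained)
```

      In this illustration the late lysate CFU are identical in both conditions, so a ratio of late CFU alone would suggest no difference, while normalizing to uptake shows that the second condition killed a larger fraction of internalized bacteria.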

      The Methods section is largely incomplete and requires substantial revision. For instance, the authors report quantification of cytokine concentrations, yet no information is provided regarding how these measurements were performed. It is unclear whether cytokines were measured in BALF by ELISA, or assessed at the mRNA level by qPCR from total lung lysates, or by another method. This information must be clearly specified. In addition, the rationale for selecting the measured cytokines should be justified. While the choice of IL-1β and IL-6 is relatively straightforward, the focus on IL-18 requires explicit justification.

      Similarly, the methodology used to quantify immune cell populations presented in Figure 2 is not described. It is not stated how immune cells were isolated and identified (e.g. flow cytometry from lung tissue). No information is provided regarding tissue digestion, cell isolation procedures, or gating strategy (presumably by flow cytometry). These details are essential and should be included, together with the corresponding gating strategy and absolute cell numbers.

      Moreover, immune cell quantification would be expected in the context of the challenge experiment as well. Reporting unchanged percentages of lung immune cells following ATP exposure does not support the conclusion of a training effect, particularly one that is specific to alveolar macrophages (AMs). In addition, AMs are not considered recruited immune cells; this should be corrected in the figure legend and throughout the manuscript where applicable.

      There are inconsistencies throughout the manuscript. For example, the authors report n = 5 for the survival curves in the figure legend, whereas n = 7 is stated in the Methods section. This discrepancy is unclear and should be clarified.

      Supplementary Fig. 1 contains major conceptual errors. The volcano plot represents ATAC-seq peaks (differentially accessible chromatin regions), yet the figure, legend, and color scale repeatedly refer to 'genes' and 'differentially expressed genes'. This conflates chromatin accessibility with gene expression and is misleading. Peaks are secondarily annotated to nearby genes, which should be clearly described as an annotation step rather than the unit of analysis. The figure should be revised to explicitly present peak-level statistics (DARs), with gene names shown only as optional annotations. Additionally, the use of simultaneous P < 0.05 and Q < 0.05 thresholds is non-standard, and the absence of down-regulated regions in the plot requires explanation.
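
      One way to see why a combined P < 0.05 and Q < 0.05 filter is unusual is sketched below, assuming that Q denotes Benjamini-Hochberg adjusted p-values (the manuscript's exact definition is not stated here). Under that assumption every q-value is at least as large as its p-value, so the Q threshold alone determines which peaks pass. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical peak-level p-values for differentially accessible regions.
    p = [0.001, 0.008, 0.039, 0.041, 0.60]
    q = benjamini_hochberg(p)
    for p_i, q_i in zip(p, q):
        passes_both = p_i < 0.05 and q_i < 0.05
        print(f"p = {p_i:.3f}  q = {q_i:.5f}  passes both: {passes_both}")
```

      Because each q-value is bounded below by its p-value, any peak passing Q < 0.05 also passes P < 0.05, and the additional P threshold removes nothing. If the authors' Q was computed differently, this relationship may not hold, which is a further reason to state the definition explicitly.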

      In Figure 7, trained WT and Nlrp3⁻/⁻ mice display similar levels of bacterial clearance. How should this result be interpreted?

      Overall, while the study addresses an interesting biological question, the manuscript would benefit from substantial revision prior to publication. In particular, clarifications and improvements regarding the methodology, data presentation, and interpretation are required to strengthen the rigor and reproducibility of the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Thompson et al. investigate the impact of prior ATP exposure on later macrophage functions as a mechanism of immune training. They describe that ATP training enhances bactericidal functions, which they connect to the P2X7 ATP receptor, Nlrp3 inflammasome activation, and TWIK2 K+ movement at the cell surface and subsequently at phagosomes during bacterial engulfment. This is an incremental addition to existing literature, which has previously explored how ATP alters TWIK2 and K+, and linked it to Nlrp3 activation. The novelty here is in discovering the persistence of the TWIK2 change and exploring the impact this biology may have on bacterial clearance. Additional experiments could strengthen their hypothesis that the in vivo protective effect of ATP-training on bacterial clearance is mediated by alveolar macrophages.

      Strengths:

      The authors demonstrate three novel findings: 1) prolonged persistence of TWIK2 at the macrophage plasma membrane following ATP that can translocate to the phagosome during particle engulfment, 2) a persistent impact of ATP exposure on remodeling chromatin around nlrp3, and 3) administering mice intra-nasal ATP to 'train' lungs protects mice from otherwise fatal bacterial infection.

      Weaknesses:

      (1) Some methods remain unclear including the timing and method by which lung cellularity was assessed in Figure 2. It is also difficult to understand how many mice were used in experiments 1, 2 and 6 and thus how rigorous the design was. A specific number is only provided for 1D and the number of mice stated in legend and methods do not match.

      (2) The study design is not entirely ideal for the authors' in vivo question. Overall, the discussion would benefit from a clear summary of study caveats, which are primarily that 1) in vitro studies attributing ATP training-mediated bacterial killing to persistent TWIK2 relocation, K+ influx, a glycolytic metabolic shift, and epigenetic nlrp3 reprogramming were performed in BMDM or RAW cells and not primary AMs, 2) the data do not eliminate the possibility that non-AM immune or non-immune cells in the lung are "trained" and responsible for ATP-mediated protection in vivo; flow data examined total lung digest, which may obscure important changes in alveolar recruitment, and 3) in vivo work shows data on acute bacterial clearance but does not explore potential risks that "training" for a more responsive inflammasome may have for the severity of lung injury during infection.

      (3) There is some lack of transparency on data and rigor of methods. Clear data are not provided regarding the RNA-sequencing results. Specific identities of DEGs are not provided, only one high-level pathway enrichment figure. It would also be ideal if controls were included for subcellular fractionation to confirm pure fractions and for dye microscopy to show negative background.

      (4) In the results describing Figure 5A, the text states that "ATP-induced macrophage training effects, as measured by augmented bactericidal activity, were diminished in macrophages treated with protease inhibitors". However, these data are not identified as significant in the figure; protease dependence can be described as a trend that supports the authors' hypothesis but should not be stated as significant data in the text.

      In summary, this work contains some useful data showing how ATP can train macrophages via TWIK2/Nlrp3. Revisions have significantly improved methods reporting, added some data to strengthen the conclusions, and toned down overstatements to bring conclusions more in line with the data presented. The title still overstates what the authors have actually tested, since no macrophage-specific targeting in vivo (no conditional gene deletion, macrophage depletion, etc.) was performed in infection studies. However, in vitro data provide clear evidence that macrophages can be trained by ATP, and though caveats remain, it is plausible that macrophage training is a key mechanism for the protection observed here in the lung.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) First, the concept of training or trained immunity refers to long-term epigenetic reprogramming in innate immune cells, resulting in a modified response upon exposure to a heterologous challenge. The investigations presented demonstrate phenotypic alterations in AMs seven days after ATP exposure; however, they do not assess whether persistent epigenetic remodeling occurs with lasting functional consequences. Therefore, a more cautious and semantically precise interpretation of the findings would be appropriate.

      In response, we have performed an epigenetic analysis (ATAC-seq) as requested (Supp Fig. 1).

      (2) Furthermore, the in vivo data should be strengthened by additional analyses to support the authors' conclusions. The authors claim that susceptibility to Pseudomonas aeruginosa infection differs depending on the ATP-induced training effect. Statistical analyses should be provided for the survival curves, as well as additional weight curves or clinical assessments. Moreover, it would be appropriate to complement this clinical characterization with additional measurements, such as immune cell infiltration analysis (by flow cytometry), and quantification of pro-inflammatory cytokines in bronchoalveolar lavage fluid and/or lung homogenates.

      We have added statistical analyses for the survival curves (new Fig. 1D), an immune cell infiltration analysis, and quantification of pro-inflammatory cytokines in the lung (new Figs. 1, 2).

      (3) Moreover, the authors attribute the differences in resistance to P. aeruginosa infection to the ATP-induced training effect on AMs, based on a correlation between in vivo survival curves and differences in bacterial killing capacity measured in vitro. These are correlative findings that do not establish a causal role for AMs in the in vivo phenotype. ATP-mediated effects on other (i.e., non-AM) cell populations are omitted, and the possibility that other cells could be affected should be, at least, discussed. Adoptive transfer experiments using AMs would be a suitable approach to directly address this question.

      We have performed additional experiments and found that the numbers of lung macrophages were not significantly altered before and after ATP training (new Fig. 2), indicating that the training effects are focused on lung-resident macrophages.

      Reviewer #2 (Public review):

      (1) Missing details from methods/reported data: Substantial sections of key methods have not been disclosed (including anything about animal infection models, RNA-sequencing, and western blotting), and the statistical methods, as written, only address two-way comparisons, which would mean analysis was improperly performed. In addition, there is a general lack of transparency - the methods state that only representative data is included in the manuscript, and individual data points are not shown for assays.

      We have revised the methods and statistical analysis.

      (2) Poor experimental design including missing controls: Particularly problematic are the Seahorse assay data (requires normalization to cell numbers to interpret this bulk assay - differences in cell growth/loss between conditions would confound data interpretation) and bacterial killing assays (as written, this method would be heavily biased by bacterial initial binding/phagocytosis which would confound assessment of killing). Controls need to be included for subcellular fractionating to confirm pure fractions and for dye microscopy to show a negative background. Conclusions from these assays may be incorrect, and in some cases, the whole experiment may be uninterpretable.

      Seahorse assay methodology was updated to confirm the order of cell counting, time at seeding and cell counts. Methods were also updated to address the distinction between bacterial killing (Fig. 1B) and overall decrease in bacterial load.

      (3) The conclusions overstate what was tested in the experiments: Conceptually, there are multiple places where the authors draw conclusions or frame arguments in ways that do not match the experiments used. Particularly:

      (a) The authors discuss their findings in the context of importance for AM biology during respiratory infection but in vitro work uses cells that are well-established to be poor mimics of resident AMs (BMDM, RAW), particularly in terms of glycolytic metabolism.

      We have adjusted the text to reflect that the metabolic assay was performed on BMDMs, because AMs are too fragile for certain manipulations in vitro. We expect that the metabolic change, as well as the reduction in bacterial load, is similar across several macrophage systems.

      (b) In vivo work does not address whether immune cell recruitment is triggered during training.

      We have performed immune cell infiltration analysis (new Fig. 2).

      (c) Figure 3 is used to draw conclusions about K+ in response to bacterial engulfment, but actually assesses fungal zymosan particles.

      We have corrected this in the manuscript.

      (d) Figure 5 is framed in bacterial susceptibility post-viral infection, but the model used is bacterial post-bacterial.

      We have corrected this in the manuscript.

      (e) In their discussion, the authors propose to have shown TWIK2-mediated inflammasome activation. They link these separately to ATP, but their studies do not test if loss of TWIK2 prevents inflammasome activation in response to ATP (Figure 4E does not use TWIK2 KO).

      We have now added the TWIK2 KO results (new Fig. 5E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, it would be advisable to further characterize the in vivo phenotype in order to strengthen the conclusions. Specifically, it would be useful to quantify the bacterial load in the bronchoalveolar lavage fluid and lung homogenates, as well as to measure cytokine levels both in the respiratory compartment and systemically. Additionally, a broader characterization of the immune response in the presence or absence of ATP-induced training would be valuable. In the absence of direct evidence demonstrating that trained AMs mediate the observed phenotype, the authors should adopt a more cautious interpretation of their results. Moreover, careful attention to semantic accuracy is recommended. The concept of trained immunity refers specifically to long-term epigenetic reprogramming that leads to an altered response of target cells upon a secondary challenge, distant from the initial stress. The data presented do not fully demonstrate this phenomenon, and the interpretations should remain aligned with the evidence provided.

      Bacterial load has been quantified (see the Methods section for details). We also measured immune cell infiltration and pro-inflammatory cytokine levels in the lung (new Figs. 1, 2), and performed an epigenetic evaluation of vehicle- and ATP-treated cells (Supp. Fig. 1).

      Reviewer #2 (Recommendations for the authors):

      (1) It cannot be overstated how lacking the methods are. This includes no discussion of IACUC approval for animal procedures, which must be included as part of research ethics. It also needs to be made clear where raw data is being archived. This notably includes an accession for deposited RNA-sequencing data, although unmanipulated microscopy and western blot images should also be shown. Methods should discuss any pre-processing that occurred with images.

      We have revised the methods in the manuscript.

      (2) Per statistics, in addition to generally providing more detail and adjusting analyses if they have not been correctly performed, please disclose if SD or SEM is shown. Reporting aggregate data versus representative data would provide more rigor. Perhaps replicate experiments could be included in the supplemental if they cannot, for some reason, be aggregated. Detailed statistical methods for RNA-seq analysis also need to be included.

      More details have been provided in the methods section.

      (3) It is unclear whether bacterial killing assays were correctly designed and can be interpreted. What does cells collected mean? If the assay was focused on intracellular macrophage bacterial load, it is critical to assess and report phagocytosis since different input loads would confound the assessment of killing. A rigorous wash or an antibiotic to eliminate extracellular bacteria should also have been performed and be described in this case. If the total bacterial burden was assessed, that would use cells+media and also needs to be clear and described. With the information provided, it is unclear whether the assays performed are sufficiently rigorous to assess bacterial killing. In addition, Figure 1B reports using an MOI of 50-100, but all data is compiled in one graph - data from different levels of infection should be separated. Figure 5A shows a model with E.coli followed by PA, but that does not appear to be how the assay was structured in B or C. This also does not match how the experiment is written in the results section, which references S. aureus. It is unclear what tissue (or cells) were assessed in Figure 5. Whole lung? BAL? As written, no data provided regarding bacterial killing is of sufficient quality to be considered valid.

      We have rewritten the description of the bacterial killing assay in the manuscript. The methodology was corrected to distinguish bacterial killing from a decrease in bacterial load, and was revised throughout for accuracy.

      (4) The in vitro data provide reasonable evidence that BMDM/RAW macrophage training can occur in response to ATP exposure. However, it is unclear whether training is an important mechanism for resident AM in vivo, or whether, in vivo, a broader inflammatory response is generated, recruiting additional immune cells that persist and change infection susceptibility. The authors argue for resident AM immune training, but do not provide sufficient evidence to counter the latter possibility (resident AM are never themselves directly assessed, and the presence of other immune cells in vivo is not excluded). See Iliakis et al 2023 (PMID 37640788) for discussion of how this issue continues to drive uncertainty in the field. For this study, at least providing flow cytometry data quantifying myeloid and lymphoid immune populations in BALF before and after various treatments would help address this caveat. Without knowing this, it also confounds the interpretation of Figure 1B; if BAL is not pure AM after training, perhaps 1B could be repeated with ex vivo training or resident AM could be purified?

      We have performed immune cell infiltration analysis in the lung (both to BALF and in-tissue, new Fig. 2).

      (5) Figure 3A appears to show that fewer than 50% of cells express GFP. Is it expected that only a fraction of RAW cells express TWIK2-GFP? How was this addressed in the analyses for Figure 3? Were cells not appearing to express any significant GFP, included in phagosomal-negative or excluded from analysis? Please include in the methods.

      The RAW cells were transfected with TWIK2-GFP and variable GFP expression was expected. These cells were expressing a non-integrated transgene, which has been added to the methods as well as the consideration of cells for the analysis. Cells without visible GFP expression were excluded.

      (6) Why are many data points in Figure 3D negative? This suggests that settings were not optimized for microscopy - perhaps there is a very high background signal and the ION stain is barely above it. This is concerning for the quality of data. Further, is it expected that only some cells are positive for ION K+? The images shown clearly differentiate phagosomal K with ATP versus the absence of K without, but it is surprising that some cells appear not to contain any ION K+ signal (not completely clear given lack of brightfield or other cell staining) - this may again point to issues with imaging settings that confound data interpretation. This analysis should be carefully assessed.

      This has been updated in the methodology. In old Fig. 3D (new Fig. 4D), the presented data are the net intensity of the phagosome, obtained by subtracting the average cytoplasmic MFI from the MFI of the area corresponding to an engulfed zymosan-af594 bead. Thus, a negative value indicates that the cytoplasmic IonK signal is higher than that of the phagosome.
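
      The subtraction described above can be sketched as follows. This is a minimal illustration, not the authors' image-analysis pipeline: the function name `net_phagosome_intensity` and the pixel values are hypothetical, and region selection (phagosome versus cytoplasm) is assumed to have been done beforehand.

```python
from statistics import mean


def net_phagosome_intensity(phagosome_pixels, cytoplasm_pixels):
    """Return mean phagosomal intensity minus mean cytoplasmic intensity.

    A positive value means the signal is enriched in the phagosome;
    a negative value means the cytoplasmic signal is higher.
    """
    return mean(phagosome_pixels) - mean(cytoplasm_pixels)


# Hypothetical pixel intensities (arbitrary units), for illustration only.
enriched = net_phagosome_intensity([120, 130, 125], [80, 90, 85])
depleted = net_phagosome_intensity([60, 70, 65], [80, 90, 85])
```

      Under this definition a negative data point is not an imaging artifact in itself; it simply records a phagosome whose signal is below the surrounding cytoplasm.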

      (7) The Discussion states that it will be interesting to test whether ATP-TWIK2 is a common mechanism of training and specifically references LPS as an ATP-generating signal. However, Figure 2D data show that LPS induces only transient TWIK2 translocation; the authors have data suggesting that, in the context of LPS, TWIK2 'training' will not be engaged. This line of discussion shows incomplete consideration of the data.

      We have further tempered this language in the text, noting that this may depend on differences between macrophages and epithelial/endothelial cells in their sensitivity to, or the damage they sustain from, bacterial endotoxin.

      (8) For RNA-sequencing, plots of the actual genes changed for the mitochondrial pathways of interest would be helpful information for readers, as would a heat map showing sample purity between groups for macrophage markers versus possible contaminant cells, which can also be generated from precursors in BMDM cultures. In general, information in Methods regarding how the analyses in Figure 4B were run is necessary, per cutoffs used to determine DEGs, number of samples in each group, sex of samples used, etc. Greater transparency of data would be appreciated, so plots that show variation between replicates, such as heat maps, would be ideal. Supplemental tables would also be nice.

      We have added to the methodology of the RNA sequencing analysis.

      (9) The use of alternate DAMPs is a positive addition to the experimental design, but no data is given regarding the concentrations used. Ideally, positive controls showing that histones/NAD are used at acutely activating concentrations could be included, but at least references supporting the doses chosen or information about how doses were selected should be given. It is easy to find substantial literature on histones as a DAMP, but it was unclear why/how NAD was selected.

      We have added these concentrations and corresponding references.

      (10) The E.coli CFU reported in Figure 5B are extraordinarily low. In addition, CFU are generally shown on a log scale, but this appears to be linear. Please confirm that these data are correct. Perhaps improved methods might explain why? Is the second hit a low dose?

      These have been corrected in the new Fig. 6B.

      (11) Given that loss of either TWIK2 or Nlrp3 ablates bacterial protection, a link should be tested - experiments should test whether loss of TWIK2 prevents inflammasome activation in response to ATP (TWIK2 KO in 4E) and if loss of Nlrp3 changes TWIK2 translocation (Nlrp3 KO in at least some experiments of Figures 2/3).

      We have now added the TWIK2 KO results (new Fig. 5E).

      (12) One of the most striking data pieces is Figure 1D. It would, therefore, strengthen the paper to repeat those experiments (even just with the high-dose ATP) using TWIK2/P2X7/NLRP3 KO mice and really show the importance of these pathways in vivo. This is conceptually Figure 5, but the survival data of Figure 1 is far more convincing than the relatively weak bacterial load data of Figure 5.

      Unfortunately, our previous laboratory has been closed and we have had trouble acquiring enough mice for additional survival data during the transition period. However, the bacterial load data have been normalized to bacterial counts per 5 mg of lung tissue instead of per individual sample, which makes the data easier to interpret in context.
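
      For readers unfamiliar with this kind of normalization, a minimal sketch follows. It assumes that each sample has a raw colony count and a weighed tissue mass; the function name `cfu_per_5mg` and the example numbers are hypothetical, and any dilution factor used in plating is omitted.

```python
def cfu_per_5mg(cfu_counted, tissue_mg):
    """Scale a raw colony count to a common basis of 5 mg of lung tissue."""
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    return cfu_counted / tissue_mg * 5.0


# Hypothetical samples: the same raw count from different tissue masses.
small_sample = cfu_per_5mg(300, 10)  # 300 CFU recovered from 10 mg
large_sample = cfu_per_5mg(300, 20)  # 300 CFU recovered from 20 mg
```

      Two samples with identical raw counts can therefore differ twofold once tissue mass is taken into account, which is why per-sample counts are hard to compare directly.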

    1. eLife Assessment

      This fundamental study provides the first genome-wide characterization of H3K115 acetylation and identifies a striking and previously unappreciated association of this globular-domain histone modification with fragile nucleosomes at CpG island promoters, active enhancers, and CTCF binding sites. While the work is largely descriptive and correlative in nature, the evidence is compelling. The authors present multiple, orthogonal genomic and biochemical analyses that consistently support their central conclusions.

    2. Reviewer #1 (Public review):

      Summary

      The authors set out to define the genomic distribution and potential functional associations of acetylation of histone H3 lysine 115 (H3K115ac), a poorly characterized modification located in the globular domain of histone H3. Using native ChIP-seq and complementary genomic approaches in mouse embryonic stem cells and during differentiation to neural progenitor cells, they report that H3K115ac is enriched at CpG island promoters, active enhancers, and CTCF binding sites, where it preferentially localizes to regions containing fragile or subnucleosomal particles. These observations suggest that H3K115ac marks destabilized nucleosomes at key regulatory elements and may serve as an informative indicator of chromatin accessibility and regulatory activity.

      Strengths

      A major strength of this study is its focus on a histone post-translational modification in the globular domain, an area that has received far less attention than histone tail modifications despite strong evidence from structural and in vitro studies that such marks can directly influence nucleosome stability. The authors employ a wide range of complementary genomic analyses, including paired-end ChIP-seq, fragment size-resolved analyses, contour (V-) plots, and sucrose gradient fractionation, to consistently support the association of H3K115ac with fragile nucleosomes across promoters, enhancers, and architectural elements. The revised manuscript is careful in its interpretation and provides a coherent and internally consistent picture of how H3K115ac differs from other acetylation marks such as H3K27ac and H3K122ac. The datasets generated will be valuable to the chromatin community as a resource for further exploration of nucleosome dynamics at regulatory elements.

      Weaknesses

      The conclusions are largely correlative. While the authors provide strong evidence for the localization of H3K115ac to fragile nucleosomes, the work does not directly test whether this modification causally contributes to nucleosome destabilization or regulatory function in vivo. Questions regarding the enzymes responsible for depositing or removing H3K115ac and its direct functional consequences therefore remain open.

      Overall assessment and impact

      Overall, the authors largely achieve their stated aims by providing a detailed and well-supported characterization of H3K115ac distribution in mammalian chromatin and its association with fragile nucleosomes at regulatory elements. While mechanistic insight remains to be established, the study introduces a compelling new perspective on globular-domain histone acetylation and highlights H3K115ac as a potentially useful marker for identifying functionally important regulatory regions. The work is likely to stimulate further mechanistic studies and will be of broad interest to researchers studying chromatin structure, transcriptional regulation, and genome organization.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Comments on revisions:

      The authors sufficiently addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115 epigenetic mark located on the lateral surface of the histone core domain and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, on active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and so may serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a histone residue in a core domain (rather than on a histone tail) may serve a functional role in promoting transcription at CpG islands and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data using appropriate methodology that supports the idea that the H3K115ac mark may function to destabilize nucleosomes and contribute to regulating ESC differentiation. These findings are quite novel.

      Weaknesses:

      Additional experiments to confirm specificity of the antibodies used have been done, improving confidence in the study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.
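
      A replicate concordance check of this kind can be sketched as below. This is an illustration only, not the authors' pipeline: it assumes that signal has already been summarized per genomic bin or peak for each replicate, and the function name `pearson_r` and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-bin signal for two biological replicates.
rep1 = [10.0, 20.0, 30.0, 40.0, 50.0]
rep2 = [12.0, 18.0, 33.0, 41.0, 49.0]
r = pearson_r(rep1, rep2)
```

      In practice the vectors would contain many thousands of bins, and the choice of bin size and normalization affects the coefficient, so a threshold such as 0.9 is only meaningful alongside those details.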

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      Histone modifications are difficult to alter genetically because of the high copy number of histone genes, and inhibition of HATs/HDACs in general leads to alterations in other histone modifications. This is an inherent challenge in establishing causality for histone modifications, especially histone acetylation marks.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      We have modified the text in response to this point. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

      We agree that the paper does not provide mechanistic details or solid causality of H3K115ac. We have only emphasized the potential role of H3K115ac in nucleosome fragility based on our in vivo data and previously published in vitro experiments (Manohar et al., 2009; Chatterjee et al., 2015). We do provide evidence that H3K115ac is enriched on subnucleosomal particles via sucrose gradient sedimentation of MNase-digested chromatin (Figure 3C-D).

      Reviewer #2 (Public review):

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region. It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      ChIP-qPCR in S1B includes competition from native chromatin and shows high specificity to its target. We have provided antibody validation in three ways:

      - Western blot with dot-blot of synthetic peptides (Figure S1A).

      - Western blots with whole-cell extracts (Figure 4D).

      - ChIP-qPCR on native chromatin spiked with a cocktail of synthetic mono-nucleosomes, each carrying a single acetylation and a specific barcode (SNAP-ChIP K-AcylStat Panel).

      We could not include H3K115ac-marked nucleosomes as they are not available in the panel. Figure S1B shows that the H3K115ac antibody exhibits negligible binding to known K-acyl marks, comparable to an unmodified nucleosome. Because of the absence of an H3K115ac-modified barcoded nucleosome, we used the KLF4 promoter from mESCs as a positive control. In agreement with the ChIP-seq signal shown in the genome browser profile (Figure 1E), the KLF4 promoter shows a significantly higher signal than the gene body.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase-sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac modified nucleosomes are indeed more fragile are missing.

      We have performed ChIP-seq on MNase-digested mESC chromatin fractionated on sucrose gradients, and this shows that H3K115ac is enriched in fractions containing sub-nucleosomal particles and fragile nucleosomes but depleted in fractions containing stable nucleosomes (Figure 3D).

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      H3K64ac and H3K122ac datasets were generated by us in a previous publication (Pradeepa et al., 2016) using the same native MNase ChIP protocol as used here. The ChIP-seq datasets for H3K122ac and H3K27ac were processed in an identical manner, with the same computational pipelines, to the H3K115ac datasets generated in this paper.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      We agree with the reviewer’s comment, but we have not claimed causality.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      Due to broad target specificity, redundancies and crosstalk among different classes of HATs and HDACs, it is not tractable to answer this question in the current manuscript.

      Reviewer #3 (Public reviews):

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I have two primary concerns that resound through the entire paper:

      (a) Overall, the manuscript is making strong claims based on entirely correlative datasets. No quantitative analyses are performed to demonstrate co-occupancy/localization. Please see more detailed descriptions below.

      Our responses to specific points are provided against each comment below.

      (b) Lack of paired-end replicates for H3K115ac ChIP-seq. While the reviewer token for the deposited data was not made accessible to me, looking at Supplementary Table 1, it appears there are two H3K115ac ChIP-seq datasets. One is paired-end and one is single-read. So are peaks called with only one replicate of PE? Or are inaccurate peaks called with SR datasets? Either way, this is not a rigorous way to evaluate H3K115ac localization.

      We are sorry that this reviewer was not able to access the data; the token for the GEO accession was provided for reviewers at the journal’s request. All ChIP-seq (and ATAC-seq) experiments (paired and single-end) were performed with two biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. This was indicated in both the main text and in the methods. In the revised manuscript we have tried to make this even clearer and have put the relevant Pearson’s coefficient (r) into the text at the appropriate places. For the reviewer’s information, here is the complete list of data samples in the GEO Accession:

      Author response image 1.

      While I agree that H3K115ac occupancy is high at +CGIs, the authors downplay that H3K122ac and H3K27ac are also more highly enriched at these locations (page 7, last sentence of first paragraph). I imagine this is all due to the more highly transcribed nature of these genes. Sub-stratifying the K27ac and K122ac by transcription (as in Figure 1G) would help to demonstrate a unique nature of H3K115ac. But even better would be to do an analysis that plots H3K115ac enrichment vs transcription for every individual gene rather than aggregate analyses that are biased by single locations. For example, make an XY scatterplot of RNAPII occupancy or 4SU-seq signal vs H3K115ac level, where each point represents a single gene. This matters because the interpretation that the enrichment is CGI-based and not transcription-based is confounded by the fact that -CGI promoters are more lowly transcribed. So, looking at Figure 1G, even the -CGI occupancy of H3K115ac is correlated with transcription, but it is just more lowly transcribed.

      We thank the reviewer for these suggestions but point out that Figure 1G shows H3K115ac signal for CGI+ and CGI– TSS that are matched for expression levels (quartiles of 4SU-seq). Fig 1F shows that H3K115ac is much more of a discriminator between CGI+ and CGI– than H3K27ac or H3K122ac.

      (2) H3K115ac, H3K27ac, and H3K122ac are all more enriched (in aggregate) at +CGI locations (Fig 1F); so do these locations just have more positioned nucleosomes? More H3.3? So that these PTMs are just more enriched due to the opportunity?

      Positioned nucleosomes are generally found downstream of the TSS of active CpG island promoters, so what the reviewer suggests may well account for the relative enrichment of H3K27ac and H3K122ac at CGI+ vs CGI- promoters in Fig. 1F. But H3K115ac localisation is distinct, with the peak at the nucleosome-depleted region, not the +1 nucleosome. This is also confirmed by the contour plots in Fig 3. Our observation is also not explained by an enrichment of H3.3 at CGI promoters, since we show that H3K115ac is not specific to H3.3 (Fig 4D).

      (3) The authors note in paragraph 2 of page 7 that "H3K115ac does not scale linearly with gene expression..." but the authors never show a quantification of this; stratification in four clusters is not able to make a linear correlation. Furthermore, in the second line of page 7, the authors state that the levels do generally correlate with transcription. To claim it is a specific CGI link and not transcription is tricky, but I encourage the authors to consider more quantifiable ways, rather than correlations, to demonstrate this point, if it is observed.

      We thank the reviewer for this comment, and taking it into consideration, we have decided to re-phrase this paragraph. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) The authors claim on page 7 that "on average, transcription increased from TSS that also gained H3K115ac but to a modest extent, compared with the more substantial loss of H3K115ac from downregulated TSS". However, both upregulated and downregulated are significant; the difference in magnitude could simply be due to more highly or more lowly transcribed locations, meaning that fold change could be more robustly detected. I caution the authors to substantiate claims like this rather than stating a correlation.

      We thank the reviewer for this comment, which relates to the data in Fig. 2A. Fig. 2B shows that the association of H3K115ac loss with downregulation is statistically stronger than that of H3K115ac gain with upregulation, but only for CGI promoters. With regard to the text on the original pg 7 that is referred to, we have now reworded this to read “Average levels of transcription increased from TSS that also gained H3K115ac, and there was loss of H3K115ac from downregulated TSS (Figure 2A).”

      (5) For Figure 2C, the authors argue that H3K115ac correlate with bivalent locations. So this is all qualitative and aggregate localization; please quantitatively demonstrate this claim.

      Figure S2D provides statistics for this (observed/expected ratio and Fisher’s exact test).
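
      For orientation, the two statistics named above can be computed for an overlap between two promoter sets roughly as follows. This sketch is not taken from the manuscript: the function name `overlap_enrichment`, the one-sided (enrichment) form of the test, and all counts are assumptions chosen for illustration.

```python
from math import comb


def overlap_enrichment(n_total, n_a, n_b, n_both):
    """Observed/expected ratio and one-sided Fisher's exact p-value.

    n_total: all promoters considered
    n_a:     promoters in set A (e.g. bivalent)
    n_b:     promoters in set B (e.g. carrying the mark)
    n_both:  promoters in both sets
    """
    expected = n_a * n_b / n_total
    obs_exp = n_both / expected
    # Hypergeometric tail: probability of an overlap at least this large
    # if set B were drawn at random from all promoters.
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    return obs_exp, tail / denom


# Hypothetical counts, for illustration only.
ratio, p_value = overlap_enrichment(n_total=1000, n_a=100, n_b=200, n_both=60)
```

      Reporting the observed/expected ratio alongside the p-value is useful because, with thousands of promoters, very small effects can still reach significance.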

      (6) The authors claim in Figure 2 that H3K115ac is dynamic during differentiation (title of Figure 2). However, there are locations that gain and lose, or maintain H3K115ac. In fact, the most discussed locations are H3K115ac with no change (2C); which means it is NOT dynamic during differentiation. So what is the message for the role during differentiation? From Supplemental Table 1, it appears there is a single ChIP experiment for H3K115ac in NPC, and it is a single read. So this is also a difficult claim with one replicate. Related to this, in S2A, the authors show K115ac where there is no change in transcription; so what is the role of H3K115ac at TSSs relevant to differentiation - it is at both locations changed and unchanged in transcription, but H3K115ac levels themselves do not change at these subsets. So, how is this dynamic? This is very confusing, and clearer analyses and descriptions are necessary to deconvolute these data.

      We apologise for the misleading title for Figure 2. This has now been amended to “Changes in H3K115ac during differentiation”. The message of this figure is that whilst changes in H3K115ac at TSS are small (panels A-C), at enhancers the changes are much more dramatic (panel D). The reviewer is incorrect about the number of replicates for NPCs – there are two biological replicates (see response to point 1b).

      (7) The authors go on to examine H3K115ac enrichment on fragile nucleosomes through sucrose gradient sedimentation. A control for H3K27ac or H3K122ac would be nice for comparison.

      We do not have the material available to perform these experiments.

      (8) When discussing Figures 3 and SF3, the authors mention performing a different MNase for a second ChIP. Showing the MNase distribution for both the more highly digested and the lowly digested would be nice. a) Related to the above, the authors show input in SF3E to argue that the difference in H3K115ac vs H3K27ac is not due to the library, but they do not show the MNase digestion patterns, which is more important for this argument.

      Input libraries (first two graphs of Fig. S3E) are the MNase-digested chromatin. Comparison of nucleotide frequencies from millions of reads is a more robust method than comparison of fragment length patterns.

      (9) The authors move on to examine H3K115ac at enhancers. Just out of curiosity, given what was found at promoters, is H3K115ac enriched at +CGI enhancers? And what is the correlation with enhancer transcription?

      This is an interesting point, but the number of enhancers associated with CGI is not very high and so we did not focus on this. We have not analysed a correlation with eRNAs in this paper.

      (10) The authors state on page 14 that the most frequent changes in H3K115ac during differentiation are at these enhancers. So do these changes connect with differentiation-specific genes, and/or genes that have altered transcription during differentiation? Just trying to understand the functional role.

      Given the challenges of connecting enhancers with target genes, we have not addressed this question quantitatively. However, we draw the reviewer’s attention to the Genome Browser shots in Figures 2D and S2C, which show clear gain of H3K115ac (and ATAC-seq peaks) at intra and intergenic regions close to genes whose transcription is activated during the differentiation to NPCs.

      (11) Related, at the end of page 14, the authors state that the changes in H3K115ac correlate with changes in ATAC-seq; I imagine this dynamic is not unique for H3K115ac and this is observed for other PTMs (H3K27ac), so assessing and clarifying this, to again get to the specific interest of H3K115ac, would be ideal.

      We have not claimed that chromatin accessibility is unique to H3K115ac. It is the location of H3K115ac which is found inside the ATAC-seq peak region while H3K27ac is found only upstream/downstream of the ATAC peak that is so striking. This is apparent in Fig 4C.

      (12) The authors examine levels of H3K115ac in H3.3 KO cell lines via western blot (Figure 4D), but no replicates and/or quantification are shown.

      We now provide a biological replicate for the Western blot (new Fig. S4H), together with an image of the whole gel for the data in Fig. 4D.

      (13) In Figure S4 and at the end of page 17, the authors are arguing that there is a link to pioneer TF complexes, based on Oct4 binding. First, while Oct4 has pioneering activity, not all Oct4 sites (or motifs) are pioneering; this has been established. So if you want to use Oct4, substratifying by pioneer vs no pioneer is necessary. Second, demonstrating this is unique to pioneer and not to non-pioneer TFs would be an important control.

      In response to the reviewer’s comment, we have removed the term “pioneer” from the manuscript.

      (14) Minor point: Figure 4 A and B, there are some formatting issues with the scale bars.

      We thank the reviewer for pointing this out, and the errors have been corrected in the revised figure.

      (15) Minor point is that it should be clear when single replicates of data are used and when PE/SR sequences are combined or which one is used in each analysis, as this was hard to discern when reading the paper and figure legends.

      We have clearly stated in the text that, after Figure 2, we repeated all experiments in paired-end mode. All processing steps are defined separately for single-end and paired-end datasets in the Methods section. Details of biological replicates are provided in Sup. Table 1. These concerns are also addressed in our response to the Reviewer’s public comment 1.

      (16) Minor point: it is surprising that different MNase and different units were used in the ChIP vs sucrose sedimentation. Could the authors clarify why?

      Chromatin preparation for sucrose gradients was done on a much larger scale than for ChIP-seq and required different setups to obtain the right level of MNase digestion.

      (17) The authors note that fragile nucleosomes contain H2A.Z and H3.3, but they never perform an analysis of available data to demonstrate a correlation (or better a quantifiable correlation) between H3K115ac occupancy and these marks at the locations they identify H3K115ac.

      Since we have shown (Fig. 4) that depletion of H3.3 does not affect overall levels of H3K115ac, we do not think there is value in further quantitative correlative analyses of H3K115ac and variant histones.

      (18) Minor point: What is the overlap in peaks for H3K115ac, H3K122ac, and H3K27ac (Figure 1C)?

      Nearly all H3K115ac peaks overlap with H3K122ac and/or H3K27ac. Its most distinct properties are its association with CGI promoters, fragile nucleosomes and its unique localisation within the NDRs, three points that the manuscript is focussed on.

      Reviewer #3 (Recommendations for the authors):

      (1) The western blot results in Figure 4D probing for H3, H3.3, and H3K115ac use Ponceau S staining, presumably of an area of the membrane where histones might be expected to migrate, as a measure of loading. However, the Ponceau S bands appear uniformly weaker in the H3.3KO lanes, yet despite this, blotting with H3.3 antibody detects a band in H3.3 knockout ESCs, suggesting that the antibody does not have a high degree of specificity. Again, a blocking experiment with appropriate peptides would instill more confidence in the specificity of these reagents, and/or the authors could provide independent validation of the knockout model to differentiate between a partial knockout or antibody cross-reactivity (e.g., by Sanger sequencing).

      In a revised Fig. S4H we now show the whole gel corresponding to this blot but including co-staining with an antibody for H4 to provide a better loading control. We also provide a biological replicate of this Western blot in the lower panel of Fig. S4H.

      (2) The manuscript would benefit from in vitro follow-up and validation, but if the authors intend to keep the manuscript primarily in silico, I suggest dedicating a few lines in each section to explain the plots, their axes, and their purpose, as well as to assist with interpretation, rather than directly discussing the results. This would make the manuscript more accessible and understandable for a broader audience in the field of epigenetics.

      In the revised version, we have tried to improve the text to make the data more accessible to a broad audience.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of OXT (oxytocin) neurons and OXTR (oxytocin receptor) expressions in mammalian brains using an advanced RNAscope at the single transcript level. The evidence supporting the conclusions is compelling using chromogenic assays and state-of-the-art microscopy. The work will be of broad interest to neuroscientists and endocrinologists.

    2. Reviewer #1 (Public Review):

      This study by Ryu et al. provides compelling evidence to demonstrate the distributions of Oxt and Oxtr in the murine brain using an advanced RNAscope technique. Detailed information on the distributions was provided, revealing differences in Oxt and Oxtr expression between males and females. This study will provide a new platform for investigators to study previously unknown roles of brain-region-specific Oxt and Oxtr neurons and signaling in animal behaviors, metabolism, and other processes.

    3. Reviewer #2 (Public Review):

      This is an exciting study investigating the role of OXT in central nervous system (CNS) regulation of different behaviors and physiological processes. The study clearly shows the expression level of Oxt and Oxtr in different brain nuclei and regions.

      Sex differences in Oxt expression are also well demonstrated.

      Extensions of OXT's function in CNS regulation are sufficiently discussed.

      Overall, this provides a good direction for further investigation of OXT's role in CNS regulation of different behaviors and physiological processes.

    1. eLife Assessment

      This potentially important study explores the specificity of olfactory perceptual learning. In keeping with previous work, the authors found that learning to discriminate between two enantiomers does not generalize across the nostrils or to unrelated enantiomers, whereas learning to discriminate odor mixtures does generalize across the nostrils and to other odor mixtures, with this learning effect persisting over at least two weeks. While the evidence presented to support these findings is convincing, it remains unclear why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

    2. Reviewer #1 (Public Review):

      This study extends a previous study by the same group on the generalization of odor discrimination from one nostril to the other. In their earlier study, the group showed that learning to discriminate between two enantiomers does not generalize across nostrils. This was surprising given the Mainland & Sobel 2001 study that found that detecting androstenone in people who do not detect it can generalize across the two nostrils. In this study, they confirmed their previous results and reported that, unlike enantiomers, learning to discriminate odor mixtures generalizes across nostrils, generalizes to other odor mixtures, and is persistent over at least two weeks. This interesting and important result extends our knowledge of this phenomenon and will likely steer more research. It may also help develop new training protocols for people with impairments in their sense of smell.

      The main weakness of this study is its scope, as it does not provide substantial insight into why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

    3. Reviewer #2 (Public Review):

      The manuscript from Chang et al. taps into an important issue in olfactory perceptual plasticity, namely the generalization of the perceptual learning effect induced by training with odors. They employed a discrimination training/learning task with either binary odor mixtures or odor enantiomers, and tested for post-training effects at several time intervals. Their results showed contrasting patterns of specificity (enantiomers) and transfer (odor mixtures), and the learning effect persisted at 2 weeks post-training. They demonstrated that the effect was independent of task difficulty, olfactory adaptation and gender.

      Overall this was a well-controlled study and shows novel results. The strength of the study includes the consideration of odor structure and perceptual (dis)similarity and the control training condition. I have two minor issues that I hope the authors could address in the next version of the manuscript.

      1) The authors used a binary odor mixture with a ratio of 7:9 or 9:11. Why was this ratio chosen and used for the experiment?

      2) Over the course of training, has the valence of the odor (odor mixture) changed? It would be helpful to include these results in the supplements. As the authors indicated in the discussion, the potential site underlying the transfer effect is the OFC, which has been found to represent odor valence previously (Anderson, Christoff et al. 2003). It would be nice to see the authors replicate the results with odor/odor mixture valence (change) controlled.

      Anderson, A. K., K. Christoff, I. Stappen, D. Panitz, D. G. Ghahremani, G. Glover, J. D. Gabrieli and N. Sobel (2003). "Dissociated neural representations of intensity and valence in human olfaction." Nat Neurosci 6(2): 196-202.

    4. eLife Assessment

      This potentially important study explores the specificity of olfactory perceptual learning. In keeping with previous work, the authors found that learning to discriminate between two enantiomers does not generalize across the nostrils or to unrelated enantiomers, whereas learning to discriminate odor mixtures does generalize across the nostrils and to other odor mixtures, with this learning effect persisting over at least two weeks. While the evidence presented to support these findings is convincing, it remains unclear why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      Discrimination of odor enantiomers ultimately relies on the enantioselectivity of olfactory receptors, whereas mixture discrimination likely depends on relative differences in perceived configural odor notes. These processes probably engage plasticity at different stages of the olfactory pathway. The revised Discussion (p.16-18) now elaborates on this distinction and the potential underlying mechanisms. Please also refer to our responses to Reviewer 1’s Point 1 and Reviewer 2’s Points 2 and 3 below.

      Reviewer #1 (Public Review):

      This study extends a previous study by the same group on the generalization of odor discrimination from one nostril to the other. In their earlier study, the group showed that learning to discriminate between two enantiomers does not generalize across nostrils. This was surprising given the Mainland & Sobel 2001 study that found that detecting androstenone in people who do not detect it can generalize across the two nostrils. In this study, they confirmed their previous results and reported that, unlike enantiomers, learning to discriminate odor mixtures generalizes across nostrils, generalizes to other odor mixtures, and is persistent over at least two weeks.

      This interesting and important result extends our knowledge of this phenomenon and will likely steer more research. It may also help develop new training protocols for people with impairments in their sense of smell.

      We thank the reviewer for the encouraging remarks.

      The main weakness of this study is its scope, as it does not provide substantial insight into why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      We thank the reviewer for this insightful comment. While the present study does not directly identify the neural mechanisms underlying these differences, it provides behavioral constraints on where specificity and generalization may arise within the olfactory system. Further neuroimaging and neurophysiological work will be needed to fully elucidate the underlying mechanisms.

      Reviewer #2 (Public Review):

      The manuscript from Chang et al. taps into an important issue in olfactory perceptual plasticity, namely the generalization of the perceptual learning effect induced by training with odors. They employed a discrimination training/learning task with either binary odor mixtures or odor enantiomers, and tested for post-training effects at several time intervals. Their results showed contrasting patterns of specificity (enantiomers) and transfer (odor mixtures), and the learning effect persisted at 2 weeks post-training. They demonstrated that the effect was independent of task difficulty, olfactory adaptation and gender.

      Overall this was a well-controlled study and shows novel results. The strength of the study includes the consideration of odor structure and perceptual (dis)similarity and the control training condition.

      We appreciate the reviewer’s positive assessment of our work.

      I have two minor issues that I hope the authors could address in the next version of the manuscript.

      (1) The authors used a binary odor mixture with a ratio of 7:9 or 9:11. Why was this ratio chosen and used for the experiment?

      This ratio was selected based on pilot testing and practical constraints. During piloting, we evaluated several mixing ratios to identify those that met two key criteria: (1) Baseline indiscriminability: Most participants were unable to reliably discriminate between the two binary mixtures in a:b and b:a ratios at baseline. (2) Trainability: With 1–5 weeks of training, participants could acquire the ability to discriminate between them.

      The a:b ratios of 7:9 and 9:11 were the ratios that met both criteria in our pilot testing, making them suitable for assessing training‑induced improvements in mixture discrimination. This clarification has been added to the revised Olfactory Stimuli subsection of the Materials and Methods (p.19-20 of the revised manuscript).
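
      To make the arithmetic behind these ratios concrete, the short Python sketch below converts an a:b ratio into component shares and compares the a:b mixture with its b:a counterpart. It assumes that a ratio a:b means a parts of one component to b parts of the other; the function name is hypothetical, and how the parts were measured is not specified here.

```python
from fractions import Fraction


def component_shares(a, b):
    """Return the shares of the two components in an a:b mixture."""
    total = a + b
    return Fraction(a, total), Fraction(b, total)


for a, b in [(7, 9), (9, 11)]:
    share_in_ab, _ = component_shares(a, b)   # share of first component in a:b
    share_in_ba, _ = component_shares(b, a)   # share of first component in b:a
    gap = share_in_ba - share_in_ab
    print(f"{a}:{b} vs {b}:{a}: first component {float(share_in_ab):.4f} "
          f"vs {float(share_in_ba):.4f}, difference {float(gap):.4f}")
```

      Under this assumption the two mixtures in each pair differ only modestly in composition, which fits the description of pairs that are hard to tell apart before training.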

      (2) Over the course of training, has the valence of the odor (odor mixture) changed? It would be helpful to include these results in the supplements. As the authors indicated in the discussion, the potential site underlying the transfer effect is the OFC, which has been found to represent odor valence previously (Anderson, Christoff et al. 2003). It would be nice to see the authors replicate the results with odor/odor mixture valence (change) controlled.

      Anderson, A. K., K. Christoff, I. Stappen, D. Panitz, D. G. Ghahremani, G. Glover, J. D. Gabrieli and N. Sobel (2003). "Dissociated neural representations of intensity and valence in human olfaction." Nat Neurosci 6(2): 196-202.

      Odor valence ratings were not collected in Experiments 1 and 2. However, we have since conducted a new experiment examining concentration discrimination learning (see our response to Reviewer 1, Point 1), using the constituents of the mixtures from Experiment 2 as stimuli (i.e., concentration pairs of acetophenone, 2-octanone, methyl salicylate, and isoamyl butyrate). In this new experiment (now incorporated as Experiment 3 in the revised manuscript), unilateral odor valence ratings were collected at baseline (Day 0) and at the post-training test and retests on Days N, N+1, N+3, N+7, and N+14.

      For all odor pairs (training and controls), there was no significant change in perceived valence from baseline to Day N, regardless of nostril (ps > 0.05 for the main effects of session and nostril, as well as their interaction; Figure S5D). Moreover, odor valence ratings remained stable across the five post-training test sessions (ps ≥ 0.29 for the main and interaction effects involving session), showing the same pattern as at baseline (Figure S5D, F). Thus, training appeared to have no measurable influence on odor valence perception. These results have been incorporated into the revised manuscript on p.14-15.

    1. eLife Assessment

      This important study provides evidence, albeit still incomplete, that high-elevation species lose water at slower rates than low-elevation species. The findings imply that egg physiology may be a factor limiting the distributional range of bird species. While this work reinforces the need for all life stages to be considered when evaluating physiological adjustment to climate change, the analyses as presented by the authors do not clearly highlight the specific impact of species differences in influencing these adjustments.

    2. Reviewer #1 (Public Review):

      The authors tested the hypothesis that at high elevations avian eggs will be adapted to prevent desiccation that might arise from loss of water to surrounding drier air. They used a combination of gas diffusion experiments and scanning electron microscopy to examine water vapour conductance rates and eggshell structure, including thickness, pore size, and pore density among 197 bird species distributed along an elevational gradient in the Andes. While there was a correlation between water vapour conductance and elevation among species, a decrease in water vapour conductance with elevation was not associated with eggshell thickness, pore size, and pore density, suggesting the variation in the structure of the eggshells is unlikely to have to do with among-species differences in water loss along elevational gradients. This study is very interesting and timely, especially with increasing water vapour pressure due to climate warming. It is a very well-written study and easy to read. However, I have some concerns about the conclusions drawn from the results.

      There are more than twice as many species in low and medium-elevation sites compared to high-elevation sites, so the amount of variation in low and medium-elevation should be expected to be higher by default. The argument for a wider range of variation in low-elevation species would be stronger if the comparison used similar sample sizes. Moreover, the pattern clearly breaks down within families. Note also that for low and medium elevation there is no difference in the amount of variation in conductance residuals, possibly because the sample sizes are similar. The seemingly strong positive correlation between eggshell conductance and egg mass may be driven by the five high and two medium-elevation species with large eggs. There seem to be hardly any high-elevation species with egg mass greater than 12g, whereas egg mass in low-elevation species seems to reach as high as 80g (Figure 2a). Since larger eggs (and thus eggs of larger birds) lose more water compared to smaller eggs, the correlation between water vapour conductance and elevation may be more strongly associated with body size distribution along elevational gradients rather than egg structure and function.
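
      One way to make the sample-size point concrete is to subsample the larger groups down to the size of the smallest group before comparing spread. The Python sketch below is only an illustration of that idea: the residual values are invented, the function name is hypothetical, and using the standard deviation as the measure of variation is an assumption rather than the method used in the manuscript.

```python
import random
import statistics


def subsampled_sd(values, size, n_draws=1000, seed=0):
    """Mean standard deviation over random subsamples of a fixed size.

    Requires 2 <= size <= len(values).
    """
    rng = random.Random(seed)
    sds = [statistics.stdev(rng.sample(values, size)) for _ in range(n_draws)]
    return statistics.mean(sds)


# Invented conductance residuals per elevation band (not real data).
residuals = {
    "low": [0.31, -0.22, 0.05, -0.40, 0.18, 0.27, -0.09, 0.12, -0.30, 0.02],
    "medium": [0.20, -0.15, 0.08, -0.25, 0.11, 0.19, -0.05, 0.07, -0.21],
    "high": [-0.10, -0.18, 0.04, -0.12],
}

# Compare every band at the size of the smallest band.
smallest = min(len(v) for v in residuals.values())
for band, values in residuals.items():
    print(band, round(subsampled_sd(values, smallest), 3))
```

      If the spread of low-elevation residuals remains larger after subsampling, the wider variation cannot be attributed to the larger number of species alone.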

      Authors argue that the observed variation in the relationship between water vapour conductance and elevation among and within bird families suggests potential differences in the adaptive response to common selective pressures in terms of eggshell thickness and pore density, and size. The evidence for this is generally weak from the data analyses because the decrease in water vapour conductance with elevation was not consistent across taxonomic groups nor were differences associated with specific patterns in eggshell thickness and pore density, and size.

      It is not clear how the authors expected the relationship between water vapour conductance and elevation to differ among taxonomic groups and there was no attempt to explain the biological implication of these differences among taxonomic groups based on the specific traits of the species or their families. This missing piece of information is crucial to justify the argument that differences among taxonomic groups may be due to differences in adaptive response.

    3. Reviewer #2 (Public Review):

      Many tropical montane species live only within narrow elevational ranges. Rapid climate change has led to considerable interest in determining whether these narrow elevational ranges are the result of physiological specialization: if so, then warming temperatures will have direct fitness consequences. Thus far, studies of tropical montane ectotherms have often found patterns consistent with physiological specialization, while the few field studies of tropical montane birds (endotherms) have not. However, these few studies measured the thermal physiology of adult birds. The early life stages of birds may show physiological specialization, as eggs and nestlings function as ectotherms.

      In this paper, Ocampo and colleagues provide the first test of the hypothesis that bird eggs are physiologically specialized to the climatic conditions of certain elevational zones. They use experiments and observations to measure water vapor conductance rates and eggshell traits in a diverse set of 197 species that live from the lowland Amazon to the high Andes. Ocampo and colleagues present two principal results: (1) High-elevation eggs lose less water over time than do low-elevation eggs (high elevations tend to be less humid than low elevations), and (2) Eggshell traits do not show consistent patterns along the elevational gradient. The pattern in water loss is consistent with the hypothesis that high-elevation eggs are physiologically specialized for the slightly drier environments they experience. The finding that eggshell traits did not vary with elevation, however, means that the pattern of water loss is not driven by single eggshell traits (thicker eggshells could reduce water loss rates, as could fewer or smaller eggshell pores).

      This paper represents a strong advance for two main reasons. First, it provides evidence that egg physiology varies with elevation as predicted by the hypothesis that eggs are physiologically adapted to certain climatic conditions. This means egg physiological adaptation is a factor that could influence species' elevational ranges. Second, it is a proof-of-concept study that shows it is possible to measure eggshell physiology for a large number of species in the field in order to test hypotheses. As such, it should inspire many further tests that examine adaptation in egg physiology in the context of species' distributions along environmental gradients.

      There are two caveats that readers should be aware of. First, measuring these traits is difficult, and there remain questions about the efficacy of different methods. For example, the authors note that quantifying eggshell structures is very difficult, with several unresolved questions about their method of using scanning electron microscopy images to measure eggshell pores. Similarly, the authors mention that temperature variation may partially influence their main result that high-elevation eggs lose water at slower rates than low-elevation eggs (temperatures were colder for experiments at high elevations than for low elevations). Second, I regard the analyses of eggshell traits for specific families as exploratory. There are no a priori expectations for how different families might be expected to differ in their patterns. These analyses are fruitful in that they generate additional hypotheses that future work can test. However, it does mean that the statistical significance of eggshell trait relationships with elevation for specific families should be interpreted with caution.

    4. Author response:

      Reviewer #1 (Public Review):

      The authors tested the hypothesis that at high elevations avian eggs will be adapted to prevent desiccation that might arise from loss of water to surrounding drier air. They used a combination of gas diffusion experiments and scanning electron microscopy to examine water vapour conductance rates and eggshell structure, including thickness, pore size, and pore density among 197 bird species distributed along an elevational gradient in the Andes. While there was a correlation between water vapour conductance and elevation among species, a decrease in water vapour conductance with elevation was not associated with eggshell thickness, pore size, and pore density, suggesting the variation in the structure of the eggshells is unlikely to have to do with among-species differences in water loss along elevational gradients. This study is very interesting and timely, especially with increasing water vapour pressure due to climate warming. It is a very well-written study and easy to read. However, I have some concerns about the conclusions drawn from the results.

      There are more than twice as many species in low and medium-elevation sites compared to high-elevation sites, so the amount of variation in low and medium-elevation should be expected to be higher by default. The argument for a wider range of variation in low-elevation species would be stronger if the comparison used similar sample sizes. Moreover, the pattern clearly breaks down within families. Note also that for low and medium elevation there is no difference in the amount of variation in conductance residuals, possibly because the sample sizes are similar. The seemingly strong positive correlation between eggshell conductance and egg mass may be driven by the five high and two medium-elevation species with large eggs. There seem to be hardly any high-elevation species with egg mass greater than 12g, whereas egg mass in low-elevation species seems to reach as high as 80g (Figure 2a). Since larger eggs (and thus eggs of larger birds) lose more water compared to smaller eggs, the correlation between water vapour conductance and elevation may be more strongly associated with body size distribution along elevational gradients rather than egg structure and function.

      We thank the reviewer for this thoughtful observation. As noted in our response to comment 3, we recognize that the higher number of species at low and mid-elevations reflects the natural turnover in species richness along elevational gradients, and we are transparent about this caveat in our revised Discussion section. Nevertheless, to address this specific concern, we conducted additional analyses excluding the species with large eggs (i.e., egg mass >12g, which are only present at low and mid-elevations in our dataset). These analyses are now included in the Supplementary Figure 1, and the main pattern of lower water vapor conductance at high elevations holds even when larger eggs are excluded.

      We agree that the well-known scaling relationship between egg mass and conductance (recognized since the 1970s) may partially explain the observed trends across the elevational gradient. Our aim was to explore whether the known relationship between egg size and conductance varies when incorporating environmental variables such as elevation, which brings with it changes in humidity and oxygen availability. While we acknowledge the possible confounding effect of body size distributions along the gradient, our results, even after controlling for egg size (residual analysis), still suggest a decrease in conductance at higher elevations, consistent with predictions based on environmental conditions.
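
      The logic of the residual analysis and of the large-egg exclusion can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code used in the study: it assumes a log-log ordinary least-squares fit of conductance on egg mass, the data tuples are invented, and the function names are hypothetical. Only the 12 g threshold comes from the text above.

```python
import math


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conductance_residuals(egg_mass_g, conductance):
    """Residuals of log10(conductance) after removing the egg-mass scaling."""
    log_mass = [math.log10(m) for m in egg_mass_g]
    log_cond = [math.log10(g) for g in conductance]
    slope, intercept = ols_fit(log_mass, log_cond)
    return [yi - (intercept + slope * xi) for xi, yi in zip(log_mass, log_cond)]


def exclude_large(eggs, max_mass_g=12.0):
    """Keep only eggs at or below the mass threshold."""
    return [egg for egg in eggs if egg[0] <= max_mass_g]


# Invented records: (egg mass in g, conductance in arbitrary units, elevation in m).
eggs = [(2.1, 0.9, 400), (3.5, 1.3, 450), (40.0, 7.5, 500),
        (2.4, 0.8, 1500), (5.0, 1.5, 1600), (2.2, 0.6, 3000), (4.8, 1.1, 3100)]

small = exclude_large(eggs)
res = conductance_residuals([e[0] for e in small], [e[1] for e in small])
for (mass, cond, elev), r in zip(small, res):
    print(elev, round(r, 3))
```

      Negative residuals at high elevations would indicate lower conductance than expected for a given egg mass, which is the pattern the response describes as holding after large eggs are excluded.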

      We have clarified these points in the revised Discussion, including the acknowledgment that disentangling the relative contributions of body size and elevation to conductance patterns remains challenging and warrants further study.

      Authors argue that the observed variation in the relationship between water vapour conductance and elevation among and within bird families suggests potential differences in the adaptive response to common selective pressures in terms of eggshell thickness and pore density, and size. The evidence for this is generally weak from the data analyses because the decrease in water vapour conductance with elevation was not consistent across taxonomic groups nor were differences associated with specific patterns in eggshell thickness and pore density, and size.

      We appreciate the reviewer’s comments on the observed variation in water vapor conductance across taxonomic groups. As mentioned in response to comment 7, we have removed the explicit analyses and figures showing within-family comparisons, as these were exploratory and not directly tied to a specific hypothesis. We have also toned down our speculations regarding the potential adaptive drivers of the observed variation. In the revised Discussion, we emphasize the need for further research to explore these patterns and acknowledge the limitations of our current dataset in making strong conclusions about the adaptive responses to selective pressures.

      It is not clear how the authors expected the relationship between water vapour conductance and elevation to differ among taxonomic groups and there was no attempt to explain the biological implication of these differences among taxonomic groups based on the specific traits of the species or their families. This missing piece of information is crucial to justify the argument that differences among taxonomic groups may be due to differences in adaptive response.

      We appreciate the reviewer’s point. To clarify, we were not expecting the relationship between water vapor conductance and elevation to differ among taxonomic groups. Rather, our primary hypothesis was that water vapor conductance would decrease with elevation due to the drier conditions in highland habitats, and we sought to link this pattern with structural characteristics of the eggshell. The suggestion of potential differences among taxonomic groups arose from the lack of a consistent pattern across families, which prompted us to consider possible adaptive variation. We now address this more clearly in the Discussion section, acknowledging the need for further exploration into the potential selective pressures driving this variation among taxonomic groups.

      Reviewer #2 (Public Review):

      This paper represents a strong advance for two main reasons. First, it provides evidence that egg physiology varies with elevation as predicted by the hypothesis that eggs are physiologically adapted to certain climatic conditions. This means egg physiological adaptation is a factor that could influence species' elevational ranges. Second, it is a proof-of-concept study that shows it is possible to measure eggshell physiology for a large number of species in the field in order to test hypotheses. As such, it should inspire many further tests that examine adaptation in egg physiology in the context of species' distributions along environmental gradients.

      There are two caveats that readers should be aware of. First, measuring these traits is difficult, and there remain questions about the efficacy of different methods. For example, the authors note that quantifying eggshell structures is very difficult, with several unresolved questions about their method of using scanning electron microscopy images to measure eggshell pores. Similarly, the authors mention that temperature variation may partially influence their main result that high-elevation eggs lose water at slower rates than low-elevation eggs (temperatures were colder for experiments at high elevations than for low elevations). Second, I regard the analyses of eggshell traits for specific families as exploratory. There are no a priori expectations for how different families might be expected to differ in their patterns. These analyses are fruitful in that they generate additional hypotheses that future work can test. However, it does mean that the statistical significance of eggshell trait relationships with elevation for specific families should be interpreted with caution.

      We thank Reviewer 2 for these insightful comments. As mentioned earlier, measuring these traits is indeed very challenging, and we acknowledge the limitations of our methods, particularly when it comes to using scanning electron microscopy to quantify eggshell structures. We are aware of the unresolved questions around these techniques, and we plan to continue refining these methods in future studies. Regarding the influence of temperature variation on water loss, we recognize that colder temperatures at high elevations may have influenced our results, and we address this potential confounding factor in the Discussion section, Line 257.

      We also agree with the reviewer’s point regarding the exploratory nature of the family-specific analyses. These analyses were not guided by specific hypotheses, other than the expectation of replicating the overall pattern, and we recognize that they should be interpreted with caution. They serve primarily to generate additional hypotheses for future studies. In the revised manuscript, we have toned down the emphasis on the statistical significance of eggshell trait relationships with elevation for specific families, and we emphasize the need for further research to confirm these patterns.

    1. eLife Assessment

      In this manuscript, Jong et al. provide and validate a very useful resource for performing CRISPR screenings to study neutrophil differentiation and function by generating Hoxb8 cells that constitutively express Cas9. This library-screening approach has the potential to improve on the established lentiviral CRISPR-Cas9 editing of Hoxb8 cells. However, the technical advances provided are only incremental and the results presented in this study do not significantly further our understanding of these cells, but rather provide a good validation of their Cas9+ modified version.

    2. Reviewer #1 (Public Review):

      This study aims to develop a new system to analyze genetic determinants of neutrophil function by large-scale genetic screens. For that, the authors use a genetically-engineered ER-Hoxb8 neutrophil progenitor line that expresses Cas9 to perform CRISPR screens and to identify genes required for neutrophil survival and differentiation.

      A main strength of this study is that the authors develop a pooled CRISPR sgRNA library applicable to neutrophils and show potential determinants of neutrophil differentiation in vitro using this screening methodology.

      A main weakness of this study is that identified candidates associated with neutrophil differentiation, as those indicated in Fig. 4A, were not validated in vivo using neutrophil-specific K.O. models or further characterized in vitro (e.g. transcriptional or epigenetic changes during maturation when compared to non-targeting sgRNA controls).

      As secondary strengths, the authors provide evidence of efficient gene editing in Cas9+ER-Hoxb8 neutrophils both in vivo and in vitro and provide evidence of the specificity of this methodology using a Cas9+ER-Hoxb8 immortalized cell line that differentiates into macrophages.

      In terms of methodology, this study provides a useful tool to explore gene regulatory networks in neutrophils in large-scale genetic screens. However, it falls short in identifying and validating the true potential of this kind of methodology in the biology of the neutrophil.

      Moreover, the technical advances in the field are only incremental, as several studies, including those using CRISPR/Cas9 technology in Hoxb8-immortalized neutrophil progenitor cell lines, have already been performed.

    3. Reviewer #2 (Public Review):

      In this manuscript, Jong et al. provide and validate a very useful resource for performing CRISPR screenings to study neutrophil differentiation and function. The major strength of the paper lies in its careful validation of many aspects of the Hoxb8-immortalized progenitor cells, including their differentiation capacity, their ability to clear bacteria, and their capacity to differentiate in vivo. The authors succeed at this, with results correctly supporting their conclusions. The major weaknesses are its presentation and writing, parts of which are poorly organized. Finally, while the potential impact of this resource in the field could be very large, the CRISPR screening results appear half-baked, almost preliminary, and could be better validated, or at least better presented. A few other points that warrant revision are included below:

      • The introduction should be better constructed and organized. It should be written with more connectors to present facts in a stream that flows naturally, from the broad general facts to the experimental details implemented in the manuscript. It should also discuss other similar approaches used in the literature, such as LaFleur et al. 2019, and relate in which ways these presented methods could be better.

      • The scheme in Figure 4A should more clearly indicate the timings, doublings, numbers of cells, and other aspects of the experimental design.

      • The volcano plot in Figure 4B is poorly informative and almost redundant. What does one make of it?

      • The representation (normalized reads) of each sgRNA in the library and across multiple experiments, including their correlation, should be checked and plotted, to visualize how robust these replicates are.

      • In Figure 4E, the distribution of the hit sgRNAs should be compared to all other targeting guides (instead of just to non-targeting controls). Linear density distribution plots or scatter plots of all guides are usually the best way, but there are others (for example, see Figure 4 of LaFleur et al. 2019). Ideally, each independent sgRNA for each gene in the library, as well as biological replicates, should be separately shown, with hits clearly highlighted.

      • While in vivo differentiation is shown as possible with these cell lines, it is unclear whether CRISPR screenings could be performed in vivo too. Would sgRNA representation suffice for genome-wide? At least some of the new hits could be validated by testing differentiation in vivo (i.e. WASH complex).

      • In the methods section, the RNA-seq analysis pipeline details are missing (versions, software for alignment, quantification, differential gene expression, and enrichment). Also, parameters for MAGeCK and MAGeCKFlute should be explicit and detailed.

      • The discussion is mostly a summary of the results. It is lacking in detail and thoughtful discussion regarding novelty and impact beyond the validation of the cell line. What about potential applications? What about extending screenings to test bacterial-killing, as suggested in Figure 2? What about limitations compared to other similar methods out there? There is little discussion of such important potential matters. Also, a large part of the discussion is dedicated to discussing details about Cebpe that are all well known in the literature and add little value.

      • Figure legends are typically too succinct and hard to interpret, especially for non-experts. The text should enable the figure reader to correctly interpret what is shown in each panel.
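
      The replicate-robustness check requested in the list above (normalized reads per sgRNA, correlated across experiments) could look roughly like the sketch below. It assumes reads-per-million normalization with a log2 transform and a Pearson correlation; the read counts are hypothetical placeholders, not values from the manuscript, and the helper names are invented for illustration.

```python
import math


def log2_rpm(counts, pseudocount=1.0):
    """Normalize raw sgRNA read counts to log2(reads per million + pseudocount)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts for six sgRNAs in two replicates.
# Replicate 2 is replicate 1 sequenced twice as deep, so normalization
# should make the two profiles coincide.
replicate_1 = [120, 35, 0, 410, 88, 230]
replicate_2 = [240, 70, 0, 820, 176, 460]

r = pearson(log2_rpm(replicate_1), log2_rpm(replicate_2))
print(f"replicate correlation (log2 RPM): {r:.4f}")
```

      In practice the same calculation would run over every guide in the library, and the pairwise correlations would be shown as a scatter plot or matrix alongside the screen results.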

    4. Reviewer #3 (Public Review):

      Primary neutrophils are difficult to modify genetically, whereas the generation of knockout mice to study the role of specific proteins is time-consuming and expensive. CRISPR-Cas9 genetic modification of neutrophil progenitors in vitro offers a platform to study neutrophil biology. Hoxb8 cells are immortalized neutrophil progenitors that differentiate into neutrophils when cultured in the presence of G-CSF, and have been shown to recapitulate the stages of murine neutrophil differentiation. They have also been shown to be amenable to CRISPR-Cas9 genetic editing, with successful deletion of key transcriptional regulators of neutrophil maturation and function. The authors of this manuscript offer an extension to this technique by generating Hoxb8 cells that constitutively express Cas9. This may reduce the variation between the generated knock-out cells by avoiding the need to introduce Cas9 on a plasmid together with a guide RNA every time.

      The first part of the manuscript is dedicated to the characterisation of Cas9+HoxB8 cells throughout their differentiation. Considering the existing body of literature on HoxB8 progenitors and their differentiation into neutrophils ex vivo, it does not significantly further our understanding of these cells, but rather provides a good validation of their Cas9+ modified version. Gene editing using Cas9+ Hoxb8 progenitors seems to be highly efficient, which is an important technical point; however, it is hard to assess the degree of improvement in efficiency compared to the published protocols with Cas9 delivery in a plasmid.

      As a test, the authors use Cas9+HoxB8 progenitors to generate a knockout of CEBPE, known for its important role in neutrophil development. They convincingly demonstrate its impact on HoxB8 cell differentiation, with in vivo reconstitution of wild-type and CEBPE-deficient HoxB8 progenitors into irradiated mice being especially elegant. However, the transfer into different recipient mice assumed no differences in the recipient environment, while immunophenotyping for mature neutrophils within the HoxB8 progenitor-derived cells did not account for possible differences in numbers of wt and CEBPE KO surviving cells, limiting the conclusions.

      Finally, the authors put the system to the test by screening the Brie gRNA library of ~80K mouse sgRNAs, targeting almost 20K genes with 4 gRNAs per gene, to identify genes that are needed for the differentiation of Cas9+ERHoxb8 progenitors into mature neutrophils. They identify a number of hits, amongst which the WASH complex and CEBPE are highlighted. A comparison of cell numbers prior to differentiation and at 4 days post differentiation indicates that they are indeed required for neutrophil survival. To validate the role of these hits in neutrophil maturation itself, as they stated in the aims, i.e. "to identify genes that modulate the differentiation of Cas9+ERHoxb8 progenitors into mature neutrophils", phenotypic, functional, and morphological characterization of these cell lines would have been appropriate.
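
      The library figures quoted above can be turned into a quick representation estimate. The sketch below uses the rounded numbers from the review (~80K sgRNAs, ~20K genes); the 500-fold coverage target is an assumed rule of thumb for pooled screens, not a value reported in the manuscript, and the function names are invented for illustration.

```python
def guides_per_gene(n_sgrnas, n_genes):
    """Average number of sgRNAs targeting each gene in the library."""
    return n_sgrnas / n_genes


def cells_needed(n_sgrnas, fold_coverage):
    """Cells required so that each sgRNA is carried by `fold_coverage` cells on average."""
    return n_sgrnas * fold_coverage


N_SGRNAS = 80_000            # rounded library size quoted in the review
N_GENES = 20_000             # rounded number of targeted genes quoted in the review
ASSUMED_FOLD_COVERAGE = 500  # assumption for illustration only

print(f"guides per gene: {guides_per_gene(N_SGRNAS, N_GENES):.1f}")
print(f"cells needed at {ASSUMED_FOLD_COVERAGE}x: {cells_needed(N_SGRNAS, ASSUMED_FOLD_COVERAGE):,}")
```

      The same arithmetic bears on whether a genome-wide screen is feasible in vivo: the number of progenitor-derived cells recovered from recipients would need to be compared against the cell count this calculation returns.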

      Overall, this study has the potential to improve on the established lentiviral CRISPR-Cas9 editing of Hoxb8 cells and be valuable for library-screening approaches for neutrophil modulators, which will benefit the community.

    1. eLife Assessment

      This study reports an mRNA-based strategy for restoring sperm motility in a mouse model of monogenic male infertility. The work is technically innovative and potentially valuable, as it demonstrates feasibility of in vivo testicular mRNA delivery without genomic integration of foreign DNA. However, although partial recovery of sperm motility is supported, the evidence for meaningful restoration of fertility remains incomplete, with weak IVF outcomes and difficult-to-interpret ICSI results. In addition, mechanistic questions regarding the persistence of mRNA and the specificity of germ-cell targeting remain insufficiently resolved, limiting the strength of the authors' conclusions.

    2. Reviewer #4 (Public review):

      I maintain that the images in Figure 12 (new Figure 14) do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization (IVF) of Armc2-/- rescued sperm. They are clearly not normal 2-cell embryos and instead look very much like fragmented eggs that can be seen occasionally following in vitro fertilization procedures, even when that is done with wild type eggs and sperm. The only portion of current Figure 14 that has normal looking 2-cell embryos is in panel 14A4, where wild type B6D2 sperm were used. Even in that panel, there are some fragmented eggs that the authors identify as 2-cell embryos.

      The authors offer the explanation that CD1 eggs fertilized by B6D2F1 hybrid male sperm do not develop beyond the 2-cell stage, citing a 2008 paper published in Biology of Reproduction by Fernandez-Gonzalez et al. I read through that paper very carefully and even had a colleague read through it in case I missed something, but that paper says nothing at all about strain incompatibilities, much less 2-cell arrest due to them. The only crosses done in that paper are CD1 eggs x CD1 sperm and B6D2 eggs x B6D2 sperm, all by intracytoplasmic sperm injection and not standard in vitro fertilization. [Note that the paper does mention performing in vitro fertilization but says nothing about how it was done or what mouse strains were used.] I even searched the literature for information regarding incompatibility between these strains and could find nothing relevant. But even if the authors are correct and there happens to be a strain incompatibility and 2-cell arrest is expected, what the authors are calling 2-cell embryos are clearly not.

      A second explanation offered by the authors is that they used collagenase to remove the cumulus cells and that this may have affected the appearance of the embryos. This technique is actually used to remove both the cumulus cells and the zona pellucida and has been described as a gentler way to do so than other standard methods (hyaluronidase treatment followed by acid Tyrodes to remove the zona pellucida) (Yamatoya et al., Reprod Med Biol 2011, DOI 10.1007/s12522-011-0075-8). I think it is highly relevant to the current study that the method they used to remove cumulus cells also dissolves the zona, either partially or completely. Given that many of the eggs, fragmented eggs, and 2-cell embryos (from the WT sperm) in Figure 14A are lacking a zona pellucida, it seems very likely that many of the eggs were either zona-free or had partial zona dissolution from the start. In fact, the authors state in the Methods section that "Cumulus-free and zona-free eggs were collected..." for how IVF was done. Partial zona dissolution is standard in some protocols for performing IVF using frozen mouse sperm, which usually have much lower motility and overall efficacy than fresh sperm. In any case, it would improve transparency if the manuscript made clear somewhere other than buried in the Methods that the IVF procedure was done on eggs with partially or fully removed zonas, to allow proper interpretation.

      In the rebuttal, the authors go on to state: "To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2-/- sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2-/- sperm (Supplementary Fig. 9). These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation."

      Their conclusion that the data support partial recovery of fertilization ability following Armc2 mRNA electroporation in my opinion has no basis. This experiment was done only once, and no information is provided regarding how many eggs underwent ICSI or how many reached the blastocyst stage. The authors claim that the rescued sperm were better than the Armc2-/- sperm in producing blastocysts, but this is based on a simple percentage report of 25% vs 13% without any statistical analysis, even on the results from the single experiment presented.
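
      The point about percentages without counts can be illustrated with a two-sided Fisher's exact test. Because the numbers of injected oocytes are not reported, the counts below are purely hypothetical; they are chosen only so that the proportions come out near 25% and 13%. The function is a plain standard-library sketch, not the analysis any party actually ran.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1 = a + b
    col1, col2 = a + c, b + d
    n = row1 + c + d

    def weight(x):
        # Unnormalized hypergeometric probability of x successes in row 1.
        return comb(col1, x) * comb(col2, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 - col2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Hypothetical counts: (blastocysts, non-blastocysts) for rescued vs untreated sperm.
p_small = fisher_exact_two_sided(2, 6, 1, 7)      # 2/8 = 25% vs 1/8 = 12.5%
p_large = fisher_exact_two_sided(25, 75, 13, 87)  # 25/100 = 25% vs 13/100 = 13%

print(f"8 oocytes per group:   p = {p_small:.3f}")
print(f"100 oocytes per group: p = {p_large:.3f}")
```

      The same pair of percentages yields very different p-values depending on how many oocytes were injected, which is why the counts and the number of independent experiments need to be reported.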

      Overall, the paper shows rescue of some sperm motility by the new method they use, and the new title is therefore appropriate. The authors have also dealt reasonably with many of the original concerns regarding documenting that their methodology was effective in producing protein (at least the GFP marker) in spermatogenic cells. In my view the authors have, however, not shown any indication of functional recovery over what is already known for the knockout sperm, that ICSI can support blastocyst stage embryo development. They also have not, in my view, justified the claims at the end of the abstract "These motile sperm were able to produce embryos by IVF..." and that "...mRNA electroporation can restore...partially fertilizing ability..."

    3. Reviewer #5 (Public review):

      While the study presents an innovative and potentially impactful mRNA-based approach for addressing monogenic causes of male infertility, several significant weaknesses limit the strength of the authors' central conclusions.

      First, the functional evidence for true fertility restoration remains incomplete. Although the authors convincingly demonstrate partial recovery of sperm motility, the downstream reproductive outcomes, particularly for IVF, are weak. Importantly, these concerns are shared by all three reviewers and the former Reviewing Editor, and to my eye they are both thoughtfully articulated and well warranted. The ICSI data show modest improvement, but this rescue is difficult to interpret.

      In parallel, significant mechanistic questions persist regarding the unusually prolonged persistence of naked mRNA and reporter protein expression in germ cells, which is not fully reconciled with established mRNA and protein half-life biology and is supported largely by inference rather than by direct decay measurements.

      Finally, although the authors have conducted additional cellular analyses, concerns about the extent and specificity of germ-cell targeting versus Sertoli-cell expression remain unresolved. Together, these issues do not negate the technical novelty of the work, but they do constrain the confidence with which the current dataset can support the authors' strongest therapeutic claims.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the effectiveness of electroporating mRNA into male germ cells to rescue the expression of proteins required for spermatogenesis progression in individuals where these proteins are mutated or depleted. To set up the methodology, they first evaluated the expression of reporter proteins in wild-type mice, which showed expression in germ cells for over two weeks. Then, they attempted to recover fertility in a model of late spermatogenesis arrest that produces immotile sperm. By electroporating mRNA encoding the missing protein, the authors recovered the motility of ~5% of the sperm; although the rescued sperm were not able to produce offspring using IVF, the embryos reached the 2-cell stage (in contrast to controls that did not progress past the zygote stage).

      This is a comprehensive evaluation of the mRNA methodology with multiple strengths. First, the authors show that naked synthetic RNA, purchased from a commercial source or generated in the laboratory with simple methods, is enough to express exogenous proteins in testicular germ cells. The authors compared RNA to DNA electroporation and found that germ cells are efficiently electroporated with RNA, but not DNA. The differences between these constructs were evaluated using in vivo imaging to track the reporter signal in individual animals through time. To understand how the reporter proteins affect the results of the experiments, the authors used different reporters: two fluorescent (eGFP and mCherry) and one bioluminescent (Luciferase). Although they observed differences among reporters, in every case expression lasted for at least two weeks. The authors used a relevant system to study the therapeutic potential of RNA electroporation. The ARMC2-deficient animals have an impaired sperm motility phenotype that affects only the later stages of spermatogenesis. The authors showed that sperm motility was recovered to ~5%, which is remarkable given the small fraction of germ cells electroporated with RNA with the current protocol. The sperm motility parameters were thoroughly assessed by CASA. The 3D reconstruction of an electroporated testis using state-of-the-art methods to show the electroporated regions is compelling.

      The main weakness of the manuscript is that although the authors manage to recover motility in a small fraction of the sperm population, it is unclear whether the increase in sperm quality is substantial enough to improve assisted reproduction outcomes. The authors found that the rescued sperm could be used to obtain 2-cell embryos via IVF, but no evidence for more advanced stages of embryo differentiation was provided. The motile rescued sperm were also successfully used to generate blastocysts by ICSI, but the statistical significance of the rate of blastocyst production compared to non-rescued sperm remains unclear. The title is thus an overstatement, since fertility was never restored for IVF, and the mutant sperm were already able to produce blastocysts without the electroporation intervention.

      Overall, the authors clearly show that electroporating mRNA can improve spermatogenesis as demonstrated by the generation of motile sperm in the ARMC2 KO mouse model.

      We thank the reviewer for this thoughtful and constructive comment. We agree that our study demonstrates a partial functional recovery of spermatogenesis rather than a complete restoration of fertility. Our main objective was to establish and validate a proof-of-concept approach showing that mRNA electroporation can rescue the expression of a missing or mutated protein in post-meiotic germ cells and result in the production of motile sperm.

      To address the reviewer’s concern, we have revised the title and Discussion to more accurately reflect the scope of our findings. The new title reads:

      “Sperm motility in mice with oligo-astheno-teratozoospermia restored by in vivo injection and electroporation of naked mRNA”

      In the manuscript, we now emphasize that while motility recovery was significant, complete fertility restoration was not achieved. We have also clarified that:

      The 5% recovery in motile sperm represents a substantial improvement considering the small population of germ cells reached by the current electroporation method.

      The 2-cell embryo formation observed after IVF serves as a strong indication of partial functional recovery.

      Finally, we now explicitly state in the Discussion that this approach should be considered a therapeutic proof-of-concept, demonstrating feasibility and potential, rather than a fully curative intervention.

      Reviewer #2 (Public review):

      The authors inject, into the rete testis, mRNA and plasmids encoding mRNAs for GFP and then ARMC2 (into infertile Armc2 KO mice) in a gene therapy approach to express exogenous proteins in male germ cells. They do show GFP epifluorescence and ARMC2 protein in KO tissues, although the evidence presented is weak. Overall, the data do not necessarily make sense given the biology of spermatogenesis, and more rigorous testing of this model is required to fully support the conclusion that gene therapy can be used to rescue male infertility.

      In this revision, the authors attempt to respond to the critiques from the first round of reviews. While they did address many of the minor concerns, there are still a number to be addressed. With that said, the data still do not support the conclusions of the manuscript.

      We thank the reviewer for their careful and detailed assessment of our manuscript. We appreciate the concerns raised regarding mRNA stability, GFP localization, and the interpretation of spermatogenesis stages, and we have addressed these points in the manuscript and in the responses below.

      (1) The authors have not satisfactorily provided an explanation for how a naked mRNA can persist and direct expression of GFP or luciferase for ~3 weeks. The most stable mRNAs in mammalian cells have half-lives of ~24-60 hours. The stability of the injected mRNAs should be evaluated and reported using cell lines. GFP protein's half-life is ~26 hours, and luciferase protein's half-life is ~2 hours.
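
      The arithmetic behind this concern can be made explicit with simple first-order decay. The sketch below uses only the half-lives quoted in the comment; it treats each molecule pool as decaying from a single starting amount with no new synthesis, which is a simplifying assumption (protein would in reality keep being made while any mRNA remains).

```python
def fraction_remaining(elapsed_h, half_life_h):
    """Fraction of a molecule pool left after first-order decay, with no new synthesis."""
    return 0.5 ** (elapsed_h / half_life_h)


THREE_WEEKS_H = 21 * 24  # 504 hours

# Half-lives (hours) as quoted in the reviewer's comment.
half_lives_h = {
    "mRNA, lower bound of most stable": 24,
    "mRNA, upper bound of most stable": 60,
    "GFP protein": 26,
    "luciferase protein": 2,
}

for label, half_life in half_lives_h.items():
    n_half_lives = THREE_WEEKS_H / half_life
    remaining = fraction_remaining(THREE_WEEKS_H, half_life)
    print(f"{label}: {n_half_lives:.1f} half-lives, fraction remaining {remaining:.3g}")
```

      Even with a 60-hour half-life, three weeks corresponds to 8.4 half-lives, leaving roughly 0.3% of the starting material, which is why direct decay measurements in the relevant cells matter for interpreting the prolonged signal.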

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-qPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells (Fig. 5A). Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells, and the slow protein turnover that is typical of the spermatogenic lineage.

      (2) There is no convincing data shown in Figs. 1-8 that the GFP is even expressed in germ cells, which is obviously a prerequisite for the Armc2 KO rescue experiment shown in the later figures! In fact, to this reviewer the GFP appears to be in Sertoli cell cytoplasm, which spans the epithelium and surrounds germ cells - thus, it can be oft-confused with germ cells. In addition, if it is in germ cells, then the authors should be able to show, on subsequent days, that it is present in clones of germ cells that are maturing. Due to intracellular bridges, a molecule like GFP has been shown to diffuse readily and rapidly (in a matter of minutes) between adjacent germ cells. To clarify, the authors must generate single cell suspensions and immunostain for GFP using any of a number of excellent commercially-available antibodies to verify it is present in germ cells. It should also be present in sperm, if it is indeed in the germline.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig. 10A-), round spermatids (25%) (Fig. 10B-E), and elongated spermatids (11.4%) (Fig. 10C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig. 10F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming efficient transfection and translation of the delivered mRNA. These results demonstrate that the in vivo injection and electroporation protocol enables effective mRNA transfection, and efficient translation of the injected mRNA, in germ cells across multiple stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-qPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      Other comments:

      70-1 This is an incorrect interpretation of the findings from Ref 5 - that review stated there were ~2,000 testis-enriched genes, but that does not mean "the whole process involves around two thousand of genes"

      We thank the reviewer for this helpful comment. We agree that our previous phrasing was imprecise. We have revised the sentence to clarify that approximately 2,000 genes show testis-enriched expression, rather than implying that the entire spermatogenic process is limited to these genes. The corrected sentence now reads:

      “Spermatogenesis involves the coordinated expression of a large number of genes, with approximately 2,000 showing testis-enriched expression, about 60% of which are expressed exclusively in the testes”

      74 would specify 'male':

      We have now specified this, as suggested.

      79-84 Are the concerns with ICSI due to the procedure itself, or the fact that it's often used when there is likely to be a genetic issue with the male whose sperm was used? This should be clarified if possible, using references from the literature, as this reviewer imagines this could be a rather contentious issue with clinicians who routinely use this procedure, even in cases where IVF would very likely have worked:

      We thank the reviewer for this important comment. Concerns about ICSI outcomes indeed reflect two partly overlapping causes: the procedure itself (direct sperm injection and associated laboratory manipulations) and the clinical/genetic background of couples undergoing ICSI (especially men with severe male-factor infertility). Large reviews and meta-analyses report a small increase in some perinatal and congenital risks after ART/ICSI, but these studies conclude that it is difficult to fully disentangle procedural effects from parental factors. Importantly, genetic or epigenetic abnormalities in the male (which motivate use of ICSI) likely contribute to adverse outcomes in offspring, while some studies also suggest that ICSI-specific manipulations may alter epigenetic marks in embryos. For these reasons, professional bodies recommend reserving ICSI for appropriate male-factor indications rather than using it as routine insemination for non-male-factor cases.

      We have revised the text accordingly to clarify this distinction:

      “ICSI can efficiently overcome the problems faced. Nevertheless, concerns persist regarding the potential risks associated with this technique, including blastogenesis defect, cardiovascular defect, gastrointestinal defect, musculoskeletal defect, orofacial defect, leukemia, central nervous system tumors, and solid tumors [1-4]. Statistical analyses of birth records have demonstrated an elevated risk of birth defects, with a 30-40% increased likelihood in cases involving ICSI [1-4], and a prevalence of birth defects between 1% and 4% [3]. It is important to note, however, that the origin of these risks remains debated. Several large epidemiological and mechanistic studies indicate that both the procedure itself (direct microinjection and in vitro manipulation) and the underlying genetic or epigenetic abnormalities often present in men requiring ICSI contribute to the observed outcomes [1, 3, 5, 6]. To overcome these drawbacks, a number of experimental strategies have been proposed to bypass ARTs and restore spermatogenesis and fertility, including gene therapy [7-10].”

      199 Codon optimization improvement of mRNA stability needs a reference;

      We have added the references accordingly: [11-15].

      In one study using yeast transcripts, optimization improved RNA stability on the order of minutes (e.g., from ~5 minutes to ~17 minutes); is there some evidence that it could be increased dramatically to days or weeks?

      We agree with the reviewer that codon optimization can enhance mRNA stability, but available evidence indicates that this effect is moderate. In Saccharomyces cerevisiae, Presnyak et al. (2015) [16] showed that codon optimization increased mRNA half-life from approximately 5 minutes to ~17 minutes, representing a several-fold improvement rather than a shift to days or weeks. Similar codon-dependent stabilization has been observed in mammalian systems, where transcripts enriched in optimal codons exhibit longer half-lives and enhanced translation efficiency ([11]; [17]). However, these studies consistently report effects on the scale of minutes to hours. In mammalian cells, the prolonged stability of therapeutic or vaccine mRNAs, which can last for days, is primarily achieved through additional features such as optimized untranslated regions, chemical nucleotide modifications (e.g., N¹-methylpseudouridine), and protective delivery systems, rather than codon usage alone ([18]; [19]).

      Other molecular optimizations that improve in vivo mRNA stability and translation include a poly(A) tail, which binds poly(A)-binding proteins to protect the transcript from 3′ exonuclease degradation and promotes ribosome recycling, and a CleanCap structure at the 5′ end, which mimics the natural Cap 1 configuration, protects against 5′ exonuclease attack, and enhances translational initiation [11-15]. Together, these modifications act synergistically to stabilize the transcript and support efficient translation.

      472-3 The reported half-life of EGFP is ~36 hours - so, if the mRNA is unstable (and not measured, but certainly could be estimated by qRT-PCR detection of the transcript on subsequent days after injection) and EGFP is comparatively more stable (but still hours), how does EGFP persist for 21 days after injection of naked mRNA??

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-qPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells, and the slow protein turnover that is typical of the spermatogenic lineage.
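
      The reviewer's arithmetic can be made explicit with a back-of-the-envelope sketch. It assumes simple first-order decay with no new protein synthesis, which is an illustrative simplification rather than a model used in the manuscript, and the function name is hypothetical. With the ~36 h EGFP half-life quoted by the reviewer, 21 days corresponds to 14 half-lives, so a single pulse of protein alone would leave only a tiny fraction; continued translation from a persisting transcript would be needed to account for a signal at that time point.

```python
def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of a species left after first-order decay with no new synthesis."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_hours / half_life_hours)


# Values taken from the reviewer's comment; everything else is illustrative.
egfp_half_life_h = 36.0      # reported EGFP half-life (~36 h)
elapsed_h = 21 * 24          # 21 days after injection, in hours

n_half_lives = elapsed_h / egfp_half_life_h
remaining = fraction_remaining(elapsed_h, egfp_half_life_h)
print(f"{n_half_lives:.0f} half-lives, fraction remaining = {remaining:.1e}")
```

      The same function can be applied to the ~2-3 h luciferase half-life mentioned by the reviewer to see how quickly a short-lived reporter would disappear without ongoing translation.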

      Curious why the authors were unable to get anti-GFP to work in immunostaining?

      We appreciate the reviewer’s question. We attempted to detect GFP using several commercially available anti-GFP antibodies under various standard immunostaining conditions. However, in our hands, these antibodies consistently produced either no signal or high background staining, making the results unreliable. We therefore relied on direct detection of GFP fluorescence, which provides a more accurate and specific readout of protein expression in our system.

      In Fig. 3-4, the GFP signals are unremarkable, in that they cannot be fairly attributed to any structure or cell type - they just look like blobs; and why, in Fig. 4D-E, does the GFP signal appear stronger at 21 days than 15 days? And why is it completely gone by 28 days? This data is unconvincing.

      We would like to thank the reviewer for their comments. Figures 3–4 provide a global overview of GFP expression on the surface of the testis. The entire testis was imaged using an inverted epifluorescence microscope, and the GFP signal represents a composite of multiple seminiferous tubules across the tissue surface. Due to this whole-organ imaging approach, it is not possible to resolve individual structures such as the basement membrane or lumen, which is why the signals may appear as diffuse “blobs.”

      Regarding the time-course in Figure 4D–E, the apparent increase in GFP signal at 21 days compared with 15 days likely reflects accumulation and translation of the delivered mRNA in germ cells over time, whereas the absence of signal at 28 days corresponds to the natural turnover and degradation of GFP protein and mRNA in the tissue. We hope this explanation clarifies the observed patterns of fluorescence.

      If the authors did a single cell suspension, what types or percentage of cells would be GFP+? Since germ cells are not adherent in culture, a simple experiment could be done whereby a single cell suspension could be made, cultured for 4-6 hours, and non-adherent cells "shaken off" and imaged vs adherent cells. Cells could also be fixed and immunostained for GFP, which has worked in many other labs using anti-GFP.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig 10 A-), round spermatids (25%) (Fig 10 B-E), and elongated spermatids (11.4%) (Fig 10 C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig 10 F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming efficient transfection and translation of the delivered mRNA. These results demonstrate that the in vivo injection and electroporation protocol enables effective mRNA transfection across multiple stages of spermatogenesis.

      These results confirm that the injected mRNA is efficiently translated in germ cells at various stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-qPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      In Fig. 5, what is the half-life of luciferase? From this reviewer's search of the literature, it appears to be ~2-3 h in mammalian cells. With this said, how do the authors envision detectable protein for up to 20 days from a naked mRNA? The stability of the injected mRNAs should be shown in a mammalian cell line - perhaps this mRNA has an incredibly long half-life, which might help explain these results. However, even the most stable endogenous mRNAs (e.g., globin) are ~24-60 hrs.

      We did not directly assess the stability of luciferase mRNA, but we evaluated the persistence of GFP mRNA, which was synthesized and optimized using the same sequence optimization and chemical modification strategy as the luciferase mRNA. In these experiments, mRNA-GFP was detectable in seminiferous tubule cells for up to two weeks after injection. We therefore expect a similar stability profile for the luciferase mRNA. These findings suggest that the prolonged fluorescence or bioluminescence observed in our study likely reflects a combination of factors, including enhanced transcript stability, local translation within germ cells, and the inherently slow protein turnover characteristic of the spermatogenic lineage.

      527-8 The Sertoli cell cytoplasm is not just present along the basement membrane as stated, but also projects all the way to the lumina:

      We clarified the sentence: "Sertoli cells have an oval to elongated nucleus and the cytoplasm presents a complex shape (“tombstone” pattern) along the basement membrane, with long projections that extend toward the lumen."

      529-30 This is incorrect, as round spermatids are never "localized between the spermatocytes and elongated spermatids" - if elongated spermatids are present, rounds are not - they are never coincident in the same testis section:

      We thank the reviewer for this important comment and for drawing attention to the detailed staging of the seminiferous epithelium. We agree that the spatial organization of germ cells varies depending on the stage of spermatogenesis. While round spermatids (steps 1–8) and elongated spermatids (steps 9–16) are typically associated with distinct stages, transitional stages of the seminiferous epithelium can contain both cell types in close proximity, reflecting the continuous and overlapping nature of spermatid differentiation (Meistrich, 2013, Methods Mol. Biol. 927:299–307). We have revised the text to clarify this point, indicating that the relative positioning of germ cell types depends on the stage of the seminiferous cycle rather than implying their constant coexistence within the same tubule section.

      Fig. 7. To this reviewer, all of the GFP appears to be in Sertoli cell cytoplasm. In Figs 1-8 there is no convincing evidence presented that GFP is expressed in germ cells! In fact, it appears to be in Sertoli cells.

      We thank the reviewer for their observation. As previously mentioned, we have included an additional experiment specifically demonstrating GFP expression in germ cells (Fig. 10). These new data provide clear evidence that the GFP signal is not restricted to Sertoli cells and confirm successful uptake and translation of GFP mRNA in germ cells.

      Fig. 9 - alpha-tubuline?

      We corrected the figure.

      Fig. 11 - how was sperm morphology/motility not rescued on "days 3, 6, 10, 15, or 28 after surgery", but it was in some at 21 and 35? How does this make sense, given the known kinetics of male germ cell development??

      We note the reviewer’s concern regarding the timing of motile sperm appearance. Variability among treated mice is expected because transfection efficiency differed between spermatogonia and spermatids. Full spermiogenesis requires ~15 days, and epididymal transit adds ~8 days, consistent with motile sperm appearing around 21 days post-injection in some mice.

      And at least one of the sperm in the KO in Fig. B5 looks relatively normal, and the flagellum may be out-of-focus in the image? With only a few sperm for reviewers to see, how can we know these represent the population?

      We thank the reviewer for their comment. Upon closer examination of the image, the flagellum of the spermatozoon in question is clearly abnormally short and this is not due to being out of focus. Furthermore, the supplementary figure shows that the KO consistently lacks normal spermatozoa. These defects are consistent with previous findings from our laboratory [22], confirming that the observed phenotype is representative of the KO population rather than an isolated occurrence.

      Reviewer #3 (Public review):

      Summary:

      The authors used a novel technique to treat male infertility. In a proof-of-concept study, the authors were able to rescue the phenotype of a knockout mouse model with immotile sperm using this technique. This could also be a promising treatment option for infertile men.

      Strengths:

      In their proof-of-concept study, the authors were able to show that the novel technique rescues the infertility phenotype of Armc2 knockout spermatozoa. In the current version of the manuscript, the authors have added data on in vitro fertilisation experiments with Armc2 mRNA-rescued sperm. The authors show that Armc2 mRNA-rescued sperm can successfully fertilise oocytes that develop to the blastocyst stage. This adds another level of reliability to the data.

      Weaknesses:

      Some minor weaknesses identified in my previous report have already been fixed. The technique is new and may not yet be fully established for all issues. Nevertheless, the data presented in this manuscript opens the way for several approaches to immotile spermatozoa to ensure successful fertilisation of oocytes and subsequent appropriate embryo development.

      [Editors' note: The images in Figure 12 do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization. Instead, the cells shown appear to be fragmented, unfertilized eggs. Combined with the lack of further development, it seems highly unlikely that fertilization was successful.]

      We thank the reviewer for their careful evaluation and constructive feedback. We appreciate the acknowledgment of the strengths of our study, particularly the proof-of-concept demonstration that Armc2-mRNA electroporation can rescue sperm motility in Armc2 knockout mice.

      Regarding the concern raised by the editor about Figure 12, we would like to clarify two technical points. First, the IVF experiments were performed using CD1 oocytes and B6D2 sperm. Due to strain-specific incompatibilities, fertilization of CD1 oocytes by B6D2 sperm typically does not progress beyond the two-cell stage (Fernández-González et al., 2008, Biology of Reproduction [23]). Therefore, the observation of two-cell embryos represents the expected limit of development in this cross and serves as a strong indication of successful fertilization, even though further development is not possible. Second, the oocytes used in these experiments were treated with collagenase to remove cumulus cells. This enzymatic treatment can sometimes affect the morphology of early embryos, which may explain why the two-cell embryos in Figure 12 appear less regular or somewhat fragmented. We also included a control showing embryos from B6D2 sperm with the same collagenase treatment on CD1 oocytes, which yielded similar appearances (Fig. 14 A4).

      To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2–/– sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2–/– sperm (Supplementary Fig. 9). These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation.

      We have clarified these points in the revised Results and Discussion sections to emphasize that the IVF data indicate partial functional recovery of rescued sperm rather than full fertility restoration. These clarifications address the editor’s concern while accurately representing the technical limitations of the strain combination used in our experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Fig 12 and Supplementary Fig 9 are mislabeled in the text and rebuttal.

      We thank the reviewer for pointing this out. We have carefully checked the manuscript and the rebuttal text, and corrected all references to Figure 12 and Supplementary Figure 9 to ensure they are accurately labeled and consistent throughout the text.

      Reviewer #3 (Recommendations for the authors):

      The contribution of the newly added authors should be clarified. All other aspects of inadequacy raised in my previous report have been adequately addressed.

      No further comments.

      We thank the reviewer for noting this. The contributions of the newly added authors have been clarified in the Author Contributions section of the revised manuscript. All other points raised in the previous review have been addressed as indicated.

      References

      (1) Hansen, M., et al., Assisted reproductive technologies and the risk of birth defects--a systematic review. Hum Reprod, 2005. 20(2): p. 328-38.

      (2) Halliday, J.L., et al., Increased risk of blastogenesis birth defects, arising in the first 4 weeks of pregnancy, after assisted reproductive technologies. Hum Reprod, 2010. 25(1): p. 59-65.

      (3) Davies, M.J., et al., Reproductive technologies and the risk of birth defects. N Engl J Med, 2012. 366(19): p. 1803-13.

      (4) Kurinczuk, J.J., M. Hansen, and C. Bower, The risk of birth defects in children born after assisted reproductive technologies. Curr Opin Obstet Gynecol, 2004. 16(3): p. 201-9.

      (5) Graham, M.E., et al., Assisted reproductive technology: Short- and long-term outcomes. Dev Med Child Neurol, 2023. 65(1): p. 38-49.

      (6) Palermo, G.D., et al., Intracytoplasmic sperm injection: state of the art in humans. Reproduction, 2017. 154(6): p. F93-F110.

      (7) Usmani, A., et al., A non-surgical approach for male germ cell mediated gene transmission through transgenesis. Sci Rep, 2013. 3: p. 3430.

      (8) Raina, A., et al., Testis mediated gene transfer: in vitro transfection in goat testis by electroporation. Gene, 2015. 554(1): p. 96-100.

      (9) Michaelis, M., A. Sobczak, and J.M. Weitzel, In vivo microinjection and electroporation of mouse testis. J Vis Exp, 2014(90).

      (10) Wang, L., et al., Testis electroporation coupled with autophagy inhibitor to treat non-obstructive azoospermia. Mol Ther Nucleic Acids, 2022. 30: p. 451-464.

      (11) Wu, Q., et al., Translation affects mRNA stability in a codon-dependent manner in human cells. eLife, 2019. 8: p. e45396.

      (12) Gallie, D.R., The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes & Development, 1991. 5(11): p. 2108-2116.

      (13) Henderson, J.M., et al., Cap 1 messenger RNA synthesis with co-transcriptional CleanCap® analog improves protein expression in mammalian cells. Nucleic Acids Research, 2021. 49(8): p. e42.

      (14) Stepinski, J., et al., Synthesis and properties of mRNAs containing novel “anti-reverse” cap analogs. RNA, 2001. 7(10): p. 1486-1495.

      (15) Sachs, A.B., P. Sarnow, and M.W. Hentze, Starting at the beginning, middle, and end: translation initiation in eukaryotes. Cell, 1997. 89(6): p. 831-838.

      (16) Presnyak, V., et al., Codon optimality is a major determinant of mRNA stability. Cell, 2015. 160(6): p. 1111-24.

      (17) Cao, D., et al., Unlock the sustained therapeutic efficacy of mRNA. J Control Release, 2025. 383: p. 113837.

      (18) Karikó, K., et al., Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther, 2008. 16(11): p. 1833-40.

      (19) Pardi, N., et al., mRNA vaccines — a new era in vaccinology. Nature Reviews Drug Discovery, 2018. 17(4): p. 261-279.

      (20) Meistrich, M.L. and R.A. Hess, Assessment of Spermatogenesis Through Staging of Seminiferous Tubules, in Spermatogenesis: Methods and Protocols, D.T. Carrell and K.I. Aston, Editors. 2013, Humana Press: Totowa, NJ. p. 299-307.

      (21) Mäkelä, J.-A., et al., JoVE, 2020(164): p. e61800.

      (22) Coutton, C., et al., Bi-allelic Mutations in ARMC2 Lead to Severe Astheno-Teratozoospermia Due to Sperm Flagellum Malformations in Humans and Mice. Am J Hum Genet, 2019. 104(2): p. 331-340.

      (23) Fernández-González, R., et al., Long-term effects of mouse intracytoplasmic sperm injection with DNA-fragmented sperm on health and behavior of adult offspring. Biol Reprod, 2008. 78(4): p. 761-72.

    1. eLife Assessment

      In their study, Scherer and colleagues aim to use analyses of single-cell clones of murine granulocyte monocyte progenitors that are conditionally immortalized, and analyses of neutrophils derived from those clones to characterize an experimental system for studying neutrophil heterogeneity. The multi-omic and functional analyses reported are valuable but the strength of the evidence presented in support of them is incomplete because the study lacks a rigorous demonstration that the neutrophil-like cells that they derive are fully mature neutrophils.

    2. Reviewer #1 (Public Review):

      The heterogeneity within the neutrophil population is becoming clear. However, it was not clear if neutrophil progenitors are also heterogenous. Because neutrophils are short-lived, it is technically challenging to tackle the question. This study used a system to isolate and expand clonal neutrophil progenitors (granulocyte-monocyte progenitors; GMPs) to achieve molecular and functional profiling. In the study, transcriptional profiling was performed by RNAseq and ATACseq. Functional assays were performed ex vivo to examine phagocytosis, ROS production, NET formation, and neutrophil swarming using Candida albicans, as well as C. glabrata and C. auris. The strengths of this study include the use of the neutrophil clone system to track GMPs developing into neutrophils. The clone-based approach made it possible to evaluate the functions of multiple neutrophil subpopulations. Limitations of this study include the dependency on ex vivo approaches and the modest degree of heterogeneity within presented neutrophils. Nevertheless, the finding - the heterogeneity of neutrophils can be traced back to the GMP stage - is significant.

    3. Reviewer #2 (Public Review):

      The stated goal of the authors is to establish and characterize an experimental system to study neutrophil heterogeneity in a manner that allows for functional outcomes to be probed. To do so, they start with murine GMPs that are conditionally immortalized by ER-HoxB8 expression and make single-cell clonal populations to ask whether those GMPs or neutrophils derived by differentiating such clonal GMPs harbor heterogeneity. At a conceptual level, this is an innovative approach that could shed light on mechanisms of neutrophil heterogeneity that have been described in both health and disease. They perform bulk multi-omics and functional analyses of both the clonal GMPs and neutrophil-like cells, including transcriptional and epigenetic profiling. However, the major weakness of the study is that the authors do not provide rigorous or convincing data that the cells they derive are truly mature neutrophils. To the contrary, the neutrophil-like cells lack Ly6G expression and so the authors fall back on using CD11b as the primary marker for delineating neutrophils; however, CD11b is expressed by both myeloid progenitors and some premature and mature myeloid lineages that are not neutrophils. They acknowledge this shortcoming, but they make an assumption that Ly6G expression is the only way in which the cells they derive are different from primary neutrophils without presenting any evidence indicating such. The authors use only SCF during the maturation of ER-HoxB8 GMPs into leukocytes, rather than including other cytokines such as G-CSF (or use in vivo maturation) that could have better-induced differentiation and maturation into granulocytes/neutrophils. The authors did not use their transcriptional analyses to further establish that the cells they derive from ER-HoxB8 GMPs are similar/different from primary murine neutrophils. 
      Unfortunately, this shortcoming means that all of the analyses of neutrophil-like cells derived from clonal GMPs may or may not represent the transcriptional, epigenetic, etc. profile of a true mature neutrophil. It is also not rigorously addressed whether what they call PMNs derived from clonal GMPs are a transcriptionally uniform population or if they harbor heterogeneity within the bulk population. Overall, while conceptually intriguing and in pursuit of an experimental system that would be impactful for the field, the study as performed has critical flaws.

    4. Author response:

      Reviewer #1 (Public Review):

      The heterogeneity within the neutrophil population is becoming clear. However, it was not clear if neutrophil progenitors are also heterogenous. Because neutrophils are short-lived, it is technically challenging to tackle the question. This study used a system to isolate and expand clonal neutrophil progenitors (granulocyte-monocyte progenitors; GMPs) to achieve molecular and functional profiling. In the study, transcriptional profiling was performed by RNAseq and ATACseq. Functional assays were performed ex vivo to examine phagocytosis, ROS production, NET formation, and neutrophil swarming using Candida albicans, as well as C. glabrata and C. auris. The strengths of this study include the use of the neutrophil clone system to track GMPs developing into neutrophils. The clone-based approach made it possible to evaluate the functions of multiple neutrophil subpopulations. Limitations of this study include the dependency on ex vivo approaches and the modest degree of heterogeneity within presented neutrophils. Nevertheless, the finding - the heterogeneity of neutrophils can be traced back to the GMP stage - is significant.

      Reviewer #2 (Public Review):

      The stated goal of the authors is to establish and characterize an experimental system to study neutrophil heterogeneity in a manner that allows for functional outcomes to be probed. To do so, they start with murine GMPs that are conditionally immortalized by ER-HoxB8 expression and make single-cell clonal populations to ask whether those GMPs or neutrophils derived by differentiating such clonal GMPs harbor heterogeneity. At a conceptual level, this is an innovative approach that could shed light on mechanisms of neutrophil heterogeneity that have been described in both health and disease. They perform bulk multi-omics and functional analyses of both the clonal GMPs and neutrophil-like cells, including transcriptional and epigenetic profiling. However, the major weakness of the study is that the authors do not provide rigorous or convincing data that the cells they derive are truly mature neutrophils. To the contrary, the neutrophil-like cells lack Ly6G expression and so the authors fall back on using CD11b as the primary marker for delineating neutrophils; however, CD11b is expressed by both myeloid progenitors and some premature and mature myeloid lineages that are not neutrophils. They acknowledge this shortcoming, but they make an assumption that Ly6G expression is the only way in which the cells they derive are different from primary neutrophils without presenting any evidence indicating such. The authors use only SCF during the maturation of ER-HoxB8 GMPs into leukocytes, rather than including other cytokines such as G-CSF (or use in vivo maturation) that could have better-induced differentiation and maturation into granulocytes/neutrophils.

      Thank you. Of note, reviewer #1 also commented on the question of including other cytokines during the neutrophil differentiation process. We have included our response to reviewer #1 below, which includes the use of GM-CSF and IL-4.

      “We have now demonstrated enhanced Ly6G expression with GM-CSF and IL-4 treatment in a new Supplementary Figure 1.

      GMPs were washed out of estradiol-containing media and placed in fresh media containing 10 ng/ml GM-CSF and/or 1 ng/ml IL-4 for four days. Cells were collected and stained with CD117 (APC), F4/80 (AlexaFluor 488), Ly6G (PE), and CD11b (BV421). Neutrophil clones were run in biological triplicates, and undifferentiated GMPs were included as a negative control.

      GMPs stain as CD117POS / F4/80NEG / Ly6GNEG / CD11bNEG, indicating they are immature. The clones removed from estradiol differentiate and lose their CD117 expression. The mature cells remain F4/80NEG, as expected for mature neutrophils.

      The addition of GM-CSF to the media led to a significant increase in the expression of Ly6G. The addition of both GM-CSF + IL-4 did not further increase the proportion of Ly6G+ cells, and we have altered our statement slightly in the main text to reflect this finding (line 139).”

      The authors did not use their transcriptional analyses to further establish that the cells they derive from ER-HoxB8 GMPs are similar/different from primary murine neutrophils. Unfortunately, this shortcoming means that all of the analyses of neutrophil-like cells derived from clonal GMPs may or may not represent the transcriptional, epigenetic, etc. profile of a true mature neutrophil.

      Thank you. The ER-Hoxb8 system has been well characterized by many authors at the functional and transcriptional levels, confirming that the cells closely reflect the gene expression pattern of mature neutrophils. This was recently reviewed by Lail et al. (Traffic, 2022, PMID: 36117140). In our analysis, we used transcriptional profiling to examine heterogeneity between different single-cell clones and not to re-validate the similarity with primary neutrophils.

      It is also not rigorously addressed whether what they call PMNs derived from clonal GMPs are a transcriptionally uniform population or if they harbor heterogeneity within the bulk population.

      Thank you. The reviewer poses an interesting, albeit challenging, question of whether even a single GMP clone can differentiate and result in mature neutrophil heterogeneity. Addressing this would require single-cell sequencing of the resulting cells, which we did not perform. We relied on single-cell subcloning of the immature granulocyte-monocyte progenitors to ensure a genetically identical clonal population. This was then additionally confirmed by the retroviral insertional analysis. These analyses confirmed the clonal nature of our starting population, from which we posed the question of whether the neutrophils derived from these clonal GMPs resulted in mature cells with consistent functional heterogeneity, which was indeed the case.

      Overall, while conceptually intriguing and in pursuit of an experimental system that would be impactful for the field, the study as performed has critical flaws.

    1. eLife Assessment

      This important study tackles an interesting aspect of fungal physiology: how a mitochondria-associated gene influences production of the secondary metabolite DON and fungicide sensitivity. The authors have improved the manuscript and the supporting evidence is convincing, although some uncertainties remain around descriptions of the methods.

    2. Reviewer #1 (Public review):

      Summary:

      In their study the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely and the manuscript is well written, albeit in some cases details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease of the DON content associated with deletion of FgDML1: Although some growth data are shown in figure 6 - indicating a severe growth defect - the DON production presented in figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to a decreased growth and specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation on the same conditions to the DON amount detected. Only then a conclusion as to an altered production in the mutant strains can be drawn.

      Comments to the revised manuscript:

      The authors carefully revised the manuscript and provided explanations for methods in several cases. However, there are still some problems - probably due to misunderstanding - that need revision.

      (1) A major problem of the first version of the manuscript was the lack of appropriate description of biomass analysis and the consideration of the respective results for evaluation of production of DON and other metabolites. Although the authors provide some explanation in the response to reviews, I could not find a corresponding explanation or description in the manuscript. It is not sufficient to explain the problem to me, but a detailed explanation and description of the method has to be provided in the manuscript along with the definition of one "unit of mycelium". It is still not entirely clear to me what such a "unit of mycelium" is.

      Please clarify this and any other uncertainties that were commented on by me and other reviewers in the manuscript, not only in the response to reviews. Also adjust the reference list accordingly.

      (2) Another problem was that the authors considered FgDML1 a regulator of DON production. As mentioned by me and reviewer 3, FgDML1 is crucial to numerous functions in F. graminearum and its lack causes a plethora of problems for fungal physiology. Hence, although it is clear that the lack of FgDML1 causes alterations in DON production, it is not appropriate to designate this factor as a "regulator". It seems to me that the authors are afraid that if FgDML1 were not a "regulator", this would decrease the value of their study, which is not the case. This is a matter of correct wording. Therefore, please revise the wording accordingly, starting with the title:

      ...FgDML1 impacts DON toxin biosynthesis...

      Moreover, the manuscript would certainly benefit from a more detailed description of the whole cascade leading from FgDML1 to DON biosynthesis and to the production of the other metabolites that change upon deletion. Such an explanation can help the reader grasp the relevance of FgDML1 for regulatory processes, as well as distinguish more general from specific effects.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      Comments on revisions:

      The author has addressed my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In their study, the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely, and the manuscript is well written, albeit in some cases, details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease in the DON content associated with the deletion of FgDML1. Although some growth data are shown in Figure 6, indicating a severe growth defect, the DON production presented in Figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to decreased growth, and the specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation under the same conditions to the DON amount detected. Only then can a conclusion as to an altered production in the mutant strains be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions. We have done our best to revise the manuscript accordingly. Our point-by-point responses to the reviewer’s comments are listed below. Our method for DON quantification was based on the amount per unit of mycelium. After obtaining the absorbance value from the ELISA reaction, the concentration of DON was calculated according to a standard curve and a formula, then divided by the dry weight of the mycelium to obtain the DON content per unit of mycelium, with the results finally expressed in µg/g.
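      The normalization described in this response can be illustrated with a minimal sketch. It is not the authors' analysis script: the function name, the extract-volume parameter, and all numbers are hypothetical, and the DON concentration is assumed to have already been read off the ELISA standard curve. It shows why the reviewer asks for biomass-normalized values: a strain with five-fold less biomass can have the same specific DON content.

```python
def don_per_gram(don_conc_ug_per_ml: float,
                 extract_volume_ml: float,
                 dry_weight_g: float) -> float:
    """Return DON content in micrograms per gram of dry mycelium.

    don_conc_ug_per_ml is assumed to come from the ELISA standard curve;
    extract_volume_ml is a hypothetical parameter used to convert the
    concentration into a total amount of DON.
    """
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    total_don_ug = don_conc_ug_per_ml * extract_volume_ml
    return total_don_ug / dry_weight_g


# Hypothetical example: the mutant yields less total DON and less biomass,
# yet its DON content per gram of dry mycelium equals that of the wild type.
wild_type = don_per_gram(2.0, 10.0, 0.5)
mutant = don_per_gram(0.5, 10.0, 0.125)
print(wild_type, mutant)
```

      Only when the per-gram values differ, as the authors report for the deletion mutant, can a change in specific DON production be inferred.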

      (1) Line 139f

      ... FgDML1 is a critical positive regulator of virulence ....

      Clearly, the deletion of FgDML1 impacts virulence, but it is too much of a general effect to say it is a regulator. DML1 acts high up in the cascade, impacting numerous processes, one of which is virulence. Generally, it has to be considered that deletion of DML1 causes a severe growth defect, which in turn is likely to lead to a plethora of effects. Besides discussing this fact, please also revise the manuscript to avoid references to "direct effects" or "regulator".

      Thank you very much for your advice. Our method for determining the amount of DON is based on the amount per unit of mycelium. After obtaining the absorbance value through the ELISA reaction, we calculate the concentration of DON toxin according to the established standard curve and formula. Then, we divide it by the dry weight of the mycelium to obtain the DON toxin content per unit of mycelium, and finally present the results in µg/g. In summary, we conclude that the decrease in DON production by ΔFgDML1 is not due to slower hyphal growth, but rather to a decrease in the ability of a unit of hyphae to produce DON toxin compared to the wild type. Given the decrease in DON toxin synthesis caused by FgDML1 deficiency, we believe that using the term "regulator" is reasonable.

      (2) Line 143

      Please define "toxin-producing conditions".

      Thank you very much for your advice. We have defined the toxin-producing conditions in the manuscript: 'toxin-inducing conditions (28°C, 145 ×g, 7 days incubation)' (in L163-164).

      (3) Line 149

      A brief intro on toxisomes should be provided in the introduction to better integrate this into the manuscript's results.

      Thank you very much for your advice. We have added corresponding content about toxisomes to the introduction: 'The biosynthesis of DON entails a reorganization of the endoplasmic reticulum into a specialized compartment termed the "toxisome" (Tang et al., 2018). The assembly of the toxisome coincides with the aggregation of key biosynthetic enzymes, which in turn enhances the efficiency of DON production. Concurrently, this compartmentalization serves as a self-defense mechanism, protecting the fungus from the autotoxicity of TRI pathway intermediates (Boenisch et al., 2017). The proteins TRI1, TRI4, TRI14, and Hmr1 are confirmed constituents of this structure (Kistler and Broz, 2015; Menke et al., 2013).' (in L86-93)

      (4) Line 153

      DON production decreases by about 80 %, but not to 0. Consequently, DML1 is important, but NOT essential for DON production.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'FgDML1 is essential for the biosynthesis of the DON toxin. '(in L161)

      (5) Line 168ff

      Please provide a reference for FgDnm1 being critical for mitochondrial fission and state whether such an interaction has been shown in other organisms.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'FgDnm1 is a key dynamin-related protein mediating mitochondrial fission(Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi. '(in L277-283)

      (6) Line 178

      Please specify whether Complex III activity was related to biomass and provide a p-value or standard deviation for the value.

      Thank you very much for your question. Complex III activity was determined using a Complex III enzyme activity kit (Solarbio, Beijing, China) (Li et al., 2022; Wang et al., 2022). A standardized 0.1 g of mycelium was used as the sample for each experiment. Given that the mycelium was homogenized, we believe that there is no necessary correlation between Complex III activity and biomass. We also refined the specific measurement steps in the article. 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,100 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. Each treatment group contains three biological replicates and three technical replicates.' (in L511-517)

      Li C, et al. Amino acid catabolism regulates hematopoietic stem cell proteostasis via a GCN2-eIF2 axis. Cell Stem Cell. 2022 Jul 7; 29(7):1119-1134.e7. doi: 10.1016/j.stem.2022.06.004. PMID: 35803229.

      Wang K, et al. Locally organised and activated Fth1hi neutrophils aggravate inflammation of acute lung injury in an IL-10-dependent manner. Nat Commun. 2022 Dec 13;13(1):7703. doi: 10.1038/s41467-022-35492-y. PMID: 36513690; PMCID: PMC9745290

      (7) Line 185

      Albeit this headline is a reasonable hypothesis, you actually did not show that the conformation is altered. Please reword accordingly.

      Please also add references for cyazofamid acting on the QI site versus other fungicides acting on the QO site.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'Overexpression of FgQCR2, FgQCR8, and FgQCR9 may alter the conformation of the QI site, resulting in reduced sensitivity to cyazofamid.' (in L212-213). For fungicides targeting the Qi and Qo sites, we have added corresponding descriptions in the respective sections: 'Numerous fungicides have been developed to inhibit the Qo site (e.g., pyraclostrobin, azoxystrobin) (Nuwamanya et al., 2022; Peng et al., 2022) and the Qi site (e.g., cyazofamid) (Mitani et al., 2001) of the cytochrome bc1 complex.' (in L327-329)

      (8) Line 200

      This section on growth should be moved up right after introducing the mutant strain.

      Thank you very much for your advice. We have moved the sections on vegetative growth and on sexual and asexual development ahead of the DON toxin section to improve readability. We had originally arranged the sections in the previous order to emphasize the new link between mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in toxisome formation, and downregulation of FgTRIs at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does indeed lead to a decrease in DON toxin content, and that FgDML1 plays a regulatory role in the synthesis of DON toxin. In addition, our measurements of DON toxin, acetyl-CoA, ATP, and other indicators are all based on the amount per unit of hyphae, excluding differences caused by hyphal biomass or growth. We have further refined the materials and methods to facilitate better reading and understanding.

      (9) Line 203

      "... significantly reduced growth rates ..."

      This is not what was measured here. Figure 6A shows a plate assay that can be used to assess hyphal extension. In the figure, it is also visible that the mycelium of the deletion mutant is much denser, maybe due to increased hyphal branching. Please reword.

      Additionally, it is important to include a biomass measurement here under the conditions used for DON assessment. Hyphal extension measurements cannot be used instead of biomass.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'The ΔFgDML1 strain displayed a distinct growth phenotype characterized by retardation in radial growth and the formation of more compact, denser hyphal networks on all tested media compared to the PH-1 and ΔFgDML-C strains. '(in L136-138).

      (10) Line 217

      Please include information on how long the cultures were monitored. Given the very slow growth of the mutant, perithecia formation may be considerably delayed beyond 14 days.

      Thank you very much for your advice. Based on your suggestion, we extended the incubation time for sexual reproduction to 21 days to evaluate sexual reproduction ability more accurately. Our results show that even after 21 days, ΔFgDML1 still cannot produce ascospores, which demonstrates that the absence of FgDML1 does indeed cause sexual reproduction defects in F. graminearum.

      Author response image 1.

      Discussion

      (11) Please mention your summary Figure 8 early on in the discussion, and explain conclusions with this figure in mind. Please avoid repetition of the results section as much as possible.

      Also, please state clearly what was already known from previous research and is in agreement with your results, and what is new (in fungi or generally).

      Thank you very much for your advice. Based on your suggestion, we now mention Fig. 8 early in the first half of the discussion and use it to guide the text that follows. We also conducted a more comprehensive discussion by analyzing our research results and comparing them with previous studies. 'Our study defines a novel mechanism through which FgDML1 governs mitochondrial homeostasis. We demonstrate that FgDML1 directly interacts with the key mitochondrial fission regulator FgDnm1 and positively modulates cellular bioenergetic metabolism, as evidenced by elevated ATP and acetyl-CoA levels (Fig. 8).' (in L250-253). 'The Misato/DML1 protein family is evolutionarily conserved from yeast to humans and plays a critical role in mitochondrial regulation. In S. cerevisiae, DML1 is an essential gene; its deletion is lethal, while its overexpression results in fragmented mitochondrial networks and aberrant cellular morphology, underscoring its necessity for normal mitochondrial function (Gurvitz et al., 2002). Similarly, in Homo sapiens, the homolog Misato localizes to the mitochondrial outer membrane, and both its depletion and overexpression are sufficient to disrupt mitochondrial morphology and distribution (Kimura and Okano, 2007).' (in L241-244).

      (12) Line 262ff

      Please specify if this interaction was shown previously in other organisms and provide references.

      Thank you very much for your advice. We have clearly stated in the corresponding section that the interaction between FgDML1 and FgDnm1 is reported here for the first time and that, to our knowledge, no comparable reports exist for other species so far. 'Notably, FgDML1 was found to interact with FgDnm1 (Fig. 5E). FgDnm1 is a key dynamin-related protein mediating mitochondrial fission (Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi.' (in L276-283)

      (13) Line 287ff

      There is no result that would justify this speculation. Please remove.

      Thank you very much for your advice. We have modified the corresponding wording in the corresponding section. 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355)

      Materials and methods

      (14) A table with all primer sequences used in the study and their purpose is missing. For every experiment, the number of technical and biological replicates needs to be stated.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1 (in Table S1). We added the number of technical and biological replicates in the material and method descriptions for each experiment. 'For each sample, a total of 200 conidia were counted. The experiment included three biological replicates with three technical replicates each.' (in L434-436). 'Each treatment group contains three biological replicates.' (in L444-445). 'Each treatment group contains three biological replicates and three technical replicates.' (in L463-464). 'Each treatment group contains three biological replicates and three technical replicates.' (in L474-475). 'Each treatment group contains three biological replicates.' (in L483). 'Each treatment group contains three biological replicates and three technical replicates.' (in L501-502). 'Each treatment group contains three biological replicates and three technical replicates.' (in L516-517). 'The experiment was independently repeated three times.' (in L533-534).

      (15) Line 369ff

      Please provide final concentrations used for assays here.

      Thank you very much for your advice. The final concentrations are displayed in the figures (in Fig. 6A, B and Fig. S3). We have also provided Supplementary Table 2 to present the concentrations in a more intuitive way (in Table S2).

      (16) Line 383

      Please provide a reference or data on the use of F2du for transformant selection and explain the abbreviation.

      Thank you very much for your advice. Based on your suggestion, we have provided the full name and references of F2du. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China)(Zhao et al., 2022). '(in L405-407).

      (17) Line 407

      Please provide a reference for the method and at least a brief description.

      Thank you very much for your advice. Based on your suggestion, we have added references and provided a brief introduction to the method. 'As previously described (Tang et al., 2020; Wang et al., 2025), Specifically, coleoptiles were inoculated with conidial suspensions and incubated for 14 days, while leaves were inoculated with fresh mycelial plugs and incubated for 5 days, followed by observation and quantification of disease symptoms. DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018). '(in L466-471)

      (18) Line 414ff

      Also, here, the amount of biomass has to be considered for the measurement to be able to distinguish if actually less of the compounds were produced or if the effect seen was merely due to an altered amount of biomass present.

      Thank you very much for your advice. We believe that biomass is not within the scope of our measurement indicators, as we have measured and calculated based on unit hyphae. Therefore, we have ruled out experimental bias caused by a decrease in biomass.

      RNA and RT-qPCR

      (19) Line 461

      When the strains were transferred to AEA medium, was the biomass measured, at least wet weight, and in which culture volume was it done? It makes a big difference if the amount of (wet) biomass dilutes a small amount of fungicide-containing culture or if biomass is added in at least roughly equal amounts in sufficient growth medium to ensure equal conditions.

      Thank you very much for your question. During sample processing, we controlled the wet weight of the samples before adding the fungicide, ensuring that the wet weight of the mycelium in each sample was 0.2 g. The mycelium was obtained from 100 mL of AEA medium. This ensured consistent initial biomass between groups before treatment, as well as an accurate fungicide concentration.

      (20) Line 466

      Please provide the name and supplier of the kit.

      Thank you very much for your advice. We have added corresponding content in the corresponding location. 'Mycelium was collected and total RNA was extracted following the instructions provided by the Total RNA Extraction Kit (Tiangen, Beijing, China).' (in L523-524).

      (21) All primer sequences must be provided in a table.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1. (in Table S1).

      (22) For RT qPCR it is essential to check the RNA quality to be sure that the obtained results are not artifacts due to varying quality, which may exceed differences. Please state how quality control was done and which threshold was applied for high-quality RNA to be used in RTqPCR (like RIN factor, etc).

      Thank you very much for your question. We performed stringent quality control on the extracted total RNA. First, a micro-spectrophotometer was used to measure RNA concentration and purity, confirming that the A260/A280 ratio was between 1.8 and 2.0 and the A260/A230 ratio was greater than 2.0, indicating good RNA purity without significant protein or organic solvent contamination. Subsequently, verification by agarose gel electrophoresis revealed distinct 28S and 18S rRNA bands, demonstrating good RNA integrity and absence of degradation.

      Author response image 2.
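      The purity thresholds stated in this response can be written as a simple acceptance check. This is a hedged sketch only: the function name is hypothetical, and it covers just the spectrophotometric ratios quoted by the authors (A260/A280 between 1.8 and 2.0, A260/A230 greater than 2.0), not the gel-based integrity assessment or a RIN value.

```python
def passes_purity_check(a260_a280: float, a260_a230: float) -> bool:
    """Apply the purity thresholds quoted in the author response.

    Returns True only if A260/A280 lies in [1.8, 2.0] and
    A260/A230 is strictly greater than 2.0.
    """
    return 1.8 <= a260_a280 <= 2.0 and a260_a230 > 2.0


# Hypothetical readings for three RNA samples
samples = {"PH-1": (1.95, 2.15), "mutant": (1.70, 2.20), "complemented": (1.90, 1.80)}
for name, (r280, r230) in samples.items():
    print(name, passes_purity_check(r280, r230))
```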

      (B): Minor Comments:

      (1) Please increase the font size of the labels and annotations of the figures; it is hard to read as it is now.

      Thank you very much for your advice. We have increased the size of annotations or numerical labels in the corresponding images for better reading.

      (2) Throughout the manuscript: Please check that all abbreviations are explained at first use.

      Thank you very much for your advice. We have checked the entire text to ensure that abbreviations have their full names when they first appear.

      (3) I do hope that the authors can clarify all concerns and provide an amended manuscript of this interesting story.

      Thank you very much for your advice. Sincerely thank you for your suggestions and questions, which have been very helpful to us.

      Reviewer #2:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      We greatly appreciate the time you spent on our paper and your helpful suggestions. We have done our best to revise the manuscript accordingly, with the changes marked in red. In our responses, the positions of the revised passages in the manuscript are indicated by red line numbers. Our point-by-point responses to the reviewer’s comments are listed below.

      Weaknesses:

      (1) The authors speculate that cyazofamid treatment caused upregulation of the assembly factors, leading to a change in the conformation of the Qi protein, thus restoring the enzyme activity of complex III. But no speculation was given in the discussion as to why this would lead to the upregulation of assembly factors, and how the upregulation of assembly factors would change the protein conformation, and is there any literature reporting a similar phenomenon? I would suggest adding this to the discussion.

      Thank you very much for your advice. Based on your suggestion, we have added content related to the assembly factor of complex III in the discussion section and made modifications to the corresponding wording. 'Previous studies have reported that mutations in the Complex III assembly factors TTC19, UQCC2, and UQCC3 impair the assembly and activity of Complex III (Feichtinger et al., 2017; Wanschers et al., 2014). '(in L345-347). 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355).

      (2) Would increased sensitivity of the mutant to cell wall stress be responsible for the excessive curvature of the mycelium?

      Thank you very much for your question. We believe that the sensitivity of ΔFgDML1 to osmotic stress is reduced, which may not be related to hyphal bending, as shown in Author response image 3. At the conidial stage, ΔFgDML1 cannot germinate in YEPD, whereas the addition of 1 M sorbitol promotes its germination. However, the underlying mechanism is unknown and will be a focus of our future research.

      Author response image 3.

      (3) The vertical coordinates of Figure 7B need to be modified with positive inhibition rates for the mutants.

      Thank you very much for your advice. Figure 7B accurately reflects the inhibition rate. In the ΔFgDML1 mutant, the inhibition rate becomes negative under osmotic stress treatment, indicating that colony growth is greater than that of the untreated control (CK). Therefore, a negative inhibition rate is shown in Figure 7B.
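      How a negative inhibition rate arises can be shown with a short sketch. The formula below is the commonly used growth-inhibition calculation and is an assumption here, since the response does not state the exact formula used; the function name and colony diameters are hypothetical.

```python
def inhibition_rate(control_diameter_mm: float, treated_diameter_mm: float) -> float:
    """Return the growth inhibition rate in percent.

    Assumed formula: (control - treated) / control * 100.
    A negative value means the treated colony grew more than the control.
    """
    if control_diameter_mm <= 0:
        raise ValueError("control diameter must be positive")
    return (control_diameter_mm - treated_diameter_mm) / control_diameter_mm * 100.0


# Hypothetical colony diameters (mm): stress reduces growth in one strain
# but improves it in the other, which yields a negative inhibition rate.
print(inhibition_rate(40.0, 30.0))
print(inhibition_rate(40.0, 50.0))
```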

      (1) In Figure 1B, Figure 3C, and Figure 6C, the scale below the picture is not clear. In Figure 5D, the histogram is unclear, and it is recommended to redraw the graph.

      Thank you very much for your advice. The issue with the above images may be due to Word compression. We have changed the settings and enlarged the images as much as possible to better display them.

      (2) The full Latin name of the strain should be used in the title of figures and tables.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (3) Proteins in line 117 should be abbreviated.

      Thank you very much for your advice. Based on your suggestion, we have abbreviated the corresponding positions. 'The DML1 protein from S. cerevisiae was used as a query for a BLAST search against the Fusarium genome database, resulting in the identification of the putative DML1 gene FgDML1 (FGSG_05390) in F. graminearum. '(in L118-120).

      (4) The sentence in lines 187-189, which is supposed to introduce why the test is sensitive to the three drugs, is currently illogical.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections. 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated. ' (in L214-216).

      (5) The expression of FgQCR2, FgQCR7, and FgQCR8 was significantly upregulated in ΔFgDML1 at transcription levels. Do FgQCR2, FgQCR8, and FgQCR9 show upregulated expression at the protein level?

      Thank you very much for your question. Based on your suggestion, we evaluated the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in PH-1 and ΔFgDML1, and we found that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1. (in Fig. 6F).

      (6) In Figure 7B, it is recommended to adjust the position of the horizontal axis labels in the histogram.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections (in Fig. 7B).

      (7) There are numerous errors in the writing of gene names in the text. Please check the full text and change the writing of gene names and mutant names to italic.

      Thank you very much for your advice. We have checked the entire text to ensure that all genes have been italicized.

      (8) All acronyms should be spelled out in figure and table captions. e.g., F. graminearum.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (9) In line 492, P should be lowercase and italic.

      Thank you very much for your advice. Based on your suggestion, we have made adjustments to the corresponding content.

      Reviewer #3:

      Summary:

      The manuscript "Mitochondrial protein FgDML1 regulates DON toxin biosynthesis and cyazofamid sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" describes the construction of a null mutant for the FgDML1 gene in F. graminearum and assays characterising the effects of this mutation on the pathogen's infection process and lifecycle. While FgDML1 remains underexplored with an unclear role in the biology of filamentous fungi, and although the authors performed several experiments, there are fundamental issues with the experimental design and execution, and interpretation of the results.

      Strengths:

      FgDML1 is an interesting target, and there are novel aspects in this manuscript. Studies in other organisms have shown that this protein plays important roles in mitochondrial DNA (mtDNA) inheritance, mitochondrial compartmentalisation, chromosome segregation, mitochondrial distribution, mitochondrial fusion, and overall mitochondrial dynamics. Indeed, in Saccharomyces cerevisiae, the mutation is lethal. The authors have carried out multi-faceted experiments to characterise the mutants.

      Weaknesses:

      However, I have concerns about how the study was conceived. Given the fundamental importance of mitochondrial function in eukaryotic cells and how the absence of this protein impacts these processes, it is unsurprising that deletion of this gene in F. graminearum profoundly affects fungal biology. Therefore, it is misleading to claim a direct link between FgDML1 and DON toxin biosynthesis (and virulence), as the observed effects are likely indirect consequences of compromised mitochondrial function. In fact, it is reasonable to assume that the production of all secondary metabolites is affected to some extent in the mutant strains and that such a strain would not be competitive at all under non-laboratory conditions. The order in which the authors present the results can be misleading, too. The results on vegetative growth rate appeared much later in the manuscript, which should have come first, as the FgDML1 mutant exhibited significant growth defects, and subsequent results should be discussed in that context. Moreover, the methodologies are not described properly, making the manuscript hard to follow and difficult to replicate.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript accordingly. Revisions made in response to your suggestions are shown in red in the manuscript, and the line numbers of the revised passages are given in red in the responses below. Our point-by-point responses to the reviewer's comments are listed as follows.

      Regarding the weaknesses, we arranged the results in this order to emphasize the novel link between mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in the formation of toxin-producing bodies, and downregulation of FgTris at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does indeed lead to a decrease in DON toxin content, and that FgDML1 plays a regulatory role in the synthesis of DON toxin. In addition, our measurements of DON toxin, acetyl-CoA, ATP and other indicators are all based on the amount per unit of hyphae, which excludes differences caused by hyphal biomass or growth. We have further refined the Materials and Methods to make them easier to read and understand.

      (1) Lines 37-39: The disease itself does not produce toxins; it is the fungus that causes the disease that produces toxins. Moreover, the disease symptoms observed are likely caused by the toxins produced by the fungus.

      Thank you very much for your advice. We have made modifications to the wording of the corresponding sections. 'Studies have shown that increased DON levels are positively correlated with the pathogenicity rate of F. graminearum.'(in L36-37).

      (2) Lines 82-87: While it is challenging to summarise the role of ATP in just a few words, this section needs improvement for clarity and accuracy. Additionally, I do not believe that drawing a direct link between mitochondrial defects and toxin production is an appropriate strategy in this case.

      Thank you very much for your advice. Based on your suggestion, we have added descriptions at the corresponding positions to provide more information on the relationship between ATP and toxins, in order to better prepare for the following text. 'Pathogen-intrinsic ATP homeostasis is recognized as a critical, rate-limiting determinant for toxin biosynthesis. Previous studies indicate that dual-target inhibition of ATP synthase (AtpA) and adenine deaminase (Ade) by a specific small-molecule probe effectively depletes intracellular ATP, consequently suppressing the synthesis of key virulence factors TcdA and TcdB transcriptionally and translationally (Marreddy et al., 2024). The systemic toxicity of Anthrax Edema Toxin (ET) is primarily attributed to its catalytic activity, which depletes the host cell's ATP reservoir, thereby triggering a bioenergetic collapse that culminates in cell lysis and death (Liu et al., 2025).' (in L78-86).

      (3) Lines 125-126: The manuscript does not clearly describe how subcellular localisation was determined. This methodology needs to be properly detailed.

      Thank you very much for your advice. The subcellular localization was validated through co-localization analysis with MitoTracker Red CMXRos, a mitochondria-specific dye. The observed overlap between the FgDML1-GFP signal and the mitochondrial marker confirmed mitochondrial localization. Based on these results, we determined that FgDML1 is localized to the mitochondria. We have incorporated this description in the appropriate section of the manuscript. 'Furthermore, subcellular localization studies confirmed that FgDML1 localizes to mitochondria, as demonstrated by colocalization with a mitochondria-specific dye MitoTracker Red CMXRos (Fig. 1B).' (in L125-127).

      (4) Regarding the organisation of the Results section, it needs to be revised. While I understand the authors' intention to emphasise the impact on virulence, the results showing how FgDML1 deletion affects vegetative growth, asexual and sexual reproduction, and sensitivity to stressors should be presented before the virulence assays and effects on DON production. Additionally, the authors do not provide any clear evidence that FgDML1 directly interacts with proteins involved in asexual or sexual reproduction, stress responses, or virulence. Therefore, it is misleading to suggest that FgDML1 directly regulates these processes. The observed phenotypes are, rather, a consequence of severely impaired mitochondrial function. Without functional mitochondria, the cell cannot operate properly, leading to widespread physiological defects. In this regard, statements such as those in lines 139-140 and 343-344 are misleading.

      Thank you very much for your advice. We have adjusted the order of the images based on your suggestion, placing the characterization of ΔFgDML1 in nutritional growth, sexual reproduction, and other aspects before DON toxin. And we have made adjustments to the corresponding statements. 'These findings demonstrate that FgDML1 is a positive regulator of virulence in F. graminearum. '(in L140-141).

      (5) Lines 185-186: The authors do not provide sufficient evidence to support the claim that FgQCR2, FgQCR8, and FgQCR9 overexpression is the main cause of reduced cyazofamid sensitivity. Although expression of these genes is altered, reduced sensitivity may result from changes in other proteins or pathways. To strengthen this claim, overexpression of FgQCR2, 8, and 9 in the wild-type background, followed by assessment of cyazofamid resistance, would be necessary. As it stands, there is no support for the claim presented in lines 329-332.

      Thank you very much for your advice. To establish a causal link between the overexpression of FgQCR2, FgQCR7, and FgQCR8 and the observed reduction in cyazofamid sensitivity, we first quantified the protein levels of these assembly factors. Western blot analysis confirmed their elevated expression in the ΔFgDML1 mutant compared to the wild-type PH-1. We further generated individual overexpression strains for FgQCR2, FgQCR7, and FgQCR8 in the wild-type PH-1 background. Fungicide sensitivity assays revealed that all three overexpression mutants displayed significantly reduced sensitivity to cyazofamid compared to the parental strain. These overexpression experiments confirm that upregulation of FgQCR2, FgQCR7, and FgQCR8 is sufficient to confer reduced cyazofamid sensitivity. We have incorporated these explanations and provided supporting images in the appropriate section of the manuscript. 'To further clarify whether the upregulated expression of FgQCR2, FgQCR7, and FgQCR8 genes affects their protein expression levels, we measured the protein levels. The results showed that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1 (Fig. 6F). Subsequently, we overexpressed FgQCR2, FgQCR7, and FgQCR8 in the wild-type background, and the corresponding overexpression mutants exhibited reduced sensitivity to cyazofamid (Fig. 6E).' (in L205-211) (in Fig. 6E, F)

      (6) Lines 187-190: This segment is confusing and difficult to follow. It requires rewriting for clarity.

      Thank you very much for your advice. Based on your suggestion, we have made corresponding modifications in the corresponding locations. 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated.' (in L214-216)

      (7) Lines 345-346: The authors state that in this study, FgDML1 is localised in mitochondria, which implies that in other studies, its localisation was different. Is this accurate? Clarification is needed.

      Thank you very much for your question. In previous studies, whether in yeast or in Drosophila melanogaster, the localization of this protein was not clearly defined; only its functional relationship to mitochondria was emphasized (Miklos et al., 1997; Gurvitz et al., 2002).

      Miklos GLG, Yamamoto M-T, Burns RG, Maleszka R. 1997. An essential cell division gene of Drosophila, absent from Saccharomyces, encodes an unusual protein with tubulin-like and myosin-like peptide motifs. Proc Natl Acad Sci 94:5189–5194. doi:10.1073/pnas.94.10.5189

      Gurvitz A, Hartig A, Ruis H, Hamilton B, de Couet HG. 2002. Preliminary characterisation of DML1, an essential Saccharomyces cerevisiae gene related to misato of Drosophila melanogaster. FEMS Yeast Res 2:123–135. doi:10.1016/S1567-1356(02)00083-1

      Material and Methods Section

      (8) In general, the methods require more detailed descriptions, including the brands and catalog numbers of reagents and kits used. Simply stating that procedures were performed according to manufacturers' instructions is insufficient, particularly when the specific brand or kit is not identified.

      Thank you very much for your advice. We have added corresponding content based on your suggestion to more comprehensively display the reagent brand and complete product name. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China)(Zhao et al., 2022). ' (in L405-407). 'DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018) '. (in L469-471)

      (9) Line 364: What do CM and MM stand for? Please define.

      Thank you very much for your advice. Based on your suggestion, we have made modifications in the corresponding locations. 'To evaluate vegetative growth, complete medium (CM), minimal medium (MM), and V8 Juice Agar (V8) media were prepared as described previously(Tang et al., 2020). '(in L385-387)

      Generation of Deletion and Complemented Mutants:

      (10) This section lacks detail. For example, were PCR products used directly for PEG-mediated transformation, or were the fragments cloned into a plasmid?

      Thank you very much for your question. We directly use the fused fragments for protoplast transformation after sequencing confirmation. We have clearly defined the fragment form used for transformation at the corresponding location. 'The resulting fusion fragment was transformed into the wild-type F. graminearum PH-1 strain via polyethylene glycol (PEG)-mediated protoplast transformation. '(in L403-405).

      (11) PCR and Southern blot validation results should be included as supplementary material, along with clear interpretations of these results.

      Thank you very much for your advice. In the supplementary material we submitted, Supplementary Figure 2 already includes the results of PCR and Southern blot validation.(in Fig. S2)

      (12) There is almost no description of how the mutants mentioned in lines 388-390 were generated.

      Thank you very much for your advice. Based on your suggestions, we have added relevant content in the appropriate sections to more comprehensively and clearly reflect the experimental process. 'Specifically, FgDML1, including its native promoter region and open reading frame (ORF) (excluding the stop codon), was amplified. The PCR product was then fused with the XhoI-digested pYF11 vector. After transformation into E. coli and sequence verification, the plasmid was extracted and subsequently introduced into PH-1 protoplasts. For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. The overexpression mutant was constructed according to a previously described method. Specifically, the ORF of FgDML1 was amplified and the PCR product was ligated into the SacII-digested pSXS overexpression vector. The resulting plasmid was then transformed into PH-1 protoplasts (Shi et al., 2023). For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively.' (in L413-426).

      Vegetative Growth and Conidiation Assays:

      (13) There is no information about how long the plates were incubated before photos were taken. Judging by the images, it appears that different incubation times may have been used.

      Thank you very much for your advice. Due to the slower growth of ΔFgDML1, we adopted different incubation periods and have supplemented the relevant content in the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to ΔFgDML1 slower growth, the ΔFgDML1 mutant required a 5-day incubation period compared to the 3 days used for PH-1 and ΔFgDML1-C. '(in L490-493).

      (14) There is no description of the MBL medium.

      Thank you very much for your advice. Based on your suggestion, we have supplemented the corresponding content in the corresponding positions. 'Mung bean liquid (MBL) medium was used for conidial production, while carrot agar (CA) medium was utilized to assess sexual reproduction(Wang et al., 2011). '(in L387-389).

      DON Production and Pathogenicity Assays:

      (15) Were DON levels normalised to mycelial biomass? The vegetative growth assays show that FgDML1 null mutants exhibit reduced growth on all tested media. If mutant and wild-type strains were incubated for the same period under the same conditions, it is reasonable to assume that the mutants accumulated significantly less biomass. Therefore, results related to DON production, as well as acetyl-CoA and ATP levels, must be normalised to biomass.

      Thank you very much for your question. We have taken into account the differences in mycelial biomass. Therefore, when measuring DON, acetyl-CoA, and ATP levels, all data were normalized to mycelial mass and calculated as amounts per unit of mycelium, thereby avoiding discrepancies arising from variations in biomass.
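      The normalisation described in this response can be illustrated with a minimal sketch. The function name, the units, and all numbers below are hypothetical and chosen only to show the arithmetic; they are not values from the manuscript.

```python
def per_unit_mycelium(amount, mycelial_mass_g):
    """Return the amount of an analyte per gram of mycelium."""
    if mycelial_mass_g <= 0:
        raise ValueError("mycelial mass must be positive")
    return amount / mycelial_mass_g


# Hypothetical example: the mutant yields less total analyte, but it also
# accumulates less biomass, so the comparison is made per gram of mycelium.
wild_type = per_unit_mycelium(amount=120.0, mycelial_mass_g=0.5)
mutant = per_unit_mycelium(amount=20.0, mycelial_mass_g=0.25)
relative_level = mutant / wild_type  # mutant level as a fraction of wild type
```

      With these made-up numbers the total amount differs six-fold, whereas the per-gram level differs three-fold, which is the kind of discrepancy the reviewer's question is concerned with.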

      Sensitivity Assays:

      (16) While the authors mention that gradient concentrations were used, the specific concentrations and ranges are not provided. Importantly, have the plates shown in Figure 5 been grown for different periods or lengths? Given the significantly reduced growth rate shown in Figure 6A, the mutants should not have grown to the same size as the WT (PH-1) as shown in Figures 5A and 5B unless the pictures have been taken on different days. This needs to be explained.

      Thank you very much for your question. Due to the slower growth of ΔFgDML1, we adopted different incubation periods and have supplemented the relevant content in the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to ΔFgDML1 slower growth, the ΔFgDML1 mutant required a 5-day incubation period compared to the 3 days used for PH-1 and ΔFgDML1-C. '(in L490-493).

      (17) Additionally, was inhibition measured similarly for both stress agents and fungicides? This should be clarified.

      Thank you very much for your question. We have supplemented the specific concentration gradient of fungicides. 'The concentration gradients for each fungicide in the sensitivity assays were set up according to Supplementary Table S2. '(in L493-494)(in Table. S2).

      Complex III Enzyme Activity:

      (18) A more detailed description of how this assay was performed is needed.

      Thank you very much for your advice. We have provided further detailed descriptions of the corresponding sections. 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,000 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. '(in L511-517)

      (19) Were protein concentrations standardised prior to the assay?

      Thank you very much for your question. Protein concentrations for all Western blot samples were quantified using a BCA assay kit to ensure equal loading.

      (20) Line 448: Are ΔFgDML1::Tri1+GFP and ΔFgDML1+GFP the same strain? ΔFgDML1::Tri1+GFP has not been previously described.

      Thank you very much for your question. These two strains are not the same strain, and we have supplemented their construction process in the corresponding section. 'For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively. '(in L423-426)

      (21) Lines 460 and 468: Please adopt a consistent nomenclature, either RT-qPCR or qRT-PCR.

      Thank you very much for your advice. We have unified the nomenclature and modified the corresponding content in the corresponding sections. 'Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) was carried out using the QuantStudio 6 Flex real-time PCR system (Thermo Fisher Scientific, USA) to assess the relative expression of three subunits of Complex III (FgCytb, FgCytc1, FgISP), five assembly factors (FgQCR2, FgQCR6, FgQCR7, FgQCR8, FgQCR9), and DON biosynthesis-related genes (FgTri5 and FgTri6).' (in L526-531)

      (22) Lines 472-473: Why was FgCox1 used as a reference for FgCytb? Clarification is needed.

      Thank you very much for your question. FgCytb (cytochrome b) and FgCOX1 (cytochrome c oxidase subunit I) are both encoded by the mitochondrial genome and serve as core components of the oxidative phosphorylation system (Complex III and Complex IV, respectively). Their transcription is co-regulated by mitochondrial-specific mechanisms in response to cellular energy status. Consequently, under experimental conditions that perturb energy homeostasis, FgCOX1 expression exhibits relative, context-dependent stability with FgCytb, or at least co-varies directionally, making it a superior reference for normalizing target gene expression. In contrast, FgGapdh operates within a distinct genetic and regulatory system. Using FgCOX1 ensures that both reference and target genes reside within the same mitochondrial compartment and functional module, thereby preventing normalization artifacts arising from independent variation across disparate pathways.

      (23) Lines 476-477: This step requires a clearer and more detailed explanation.

      Thank you very much for your advice. We provided detailed descriptions of them in their respective positions. 'For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. '(in L417-419). 'The FgDnm1-3×Flag fragment was introduced into PH-1 and FgDML1+GFP protoplasts, respectively, to obtain single-tagged and double-tagged strains. '(in L541-543)

      Western blotting:

      (24) Uncropped Western blot images should be provided as supplementary material.

      Thank you very much for your advice. All Western blot images will be submitted to the supplementary material package.

      (25) Lines 485-489: A more thorough description of the antibodies used (including source, catalogue number, and dilution) is necessary.

      Thank you very much for your advice. The antibodies used are clearly stated in terms of brand, catalog number, and dilution. We have added the dilution ratio. 'All antibodies were diluted as follows: primary antibodies at 1:1000 and secondary antibodies at 1:10000. '(in L550-551)

      (26) The Western blot shown in Figure 3D appears problematic, particularly the anti-GAPDH band for FgDML1::FgTri1+GFP. Are both anti-GAPDH bands derived from the same gel?

      Thank you very much for your advice. We are unequivocally certain that these data derive from the same gel. Therefore, we are providing the original image for your inspection.

      Author response image 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      We have revised the paper for clarity at all levels: motivation, application, and parameterization. We clarify that there is a large unmet need for using RSA in a trial-wise manner, and that this approach offers benefits to any team interested in decoding trial-wise representational information linked to behavioral responses; as such, it is not a problem specific to a single memory study.

      (2) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is closely related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      We appreciate the summary of relevant literature and have revised the Introduction to address this bounty of relevant work. While much is owed to these authors, new developments from a diverse array of researchers outside of a single group can aid in new research questions, and should always have a place in our research landscape. We owe much to the work of Kriegeskorte’s group, and in fact, Schütt et al., 2023 served as a very relevant touchpoint in the Discussion and helped to highlight specific needs not addressed by the assessment of the “representational geometry” of an entire presented stimulus set. Principal amongst these needs is the application of trial-wise representational information that can be related to trial-wise behavioral responses and thus used to address specific questions on brain-behavior relationships. We invite the Reviewer to consider the utility of this shift with the following revisions to the Introduction.

      Page 3. “Recently, methodological advancements have addressed many known limitations in cRSA. For example, cross-validated distance measures (e.g., Euclidean distance) have improved the reliability of representational dissimilarities in the presence of noise and trial imbalance (Walther et al., 2016; Nili et al., 2014; Diedrichsen et al., 2021). Bayesian approaches such as pattern component modeling (Diedrichsen, Yokoi, & Arbuckle, 2018) have extended representational approaches to accommodate continuous stimulus features or temporal variation. Further, model comparison RSA strategies (Diedrichsen et al., 2021) and generalization techniques across stimuli (Schütt et al., 2023) have improved sensitivity and inference. Nevertheless, a common feature shared across most of these improvements is that they require stimulus repetition to examine the representational structure. This requirement limits their ability to probe brain-behavior questions at the level of individual events”.

      Page 8. “While several extensions of RSA have addressed key limitations in noise sensitivity, stimulus variance, and modeling (e.g., Diedrichsen et al., 2021; Schütt et al., 2023), our tRSA approach introduces a new methodological step by estimating representational strength at the trial level. This accounts for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way”.

      Page 44. “Despite such prevalent appreciation for the neurocognitive relevance of stimulus properties, cRSA often does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and produces statistically dependent data, an issue addressed by Schütt et al., 2023, who developed cross validation and bootstrap methods that explicitly model dependence across both subjects and stimulus conditions”.

      (3) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.

      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right at the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
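
      The estimator sketched in this comment can be illustrated numerically. The following is a minimal simulation, not taken from the paper or from any RSA toolbox: it assumes zero-mean Gaussian signal and noise, exactly two measurements per stimulus, and arbitrary sizes and variances. It contrasts a within-measurement estimate, which mixes signal and noise, with cross-measurement estimates, whose expectation is $\sigma^2_{u}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_vox = 200, 500          # illustrative sizes (assumptions)
sigma2_u, sigma2_e = 1.0, 4.0     # assumed signal and noise variances

# y_{j,n} = u_j + e_{j,n}: one true pattern per stimulus, fresh noise per measurement.
u = rng.normal(0.0, np.sqrt(sigma2_u), size=(n_stim, n_vox))
y1 = u + rng.normal(0.0, np.sqrt(sigma2_e), size=(n_stim, n_vox))
y2 = u + rng.normal(0.0, np.sqrt(sigma2_e), size=(n_stim, n_vox))

# Within one measurement, signal and noise variance add up.
within_estimate = np.mean(y1 * y1)     # expectation: sigma2_u + sigma2_e

# Across measurements the noise is independent, so it drops out on average.
cross_estimate = np.mean(y1 * y2)      # expectation: sigma2_u

# Cross-validated squared Euclidean distances (per voxel) between stimulus
# pairs, computed from the cross-measurement Gram matrix.
gram = y1 @ y2.T / n_vox
diag = np.diag(gram)
d_cv = diag[:, None] + diag[None, :] - gram - gram.T
off_diagonal = ~np.eye(n_stim, dtype=bool)
distance_estimate = d_cv[off_diagonal].mean() / 2.0   # expectation: sigma2_u
```

      Under these assumptions the two cross-measurement estimates should land close to the assumed signal variance, whereas the within-measurement estimate is inflated by the noise variance. With a single presentation per stimulus, only the within-measurement quantity is available, which is the design constraint the reviewer highlights.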

      At a general level, our approach rests on the premise that there is meaningful information present in a single presentation of a given stimulus. This assumption may have less utility when the research goals are more focused on estimating the fidelity of signal patterns for RSA, as in designs with multiple repetitions. But it is an exaggeration to state that such a trial-wise approach cannot address the difference between “true” stimulus patterns and noise. This trial-wise approach has explicit utility in relating trial-wise brain information to trial-wise behavior, across multiple cognitions (not only memory studies, as applied here). We have added substantial text to the Introduction distinguishing cRSA, which is widely employed, often in cases with a single repetition per stimulus, and model comparative methods that employ multiple repetitions. We clarify that we do not consider tRSA an alternative to the model comparative approach, and discuss that operational definitions of representational strength are constrained by the study design.

      Page 3. “In this paper, we present an advancement termed trial-level RSA, or tRSA, which addresses these limitations in cRSA (not model comparison approaches) and may be utilized in paradigms with or without repeated stimuli”.

      Page 4. “Representational geometry usually refers to the structure of similarities among repeated presentations of the same stimulus in the neural data (as captured in the brain RSM) and is often estimated utilizing a model comparison approach, whereas representational strength is a derived measure that quantifies how strongly this geometry aligns with a hypothesized model RSM. In other words, geometry characterizes the pattern space itself, while representational strength reflects the degree of correspondence between that space and the theoretical model under test”.

      Finally, we clarified that in our simulation methods we assume a true underlying activity pattern and a random error pattern. The model RSM is computed based on the true pattern, whereas the brain RSM comes from the noisy pattern, not the model RSM itself.

      Page 9. “Then, we generated two sets of noise patterns, which were controlled by parameters $\sigma_A$ and $\sigma_B$, respectively, one for each condition”.
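
      The simulation logic described in this response (a true pattern plus a condition-specific noise pattern, with the model RSM taken from the true patterns and the brain RSM from the noisy ones) can be sketched as follows. This is a simplified illustration and not the authors' R code: the sizes, the two noise levels, the Gaussian draws, and the use of a Pearson correlation between off-diagonal RSM entries are all assumptions, and the multilevel variance components are omitted.

```python
import numpy as np


def rsm(patterns):
    """Stimulus-by-stimulus Pearson correlation matrix (rows are stimuli)."""
    return np.corrcoef(patterns)


def upper(mat):
    """Off-diagonal upper-triangle entries as a flat vector."""
    return mat[np.triu_indices_from(mat, k=1)]


def representational_strength(brain, model):
    """Pearson correlation between brain and model RSM entries."""
    return np.corrcoef(upper(brain), upper(model))[0, 1]


rng = np.random.default_rng(1)
n_stim, n_vox = 40, 100            # illustrative sizes (assumptions)
sigma_A, sigma_B = 0.5, 3.0        # assumed noise levels for conditions A and B

true_patterns = rng.normal(size=(n_stim, n_vox))
model_rsm = rsm(true_patterns)     # model RSM comes from the true patterns

# Brain RSMs come from the noisy patterns, one noise level per condition.
noisy_A = true_patterns + rng.normal(0.0, sigma_A, size=(n_stim, n_vox))
noisy_B = true_patterns + rng.normal(0.0, sigma_B, size=(n_stim, n_vox))

strength_A = representational_strength(rsm(noisy_A), model_rsm)
strength_B = representational_strength(rsm(noisy_B), model_rsm)
```

      In this toy setting the condition with less noise yields the higher brain-model correlation, which illustrates the reviewer's point that the metric reflects noise level as well as model fit.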

      (4) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_ijk$ is the noise variances (not standard deviation)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as they seem to be done here).

      We have added notation for the true and measured activity patterns to differentiate them from our notation for variance. We agree that multilevel models are usually defined at the level of means rather than at the level of variances, and we include a figure (Fig 1D) that describes the model in terms of the means. We clarify that the σ ($\sigma$) terms used in the manuscript were not variances or standard deviations themselves; rather, they denoted components of the actual (multilevel) variance parameter. Each component was sampled from a normal distribution, and together they summed to the final variance parameter for each trial. We have changed our notation for each component to the lowercase letter s to minimize confusion. We have also made our R code publicly available on our lab GitHub, which should provide more clarity on the exact simulation process.

      (5) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.

      We have added justification of the mixed-effects model given the potential assumption violations. We caution readers to investigate the robustness of their models, and to employ permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. Finally, we agree that the first simulation setting does not possess several properties of realistic RDMs/RSMs; however, we believe that there is utility in understanding the mathematical properties of correlations, an essential component of RSA, in a straightforward simulation where the ground truth is known. We have therefore moved this simulation to Appendix 1.
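
      The positive semi-definiteness concern raised in comment (5) can be seen in a three-stimulus toy case. The Python sketch below uses hypothetical values: each entry is a legal correlation on its own, yet the three together cannot come from any real set of patterns. It checks strict positive definiteness with Sylvester's criterion, which is enough for this illustration.

```python
def corr3(r12, r13, r23):
    """Assemble a 3x3 'correlation matrix' from three off-diagonal values."""
    return [[1.0, r12, r13],
            [r12, 1.0, r23],
            [r13, r23, 1.0]]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def is_positive_definite_3x3(m):
    """Sylvester's criterion: all leading principal minors are positive."""
    minor1 = m[0][0]
    minor2 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return minor1 > 0 and minor2 > 0 and det3(m) > 0


# Stimulus 1 is very similar to 2 and to 3, so 2 and 3 cannot be strongly
# anti-correlated: independently drawn cells can produce this impossible case.
impossible = corr3(0.9, 0.9, -0.9)
possible = corr3(0.9, 0.9, 0.7)

print(is_positive_definite_3x3(impossible), det3(impossible))
print(is_positive_definite_3x3(possible), det3(possible))
```

      A negative determinant means at least one eigenvalue is negative, so the first matrix is not a valid correlation matrix even though every cell lies between -1 and 1.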

      (6) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      The Reviewer is correct that the voxel values in the true pattern are drawn from i.i.d. standard normal distributions. We take the Reviewer’s suggestion of a “condition-specific pattern” to mean that there could be a condition-by-voxel interaction in two non-mutually exclusive ways. The first is additive: a common underlying multi-voxel pattern such as [6, 34, -52, …, 8] for all condition A trials, and a different such pattern for condition B trials, etc. The second is multiplicative: a vector of scaling factors such as [x1.5, x0.5, x0.8, …, x2.7] for all condition A trials, and a different such vector for condition B trials, etc. Both possibilities could indeed affect tRSA as much as they would cRSA.

      Importantly, if such a strong condition-specific pattern is expected, one can build a condition-specific model RDM using one-hot coding of conditions (see example figure; src: https://www.newbi4fmri.com/tutorial-9-mvpa-rsa), either to capture this interesting phenomenon or to remove it as a confounding factor. This practice has been applied in multiple regression cRSA approaches (e.g., Cichy et al., 2013) and can also be applied to tRSA.
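
      As a minimal sketch of such a condition model RDM, the Python snippet below codes each trial's condition as an indicator vector and sets the dissimilarity to 0 within a condition and 1 across conditions. The function name and labels are hypothetical.

```python
def condition_model_rdm(labels):
    """Model RDM from condition labels: 0 within a condition, 1 across."""
    conditions = sorted(set(labels))
    # Indicator coding: each trial becomes a 0/1 vector over the conditions.
    onehot = [[1 if lab == c else 0 for c in conditions] for lab in labels]
    n = len(labels)
    # The dot product of two indicator vectors is 1 only for matching labels.
    return [[1 - sum(a * b for a, b in zip(onehot[i], onehot[j]))
             for j in range(n)] for i in range(n)]


labels = ["A", "A", "B", "B"]
rdm = condition_model_rdm(labels)
for row in rdm:
    print(row)
```

      Entered alongside the model of interest in a regression, a matrix like this can absorb a shared condition-level pattern so that it is not mistaken for stimulus-level structure.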

      (7) The trial-level brain RDM to model Spearman correlations was analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, which is an assumption of the mixed effect model. This does not seem to induce an increase in Type I errors in the conditions studied, but there is no clear justification for this procedure, which needs to be justified.

      We appreciate this important warning, and now caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the supplement.

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models. The multilevel structure of RSA data introduces potential dependencies across subjects, stimuli, and trials, which can violate assumptions of independence if not properly modeled. In the present study, we used a model that included random intercepts for both subjects and stimuli, which accounts for variance at these levels and improves the generalizability of fixed-effect estimates. Still, there is a potential for systematic dependence across trials within a subject. To ensure that the model assumptions were satisfied, we conducted a series of diagnostic checks on an exemplar ROI (right LOC; middle occipital gyrus) in the Object Perception dataset, including visual inspection of residual distributions and autocorrelation (Appendix 3, Figure 13). These diagnostics supported the assumptions of normality, homoscedasticity, and conditional independence of residuals. In addition, we conducted permutation-based inference, similar to prior improvements to cRSA (Nili et al. 2014), using a nested model comparison to test whether the mean similarity in this ROI was significantly greater than zero. The observed likelihood ratio test statistic fell in the extreme tail of the null distribution (Appendix 3, Figure 14), providing strong nonparametric evidence for the reliability of the observed effect. We emphasize that this type of model checking and permutation testing is not merely confirmatory but can help validate key assumptions in RSA modeling, especially when applying mixed-effects models to neural similarity data. Researchers are encouraged to adopt similar procedures to ensure the robustness and interpretability of their findings”.

      Exemplar Permutation Testing

      To test whether the mean representational strength in the ROI right LOC (middle occipital gyrus) was significantly greater than zero, we used a permutation-based likelihood ratio test implemented via the permlmer function. This test compares two nested linear mixed-effects models fit using the lmer function from the lme4 package, both including random intercepts for Participant and Stimulus ID to account for between-subject and between-item variability.

      The null model excluded a fixed intercept term, effectively constraining the mean similarity to zero after accounting for random effects:

      ROI ~ 0 + (1 | Participant) + (1 | Stimulus)

      The full model included the same random effects structure but allowed the intercept to be freely estimated:

      ROI ~ 1 + (1 | Participant) + (1 | Stimulus)

      By comparing the fit of these two models, we directly tested whether the average similarity in this ROI was significantly different from zero. Permutation testing (1,000 permutations) was used to generate a nonparametric p-value, providing inference without relying on normality assumptions. The full model, which estimated a nonzero mean similarity in the right LOC (middle occipital gyrus), showed a significantly better fit to the data than the null model that fixed the mean at zero (χ²(1) = 17.60, p = 2.72 × 10⁻⁵). The permutation-based p-value obtained from permlmer confirmed this effect as statistically significant (p = 0.0099), indicating that the mean similarity in this ROI was reliably greater than zero. These results support the conclusion that the right LOC contains representational structure consistent with the HMAXc2 RSM. A density plot of the permuted likelihood ratio tests is plotted along with the observed likelihood ratio test in Appendix 3 Figure 14.
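
      Two pieces of the inference above can be illustrated in a few lines of Python. The first function converts a likelihood ratio statistic with one degree of freedom into a parametric p-value. The second is a deliberately simplified stand-in for the permutation step: a sign-flip test of whether a mean exceeds zero, which assumes the values are symmetric around zero under the null. It does not reproduce `permlmer`, which operates on fitted `lmer` models with random effects, and the example data are invented.

```python
import math
import random


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def sign_flip_p_value(values, n_perm=999, seed=0):
    """One-sided permutation p-value for mean(values) > 0 (sign-flip null)."""
    rng = random.Random(seed)
    observed = sum(values) / len(values)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each value is equally likely to be positive or negative.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    # Add one so the observed statistic counts as a member of the null set.
    return (hits + 1) / (n_perm + 1)


# Parametric p-value for the likelihood ratio statistic quoted in the text.
print(chi2_sf_df1(17.60))

# Hypothetical trial-level similarity values for one ROI.
similarities = [0.12, 0.08, 0.15, 0.05, 0.11, 0.09, 0.14, 0.07, 0.10, 0.13]
print(sign_flip_p_value(similarities))
```

      The `(hits + 1) / (n_perm + 1)` form keeps the permutation p-value strictly above zero, so its smallest attainable value depends on the number of permutations drawn.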

      (8) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      We agree that an analysis of the comparability between mean row-wise Spearman correlations and the matrix-wise Spearman correlation is needed. We believe that simulations are the best approach for this comparison, since they are much more robust than the empirical dataset and have the advantage that the true pattern and noise levels are known. We expand on our comparison of mean tRSA values and matrix-wise Spearman correlations on page 42.

      Page 42. “Although tRSA and cRSA both aim to quantify representational strength, they differ in how they operationalize this concept. cRSA summarizes the correspondence between RSMs as a single measure, such as the matrix-wise Spearman correlation. In contrast, tRSA computes such correspondence for each trial, enabling estimates at the level of individual observations. This flexibility allows trial-level variability to be modeled directly, but also introduces subtle differences in what is being measured. Nonetheless, our simulations showed that, although numerical differences occasionally emerged—particularly when comparing between-condition tRSA estimates to within-condition cRSA estimates—the magnitude of divergence was small and did not affect the outcome of downstream statistical tests”.
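
      The distinction between the two operationalizations can be sketched in Python as follows. This is a toy illustration with invented 4 x 4 matrices, not the analysis code: `crsa` returns one Spearman correlation over the upper triangle, while `trsa` returns one Spearman correlation per row with the diagonal left out.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def crsa(brain, model):
    """One matrix-wise Spearman correlation over the upper triangle."""
    n = len(brain)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([brain[i][j] for i, j in idx],
                    [model[i][j] for i, j in idx])


def trsa(brain, model):
    """One Spearman correlation per row (trial), leaving out the diagonal."""
    n = len(brain)
    return [spearman([brain[i][j] for j in range(n) if j != i],
                     [model[i][j] for j in range(n) if j != i])
            for i in range(n)]


model = [[1.0, 0.8, 0.2, 0.1],
         [0.8, 1.0, 0.3, 0.4],
         [0.2, 0.3, 1.0, 0.7],
         [0.1, 0.4, 0.7, 1.0]]
brain = [[1.0, 0.5, 0.3, 0.0],
         [0.5, 1.0, 0.1, 0.2],
         [0.3, 0.1, 1.0, 0.6],
         [0.0, 0.2, 0.6, 1.0]]

matrix_wise = crsa(brain, model)
row_wise = trsa(brain, model)
print(matrix_wise, row_wise, sum(row_wise) / len(row_wise))
```

      In this toy case the row-wise values average 0.875 while the matrix-wise value is about 0.77, because ranking within each row discards differences in overall similarity level between rows that the matrix-wise ranking retains.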

      (9) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      We recognize the potential of condition-specific patterns and chose to constrain the analyses to those most comparable with cRSA. However, depending on their hypotheses, researchers may consider testing condition RSMs and utilizing a model comparison approach or employ the z-scored approach, as employed in the simulations above. Regarding the potential run confounds, this is always the case in RSA and why we exclude within-run comparisons. We have also added to the Discussion the suggestion to include run as a covariate in their mixed-effects models. However, we do not employ this covariate here as we preferred the most parsimonious model to compare with cRSA.

      Page 46 - 47. “Further, while analyses here were largely employed to be comparable with cRSA, researchers should consider taking advantage of the flexibility of the mixed-effects models and include covariates of non-interest (run, trial order, etc.)”.
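
      Excluding within-run comparisons, as mentioned in the response above, amounts to masking every cell of the similarity matrix whose two trials share a run. A minimal Python sketch, with hypothetical function names and run labels:

```python
def mask_within_run(rsm, runs):
    """Copy of the RSM with the diagonal and same-run cells set to None."""
    n = len(rsm)
    return [[None if (i == j or runs[i] == runs[j]) else rsm[i][j]
             for j in range(n)] for i in range(n)]


def usable_cells(masked_row):
    """Values of one row that survive the masking."""
    return [v for v in masked_row if v is not None]


# Four trials: the first two were collected in run 1, the last two in run 2.
runs = [1, 1, 2, 2]
rsm = [[1.0, 0.6, 0.2, 0.1],
       [0.6, 1.0, 0.3, 0.2],
       [0.2, 0.3, 1.0, 0.5],
       [0.1, 0.2, 0.5, 1.0]]

masked = mask_within_run(rsm, runs)
print(usable_cells(masked[0]))
```

      Only between-run cells then enter the row-wise correlation for each trial. A run label kept per trial could also be passed to the second-level model as a covariate of non-interest.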

      (10) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli.

      We agree that studies advancing designs with multiple repetitions of a given stimulus image are useful in estimating the reliability of concept representations. We would argue however that model comparison in RSA is not restricted to such data. Many extant studies do not in fact have multiple repetitions per stimulus per subject (Wang et al., 2018 https://doi.org/10.1088/1741-2552/abecc3, Gao et al, 2022 https://doi.org/10.1093/cercor/bhac058, Li et al, 2022 https://doi.org/10.1002/hbm.26195, Staples & Graves, 2020 https://doi.org/10.1162/nol_a_00018) that allow for that type of model-comparative approach. While beneficial in terms of noise estimation, having multiple presentations was not a requirement for implementing cRSA (Kriegeskorte, 2008 https://doi.org/10.3389/neuro.06.004.2008). The aim of this manuscript is to introduce the tRSA approach to the broad community of researchers whose research questions and datasets could vary vastly, including but not limited to the number of repeated presentations and the balance of trial counts across conditions.

      (11) Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here.

      We have added language on the value of cross-validation approaches to RSA in the Discussion:

      Page 47. “Additionally, we note that while our proposed tRSA framework provides a flexible and statistically principled approach for modeling trial-level representational strength, we acknowledge that there are alternative methods for addressing trial-level variability in RSA. In particular, the use of cross-validated distance metrics (e.g., crossnobis distance) has become increasingly popular for controlling differences in measurement noise variance and accounting for possible covariance structures across trials (Walther et al., 2016). These metrics offer several advantages, including unbiased estimation of representational dissimilarities under Gaussian noise assumptions and improved generalization to unseen data. However, cross-validated distances are conceptually distinct from the approach taken here: whereas cross-validation aims to correct for noise-related biases in representational dissimilarity matrices, our trial-level RSA method focuses on estimating and modeling the variability in representation strength across individual trials using mixed-effects modeling. Rather than proposing a replacement for cross-validated RSA, tRSA adds a complementary tool to the methodological toolkit—one that supports hypothesis-driven inference about condition effects and trial-level covariates, while leveraging the full structure of the data”.
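
      To illustrate why cross-validated distances are unbiased under noise, the Python sketch below compares an ordinary squared Euclidean distance with its cross-validated counterpart when the two conditions have identical true patterns. It is a simplification: the crossnobis estimator additionally whitens the patterns by the noise covariance, which is omitted here, and all parameter values are hypothetical.

```python
import random


def sq_euclidean(a, b):
    """Ordinary squared Euclidean distance within a single partition."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cv_sq_euclidean(a1, b1, a2, b2):
    """Cross-validated squared distance: the difference vector from one
    partition is multiplied with the difference vector from an independent one."""
    return sum((x1 - y1) * (x2 - y2)
               for x1, y1, x2, y2 in zip(a1, b1, a2, b2))


def noise_pattern(rng, n_voxels, noise_sd):
    return [rng.gauss(0, noise_sd) for _ in range(n_voxels)]


def average_distances(n_voxels=100, n_sim=1000, noise_sd=1.0, seed=0):
    """Average both estimators when the true distance is exactly zero."""
    rng = random.Random(seed)
    plain, crossval = 0.0, 0.0
    for _ in range(n_sim):
        # Conditions A and B share the same (zero) true pattern, so each
        # measurement consists of independent noise only.
        a1, b1, a2, b2 = (noise_pattern(rng, n_voxels, noise_sd)
                          for _ in range(4))
        plain += sq_euclidean(a1, b1)
        crossval += cv_sq_euclidean(a1, b1, a2, b2)
    return plain / n_sim, crossval / n_sim


plain_mean, crossval_mean = average_distances()
print(plain_mean, crossval_mean)
```

      The ordinary distance averages close to 2 x n_voxels x noise variance even though the true distance is zero, whereas the cross-validated estimate averages close to zero because noise from independent partitions is uncorrelated.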

      (12) One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.

      Indeed, the chosen model RSM may not be the true RSM, but as the noise level increases the correlation between RSMs practically becomes zero. In our simulations we assume this to be true as a straightforward way to manipulate the correspondence between the brain data and the model. However, just like cRSA, tRSA is constrained by the model selections the researchers employ. We encourage researchers to have carefully considered theoretically-motivated models and, if their research questions require, consider multiple and potentially competing models. Furthermore, the trial-wise estimates produced by tRSA encourage testing competing models within the multiple regression framework. We have added this language to the Discussion.

      Page 46. “…choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives”.

      Pages 45-46. “While a number of studies have addressed the validity of measuring representational geometry using designs with multiple repetitions, a conceptual benefit of the tRSA approach is the reliance on a regression framework that engenders the testing of competing conceptual models of stimulus representation (e.g., taxonomic vs. encyclopedic semantic features, as in Davis et al., 2021)”.

      Reviewer #2 (Public review):

      (1) While I generally welcome the contribution, I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      We apologize for the unintended accusatory tone. We have clarified the many robust approaches to RSA and have made our Introduction and Discussion more nuanced throughout (see also 3, 11 and 16).

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor has RSA ever been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2018).

      We agree there are numerous methods that go beyond cRSA in addressing these limitations, and we have added discussion of them to our manuscript, as well as an example analysis implementing permutation tests on tRSA data (see response to 7). We thank the reviewer for bringing King et al., 2014 and their temporal generalization method to our attention; we have added a reference acknowledging their decoding-based temporal generalization approach.

      Page 8. “It is also important to note that some prior work has examined similarly fine-grained representations in time-resolved neuroimaging data, such as the temporal generalization method introduced by King et al. (see King & Dehaene, 2014). Their approach trains classifiers at each time point and tests them across all others, resulting in a temporal generalization matrix that reflects decoding accuracy over time. While such matrices share some structural similarity with RSMs, they do not involve correlating trial-level pattern vectors with model RSMs nor do their second-level models include trial-wise, subject-wise, and item-wise variability simultaneously”.

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      We appreciate the opportunity to offer more specific prescriptions for those employing a tRSA technique, and have added them to the Discussion:

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models and choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      To clarify, our empirical approach uses two different tasks: an Object Perception task more akin to the classic RSA datasets employing passive viewing, and a Conceptual Retrieval task that more directly addresses the benefits of the trial-wise approach. We felt that our Object Perception dataset is a simpler empirical fMRI dataset without explicit task conditions or a dichotomous behavioral outcome, whereas the Retrieval dataset is more involved (though old/new recognition is the most common form of memory retrieval testing) and dependent on behavioral outcomes. However, we recognize the utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      We recognize the potential danger of not meeting model assumptions. Though our simulation results and model checks suggest this is not a fatal flaw in the model design, we caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. See response to R1.

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      We agree that this jargon may make the paper difficult to understand. We have expanded or added definitions of these terms throughout the Methods and Results sections.

      Page 12. “Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> = 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the correct inference should be a failure to reject the null hypothesis of b<sub>B-𝐴</sub> = 0; any significant (𝑝 < 0.05) result in either direction was considered a false positive (spurious effect, or Type I error). Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> ≠ 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the inference was considered correct if it rejected the null hypothesis of b<sub>B-𝐴</sub> = 0 and yielded the expected sign of the estimated contrast (b<sub>B-𝐴</sub> < 0). A significant result with the reverse sign of the estimated contrast (b<sub>B-𝐴</sub> > 0) was considered a Type I error, and a nonsignificant (𝑝 ≥ 0.05) result was considered a false negative (failure to detect a true effect, or Type II error)”.

      Page 2. “Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to (better able to detect) true effects”.

      Page 25. “The performance of cRSA and tRSA was quantified with their specificity (better avoids false positives, 1 - Type I error rate) and sensitivity (better avoids false negatives, 1 - Type II error rate)”.
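
      The scoring rules quoted above can be written out as a short Python sketch. The function names, the `true_sign` argument, and the example outcomes are hypothetical; the rules themselves follow the quoted text, including the convention that a significant result with the wrong sign counts as a Type I error.

```python
ALPHA = 0.05


def classify(p_value, estimate, true_sign):
    """Label one simulated test.

    true_sign is 0 when no effect was simulated, otherwise -1 or +1 for the
    direction of the simulated contrast.
    """
    significant = p_value < ALPHA
    if true_sign == 0:
        # Any significant result without a true effect is a false positive.
        return "type_I" if significant else "correct"
    if not significant:
        # A true effect was present but not detected.
        return "type_II"
    same_sign = (estimate < 0) == (true_sign < 0)
    # A significant effect in the wrong direction is scored as a Type I error.
    return "correct" if same_sign else "type_I"


def rates(outcomes):
    """Specificity and sensitivity as one minus the respective error rate."""
    n = len(outcomes)
    return {"specificity": 1 - outcomes.count("type_I") / n,
            "sensitivity": 1 - outcomes.count("type_II") / n}


outcomes = [classify(0.01, -0.3, -1),   # detected, expected sign
            classify(0.20, -0.1, -1),   # missed
            classify(0.03, 0.2, -1),    # significant, reversed sign
            classify(0.40, 0.0, 0)]     # no effect, none claimed
print(outcomes, rates(outcomes))
```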

      Page 6. “One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity, or homogeneity of variance; that is, all residuals should have equal variance”.

      Page 11. “Specifically, a linear mixed-effects model with a fixed effect of condition (which estimates the average effect across the entire sample, capturing the overall effect of interest) and random effects of both subjects and stimuli (which model variation in responses due to differences between individual subjects and items, allowing generalization beyond the sample) was fitted to tRSA estimates via the `lme4 1.1-35.3` package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the `lmerTest 3.1-3` package (Kuznetsova et al., 2017)”.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.

      We thank the reviewer for raising our oversight here. We have added our code and data availability statements.

      Page 9. “Data are available upon request to the corresponding author, and our simulations and example tRSA code are available at https://github.com/electricdinolab”.

      Reviewer #1 (Recommendations for the authors):

      (13) Page 4: The limitations of cRSA seem to be based on the assumption that within each different experimental condition, there are different stimuli, which get combined into the condition. The framework of RSA, however, does not dictate whether you calculate a condition x condition RDM or a larger and more complete stimulus x stimulus RDM. Indeed, in practice we often do the latter? Or are you assuming that each stimulus is only shown once overall? It would be useful at this point to spell out these implicit assumptions.

      We agree that stimulus x stimulus RDMs can be constructed and are often used. However, as we mentioned in the Introduction, researchers are often interested in the difference between two (or more) conditions, such as “remembered” vs. “forgotten” (Davis et al., https://doi.org/10.1093/cercor/bhaa269) or “high cognitive load” vs. “low cognitive load” (Beynel et al., https://doi.org/10.1523/JNEUROSCI.0531-20.2020). In those cases, the most common practice with cRSA is to construct condition-specific RDMs, compute cRSA scores separately for each condition, and then compare the scores at the group level. The number of times each stimulus gets presented does not prevent one from creating a model RDM that has the same rows and columns as the brain RDM, either in the same condition (“high load”) or across different conditions.

      (14) Page 5: The difference between condition-level and stimulus-level is not clear. Indeed, this definition seems to be a function of the exact experimental design and is certainly up for interpretation. For example, if I conduct a study looking at the activity patterns for 4 different hand actions, each repeated multiple times, are these actions considered stimuli or conditions?

      We have added clarifying language about what is considered stimuli vs conditions. Indeed, this will depend on the specific research questions being employed and will affect how researchers construct their models. In this specific example, one would most likely consider each different hand action a condition, treating them as fixed effects rather than random effects, given their very limited number and the lack of need to generalize findings to the broader “hand actions” category.

      Page 5. “Critically, the distinction between condition-level and stimulus level is not always clear as researchers may manipulate stimulus-level features themselves. In these cases, what researchers ultimately consider condition-level and stimulus-level will depend on their specific research questions. For example, researchers intending to study generalized object representation may consider object category a stimulus-level feature, while researchers interested in if/how object representation varies by category may consider the same category variable condition-level”.

      (15) Page 5: The fact that different numbers of trials / different levels of measurement noise / noise-covariance of different conditions biases non-cross-validated distances is well known and repeatedly expressed in the literature. We have shown that cross-validation of distances effectively removes such biases - of course, it does not remove the increased estimation variability of these distances (for a formal analysis of estimation noise on condition patterns and variance of the cross-nobis estimator, see (Diedrichsen et al. 2021)).

      We thank the reviewer for drawing our attention to this literature and have added discussions of these methods.

      (16) Page 5: "Most studies present subjects with a fixed set of stimuli, which are supposedly samples representative of some broader category". This may be the case for a certain type of RSA experiments in the visual domain, but it would be unfair to say that this is a feature of RSA studies in general. In most studies I have been involved in, we use a "stimulus" x "stimulus" RDM.

      We have edited this sentence to avoid the “most” characterization. We also added substantial text to the Introduction and Discussion distinguishing cRSA, which is nonetheless widely employed, especially in cases with a single repetition per stimulus (Macklin et al., 2023; Liu et al., 2024), from the model-comparative method, and explicitly stating that we do not consider tRSA an alternative to the model-comparative approach.

      (17) Page 5: I agree that "stimuli" should ideally be considered a random effect if "stimuli" can be thought of as sampled from a larger population and one wants to make inferences about that larger population. Sometimes stimuli/conditions are more appropriately considered a fixed effect (for example, when studying the response to stimulation of the 5 fingers of the right hand). Techniques to consider stimuli/conditions as a random effect have been published by the group of Niko Kriegeskorte (Schütt et al. 2023).

      Indeed, in some cases what may be thought of as “stimuli” would be more appropriately entered into the model as a fixed effect; such questions are increasingly relevant given the focus on item-wise stimulus properties (Bainbridge et al., Westfall & Yarkoni). We have added text on this issue to the Discussion and caution researchers to employ models that most directly answer their research questions.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question. An effect is fixed when the levels represent the specific conditions of theoretical interest (e.g., task condition) and the goal is to estimate and interpret those differences directly. In contrast, an effect is random when the levels are sampled from a broader population (e.g., subjects) and the goal is to account for their variability while generalizing beyond the sample tested. Note that the same variable (e.g., stimuli) may be considered fixed or random depending on the research questions”.

      (18) Page 6: It is correct that the "classical" RSA depends on a categorical assignment of different trials to different stimuli/conditions, such that a stimulus x stimulus RDM can be computed. However, both Pattern Component Modelling (PCM) and Encoding models are ideally set up to deal with variables that vary continuously on a trial-by-trial or moment-by-moment basis. tRSA should be compared to these approaches, or it should be clarified that the problem setting is actually quite a different one.

      We agree that PCM and encoding models offer a flexible approach and handle continuous trial-by-trial variables. We have clarified on page 6 that the problem setting of cRSA is distinct, and we have added a discussion of the robustness of encoding models and their limitations to the Discussion.

      Page 6. “While other approaches such as Pattern Component Modeling (PCM) (Diedrichsen et al., 2018) and encoding models (Naselaris et al., 2011) are well-suited to analyzing variables that vary continuously on a trial-by-trial or moment-by-moment basis, these frameworks address different inferential goals. Specifically, PCM and encoding models focus on estimating variance components or predicting activation from features, while cRSA is designed to evaluate representational geometry. Thus, cRSA as well as our proposed approach address a problem setting distinct from PCM and encoding models”.

      (19) Page 8: "Then, we generated two noise patterns, which were controlled by parameters 𝜎𝐴 and 𝜎𝐵, respectively, one for each condition." This makes little sense to me. The noise patterns should be unique to each trial - you should generate n_a + n_b noise patterns, no?

      We clarify that the “noise patterns” here are n_voxel x n_trial in size; in other words, all trial-level noise patterns are generated together and each trial has its own unique noise pattern. We have revised our description to “two sets of noise patterns” for clarity starting on page 9.
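
      To make the dimensions concrete, a minimal sketch of the scheme described in this response could look as follows. It assumes Gaussian noise and one fixed "true" pattern per condition; the function name, argument names, and all numeric values are illustrative assumptions, not taken from the manuscript.

```python
import numpy as np

def simulate_trials(n_voxel, n_a, n_b, sigma_a, sigma_b, seed=0):
    """Return an (n_voxel, n_a + n_b) array of simulated trial patterns.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    # One underlying pattern per condition, shared by all trials of that condition.
    pattern_a = rng.standard_normal((n_voxel, 1))
    pattern_b = rng.standard_normal((n_voxel, 1))
    # Two *sets* of noise patterns: each column is one trial's own noise.
    noise_a = rng.normal(0.0, sigma_a, size=(n_voxel, n_a))
    noise_b = rng.normal(0.0, sigma_b, size=(n_voxel, n_b))
    # Broadcasting adds the condition pattern to every trial column.
    return np.hstack([pattern_a + noise_a, pattern_b + noise_b])

trials = simulate_trials(n_voxel=100, n_a=20, n_b=30, sigma_a=1.0, sigma_b=2.0)
```

      Each of the n_a + n_b columns therefore carries its own noise realization, which is the property the reviewer asked about.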

      (20) Page 9: First, I assume if this is supposed to be a hierarchical level model, the "noise parameters" here correspond to variances? Or do these \sigma values mean to signify standard deviations? The latter would make little sense. Or is it the noise pattern itself?

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (21) Page 10: your formula states "𝜎_𝑠𝑢𝑏𝑗 ~ 𝙽(0, 0.5^2)". This conflicts with your previous mention that \sigmas are noise "levels"; are they the noise patterns themselves now? Variances cannot be normally distributed, as they cannot be negative.

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (22) Page 13: What was the task of the subject in the Memory retrieval task? Old/new judgements relative to encoding of object perception?

      We apologize for the lack of clarity about the Memory Retrieval task and have added that information and clarified that the old/new judgements were relative to a separate encoding phase, the brain data for which has been reported elsewhere.

      Page 14. “Memory Retrieval took place one day after Memory Encoding and involved testing participants’ memory of the objects seen in the Encoding phase. Neural data during the Encoding phase has been reported elsewhere. In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. Participants performed old/new judgements, as well as their confidence in those judgements on a four-point scale (1 = Definitely New, 2 = Probably New, 3 = Probably Old, 4 = Definitely Old)”.

      (23) Page 13: If "Memory Retrieval consisted of three scanning runs", then some of the stimulus x stimulus correlations for the RSM must have been calculated within a run and some between runs, correct? Given that all within-run estimates share a common baseline, they share some dependence. Was there a systematic difference between the within-run and the between-run correlations?

      We have clarified in this portion of the methods that within-run comparisons were excluded from our analyses. We also double-checked that the within-run exclusion was included in the description of the Neural RSMs.

      Page 14. “Retrieval consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds (within-run comparisons were later excluded from RSA analyses)”.

      Page 18. “This was done by vectorizing the voxel-level activation values within each region and calculating their correlations using Pearson’s r, excluding all within-run comparisons.”
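
      A minimal sketch of this step, assuming trial patterns are stored as a trials-by-voxels array and that excluded cells are simply marked as missing, might look like the following. The function name `neural_rsm` and the toy run labels are hypothetical.

```python
import numpy as np

def neural_rsm(patterns, runs):
    """Pearson RSM across trials with within-run comparisons masked.

    patterns: (n_trial, n_voxel) activation values for one region.
    runs:     (n_trial,) run label of each trial.
    Returns an (n_trial, n_trial) matrix; within-run cells (including the
    diagonal) are set to NaN so later steps can skip them.
    """
    rsm = np.corrcoef(patterns)                  # rows are treated as variables
    runs = np.asarray(runs)
    same_run = runs[:, None] == runs[None, :]    # True where both trials share a run
    rsm[same_run] = np.nan
    return rsm

rng = np.random.default_rng(0)
patterns = rng.standard_normal((6, 50))          # 6 toy trials, 50 voxels
runs = [1, 1, 2, 2, 3, 3]
rsm = neural_rsm(patterns, runs)
```

      Masking with NaN rather than deleting cells keeps the matrix square, so row indices still map onto trials.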

      (24) Page 20: It is not clear why the mean estimate of "representational strength" (i.e., model-brain RSM correlations) is important at all. This comes back to Major point #2, namely that you are trying to solve a very different problem from model-comparative RSA.

      We have clarified that our approach is not an alternative to model-comparative RSA, and that depending on the task constraints researchers may choose to compare models with tRSA or other approaches requiring stimulus repetition (see 3).

      (25) Page 21: I believe the problems of simulating correlation matrices directly in the way that the authors in their first simulation did should be well known and should be moved to an appendix at best. Better yet, the authors could start with the correct simulation right away.

      We agree the paper is more concise with these simulations being moved to the appendix and more briefly discussed. We have implemented these changes (Appendix 1). However, we are not certain that this problem is unknown, and have several anecdotes of researchers inquiring about this “alternative” approach in talks with colleagues, thus we do still discuss the issues with this method.

      (26) Page 26: Is the "underlying continuous noise variable 𝜎𝑡𝑟𝑖𝑎𝑙 that was measured by 𝑣𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑 " the variance of the noise pattern or the noise pattern itself? What does it mean it was "measured" - how?

      𝜎𝑡𝑟𝑖𝑎𝑙 is a vector of standard deviations for different trials, and 𝜎𝑡𝑟𝑖𝑎𝑙 i would be used to generate the noise patterns for trial i. v_measured is a hypothetical measurement of trial-level variability, such as “memorability” or “heartbeat variability”. We have revised our description to clarify our methods.

      Reviewer #2 (Recommendations for the authors):

      (8) It would be helpful to provide more clarity earlier on in the manuscript on what is a 'trial': in my experience, a row or column of the RDM is usually referred to as 'stimulus condition', which is typically estimated on multiple trials (instances or repeats) of that stimulus condition (or exemplars from that stimulus class) being presented to the subject. Here, a 'trial' is both one measurement (i.e., single, individual presentation of a stimulus) and also an entry in the RDM, but is this the most typical scenario for cRSA? There is a section in the Discussion that discusses repetitions, but I would welcome more clarity on this from the get-go.

      We have added discussion of stimulus repetition methods and datasets to the Introduction and clarified our use of the terms.

      Page 8. “Critically, in single-presentation designs, a “trial” refers to one stimulus presentation, and corresponds to a row or column in the RSM. In studies with repeated stimuli, these rows are often called “conditions” and may reflect aggregated patterns across trials. tRSA is compatible with both cases: whether rows represent individual trials or averaged trials that create “conditions”, tRSA estimates are computed at the row level”.
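
      The idea of a row-level estimate can be sketched as below. This is only an illustration under stated assumptions: it uses Pearson correlation between matching rows of the neural and model RSMs and skips the diagonal and any missing cells; the manuscript's exact estimator may differ, and the function name is hypothetical.

```python
import numpy as np

def row_level_estimates(neural_rsm, model_rsm):
    """Correlate each row of the neural RSM with the same row of the model RSM.

    The diagonal and any NaN cells (e.g. excluded comparisons) are skipped.
    Returns one estimate per row, i.e. per trial or condition.
    """
    n = neural_rsm.shape[0]
    estimates = np.full(n, np.nan)
    for i in range(n):
        keep = np.arange(n) != i                                 # drop self-similarity
        keep &= ~np.isnan(neural_rsm[i]) & ~np.isnan(model_rsm[i])
        if keep.sum() > 2:                                       # need a few cells to correlate
            estimates[i] = np.corrcoef(neural_rsm[i, keep], model_rsm[i, keep])[0, 1]
    return estimates

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5))        # toy stimulus features
model = np.corrcoef(features)                 # toy model RSM
est = row_level_estimates(model, model)       # sanity check: identical RSMs
```

      The resulting vector has one value per row, which is what would then be passed to a mixed-effects model with subject and stimulus terms.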

      (9) The quality of the results figures can be improved. For example, axes labels are hard to read in Figure 3A/B, panels 3C/D are hard to read in general. In Figure 7E, it's not possible to identify the 'dark red' brain regions in addition to the light red ones.

      We thank the reviewer for raising these and have edited the figures to be more readable in the manner suggested.

      (10) I would be interested to see a comparison between tRSA and cRSA in other fMRI (or other modality) datasets that have been extensively reported in the literature. These could be the original Kriegeskorte 96 stimulus monkey/fMRI datasets, commonly used open datasets in visual perception (e.g., THINGS, NSD), or the above-mentioned King et al. dataset, which has been analyzed in various papers.

      We recognize the great utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (11) On P39, the authors suggest 'researchers can confidently replace their existing cRSA analysis with tRSA': Please discuss/comment on how researchers should navigate the choice of modeling parameters in tRSA's linear mixed effects setting.

      We have added discussion of the mixed-effects parameters and encourage researchers to follow best practices for their model selection.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (12) The final part of the Results section, demonstrating the tRSA results for the continuous memorability factor in the real fMRI data, could benefit from some substantiation/elaboration. It wasn't clear to me, for example, to what extent the observed significant association between representational strength and item memorability in this dataset is to be 'believed' (see the Discussion section, p38). Was there any evidence in the original paper for this association? Or do we just assume this is likely true in the brain, based on prior literature by e.g. Bainbridge et al (who probably did not use tRSA but rather classic methods)?

      Indeed, memorability effects have been replicated in the literature, but not using the tRSA method. We have expanded our discussion to clarify the relationship of our findings and the relevant literature and methods it has employed.

      Page 38. “Critically, memorability is a robust stimulus property that is consistent across participants and paradigms (Bainbridge, 2022). Moreover, object memorability effects have been replicated using a variety of methods aside from tRSA, including univariate analyses and representational analyses of neural activity patterns where trial-level neural activity pattern estimates are correlated directly with object memorability (Slayton et al, 2025).”

      (13) The abstract could benefit from more nuance; I'm not sure if RSA can indeed be said to be 'the principal method', and whether it's about assessing 'quality' of representations (more commonly, the term 'geometry' or 'structure' is used).

      We have edited the abstract to reflect the true nuance in the current approaches.

      Abstract. Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, a prominent method of studying neural representations is representational similarity analysis (RSA). While there are several recent advances in RSA, the classic RSA (cRSA) approach examines the structure of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): usually one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data.

      (14) RSA is also not necessarily about models vs. neural data; it can also be between two neural systems (e.g., monkey vs. human as in Kriegeskorte et al., 2008) or model systems (see Sucholutsky et al., 2023). This statement is also repeated in the Introduction paragraph 1 (later on, it is correctly stated that comparing brain vs. model is most likely the 'most common' approach).

      We have added these examples in our introduction to RSA.

      Page 3. “One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013), or, in some cases, compares the representational geometry of two neural systems (e.g., Kriegeskorte et al., 2008) or two model systems (Sucholutsky et al., 2023)”.

      (15) 'theoretically appropriate' is an ambiguous statement, appropriate for what theory?

      We apologize for the ambiguous wording, and have corrected the text:

      Page 11. “Critically, tRSA estimates were submitted to a mixed-effects model which is statistically appropriate for modeling the hierarchical structure of the data, where observations are nested within both subjects and stimuli (Baayen et al., 2008; Chen et al., 2021)”.

      (16) I found the statement that cRSA "cannot model representation at the level of individual trials" confusing, as it made me think, what prohibits one from creating an RDM based on single-trial responses? Later on, I understood that what the authors are trying to say here (I think) is that cRSA cannot weigh the contributions of individual rows/columns to the overall representational strength differently.

      We thank the reviewer for their clarifying language and have added it to this section of the manuscript.

      “Abstract. However, because cRSA cannot weigh the contributions of individual trials (RSM rows/columns), it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation”.

      (17) Why use "RSM" instead of "RDM"? If the pairwise comparison metric is distance-based (e.g., 1-correlation as described by the authors), RDM is more appropriate.

      We apologize for the error, and have clarified the Methods text:

      Page 3-4. First, brain activity responses to a series of N trials are compared against each other (typically using Pearson’s r) to form an N×N representational similarity matrix.

      (18) Figure 2: please write 'Correlation estimate' in the y-axis label rather than 'Estimate'.

      We have edited the label in Figure 2.

      (19) Page 6 'leaving uncertain the directionality of any findings' - I do not follow this argument. Obviously one can generate an RDM or RSM from vector v or vector -v. How does that invalidate drawing conclusions where one e.g., partials out the (dis)similarity in e.g., pleasantness ratings out of another RDM/RSM of interest?

      We agree such an approach does not invalidate the partial method; we have clarified what we mean by “directionality”.

      Page 8. “For instance, even though a univariate random variable, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable, leaving uncertain the directionality (i.e., whether representation is strongest for pleasant or unpleasant items) of any findings with the RSM (see also Bainbridge & Rissman, 2018)”.
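
      The symmetry at issue can be checked numerically. The sketch below assumes the RSM is built from negated pairwise absolute differences, which is one common choice and not necessarily the metric used in the manuscript; the ratings are made-up values.

```python
import numpy as np

def rsm_from_univariate(v):
    """Similarity matrix from a single variable: negated pairwise absolute difference."""
    v = np.asarray(v, dtype=float)
    return -np.abs(v[:, None] - v[None, :])

ratings = np.array([1.0, 2.5, 4.0, 3.0])     # hypothetical pleasantness ratings
# The sign-flipped variable yields exactly the same matrix.
same = np.array_equal(rsm_from_univariate(ratings), rsm_from_univariate(-ratings))
```

      Because the two matrices are identical, a correlation with this RSM cannot by itself indicate whether an effect is carried by high or low values of the variable.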

      (20) P7 'sampled 19900 pairs of values from a bi-variate normal distribution', but the rows/columns in an RDM are not independent samples - shouldn't this be included in the simulation? I.e., shouldn't you simulate first the n=200 vectors, and then draw samples from those, as in the next analysis?

      This section has been moved to Appendix 1 (see responses to Reviewer 1.13).

      (21) Under data acquisition, please state explicitly that the paper is re-using data from prior experiments, rather than collecting data anew for validating tRSA.

      We have clarified this in the data acquisition section.

      Page 13. “A pre-existing dataset was analyzed to evaluate tRSA. Main study findings have been reported elsewhere (S. Huang, Bogdan, et al., 2024)”.

      (22) Figure 4 could benefit from some more explanation in-text. It wasn't clear to me, for example, how to interpret the asterisks depicted in the right part of the figure.

      We clarified the meaning of the asterisks in the main text in addition to the existent text in the figure caption.

      Page 26. “see Figure 4, off-diagonal cells in blue; asterisks indicate where tRSA was statistically more sensitive than cRSA)”.

      (23) Page 38 "the outcome of tRSA's improved characterization can be seen in multiple empirical outcomes:" it seems there is one mention of 'outcomes' too many here.

      We have revised this sentence.

      Page 41. “tRSA's improved characterization can be seen in multiple empirical outcomes”.

      (24) Page 38 "model fits became the strongest" it's not clear what aspect of the reported results in the paragraph before this is referring to - the Appendix?

      Yes, the model fits are in the Appendix; we have added this in-text citation.

      Moreover, model-fits became the strongest when the models also incorporated trial-level variables such as fMRI run and reaction time (Appendix 3, Table 6).

      References

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    2. Reviewer #2 (Public review):

      This paper proposes two changes to classic RSA, a popular method to probe neural representation in neuroimaging experiments: computing RSA at row/column level of RDM, and using linear mixed modeling to compute second level statistics, using the individual row/columns to estimate a random effect of stimulus. The benefit of the new method is demonstrated using simulations and a re-analysis of a prior fMRI dataset on object perception and memory encoding.

      The authors' claim that tRSA is a promising approach to perform more complete modeling of cogneuro data, and to conceptualize representation at the single trial/event level (cf Discussion section on P42), is appealing.

      In their revised manuscript, the authors have addressed some previous concerns, now referencing more literature aiming to improve RSA and its associated statistical inferences, and providing more guidance on methodological considerations in the Discussion. However, I wish the authors had more extensively edited the Introduction to better contextualize the work and clarify the specific settings in which they see the method as being beneficial over classic RSA. For example, some of the limitations of cRSA mentioned on page 6, e.g. related to presenting the same stimuli to multiple subjects, seem to be quite specific to settings where the researcher expects differential responses across subjects to fundamentally alter the interpretation, rather than something that will just average out by repeatedly offering the same stimulus, or combining data across subjects. It's not clear to me how the switch from 'matrix-level' to 'row-level' analysis in tRSA necessarily addresses this problem. It would be very helpful if the authors would more explicitly outline what problem the row-level aspect of tRSA is solving; what problem statistical inference via LMM is solving; and walk the reader through a very specific use case (perhaps a toy version of the real-data experiment which is now at the end of the paper). Explaining the utility of tRSA for experimental settings in which assessing representational strength for single events is crucial would clarify the contribution of this new method better.

      A few weaknesses mentioned in my previous review were not adequately addressed. To demonstrate the utility of the method on real neural recordings, only a single dataset is used with a quite complicated experimental design; it's not clear if there is any benefit of using tRSA on a simpler real dataset. Moreover, the cells of an RDM/RSM reflect pairwise comparisons between response patterns. Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. While the authors show examples that failure to meet independence assumptions does not affect results in their specific dataset, it does not get acknowledged as a problem at a more fundamental level. Finally, while the paper now states that 'simulations and example tRSA code' are publicly available, the link points to the lab's general GitHub page containing many lab repositories, in which I could not identify a specific repository related to this paper. This is disappointing given that the main goal of this manuscript is to provide a new method that they encourage others to use; a clear pointer to available code is only a minimal requirement to achieve that goal. A dedicated repository, including documentation, READMEs and tutorials/demos to run simulations, compare methods, etc. would greatly enhance the paper's contribution.

    3. eLife Assessment

      This study proposes a potentially useful improvement on a popular fMRI method for quantifying representational similarity in brain measurements by focusing on representational strength at the single trial level and adding linear mixed effects modeling for group-level inference. The manuscript provides solid evidence of increased sensitivity with no loss of precision compared to more classic versions of the method. However, several assumptions are insufficiently motivated, and it is unclear to what extent the approach would generalize to other paradigms.

    1. eLife Assessment

      This is an important study that provides compelling data from a diverse set of approaches from single cell transcriptome data and network analysis from genetically diverse mouse cells to identify novel driver genes underlying human GWAS associations. The authors present evidence that network analysis of scRNA-seq data from genetically diverse mouse bone-marrow derived stromal cells can be informative for identifying human BMD GWAS driver genes. Their approach should be broadly used and applicable to other GWAS studies.

    2. Reviewer #1 (Public review):

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease.

      Specifically, they utilize a large single cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses.

      The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilized a much larger scRNA-seq dataset from 80 DO BMSC-OBs, inferred co-expression based on Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that co-localized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below.

      Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking.

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3).

      Potential drawbacks of the authors' approach include their focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but reason that eQTLs can be shared across many tissues - this assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type specific eQTLs. Another issue concerns potential model overfitting in the iterativeWGCNA analysis of mesenchymal cell type-specific co-expression, which identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per cell read depth (400-6200 reads/cell) and dropouts, it's surprising that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting is responsible for these results.

      Overall, though, these concerns are minor relative to the many strengths of the study design and results. Indeed, I expect the analytical framework employed by the authors here will be valuable to -- and replicated by -- researchers in other disease areas.

      Comments on revisions:

      Thank you for addressing my concerns. This is an impressive study and manuscript that you should be proud of.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Farber and colleagues have performed single cell RNAseq analysis on bone marrow-derived stem cells from DO mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach.

      Strengths:

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting.

      Weaknesses:

      The manuscript is, in parts, hard to read due to the use of acronyms and there are some questions about data analysis that still need to be addressed.

      Comments on revisions:

      Dillard et al have made several improvements to their manuscript.

      (1) We previously asked the authors to determine whether any cell types were enriched for BMD-related traits since the premise of the paper is that 'many genes impacting BMD do so by influencing osteogenic differentiation or ... adipogenic differentiation'. Given the potential for the cell culture method to skew the cell type distribution non-physiologically, it is important to establish which cell types in their assay are most closely associated with BMD traits. The new CELLECT analysis and Figure 1E address this point nicely. However, it would still be nice to see the correlations between these cell types and BMD traits in the mice as this would provide independent evidence to support their physiological importance more broadly.

      (2) Shortening the introduction.

      (3) Addressing limitations that arise from not accounting for founder genome SNPs when aligning scRNA-seq data.

      (4) The main take-away of this paper is, to us, the development of a single cell approach to studying BMD-related traits. It is encouraging that the cells post-culture appear to be representative of those pre-culture (supplemental figure 3).

      However, the authors seem to have neglected several comments made by both reviewers. While we share the authors' enthusiasm for the single cell analytical approach, we do not understand their reluctance to perform further statistical tests. We feel that the following comments have still not been addressed:

      (1) The manuscript still contains the following:

      "To provide further support that tradeSeq-identified genes are involved in differentiation, we performed a cell type-specific expression quantitative trait locus (eQTL) analysis for each mesenchymal cell type from the 80 DO mice. We identified 563 genes (eGenes) regulated by a significant cis-eQTL in specific cell types of the BMSC-OB scRNA-seq data (Supplementary Table S14). In total, 73 eGenes were also tradeSeq-identified genes in one or more cell type boundaries along their respective trajectories (Supplementary Table S9)."

      The purpose of this paragraph is to convince readers that the eGenes approach aligns with the tradeSeq approach (and that their approach can therefore be trusted). It is essential that such claims are supported by statistical reasoning. Given that it would be very simple to perform permutation/enrichment analyses to address this point, and both reviewers requested similar analyses, we do not understand the authors' reluctance here. Otherwise, this section should be rewritten so that it does not imply that the identification of these genes provides support for their approach.

      (2) Given that a central purpose of this manuscript is to establish a systematic workflow for identifying candidate genes, the manuscript could still benefit from more explanation as to why the authors chose to highlight Tpx2 and Fgfrl1. Tpx2 already has a reported role in bone physiology according to the IMPC. The authors should comment on why they did not explore Kremen1, for instance, as this gene seems important for the transition to both OB1 and OB2.

      A final minor comment is that it would be very helpful if the authors could indicate if the DDGs in Table 1 are also eGenes for the relevant cell type. This is much more meaningful than looking through GTEx.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease. Specifically, they utilize a large single-cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses. The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilize a much larger scRNA-seq dataset from 80 DO BMSC-OBs, infer co-expression-based and Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that colocalized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single-cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. 
      Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below. Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking.

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3). One weakness involves the focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but they reason that eQTLs can be shared across many tissues. This assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type-specific eQTLs. Furthermore, the mesenchymal cell type-specific co-expression analysis by iterative WGCNA identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per-cell read depth (400-6200 reads/cell) and dropouts, it's hard to believe that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting here and would expect that many/most of these identified modules have very few gene members, but the methods list a minimum module size of 20 genes. How do the numbers of modules identified in this study compare to other published scRNA-seq studies that use iterative WGCNA?

      In the section "Identification of differentiation driver genes (DDGs)", the authors identified 408 significant DDGs and found that 49 (12%) were reported by the International Mouse Knockout [sic] Consortium (IMPC) as having a significant effect on whole-body BMD when knocked out in mice. Is this enrichment significant? E.g., what is the background percentage of IMPC gene knockouts that show an effect on whole-body BMD? Similarly, they found that 21 of the 408 DDGs were genes that have BMD GWAS associations that colocalize with GTEx eQTLs/sQTLs. Given that there are > 1,000 BMD GWAS associations, is this enrichment (21/408) significant? Recommend performing a hypergeometric test to provide statistical context to the reported overlaps here. 
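
      As an illustration of the hypergeometric test recommended above, the following is a minimal sketch rather than the authors' analysis. The counts 408 and 49 come from the manuscript as quoted; the background values (`n_tested`, `n_bmd_hits`) are hypothetical placeholders, because the number of IMPC knockouts tested and the number with a significant BMD effect are not given here.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail p-value P(X >= observed) for a hypergeometric draw.

    population: size of the background gene set
    successes:  background genes carrying the annotation of interest
    draws:      size of the gene list being tested (e.g. the DDGs)
    observed:   genes in the list that carry the annotation
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # From the quoted text: 408 DDGs, 49 with a significant IMPC BMD effect.
    n_ddgs, n_overlap = 408, 49
    # HYPOTHETICAL background values, for illustration only.
    n_tested, n_bmd_hits = 5000, 400
    p = hypergeom_enrichment_p(n_tested, n_bmd_hits, n_ddgs, n_overlap)
    print(f"one-sided enrichment p-value (placeholder background): {p:.3g}")
```

      The result depends entirely on the background chosen, so the background should be restricted to genes that could have entered both lists (for example, genes expressed in the BMSC-OB data and tested by the IMPC).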

      We thank the reviewer for their constructive feedback and thoughtful questions. With regard to iterativeWGCNA, a larger number of modules is sometimes an outcome of the analysis, as reported in the iterativeWGCNA preprint (Greenfest-Allen et al., 2017). While we did not make a comparison to other works leveraging this tool for scRNA-seq, it has been used broadly across other published studies, such as PMID: 39640571, 40075303, 33677398, 33653874. While model overfitting, as you mention, may be a cause for more modules, the Bayesian network analysis we perform after iterativeWGCNA highlights smaller aspects of coexpression modules, as opposed to focusing on the entirety of any given module.

      We did not perform enrichment or statistical tests as our goal was to simply highlight attributes or unique features of these genes for additional context.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Farber and colleagues have performed single-cell RNA-seq analysis on bone marrow-derived stem cells from DO mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach.

      Strengths: 

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting. 

      Weaknesses: 

      The manuscript is in parts hard to read due to the use of acronyms and there are some questions about data analysis that need to be addressed. 

      We thank the reviewer for their feedback and shared enthusiasm for our work. We tried to minimize the use of technical acronyms as much as we could without compromising readability. Additionally, we addressed questions regarding aspects of data analysis. 

      Reviewer #1 (Recommendations for the authors):

      (1) For increased transparency and to allow reproducibility, it would be necessary for the scripts used in the analysis to be shared along with the publication of the preprint. Also, where feasible, sharing the processed data in addition to the raw data would allow the community greater access to the results and be highly beneficial. 

      Thank you for this suggestion. The raw data will be available via GEO accession codes listed in the data availability statement. We will make available scripts for some analyses on our GitHub (https://github.com/Farber-Lab/DO80_project) and processed scRNA-seq data in a Seurat object (.rds) on Zenodo (https://zenodo.org/records/15299631).

      (2) Lines 55-76: I think the summary of previous work here is too long. I understand that they would like to cover what has been done previously, but this seems like overkill. 

      Good suggestion. We have streamlined some of the summary of our previous work.

      (3) Did the authors try to map QTL for cell-type proportion differences in their BMSC-OBs? While 80 samples certainly limit mapping power, the data shown in Figs 4C/D suggest that you might identify a large-effect modifier of LMP/OB1 proportions. 

      We did try to map QTL for cell type proportion differences, but no significant associations were identified. 

      (4) Methods question: Does the read alignment method used in your analysis account for SNPs/indels that segregate among the DO/CC founder strains? If not, the authors may wish to include this in their discussion of study limitations and speculate on how unmapped reads could affect expression results. 

      The read alignment method we used does not account for SNPs/indels from the DO founder strains that fall in RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424). 

      (5) Much of the discussion reads as an overview of the methods, while a discussion of the results and their context to the existing BMD literature is relatively lacking in comparison.

      We have added additional explanation of the results and context to the discussion (line 381-382, 396-407). 

      (6) Figure 1E and lines 146-149: Adjusted p values should be reported in the figure and accompanying text instead of switching between unadjusted and adjusted p values. 

      We updated Figure 1e to portray adjusted p-values, listed the adjusted p-values in the legend of Figure 1e, and listed them in the main text (lines 153-154).

      (7) Why do the authors bring the IMPC KO gene list into the analysis so late? This seems like a highly relevant data resource (moreso than the GTEx eQTLs/sQTLs) that could have been used much earlier to help identify DDGs. 

      Given that our scRNA-seq data is also from mice, we did choose to integrate information from the IMPC to highlight supplemental features of genes in networks (i.e., genes that have an experimentally-tested and significant effect on BMD in mice). However, our primary goal was to inform human GWAS and leverage our previous work in which we identified colocalizations between human BMD GWAS and eQTL/sQTL in a human GTEx tissue, which is why this information was used to guide our network analysis.

      (8) Do Fgfrl1 and/or Tpx2 have a cis-eQTL in your BMSC-OB scRNA-seq dataset?

      We did not identify cis-eQTL effects for Fgfrl1 and Tpx2.

      (9) Figure 4B-C: These eQTLs may be real, but based on the diplotype patterns in Figure 4C, I suspect they are artifacts of low mapping power that are driven by rare genotype classes with one or two samples having outlier expression results. For example, if you look at the results in Fig 4C for S100a1 expression, the genotype classes with the highest/lowest expression have lower sample numbers. In the case of Pkm eQTL showing a PWK-low effect, the PWK genome has many SNPs that differ from the reference genome in the 3' UTR of this gene, and I wonder if reads overlapping these SNPs are not aligning correctly (see point 4 above) and resulting (falsely) in lower expression values for samples with a PWK haplotype. 

      As mentioned above, our alignment method did not consider DO founder genetic variation that is specifically located in the 3’ end of RNA transcripts in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424).

      In future studies, we intend to include larger populations of mice to potentially overcome, as you mention, any artifacts that may be attributable to low statistical power, rare genotype classes, or outlier expression.

      Reviewer #2 (Recommendations for the authors):

      Major Points 

      (1) The authors hypothesize "that many genes impacting BMD do so by influencing osteogenic differentiation or possibly bone marrow adipogenic differentiation". However, cell type itself does not correlate with any bone trait. Does this indicate that the hypothesis is not entirely correct, as genes that drive these phenotypes would not be enriched in one particular cell type? The authors have previously identified "high-priority target genes". So, are there any cell types that are enriched for these target genes? If not, this would indicate that all these genes are more ubiquitously expressed and this is probably why they would have a greater effect on the overall bone traits. Furthermore, are the 73 eGenes (so genes with eQTLs in a particular cell type that change around cell type boundaries) or the DDGs (Table 1) enriched for these high-priority target genes? 

      The bone traits measured in the DO mice are complex and impacted by many factors, including the differentiation propensity and abundance of certain cell types, both within and outside of bone. Though we did not identify correlations between cell type abundance and the bone traits we measured, we tailored our investigations to focus on cellular differentiation using the scRNA-seq data. However, future studies would need to be performed to investigate any connections between cellular differentiation, cell type abundance, and bone traits.

      We did not perform enrichment analyses of either the target genes identified from our other work or eGenes identified here, but instead used the target gene list to center our network analysis and the eGenes to showcase the utility of the DO mouse population.

      (2) The readability of the paper could be improved by minimising the use of acronyms and there are several instances of confusing wording throughout the paper. In many cases, this can be solved by re-organising sentences and adding a bit more detail. For example, it was unclear how you arrived at Fgfrl1 or Tpx2.

      One of the goals of our study was to identify genes that have (to our knowledge) little to no known connection to BMD. We chose to highlight Fgfrl1 and Tpx2 because there is minimal literature characterizing these genes in the context of bone, which we note in the results (lines 296-297). Additionally, we prioritized these genes in our previous work, and they were identified in this study through our network analyses of the scRNA-seq data, which we mention in the results (lines 276-279).

      (3) Technical aspects of the assay. In Figure 1d you show that the cell populations vary considerably between different DO mice. It would be useful to give some sense of the technical variance of this assay given that the assay involves culturing the cells in an exogenous environment. This could take the form of tests between mice within the same inbred strain, or even between different legs of the same DO mice to show that results are technically very consistent. It might also be prudent to identify that this is a potential limitation of the approach as in vitro culturing has the potential to substantially change the cell populations that are present. 

      We agree that in vitro culturing, in addition to the preparation of single cells for scRNA-seq, are unavoidable sources of technical variation in this study. However, the total number of cells contributed by each of the 80 DO mice after data processing does not appear to be skewed and the distribution appears normal (see added figures, now included as Supplemental Figure 3). Therefore, technical variation is at least consistent across all samples. Nevertheless, we have mentioned the potential for technical variation artifacts in our study in the discussion (line 414-416).

      (4) Need for permutation testing. "We identified 563 genes regulated by a significant eQTL in specific cell types. In total, 73 genes with eQTLs were also tradeSeq-identified genes in one or more cell type boundaries". These types of statements are fine but they need to be backed up with permutation testing to show that this level of enrichment is greater than one would expect by chance. 
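
      A permutation test of the kind requested here could be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the gene universe and the membership of each gene set are hypothetical toy inputs, and only the idea of comparing the observed overlap (73 of 563 eGenes in the quoted text) against overlaps from random gene sets of the same size is taken from the comment above.

```python
import random


def permutation_overlap_p(universe, set_a, set_b, n_perm=10000, seed=0):
    """Empirical one-sided p-value for the overlap between two gene sets.

    Random sets of the same size as set_b are drawn from the universe,
    and their overlap with set_a is compared with the observed overlap.
    Returns (observed_overlap, p_value).
    """
    rng = random.Random(seed)
    universe = sorted(universe)  # fixed order keeps sampling reproducible
    set_a, set_b = set(set_a), set(set_b)
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(set_b))
        if sum(gene in set_a for gene in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # HYPOTHETICAL toy data: 2,000 expressed genes, two partly shared lists.
    genes = [f"gene{i}" for i in range(2000)]
    trajectory_genes = genes[:300]   # stand-in for tradeSeq-identified genes
    egenes = genes[250:400]          # stand-in for cell type-specific eGenes
    overlap, p = permutation_overlap_p(genes, trajectory_genes, egenes, n_perm=2000)
    print(f"observed overlap = {overlap}, empirical p = {p:.4f}")
```

      In a real analysis the universe would need to be limited to genes eligible for both analyses (for example, genes expressed in the relevant cell type), since a broader universe inflates the apparent enrichment.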

      We did not perform enrichment tests, as our only goals were to (1) determine whether eQTL could be resolved in the DO mouse population using our scRNA-seq data and (2) predict in which cell type the eQTL and associated eGene may have an effect.

      (5) The main novelty of the paper seems to be that you have used single-cell RNA seq (given that you appear to have already detailed the candidates at the end). I don't think this makes the paper less interesting, but I think you need to reframe the paper more about the approach, and not the specific results. How you landed on these candidates is also not clear. So the paper might be improved by more robustly establishing the workflow and providing guidelines for how studies like this should be conducted in the future. 

      We sought to not only devise a rigorous approach to analyze our single cell data, but also showcase the utility of the approach in practice by highlighting targets for future research (i.e., Fgfrl1 and Tpx2).

      Our goal was to identify novel genes and we landed on these candidate genes (Fgfrl1 and Tpx2) because they had substantial data supporting their causality and they have yet to be fully characterized in the context of bone and BMD (line 295-297).

      With regard to establishing the workflow, we have included the rationale for specific aspects of our approach throughout the paper. For example, Figure 2 itemizes each step of our network analysis, and we explain why each step is utilized throughout various parts of the results (e.g., lines 168-170, 179-181, 191-193, 202-203, 257-260, 276-277).

      We have added a statement advocating for large-scale scRNA-seq from genetically diverse samples and network analyses for future studies (line 436-438).

      Minor Points 

      (1) In the summary you use the word "trajectory". Trajectories for what? I assume the transition between cell types, but this is not clear. 

      We added text to clarify the use of trajectory in the summary (line 34).

      (2) This sentence: "By identifying networks enriched for genes implicated in GWAS we predicted putatively causal genes for hundreds of BMD associations based on their membership in enriched modules." is also not clear. Do you mean: "we predicted putatively causal genes by identifying clusters of co-expressed genes that were enriched for GWAS genes?" It is not clear how you identify the causal gene in the network. Is this just based on the hub gene?

      The aforementioned sentence has since been removed to streamline the introduction, as suggested by Reviewer 1.

      With regard to causal gene identification, it is not based on whether a gene is a hub gene. We prioritized a DDG (and its associated network) if it was a causal gene that we identified in our previous work as having an eQTL/sQTL in a GTEx tissue that colocalizes with human BMD GWAS.

      (3) Figure 3C. This is good but the labels are quite small. Would be good to make all the font sizes larger. 

      We have enlarged Figure 3C.

      (4) Line 341 in the Discussion should be "pseudotemporal". 

      We have edited “temporal” to “pseudotemporal”.

    1. eLife Assessment

      This study presents a valuable finding on the neural representation of time from two distinct egocentric and allocentric reference frames. The evidence is solid and largely supports the hypothesis, with one caveat that the task differences could impact the observed effects. The work will be of interest to cognitive neuroscientists working on the perception and memory of time.

    2. Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible temporal construals. For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and participants were instructed to consider the event from "an internal" or from "an external" perspective. The authors found distinct patterns of brain activity in the posterior parietal cortex (PPC) and anterior hippocampus for the internal and the external viewpoint. Specifically, activation in the posterior parietal cortex positively correlated with distance during the external-perspective task, but negatively during the internal-perspective task. The anterior hippocampus positively correlated with distance in both perspectives. The authors conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are supported by the parietal cortex.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and the work tackles them from the perspective of construals theory.

      Weaknesses:

      Although the work uses two distinct psychological tasks, the authors do not elaborate on the cognitive operationalization the tasks entail, nor the implication of the task design for the observed neural activation.

    3. Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
      On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      Suggestions:

      The authors have done a commendable job addressing my previous comments. In particular, the additional analyses elucidating the potential contribution of boundary effects to the behavioural data, the impact of incorporating RT into the fMRI GLMs, and the differential contributions of RT and sequential distance to neural activity (i.e., in PPC) are valuable and strengthen the authors' interpretation of their findings.

      My one remaining suggestion pertains to the potential contribution of boundary effects. While the new analyses suggest that the RT findings are driven by sequential distance and duration independent of a boundary effect (i.e., Same vs. Different factor), I'm wondering whether the same applies to the neural findings? In other words, have the authors run a GLM in which the Same vs. Different factor is incorporated alongside distance and duration?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible "temporal construals". For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and they were instructed to consider the event from "an internal" or from "an external" perspective. The authors found opposite patterns of brain activity in the posterior parietal cortex and the anterior hippocampus for the internal and the external viewpoint. They conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are used in the parietal cortex. The claims align with previous fMRI work addressing this question.

      We appreciate the reviewer's concise summary of our research. We would like to offer two clarifications to prevent any potential misunderstandings.

      First, the activity patterns in the parietal cortex and hippocampus are not entirely opposite across internal and external perspectives. Specifically, the activation level in the posterior parietal cortex shows a positive correlation with sequential distance during external-perspective tasks, but a negative correlation during internal-perspective tasks. In contrast, the activation level in the anterior hippocampus positively correlates with sequential distance, irrespective of the observer's perspective. Therefore, our results suggest that the parietal cortex, with its perspective-dependent activity, supports egocentric representation; the hippocampus, with its consistent activity across perspectives, supports allocentric representation.

      Second, while some of our findings align with previous fMRI studies, to our knowledge, no prior research has explicitly investigated how the neural representation of time may vary depending on the observer's viewpoint. This gap in the literature is the primary motivation for our current study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and this work seems to want to tackle some of them.

      We appreciate the reviewer's acknowledgment of the theoretical significance of our study.

      Weaknesses:

      The current writing is fuzzy both conceptually and experimentally. I cannot provide a sufficiently well-informed assessment of the quality of the experimental work because there is a paucity of details provided in the report. Any future revisions will likely improve transparency.

      (1) Improving writing and presentation:

      The abstract and the introduction make use of loaded terms such as "construals", "mental timeline", "panoramic views" in very metaphoric and unexplained ways. The authors do not provide a comprehensive and scholarly overview of these terms, which results in verbiage and keywords/name-dropping without a clear general framework being presented. Some of these terms are not metaphors. They do refer to computational concepts that the authors should didactically explain to their readership. This is all the more important because some statements in the Introduction are misattributed or factually incorrect; some statements lack attributions (uncited published work). Once the theory, the question, and the working hypothesis are clarified, the authors should carefully explain the task.

      We appreciate the reviewer's critique.

      The formulation of the scientific question in the introduction is grounded in the spatial construals of time hypothesis and conceptual metaphor theory (e.g., Traugott, 1978; Lakoff & Johnson, 1980; see recent reviews by Núñez & Cooperrider, 2013; Bender & Beller, 2014). These frameworks were originally developed through analyses of how spatial metaphors are used to describe temporal concepts in natural language. Consequently, it is theoretically motivated and largely unavoidable to introduce the two primary temporal construals—mental time travel and mental time watching— using metaphorical expressions.

      However, we do agree with the reviewer that the introduction in the original manuscript was overly long and that the working hypothesis was not clearly stated. In the revised manuscript, we have streamlined the introduction and substantially revised the following two paragraphs to clarify the formulation of our working hypothesis (Pages 5-6):

      “Recent studies have already begun to investigate the neural representation of the memorized event sequence (e.g., Deuker et al., 2016; Thavabalasingam et al., 2018; Bellmund et al., 2019, 2022; see reviews by Cohn-Sheehy & Ranganath, 2017; Bellmund et al., 2020). Yet, the neural mechanisms that enable the brain to construct distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which differentiates the neural representation in allocentric and egocentric reference frames. According to an influential neurocomputational model (Byrne et al., 2007; Bicanski & Burgess, 2018; Bicanski & Burgess, 2020), allocentric and egocentric spatial representations are dissociable in the brain: they are respectively implemented in the medial temporal lobe (MTL), including the hippocampus, and the parietal cortex. Various egocentric representations in the parietal cortex derived from different viewpoints can be transformed and integrated into a unified allocentric representation and stored in the MTL (i.e., bottom-up process). Conversely, the allocentric representation in the MTL can serve as a template for reconstructing diverse egocentric representations across different viewpoints in the parietal cortex (i.e., top-down process).”

      “In line with the spatial construals of time hypothesis, several authors have recently suggested that such mutually engaged egocentric and allocentric reference frames (in the parietal cortex and the medial temporal lobe, respectively) proposed in the spatial domain might also apply to the temporal one (e.g., Gauthier & van Wassenhove, 2016ab; Gauthier et al., 2019, 2020; Bottini & Doeller, 2020). If this hypothesis holds, it could explain how the brain flexibly generates diverse construals of the same event sequence. Specifically, the hippocampus may encode a consistent representation of an event sequence that is independent of whether an individual adopts an internal or external perspective, reflecting an allocentric representation of time. In contrast, parietal cortical representations are expected to vary flexibly with the adopted perspective that is shaped by task demands, reflecting an egocentric representation of time.”

      In the revised manuscript, we also corrected statements in the Introduction that may have been misattributed (see Reviewer 2, comment 4(ii)) and added several relevant and important publications.

      (2) The experimental approach lacks sufficient details to be comprehensible to a general audience. In my opinion, the results are thus currently uninterpretable. I highlight only a couple of specific points (out of many). I recommend revision and clarification.

      (a) No explanation of the narrative is being provided. The authors report a distribution of durations with no clear description of the actual sequence of events. The authors should provide the text that was used, how they controlled for low-level and high-level linguistic confounds.

      We thank the reviewer for the suggestions. The event sequence for the odd-numbered participants is shown in the original Figure 1. In the revised manuscript, we added Figure 1—figure supplement 1 to illustrate the actual sequence of events for both odd- and even-numbered participants. We also added the narratives used in the reading phase of the learning procedures for both groups of participants (Figure 1—source data 1).

      To control for low-level linguistic confounds, we included the number of syllables as a covariate in the first-level general linear model in the fMRI analysis. To address high-level linguistic confounds, such as semantic information (which is difficult to quantify), we randomly assigned event labels to the 15 events twice, creating two counterbalanced versions for participants with even and odd numbers (see Comment 2b below).

      (b) The authors state, "we randomly assigned 15 phrases to the events twice". It is impossible to comprehend what this means. Were these considered stimuli? Controls? It is also not clear which event or stimulus is part of the "learning set" and whether these were indicated to be such to participants.

      We apologize for any confusion in the Results section and the legend of Figure 1. Our motivation was explained in the "Stimuli" section of the Methods. In the revised manuscript, we have clarified this by adding an explanation to the legend of Figure 1 and including a supplementary figure: "To minimize potential confounds between the semantic content of the event phrases and the temporal structure of the events, we randomly assigned the phrases to the events, creating two versions for participants with even and odd ID numbers. Both versions can be seen in Figure 1—figure supplement 1 and Figure 1—source data 1."

      (c) The left/right counterbalancing is not being clearly explained. The authors state that there is counterbalancing, but do not sufficiently explain what it means concretely in the experiment. If a weak correlation exists between sequential position and distance, it also means that the position and the distance have not been equated within. How do the authors control for these?

      We thank the reviewer for highlighting this point and apologize for the lack of clarity in the original manuscript. In the current version (Page 40), we have provided further clarification: “We carefully selected two sets of 20 event pairs from the 210 possible combinations, assigning them to the odd and even runs of the fMRI experiment. Using a brute-force search, we identified 20 pairs in which sequential distance showed only weak correlations with positional information for both reference and target events (ranging from 1 to 15), as well as with behavioral responses (Same vs. Different or Future vs. Past, coded as 0 and 1), with all correlation coefficients below 0.2. At the same time, we balanced the proportion of correct responses across conditions: for the external-perspective task, Same/Different = 11/9 and 12/8; for the internal-perspective task, Future/Past = 12/8 and 8/12. Under these constraints, the sequential distances in both sets ranged from 1 to 5. To further mitigate spatial response biases, we pseudorandomized the left/right on-screen positions of the two response options within each task block, while ensuring an equal number of correct responses mapped to the left and right buttons (i.e., 10 per block).”

      The event pairs we selected already represent the best possible choice given all the criteria we aimed to satisfy. It is impossible to completely eliminate all potential correlations. For instance, if the target event occurs near the beginning of the day, it will tend to fall in the past, whereas if it occurs near the end of the day, it is more likely to fall in the future. To further ensure that the significant results were not driven by these weak confounding factors, we constructed another GLM that included three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.
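      The selection constraint described above (20 event pairs whose sequential distance correlates only weakly with position and response variables) can be illustrated with a small search. This is a minimal sketch under stated assumptions, not the authors' actual procedure: it uses seeded random sampling rather than an exhaustive brute-force search, it checks only three of the criteria (reference position, target position, and Future vs. Past), and every function name is hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5

def confound_correlations(pairs):
    """Correlate sequential distance with three potential confounds."""
    dist = [abs(t - r) for r, t in pairs]
    ref = [r for r, _ in pairs]
    tgt = [t for _, t in pairs]
    future = [1 if t > r else 0 for r, t in pairs]  # Future = 1, Past = 0
    return {
        "reference_position": pearson(dist, ref),
        "target_position": pearson(dist, tgt),
        "future_vs_past": pearson(dist, future),
    }

def search_pairs(n_events=15, n_pairs=20, threshold=0.2,
                 max_tries=10000, seed=0):
    """Sample candidate sets until all |r| fall below the threshold."""
    rng = random.Random(seed)
    # All ordered (reference, target) pairs: 15 * 14 = 210 candidates.
    candidates = [(r, t) for r in range(1, n_events + 1)
                  for t in range(1, n_events + 1) if r != t]
    for _ in range(max_tries):
        pairs = rng.sample(candidates, n_pairs)
        corrs = confound_correlations(pairs)
        # NaN fails this comparison, so degenerate sets are rejected.
        if all(abs(c) < threshold for c in corrs.values()):
            return pairs, corrs
    return None

result = search_pairs()
```

      The additional balancing criteria mentioned in the response (the Same/Different and Future/Past proportions and the left/right button mapping) would enter as further acceptance conditions inside the loop.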

      (d) The authors used two tasks. In the "external perspective" one, the authors asked participants to report whether events were part of the same or a different part of the day. In the "internal perspective" one, the authors asked participants to project themselves to the reference event and to determine whether the target event occurred before or after the projected viewpoint. The first task is a same/different recognition task. The second task is a temporal order task (e.g., Arzy et al. 2009). These two tasks are radically different and do not require the same operationalization. The authors should minimally provide a comprehensive comparison of task requirements, their operationalization, and, more importantly, assess the behavioral biases inherent to each of these tasks that may confound brain activity observed with fMRI.

      We understand the reviewer’s concern. We agree that there is a substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Rather, the neural correlates of temporal distance were first identified as brain regions showing a significant correlation between neural activity and temporal distance using the parametric modulation analysis. We then compared these neural correlates between the two tasks. Therefore, any general differences between the tasks should not be a confound for our main results. Our aim was to examine whether the hippocampal representation of temporal distance remains consistent across different perspectives, and whether the parietal representation of temporal distance varies as a function of the perspective adopted.

      Therefore, the main aim of our task manipulation was to ensure that participants adopted either an external or an internal perspective on the event sequence, depending on the task condition. In the Introduction (Pages 6–7), we clarify this manipulation as follows: “In the external-perspective task, participants localized events with respect to external temporal boundaries, judging whether the target event occurred in the same or a different part of the day as the reference event. In the internal-perspective task, participants were instructed to mentally project themselves into the reference event and localize the target event relative to their own temporal point, judging whether the target event happened in the future or the past of the reference event (see Methods for details of the scanning procedure).”

      We believe this task manipulation was successful. Behaviorally, the two tasks showed opposite correlations between reaction time and temporal distance, resembling the symbolic distance versus mental scanning effect. Neurally, contrasting the internal- and external-perspective tasks revealed activation of the default mode network, which is known to play a central role in self-projection (Buckner et al., 2017).

      (e) The authors systematically report interpreted results, not factual data. For instance, while not showing the results on behavioral outcomes, the authors directly interpret them as symbolic distance effects.

      Thank you for this comment. In the original paper, we reported the relevant statistics before our interpretation: “Sequential Distance was correlated positively with RT in the external-perspective task (z = 3.80, p < 0.001) but negatively in the internal-perspective task (z = -3.71, p < 0.001).” However, these statistics may have been difficult to notice, so we now include a figure for the RT analysis in the revised manuscript.

      Crucially, the authors do not comment on the obvious differences in task difficulty in these two tasks, which demonstrates a substantial lack of control in the experimental design. The same/different task (task 1 called "external perspective") comes with known biases in psychophysics that are not present in the temporal order task (task 2 called "internal perspective"). The authors also did not discuss or try to match the performance level in these two tasks. Accordingly, the authors claim that participants had greater accuracy in the external (same/different) task than in the internal task, although no data are shown and provided to support this report. Further, the behavioral effect is trivialized by the report of a performance accuracy trade off that further illustrates that there is a difference in the task requirements, preventing accurate comparison of the two tasks.

      As noted in Question 2d, we acknowledge the substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Instead, we first identified the neural correlates of temporal distance as brain regions showing a significant correlation between neural activity and temporal distance, independent of task demands. We then compared these neural correlates across the two task conditions, which were designed to engage different temporal perspectives. Therefore, any general differences between the tasks should not be a confound for our main findings and interpretation.

      Our aim was to investigate whether the hippocampal representation of temporal distance remains consistent across different perspectives and whether the parietal representation of temporal distance varies as a function of the perspective adopted. We do not see how this double-dissociation pattern could be explained by differences in task difficulty.

      While we do not consider the overall difference in task difficulty between the two tasks to be a confounding factor, we acknowledge the potential confound posed by variations in task difficulty across temporal distances (1 to 5). This concern arises from the similarity between the activity patterns in the posterior parietal cortex and reaction time across temporal distances. To address this, we conducted control analyses to test this hypothesis (see the second and third points from Reviewer 2 for details).

      On page 8, we present the behavioral accuracy data: “Participants showed significantly higher accuracy in the external-perspective task than in the internal-perspective task (external-perspective task: M = 93.5%, SD = 4.7%; internal-perspective task: M = 89.5%, SD = 8.1%; paired t(31) = 3.33, p = 0.002).”

      All fMRI contrasts are also confounded by this experimental shortcoming, seeing as they are all reported at the interaction level across a task. For instance, in Figure 4, the authors report a significant beta difference between internal and external tasks. It is impossible to disentangle whether this effect is simply due to task difference or to an actual processing of the duration that differs across tasks, or to the nature of the representation (the most difficult to tackle, and the one chosen by the authors).

      We thank the reviewer for pointing out this important issue. Like temporal distance, the neural correlates of duration were not derived from a direct contrast between the two tasks. Instead, they were identified by detecting brain regions showing a significant correlation between neural activity and the implied duration of each event using the parametric modulation analysis. Therefore, what is shown in Figure 4 reflects the significant differences in these neural correlations with duration between the two tasks.

      The observed difference in the neural representation of duration between the two tasks was unexpected. In the original manuscript, we provided a post hoc explanation: “Since the external-perspective task in the current study encouraged the participants to compare the event sequence with the external parallel temporal landmarks, duration representation in the hippocampus may be dampened.”

      However, we agree that this difference might also arise from other factors distinguishing the two tasks. In the revised manuscript, we have clarified this possibility as follows: “The difference in duration representation between the two tasks remains open to interpretation. One possible explanation is that the hippocampus is preferentially involved in memory for durations embedded within event sequences (see review by Lee et al., 2020). In the internal-perspective task, participants indeed localized events within the event sequence itself. In contrast, the external-perspective task encouraged participants to compare the event sequence with external temporal landmarks, which may have attenuated the hippocampal representation of duration.”

      Conclusion:

      In conclusion, the current experimental work is confounded and lacks controls. Any behavioral or fMRI contrasts between the two proposed tasks can be parsimoniously accounted for by difficulty or attentional differences, not the claim of representational differences being argued for here.

      We hope that our explanations and clarifications above adequately address the reviewer’s concerns. We would like to reiterate that we did not directly compare the two tasks. Rather, we first identified the neural representations of sequential distance and duration, and then examined how these representations differed across tasks. It is unclear to us how the overall difference in task difficulty or attentional demands could lead to the observed pattern of results.

      By determining where the neural representations were consistent and where they diverged, we were able to differentiate brain regions that encode temporal information allocentrically from those that represent temporal information in a perspective-dependent manner, modulated by task demands.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task, but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out, and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses, as well as neuroimaging analyses, is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses/Suggestions:

      Although the design and analysis choices are generally solid, there are a few finer details/nuances that merit further clarification or consideration in order to strengthen the readers' confidence in the authors' interpretation of their data.

      (1) Given the known behavioural and neural effects of boundaries in sequence memory, I was wondering whether the number of traversed context boundaries (i.e., between morning-afternoon, and afternoon-evening) was controlled for across sequential length in the internal perspective condition? Or, was it the case that reference-target event pairs with higher sequential numbers were more likely to span across two parts of the day compared to lower sequential numbers? Similarly, did the authors examine any potential differences, whether behaviourally or neurally, for day part same vs. day part different external task trials?

      We thank the reviewer for the thoughtful comments. When we designed the experiment, we minimized the correlation between the sequential distance separating the target and reference events and whether the two events occurred within the same or different parts of the day (coded as Same = 0, Different = 1). The point-biserial correlation coefficient between these two variables, computed across all trials within the same run, was kept below 0.2.

      To investigate the effect of day-part boundaries on behavior, as well as the contribution of other factors, we conducted a new linear mixed-effects model analysis incorporating four additional variables. They are whether the target and the reference events are within the same or different parts of the day (i.e., Same vs. Different), whether the target event is in the future or the past of the reference event (i.e., Future vs. Past), and the interactions of the two factors with Task Type (i.e., internal- vs. external-perspective task).

      The results are largely the same as those in the original table: There was a significant main effect of Syllable Length, and the interaction effects between Task Type and Sequence Distance and between Task Type and Duration remain significant. What is new is that we also found a significant interaction effect between Task Type and Same vs. Different.

      As shown in Figure 2—figure supplement 1, this Same vs. Different effect was in line with the effect of Sequential Distance, with two events in the same and different parts of the day corresponding to the short and long sequential distances. Given that Sequential Distance had already been considered in the model, the effect of parts of the day should result from the boundary effect across day parts or the chunking effect within day parts, i.e., the sequential distance across different parts of the day was perceived as longer while the sequential distance within the same part of the day was perceived as shorter. We have incorporated these findings into the manuscript.

      Neurally, to further verify that the significant effects of sequential distance were not driven by its weak correlation with the Same/Different judgment or other potential confounding factors, we constructed another GLM that incorporated three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.

      (2) I would appreciate further insight into the authors' decision to model their task trials as stick functions with duration 0 in their GLMs, as opposed to boxcar functions with varying durations, given the potential benefits of the latter (e.g., Grinband et al., 2008). I concur that in certain paradigms, RT is considered a potential confound and is taken into account as a nuisance covariate (as the authors have done here). However, given that RTs appear to be critical to the authors' interpretation of participant behavioural performance, it would imply that variations in RT actually reflect variations in cognitive processes of interest, and hence, it may be worth modelling trials as boxcar functions with varying durations.

      We appreciate the reviewer’s insightful comment on this important issue. Whether to control for RT’s influence on fMRI activation is indeed a long-standing paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For example, regions within the multiple-demand network are often positively correlated with RT across different cognitive domains.

      Our strategy in the manuscript is to first present the results without including RT as a control variable and then examine whether the effects are preserved after controlling for RT. In the revised manuscript, we have clarified this approach (Page 13): “Here, changes in activity levels within the PPC were found to align with RT. Whether to control for RT’s influence on fMRI activation represents a well-known paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For instance, regions within the multiple-demand network are often positively correlated with RT and task difficulty across diverse cognitive domains (e.g., Fedorenko et al., 2013; Mumford et al., 2024). To evaluate the second possibility, we conducted an additional control analysis by including trial-by-trial RT as a parametric modulator in the first-level model (see Methods). Notably, the same PPC region remained the only area in the entire brain showing a significant interaction between Task Type and Sequential Distance (voxel-level p < 0.001, cluster-level FWE-corrected p < 0.05). This finding indicates that PPC activity cannot be fully attributed to RT. Furthermore, we do not interpret the effect as reflecting a domain-general RT influence, as regions within the multiple-demand system—typically sensitive to RT and task difficulty—did not exhibit significant activation in our data.”

      The reason we did not use boxcar functions with varying durations in our original manuscript is that we also applied parametric modulation in the same model. In the parametric modulation, all parametric modulators inherit the onsets and durations of the events being modulated. Consequently, the modulators would also take the form of boxcar functions rather than stick functions—the height of each boxcar reflecting the parameter value and its length reflecting the RT. We were uncertain whether this approach would be appropriate, as we have not encountered other studies implementing parametric modulation in this manner.
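      The point about modulators inheriting the duration of the modulated events can be illustrated numerically. The sketch below is a simplified stand-in, not the authors' first-level model: it builds one modulated regressor under a stick (duration 0) model and under a boxcar (duration = RT) model and convolves each with a double-gamma response shape. The HRF parameters, the two-trial design, and all function names are assumptions chosen for illustration, and mean-centering of the modulator is omitted.

```python
import math

def hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Double-gamma response shape (parameter values are assumptions)."""
    if t <= 0:
        return 0.0
    g1 = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    g2 = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return g1 - g2 / ratio

def neural_series(onsets, durations, heights, dt, n_samples):
    """Pre-convolution time series for one modulated regressor."""
    series = [0.0] * n_samples
    for onset, dur, h in zip(onsets, durations, heights):
        start = int(round(onset / dt))
        if dur == 0:
            series[start] += h / dt          # stick: impulse with area h
        else:
            stop = start + int(round(dur / dt))
            for i in range(start, min(stop, n_samples)):
                series[i] += h               # boxcar: height h, length dur
    return series

def convolve_hrf(series, dt, hrf_length=32.0):
    """Discrete convolution of the series with the HRF kernel."""
    kernel = [hrf(i * dt) for i in range(int(round(hrf_length / dt)))]
    out = [0.0] * len(series)
    for i, s in enumerate(series):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k * dt
    return out

# Two trials with the same modulator value but different RTs (1 s vs. 2 s).
dt, n_samples = 0.1, 1000                    # 100 s of simulated time
onsets, rts, values = [0.0, 50.0], [1.0, 2.0], [1.0, 1.0]

stick = convolve_hrf(neural_series(onsets, [0.0, 0.0], values, dt, n_samples), dt)
boxcar = convolve_hrf(neural_series(onsets, rts, values, dt, n_samples), dt)

half = n_samples // 2                        # trial 1 in first half, trial 2 in second
stick_ratio = sum(stick[half:]) / sum(stick[:half])
boxcar_ratio = sum(boxcar[half:]) / sum(boxcar[:half])
```

      In this toy design the predicted response to the two trials has the same area under the stick model, whereas under the boxcar model the area scales with RT even though the modulator value is identical. That is the sense in which a boxcar modulator mixes the parameter value with RT.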

      For exploratory purposes, we also conducted a first-level analysis using boxcar functions with variable durations. The same PPC region remained the area showing the strongest interaction effect between Task Type and Sequential Distance in the entire brain. However, the cluster size was slightly reduced (voxel-level p < 0.001, cluster-level FWE-corrected p = 0.0610; see Author response image 1 below). The cross indicates the MNI coordinates at [38, –69, 35], identical to those shown in the main results (Figure 4A).

      Author response image 1.

      (3) The activity pattern across tasks and sequential distance in the posterior parietal cortex appears to parallel the RT data. Have the authors examined potential relationships between the two (e.g., individual participant slopes for RT across sequential distance vs. activity betas in the posterior parietal cortex)?

      We thank the reviewer for this helpful suggestion. As shown in Author response image 2, the interaction between Task Type and Sequential Distance was a stronger predictor of PPC activation than of RT. Because PPC activation and RT are measured on different scales, we compared their standardized slopes (standardized β), which express the change in a dependent variable, in standard deviations, for a one-standard-deviation increase in an independent variable. The standardized β for the Task Type × Sequential Distance interaction was −0.30 (95% CI [−0.42, −0.19]) for PPC activation and −0.21 (95% CI [−0.30, −0.13]) for RT. The larger standardized effect for PPC activation indicates that the Task Type × Sequential Distance interaction was a stronger predictor of neural activation than of behavioral RT.

      Author response image 2.
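      The reason standardized slopes allow outcomes on different scales to be compared can be shown with a toy calculation. This is a minimal sketch assuming a single predictor and ordinary least squares; it ignores the random effects and interaction terms of the mixed model reported above, and the data values are invented purely for illustration.

```python
def zscore(values):
    """Center to mean 0 and scale to sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Slope after z-scoring both variables (change in SDs per SD)."""
    return ols_slope(zscore(x), zscore(y))

# Invented toy data: the same outcome expressed in two different units.
distance = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
rt_ms = [900, 950, 1010, 1040, 1100, 880, 960, 1000, 1060, 1090]
rt_s = [v / 1000 for v in rt_ms]

beta_ms = standardized_slope(distance, rt_ms)
beta_s = standardized_slope(distance, rt_s)
raw_ratio = ols_slope(distance, rt_ms) / ols_slope(distance, rt_s)
```

      The raw slopes differ by the unit conversion factor, while the standardized slopes coincide, which is why a standardized β can be compared between a BOLD-derived measure and a reaction time.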

      A more relevant question is whether PPC activation can be explained by temporal information (i.e., the sequential distance) independently of RT. To test this, we included both Sequential Distance and RT in the same linear mixed-effects model predicting PPC Activation Level. As shown in Author response table 1, although RT independently influenced PPC activation (F(1, 288) = 4.687, p = 0.031), the interaction between Task Type and Sequential Distance was a much stronger independent predictor (F(1, 290) = 19.319, p < 0.001).

      Author response table 1.

      PPC Activation Level Predicted by Sequential Distance and RT

      Linear Mixed Model Formula: PPC Activation Level ~ 1 + Task Type * (Sequential Distance + RT) + (1 | Participant)
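      For readers less familiar with this formula notation, the model can be written out term by term. In the usual Wilkinson-style convention, `A * (B + C)` expands to the main effects of A, B, and C plus the A×B and A×C interactions, and `(1 | Participant)` denotes a random intercept for each participant. The symbols below (T for Task Type, D for Sequential Distance, R for RT, i for observation, j for participant) are labels introduced here for illustration and do not appear in the manuscript; the software used to fit the model is not stated in this response.

```latex
\begin{equation}
\mathrm{PPC}_{ij} = \beta_0 + \beta_1 T_{ij} + \beta_2 D_{ij} + \beta_3 R_{ij}
  + \beta_4 \, T_{ij} D_{ij} + \beta_5 \, T_{ij} R_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation}
```

      Under this reading, the Task Type × Sequential Distance effect reported in the table corresponds to the coefficient on the product term T·D, estimated with RT and its interaction with Task Type held in the model.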

      (4) There were a few places in the manuscript where the writing/discussion of the wider literature could perhaps be tightened or expanded. For instance:

      (i) On page 16, the authors state 'The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study (Gauthier & van Wassenhove, 2016b). The authors found a similar region (the reported MNI coordinate of the peak voxel was 42, -70, 40, and the MNI coordinate of the peak voxel in the present study was 39, -70, 35), of which the activation level went up when the target event got closer to the self-positioned event. This finding aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.' Without providing a little more detail here about the Gauthier & van Wassenhove study and what participants were required to do (i.e., mentally position themselves at a temporal location and make 'occurred before' vs. 'occurred after' judgements of a target event), it could be a little tricky for readers to follow why this convergence in finding supports a role for the posterior parietal cortex in egocentric representations.

      We appreciate the reviewer’s comments. In the revised manuscript, we have provided a more detailed explanation of Gauthier and van Wassenhove’s study (Page 17): “The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study by Gauthier & van Wassenhove (2016b). In their study, the participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that time point. The authors identified a similar brain region (reported MNI coordinates of the peak voxel: 42, −70, 40), closely matching the activation observed in the present study (MNI coordinates of the peak voxel: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, which aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.”

      (ii) Although the authors discuss the Lee et al. (2020) review and related studies with respect to retrospective memory, it is critical to note that this work has also often used prospective paradigms, pointing towards sequential processing being the critical determinant of hippocampal involvement, rather than the distinction between retrospective vs. prospective processing.

      We sincerely thank the reviewer for highlighting these important points. In response, we have revised the section of the Introduction discussing the neural underpinnings of duration (Pages 3-4). “Neurocognitive evidence suggests that the neural representation of duration engages distinct brain systems. The motor system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the hippocampus is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      (iii) The authors make an interesting suggestion with respect to hippocampal longitudinal differences in the representation of event sequences, and may wish to relate this to Montagrin et al. (2024), who make an argument for the representation of distant goals in the anterior hippocampus and immediate goals in the posterior hippocampus.

      We thank the reviewer for bringing this intriguing and relevant study to our attention. We have incorporated it into the Discussion (Page 21): “Evidence from the spatial domain has suggested that the anterior hippocampus (or the ventral rodent hippocampus) implements global and gist-like representations (e.g., larger receptive fields), whereas the posterior hippocampus (or the dorsal rodent hippocampus) implements local and detailed ones (e.g., finer receptive fields) (e.g., Jung et al., 1994; Kjelstrup et al., 2008; Collin et al., 2015; see reviews by Poppenk et al., 2013; Robin & Moscovitch, 2017; see Strange et al., 2014 for a different opinion). Recent evidence further shows that the organizational principle observed along the hippocampal long axis may also extend to the temporal domain (Montagrin et al., 2024). In that study, the anterior hippocampus showed greater activation for remote goals, whereas the posterior hippocampus was more strongly engaged for current goals, which are presumed to be represented in finer detail.”

      Reviewing Editor Comments:

      While both reviewers acknowledged the significance of the topic, they raised several important concerns. We believe that providing conceptual clarification, adding important methodological details, as well as addressing potential confounds will further strengthen this paper.

      We thank the editor for the suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please, provide the actual ethical approval #.

      We have added the ethical approval number in the revised manuscript (P 36): “The ethical committee of the University of Trento approved the experimental protocol (Approval Number 2019-018),”

      (2) Thirty-two participants were tested. Please report how you estimated the sample size was sufficient to test your working hypothesis.

      We thank the editor for pointing out this omission. In the revised manuscript, we have added an explanation for our choice of sample size (p. 36): “The sample size was chosen to align with the upper range of participant numbers reported in previous fMRI studies that successfully detected sequence or distance effects in the hippocampus (N = 15–34; e.g., Morgan et al., 2011; Howard et al., 2014; Deuker et al., 2016; Garvert et al., 2017; Theves et al., 2019; Park et al., 2021; Cristoforetti et al., 2022).”

      (3) All MRI figures: please orient the reader; left/right should be stated.

      In the revised manuscript, we have added labels to all MRI figures to indicate the left and right hemispheres.

      (4) In Figure 3A-B, the clear lateralization of the activation is not discussed in the Results or in the Discussion. Was it predicted?

      We thank the editors for highlighting this important point regarding hemispheric lateralization. The right-lateralization observed in our findings is indeed consistent with previous literature. In the revised manuscript, we have expanded our discussion to emphasize this aspect more clearly.

      For the parietal cortex, we now note (Page 17-18): “The negative correlation between activation in the right posterior parietal cortex (PPC) and sequential distance has previously been reported in an fMRI study by Gauthier and van Wassenhove (2016b). In their paradigm, participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that point. The authors identified a similar region (peak voxel MNI coordinates: 42, −70, 40), closely corresponding to the activation observed in the present study (peak voxel MNI coordinates: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, consistent with evidence suggesting that the posterior parietal cortex supports egocentric representations. Neuropsychological studies have further shown that patients with lesions in the bilateral or right PPC exhibit ‘egocentric disorientation’ (Aguirre & D’Esposito, 1999), characterized by an inability to localize objects relative to themselves (e.g., Case 2: Levine et al., 1985; Patient DW: Stark, 1996; Patients MU: Wilson et al., 1997, 2005).”

      For the hippocampus, we have added (Page 19): “Previous research has shown that hippocampal activation correlates with distance (e.g., Morgan et al., 2011; Howard et al., 2014; Garvert et al., 2017; Theves et al., 2019; Viganò et al., 2023), and that distributed hippocampal activity encodes distance information (e.g., Deuker et al., 2016; Park et al., 2021). Most studies have reported hippocampal effects either bilaterally or predominantly in the right hemisphere, whereas only one study (Morgan et al., 2011) found the effect localized to the left hippocampus.”

    1. eLife Assessment

      This manuscript describes a useful integrated proteogenomics pipeline to enable the discovery of novel peptides in cancer cell lines. The method combines long-read RNA sequencing with a multi-protease digestion and proteomics approach. The method is a further development of the authors' previous approaches to identify cancer-specific peptides; however, the current study focuses on a single cell line, and the characterization remains incomplete and lacks validation for candidate alterations. The manuscript will be of interest to scientists focusing on identifying unique alterations in cancer cells.

    2. Reviewer #1 (Public review):

      In this study, the authors provide an integrated proteogenomics pipeline to enable the discovery of novel peptides in an Ewing sarcoma cell line (A673). To identify novel full-length resolved isoforms, they performed long-read RNA sequencing (Oxford Nanopore Technology). Then, to increase the chance of detecting Ewing-specific neopeptides, the authors combined two approaches: a multi-protease digestion and a multi-dimensional proteomics approach.

      Given the importance of novel isoforms and cryptic sites in neoantigen discovery and its putative applications in immunotherapy, this method and resource paper are of interest for the Ewing community and potentially for a broader cancer audience. The originality of this paper relies mostly on this optimized method to discover novel peptides (long-read sequencing with multiprotease, multi-dimensional trapped ion mobility spectrometry parallel accumulation-serial fragmentation mass spectrometry). Although, to my knowledge, no study combining long-read sequencing and proteomics methods has been published on Ewing Sarcoma, this study appears limited by a few aspects:

      (1) The study is restricted to the analysis of a single cell line (A673). The authors should consider extending the analysis to other Ewing cell lines.

      (2) The characterization of the 1121 non-canonical transcripts can be improved. How many are just splice variants of known genes, and how many are bona fide neogenes? In this respect, the definition of what the authors call neogene is quite unclear. Is a transcript with a new exon reported as a neogene? Is a transcript with a new start site reported as a neogene? It should be clearly indicated which categories of Figure 4B are reported on Figure 4D. A general flow chart would be very useful to help follow the analysis process.

      (3) Similarly, the authors detect 3216 A673 specific proteins with no match in SwissProt. This number decreases to 72 "putative non-canonical proteoforms with unique peptides after BLASTp" against Uniprot. Again, a flow chart would conveniently enable one to follow the step-by-step analysis.

      (4) Finally, only 17 spectral matches are suggested to be derived from non-canonical proteoforms. It would be important to compare the spectrum of these detected peptides with that of synthetic peptides. Such an analysis would enable us to assess the number of reliably detected proteoforms that can be expected in an Ewing sarcoma cell line.

      (5) It is very unclear what the authors want to highlight in Supplementary Figure 5. Is it that non-canonical transcripts are broadly expressed in normal tissue? Which again raises the question of definitions of neogenes, non-canonical... Apparently, this figure shows that these non-canonical transcripts contain a large part of canonical sequences, which account for the strong signal in many normal tissues. A similar heatmap could be presented, including only the non-canonical sequences of the non-canonical transcripts. This figure should also include Ewing sarcoma samples.

    3. Reviewer #2 (Public review):

      The paper from Kulej et al. reports a set of tools for proteogenomic analysis of cancer proteomes. Their approach utilizes modern methods in long-read RNA sequencing to assemble a proteome database that is specific to Ewing sarcoma-derived A673 cells. To maximize proteome coverage and therefore increase the odds of detecting cancer-specific alterations at the protein level, the authors use multiple enzymes (trypsin, gluC, etc.) to digest cellular proteins and then perform multidimensional peptide fractionation. Peptide samples are then analyzed by LC-MS/MS using data-dependent and data-independent schemes on a timstof mass spectrometer. Proteogenomics is an important area of investigation for cancer research and does require new informatics tools.

      The authors describe an end-to-end workflow where they claim to have optimized four different steps:

      (1) Assembly of a sample-specific protein database using long-read transcriptomic data.

      (2) Use of 8 different proteolytic enzymes to maximize diversity of peptides.

      (3) Multiple stages of peptide fractionation using SCX and high pH rp chromatography.

      (4) Utilize acquisition methods on the timstof mass spec to provide MS/MS data from single-charged peptides and multiply-charged peptides.

      The authors published two earlier versions of ProteomeGenerator (versions 1 and 2) in the Journal of Proteome Research. In these earlier versions, 'ProteomeGenerator' was the set of software tools designed to integrate DNA and RNA sequencing to create a sample-specific protein database. To test the performance of each ProteomeGenerator version, the authors generated LC-MS/MS data using a combination of trypsin and LysC, then in the other paper, trypsin, LysC, and GluC. In both papers, they performed some level of peptide fractionation prior to LC-MS/MS. They acquired LC-MS/MS data on a Thermo Q-Exactive in one paper and a Thermo Orbitrap mass spec in the other paper.

      In the current paper, the primary innovation is the use of long-read sequencing to potentially improve the quality of the sample specific protein database. The other three components noted above are incremental compared to the authors' previous two papers and generally accepted practices in the field of proteomics. To note one example, the authors previously digested proteins using three enzymes and now use eight. Similarly, they are now using a timstof Bruker mass spec instead of one from Thermo. The detailed descriptions around the use of many enzymes and peptide fractionation, etc., create a very technically oriented paper, similar to or more so than the authors' earlier papers in J. Proteome Research. So, while there is enthusiasm for the use of long-read sequencing across biomedical research, the impact here for proteogenomic applications is somewhat lost with all of the technical description for experimental details that are not particularly innovative. In this respect, the report is not well matched to a broad readership.

    4. Author response:

      We thank the editors and reviewers for their thoughtful, constructive, and fair evaluation of our manuscript. We appreciate the recognition of the value of an end-to-end proteogenomics framework integrating long-read transcriptomics with deep proteomic analysis, and we are grateful for the specific guidance on how to strengthen clarity, generality, and impact for a broad scientific readership. We outline below the key revisions we plan to undertake in response to the public reviews.

      Reviewer #1

      We thank the reviewer for their positive assessment of the relevance of this work to Ewing sarcoma and cancer proteogenomics.

      Scope and generality.

      We agree that analysis of a single cell line limits generalization. In the revised manuscript, we will extend the ProteomeGenerator3 workflow to additional tumor specimens, including Ewing sarcoma tumors, to assess reproducibility and biological relevance beyond a single test cancer cell line.

      Definitions and analytical clarity.

      We will clarify definitions of non-canonical transcripts, alternative splice isoforms, and neogenes, and explicitly distinguish these categories throughout the manuscript. We will add a summary flow diagram that tracks transcripts through classification, ORF prediction, and proteoform detection, clarifying how Figures 4B and 4D relate.

      Proteoform filtering and confidence.

      To improve transparency, we will add a step-wise schematic summarizing how candidate non-canonical proteoforms are filtered to a high-confidence subset, including SwissProt comparison, BLASTp filtering, peptide uniqueness, and competitive database searches.

      Validation.

      We agree that orthogonal validation is important. We will include additional analyses of non-canonical proteoforms detected recurrently in additional tumor specimens to provide an empirical estimate of reliably detectable non-canonical proteoforms.

      Supplementary Figure 5.

      We will revise the presentation and explanation of this figure to avoid misinterpretation, including analyses focused specifically on non-canonical sequence segments and inclusion of tumor samples for direct comparison.

      Reviewer #2

      We thank the reviewer for placing this work in context with our prior ProteomeGenerator publications and for their guidance on framing the manuscript for a broad audience.

      Emphasizing the central conceptual advance.

      We agree that the primary innovation is the use of long-read transcriptomics to generate sample-specific proteogenomic databases. In the revised manuscript, we will directly compare long-read-derived and short-read-derived databases applied to the same samples and proteomic data, explicitly demonstrating where long-read sequencing enables discovery inaccessible to short-read approaches.

      Manuscript reorganization.

      We will substantially revise the manuscript to foreground the biological and conceptual consequences of long-read-enabled proteogenomics, using focused examples. Detailed descriptions of protease selection, fractionation, and acquisition optimization will be moved to supplementary methods, while retaining key conclusions about their impact on discovery.

      Positioning of technical advances.

      We will frame multi-protease and acquisition strategies as general principles required for unbiased proteoform discovery, rather than as static technical prescriptions, emphasizing their relevance across evolving proteomics platforms.

      Overall Significance

      In the revised manuscript, we will more clearly articulate that this work establishes long-read-informed, sample-specific proteogenomics as a discovery-grade framework, revealing cancer-specific proteoforms that are systematically invisible to reference-based and short-read-driven approaches, with broad implications for cancer biology and biomarker discovery.

      We thank the editors and reviewers again for their constructive feedback, which we believe will substantially strengthen the clarity and broad impact of this work.

    1. eLife Assessment

      This study provides an important contribution by showing that whiteflies and planthoppers use salivary effectors to suppress plant immunity through the receptor-like protein RLP4, suggesting convergent evolution in these insect lineages. The topic is of clear interest for understanding plant-insect interactions and offers ideas that could stimulate further research in the field. The authors provide mostly solid evidence for the functional roles of the salivary effectors; however, the interpretation of the physiological function of RLP4 in plant defense requires clarification.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that white flies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein. Unlike previously reported receptor-like proteins with large ligand-binding domains, the NtRLP4 here has a malectin LRR domain. Interestingly, it also associates with the adaptor SOBIR1. While the function of this protein remains to be further explored, the authors provide strong evidence showing it's the target of salivary proteins as the insects' survival strategy.

      Major points:

      The authors mixed the concepts of LRR-RLPs with malectin LRR-RLPs. These are two different types of receptors. While LRR-RLPs are well studied, little is known about malectin LRR-RLPs. The authors should not simply apply the mode of function of LRR-RLPs to RLP4, which is a malectin LRR-RLP. In addition, LRR-RLPs that function as ligand-binding receptors typically possess >20 LRRs, whereas RLP4 in this work has a rather small ectodomain. It remains unclear whether it will function as a PRR.

      I can't agree with the authors' logic of testing uninfested plants for proving a PRR's function. The function of a pattern recognition receptor depends on perceiving the corresponding ligand. As shown by the data provided, RLP4-OE plants have an altered transcriptional profile indicating activated defense, suggesting it's unlikely a PRR. An alternative explanation is needed.

      More work on BAK1 will also help to clarify the ideas proposed by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Two minor comments:

      In line 140, yeast two-hybrid (Y2H) was used to screen for interacting proteins in plants. However, it is generally difficult to identify membrane receptors using Y2H. Please provide more methodological details to justify this approach, or alternatively, include a discussion explaining this.

      In Figure S12C, the interaction between the two proteins appears to be present in the nucleus as well. Please provide a possible explanation for this observation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a well-structured and interesting manuscript that investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Weaknesses:

      Western blot evidence for effector secretion is weak. The possibility of contamination from insect tissues during the sample preparation should be avoided.

      Below are some specific comments and suggestions to strengthen the manuscript.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      (1) Western blot evidence for effector secretion:

      The western blot evidence in Figure 1, which aims to show that the insect protein is secreted into plants, is not fully convincing. The band of the expected size (~30 kDa) in the infested tissues is very weak. Furthermore, the high and low molecular weight bands that appear in the infested tissues do not match the size of the protein in the insects themselves, and a high molecular weight band also appears in the uninfested control tissues. It is difficult to draw a definitive conclusion that this protein is secreted into the plants based on this evidence. The authors should also address the possibility of contamination from insect tissues during the sample preparation and explain how they have excluded this possibility.

      Thank you for pointing this out. One or two bands between 25 and 35 kDa were specifically detected in B. tabaci-infested plants but not in non-infested plants, and the smaller, high-intensity band is the same size as BtRDP in salivary glands. This experiment has been repeated six times. In the current version, we repeated the experiment and included a salivary gland sample as a positive control, which showed the same molecular weight as the specific band in the infested sample. It is noteworthy that in the experiment shown in the current version, only the smaller, high-intensity band appeared, while the low-intensity band did not. The detection of a protein within infested plant tissue is a key criterion for validating the secretion of salivary effectors, an approach supported by numerous studies in this field. Furthermore, our previous LC-MS/MS analysis of B. tabaci watery saliva identified six unique peptides matching BtRDP, providing independent evidence for its presence in saliva. Therefore, as we now state in the manuscript, “the detection of BtRDP in infested plants (Fig. 1a) and in watery saliva (Fig. S1) collectively indicates that BtRDP is a salivary protein”.

      Regarding the higher molecular weight band that is present in both infested and non-infested samples, we agree that it most likely represents a non-specific band, which is a common occurrence in Western blot assays. Such bands are sometimes used to indicate comparable sample loading. To address the possibility of contamination by insect tissues, we wish to clarify that all insects and deposited eggs were carefully removed from the infested leaves prior to sample processing. Moreover, BtRDP is undetectable at the egg stage, so no BtRDP-associated band would be detected even if eggs contaminated the sample. We have revised the Methods section to explicitly state this procedure:

      “After feeding, the eggs deposited on the infested tobacco leaves were removed. The leaves showing no visible insect contamination were immediately frozen in liquid nitrogen and ground to a fine powder.”

      (2) Inconsistent conclusion (Line 156 and Figure 3c):

      The statement in line 156 is inconsistent with the data presented in Figure 3c. The figure clearly shows that the LRR domain of the protein is the one responsible for the interaction with BtRDP, not the region mentioned in the text. This is a critical misrepresentation of the experimental findings and must be corrected. The conclusion in the text should accurately reflect the data from the figure.

      We apologize for any confusion caused by the original phrasing. In our previous manuscript, the description “NtRLP4 without signal peptides and transmembrane domains” referred specifically to the truncated construct NtRLP4<sub>(23-541)</sub> used in the experiment. To prevent any misunderstanding, we have revised the sentence in the updated version to state explicitly: “Point-to-point Y2H assays reveal that NtRLP4<sub>(23-541)</sub> (a truncated version lacking the signal peptide and transmembrane domains) interacts with BtRDP<sup>-sp</sup>”.

      (3) Role of SOBIR1 in the RLP4/SOBIR1 Complex:

      The authors demonstrate that the salivary effectors destabilize the RLP4 receptor, leading to a decrease in its protein levels and a reduction in the RLP4/SOBIR1 complex. A key question remains regarding the fate of SOBIR1 within this complex. The authors should clarify what happens to the SOBIR1 protein after the destabilization of RLP4. Does SOBIR1 become unbound, targeted for degradation itself, or does it simply lose its function without RLP4? This would provide further insight into the mechanism of action of the effectors.

      Thank you for the suggestion. In the current version, we assessed the impact of BtRDP on NtSOBIR1 following NtRLP4 destabilization. The results showed that while NtRLP4-myc accumulation was markedly reduced, NtSOBIR1-flag levels remained unchanged, suggesting that destabilization of NtRLP4 did not affect NtSOBIR1 accumulation.

      (4) Clarification on specificity and evolutionary claims:

      The paper's most significant claim is that the effectors from both whiteflies and planthoppers "independently evolved" to target RLP4. While the functional data is compelling, this evolutionary claim would be more convincing with stronger evidence. Showing that two different effector proteins target the same host protein is a fascinating finding but without a robust phylogenetic analysis, the claim of independent evolution is not fully supported. It would be valuable to provide a more detailed evolutionary analysis, such as a phylogenetic tree of the effector proteins, showing their relationship to other known insect proteins, to definitively rule out a shared, but highly divergent, common ancestor.

      We appreciate the reviewer’s valuable suggestion to investigate a potential evolutionary link between BtRDP and NlSP104. Our initial analysis already indicated no detectable sequence similarity. To address this point more thoroughly, we attempted a phylogenetic analysis. However, we were unable to generate a meaningful alignment due to a complete lack of conserved amino acid sequences. Therefore, we conducted a comparative genomics analysis by blasting both proteins against the genomic or transcriptomic data of 30 diverse insect species. This analysis revealed that RDP is exclusively present in Aleyrodidae species, and SP104 is exclusively present in Delphacidae species (Table S1). Given the absence of sequence similarity, the distinct protein structures, and the lineage-specific distributions, we conclude that BtRDP and NlSP104 are highly unlikely to be homologous and thus did not originate from a common ancestor.

      (5) Role of SOBIR1 in the interaction:

      The results suggest that the effectors disrupt the RLP4/SOBIR1 complex. It is not entirely clear if the effectors are specifically targeting RLP4, SOBIR1, or both. Further experiments, such as a co-immunoprecipitation assay with just RLP4 and the effector, could clarify if the effector can bind to RLP4 in the absence of SOBIR1. This would help to definitively place RLP4 as the primary target.

      We appreciate the reviewer’s insightful comments regarding whether the effector preferentially targets RLP4, SOBIR1, or both. In our study, we conducted reciprocal co-immunoprecipitation assays using RLP4 and BtRDP as controls. These assays showed that BtRDP interacts with RLP4 but does not interact with SOBIR1, supporting the conclusion that SOBIR1 is unlikely to be a direct target of BtRDP. We fully agree that testing the interaction between RLP4 and BtRDP in the absence of SOBIR1 would further strengthen the conclusion. However, we were unable to obtain N. tabacum SOBIR1 knockout mutants, and therefore could not experimentally assess whether the RLP4–BtRDP interaction persists in planta without SOBIR1. Nevertheless, our yeast two-hybrid assays demonstrate that RLP4 and BtRDP can directly interact, indicating that their association does not strictly depend on SOBIR1. Together, these results support the interpretation that RLP4 is the primary target of BtRDP, while SOBIR1 is not directly engaged by the effector.

      (6) Transcriptome analysis (Lines 130-143):

      The transcriptome analysis section feels disconnected from the rest of the manuscript. The findings, or lack thereof, from this analysis do not seem to be directly linked to the other major conclusions of the paper. This section could be removed to improve the manuscript's overall focus and flow. If the authors believe this data is critical, they should more clearly and explicitly connect the conclusions of the transcriptome analysis to the core findings about the effector-RLP4 interaction.

      Thank you for the suggestion. As you and Reviewer #2 pointed out, the transcriptomic analysis was not closely linked to the major conclusions of the paper and yielded little information. We have therefore removed these analyses to improve the manuscript’s overall focus and flow.

      (7) Signal peptide experiments (Lines 145 and beyond):

      The experiments conducted with the signal peptide (SP) are questionable. The SP is typically cleaved before the protein reaches its final destination. As such, conducting experiments with the SP attached to the protein may have produced biased observations and could lead to unjustified conclusions about the protein's function within the plant cell. We suggest the authors remove the experiments that include the signal peptide.

      Thank you for pointing this out. The SP was retained to direct the target proteins to the extracellular space of plant cells. Theoretically, the SP is cleaved in the mature protein. This methodology is widely used in effector biology. For example, the SP directs Meloidogyne graminicola Mg01965 to the apoplast, where it functions in immune suppression, whereas Mg01965 without the SP fails to exert this function (10.1111/mpp.12759). In our study, the SP of BtRDP was expected to guide the target protein to the extracellular space, facilitating its interaction with RLP4. Moreover, the observed protein sizes of BtRDP with and without the SP in transgenic plants were identical, suggesting successful SP cleavage. Therefore, we have retained the experiments involving the SP in the current version.

      (8) Overly strong conclusion and unclear evidence (Line 176):

      The use of the word "must" on line 176 is very strong and presents a definitive conclusion without sufficient evidence. The authors state that the proteins must interact with SOBIR1, but they do not provide a clear justification for this claim. Is SOBIR1 the only interaction partner for NtRLP4? The authors should provide a specific reason for focusing on SOBIR1 instead of demonstrating an interaction with NtRLP4 first. Additionally, do BtRDP or NlSP694 also interact with SOBIR1 directly? The authors should either tone down their language to reflect the evidence or provide a clearer justification for this strong claim.

      Thank you for pointing this out. In the current version, the word “must” has been toned down to “may” due to insufficient supporting evidence. In this study, SOBIR1 was chosen because it has been widely reported to be required for the function of several RLPs involved in innate immunity. However, it remains unclear whether SOBIR1 is the only interaction partner of NtRLP4. In the current version, we have clarified the rationale for focusing on SOBIR1 prior to the experiments “The receptor-like kinase SOBIR1, which contains a kinase domain, has been widely reported to be required for the function of RLPs involved in innate immunity (Gust & Felix, 2014)” and discussed that “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1. Therefore, a more definitive assessment of NtRLP4 signaling will therefore require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.”. In addition, the direct interaction between BtRDP and SOBIR1 was experimentally tested, and the results showed that BtRDP failed to interact with SOBIR1.

      Minor Comments

      (9) The statement in the abstract, "However, it remains unclear how these invaders are able to overcome receptor perception and disable the plant signaling pathways," is not entirely accurate. The fields of effector biology and host-pathogen interactions have provided significant insight into how pathogens and pests manipulate both Pattern-Triggered Immunity (PTI) and Effector-Triggered Immunity (ETI). While the specific mechanism described in this paper is novel, the broader claim that the field is unclear on these processes weakens the initial hook of the paper. A more precise framing of the problem would be beneficial, perhaps by stating that the specific mechanisms used by these particular herbivores to target RLP4 were previously unknown.

      Thank you for this insightful comment. We agree that the original statement in the abstract overstated the lack of understanding in the field. In the current version, we have refined the sentence to more accurately reflect the current state of knowledge, emphasizing that while microbial suppression of plant immunity has been extensively studied, the strategies used by herbivorous insects to overcome receptor-mediated defenses remain less understood. The revised sentence now reads as follows: “Although the mechanisms used by microbial pathogens to suppress plant immunity are well studied, how herbivorous insects overcome receptor-mediated defenses remains unclear”.

      (10) The introduction is heavily focused on Pattern Recognition Receptors (PRRs), which, while central to the paper's findings, gives a somewhat narrow view of the plant's defense against herbivores. It would be beneficial to briefly acknowledge the broader context of plant defenses, such as physical barriers, direct chemical toxicity, and indirect defenses, before narrowing the focus to the specific molecular interactions of PRRs that are the core of this study. This would provide a more complete picture of the "arms race" between plants and herbivores.

      Thank you for this valuable suggestion. We agree that the original introduction focused too narrowly on pattern-recognition receptors (PRRs). In the current version, we have expanded the introductory section to provide a broader overview of plant defense mechanisms. Specifically, we now acknowledge the multiple layers of plant defenses, including physical barriers (e.g., cuticle and cell wall), chemical defenses (e.g., toxic secondary metabolites and anti-nutritive compounds), and indirect defenses mediated by herbivore-induced volatiles. This addition provides a more complete context for understanding the molecular interactions discussed in this study. The revised paragraph now reads as follows: “Plants have evolved sophisticated defense systems to survive constant attacks from pathogens and herbivorous insects. These defenses operate at multiple levels, including physical barriers such as the cuticle and cell wall, chemical defenses involving toxic secondary metabolites and anti-nutritive compounds, and indirect defenses that attract natural enemies of herbivores through the emission of herbivore-induced volatiles. Beyond these general strategies, plants also rely on highly specialized molecular immune responses that allow them to detect and respond rapidly to invaders.”

      (11) The figure legends are generally clear, but some could be more detailed. For instance, in Figure 2, it would be helpful to explicitly state what each bar represents in the graph and to include the statistical test used. Please ensure all panels in all figures have clear labels.

      Thank you for this helpful suggestion. We have revised the legend of Fig. 2 and other figures to provide more detailed information for each panel. Specifically, we now explicitly describe what each bar represents in the graphs and specify the statistical test used. In addition, we ensured that all panels are clearly labeled. These changes improve clarity and allow readers to better interpret the data.

      (12) The methods section is comprehensive, but it would be helpful to include more specifics on the statistical analyses used. For example, the type of statistical test (e.g., t-test, ANOVA) and the software used should be mentioned for each experiment.

      Thank you for your suggestion. We have revised the Methods section (Statistical analysis) to provide more detailed information on the statistical analysis used for each experiment.

      (13) The manuscript's overall impact is weakened by the inclusion of unnecessary words and a few grammatical issues. A focused revision to tighten the language would make the major findings stand out more clearly. For example, on page 2, line 18, "in whitefly Bemisia tabaci, BtRDP is an Aleyrod..." seems to have an incomplete sentence. A thorough proofreading for typos and grammatical errors is highly recommended to improve the overall readability.

      Thank you for your suggestion. We have carefully revised the abstract and the manuscript to improve clarity, readability, and grammatical correctness. In addition, we sought the assistance of a professional English editor to thoroughly proofread and polish the manuscript, ensuring that the language meets high academic standards.

      (14) The discussion section is strong, but it could benefit from a more explicit connection between the findings and the broader ecological implications. For instance, how might the independent evolution of these effectors in different insect species impact plant-insect co-evolutionary dynamics?

      We thank the reviewer for the valuable suggestion. In the current version, we have added a paragraph in the Discussion section highlighting the broader ecological and evolutionary implications of our findings. Specifically, we discuss how the independent evolution of RLP4-targeting effectors in different insect lineages may drive plant-insect co-evolution, influence selection pressures on both plants and herbivores, and potentially shape defense diversification across plant communities. This addition helps to link our molecular findings to ecological outcomes and co-evolutionary dynamics.

      (15) The sentence on line 98, which reads " A few salivary proteins have been reported to attach to salivary sheath after secretion" seems to serve an unclear purpose in the introduction. It would be helpful for the authors to clarify its relevance to the surrounding context or to the paper's overall argument. Its inclusion currently disrupts the flow of the introduction and makes it difficult for the reader to understand its intended purpose.

      We thank the reviewer for the comment. We have revised the paragraph to clarify the relevance of salivary sheath localization to the study. Specifically, we now introduce the role of the salivary sheath as a potential scaffold for effector delivery and explicitly link previous reports of sheath-associated salivary proteins to our observation that BtRDP localizes to the salivary sheath after secretion.

      (16) The writing in lines 104-106 is both grammatically inconsistent and overly wordy. The authors switch between present and past tense ("is" and "was"), and the sentences could be made more concise to improve the clarity and flow of the text. Also check entire paper.

      We thank the reviewer for pointing this out. We have revised the sentence to improve grammatical consistency and clarity, splitting it into two concise statements. In addition, we have thoroughly checked the entire manuscript for similar tense inconsistencies and overly wordy sentences, and have made revisions throughout to ensure consistent past tense usage and improved readability.

      (16) The sentences on lines 111-113 are quite wordy. The core conclusion, which is that the protein affects the insect's feeding probe, could be expressed more simply and directly to improve clarity and flow. I suggest rephrasing this section to be more concise and to highlight the primary finding without the added language.

      We thank the reviewer for the helpful suggestion. We have revised the sentences to make them more concise and to emphasize the main finding that BtRDP influences the whitefly’s feeding behavior, as follows: “Compared with the dsGFP control, dsBtRDP-treated B. tabaci showed a marked reduction in phloem ingestion and a longer pathway duration, indicating that BtRDP is required for efficient feeding (Fig. 2c).”

      (17) On line 118, the authors mention "subcellular location." It is not clear where the protein is localized. The authors should explicitly state the specific subcellular compartment of the protein, as this is crucial for understanding its function and interaction with other proteins.

      We thank the reviewer for this valuable comment. To clarify the subcellular localization of BtRDP, we have revised the manuscript accordingly. In the transgenic line overexpressing full-length BtRDP including the signal peptide (oeBtRDP), the protein is expected to localize in the apoplast (extracellular space), whereas in the line expressing BtRDP without the signal peptide (oeBtRDP<sup>-sp</sup>), it is likely retained in the cytoplasm.

      (18) Lines 121-128, the description of the fecundity and choice assays in this section is overly wordy. The authors should present the main conclusion of these experiments more directly and concisely. The key finding is that the protein affects feeding behavior; this central point is somewhat lost in the detailed, and sometimes repetitive, phrasing.

      We thank the reviewer for this suggestion. In the revised manuscript, we have simplified the description of the fecundity and two-choice assays to highlight the main conclusion, as follows: “Fecundity and two-choice assays showed that BtRDP, whether localized in the apoplast (oeBtRDP) or cytoplasm (oeBtRDP<sup>-sp</sup>), enhanced whitefly settling and oviposition compared with EV controls (Fig. 2d-i; Fig. S10), indicating that BtRDP promotes whitefly feeding behavior regardless of its subcellular location.”

      (19) Line 148, the manuscript mentions experiments involving transformation, but the transformation efficiency is not provided. Please include the transformation efficiency for all transformation experiments, as this is crucial for the reproducibility of the results.

      We thank the reviewer for raising this point. We would like to clarify that no transformation experiments were performed in this section. The experiments described involved Y2H screening using BtRDP<sup>-sp</sup> as a bait to identify interacting proteins from a N. benthamiana cDNA library. Therefore, there is no transformation efficiency to report.

      (20) Line 159, the manuscript refers to a sequence similarity around line 159 but does not provide the specific data. It is important to show the actual sequence similarity, perhaps in a supplementary figure or table, to support the claims being made.

      We thank the reviewer for this suggestion. To support our statement regarding sequence similarity, we have added the corresponding alignment as Fig. S11.

      (21) Line 159, the manuscript refers to "three randomly selected salivary proteins." It is unclear from where these proteins were selected. The authors should clarify the source of this selection (e.g., a specific database or a previous study) to ensure the methodology is transparent and the results are reproducible.

      We thank the reviewer for raising this point. These proteins were selected based on previous reports (10.1093/molbev/msad221; 10.1111/1744-7917.12856). In the current version, we provide the accession numbers of these proteins in the MS.

      (22) Line 160, the description "NtcCf9 without signal peptide and transmembrane domains" is difficult to understand. It would be clearer and more consistent to use a term like "truncated NtcCf9" and then specify which domains were removed, as this is a standard practice in molecular biology for describing protein constructs.

      We thank the reviewer for this suggestion. We have revised the manuscript to describe the construct as “truncated NtCf9” and specified that the signal peptide and transmembrane domains were removed.

      (23) The phrase "incubated with anti-flag beads" on line 172 is a detail of a routine method. Such details are more appropriate for the Methods section rather than the main text, which should focus on the results and their implications. Please remove such descriptions from the main text to improve readability and flow.

      We thank the reviewer for this suggestion. We have removed the methodological detail from the main text to improve readability. We also checked for similar details throughout the MS.

      I am excited about the potential of this work and look forward to seeing the current version.

      We sincerely thank the reviewer for the positive feedback and encouragement. We appreciate your time and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that white flies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein.

      Strengths:

      The authors used a wide range of methods to dissect the function of the white fly protein BtRDP and identify its host target NtRLP4.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) Serious concerns about protein work.

      I did not find the indicated protein bands for anti-BtRDP in Figures 1a and 1b in the original blot pictures shown in Figure S30. In Figure 1a, I can't get the point of showing an unspecific protein band with a size of ~190 kD as a loading control for a protein of ~ 30 kD.

      The data discrepancy led me to check other Western blot pictures. Similarly, Figures 2d, 3b, 3d, and S15b (anti-Myc) do not correspond to the original blots shown. In addition, the anti-Myc blot in Figure 4i, all blot pictures in Figures 5b, 5h, and S19a appeared to be compressed vertically. These data raised concerns about the quality of the manuscript.

      Blots shown in Figure 3d, 4f, 4g, and 4h appeared to be done at a different exposure rate compared to the complete blot shown in Figure S30. The undesirable connection between Western blot pictures shown in the figures and the original data might be due to the reduced quality of compressed figures during submission. Nevertheless, clarification will be necessary to support the strength of the data provided.

      We sincerely thank the reviewer for carefully examining our Western blot data and for pointing out these inconsistencies. The discrepancy between the figures in the main text and the original blots (Figure S30) resulted from an oversight during manuscript revision. This manuscript had undergone multiple rounds of revision after submission to another journal. During this process, the main figures and supplementary figures were updated separately, and we mistakenly failed to replace the original blot files with the corresponding current versions.

      Regarding the different exposure, the blots shown in the main text were adjusted for overall contrast and brightness to enhance band visibility and presentation clarity, whereas the original images in Figure S30 were raw, unprocessed scans directly from the imaging system. For example, in Author response image 1 below, overall contrast and brightness were adjusted to visualize the loading of the input sample. Such whole-image adjustment is an acceptable form of image processing (https://www.nature.com/nature-portfolio/editorial-policies/image-integrity).

      Author response image 1.

      The same figure with brightness and contrast changes across the entire image.

      Regarding the vertical compression, in the previous version some images were vertically compressed for layout purposes to make the composite figures appear more visually balanced. However, after consulting relevant publication guidelines, we realized that such one-dimensional compression is not encouraged by certain journals, as it may alter the original aspect ratio of the image. Therefore, in the current version, we have avoided any non-proportional scaling and retained the original aspect ratio of all images.

      We have now carefully rechecked all Western blot data, replaced the outdated raw blot images with the correct corresponding ones, avoided vertical compression, and ensured that the processed figures in the main text match their original data. The revised supplementary figures now accurately reflect the raw experimental results.

      (2) Misinterpretation of data.

      I am afraid the authors misunderstood pattern-triggered immunity through receptor-like proteins. It is true that several LRR-type RLPs constitutively associate with SOBIR1, and further recruit BAK1 or other SERKs upon ligand binding. One should not take it for granted that every RLP works this way. To test the hypothesis that NtRLP4 confers resistance to B.tabaci infestation, the author compared transcriptional profiles between an EV plant line and an RLP4 overexpression line. If I understood the methods and figure legends correctly, this was done without B. tabaci treatment. This experimental design is seriously flawed. To provide convincing genetic evidence, independent mutant lines (optionally independent overexpression lines) in combination with different treatments will be necessary. Otherwise, one can only conclude that overexpressing the RLP4 protein generated a nervous plant. In addition, ROS burst, but not H2O2 accumulation, is a common immune response in pattern-triggered immunity.

      We agree with the reviewer that not every RLP functions through the same mechanism as the canonical SOBIR1–BAK1 pathway. In the current version, we further examined the interaction between the whitefly salivary protein and SOBIR1, and found that they do not interact. However, our interaction assays clearly demonstrated that NtRLP4 does interact with SOBIR1. Whether NtRLP4 functions through, or exclusively through, SOBIR1 remains uncertain, and we have emphasized this limitation in the Discussion section as follows: “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1 [39]. Therefore, a more definitive assessment of NtRLP4 signaling will therefore require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.”

      Regarding the transcriptome analysis, our original aim was to explore why B. tabaci showed such a pronounced preference among tobacco plants. As this preference was assessed using uninfested plants, we also performed transcriptome sequencing using plants without B. tabaci treatment. The enrichment analysis demonstrated that the majority of up-regulated DEGs were associated with plant–pathogen interaction, environmental adaptation, MAPK signaling, and signal transduction pathways, while down-regulated DEGs were enriched in glutathione, carbohydrate, and amino acid metabolism. Notably, many DEGs were annotated as RLK/RLPs or WRKY transcription factors, most of which were upregulated, suggesting an enhanced defense state in the NtRLP4-overexpressing plants. The altered expression of JA- and SA-related genes (e.g., upregulation of FAD7 and downregulation of PAL and NPR1) further supported this enhanced defense and hormonal crosstalk. We agree that combining overexpression or knockout lines with insect infestation treatments would provide more direct genetic evidence for NtRLP4-mediated resistance, and we have acknowledged this as an important future direction. Nevertheless, our current data are consistent with the conclusion that NtRLP4 overexpression confers increased resistance to B. tabaci infestation.

      Finally, DAB staining for H<sub>2</sub>O<sub>2</sub> accumulation is also a well-established indicator of PTI responses, and many studies have shown that overexpression of salivary elicitors can trigger such accumulation.

      (3) Lack of logic coherence.

      The written language needs substantial improvement. This impeded the readability of the work. More importantly, the logic throughout the manuscript appeared scattered. The choice of testing protein domains for protein-protein interactions, using plants overexpressing an insect protein to study its subcellular localization, switching back and forth between using proteins with signal peptides and without signal peptides, among others, lacks a clear explanation.

      We appreciate the reviewer’s careful reading and valuable comments regarding the logical coherence of our manuscript.

      (1) To improve the English quality, the entire manuscript has been professionally edited by a certified language-editing service.

      (2) Regarding the rationale for testing protein domains in the protein–protein interaction assays: NtRLP4 is a membrane-anchored receptor-like protein composed of extracellular, transmembrane, and short intracellular domains. We aimed to determine which region of NtRLP4 is responsible for interacting with the salivary protein, as this would help infer the likely site of interaction in planta. In addition, not all RLPs contain a malectin-like domain, and we sought to verify whether the BtRDP–NtRLP4 interaction depends on this domain. To enhance the logical flow, we introduced a brief statement explaining the experimental purpose before presenting the interaction assays in the current version, as follows: “These findings raised the question of which domain of NtRLP4 is responsible for binding BtRDP, as identifying the interacting domain could help infer where the salivary protein contacts the receptor in planta. We therefore dissected the NtRLP4 domains accordingly.”

      (3) With respect to using plants overexpressing an insect protein to examine subcellular localization: since both the brown planthopper and the whitefly are non-model species for which stable genetic transformation is technically unfeasible, many previous studies have used Agrobacterium-mediated transient expression or transgenic plant systems to investigate the subcellular localization of insect salivary proteins within host cells. Following these precedents, our study also employed plant systems to determine the localization of the insect protein and to assess how different localizations affect plant defense responses.

      (4) As for switching between constructs with or without signal peptides: the subcellular localization of effectors can influence their biological activity and interactions. Previous studies have used the presence or absence of signal peptides, or replacement with a PR1 signal peptide, to direct protein targeting (for example, Frontiers in Plant Science, 2022, 13:813181). Because salivary sheaths are generally considered to localize in the apoplastic space, we generated two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), expected to be secreted into the apoplast, and another lacking the signal peptide (oeBtRDP<sup>-sp</sup>), likely retained in the cytoplasm. In the current version, we clarified this rationale and added references to similar studies to improve the manuscript’s logic and readability. The revised text reads as follows: “To investigate the role of BtRDP in different subcellular location of host plants, we constructed two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), which is expected to be secreted into the apoplast (extracellular space), and the other lacking the signal peptide (oeBtRDP<sup>-sp</sup>), which is likely retained in the cytoplasm.”

      Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) I found the naming of BtRDP and NlSP694 somewhat confusing. The authors defined BtRDP as "B. tabaci RLP-degrading protein," whereas NlSP694 appears to have been named after the last three digits of its GenBank accession number (MF278694, presumably). Is there a standard convention for naming newly identified proteins, for example, based on functional motifs or sequence characteristics? As it stands, the inconsistency makes it difficult for readers to clearly distinguish these proteins from those reported in other studies.

      Thank you for your comment. These are species-specific salivary proteins that have not been reported or annotated in previous studies. Because no homologous genes could be identified in other species, there are no existing names or annotations for these proteins. For such lineage-specific salivary proteins, it is common in recent studies to name them according to their experimentally identified functions. For example, a recently reported salivary protein was named SR45-interacting salivary protein (SISP) based on its function (10.1111/nph.70668). Following this convention, we adopted a similar functional naming strategy in this study. We acknowledge that there may not yet be a standardized rule for naming such proteins, and we would be glad to follow a more authoritative naming guideline if possible.

      (2) Figure 2 and other figures. Transgenic experiments require at least two independent lines, because results from a single line may be confounded by position effects or unintended genomic alterations, and multiple lines provide stronger evidence for reproducibility and reliability.

      We appreciate the reviewer’s suggestion. In our study, two independent transgenic lines were used to ensure the reproducibility and reliability of the results. One representative line was presented in the main figures, while data from the second independent line were included in the supplementary figures. To make this clearer, we have emphasized in the manuscript that bioassays were conducted using two independent transgenic lines.

      (3) Figure 3e. Quantitative analysis of NtRLP4 was required. Additionally, since only one band was observed in oeRLP, were any tags included in the construct?

      Thank you for your comment. In the current version, quantitative analysis of NtRLP4 expression has been performed and is now presented in Figure 3. For the oeRLP plants, no tag was fused to NtRLP4; thus, anti-RLP serum was used to detect the target bands. In contrast, oeBtRDP and oeBtRDP-sp were fused with C-terminal FLAG tags, and their detection was carried out using anti-FLAG serum. This information has been clarified in the revised Methods section as follows: “The oeBtRDP and oeBtRDP<sup>-sp</sup> were fused with C-terminal FLAG tags, while no tag was fused to oeNtRLP4.”

      (4) Figure 4a. The RNAi effect appears to be well rescued in Line 1 but poorly in Line 2. Could the authors clarify the reason for this difference?

      Thank you for pointing this out. We also noticed that the RNAi effect appeared to be better rescued in Line 2 than in Line 1. Based on our measurements, the silencing efficiency of NtRLP4 in RNAi-RLP4 Line 1 was markedly weaker than in Line 2, which likely explains the difference in rescue efficiency. In the current version, we have clarified this point as follows: “Both RNAi-RLP lines showed reduced NtRLP4 levels compared with EV plants, with RNAi-RLP#2 exhibiting a stronger silencing effect (Fig. S19a).” “The differential rescue effect between the two RNAi lines likely resulted from their different NtRLP4 silencing efficiencies, with the lower NtRLP4 level in RNAi-RLP#2 leading to a more complete rescue phenotype.”

      (5) ROS accumulation is shown for only a single leaf. A quantitative analysis of ROS accumulation across multiple samples would be necessary to support the conclusion. The same applies to Figure 16f.

      Thank you for pointing this out. The H<sub>2</sub>O<sub>2</sub> accumulation experiments in Figure 4 and Figure S16f were repeated five times. In the current version, we state in the figure legends that “the experiment is repeated five times with similar results”.

      (6) Figure 4f: NtRLP4 abundance was significantly reduced in oeBtRDP plants but not in oeBtRDP-SP. Although coexpression analysis suggests that BtRDP promotes NtRLP4 degradation in an ubiquitin-dependent manner, the reduced NtRLP4 levels may not result from a direct interaction between BtRDP and NtRLP4. It is possible that BtRDP influences other factors that indirectly affect NtRLP4 abundance. The authors should discuss this possibility.

      Thank you for your valuable suggestion. We agree that the reduced NtRLP4 abundance may not necessarily result from a direct interaction between BtRDP and NtRLP4. In the manuscript, we have further discussed this possibility as follows: “Notably, BtRDP and NlSP694 shared no sequence or structural similarity and lack resemblance to known eukaryotic ubiquitin-ligase domains. Their interaction with RLP4s occurs in the extracellular space (Fig. 3d; Fig. 5c), whereas the ubiquitin-proteasome system primarily functions in the cytosol and nucleus [46]. Furthermore, NtRLP4 reduction is observed only in oeBtRDP transgenic plants, not in oeBtRDP-sp plants (Fig. 4f), suggesting that BtRDP exerts its influence on NtRLP4 in the extracellular space. These observations collectively argue against the possibility that BtRDP or NlSP694 possesses intrinsic E3 ligase activity capable of directly ubiquitinating RLP4s within plant cells. Importantly, the reduced NtRLP4 levels may not result from a direct physical interaction between BtRDP and NtRLP4. Instead, BtRDP may indirectly affect RLP4 post-translational modification, thereby accelerating its degradation, which warrants further investigation.”

      (7) The statement in lines 335-336 that 'Overexpression of NtRLP4 or NtSOBIR1 enhances insect feeding, while silencing of either gene exerts the opposite effect' is not supported by the results shown in Figures S16-S19. The authors should revise this description to accurately reflect the data.

      Thank you for pointing this out. We agree that our original statement was not precise, as we measured the insect settling preference and oviposition on transgenic plants, but did not directly assess the feeding behavior of B. tabaci. Therefore, we have revised the description in the manuscript to more accurately reflect our data as follows: “Overexpression of NtRLP4 or NtSOBIR1 in N. tabacum is attractive to B. tabaci and promotes insect reproduction, whereas silencing of either gene exerts the opposite effect.”

      (8) BtRDP is reported to attach to the salivary sheath. Does the planthopper NlSP694 exhibit a similar secretion localization (e.g., attachment to the salivary sheath)? The authors should supplement this information or discuss the potential implications of any differences in secretion localization between BtRDP and NlSP694 for their respective modes of action.

      Thank you for your insightful suggestion. We agree that determining the secretion localization of NlSP694 would provide valuable information for understanding its potential mode of action. Immunohistochemical (IHC) staining is indeed a critical approach for such analysis. However, in this study, we were unable to express NlSP694 in Escherichia coli, and the antibody generated using a synthesized peptide did not show sufficient specificity or sensitivity for IHC detection. Consequently, we were unable to determine whether NlSP694 is attached to the salivary sheath. Therefore, whether BtRDP and NlSP694 act through different modes requires further investigation.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1e. The BtRDP-labeled fluorescent signal is difficult to discern. An enlarged view of the target region would be helpful for clarity.

      Thank you for your suggestion. In the current version, an enlarged view of the target region is provided below the figure.

      (2) The finding that BtRDP accumulates in the salivary sheath secreted by Bemisia tabaci is important for understanding the subcellular localization of this protein during actual insect feeding. I suggest moving Figure S5 to the main text.

      Thank you for your suggestion. Figure S5 has been moved to Fig. 1f in the current version.

      (3) Please carefully cross-check the figure numbering to ensure that all in-text citations correspond to the correct figures and panels, e.g., lines 136, 188, 192, and 194.

      Thank you for pointing this out. We have corrected them in the current version.

    1. eLife Assessment

      This study demonstrates that endothelial toll-like receptor 4 is a central regulator of leptomeningeal inflammation in the context of neonatal E. coli meningitis. The data are derived from cell type-specific gene knockout in mice as well as from cultured endothelial cells, and are generally solid, with only minor weaknesses in analysis and interpretation. This work is important as it advances our understanding of host cellular processes and molecular pathways underlying meningitis pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Seegren and colleagues demonstrate that in a mouse model of neonatal E. coli meningitis, loss of endothelial toll-like receptor 4 (TLR4) leads to a marked decrease in transcriptional dysregulation across multiple leptomeningeal cell types, a decrease in vascular permeability, and a decrease in macrophage abundance. In contrast, loss of macrophage TLR4 had less pronounced effects. Using cultured wild-type and TLR4-knockout endothelial cells, the authors further demonstrate that TLR4-NF-κB signaling leads to reversible internalization of the tight junction protein claudin-5, establishing a potential mechanism of increased vascular permeability. Finally, the authors use RNA-sequencing of wild-type and TLR4-knockout endothelial cells to define the TLR4-dependent cell-autonomous transcriptional response to E. coli.

      Strengths:

      (1) The authors address an important, well-motivated hypothesis related to the cellular and molecular mechanisms of leptomeningeal inflammation.

      (2) The authors use model systems (mouse conditional knockouts and cultured endothelial cells) that are appropriate to address their hypotheses. The data are of high quality.

      Weaknesses:

      (1) The authors perform single-nucleus RNA-seq on dissected leptomeninges from control and E. coli-infected mice across three genotypes (WT, Tlr4MKO, and Tlr4ECKO). A major discovery from this experiment, as summarized by the authors, is: "Tlr4ECKO mice exhibited a global attenuation of infection-induced transcriptional responses across all major leptomeningeal cell types, as judged by the positions of cell clusters in the UMAP." This conclusion could be considerably strengthened by improving the qualitative and quantitative analysis.

      (2) The authors interpret E. coli infection-induced increases in leptomeningeal sulfo-NHS-biotin as evidence of compromised BBB integrity (i.e., extravasation from the vasculature) (Results, page 7), but another possible route in this context is sulfo-NHS-biotin entry from the dura across a compromised arachnoid barrier. The complete rescue in Tlr4ECKOs is strongly suggestive that the vascular route dominates, but it would strengthen the work if the authors could assess arachnoid barrier fidelity (e.g. via immunohistochemistry). At a minimum, authors should mention that the sulfo-NHS-biotin signal in this context may represent both vascular and arachnoid barrier extravasation.

      (3) The authors state that "deletion of TLR4 prevented both NF-κB nuclear translocation and Cldn5 internalization in response to E. coli (Figure 4A-D)" (Results, page 9). In Figures 4C and D, however, there is no indicator of a statistical test directly comparing the two genotypes. A comparison of within-genotype P-values should not be used to support a genotype difference (PMID: 34726155).

      (4) In the first paragraph of the Results, the authors summarize the meningeal layers as (1) pia, (2) subarachnoid space, (3) arachnoid, and (4) dura, and then state "The second and third layers constitute the leptomeninges." This definition of leptomeninges seems to omit the pia, which is widely considered part of the leptomeninges (PMID: 37776854).

      (5) The Cdh5-CreER/+;Tlr4 fl/- mouse lacks TLR4 in all endothelial cells (i.e., in peripheral organs as well as CNS/leptomeninges), and, as the authors note, the periphery is exposed to E. coli. It would be helpful if the authors could comment in the Discussion on the possibility that peripheral effects (e.g., peripheral endothelial cytokine production, changes to blood composition as a result of changes to peripheral endothelial permeability) may contribute to the observed leptomeningeal phenotypes.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a postnatal mouse model of E. coli bacterial meningitis and a mouse brain endothelioma cell line, combined with cell-type-specific gene deletion, to study the function of endothelial TLR4, a cell surface receptor that recognizes Gram-negative bacterial wall components, in the local leptomeningeal (LPM) response, with a focus on endothelial barrier breakdown mediated by TLR4. Single-cell transcriptional profiling and imaging studies using whole-mount preps of the LPM support that the inflammatory response of LPM endothelial cells, CD206+ local macrophages, LPM fibroblasts, and arachnoid barrier cells is abrogated in the endothelial-specific KO of TLR4, pointing to a role for endothelial TLR4 in the local LPM response. Culture studies using bEnd.3 cells (a mouse brain endothelioma cell line) support a direct role for TLR4 in the bacteria-mediated inflammatory response and in internalization of Cldn5 via the endosomal-lysosomal pathway, resulting in loss of barrier integrity.

      Strengths:

      The local LPM cell response in meningitis and the role of specific LPM cells in inflammation and CNS barrier breakdown have not been extensively studied, despite ample evidence for a primary immune response in the meninges in human patients and in animal models. The authors employ a robust, multi-model approach using both in vivo and in vitro models with cell-type-specific knockout to study the function of TLR4 in the brain endothelial cell response. The authors nicely combine functional barrier assays with IF for junctional localization in their experimental design, and they delve into potential mechanisms of Cldn5 internalization using markers of endosomal-lysosomal pathway localization. The authors also describe a new type of barrier assay using a streptavidin-coated plate upon which barrier-forming cell cultures can be plated; this could be a very useful alternative or complement to other size-selective barrier assays and presumably could work for other barrier-forming cell types, likely epithelial cells.

      Weaknesses:

      (1) There are no measures of bacterial burden in peripheral organs, blood, the LPM, or the brain in the TLR4 endothelial cKO mice. Lack of TLR4 in endothelial cells could prevent bacterial 'access' into the LPM and brain, essentially preventing meningitis and leading to a lack of inflammatory responses in the LPM-located cells simply because there are no bacteria present. Bacteremia may also be reduced, as might inflammatory responses in peripheral organs with TLR4-deficient peripheral endothelium. Bacterial counts and inflammatory measures in peripheral organs and blood are important to better understand the mechanism(s) underlying the reduced inflammatory profile in LPM cells and the lack of LPM endothelial breakdown in the Tlr4 endothelial cKO mice. In other words, does deleting TLR4 in ECs protect against the development of meningitis by somehow blocking bacterial access to the LPM (this would be supported by low or no CFU counts in infected Tlr4 endothelial cKO mice), or is it, as the authors appear to propose in Figure 1J, that ECs expressing TLR4 are the only cells responding to the bacteria to trigger the immune cascade in the LPM? More data are needed to resolve this, as this is a major claim of the paper.

      (2) The authors look at the underlying cortical response (cerebral vasculature for ICAM and immune cells) but do not use markers that could identify microglia (Iba1), the primary resident immune cell (CD206 is not useful at this stage, as perivascular macrophages are extremely sparse in the postnatal brain). This would be important to better study the impact on the morphological activation of CNS-resident immune cells.

      (3) The authors suggest that Cldn5 junctional localization is selectively disrupted upon bacterial exposure, mediated by TLR4. They base this on studying PECAM, GLUT-1, ZO-1, and β-catenin (all normally located at junctions or the cell surface in cultured bEnd.3 cells) in relation to Cldn5 localization (normally high). It is possible that these are also impacted by bacterial exposure (maybe through different mechanisms?). A better measure would be to apply the same cyto/PM measure used for Cldn5 in Fig. 4D to these proteins, or to use intensity measurements.

      (4) The discussion could benefit from delving more into the prior literature on E. coli-mediated breakdown of junctions in cultured human brain microvascular endothelial cell models and critical host-pathogen interactions of the bacteria with ECs (PMID: 14593586), and how this might involve TLR4.

      (5) It would be important to discuss how their results relate to earlier studies on TLR4-/- and TLR2-/- global knockout mice and protection vs vulnerability to development of meningitis (see PMCID: PMC3524395). That paper showed that TLR4 global KO mice have increased susceptibility to death from meningitis and have much higher CFU counts in the CNS. In this manuscript and their prior work (Wang et al., 2023), this group has shown that both global TLR4-/- mutants and their EC-specific KO have reduced barrier permeability, but we don't have any information about CFU or susceptibility to death from meningitis in their models.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the molecular underpinnings of immune responses in the leptomeninges in neonatal bacterial meningitis. Bacterial meningitis is a major disease burden, particularly for neonates, and it has previously been noted that the meningeal immune environment in infants is permissive to opportunistic infection (Kim et al., Sci Immunol, 2023). Less is known about the contribution of the stromal compartment to meningeal immune responses. Seegren et al. interrogate the role of leptomeningeal endothelium in host defence in E. coli-infected neonatal mice using mouse genetic tools to delete the LPS receptor Tlr4 from either endothelial cells (using Cdh5-CreER) or macrophages (using LysM-Cre). The authors use snRNAseq, cleared cortical mounts, and in vitro work to define the impact of E. coli infection on leptomeningeal endothelial cells. This study uses a range of innovative techniques to probe the role of the stromal compartment in meningitis.

      Strengths:

      This study makes excellent use of cleared cortical mounts to examine the biology of the leptomeninges, in particular changes to the endothelium, in unprecedented detail. In combination with high-quality sequencing data, these provide new insights into the impact of meningitis on the leptomeninges. The data presented by the authors are of very high quality.

      Weaknesses:

      The weaknesses of the study were in terms of interpretation and perhaps study design.

      (1) Most importantly, the authors need to provide additional validation of their conditional knockout models. The authors need to confirm that the Cdh5-CreER does not impact leptomeningeal fibroblasts and to confirm gene deletion in macrophages.

      (2) The authors could also strengthen the paper by providing data on the impact of these conditional knockout models on the course of meningitis and bacterial burden.

      (3) Finally, it is perhaps not surprising that Tlr4 is required for meningitis responses with E. coli. However, it is unclear if these findings can be generalised to other, more common, meningitis infections (streptococcal/pneumococcal).

      (4) There are additional minor issues; for instance, the arachnoid fibroblast 2 population appears to closely resemble dural border cells.

      (5) The cell line model (bEnd.3) is a relatively low-fidelity model of BBB endothelial cells, and this should be acknowledged.

      With these caveats, it is difficult to be certain that the endothelium alone is the driver of meningeal immune responses in meningitis, and what the impact of these is.

    1. eLife Assessment

      This fundamental study advances our understanding of how dietary patterns shape cancer immunity by identifying a link between a Mediterranean-mimicking diet, gut bacteria, and a metabolite that enhances anti-tumor immune responses. The evidence supporting the main conclusions is solid, based on carefully controlled diet experiments, measurements of gut-derived molecules, and functional immune analyses across multiple models, together with supportive observations in human data. The work will be of broad interest to biologists working on microbiota and cancer. However, there are several issues that the authors should address to improve the manuscript.