  1. Feb 2026
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt concentrations. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower (50 mM) or higher (>150 mM) NaCl concentrations, the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution, for example using interferometric scattering microscopy (iSCAT), could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised the text on page 2 as follows.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn<sup>2+</sup> cofactor, there was a dependence upon non-heme Fe<sup>2+</sup>, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn<sup>2+</sup> is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn<sup>2+</sup> cannot fulfil that role. Fe<sup>2+</sup> could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS Journal 2016, 283, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS Journal 2016, 283, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to an additional iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis detects zinc only; iron is undetectable (Figure S5).

      (4) The enzyme kinetics of our iron-free TDM are comparable to those of MsTDM (Figure 1A). We propose that the differences in Km and Vmax are due to differences in the overall sequences of the enzymes. Please also see the comment at the end on a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also compared our value with that previously reported for the TDM enzyme in the revised text on page 2. See also the note on a newly published manuscript and its comparison.
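
      As a minimal sketch of the unit issue raised in point (2): a rate in µM/min is a change in concentration, so it becomes an amount per time (nmol/min) only after multiplying by the reaction volume, and a specific activity only after dividing by the enzyme mass. The function name, the reaction volume, and the enzyme mass below are hypothetical illustrations, not values taken from the manuscript.

```python
def specific_activity_nmol_per_min_per_mg(rate_uM_per_min, reaction_volume_mL, enzyme_mg):
    """Convert a concentration rate (µM/min) into a specific activity (nmol/min/mg)."""
    # 1 µM = 1 µmol/L = 1 nmol/mL, so (µM/min) x mL gives nmol/min directly.
    nmol_per_min = rate_uM_per_min * reaction_volume_mL
    # Normalising by the enzyme mass in the assay gives nmol/min/mg.
    return nmol_per_min / enzyme_mg


# Hypothetical assay (assumed numbers): 10 µM/min formed in 0.2 mL with 0.01 mg enzyme.
example = specific_activity_nmol_per_min_per_mg(10.0, 0.2, 0.01)
print(f"{example:.1f} nmol/min/mg")
```

      Because 1 µM equals 1 nmol/mL, no further scaling factor is needed when the volume is given in mL.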

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description to avoid overinterpretation and limit our conclusions to the observed changes in HCHO concentration (page 2, lines 18–19).

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C, which shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We revised the text on page 9, ITC section.
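
      The arithmetic behind this exchange can be sketched as follows, assuming simple mixing in which each amount is just concentration times volume (the displacement of cell contents during a real ITC run is ignored). The function names are illustrative; the concentrations and volumes are those quoted in the comment and the response.

```python
def mole_ratio(ligand_uM, ligand_uL, protein_uM, cell_uL):
    """Ligand-to-protein mole ratio after adding ligand_uL of titrant to the cell."""
    ligand_pmol = ligand_uM * ligand_uL    # µM x µL = pmol
    protein_pmol = protein_uM * cell_uL
    return ligand_pmol / protein_pmol


def concentration_ratio(final_ligand_uM, protein_uM):
    """Final ligand concentration relative to the initial protein concentration."""
    return final_ligand_uM / protein_uM


# Reviewer's reading: a total of 4.75 µL of 20 µM THF into 250 µL of 5 µM TDM.
reviewer_ratio = mole_ratio(20.0, 4.75, 5.0, 250.0)

# Authors' corrected values: ~12.8 µM final THF against 5 µM TDM.
response_ratio = concentration_ratio(12.8, 5.0)

print(f"reviewer's reading: {reviewer_ratio:.3f}")
print(f"corrected values:   {response_ratio:.2f}")
```

      Under this assumption, the volumes in the reviewer's reading give a ratio far below 1, whereas the corrected values give about 2.6 when the final ligand concentration is compared with the initial 5 μM protein concentration.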

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have revised Figure S6 by adding a panel (C; now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italic (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - doublecheck units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling on page 2 (~line 10), page 5 (~line 34), and page 6 (~line 40). The Prism version was added. The arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>(SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in units consistent with the previous report and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364). It further supports our findings. First, the reported kinetic parameters obtained by ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to our observed values (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed for our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> functions primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).

    1. eLife Assessment

      This study provides a useful contribution to understanding how wearable augmentation devices interact with human proprioception, using a longitudinal design over a single session. Results demonstrate that the perceptual representation of the biological finger and augmentation device changes across different phases of device exposure and use. The evidence supporting a representational change over time is solid, although it is still not clear whether these changes reflect three distinct phases of sensorimotor plasticity, as argued, versus 'washout' or adaptation effects. This work will be of interest to researchers studying body representation, sensorimotor learning, and human-technology interaction.

    2. Reviewer #1 (Public review):

      This study by Radziun and colleagues investigates the effects of using a hand-augmentation device on mental body representations. The authors use a proprioceptive localisation task to measure metric representations of finger length before and after participants wear the device, and then before and after they learn to use the device, which extends the lengths of the fingers by 10 cm. The authors find changes between different time points, which they interpret as evidence for three distinct forms of plasticity: one related to simply wearing the device, one related to learning to use it, and an aftereffect after taking the device off. A control experiment with a similar device, which does not lengthen the fingers, showed the first and third of these forms of plasticity, but not the second.

      This study takes an interesting approach to a timely and theoretically significant issue. The study appears to be appropriately designed and conducted. There are, however, some points which require clarification.

      (1) The nature of the localization task is unclear. On its face, the task appears to involve localization of each landmark within the 2-dimensional surface of the touchscreen. However, the regression analysis presupposes that localization is made in a 1-dimensional space. Figure S2 shows that three lines are presented on the screen above the index, middle, and ring fingers, which I imagine the participant is meant to use as a guide. But it is at least conceivable that the perceived location or orientation of the finger might not correspond exactly to these lines. While the method can deal gracefully with proximal-distal translations of the fingers (i.e., with the intercept parameter of the regression), it isn't clear how the participant is supposed to respond if their proprioceptive perception of finger location is translated left-right or rotated relative to the lines on the screen. I also worry that presenting a long, thin line to represent each finger on the screen may not be a neutral method and may prime participants to represent the finger as long and thin.

      (2) The task used here fits within a wider family of tasks in the literature using localization judgments of multiple landmarks to map body representations. I feel that some discussion of this broader set of tasks and their use to measure body representation and plasticity is notably absent from the paper. It is also striking to me that some of the present authors have themselves recently criticized the use of landmark localization methods as a measure of represented body size and shape (Peviani et al, 2024, Current Biology). It is therefore surprising to see them use this task here as a measure of represented finger length without commenting on this issue.

      (3) 18 participants strikes me as a relatively small sample size for this type of study. It weakens the manuscript that the authors do not provide any justification, or even comment on, the sample size. This is especially true as participants are excluded from the entire sample, and from specific analyses, on rather post-hoc grounds.

      (4) I have some concerns about the interpretation of contraction in stage 2. The authors claim that wearing the finger extended produces "a contraction", i.e., an "under-representation" (page 12). But in both experiments, regression slopes in stage 2 were not significantly different from 1 (i.e., 0.98 [SE: 0.07] in Exp 1a and 1.04 [SE: 0.09] in Experiment 1b). So how can that be interpreted as "under-representation"?

      (5) I also have concerns about the interpretation of the stretch that is claimed to occur following training. In Exp 1a, regression slopes in stage 3 are on average 1.15. That is LESS than in the pretest at stage 1 (mean: 1.16). The idea of stretch only comes about because of the lower slopes in stage 2, which the authors have interpreted as reflecting contraction. So what the authors call stretch and a 2nd form of plasticity could just be the contraction from stage 2 wearing off or dissipating, since perceived finger length in stage 3 just appears to return to the baseline level seen in stage 1. While the authors describe their results in terms of three distinct forms of plasticity, these are not in fact statistically independent. The dip in regression slopes in stage 2 is interpreted as evidence for two distinct plasticity effects, which I do not find convincing.

      (6) The distinction between plasticity at stage 3 (which appears specific to augmentation) and plasticity at stage 4 (which does not appear specific, as it also occurs in Experiment 1b) feels strained. This feels like a very subtle distinction, and the theoretical significance of it is not convincingly developed.

      (7) The reporting of statistics is not always consistent. For example, 95%CIs are presented for regression slopes in stages 1, 3, and 4, but not for stage 2. Statistics are performed on regression slopes, except for one t-test on page 7 comparing lengths in cm. Estimates of effect size would be nice additions to statistical tests.

      (8) Minor point: On page 4, the authors write, "These included sorting colored blocks, stacking a Jenga tower, and sorting pegs into holes; the latter task required fine-grained manipulation and was used as our outcome measure of motor learning." This suggests that peg sorting was the outcome measure, but in Figure 1D, Jenga is presented as the outcome measure.
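
      The arithmetic behind point (4) above can be checked with a short sketch. It uses only the stage 2 slopes and standard errors quoted in that point and expresses each slope's distance from 1 in units of its standard error. This is a rough illustration, not the authors' statistical test, and the function name is hypothetical.

```python
# Stage 2 slopes and standard errors as quoted in point (4) of the review.
stage2 = {"Exp 1a": (0.98, 0.07), "Exp 1b": (1.04, 0.09)}


def se_distance_from_one(slope, se):
    """Number of standard errors separating a slope from 1 (veridical length)."""
    return (slope - 1.0) / se


# Hypothetical helper output: one signed distance per experiment.
distances = {name: se_distance_from_one(s, se) for name, (s, se) in stage2.items()}

for name, d in distances.items():
    print(f"{name}: slope is {d:+.2f} SE from 1")
```

      Both slopes lie well within one standard error of 1, which is the basis for the reviewer's question about the term "under-representation".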

    3. Reviewer #2 (Public review):

      Summary:

      This study aimed to explore dynamic changes in the somatosensory representation of both the body and artificial body parts. The study investigated how proprioceptive localisation along the finger changes when participants wear, actively use, and then remove a hand augmentation device - a rigid finger-extension. By mapping perceived target locations along the biological finger and the extension across multiple stages, the authors aim to characterise how the somatosensory system updates our spatial body representation during and after interaction with body augmentation technology.

      Strengths:

      The manuscript addresses an interesting question of how augmentation devices alter proprioceptive localisation abilities. Conceptually, the work moves beyond classic tool-use paradigms by focusing on a device that is used with the hand to extend the fingers' abilities (versus a tool that is simply used by the hand), and by attempting to map perceived spatial structure across both biological and artificial segments within the same framework.

      A major strength is the multi-stage design, which samples localisation abilities at baseline, the beginning of device wear, post-training, and immediately post-removal. This provides a richer characterisation of short-term adaptation compared to a simple pre/post comparison. The dense sampling across stages and target locations generates a rich behavioural dataset that will be valuable to readers interested in somatosensory body representation. The within-subject, counterbalanced control session further strengthens interpretability, providing a useful comparison for interpreting stage-dependent effects, and to probe how functional training shapes changes in the perceptual representations. Finally, the augmentation device itself appears carefully engineered, with thoughtful design decisions regarding wearability, including comfort and customised fit. The manuscript is also communicated clearly, with transparent reporting of analyses and succinct figures that make the pattern changes across stages straightforward to evaluate.

      Weaknesses:

      There is conceptual ambiguity in how the regression outcomes are interpreted in relation to perceived length and spatial integration. The manuscript treats regression slope as a proxy for "length perception" and discards the intercept as "spatial bias," but in this localisation task translation (intercept) and scaling (slope) are coupled: changes in anchoring at the proximal baseline (intercept) or distal endpoint can generate slope differences without uniform rescaling across the mapped surface. Relatedly, the analyses do not establish whether the reported effects are global across targets or disproportionately driven by the most distal locations. This limits the strength of inferences about "partitioning" or "reallocation" of representational space across biological and artificial segments. Some interpretive statements also appear stronger than the evidence supports (e.g., describing the stage 2 bio-extension map as "geometrically accurate", despite Bayes factors that provide only anecdotal support for no difference from true length). Extensive repeated judgements to a fixed set of locations may additionally stabilise response strategies or anchoring even without feedback, complicating the separation of body-representation change from task-specific calibration.
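
      The coupling between slope and intercept described above can be illustrated with a minimal sketch. All target positions and responses below are hypothetical and are not taken from the study; the point is only that a localisation error confined to the most distal target changes the fitted slope even though no uniform rescaling has occurred.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical target positions along the finger (cm); not the study's data.
actual = [0.0, 2.0, 4.0, 6.0, 8.0]

# Case A: every target is localised veridically.
veridical = list(actual)

# Case B: only the most distal target is judged 1 cm too far; the rest are unchanged.
distal_shift = actual[:-1] + [actual[-1] + 1.0]

slope_a, intercept_a = fit_line(actual, veridical)
slope_b, intercept_b = fit_line(actual, distal_shift)

print(f"veridical:    slope={slope_a:.2f}, intercept={intercept_a:.2f}")
print(f"distal shift: slope={slope_b:.2f}, intercept={intercept_b:.2f}")
```

      In this toy case the slope rises to about 1.1 and the intercept moves below zero, although four of the five targets are localised perfectly. A per-target analysis would distinguish this pattern from a global stretch.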

      The manuscript would also benefit from clearer conceptual framing of what the device is and what its training probes are. The device is described variably as an "artificial finger" versus a rigid "finger extension," with different implications for perception and function. In addition, the training tasks appear to emphasise manipulation and dexterity more than scenarios requiring an extended reachable workspace (indeed, participants appear to have performed at least as well, if not better, in the control training), which brings into question whether participants explored the device's intended functionality and possible proprioceptive consequences. The control experiment is thoughtfully designed to test whether functional training contributes to the stage 3 changes, but because localisation is not performed while wearing the short device, the design does not resolve whether the stage 2 change and the post-removal aftereffect are specific to the augmentative extension versus more general consequences of wearing a device on the finger (and the following possible distorted distal cues).

      Finally, the immediate post-removal aftereffects are intriguing, but the mechanistic interpretation remains underspecified. As presented within the internal model framework, the magnitude and consistency of the aftereffect following brief exposure are difficult to reconcile with the stability expected from a lifetime biological finger model, and because the aftereffect is assessed only immediately after removal, its time course and functional significance remain unclear.

    4. Reviewer #3 (Public review):

      Summary:

      The study aims to investigate sensorimotor plasticity mechanisms by exposing a cohort of 20 subjects to manipulation activities while using wearable finger extensions. With a series of experiments involving localization and motor tasks, the authors provide evidence that the finger extensions are integrated into the body representation of the subjects.

      Strengths:

      The study deserves attention, and the psychophysical protocols are carefully designed, and the statistical analyses are solid.

      Weaknesses:

      However, the current version of the manuscript, in my opinion, makes an exaggerated use of the term plasticity, and this should be amended. This is because the authors support the plasticity claims with psychophysical experiments, without providing evidence of neural-plasticity mechanisms (e.g., neuroimaging methods are not used).

      The authors are recommended to revise the wording of the manuscript and possibly perform additional experiments with brain imaging methods (e.g., EEG or fMRI).

    1. eLife Assessment

      This paper investigates the Achilles' heel of an aggressive pediatric bone cancer known as Ewing sarcoma. The authors aimed to better understand how its previously undruggable drivers mediate oncogenic mechanisms using several omics approaches. Transcriptomic changes aligned with their findings provide convincing evidence for the role of a short alpha helix in the DNA binding domain of FLI1 in modulating binding to GGAA microsatellites and promoting enhancer activity. The study provides valuable new insights into the underlying oncogenic mechanisms in Ewing sarcoma.

    2. Reviewer #1 (Public review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult and better understanding how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-𝛼4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediate EWS-FLI cis activity (at msat) and to generate the formation of specific TADs. Furthermore, cells expressing DBD deficient EWS-FLI display very poor colony forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      They aim at better understanding how EWS-FLI mediated its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      They use the most recent state-of-the-art omics methods to investigate the transcriptome, epigenetics, and genome conformation. In particular, Micro-C enables achieving up to 1 kb resolved 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TADs structures where EWS-FLI1 mediates its oncogenic activity.

      They performed all their experiments in an Ewing sarcoma genetic background (A673 cells) which circumvents bias from previously reported approaches when working in non-orthologous cell models using similar approaches.

      Weaknesses:

      The main weakness stems from the poor reproducibility of the Micro-C data. Indeed, the distances and clustering observed between replicates appear to be similar to, or even greater than, those observed between biological conditions. For instance, in Figure 1B, we do not observe any clear clustering among DBD1, DBD2, DBD+1, and DBD+2. Although no further experiments were performed, the authors tempered their claims by rephrasing aspects related to this issue and the reviewer also acknowledged that the transcriptomic data are convincing and support their findings.

      Regarding DBD stability and the cycloheximide experiments requested to rule out any half-life bias of DBD (as higher stability of the re-expressed DBD+ could also partially explain the results independently of a 3D conformational change), the reviewer acknowledged that the WB, RNA-seq data and agar assays presented by the authors appear reproducible across experiments.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult and better understanding how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-𝛼4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediate EWS-FLI cis activity (at msat) and to generate the formation of specific TADs. Furthermore, cells expressing DBD deficient EWS-FLI display very poor colony forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      This new version of the study comprises as requested new data from an additional cell line. The new data has strengthened the manuscript. Nevertheless, some of the arguments of the authors pertaining to the limitations of immunoblots to assess stability of the DBD constructs or the poor reproducibility of the Micro C data remain problematic. While the effort to repeat MicroC in a different cell line is appreciated, the data are as heterogeneous as those in A673 and no real conclusion can be drawn. The authors should tone down their conclusions. If DBD has a strong effect on chromatin organization, it should be reproducible and detectable. The transcriptomic and cut and tag data are more consistent and provide robust evidence for their findings at these levels. 

      We agree that the Micro-C data have more apparent heterogeneity within and across cell lines as compared to other analyses such as our included CUT&Tag and RNA-seq. We addressed the possible limitations of the technique as well as inherent biology that might be driving these findings in our previous responses. Despite the poor clustering on the PCA plots, our analyses of differential interacting regions, TADs and loops remain consistent across both cell lines. We are confident that these findings reflect the context of transcriptional regulation by the constructs, and therefore the role of the alpha-helix in modulating chromatin organization. To address the concerns raised by the editors and reviewers about the strength of the conclusions we drew from the Micro-C findings, we have made changes to the language used to describe them throughout the manuscript. These changes are outlined below.

      • On lines 70-71, "is required to restructure" was changed to "is implicated in restructuring of"

      • On line 91, "is required for" was changed to "participates in"

      • On line 98, "is required for" changed to "is potentially required for"

      • On line 360-361, "is required for restructuring" changed to "participates in restructuring"

      Concerning the issue of stability of the DBD and DBD+ constructs, a simple protein half-life assay (e.g. cycloheximide chase assay) could rule out any bias here and satisfactorily address the issue.

      While we generally agree that a cycloheximide assay is a relatively simple approach to look at protein half-life, as we discussed last time, the assays included in this paper are performed at equilibrium and rely on the concentration of protein at the time of the assay. This is particularly true for assays involving crosslinking, like Micro-C. As discussed in our prior response, western blots are semi-quantitative at best, even when normalized to a housekeeping protein. In analyzing the relative protein concentration of DBD vs. DBD+ with relative protein intensities first normalized to tubulin and using the wildtype EWSR1::FLI1 rescue as a reference point, we find that there is no statistical difference in the samples used for micro-C here (Author response image 1A) or across all of the samples that we have used for publication (Author response image 1B). This does show that DBD generally has more variable expression levels relative to wildtype EWSR1::FLI1, and this is consistent with our experience in the lab.
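
      The two-step normalization described above (band intensity divided by tubulin intensity, then expressed relative to the wildtype rescue) can be sketched as follows. The densitometry values and the function name are hypothetical and serve only to make the calculation concrete; they are not the values behind Author response image 1.

```python
def relative_expression(band, tubulin, wt_band, wt_tubulin):
    """Tubulin-normalized band intensity, expressed relative to the wildtype rescue."""
    return (band / tubulin) / (wt_band / wt_tubulin)


# Hypothetical densitometry values in arbitrary units; not taken from the paper.
wildtype = {"band": 1000.0, "tubulin": 2000.0}
sample = {"band": 600.0, "tubulin": 1500.0}

rel = relative_expression(
    sample["band"], sample["tubulin"], wildtype["band"], wildtype["tubulin"]
)
print(f"expression relative to wildtype rescue: {rel:.2f}")
```

      A value of 1 would mean the construct is expressed at the same tubulin-normalized level as the wildtype rescue on that blot.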

      Nonetheless, we did attempt to perform the requested cycloheximide chase experiment to determine protein stability. Unfortunately, despite an extensive number of troubleshooting attempts, we have not been able to get good expression of DBD for these experiments. The first author who performed this work has left the lab and we have moved to a new lab space since the benchwork was performed. We continue to try to troubleshoot to get this experimental system for DBD and DBD+ to work again. When we tried to look at stability of DBD+ following cycloheximide treatment, there did appear to be some difference in protein stability (Author response image 2). However, these conditions are not the same conditions as those we published, they do not meet our quality control standards for publication, and we are concerned about being close to the limit of detection for DBD throughout the later timepoints. Additional studies will be needed with more comparable expression levels between DBD and DBD+ to satisfactorily address the reviewer concerns.

      Author response image 1.

      Expression Levels of DBD and DBD+ Across Experiments. Expression levels of DBD and DBD+ protein based on western blot band intensity normalized by tubulin band intensity. Expression levels are relative to wildtype EWSR1::FLI1 rescue levels and are calculated for (A) A673 samples used for micro-C and (B) all published studies of DBD and DBD+. P-values were calculated with an unpaired t-test.

      Author response image 2.

      CHX chase assay to determine the stability of DBD and DBD+. (A) Knock-down of endogenous EWSR1::FLI1 detected with FLI1 ab and rescue with DBD and DBD+ detected with FLAG ab. (B) CHX chase assay to determine the stability of DBD and DBD+ in A-673 cells with quantification of the protein levels (n=3). Error bars represent standard deviation. The half-lives (t1/2) of DBD and DBD+ were listed in the table.
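
      For readers unfamiliar with how a half-life is commonly derived from a chase time course, a minimal sketch follows. It assumes first-order (exponential) decay and fits a straight line to log-transformed protein levels; the response does not state which method was used for the table, and the time points and levels below are hypothetical.

```python
import math


def half_life(times_h, levels):
    """Half-life from a log-linear least-squares fit, assuming first-order decay."""
    logs = [math.log(v) for v in levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    decay_rate = -sxy / sxx  # positive when levels fall over time
    return math.log(2) / decay_rate


# Hypothetical chase data: hours after cycloheximide, protein level relative to t = 0.
times_h = [0.0, 2.0, 4.0, 6.0]
levels = [1.0, 0.5, 0.25, 0.125]

t_half = half_life(times_h, levels)
print(f"estimated half-life: {t_half:.2f} h")
```

      Levels near the limit of detection, as noted for DBD at later time points, would make the log-transformed values noisy and the fitted half-life unreliable.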

      Suggestions:

      The Reviewing Editor and a referee have considered the revised version and the responses of the referees. While the additional data included in the new version has consolidated many conclusions of the study, the MicroC data in the new cell line are also heterogeneous and as the authors argue, this may be an inherent limitation of the technique. In this situation, the best would be for the authors to avoid drawing robust conclusions from this data and to acknowledge its current limitations.

      As discussed above, we have changed the language regarding our conclusions from micro-C data to soften the conclusions we draw per the Editor’s suggestion.

      The referee and Reviewing Editor also felt that the arguments of the authors concerning a lack of firm conclusions on the stability of EWS-FLI1 under +/-DBD conditions could be better addressed. We would urge the authors to perform a cycloheximide chase type assay to assess protein half-life. These types of experiments are relatively simple to perform and should address this issue in a satisfactory manner.

      As discussed above, we do not feel that differences in protein stability would affect the results here because the assays performed required similar levels of protein at equilibrium. Our additional analyses in this response show that there are not significant differences between DBD and DBD+ levels in samples that pass quality control and are used in published studies. However, we attempted to address the reviewer and editor comments with a cycloheximide chase assay and were unable to get samples that would have passed our internal quality control standards. These data may suggest differences in protein stability, but it is unclear that these conditions accurately reflect the conditions of the published experiments, or that this would matter with equal protein levels at equilibrium.

    1. eLife Assessment

      This fundamental manuscript describes how the posterolateral cortical amygdala (plCoA) generates appetitive or aversive behaviors in response to odors. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      Previously described weaknesses in the study's methods and interpretation were fully addressed during revision. Locomotor behavior of the mice during head-fixed imaging experiments was added and analysis of the correlation of locomotion with neural activity was also added.

      This work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - The manuscript is well constructed, containing lots of data sets and clearly presented, in spite of the abundance of experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and post-hoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthens the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      The weaknesses noted in the primary review have all been addressed adequately.

    4. Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. Moreover, the intermingling of neurons in the plCoA is consistent with prior observations. The presence of a gradient rather than a distinct separation of the cells fits the model being proposed. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single-cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      The weaknesses noted in primary review have all been addressed adequately.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      There are some weaknesses in the study's methods and interpretation. The lack of clarity regarding the behavior of the mice during head-fixed imaging experiments raises the possibility that restricted behavior could explain the absence of valence encoding at the population level.

      We agree with the idea that head-fixation may alter the state of the animal and the neural encoding of odor. To address this, we have provided further analysis of walking behavior during the imaging sessions, which is provided in Figure S2. Overall, we could not identify any clear patterns in locomotor behavior that are odor-specific. Moreover, when neural activity was sorted depending on the behavioral state (walking, pausing or fleeing) we didn’t observe any apparent patterns in odor-evoked neural activity. This is now discussed in the Results and Limitations sections of the manuscript.

      Furthermore, while the authors employ chemogenetic inhibition of specific pathways, the rationale for this choice over optogenetic inhibition is not fully addressed, and this could potentially affect the interpretation of the results.

      The rationale was logistical. First, inhibition over a timescale of minutes is problematic because of heat generation during prolonged optical stimulation. Second, our behavioral apparatus has a narrow height between the ceiling and floor, making tethering difficult. This is now explained in the Results section. The trade-off of using chemogenetics is that we are silencing neurons and not specific projections. However, because we find that NAc- and MeA-projecting neurons have little shared collateralization, we believe the conclusion of divergent pathways still stands. This is now discussed in the Limitations section.

      Additionally, the choice of the mplCoA for manipulation, rather than the more directly implicated anterior and posterior subregions, is not well-explained, which could undermine the conclusions drawn about the topographic organization of plCoA.

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have now clarified this in the text.

      Despite these concerns, the work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - For a first submission the manuscript is well constructed, containing lots of data sets and clearly presented, in spite of the abundance of experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and posthoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthen the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      - Interpretation of calcium imaging data is very limited and requires additional analysis, and behavioral responses specific to odors should be considered. If there are neural responses, behavioral epochs and responses to those neuronal responses should be displayed and analyzed.

      We have now considered this, see response above.

      - The effect of odor habituation is not considered.

      We considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Optogenetic data in the two subregions relies on very careful viral spread and fiber placement. The current anatomy results provided should be clear about the spread of virus in A-P, and D-V axis, providing coordinates for this, to ensure readers the specificity of each sub-zone is real.

      We were careful to exclude animals for improper targeting. The spread of virus is detailed in Figures S3, S8 & S9.

      - The choice of behavioral assays across the two regions doesn't seem balanced and would benefit from more congruency.

      The 4-quadrant assay was chosen because this study builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. Moreover, the approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. We did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways.

      - Rationale for some of the choices of photo-stimulation experiment parameters isn't well defined.

      The parameters for photo-stimulation were based on those used in our past work (Root et al., 2014). We used a gradient of frequency from 1-10 Hz based on the idea that odor likely exists in a gradient and this was meant to mimic a potential gradient, though we don’t know if it exists. The range in stimulation frequencies appears to align with the actual rate of firing of plCoA neurons (Iurilli et al., 2017).

      Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      There are a couple of inconsistencies that may be addressed by additional experiments and careful interpretation of the data.

      Stimulating aplCoA or slc17a6 neurons results in spatial avoidance, and stimulating pplCoA or slc17a7 neurons drives approach behaviors. On the other hand, the authors and others in the field also show that there is no apparent spatial bias in odor-driven responses associated with odor valence. This discrepancy may be addressed better. A possibility is that odor-evoked responses are recorded from populations outside of those defined by slc17a6/a7. This may be addressed by marking activated cells and identifying their molecular markers. A second possibility is that optogenetic stimulation activates a broad set of neurons and does not recapitulate the sparseness of odor responses. It is not known whether sparse activation by optogenetic stimulation can still drive approach or avoidance behaviors.

      We agree that marking specific genetic or projection-defined neurons could help to clarify whether some neurons have more selective valence responses. However, we are not able to perform these experiments at the moment. We have included new data demonstrating that sparser optogenetic activation evokes behaviors similar in magnitude to those evoked by the broader activation (see Figure S4).

      The authors show that inhibiting slc17a7 neurons blocks approaching behaviors toward 2-PE. Consistent with this result, inhibiting NAc projection neurons also inhibits approach responses. However, inhibiting aplCoA or slc17a6 neurons does not reduce aversive response to TMT, but blocking MEA projection neurons does. The latter two pieces of evidence are not consistent with each other. One possibility is that the MEA projecting neurons may not be expressing slc17a6. It is not clear from the retrograde labeling experiments what percentage of MEA- and NAc-projecting neurons express slc17a6 and slc17a7. It is possible that neurons expressing neither VGluT1 nor VGluT2 could drive aversive or appetitive responses. This possibility may also explain why silencing slc17a6 neurons does not block avoidance.

      We have now performed RNAscope staining on retrograde tracing to better define this relationship. Although the VGluT2 and VGluT1 neurons have biased projections to the MeA and NAc, respectively, there is some nuance detailed in Figure S10. Generally, MeA-projecting neurons are predominantly VGluT2+, whereas about 20% of NAc-projecting neurons express both. Some (less than 35%) retrogradely labeled neurons were not detected as VGluT1 or VGluT2 positive, suggesting that other populations could also contribute. We agree that the discrepancy between MeA-projection and VGluT2 silencing is likely due to incomplete targeting of the MeA-projecting population with the VGluT2-cre line. This is included in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main:

      (1) For the head-fixed imaging experiments, what is the behavior of the mice during odor exposure? Could the weak reliability of individual neurons be due to a lack of approach or avoidance behavior? Could restricted behavior also explain the lack of valence encoding at the population level?

      We agree that this is a limitation of head-fixed recordings. In the revised manuscript we did attempt to characterize their behavioral response, and look for correlations in odor representation. Although we did find different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). For example, one might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors as all odors evoked a mixture of responses (Figure S2A-D, text lines 208-232). We then examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any apparent patterns in odor responses (Figure S2E,F). Lastly, we acknowledge in the text that the lack of valence encoding may be an artifact of head-fixation (see lines 849-857).

      (2) For the optogenetic manipulations of Vglut1 and Vglut2 neurons, why was the injection and fiber targeted to the medial portion of the plCoA, if the hypothesis was that these glutamatergic neuron populations in different regions (anterior or posterior) are responsible for approach and avoidance? 

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have clarified this in the text (Lines 417-419).

      Could this explain the lack of necessity with the DREADD experiments? 

      For the loss of function experiments, a larger volume of virus was injected to cover a larger area and we did confirm targeting of the appropriate areas. Though, it is always possible that the lack of necessity is due to incomplete silencing.

      Further, why was an optogenetic inhibition approach not utilized? 

      Although optogenetic inhibition could have plausibly been used instead, we chose chemogenetic inhibition for two reasons: First, for minutes-long periods of inhibition, optical illumination poses the risk of introducing heat-related effects (Owen et al., 2019). In fact, we first tried optical inhibition but controls exhibited unusually large variance. Second, it is more feasible in our assay as it has a narrow height between the floor and lid that complicates tethering to an optic fiber. Past experiments overcame this with a motorized fiber retraction system (Root et al., 2014), but this is highly variable with user-dependent effects, so we found chemogenetics to be a more practical strategy. We have added a sentence to explain the rationale (see lines 561-563).

      (3) The specific subregion of the nucleus accumbens that was targeted should be named, as distinct parts of the nucleus accumbens can have very different functions. 

      We attempted to define specific subregions of the nucleus accumbens and found that plCoA projection is not specific to the shell or core, anterior or posterior; rather, it broadly innervates the entire structure. We have added a note about this in the manuscript (see lines 470-471). Given that we did not find notable subregion-specific outputs within the NAc, targeting was directed to the middle region of NAc, with coordinates stated in the methods.

      (4) Why was an intersectional DREADD approach used to inhibit the projection pathways, as opposed to optogenetic inhibition? The DREADD approach could potentially affect all projection targets, and the authors might want to address how this could influence the interpretation of the results.

      This is partly addressed above in point 2. As for interpretation, we acknowledge that the intersectional approach silences the neurons projecting to a given target and not the specific projection, and we have been careful with the wording. Although this may complicate the conclusion, we did map the collaterals for NAc- and MeA-projecting neurons and find that neurons do not appreciably project to both targets and have minimal projections to other targets. We have now taken care to state that we silence the neurons projecting to a structure, not silencing the projection, and we acknowledge this caveat. However, since the MeA- and NAc-projecting neurons appear to be distinct from each other (largely not collateralizing to each other), the conclusion that these divergent pathways are required still stands. We have added discussion of this in the Limitations section (see lines 859-863).

      Minor:

      (1) Line 402 needs a reference.

      We have added the missing reference (now line 441).

      (2) The Supplemental Figure labeling in the main text should be checked carefully.

      Thank you for pointing this out. We have fixed the prior errors.

      (3) Panel letter D is missing from Figure 2.

      This has been fixed.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns, additional experiments:

      - In the calcium imaging experiments mice were presented with the same odor many times. Overall responses to odor presentations were quite variable and appear to habituate dramatically (Figure S1F). The general conclusion from these experiments is a lack of consistent valence-specific responses of individual neurons, but I wonder if this conclusion is slightly premature. A few potential explanatory factors that may need additional attention are: First, despite recording video of the mouse's face during experiments, no behavioral response to any odor is described. Is it possible these odors when presented in head-fixed conditions do not have the same valence?

      Yes, we agree that this is a possibility. We have added a discussion in the Limitations section (see lines 849-857). We have also added additional behavioral analysis discussed below.

      On trials with neural responses are there behavioral responses that could be quantified? 

      We have now added data in which we attempt to characterize their behavioral response, to look for correlations in odor representation (see lines 208-228). Although we did observe different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). One might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors (Figure S2A-D). Next, we examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any meaningful differences in odor responses (Figure S2E,F). Lastly, we acknowledge that the odor representation may be different in freely moving animals that exhibit dynamic responses to odor (see lines 849-857).

      - Habituation seems to play a prominent role in the neural signals, is there a larger contribution of valence if you look only at the first delivery (or some subset of the 20 presentations) of an odor type for a given trial? 

      Indeed, we considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Is it reasonable to exclude valence encoding as a possibility when largely neurons were unresponsive to the positive valence odors (2PE and peanut) chosen when looking at the average cluster response (Figure 1F)? 

      It is true that we see fewer neurons responding to the appetitive odors (Figure 1H) and smaller average responses within the cluster, but some neurons do respond robustly. If these were valence responses, we would predict that neural responses should be similarly selective, but we do not observe any such selectivity. The sparseness of responses to appetitive odors does cause the average cluster analysis (Figure 1F) to show muted responses to these odors, consistent with the decreased responsivity to appetitive odors. Moreover, single neuron response analysis reveals that a given neuron is not more likely to respond to appetitive or aversive odors with any selectivity greater than chance. For these reasons, we think it is reasonable to conclude an absence of valence responses, which is consistent with the conclusion from another report (Iurilli et al., 2017).

      - While the preference and aversion assay with 4 corners is an interesting set-up and provides a lot of data for this particular manuscript. It would be helpful to test additional behaviors to determine whether these circuits are more conserved. As it stands the current manuscript relies on very broad claims using a single behavioral readout. Some attempts to use head-fixed approaches with more defined odor delivery timelines and/or additional valenced behavioral readouts is warranted.

      We appreciate the suggestion, but are not able to perform these experiments at the moment. The 4-quadrant assay was chosen because it builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. The approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. Moreover, we did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways. Lastly, we have tried to define parameters for head-fixed behavior that would permit correlation of neural responses with behavior, including longer stimulations and closed-loop locomotion control of odor concentration, but were unsuccessful at establishing parameters that generated reliable behavioral responses. We acknowledge that one limitation of the study is the limited behavioral tests with two odors and whether the circuits are more broadly necessary for other odors.

      Minor comments:

      • Please define PID in the Results when it is first introduced.

      Done (see line 154)

      • Line 412 Figure S5C-N should be Figure S6C-N.

      Fixed. Now Figure S8C-N due to additional figures (see line 451).

      • Throughout the Discussion it would be helpful if the authors referred to specific Figure panels that support their statements (e.g. lines 654-656 "[...] which is supported by other findings presented here showing that both VGluT2+ and VGluT1+ neurons project to MeA, while the projection to NAc is almost entirely composed of VGluT1+ neurons").

      Thank you for the suggestion. We have added figure references in the discussion.

      • Line 778 "producing" should be "produce".

      Corrected (see line 840)

      • The figures are very busy, especially all the manipulations. The authors are commended for including each data point, but they might consider a more subtle design (translucent lines only for each animal, and one mean dot for the SEM), just to reduce the overall clutter of an already overwhelming figure set. But this is ultimately left to the authors to resolve and style to their liking. 

      Thank you for the suggestion. We have tried some different styles but like the original best.

      Reviewer #3 (Recommendations for the authors):

      If within reach, I suggest that the authors determine the percentage of neurons retrogradely labeled from NAc or MEA that express VGluT1 and VGluT2.

      We have done this for the middle region of plCoA, which has the greatest mixture of cell types (see Figure S10, lines 504-517). We find that the MeA-projecting neurons are mostly VGluT2+, with a minority that express both VGluT1 and VGluT2. NAc-projecting neurons are primarily VGluT1+, with about 20% expressing VGluT2 as well.

      It would also be nice to sparsely label aplCoA and pplCoA using ChR2 to see if sparse activation drives approach or avoidance.

      We agree that it would be useful to vary the sparseness of the ChR2 expression, to see if it produces similar results. We examined this using sparsely labeled odor ensembles, as previously done (Root et al., 2014). Briefly, we used the Arc-CreER mouse to label TMT-responsive neurons with a cre-dependent ChR2 AAV vector targeted to the anterior or posterior regions, while previously we had broadly targeted the entirety of plCoA. We had established that this labeling method captures about half of the active cells detected by Arc expression, which is on the order of hundreds of neurons rather than the thousands labeled by broad cre-independent expression. Remarkably, we get effects similar in magnitude that are not significantly different from those with broader activation of the anterior or posterior domains (see new Figure S4, lines 267-288). It still remains possible that there is a threshold number of neurons that are necessary to elicit behavior, but that is beyond the scope of the current study. However, these data indicate that the effect of activating anterior and posterior domains is not an artifact of broad stimulation.

    1. eLife Assessment

      This is an important study with direct implications for the rational selection of antimalarial drug combinations. The authors present data demonstrating antagonism between 4-aminoquinoline antimalarials and peroxide drugs under physiologically relevant conditions, including robust effects at the trophozoite stage and for chloroquine at the ring stage. While the conclusions are based on in vitro assays and further work will be needed to fully resolve the underlying mechanism, the findings are convincing and provide a strong rationale for evaluating drug combinations in relevant preclinical models prior to clinical testing.

    2. Reviewer #1 (Public review):

      Summary:

      This study set out to investigate potential pharmacological drug-drug interactions between the two most common antimalarial classes, the artemisinins and quinolines. There is strong rationale for this aim, because drugs from these classes are already widely-used in Artemisinin Combination Therapies (ACTs) in the clinic, and drug combinations are an important consideration in the development of new medicines. Furthermore, whilst there is ample literature proposing many diverse mechanisms of action and resistance for the artemisinins and quinolines, it is generally accepted that the mechanisms for both classes involve heme metabolism in the parasite, and that artemisinin activity is dependent on activation by reduced heme. The study was designed to measure drug-drug interactions associated with a short pulse exposure (4 h) that is reminiscent of the short duration of artemisinin exposure obtained after in vivo dosing. Clear antagonism was observed between dihydroartemisinin (DHA) and chloroquine, which became even more extensive in chloroquine-resistant parasites. Antagonism was also observed in this assay for the more clinically-relevant ACT partner drugs piperaquine and amodiaquine, but not for other ACT partners mefloquine and lumefantrine, which don't share the 4-aminoquinoline structure or mode of action. Interestingly, chloroquine induced an artemisinin resistance phenotype in the standard in vitro Ring-stage Survival Assay, whereas this effect was not as extensive for piperaquine.

      The authors also utilised a heme-reactive probe to demonstrate that the 4-aminoquinolines can inhibit heme-mediated activation of the probe within parasites, which suggests that the mechanism of antagonism involves the inactivation of heme, rendering it unable to activate the artemisinins. Measurement of protein ubiquitination showed reduced DHA-induced protein damage in the presence of chloroquine, which is also consistent with decreased heme-mediated activation, and/or with decreased DHA activity more generally.

      Overall, the study clearly demonstrates a mechanistic antagonism between DHA and 4-aminoquinoline antimalarials in vitro. It is interesting that this combination is successfully used to treat millions of malaria cases every year, which may raise questions about the clinical relevance of this finding. However, the conclusions in this paper are supported by multiple lines of evidence and the data is clearly and transparently presented, leaving no doubt that DHA activity is compromised by the presence of chloroquine in vitro. It is perhaps fortunate that the clinical dosing regimens of 4-aminoquinoline-based ACTs have been sufficient to maintain clinical efficacy despite the non-optimal combination. Nevertheless, optimisation of antimalarial combinations and dosing regimens is becoming more important in the current era of increasing resistance to artemisinins and 4-aminoquinolines. Therefore, these findings should be considered when proposing new treatment regimens (including Triple-ACTs) and the assays described in this study should be performed on new drug combinations that are proposed for new or existing antimalarial medicines.

      Strengths:

      This manuscript is clearly written and the data presented is clear and complete. The key conclusions are supported by multiple lines of evidence, and most findings are replicated with multiple drugs within a class, and across multiple parasite strains, thus providing more confidence in the generalisability of these findings across the 4-aminoquinoline and peroxide drug classes.

      A key strength of this study was the focus on short pulse exposures to DHA (4 h in trophs and 3 h in rings), which is relevant to the in vivo exposure of artemisinins. Artemisinin resistance has had a significant impact on treatment outcomes in South-East Asia, and is now emerging in Africa, but is not detected using a 'standard' 48 or 72 h in vitro growth inhibition assay. It is only in the RSA (a short pulse of 3-6 h treatment of early ring stage parasites) that the resistance phenotype can be detected in vitro. Therefore, assays based on this short pulse exposure provide the most relevant approach to determine whether drug-drug interactions are likely to have a clinically-relevant impact on DHA activity. These assays clearly showed antagonism between DHA and 4-aminoquinolines (chloroquine, piperaquine, amodiaquine and ferroquine) in trophozoite stages. Interestingly, whilst chloroquine clearly induced an artemisinin-resistant phenotype in the RSA, piperaquine only had a minor impact on the early ring stage activity of DHA, which may be fortunate considering that piperaquine is a currently recommended DHA partner drug in ACTs, whereas chloroquine is not.

      The evaluation of additional drug combinations at the end of this paper is a valuable addition, which increases the potential impact of this work. The finding of antagonism between piperaquine and OZ439 in trophozoites is consistent with the general interactions observed between peroxides and 4-aminoquinolines, and it may be interesting to see whether piperaquine impacts the ring-stage activity of OZ439.

      The evaluation of reactive heme in parasites using a fluorescent sensor, combined with the measurement of K48-linked ubiquitin, further supports the findings of this study, providing independent read-outs for the chloroquine-induced antagonism.

      The in-depth discussion of the interpretation and implications of the results is an additional strength of this manuscript. Whilst the discussion section is rather lengthy, there are important caveats to the interpretation of some of these results, and clear relevance to the future management of malaria that require these detailed explanations.

      Overall, this is a high quality manuscript describing an important study that has implications for the selection of antimalarial combinations for new and existing malaria medicines.

      Weaknesses:

      This study is an in vitro study of parasite cultures, and therefore caution should be taken when applying these findings to decisions about clinical combinations. The drug concentrations and exposure durations in these assays are intended to represent clinically relevant exposures, although it is recognised that the in vitro system is somewhat simplified and there may be additional factors that influence in vivo activity. This limitation is reasonably well acknowledged in the manuscript.

      It is also important to recognise that the majority of the key findings regarding antagonism are based on trophozoite-stage parasites, and one must show caution when generalising these findings to other stages or scenarios. For example, piperaquine showed clear antagonism in trophozoite stages, but minimal impact in ring stages under these assay conditions.

      A key limitation is the interpretation of the mechanistic studies that implicate heme-mediated artemisinin activation as the mechanism underpinning antagonism by chloroquine. This study did not directly measure the activation of artemisinins. The data obtained from the activation of the fluorescent probe are generally supportive of chloroquine suppressing the heme-mediated activation of artemisinins, and I think this is the most likely explanation, but there are significant caveats to consider. Primarily, the inconsistency between the fluorescence profile in the chemical reactions and the cell-based assay raises questions about the accuracy of this readout. In the chemical reaction, mefloquine and chloroquine showed identical inhibition of fluorescence, whereas piperaquine had minimal impact. On the contrary, in the cell, chloroquine and piperaquine had similar impacts on fluorescence, but mefloquine had minimal impact. This inconsistency indicates that the cellular fluorescence based on this sensor does not give a simple direct readout of the reactivity of ferrous heme, and therefore, these results should be interpreted with caution. Indeed, the correlation between fluorescence and antagonism for the tested drugs is a correlation, not causation. There could be several reasons for the disconnect between the chemical and biological results, either via additional mechanisms that quench fluorescence, or the presence of biomolecules that alter the oxidation state or coordination chemistry of heme or other potential catalysts of this sensor. It is possible that another factor that influences the H-FluNox fluorescence in cells also influences the DHA activity in cells, leading to the correlation with activity. It should be noted that H-FluNox is not a chemical analogue of artemisinins. Its activation relies on Fenton-like chemistry, but with an N-O rather than an O-O bond, and it possesses very different steric and electronic substituents around the reactive centre, which are known to alter reactivity to different iron sources. Despite these limitations, the authors have provided reasonable justification for the use of this probe to directly visualise heme reactivity in cells, and the results are still informative.

      Another interesting finding that was not elaborated by the authors is the impact of chloroquine in the DHA dose-response curves from the ring stage assays. Detection of artemisinin resistance in the RSA generally focuses on the % survival at high DHA concentrations (700 nM) as there is minimal shift in the IC50 (see Fig 2), however, chloroquine clearly induces a shift in the IC50 (~5-fold), where the whole curve is shifted to the right, whereas the increase in % survival is relatively small. This different profile suggests that the mechanism of chloroquine-induced antagonism may be different to the mechanism of artemisinin resistance. Current evidence regarding the mechanism of artemisinin resistance generally points towards decreased heme-mediated drug activation due to a decrease in hemoglobin uptake, which should be analogous to the decrease in heme-mediated drug activation caused by chloroquine. However, these different dose response curves suggest different mechanisms are primarily responsible. Additional mechanisms have been proposed for artemisinin resistance, involving redox or heat stress responses, proteostatic responses, mitochondrial function, dormancy and PI3K signalling among others. Whilst the H-FluNox probe generally supports the idea that chloroquine suppresses heme-mediated DHA activation, it remains plausible that chloroquine could induce these, or other, cellular responses that suppress DHA activity.

      Impact:

      This study has important implications for the selection of drugs to form combinations for the treatment of malaria. The overall findings of antagonism between peroxide antimalarials and 4-aminoquinolines in the trophozoite stage are robust, and this carries across to the ring stage for chloroquine.

      The manuscript also provides a plausible mechanism to explain the antagonism, although future work will be required to further explore the details of this mechanism and to rule out alternative factors that may contribute.

      Overall, this is an important contribution to the field and provides a clear justification for the evaluation of potential drug combinations in relevant in vitro assays before clinical testing.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Rosenthal and Goldberg investigates interactions between artemisinins and their quinoline partner drugs currently used for treating uncomplicated Plasmodium falciparum malaria. The authors show that chloroquine (CQ), piperaquine, and amodiaquine antagonize dihydroartemisinin (DHA) activity, and in CQ-resistant parasites, the interaction is described as "superantagonism," linked to the pfcrt genotype. Mechanistically, application of the heme-reactive probe H-FluNox indicates that quinolines render cytosolic heme chemically inert, thereby reducing peroxide activation. The work is further extended to triple ACTs and ozonide-quinoline combinations, with implications for artemisinin-based combination therapy (ACT) design, including triple ACTs.

      Strengths:

      The manuscript is clearly written, methodologically careful, and addresses a clinically relevant question. The pulsing assay format more accurately models in vivo artemisinin exposure than conventional 72-hour assays, and the use of H-FluNox and Ac-H-FluNox probes provides mechanistic depth by distinguishing chemically active versus inert heme. These elements represent important refinements beyond prior studies, adding nuance to our understanding of artemisinin-quinoline interactions.

      Weaknesses:

      Several points warrant consideration. The novelty of the work is somewhat incremental, as antagonism between artemisinins and quinolines is well established. Multiple prior studies using standard fixed-ratio isobologram assays have shown that DHA exhibits indifferent or antagonistic interactions with chloroquine, piperaquine, and amodiaquine (e.g., Davis et al., 2006; Fivelman et al., 2007; Muangnoicharoen et al., 2009), with recent work highlighting the role of parasite genetic background, including pfcrt and pfmdr1, in modulating these interactions (Eastman et al., 2016). High-throughput drug screens likewise identify quinoline-artemisinin combinations as mostly antagonistic. The present manuscript adds refinement by applying pulsed-exposure assays and heme probes rather than establishing antagonism de novo.

      The dataset focuses on several parasite lines assayed in vitro, so claims about broad clinical implications should be tempered, and the discussion could more clearly address how in vitro antagonism may or may not translate to clinical outcomes. The conclusion that artemisinins are predominantly activated in the cytoplasm is intriguing but relies heavily on Ac-H-FluNox data, which may have limitations in accessing the digestive vacuole and should be acknowledged explicitly. The term "superantagonism" is striking but may appear rhetorical; clarifying its reproducibility across replicates and providing a mechanistic definition would strengthen the framing. Finally, some discussion points, such as questioning the clinical utility of DHA-PPQ, should be moderated to better align conclusions with the presented data while acknowledging the complexity of in vivo pharmacology and clinical outcomes.

      Despite these mild reservations, the data are interesting and of high quality and provide important new information for the field.

      Editor's Review of the Revision: The authors have provided a well-reasoned rebuttal to the comments of the three reviewers. Most of the changes were incorporated in their revised Discussion. Their data with the active heme probe H-FluNox are novel and the authors reveal interesting interactions between peroxide and 4-aminoquinoline-based antimalarials that open new avenues of research especially when considering antimalarial combinations that combine these chemical scaffolds. This study will be of broad interest to investigators studying and developing antimalarial drugs and combinations and the impact of Plasmodium falciparum resistance mechanisms. A minor recommendation would be that the authors state H-FluNox when referring to their small molecule probe in the abstract, so that it is captured in PubMed searches.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present an in vitro evaluation of drug-drug interactions between artemisinins and quinoline antimalarials, as an important aspect for screening the current artemisinin-based combination therapies for Plasmodium falciparum. Using a revised pulsing assay, they report antagonism between dihydroartemisinin (DHA) and several quinolines, including chloroquine, piperaquine (PPQ), and amodiaquine. This antagonism is increased in CQ-resistant strains in isobologram analyses. Moreover, CQ co-treatment was found to induce artemisinin resistance even in parasites lacking K13 mutations during the ring-stage survival assay. This implies that drug-drug interactions, not just genetic mutations, can influence resistance phenotypes. By using a chemical probe for reactive heme, the authors demonstrate that quinolines inhibit artemisinin activation by rendering cytosolic heme chemically inert, thereby impairing the cytotoxic effects of DHA. The study also observed negative interactions in triple-drug regimens (e.g., DHA-PPQ-Mefloquine) and in combinations involving OZ439, a next-generation peroxide antimalarial. Taken together, these findings raise significant concerns regarding the compatibility of artemisinin and quinoline combinations, which may promote resistance or reduce efficacy.

      With the additive profile as the comparison and a lack of synergistic effect in any of the comparisons, it is hard to contextualize the observed antagonism. Including a known synergistic pair (e.g., artemisinin + lumefantrine) would have provided a useful benchmark to assess the relative impact of the drug interactions described.

      Strengths:

      This study demonstrates the following strengths:

      • The use of a pulsed in vitro assay that is more physiologically relevant than the traditional 48 h or 72 h assays.

      • Small-molecule probes, H-FluNox and Ac-H-FluNox, to detect reactive cytosolic heme, demonstrating that quinolines render heme inert and thereby block DHA activation.

      • Evaluates not only traditional combinations but also triple-drug combinations and next-generation artemisinins like OZ439. This broad scope increases the study's relevance to current treatment strategies and future drug development.

      • By using the K13 wild-type parasites, the study suggests that resistance phenotypes can emerge from drug-drug interactions alone, without requiring genetic resistance markers.

      Weaknesses:

      • The study would benefit from a future characterization of the molecular basis for the observed heme inactivation by quinolines to support this hypothesis - while the probe experiments are valuable, they do not fully elucidate how quinolines specifically alter heme chemistry at the molecular level.

      • Suggestion of alternative combinations that show synergy could have improved the significance of the work. The in vitro study did not include pharmacokinetic/pharmacodynamic modeling; hence, it leaves questions about how the observed antagonism would manifest under real-world dosing conditions, necessitating future work based on these findings.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      We appreciate the positive assessment. We recognize that since all of the work in this manuscript was done in vitro, there are reasonable concerns about the translatability of these data to clinical settings. These results should not directly inform malaria policy, but we hope that these data bring new considerations to the approach for choosing strategic antimalarial combinations. We have modified the manuscript to clarify this distinction.

      Public Reviews

      Reviewer #1 (Public Review):

      We thank the reviewer for their thoughtful summary of this manuscript. It is important to note that DHA-PPQ did show antagonism in RSAs. In this modified RSA, 200 nM PPQ alone inhibited growth of PPQ-sensitive parasites by approximately 20%. If DHA and PPQ were additive, then we would expect that addition of 200 nM PPQ would shift the DHA dose response curve to the left and result in a lower DHA IC50. Please refer to Figure 4a and b as examples of additive relationships in dose-response assays. We observed no significant shift in IC50 values between DHA alone and DHA + PPQ. This suggests antagonism, albeit not to the extent seen with CQ. We have modified the manuscript to emphasize this point. As the reviewer pointed out, it is fortunate that despite being antagonistic, clinically used artemisinin-4-aminoquinoline combinations are effective, provided that parasites are sensitive to the 4-aminoquinoline. It is possible that superantagonism is required to observe a noticeable effect on treatment efficacy (Sutherland et al. 2003 and Kofoed et al. 2003), but that classical antagonism may still have silent consequences. For example, if PPQ blocks some DHA activation, this might result in DHA-PPQ acting more like a pseudo-monotherapy. However, as the reviewer pointed out, while our data suggest that DHA-PPQ and AS-ADQ are “non-optimal” combinations, the clinical consequences of these interactions are unclear. We have modified the manuscript to emphasize the latter point.

      While the Ac-H-FluNox and ubiquitin data point to a likely mechanism for DHA-quinoline antagonism, we agree that there are other possible mechanisms to explain this interaction. We have addressed this limitation in the discussion section. Though we tried to measure DHA activation in parasites directly, these attempts were unsuccessful. We acknowledge that the chemistry of DHA and Ac-H-FluNox activation is not identical and that caution should be taken when interpreting these data. Nevertheless, we believe that Ac-H-FluNox is the best currently available tool to measure “active heme” in live parasites and is the best available proxy to assess DHA activation in live parasites. These points are now addressed in the discussion section. Both in vitro and in-parasite studies point to a role for CQ in modulating heme, though an exact mechanism will require further examination. Similar to the reviewer, we were perplexed by the differences observed between in vitro and in-parasite assays with PPQ and MFQ. We proposed possible hypotheses to explain these discrepancies in the discussion section. Interestingly, our data correlate well with hemozoin inhibition assays in which all three antimalarials inhibit hemozoin formation in solution, but only CQ and PPQ inhibit hemozoin formation in parasites. In both assays, in-parasite experiments are likely to be more informative for mechanistic assessment.

      It remains unclear why K13 genotype influences RSA values, but not early ring DHA IC50 values. In K13<sup>WT</sup> parasites, both RSA values and DHA IC50 values were increased 3-5 fold upon addition of CQ. This suggests that CQ-mediated resistance is more robust than that conferred by K13 genotype. However, this does not necessarily suggest a different resistance mechanism. We acknowledge that in addition to modulating heme, it is possible that CQ may enhance DHA survival by promoting parasite stress responses. Future studies will be needed to test this alternative hypothesis. This limitation has been acknowledged in the manuscript. We have also addressed the reviewer’s point that other factors, including poor pharmacokinetic exposure, contributed to OZ439-PPQ treatment failure.

      Reviewer #2 (Public Review):

      We appreciate the positive feedback. We agree that there have been previous studies, many of which we cited, assessing interactions of these antimalarials. We also acknowledge that previous work, including our own, has shown that parasite genetics can alter drug-drug interactions. We have added the reviewer’s recommended citations to the list of references that we cited. Importantly, our work was unique not only for utilizing a pulsing format, but also for revealing a superantagonistic phenotype, assessing interactions in an RSA format, and investigating a mechanism to explain these interactions. We agree with the reviewer that implications drawn from this in vitro work should be cautious, but hope that this work contributes another dimension to critical thinking about drug-drug interactions for future combination therapies. We have modified the manuscript to temper any unintended recommendations or implications.

      The reviewer notes that we conclude “artemisinins are predominantly activated in the cytoplasm”. We recognize that the site of artemisinin activation is contentious. We were very clear to state that our data combined with others suggest that artemisinins can be activated in the parasite cytoplasm. We did not state that this is the primary site of activation. We were clear to point out that technical limitations may prevent Ac-H-FluNox signal in the digestive vacuole, but determined that low pH alone could not explain the absence of a digestive vacuole signal.

      With regard to the “reproducibility” and “mechanistic definition” of superantagonism, we observed what we defined as a one-sided superantagonistic relationship for three different parasites (Dd2, Dd2 PfCRT<sup>Dd2</sup>, and Dd2 K13<sup>R539T</sup>) for a total of nine independent replicates. In the text, we define that these isoboles are unique in that they had mean ΣFIC50 values > 2.4 and peak ΣFIC50 values >4 with points extending upward instead of curving back to the axis. As further evidence of the reproducibility of this relationship, we show that CQ has a significant rescuing effect on parasite survival to DHA as assessed by RSAs and IC50 values in early rings.
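      For readers less familiar with the metric, the criteria quoted above can be sketched in code. This is a minimal illustration, assuming the standard definition of ΣFIC50 (the IC50 of each drug in the mixture divided by its IC50 alone, summed over both drugs). The function names, the `antagonism_cutoff` parameter, and any input values are hypothetical and are not taken from the manuscript. Only the superantagonism thresholds (mean ΣFIC50 > 2.4 and peak ΣFIC50 > 4) come from the response above. The isobologram shape that the authors also considered is not modeled here.

```python
from statistics import mean


def sum_fic50(ic50_a_combo, ic50_a_alone, ic50_b_combo, ic50_b_alone):
    """Return the sum of fractional IC50s for one fixed-ratio mixture.

    Assumes the standard definition: each drug's IC50 in combination
    divided by its IC50 when used alone, summed over both drugs.
    """
    return ic50_a_combo / ic50_a_alone + ic50_b_combo / ic50_b_alone


def classify_isobole(sum_fics, antagonism_cutoff):
    """Label an isobole from its ΣFIC50 values (one per fixed ratio).

    The superantagonism criteria (mean > 2.4 and peak > 4) are those
    quoted in the response. `antagonism_cutoff` is a hypothetical
    parameter: studies differ on this value, so the caller must pick it.
    Isobologram shape is NOT evaluated by this sketch.
    """
    avg = mean(sum_fics)
    peak = max(sum_fics)
    if avg > 2.4 and peak > 4:
        return "superantagonistic"
    if avg > antagonism_cutoff:
        return "antagonistic"
    return "additive or synergistic"
```

      Because the cutoff is an explicit argument, the same set of ΣFIC50 values can be relabeled under different published conventions without changing the underlying data.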

      Reviewer #3 (Public Review):

      We thank the reviewer for their positive feedback. We acknowledge that no combinations tested in this manuscript were synergistic. However, two combinations, DHA-MFQ and DHA-LM, were additive, which provides context for interpreting antagonistic relationships. We have previously reported synergistic and additive isobolograms for peroxide-proteasome inhibitor combinations using this same pulsing format (Rosenthal and Ng 2021). These published results are now cited in the manuscript.

      We believe that these findings are specific to 4-aminoquinoline-peroxide combinations, and that these findings cannot be generalized to antimalarials with different mechanisms of action. Note that the aryl amino alcohols, MFQ and LM, were additive with DHA. Since the mechanisms of action of MFQ and LM are poorly understood, it is difficult to speculate on a mechanism underlying these interactions.

      We agree with the reviewer that while the heme probe may provide some mechanistic insight to explain DHA-quinoline interactions, there is much more to learn about CQ-heme chemistry, particularly within parasites.

      The focus of this manuscript was to add a new dimension to considerations about pairings for combination therapies. It is outside the scope of this manuscript to suggest alternative combinations. However, we agree that synergistic combinations would likely be more strategic clinically.

      An in vitro setup allows us to eliminate many confounding variables in order to directly assess the impact of partner drugs on DHA activity. However, we agree that in vivo conditions are considerably more complex, and we explicitly state this.

      We agree that in the future, modeling studies could provide insight into how antagonism may contribute to real-world efficacy. This is outside the scope of our studies.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      The key weaknesses identified in this manuscript are described in the 'weaknesses' section of the public review. The major one is the inconsistency around the H-FluNox response in the chemical vs biological experiments. I can't think of a simple experiment to resolve this issue, but it is good that this data is openly provided in the manuscript. I believe there could be more discussion to clarify this limitation with the current study, and the conclusions, and particularly the title, should be softened regarding the mechanism of antagonism being based on heme reactivity.

      We have softened the title and conclusions to take into account the limitations of our studies.

      (1) Please double-check the definitions for isobologram interpretation. In most antimicrobial interaction studies, I see the threshold for antagonism at sumFIC50 of 1.5, or even 2. 1.25 is often interpreted as additive in many studies.

      We acknowledge that different studies use various cutoff values. Our interpretations for additive versus antagonistic versus superantagonistic were based not only on mean ΣFIC50 values, but also isobologram shape. For example, the flat isoboles for MFQ-DHA were clearly distinct from the curved isoboles of PPQ-DHA. It is unclear what cutoff value(s) would be most clinically relevant.

      (2) For the MFQ-PPQ interaction study, please make it clear that these drugs have very long half-lives (weeks), so the 4 h pulse assay isn't really relevant to their overall activity. It probably shows a slower onset of action, but there is plenty of drug remaining for many days in the clinical scenario, so perhaps the data from the traditional 48h assay is more relevant. The same consideration applies to OZ439, which may impact the interpretation of that data.

      We have now included the half-lives of these compounds in the discussion section. Our intent was to use a pulsing format to make these isobolograms comparable with the other assays. It is important to note that pulses can reveal stronger phenotypes that might be missed with traditional methods. Thus, while 48 h assays may better mimic in vivo conditions, they could also mask important phenotypes.

      Reviewer #3 (Recommendations for the Authors):

      I have included most of my concerns in the public review. Below are some additional specific points for consideration:

      (1) It is expected to include a synergistic combination as a control (e.g., artemisinin + lumefantrine) to contextualize the degree of antagonism observed. The experimental design should show some synergistic profiles in comparison. Adding a few experiments by including a synergistic control is needed.

      Both MFQ-DHA and LM-DHA combinations were additive, which provides context for antagonistic combinations. This is now stated in the results section pertaining to Figure 1. We have also included a reference to our previous publication in which we demonstrated that proteasome inhibitor-peroxide combinations are synergistic to additive using this same pulsing format.

      (2) Consider in vivo validation or pharmacokinetic/pharmacodynamic modeling to strengthen the translational relevance of the findings when it comes to doses and the IC50 correlations.

      We agree that this would be useful to do in future, but it is outside the scope of the current study.

      (3) It would be beneficial to include a discussion section on how the findings are generalizable to different Plasmodium falciparum genotypes (3D7, Dd2, MRA-1284) and their relevance.

      Findings were consistent across three parasite backgrounds depending on PfCRT genotype. This point has been included in the discussion section. The background of these parasites is also provided in Table 1.

      (4) Potential evaluation criteria to understand where certain combinations should be reconsidered can be included as a suggestion for the wider audience.

      Our in vitro studies suggest that pulsing isobolograms would be a useful assay to include when evaluating combination therapies. While we believe that synergistic combinations would be more strategic than antagonistic combinations, we cannot provide evaluation criteria or make recommendations for reconsidering currently used combinations.

      (5) Further elaborate on the mechanistic basis of heme inactivation by quinolines. If data are available, please include more data on the specificity of the process.

      Despite our best efforts, we were unable to evaluate quinoline-heme interactions in parasites. Even in vitro, this interaction has remained elusive for decades. We agree that this would be an important future step towards supporting a specific mechanism for quinoline-DHA antagonism.

    1. eLife Assessment

      In this study, the authors identify EOLA1 as a novel mitochondrial protein required for mitochondrial translation and normal cardiac function. The characterization of the molecular role of EOLA1 is still incomplete, and additional controls will be necessary. Nevertheless, the identification of a novel factor critical for mitochondrial gene expression and oxidative phosphorylation will be useful for cell biologists working on mitochondrial dysfunction.

    2. Reviewer #1 (Public review):

      Summary:

      Mitochondria encode a small set of proteins that are made inside the organelle by specialized ribosomes. When this mitochondrial translation system fails, oxidative phosphorylation is impaired, an outcome that is particularly harmful to energy-demanding tissues such as the heart. In this manuscript, the authors use a targeted CRISPR/Cas9 screen in cultured cells grown on galactose (a condition that forces reliance on oxidative phosphorylation) to identify genes required for mitochondrial activity. They highlight EOLA1, previously studied mainly in inflammatory contexts, as a top candidate.

      Strengths:

      The authors present data suggesting that EOLA1 is imported into mitochondria via an N-terminal targeting sequence and resides in the mitochondrial matrix. Loss of EOLA1 reduces oxygen consumption and is associated with altered mitochondrial ultrastructure. Mechanistically, affinity purification suggests interaction with the mitochondrial elongation factor TUFM (mtEF-Tu), and RNA immunoprecipitation experiments enrich 12S mt-rRNA, consistent with a relationship to the small ribosomal subunit. Multiple assays, including sucrose-gradient profiling, reduced abundance of selected mtDNA-encoded proteins, and a click-chemistry labeling approach, support the conclusion that mitochondrial protein synthesis is decreased in EOLA1-deficient cells. Finally, whole-body Eola1 knockout mice show echocardiographic findings consistent with dilated cardiomyopathy and reduced levels of representative mitochondrially encoded proteins in cardiac tissue.

      How to interpret the work:

      The data support a role for EOLA1 in maintaining mitochondrial gene expression and oxidative phosphorylation capacity, and they plausibly implicate mitochondrial translation.

      Weaknesses:

      The main caveat is that the study does not yet establish how EOLA1 acts, whether it directly modulates translation elongation through TUFM, whether it is primarily required for mitoribosome biogenesis/rRNA stability, or whether it influences translation indirectly through mitochondrial stress pathways. The in vivo phenotype is intriguing, but without tissue-specific deletion/rescue and deeper cardiac pathology/mitochondrial functional measurements, it remains uncertain how directly the heart phenotype reflects a cardiomyocyte-autonomous defect in mitochondrial translation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify a previously uncharacterised regulator of mitochondrial function using a genetic screen and propose a role for this protein in supporting mitochondrial protein production. They provide evidence that the protein localises to mitochondria, interacts with components of the mitochondrial translation machinery, and is required for normal heart function in an animal model.

      Strengths:

      A major strength of the work is the use of multiple independent approaches to assess mitochondrial activity and protein production, which together provide support for the central conclusions. The in vivo data linking loss of this factor to impaired heart function are particularly compelling and elevate the relevance of the study beyond a purely cell-based context.

      Weaknesses:

      Given prior reports placing this protein outside mitochondria, its mitochondrial localisation would benefit from more rigorous and quantitative validation, and the proposed mechanism of the interaction with the mitochondrial translation machinery remains only partially explored. In addition, the physiological analysis is largely limited to the heart, leaving open questions about how broadly this pathway operates across tissues.

      Major comments:

      (1) Evidence for mitochondrial localization of EOLA1<br /> EOLA1 has previously been reported as a nuclear and cytosolic protein and is not annotated in MitoCarta 3.0, making rigorous validation of its mitochondrial localization particularly important. Although the authors provide several lines of evidence, interpretation is complicated by the use of different cell lines across localization, interaction, and functional experiments. Greater consistency in the cellular models used would strengthen the conclusions. The immunofluorescence analysis of tagged EOLA1 would also benefit from quantification across more cells and the inclusion of an additional mitochondrial marker (e.g., an outer membrane marker such as TOM20), as HSP60 staining can vary with mitochondrial state.

      (2) Normalization of OCR measurements<br /> Clarification of how Seahorse oxygen consumption rate measurements were normalized (e.g., cell number or protein content) would aid interpretation, particularly given potential effects of Eola1 loss on cell growth.

      (3) Linking interaction data to functional phenotypes<br /> Loss-of-function analyses are performed in mouse cell lines, whereas localization and interactome studies are conducted in human HEK293T cells. The absence of a human EOLA1 knockout model makes it difficult to directly connect the interaction data to the observed functional phenotypes. Additional validation or discussion of species conservation would improve clarity.

      (4) Mechanistic interpretation of the EOLA1-TUFM-12S rRNA interaction<br /> The identification of TUFM and 12S mt-rRNA as EOLA1 interactors is an interesting finding; however, the basis for prioritizing TUFM among the many mitochondrial proteins identified in the interactome is not fully explained. Providing enrichment statistics and functional categorization of mitochondrial interactors would increase transparency. In addition, the proposed role of the ASCH domain in RNA binding would be strengthened by structure-informed or mutational analysis of the conserved RNA-binding motif.

      (5) Interpretation of mitochondrial translation and protein abundance data<br /> Several assays supporting impaired mitochondrial translation would benefit from additional controls and quantification. The de novo mitochondrial translation assay (Fig. 3h) is not quantified, making it difficult to assess the magnitude and reproducibility of the effect. In addition, western blots showing reduced levels of mitochondrially encoded OXPHOS subunits (Figure 3g) lack a mitochondrial loading control (e.g., TOM20 or VDAC). Since loss of EOLA1 may affect mitochondrial mass, normalization to a mitochondrial marker is necessary. Relatedly, it would be informative to assess whether steady-state levels of mitoribosomal proteins (e.g., MRPS15, MRPL37) and nuclear-encoded OXPHOS subunits are altered upon Eola1 loss, both in knockout cell lines and in the knockout mouse.

      (6) Physiological scope of the in vivo analysis<br /> The cardiac phenotype observed in the whole-body Eola1 knockout mouse is compelling, but the focus on a single tissue limits interpretation of EOLA1's broader physiological role. Examination of additional high-energy-demand tissues would help clarify whether the observed effects are heart-specific or more general. In addition, the presence of residual EOLA1 protein bands in western blots (Figure 4a) and remaining Eola1 transcripts in qRT-PCR analyses (Extended Figure 4e) from knockout tissues should be addressed. The authors should clarify whether these signals reflect incomplete knockout, alternative isoforms, antibody cross-reactivity, or technical background.

      (7) Relationship to previously reported MT2A interaction<br /> Given prior reports of EOLA1 interaction with MT2A, a brief comment on whether MT2A was detected in the authors' co-immunoprecipitation experiments and how this relates to the proposed mitochondrial role would be useful.

    4. Reviewer #3 (Public review):

      The authors identified EOLA1 in a CRISPR/Cas9 screen for essential mitochondrial genes in a mouse B16-F10 cell line; however, no information on the library used for this screen or the list of all identified essential genes is provided. What was the p-value for EOLA1 in Figure 1b?

      The authors show that EOLA1 is indeed a mitochondrial protein (using both mouse and human cell lines). It is valuable that the authors use different cell lines to investigate the function of this protein; however, this also presents a challenge, as four different cell lines (two mouse and two human) are used across individual experiments, with no consistency between them. Knock-out (KO) experiments were performed in mouse cell lines only, and human cell lines were used in overexpression experiments, in which EOLA1 was tagged with FLAG-HA. It would be beneficial if a knock-out were also generated in a human cell line to confirm the effect on the expression of mitochondria-encoded proteins, along with a rescue experiment in which the EOLA1 protein is reintroduced into KO cells.

      Functional analysis of EOLA1: The authors performed affinity immunoprecipitation of FLAG-HA-tagged EOLA1 from stably overexpressing cells, and identified 202 co-immunoprecipitating proteins, of which 71 were known mitochondrial proteins; however, no list of these proteins is provided. Why did the authors choose TUFM? Were any mitochondrial ribosomal proteins co-immunoprecipitated, if EOLA1 is suggested to regulate translation? Were levels of TUFM affected in EOLA1-KO cells?

      The authors continued to analyze mitochondrial ribosomes using sucrose gradient fractionation and in-vitro mitochondrial translation. However, there are several technical problems with the presented data: It has been established that mitochondrial ribosomes do not form polysomes in mammalian cells but rather perform translation as monosomes. The authors indirectly confirm this: almost no 12S or 16S rRNA (Fig. 3f) or MRP proteins (Extended data 3c) are present in "polysome" fractions. Although indeed 12S and 16S rRNAs are decreased in monosome fractions, the levels of mRNAs are not different between KO and WT cells, and neither is the migration of mitochondrial ribosomal proteins. As there is no loading control provided for the sucrose gradients blots (such as SDHA, VDAC), it is not possible to assess the overall levels of mitochondrial ribosomes. The gel presented for mitochondrial translation is of poor quality, as it is impossible to identify any of the expected 13 polypeptides. Although the intensity of the signal is weaker for KO, so is the intensity in the portion of Coomassie stained gel. A better-quality gel and quantification need to be provided to support the claims.

      What is the difference between endogenous and exogenous RIP-qPCR? EOLA1 pulled down 12S rRNA without cross-linking (Figure 3d) or with UV-crosslinking (Figure 3e); however, both 12S and 16S rRNAs were enriched in UV-crosslinked cells (Figure 3c) and by UV-RIP-seq (Extended data 3b; although no control is provided here). No discussion is offered for this observation. Is it possible that EOLA1 plays a role in the maturation of the mito-ribosome, rather than translation? Does EOLA1 co-migrate with the mito-ribosome on sucrose gradients?

      Altogether, there is insufficient evidence to support the conclusion that EOLA1 plays a role in mitochondrial translation.

      To investigate the biological function of EOLA1, the authors created a whole-body EOLA1-/- mouse that exhibited no overall developmental abnormalities but presented with abnormal cardiac function. This is an ideal model to confirm prior observations in cellular models; however, apart from one western blot for three mitochondria-encoded subunits, no other experiments were provided (such as measurements of 12S or 16S rRNA levels, TUFM levels, ribosome profiles, mitochondrial translation, OXPHOS assembly, or respirometry).

      In Figure 2 g-i: TEM images are presented, but the method is not described, nor is any information on the cells used provided, nor is it clear how the circularity was determined. KO cells certainly look abnormal; however, are the authors sure that the indicated structures are mitochondria? They rather resemble autophagosomes/lysosomes with lamellar inclusions.

    1. Reviewer #3 (Public review):

      This work by Du et al. addresses a critical problem in cryo-electron microscopy. To date, there are few ways of generating phase contrast during cryo-EM imaging while remaining in focus. Cryo-EM practitioners today must generate contrast by collecting out-of-focus exposures, a process that introduces aberrations in the resulting image data. Recent work has shown that standing wave lasers are capable of using the ponderomotive effect to shift the phase of electrons in transmission electron microscopy to generate in-focus phase contrast imaging for cryo-EM. A limitation of this 'laser phase plate' is the high laser power required, which can damage optical mirrors and necessitate high laser safety. Thus, alternative approaches are needed for phase contrast imaging in cryo-EM.

      In this manuscript, Du et al. exploit their expertise in ultrafast electron microscopy to explore the ability to shift the phase of electrons using pulsed electrons and lasers. The motivation for exploring pulsed laser phase plates stems from the fact that femtosecond pulses from 9W lasers can generate extremely high power (as much as the standing-wave laser phase plate, > 1 gigawatt) at the back focal plane. If successful, this type of instrument will likely be much more affordable and easier to deploy worldwide.

      The work outlined here shows a proof of principle, demonstrating that electron packets in a 30 kV ultrafast scanning electron microscope beam can be phase-shifted by 430 radians (24637 degrees), far more than the approximately 1.57 radians (90 degrees) required for phase contrast imaging. The data presented do not use any biological samples; instead, the authors measure the spread of the electron beam on a test sample to assess their ability to target pulsed lasers onto electron packets and the amount of electron spread (which relates to the phase shift). They also took their system a step further by measuring how changes in laser power affect performance, and they show that the system can be stable for 10+ hours.
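
      The unit conversions quoted above can be checked in a few lines. The following is a minimal sketch, not part of the reviewed manuscript; the function and variable names are illustrative assumptions, and only the 430 rad and 90 degree figures come from the review.

```python
import math


def radians_to_degrees(phase_rad: float) -> float:
    """Convert a phase shift from radians to degrees."""
    return math.degrees(phase_rad)


# Peak phase shift quoted in the review, in radians.
peak_phase_rad = 430.0
peak_phase_deg = radians_to_degrees(peak_phase_rad)

# A 90-degree (quarter-wave) phase shift expressed in radians, i.e. pi/2.
target_phase_rad = math.radians(90.0)

print(f"{peak_phase_rad:.0f} rad = {peak_phase_deg:.0f} degrees")
print(f"90 degrees = {target_phase_rad:.4f} rad")
print(f"peak shift / target shift = {peak_phase_rad / target_phase_rad:.0f}")
```

      The last line of output gives the factor by which the measured peak shift exceeds the 90-degree target, which is the margin the review refers to.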

      The only weaknesses relate to the broad readability of the text. Improved textual clarity will help ensure a wider readership.

      Overall, this work is an important step toward developing lower-cost alternatives to the standing-wave laser phase plate.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors present the development and characterization of a pulsed ponderomotive phase plate for transmission electron microscopy (TEM). The primary goal is to overcome the long-standing challenge of generating stable, tunable phase contrast for weakly scattering biological specimens - a capability that has remained elusive despite decades of development. While the commercially available Volta Phase Plate offers phase enhancement, it suffers from a lack of control and stability. More recent efforts have focused on continuous-wave (CW) laser phase plates; however, these systems face significant practical hurdles, including extreme optical power requirements, thermal instability of mirrors, and the necessity for high-finesse optical cavities that act as diffraction gratings for the electron beam. The authors aim to demonstrate that a pulsed, free-space laser interaction can circumvent these limitations, offering a more robust path toward practically usable phase plates.

      Strengths:

      The most significant strength of this work is the elegant use of a free-space pulsed interaction, which fundamentally simplifies the hardware requirements compared to cavity-based designs. By utilizing a high-intensity pulsed laser focus rather than a standing wave inside a resonator, the authors eliminate the need for complex locking feedback loops and avoid the thermal mirror deformation that currently limits CW systems.

      Furthermore, this approach provides a critical theoretical advantage regarding image quality. Current CW cavity-based designs must grapple with the Kapitza-Dirac effect, where the standing wave creates a diffraction grating that generates unwanted "ghost images," delocalizing the signal. Recent proposals have had to resort to complex crossed-beam geometries to mitigate these artifacts. In contrast, the traveling-wave nature of the pulsed interaction described here inherently avoids the creation of a standing wave grating, thereby eliminating ghost images entirely without requiring elaborate compensation strategies.

      The authors successfully demonstrate a proof-of-concept implementation, reporting a pronounced peak phase shift of approximately 430 radians and a stable angular deflection of the electron beam. The stability data, covering a 10-hour period, suggest that this approach is robust enough for data collection sessions typical in structural biology.

      Weaknesses:

      However, the strength of the evidence is modestly tempered by limitations in data presentation and analysis. The agreement between the experimental data and the theoretical simulation in Figure 2b is imperfect; the simulation underestimates the depth of the central signal trough. While the authors acknowledge this "muted" prediction, the discrepancy suggests that the theoretical model or the estimation of experimental parameters (such as electron beam size or laser intensity) requires refinement to fully describe the interaction.

      While the authors claim stability over many hours, the data in Figure 3c reveal a significant drift in the baseline reference signal. Although attributed to a weakening electron beam, this drift complicates the reader's ability to assess the true stability of the laser-induced phase shift. A drift-corrected analysis would have provided more compelling evidence of the "stable angular kick" described.

      Despite these specific weaknesses in data presentation, the work represents a fundamental step forward. The authors have effectively demonstrated that the trade-off between beam current and spatiotemporal resolution (driven by space-charge effects) can be managed to achieve significant phase modulation. By moving the field away from the tight constraints of optical cavities and toward free-space pulsed interactions, this work establishes a potentially more viable route for integrating laser phase plates into routine biological imaging workflows. This study will be of high value to biophysicists and microscopists seeking to push the boundaries of contrast in cryo-EM.

    3. Reviewer #1 (Public review):

      Summary:

      Du et al. studied the interaction of ultrashort electron and laser pulses inside a scanning electron microscope (SEM), aiming to build a foundation for pulsed laser phase plate electron microscopy, in which the contrast of cryo samples can be significantly increased. The authors modified a commercial SEM to accommodate optics that introduce a laser beam into the instrument and overlap it with the electron beam, and performed multiple experiments to characterize the electron-light interaction, notably reaching an extremely high phase shift of >400 rad. Moreover, the authors built a theoretical model for this interaction and estimated the laser beam parameters needed to reach a 90-degree phase shift in transmission electron microscopy (TEM).

      Strengths:

      The conclusion on the interaction of the electron pulses and laser pulses is well described and supported by the experiment.

      The presented instrument can serve as a great tool for studying fundamental interactions of electrons with extremely intense light pulses.

      Weaknesses:

      The authors motivate the project by using the pulsed electron beam with a phase shift for improving the contrast in cryo-EM, and while they indicate the low current in UEM, they do not discuss the limitations of the laser beam properties.

      Thus, even for 1 ps electron pulses at a repetition rate of 100 GHz (duty cycle of 10%), they will need to use 100 GHz laser pulses with pulse energies of at least ~1 µJ (the lowest pulse energy reported in the simulations in Figure 4), which would mean that ~10 kW of optical power needs to enter the electron microscope and be dumped somewhere after leaving the instrument. This significantly complicates the system and, in my view, makes it harder to use a pulsed laser phase plate in cryo-EM, owing either to a low acquisition rate at lower repetition rates or to the extreme difficulty of operating a multi-kW ultrafast laser system.
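
      The scale of this estimate follows from the relation that average optical power equals pulse energy times repetition rate. The sketch below is illustrative only; the function names are assumptions, and the inputs are the values quoted in the paragraph above, taken at face value. With 1 µJ per pulse at 100 GHz the product comes to roughly 100 kW, whereas the ~10 kW figure quoted above would correspond to roughly 0.1 µJ per pulse at the same rate. Either way, the requirement lies in the multi-kW regime the reviewer describes.

```python
def average_power_w(pulse_energy_j: float, repetition_rate_hz: float) -> float:
    """Average power of a pulse train: energy per pulse times pulses per second."""
    return pulse_energy_j * repetition_rate_hz


def duty_cycle(pulse_duration_s: float, repetition_rate_hz: float) -> float:
    """Fraction of time the beam is 'on': pulse duration divided by the pulse period."""
    return pulse_duration_s * repetition_rate_hz


# Values quoted in the review (assumed here to be per-pulse energy and pulse rate).
repetition_rate_hz = 100e9   # 100 GHz
pulse_duration_s = 1e-12     # 1 ps
pulse_energy_j = 1e-6        # ~1 microjoule per pulse

power_w = average_power_w(pulse_energy_j, repetition_rate_hz)
fraction_on = duty_cycle(pulse_duration_s, repetition_rate_hz)

print(f"duty cycle: {fraction_on:.0%}")
print(f"average optical power: {power_w / 1e3:.0f} kW")

# Pulse energy that would give 10 kW at the same repetition rate.
energy_for_10kw_j = 10e3 / repetition_rate_hz
print(f"pulse energy for 10 kW: {energy_for_10kw_j * 1e6:.1f} microjoules")
```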

      I would also expect the unscattered electron beam diameter to be <1 micron, which would significantly change the plot in 4b for the 300 keV electron beam.

      Adding experimental parameters for a typical cryo-EM experiment with the pulsed phase plate, including the repetition rate, electron pulse duration, number of electrons per pulse, electron beam size, and the parameters of the laser beam (wavelength, laser pulse duration, pulse energy), will help readers better understand technical requirements for the proposed cryo-EM experiments.

    4. eLife Assessment

      This important study introduces a pulsed laser phase plate that generates stable phase contrast in electron microscopy, offering a practical alternative to continuous-wave designs that suffer from optical instabilities and diffraction artifacts. The experimental results demonstrate a controllable and stable electron phase shift, and the evidence supporting the feasibility of this approach for phase-contrast electron microscopy is convincing. Clarifying the agreement between experiment and theory and further elaborating on possible applications would strengthen the manuscript.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

      [Editors' note: the authors have responded to the previous round of recommendations.]

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

      Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study generated 3D cell constructs (i.e., assembloids) that were treated with hormones to induce a 'window of implantation' (WOI) state. While the authors have made large efforts to address the reviewers' feedback, the study's findings remain unconvincing and incomplete.

      (1) The authors have appropriately revised the terminology from 'organoids' to 'assembloids' in several parts of the manuscript. However, this revision remains incomplete, as the main title, figure legends, and figure titles still contain the incorrect term. A thorough review of the entire manuscript is recommended to ensure consistent and accurate use of terminology.

      Thank you for your meticulous review. We have now conducted a full check and confirmed that terminology is used consistently and accurately throughout the text.

      (1) Previous comments raised concerns about the feasibility of robustly passaging assembloid structures - comprising epithelial, stromal and immune cells - under epithelial growth conditions. The authors responded by stating that they optimized the expansion medium with a stromal cell-promoting factor. Additionally, rather than conducting scRNA-seq on both early and late passages (P6-P10) as suggested, they performed immunofluorescence staining, which confirmed the persistence of stromal cells at passage 6. However, the presence of immune cells was not addressed. Confirmation of their presence is essential for all further claims. Moreover, a more zoomed-out view of the immunostaining would help clarify the overall cellular composition across the entire well and facilitate comparison with corresponding brightfield images.

      Whole-mount immunofluorescence of the 6th-generation assembloids revealed that CD45<sup>+</sup> immune cells surrounded FOXA2<sup>+</sup> glands, with a more zoomed-out view provided.

      Author response image 1.

      Whole-mount immunofluorescence showed that CD45<sup>+</sup> cells (immune cells) were arranged around the FOXA2<sup>+</sup> glandular spheres. Scale bar = 50 μm (left) and 30 μm (right).

      In their response, the authors mention using the first three passages to ensure optimal cell diversity and viability. However, the manuscript states that 'assembloids derived from the first generation are used for experiments' (line 106). This discrepancy must be clarified.

      Thank you for your suggestion. We have revised the relevant content to “The assembloids derived from the first three generation are used for experiments” (Line 90-91).

      (2) The authors have made a commendable effort to bring more focus to the manuscript, which has improved readability.

      We thank you for your insightful suggestions, which have greatly improved the quality of our manuscript.

      (3) The "embryo implantation" part remains very unconvincing. How did authors define "the blastoids could grow within the endometrial assembloids and interact with them"? What did they mean with "grow"? Did blastoids further differentiate? Normally, blastoids cannot further "grow". "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified. Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images. In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      (1) What did they mean with "grow"? Did blastoids further differentiate?

      On the one hand, volume and morphology undergo continuous dynamic changes; on the other hand, only the inner cell mass and trophectoderm exist at the blastocyst stage, with the ICM further differentiating into OCT4<sup>+</sup> epiblast and GATA6<sup>+</sup> hypoblast.

      (2) "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified.

      The definition of "survival rate" is as follows: morphologically, the blastocoel remains non-collapsed and the cell boundaries are distinct (with no obvious cell detachment); molecularly, the markers of epiblast, hypoblast and trophectoderm are expressed. The survival rate is calculated as the ratio of viable embryoids to the total number of embryoids.

      (3) Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images.

      The criteria for determining interaction include not only attachment between the blastoids and assembloids observed via brightfield images, but also their sustained tight adhesion against external mechanical perturbations (e.g., medium replacement, immunostaining procedures).

      (4) In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      You are absolutely correct. In vivo, the embryo indeed makes initial contact with the apical side of the epithelial cells. The introduction of the blastoid co-culture model herein is intended to demonstrate that these receptive endometrial assembloids can better support blastoid growth and development.

      (4) Previous comments highlighted the absence of distinct shifts in gene expression profiles between SEC assembloids and WOI assembloids, which contrasts with findings from primary endometrial tissue reported by Wang et al. (2020). While the authors have expanded their analysis using the Mfuzz algorithm and identified changes in mitochondria- and cilia-associated genes, the manuscript still lacks evidence of significant transcriptional changes in key WOI marker genes, as described in Wang et al. This discrepancy must be addressed and discussed in greater depth to clarify the biological relevance of their model.

      The endometrium in vivo involves complex crosstalk among multiple cell types and is tightly regulated by the hypothalamic-pituitary-ovarian (HPO) axis, thus exhibiting distinct shifts in gene expression during the peri-implantation period.

      In our in vitro model, alterations in mitochondria- and cilia-related genes were observed, which to a certain extent demonstrates that these window of implantation (WOI) assembloids possess receptive-phase characteristics and can be employed to investigate WOI-associated scientific questions or conduct in vitro drug screening.

      However, substantial efforts are still required to optimize the current model for fully recapitulating the dynamic changes in endometrial gene expression across different phases in vivo, and this aspect is further addressed in the Limitations section of our discussion (Line 342-353).

      “However, our WOI endometrial assembloids also exhibit some limitations. It is undeniable that the assembloids cannot perfectly replicate the in vivo endometrium, which comprises functional and basal layers with a greater abundance of cell subtypes, under superior regulation by hypothalamic-pituitary-ovarian (HPO) axis. Specifically, stromal and immune cells are challenging to stably passage, and their proportion is lower than in the in vivo endometrium. While the in vivo peri-implantation period exhibits intricate gene expression dynamics driven by systemic regulation, our models only partially recapitulate these changes, primarily in mitochondria- and cilia-associated genes. Nevertheless, to some extent, these WOI assembloids possess receptivity characteristics and can be utilized for investigating receptivity-related scientific questions or conducting in vitro drug screening. Further refinements are required to fully simulate the dynamic endometrial gene expression patterns across all menstrual cycle stages. We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.”

      (5) In the authors' response document, they present data integrating their results with those of Garcia Alonso et al. (2021). However, these integrated analyses are not included in the revised manuscript (which should be, if answering a major concern).

      Thanks for your valuable suggestions. We have now integrated the findings of Garcia Alonso et al. (2021) into the revised manuscript (Line 132) and Figure S2E–F.

      (8) Fig 2D: The authors have clarified that CD45+ staining is used. However, they have not yet adapted the typo in the figure legend of the right picture.

      Thanks for your thorough review. The left panel of Figure 2D is stained with CD45 to label immune cells, while the right panel is stained with CD44. These details have been clearly indicated in both the manuscript and the figure legend.  

      (9) All quantification analyses (as described in the authors' response document) should be clearly described in the Materials & Methods section.  

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104, Line 110-111, Line 241-244).

      (10) The authors have provided clarification regarding their method for quantifying immunofluorescence staining (e.g., OLFM4 expression in Fig. 3C) in their response document. However, these methodological details are not included in the revised manuscript. It is important that such information is incorporated into the manuscript itself to ensure transparency and reproducibility for others.

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104).

      (13) The authors' response to the comment about literature showing the opposite of an increased number of cilia during the WOI should be included in the Discussion section of the paper.

      We appreciate your suggestions. The relevant content has now been added to the Discussion section (Lines 319–323).

      (14) In the authors' response, they explain the difference between pinopodes and microvilli. They should include this explanation briefly in the manuscript. Moreover, Fig. 3F lacks a picture of cilia structure in CTRL condition. In addition, the structures that are indicated as cilia with an orange arrow seem to not be attached to the endometrial cells (anymore). It would be useful to show another more representative picture for the cilia.

      (1) Thank you for your valuable suggestions. The distinction between pinopodes and microvilli has now been added to the Supporting Materials and Methods section (Line 230-236).

      (2) You are probably referring to Figure 2F—we did not observe ciliary structures in the CTRL group.

      (3) The cilia structure was visualized via transmission electron microscopy (TEM), which requires ultrathin sectioning. Thus, the cilia shown in the image correspond to a single cross-section of the captured assembloids. Owing to technical limitations, three-dimensional visualization of cilia on the cells cannot be achieved.

      (17) The results on co-culturing blastoids with the WOI assembloids is not convincing. The blastoids are exposed to the basolateral side of the endometrial epithelial cells, while in vivo, blastocysts interact with the apical side of the endometrial epithelial cells first (apposition and attachment), followed by invasion into the endometrium. This means that the interaction shown here is not physiological. Therefore, it is not justified to say that this platform holds promise to investigate maternal-fetal interactions.

      We agree with your perspective that discrepancies exist between this model and the physiological processes in vivo. However, such differences do not negate the scientific value of the model.

      The core merit of this study lies in the successful establishment of co-culture systems for blastoids and WOI assembloids. Notably, genuine cross-talk occurs between the two components, thereby providing a practical and operational tool for subsequent research.

      Although the current contact orientation differs from that observed in vivo, future optimization of the cell culture protocol (via modulation of cell polarity) will enable the model to better recapitulate physiological conditions. Therefore, the innovation and operability of this model within specific research contexts still render it a robust platform for investigating maternal-fetal interactions.

      Overall, it is highly recommended that the authors carefully review the manuscript for grammatical errors, inconsistencies and issues with scientific phrasing. The language throughout the text requires substantial editing to improve clarity, readability and precision. 

      We appreciate your suggestions. A full manuscript check was performed to rectify grammatical errors, inconsistencies, and inappropriate scientific phrasing, with further language refinement by a native English-speaking specialist.

      Fig 1A: This overview is unclear. How many days do the assembloids grow before being stimulated with hormones? Are CTRL assembloids only kept in culture until day 2 and SEC and WOI assembloids until day 8? This is also not clear from the Materials and Methods section. Should be clarified.

      Thanks for your valuable suggestions. We have now updated the overview (Figure 1A) and Materials and Methods section (Line 370-371, Line 379-381).

      “Hormonal treatment was initiated following the assembly of the endometrial assembloids (about 7-day growth period).”

      “The CTRL group was cultured in ExM without hormone supplementation and subjected to parallel culture for 8 days along with the two aforementioned groups.”

      Fig 1B: From these brightfield images, it appears that the size of the assembloids remains relatively consistent from Day 0 to Day 3 and up to Day 11 (especially in CTRL). However, in Fig S1A, the assembloids on Day 11 appear significantly larger compared to those on Day 2 (or Day 4). Authors should clarify this discrepancy (since both of the figures are shown as "brightfield of endometrial assembloids").

      You are probably referring to the observation that the assembloids at Day 11 in Fig. S1A are smaller in size than those at Day 2 (or Day 4) in Fig. 1B. This discrepancy arises because the time points in Fig. 1B are calculated starting from the initiation of hormone treatment for the SEC and WOI groups, rather than from the beginning of the overall culture as in Fig. S1A. In addition, assembloids exhibit size variability during the same culture period due to individual heterogeneity.

      To eliminate ambiguity, we have now labeled “Hormone Day 0, Day 2, Day 8” in Fig. 1B and revised the corresponding figure legend to read: “Endometrial assembloids from the CTRL, SEC, and WOI groups, which were subjected to hormone treatment on Days 0, 2, and 8, exhibited comparable growth patterns throughout the culture period.”

      Fig 2G: authors still used the description "organoids" here instead of "assembloids".

      We appreciate your careful review. Corrections have been made accordingly.

      Fig. 3C: For the OLFM4 staining quantification, in the Y-axis authors wrote "proportion of OLFM4 (+) cells (OLFM4 (+)/total", but in the rebuttal letter they mention "its fluorescence intensity (quantified as mean grey value) was significantly stronger in both the SEC and WOI groups compared to the CTRL group". This is confounding and should be clarified.

      We apologize for incorrectly writing "fluorescence intensity" in the rebuttal letter; the correct term should be the "proportion of OLFM4 (+) cells (OLFM4 (+)/total)" as shown in Fig. 3C.

      Fig 5D: Acetyl-α-tubulin is the marker of ciliated cells and should be expressed in the cilia instead of the whole cells. It is very strange to quantify as "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" to assess the cilia. Please clarify.

      Thank you for your insightful comment. To clarify, the ratio "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" was calculated within individual acetyl-α-tubulin<sup>+</sup> ciliated cells. Acetyl-α-tubulin fluorescence was normalized to the DAPI signal of the same cell nucleus, not the whole-cell population. This corrected for variations in cell number and staining efficiency to ensure data accuracy.
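
      The per-cell normalization described above can be illustrated with a minimal sketch. It assumes that each acetyl-α-tubulin-positive cell contributes one ratio and that the reported value is the mean of those ratios; the response does not state whether ratios were averaged this way. The function name and the example intensities are hypothetical.

```python
def mean_normalized_intensity(cells):
    """Mean of per-cell acetyl-alpha-tubulin / DAPI intensity ratios.

    cells: list of (tubulin_intensity, dapi_intensity) pairs, one pair per
    acetyl-alpha-tubulin-positive cell. Averaging the per-cell ratios is an
    assumption made for illustration, not a documented step of the study.
    """
    ratios = []
    for tubulin, dapi in cells:
        if dapi <= 0:
            raise ValueError("DAPI intensity must be positive")
        ratios.append(tubulin / dapi)
    if not ratios:
        raise ValueError("no cells supplied")
    return sum(ratios) / len(ratios)


# Hypothetical intensities for three ciliated cells (arbitrary units).
example_cells = [(120.0, 60.0), (90.0, 30.0), (50.0, 50.0)]
example_mean = mean_normalized_intensity(example_cells)
```

      Normalizing within each cell before averaging keeps a single brightly stained nucleus from dominating the result, which is one plausible reading of the stated aim of correcting for staining efficiency.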

      Fig 5F: it is very bizarre that unciliated epithelium was transformed from ciliated epithelium, and CTRL was transformed from SEC and WOI. Should be clarified and discussed.

      Pseudotime analysis sorts discrete cells along a "pseudotime axis" based on similarities and differences in cellular gene expression, thereby simulating cell state transitions.

      Ciliated epithelium → unciliated epithelium: During the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase. Here, we demonstrate the transition of ciliated cells to unciliated cells from the SEC and WOI stages to the CTRL stage.

      Notably, the two cell types coexist, and what is presented here merely reflects a transformation trend. Relevant content has been incorporated into the Discussion section (Line 319-321).

      “Throughout the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase.”

      Fig 5H: To show "enhanced invasion ability", authors must provide some quantification and statistical analysis. It is very hard to see the difference between the CTRL and SEC regarding ROR2Wnt5A.

      We appreciate your suggestion. Quantification and statistical analysis have been added to Figure 5H.

      Fig 6A: please elaborate the "mIVC1" and "mIVC2" in the figure legends.

      Additions have been made to the figure legends accordingly, as follows: "mIVC1: modified In Vitro Culture Medium 1; mIVC2: modified In Vitro Culture Medium 2."

      Fig S1D: Is the PAS staining also done in CTRL assembloids? In addition, it is stated that the assembloids secrete glycogen because of a positive PAS staining, while it could also be neutral mucins, glycoproteins, etc, which are all detected by PAS staining. So, the authors should be more careful in stating that it is glycogen, or a PAS staining with diastase digestion should be done.

      The PAS staining results for the CTRL group are presented in Fig. S1I. In addition, results of PAS staining with diastase digestion are included in Figure S1.

      Line 120: references?

      The reference has been added accordingly.

      Line 178: The term 'Endometrial Receptivity Test (ERT)' is used. Do the authors mean Endometrial Receptivity Analysis (ERA) test? ERA is the commonly used abbreviation for this test. Moreover, the authors describe ERA as 'a kind of gene analysis-based test.' This should be rephrased to be more scientifically correct.

      Thank you for your valuable suggestion. We have revised the term to ERA, and modified the phrase "a kind of gene analysis-based test" to "gene expression profiling-based diagnostic assay" (Lines 160–163).

      “We performed Endometrial Receptivity Analysis (ERA), a gene expression profiling-based diagnostic assay that integrates high-throughput sequencing and machine learning to quantify the expression of endometrial receptivity-associated genes.”

      Line 83: assemblies → assembloids

      We appreciate your suggestion. The text has been updated to “the endometrial assembloids progressed from epithelial organoids, to assemblies of epithelial and stromal cells and then to stem cell-laden 3D artificial endometrium”.

      The Materials and Methods section currently lacks the needed details. Authors should substantially expand this section to clearly describe all experimental and analytical procedures, including, among others, immunofluorescence staining, quantification methods, bioinformatics analyses and statistical approaches. Providing comprehensive methodological information is essential.

      A detailed description of these methods is provided in the Supporting Materials and Methods section.

      Reviewer #2 (Recommendations for the authors): 

      The revised manuscript is much improved in clarity, focus, and experimental support. The authors have thoughtfully addressed the major concerns from the previous review. In particular, the logic and flow of the paper are clearer; it now guides the reader through the rationale (constructing a WOI model), the comparative analysis against in vivo tissue and simpler organoids, and the key features that distinguish the WOI assembloid. The added functional validation (especially the blastoid co-culture experiment) significantly strengthens the work by showing a tangible outcome of "receptivity" beyond molecular profiling. The distinction between the standard secretory-phase organoid and the WOI assembloid is now more convincing, as the authors highlight several specific differences in morphology (more cilia, pinopodes), metabolism, and implantation success that favor the WOI model. The manuscript also reads cleaner with the bioinformatic sections condensed to the most important findings (excess detail was trimmed or moved to supplements) and the rationale for gene/pathway selection explicitly stated.

      The manuscript has been significantly strengthened through the addition of functional assays (like the blastoid co-culture), clearer transcriptomic and proteomic data, and detailed analyses of hormone treatments, cilia biology, and stromal and immune cell behavior in early passages. These updates confirm that the WOI assembloid supports embryo attachment and outperforms standard secretory organoids, while integrating external references and clarifications on terminology. Minor suggestions remain, such as clarifying statistical significance and adding functional interpretations for certain observations, but overall, the manuscript is now more robust and biologically convincing.

      Remaining points for clarification: There are a few minor points that still merit attention:

      - Use of the Endometrial Receptivity Test (ERT): As previously mentioned, if the authors have ERT data for the SEC organoid group, including that information would further support the claim that the WOI assembloid is uniquely receptive. If not, it would be helpful to add a statement clarifying that the ERT was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.

      Thank you for your valuable suggestion. We have now supplemented the description in the Supporting Materials and Methods section (Lines 160–162) as follows: “ERA was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.”

      - Because the assembloids are created from primary tissue samples, it would be helpful to briefly comment on how consistent the findings were across different patient-derived samples. For example, did all biological replicates show similar expression of receptivity markers and comparable capacity to support blastoid attachment? Although this seems implied, including a sentence in the Methods or Results sections that specifies the number of donor lines tested would help readers assess the model's variability and reproducibility.

      We appreciate your advice. The relevant statement has been added to the Supporting Materials and Methods section (Line 312-313).

      “All biological replicates (fourteen individuals) of endometrial assembloids show similar expression of receptivity markers and comparable capacity to support blastoid attachment.”

      - The authors mention promising future directions, such as integrating 3D printing and microfluidics to further enhance the model, which is an excellent forward-looking statement. It would also be valuable to suggest the inclusion of additional cell types, like more robust immune cell populations or endothelial components, as future improvements to create an even more comprehensive model of the endometrial lining.

      Thank you for your valuable suggestion. 3D printing and microfluidics serve as approaches for introducing multiple cell types. We have supplemented the following statement in the manuscript: “We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.” (Line 352-353).

      We are grateful for your valuable feedback and constructive criticism, which have helped us improve the quality of our work in terms of content and presentation. We have diligently revised the manuscript and made the necessary changes. We have attached the revised manuscript, figures, and all supplementary materials for your re-evaluation. Thank you again for your continued support; we look forward to your favorable decision.

    1. eLife Assessment

      The authors developed a fundamental computational method, which is intended to automatically process bioluminescence imaging-derived tumour images across anatomical regions and over time. This allows quantitative analysis of such data, and the authors applied it to describe the spatiotemporal distribution of tumour cells in response to CD19-targeted CAR-T cells that contained either CD28 or 4-1BB costimulatory domains. Some operational limitations were identified, which relate to the pipeline's reliance on predefined regions of interest instead of aligning signal sites with anatomical information, scaling, and limitations in taking animal pose into account. Overall, the authors provide compelling evidence for the functionality of their computational approach towards automated analysis of bioluminescence imaging data, while applying it to a current topic of wide interest in cell therapy research.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Comments on revisions:

      (1) Clarification of 2D Analysis. We strongly recommend that the authors explicitly define maRQup as a 2D spatiotemporal analysis technique. Since optical imaging quantification is inherently dependent on tissue type and signal depth, characterizing this as a 3D or volumetric method without tomographic correction is inaccurate. Please precede "spatiotemporal" with "2D" throughout the text to ensure precision regarding the method's capabilities.

      (2) Data Validation and Scaling. Supplemental Figure g currently lacks the units necessary to support the assertion.

      Non-Uniform Growth: The authors' method implies that mouse growth is linear and uniform in all directions (isotropic). However, murine growth is not akin to the inflation of a balloon; animals elongate and widen at different rates. The current scaling does not account for these physiological non-linearities.

      Pose Variability: The scaling approach appears to neglect significant variability in animal positioning. Even under anesthesia, animal pose is rarely identical across subjects or time points.

      Requirement for Evidence: In the absence of quantitative data, there appear to be significant differences between the individual images and the merged image. If the authors assert that this is a "classical setting" where mouse positioning is 100% consistent and growth curves are identical in multiple dimensions, please provide specific references that validate these assumptions. Otherwise, the scaling must be corrected to account for anisotropic growth and pose differences, or it must be stated that scaling was based on only one dimension.

      (3) Methodology of Spatial Regions. The manuscript does not currently indicate how the nine distinct spatial regions were determined. Please expand the methods section to include the specific segmentation algorithms or anatomical criteria used to define these regions, as this is critical for reproducibility.

    3. Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights on preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification.

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Comments on revisions:

      The critiques have been taken care of appropriately.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      Our answer to this comment is included in the Supplemental Methods. The standard deviation of the mouse pixels was calculated to ensure that the image processing steps did not alter the shape or size of the mice. Such consistency is particularly striking because our dataset was accrued by nine lab members over the last five years, before we conceived and carried out our analysis (cf. answer to point #2). In fact, it is the very consistency of this IVIS measurement that led us to conceive our pipeline. As seen from Supplemental Figure 4G, there is minimal difference in the shape or size of the mice across 7,534 images. A total of 99 images were removed either due to being too slanted (91/7633, 1.2%) or due to processing errors (8/7633, 0.1%). Also, the vertical scaling was conducted while keeping the aspect ratio unchanged to prevent any non-anatomical scaling. Hence, we did not record any nonlinear growth of the mice that would warrant more convoluted alignment and/or batch correction for our images.
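
      The image counts quoted above and the idea of vertical scaling with an unchanged aspect ratio can be illustrated with a minimal sketch. Only the counts (7,534 retained images, 91 slanted, 8 processing errors) come from the response; the total is assumed to be their sum. The helper `scale_keep_aspect`, the target height, and the example image size are hypothetical and are not taken from the maRQup code.

```python
# Counts quoted in the response; the total is assumed to be retained + removed.
retained = 7534
removed_slanted = 91
removed_errors = 8
total = retained + removed_slanted + removed_errors

# Percentage of all images removed for each reason, to one decimal place.
pct_slanted = round(100 * removed_slanted / total, 1)
pct_errors = round(100 * removed_errors / total, 1)


def scale_keep_aspect(height, width, target_height):
    """Return (new_height, new_width) after scaling to target_height.

    Hypothetical helper: a single factor is applied to both axes, so the
    height/width ratio is unchanged apart from rounding to whole pixels.
    """
    if height <= 0 or width <= 0 or target_height <= 0:
        raise ValueError("dimensions must be positive")
    factor = target_height / height
    return target_height, max(1, round(width * factor))


if __name__ == "__main__":
    print(total, pct_slanted, pct_errors)
    print(scale_keep_aspect(500, 200, 400))
```

      Because one factor is applied to both axes, a mouse that is relatively wider than the reference stays wider after scaling. This is the anisotropy the reviewer raises, and it is not corrected by this kind of scaling.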

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Reviewer #1 is correct that different mouse postures would be an issue when aligning the images and normalizing for size. However, all experiments are conducted for luminescence measurements in the IVIS system (i.e., this requires anesthesia and long integration time for imaging). In our experience and in our 1000+ mouse dataset, we noticed that all experiments (n=37) did place the anesthetized mice in a stretched/elongated position. Of note, these experiments were conducted by nine different researchers who were not instructed on how to place the mice on the machine for ideal image processing, thus showing that the standard protocol of imaging mice on IVIS does not introduce large variations in animal pose during image acquisition. We think the issue raised by Reviewer #1 is moot in the context of classical settings for mouse luminescence imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells, comprising CD28 or 4-1BB costimulatory domains. The conclusion highlights the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where initial is intended from the first day of therapy). The authors also determined the spatiotemporal analysis of tumor response to CAR T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates. 

      Weaknesses:

      The study does not specify how a) differences in mouse positioning (and whether they excluded not-aligned mice) and b) tumor spread at the start of therapy influenced their data. The study does not take into account the potential heterogeneity of CAR T cells in terms of CAR T expression or T cell immunophenotype (differentiation, exhaustion, fitness...).

      See answer #2 to Reviewer #1.

      Author response image 1.

      Author response image 1 shows the average tumor radiance on day zero (when CAR-T cell therapy was administered) for all mice. While there is some spread, most mice had tumor localized to the liver or bone marrow.

      Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification. 

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses: 

      No weaknesses were identified by this Reviewer. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this paper, the authors propose a significant advancement in optical image data analysis by employing automation. They effectively demonstrate the valuable insights that can be gained from analyzing extensive datasets with a more unbiased methodology. At present, I do not have any specific suggestions for improvement.

      However, it is important to note that this work is limited in its operational scope. Specifically, it relies on predefined ROIs rather than aligning the signal site with anatomical systems. The scaling model and image cropping are simplistic, animal pose is not taken into account, and the data output needs to be called semi-quantitative or qualitative, and would have been stronger utilizing an AI agent. Nevertheless, this work underscores the potential of automated systems in preclinical image analysis, which is a crucial step towards developing more sophisticated approaches to optical image data analysis.

      While our analysis used predefined ROIs, the maRQup pipeline allows users to manually draw ROIs on the mouse image.

      Reviewer #2 (Recommendations for the authors):

      The writing and presentation of data are clear and accurate, but some additional information should be added regarding the imaging protocol used to acquire the original data. 

      The authors mention fluorescence in Figure 1. I expected all the data to be generated from bioluminescent NALM-6 tumors, since bioluminescence is indeed measured in average radiance and can be per pixel (p/sec/cm<sup>2</sup>/sr/pixel). Fluorescence should be measured using radiance efficiency (p/sec/cm<sup>2</sup>/sr)/(µW/cm<sup>2</sup>), a unit that compensates for non-uniform excitation light pattern in the instrument. Would the authors find different results if fluorescence data were analyzed separately?

      Reviewer #2 is correct that the unit for fluorescence would be radiance efficiency. The word “fluorescent” was included in the label of Figure 1a to highlight that our workflow could be applied to other types of light-generating methods (i.e., fluorescence vs. bioluminescence). However, in this study, only measurements of bioluminescent tumors were analyzed. If fluorescence measurements are to be analyzed, our methods of image acquisition and processing would be directly applicable.

      Did the authors ever check the signal of the snout in mice with no tumor?

      In mice with no tumor, there is no detectable signal in the snout (or anywhere else, for that matter).

      The urine of mice contains phosphor, and might give a background signal, especially if longer exposure is used at the end of the study.

      For the mice with no tumor injection, the luminescence signal was below background (<10<sup>2</sup> p/sec/cm<sup>2</sup>/sr/pixel). In particular, we do not detect any signal in the bladder/urine. Additionally, as described in the Supplemental Methods and Figure 1b, only pixels that were on the mouse as determined from the brightfield image were used to calculate the tumor burden from the radiance of the luminescent image. This method ensures that any background signal (e.g., from phosphor in mouse urine) would be excluded in the radiance quantification and not bias the results.
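
      The masking step described above (only pixels on the mouse contribute to the radiance average) can be sketched as follows. This is a minimal illustration, not the maRQup implementation: the function name is hypothetical, and the mask is assumed to have already been derived from the brightfield image by a step that is not shown.

```python
def mean_radiance_on_mouse(radiance, mouse_mask):
    """Average radiance per pixel over pixels flagged as belonging to the mouse.

    radiance   : 2D list of floats (luminescent image, one value per pixel)
    mouse_mask : 2D list of bools with the same shape; assumed to come from
                 the brightfield image (the mask-building step is not shown)
    """
    if len(radiance) != len(mouse_mask):
        raise ValueError("radiance and mask must have the same shape")
    total = 0.0
    count = 0
    for rad_row, mask_row in zip(radiance, mouse_mask):
        if len(rad_row) != len(mask_row):
            raise ValueError("radiance and mask must have the same shape")
        for value, on_mouse in zip(rad_row, mask_row):
            if on_mouse:
                total += value
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count


# Toy 2 x 3 image: the bright off-mouse pixel (900.0) is ignored.
radiance = [[100.0, 300.0, 900.0],
            [200.0, 400.0, 0.0]]
mouse_mask = [[True, True, False],
              [True, True, False]]
mean_on_mouse = mean_radiance_on_mouse(radiance, mouse_mask)
```

      In this toy case the off-mouse pixel is excluded, so a signal outside the animal's outline would not affect the average. A signal inside the outline, such as the bladder, would still be counted, so this sketch relies on the response's statement that no such signal was detected.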

      Additionally, as described in the Methods, the exposure time was held constant at 30 seconds for each IVIS measurement across all 37 experiments.

      The data using more than 2 million cells comes from only 10 mice, and maybe the biological relevance of this group is limited since it will not be achievable and translatable in humans (PMID: 33653113).

      We appreciate Reviewer #2’s attention to this issue. The effect observed in our study is large enough to reach statistical significance despite the small number of mice. Note that the dosing regimen used was optimized for the murine NSG model and would require appropriate scaling before clinical application. Nonetheless, NSG mice remain the gold standard for pre‑clinical in vivo evaluation and their use is generally required by regulatory agencies, such as the FDA, for assessing novel CAR‑T cell therapies; thus these findings are relevant for advancing such treatments.

    1. eLife Assessment

      This valuable study presents a technically sophisticated intravital two-photon calcium imaging approach to characterize Ca²⁺ dynamics in distinct populations of meningeal macrophages in awake, freely behaving mice. These data are solid and suggest that meningeal macrophage calcium activity is tightly linked to anatomical sub-compartments, with potential implications for migraine and neuroinflammatory processes. Despite these strengths and broad relevance to neuroimmunology, several technical and interpretational issues limit the study, which could be addressed to strengthen this manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a technically sophisticated intravital two-photon calcium imaging approach to characterize meningeal macrophage Ca²⁺ dynamics in awake mice. The development of a Pf4Cre:GCaMP6s reporter line and the integration of event-based Ca²⁺ analysis represent clear methodological strengths. The findings reveal niche-specific Ca²⁺ signaling patterns and heterogeneous macrophage responses to cortical spreading depolarization (CSD), with potential relevance to migraine and neuroinflammatory conditions. Despite these strengths, several conceptual, technical, and interpretational issues limit the impact and mechanistic depth of the study. Addressing the points below would substantially strengthen the manuscript.

      Strengths:

      The use of chronic two-photon Ca²⁺ imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca²⁺ dynamics in the meninges.

      The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca²⁺ signaling properties. The identification of macrophage Ca²⁺ activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      By linking macrophage Ca²⁺ responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Weaknesses:

      The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca²⁺ signal interpretation.

      The manuscript offers an extensive characterization of Ca²⁺ event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca²⁺ activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca²⁺ dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      The GLM analysis revealing coupling between dural perivascular macrophage Ca²⁺ activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca²⁺ manipulation).

      The authors conclude that synchronous Ca²⁺ events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved.

      A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca²⁺ activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca²⁺ suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca²⁺ increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions.

    3. Reviewer #2 (Public review):

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      I have only minor comments.

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this report wish to show that distinct populations of meningeal macrophages respond to cortical spreading depolarization (CSD) via unique calcium activity patterns depending on their location in the meningeal sub-compartments. Perivascular macrophages display calcium signaling properties that are sometimes in opposition to non-perivascular macrophages. Many of the meningeal macrophages also displayed synchronous activity at variable distances from one another. Other macrophages were found to display calcium signals in response to dural vasomotion. CSD could induce variable calcium responses in both perivascular and non-perivascular macrophages in the meninges, in part due to RAMP1-dependent effects. Results will inform future research on the calcium responses displayed by macrophages in the meninges under both normal and pathological conditions.

      Strengths:

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Weaknesses:

      The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses: 

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, Pf4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the Pf4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, while we already discussed PF4 expression in meningeal megakaryocytes, in a revised version, we plan to discuss the possibility that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In a revised version, we plan to further acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation). 

      In the results section, we indicated that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, in our discussion, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling may regulate dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved. 

      Thank you for this suggestion. In the revision, we will indicate that the source of synchrony remains unresolved.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. At a minimum, the discussion should engage more deeply with possible interpretations and implications of this finding.

      While we propose that the decrease in macrophage calcium signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised version, we plan to elaborate on the possible implications of this finding.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions. 

      We plan to acknowledge these limitations.

      Reviewer #2 (Public review): 

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments. 

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text. 

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (5). We plan to provide this information in the revised manuscript.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We plan to add the method for inducing CSD (i.e., a pinprick in the frontal cortex) to the Results section and provide more background in the Introduction section.

      Reviewer #3 (Public review):

      Strengths: 

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included. 

      We plan to address these issues in the revision.

      References

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. eLife 9 (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv (2025).

      (5) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221 (2024).

    1. eLife Assessment

      This study provides valuable evidence that hepatic DHHC7-dependent palmitoylation is a physiologically relevant regulator of systemic metabolism, and that loss of DHHC7 disrupts Gαi palmitoylation, activates cAMP-PKA-CREB signaling, and increases hepatic transcription and secretion of Prg4. The identification of Prg4 as a hepatokine that is elevated in vivo, together with some in vitro evidence for its interaction with GPR146, represents a conceptually novel contribution to the field. However, the evidence linking these mechanisms to systemic lipolysis, liver-adipose tissue crosstalk, and whole-body metabolic physiology remains incomplete, as the phenotypic analyses rely on a limited set of experiments and do not yet fully support claims regarding adipose tissue dysfunction or altered lipid flux.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors' aim was to determine whether hepatic palmitoylation is a physiologically relevant regulator of systemic metabolism. The data demonstrate that loss of DHHC7 in hepatocytes disrupts Gαi palmitoylation, enhances cAMP-PKA-CREB signaling, and drives transcriptional upregulation and secretion of Prg4. The KO mice display increased body weight, fat mass, and plasma cholesterol, but at 12 weeks on HFD, do not exhibit insulin resistance. The potential mechanism underlying the metabolic phenotype was examined by assessing adipocyte signaling and by exploring whether Prg4 acts through GPR146. Through this pathway, the authors intend to link DHHC7-dependent palmitoylation to the regulation of hepatokines that exert systemic metabolic effects.

      Strengths:

      (1) Hepatic palmitoylation in systemic metabolic regulation is largely unexplored. The authors demonstrate the role of DHHC7 in vivo using a successful liver-specific knockout mouse model that causes HFD-dependent obesity without insulin resistance.

      (2) Several studies were performed on chow and HFD, as well as male and female mice.

      (3) Plasma proteomics identified Prg4 as a circulating factor elevated in KO mice. Prg4 overexpression phenocopied the KO mice.

      (4) There is solid mechanistic data supporting the hypothesis that hepatic DHHC7 loss selectively increases Prg4 secretion as a hepatokine.

      (5) There is convincing evidence for the DHHC7 mechanism in liver: DHHC7 controls cAMP-PKA-CREB via Gαi palmitoylation. The authors recognize that the palmitoylation change is causative rather than correlated, and this needs to be more fully explored in the future.

      (6) Strong in vitro data support that Prg4 acts through adipocyte GPR146 via its SMB domain.

      Weaknesses:

      (1) The assessment of liver and adipose tissue responses to DHHC7 loss is insufficient to support claims that it alters systemic lipolysis. In this new mouse model, liver histology is necessary, especially given the cholesterol increase in the KO. As this is a newly established mouse line, common assessments of the liver during HFD feeding would be important for interpreting the phenotype.

      (2) The data show DHHC7 loss causes adipose tissue dysfunction and alterations in lipid metabolism. Beyond that, I suggest not stating more regarding the phenotype of the DHHC7 mice for this work. A thorough analysis would be needed to determine which factor drives the obesity and changes in energy balance in the mice. For example, the KO mice had lower oxygen consumption (but no change in CO2 production, which is also usually similarly altered), suggesting a CNS component could drive obesity. However, since the data are not normalized for lean mass and there is no information about locomotor activity, this analysis is incomplete. RER may be informative if available. A broad conservative description of the KO phenotype would be more accurate since Prg4 has many paracrine targets and likely has autocrine signaling in the liver.

      (3) Most references to lipolysis or lipolysis flux systemically would be inaccurate. To suggest a suppression of lipolysis, serum NEFA would need to be measured, and in vivo or in vitro lipolysis assays performed to test the effect of DHHC7 loss or the specificity of Prg4 action on adipocytes in vivo. To demonstrate adipose tissue dysfunction, analysis of lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction should be performed/measured.

      (4) Line 179: The experiment showing that Prg4 does not affect p-CREB (Figure S8) was performed in brown adipocytes, yet it appears under the heading: "DHHC7 controls hepatic PKA-CREB activity through Gαi palmitoylation to regulate Prg4 transcription." Unless repeated using liver lysate, the conclusions stated in the text throughout the paper should be revised.

      (5) It appears that the serum and liver proteomics were only assessed for factors that increased in KO mice? Were proteins that were significantly decreased analyzed?

      (6) The beige adipocyte culture method is unclear. The methods do not describe the fat pad used, and the protocol suggests the cells would be differentiated into mature white adipocytes. If they are beige cells, a reference for the method, gene expression, and cell images could support that claim.

      (7) The use of tamoxifen can confound adipocyte studies, as it increases beigeing and weight gain even after a brief initiation period. Both groups were treated with Tam, but another way to induce Cre would be ideal.

      (8) Evidence for the lack of the glucose phenotype is incomplete. One reason could be due to the IP route of glucose administration, which has a large impact on glucose handling during a GTT. To confirm the absence of a glucose tolerance phenotype, an OGTT should be performed, as it is more physiological. In addition, the mice should be fed for 16 weeks. Prg4 affects immune cells, changing how adipose tissue expands, and 12 weeks of HFD feeding is often not long enough to see the effects of adipose tissue inflammation spilling over into the system.

      (9) There may be liver-adipose tissue crosstalk in KO mice, but this was not fully assessed in this study and would be difficult to determine in any setting, given the diverse cell types that are targets of Prg4. The crosstalk claim is unnecessary to convey the basic premises; there is the DHHC7 mechanism/phenotype and the Prg4 mechanism/phenotype, and while there is no direct Prg4 adipose mechanism, the paper can be successfully reframed.

      (10) Although DHHC7 loss on the chow diet did not result in a phenotype, did Prg4 increase in the KO mice on chow? This would determine whether either i) the expression of Prg4 is dependent on HFD/obesity, or ii) circulating Prg4 has effects only in an HFD condition. The receptors may also change on HFD, especially in adipocytes.

      Impact:

      This work would significantly contribute to the study of liver metabolism, provided it includes data describing the liver. The role of Prg4 in adipocytes and other cell types is of substantial value to the field of metabolism. By reframing the paper and conducting some key experiments, its quality and impact can be increased.

    3. Reviewer #2 (Public review):

      In the current report, Sun and colleagues sought to determine the liver-specific role that DHHC7, a DHHC palmitoyltransferase protein, plays in regulating whole-body energy balance and hepatic crosstalk with adipose tissues. The authors generated an inducible, liver-specific DHHC7 knockout mouse to determine how altered palmitoylation in hepatocytes alters hepatokine production/secretion, and in turn, systemic metabolism. The ablation of DHHC7 was found to alter the production of proteoglycan 4 (Prg4), a hepatokine previously linked to metabolic regulation. The authors propose that the change in Prg4 production is mediated by the loss of Gαi palmitoylation, due to DHHC7 ablation, thereby augmenting cAMP-PKA-CREB signaling in hepatocytes, which alleviates the 'brake' on Prg4 production. The authors further propose that Prg4 overexpression leads to excessive binding to GPR146 on adipocytes, which in turn suppresses PKA-mediated HSL activation, promoting impairments in lipolysis, leading to obesity. The report is interesting and generally well-written, but it appears to have some clear gaps in additional data that would aid in interpretation. The addition of confirmatory culture studies would be incredibly helpful for testing the hypotheses being explored. My comments, concerns, and/or suggestions are outlined below in no particular order.

      (1) Figures: All data should be presented in dot-boxplot format so the reader knows how many samples were analyzed for each assay and group. n=3 for some assays/experiments is incredibly low, particularly when considering the heterogeneity in responsiveness to HFD, food intake, etc.

      (2) Figure 1E-F: It is unclear when the food intake measure was performed. Mice can alter their feeding behavior based on a myriad of environmental and biological cues. It would also be interesting to show food intake data normalized to body mass over time. Mice can counterregulate anorexigenic cues by altering neuropeptide production over time. It is not clear if this is occurring in these mice, but the timing of measuring food intake is important. Additionally, the VO2 measure appears to be presented as being normalized to total body mass, when in fact, it would probably be more accurate to normalize this to lean body mass. Normalizing to total body mass provides a denominator effect due to excessive adiposity, but white fat is not as metabolically active as other high-glucose-consuming tissues. If my memory serves me right, several reports have discussed appropriate normalizations in circumstances such as this.

      (3) Figure 1J-N: It is not all that surprising that fasting glucose and/or TGs were found to be similar between groups. It is well-established that mice have an incredible ability to become hyperinsulinemic in an effort to maintain euglycemia and lipid metabolism dynamics. A few relatively easy assays can be performed to glean better insights into the metabolic status of the authors' model. First, fasting insulin concentrations will be incredibly helpful. Secondly, if the authors want to tease out which adipose depot is most adversely affected by ablation, they could take an additional set of CON and KO mice, fast them for 5-6 hours, provide a bolus injection of insulin (similar to that provided during an insulin tolerance test), and then quickly harvest the animals ~15 minutes after insulin injections, followed by evaluating AKT phosphorylation. This will really tell them if these tissues have impairments in insulin signaling. The gold-standard approach would be to perform a hyperinsulinemic-euglycemic clamp in the CON and KO mice. I now see GTT and ITT data, but the aforementioned assays could help provide insight.

      (4) Figure 3A: This looks overexposed to me.

      (5) Figures 3-4: It appears that several of these assays could be complemented with culture-based models, which would almost certainly be cleaner. The conditioned media could then be used from hepatocyte cultures to treat differentiated adipocytes.

      (6) Figure 4: It is unclear how to interpret the phospho-HSL data because the fasting state can affect this readout. It needs to be made clear how the harvest was done. Moreover, insulin and glucagon were never measured, and these hormones have a significant influence over HSL activity. I suspect the KO mice have established hyperinsulinemia, which would likely affect HSL activity. This provides an example of why performing some of these experiments in a dish would make for cleaner outcomes that are easier to interpret.

    4. Reviewer #3 (Public review):

      Summary:

      In the current manuscript, Sun et al. aimed to determine the metabolic function of hepatocyte DHHC7, one of the key enzymes in protein palmitoylation. They generated inducible liver-specific Dhhc7 knockout mice and discovered that Dhhc7-LKO mice are more prone to gain weight and develop adipose expansion and obesity. Via unbiased proteomic analysis, they identified PRG4 as one of the top secreted factors in the liver of Dhhc7-LKO mice. Hepatic overexpression of PRG4 recapitulates the obesity phenotype observed in Dhhc7-LKO mice. At the mechanistic level, PRG4, once secreted from the liver, can bind to GPR146 on adipocytes and inhibit PKA-HSL signaling and lipolysis. Taken together, their findings suggest a novel pathway by which the liver communicates with adipose tissue and impacts systemic metabolism.

      Strengths:

      (1) The systemic metabolic homeostasis depends on coordination among metabolically active tissues. Thus, active communication between the liver and adipose tissue when facing nutritional challenges (such as high-fat diet feeding) is crucial for achieving metabolic health. The concept that the liver can communicate with adipose tissue and impact the lipolysis process via secreted hepatokines is quite significant but remains poorly understood.

      (2) Hepatocyte Dhhc7 knockout mice developed a significant obesity phenotype, which is associated with adipose expansion.

      (3) Unbiased proteomic analysis identified PRG4 as one of the top secreted factors in the liver of Dhhc7-LKO mice. Hepatic overexpression of PRG4 recapitulates the obesity phenotype observed in Dhhc7-LKO mice.

      (4) In vitro cell-based assay showed that PRG4 can bind to adipocyte GPR146, inhibit PKA-mediated HSL phosphorylation, and subsequently, the lipolysis process.

      Weaknesses:

      (1) Lack of a cause-and-effect study to generate evidence directly linking hepatocyte DHHC7 and PRG4 in driving adipose expansion and obesity upon HFD feeding.

      (2) Lack of direct evidence to support that PRG4 inhibits adipocyte lipolysis via GPR146. A functional assay demonstrating adipocyte lipolysis is required.

      (3) The conclusion is largely based on correlative evidence.

    5. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The assessment of liver and adipose tissue responses to DHHC7 loss is insufficient to support claims that it alters systemic lipolysis. In this new mouse model, liver histology is necessary, especially given the cholesterol increase in the KO. As this is a newly established mouse line, common assessments of the liver during HFD feeding would be important for interpreting the phenotype.

      We will add liver histology data in the revised version.

      (2) The data show DHHC7 loss causes adipose tissue dysfunction and alterations in lipid metabolism. Beyond that, I suggest not stating more regarding the phenotype of the DHHC7 mice for this work. A thorough analysis would be needed to determine which factor drives the obesity and changes in energy balance in the mice. For example, the KO mice had lower oxygen consumption (but no change in CO2 production, which is also usually similarly altered), suggesting a CNS component could drive obesity. However, since the data are not normalized for lean mass and there is no information about locomotor activity, this analysis is incomplete. RER may be informative if available. A broad, conservative description of the KO phenotype would be more accurate since Prg4 has many paracrine targets and likely has autocrine signaling in the liver.

      We will add CO2 production, locomotor activity, and RER data in the revised version.

      (3) Most references to lipolysis or lipolysis flux systemically would be inaccurate. To suggest a suppression of lipolysis, serum NEFA would need to be measured, and in vivo or in vitro lipolysis assays performed to test the effect of DHHC7 loss or the specificity of PRG4 action on adipocytes in vivo. To demonstrate adipose tissue dysfunction, analysis of lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction should be performed/measured.

      We will measure serum NEFA to test the effect of DHHC7. We will also analyze lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction.

      (4) Line 179: The experiment was performed in brown adipocytes to show that Prg4 does not affect p-CREB (Figure S8) under the heading: "DHHC7 controls hepatic PKA-CREB activity through Gαi palmitoylation to regulate Prg4 transcription." Unless repeated using liver lysate, the conclusions stated in the text throughout the paper should be revised.

      Figure S8 demonstrates that Prg4 has no impact on forskolin-induced CREB phosphorylation at Ser133, providing evidence that Prg4 acts upstream of adenylyl cyclase. We will revise the description.

      (5) It appears that the serum and liver proteomics were only assessed for factors that increased in KO mice? Were proteins that were significantly decreased analyzed?

      We are analyzing the decreased proteins in a follow-up project.

      (6) The beige adipocyte culture method is unclear. The methods do not describe the fat pad used, and the protocol suggests the cells would be differentiated into mature white adipocytes. If they are beige cells, a reference for the method, gene expression, and cell images could support that claim.

      We will add a reference for the method, gene expression data, and cell images.

      (7) The use of tamoxifen can confound adipocyte studies, as it increases beigeing and weight gain even after a brief initiation period. Both groups were treated with Tam, but another way to induce Cre would be ideal.

      We will use a doxycycline-inducible system in the future.

      (8) Evidence for the lack of the glucose phenotype is incomplete. One reason could be due to the IP route of glucose administration, which has a large impact on glucose handling during a GTT. To confirm the absence of a glucose tolerance phenotype, an OGTT should be performed, as it is more physiological. In addition, the mice should be fed for 16 weeks. Prg4 affects immune cells, changing how adipose tissue expands, and 12 weeks of HFD feeding is often not long enough to see the effects of adipose tissue inflammation spilling over into the system.

      We will perform the OGTT and feed the mice for 16 weeks in the future.

      (9) There may be liver-adipose tissue crosstalk in KO mice, but this was not fully assessed in this study and would be difficult to determine in any setting, given the diverse cell types that are targets of Prg4. The crosstalk claim is unnecessary to share the basic premises; there is the DHHC7 mechanism/phenotype and the Prg4 mechanism/phenotype, and while there is no direct Prg4-adipose mechanism, the paper can be successfully reframed.

      We will reframe the paper.

      (10) Although DHHC7 loss on the chow diet did not result in a phenotype, did Prg4 increase in the KO mice on chow? This would determine whether either i) the expression of Prg4 is dependent on HFD/obesity, or ii) circulating Prg4 has effects only in an HFD condition. The receptors may also change on HFD, especially in adipocytes.

      We will measure Prg4 in the KO mice on a chow diet.

      Reviewer #2 (Public review):

      (1) Figures: All data should be presented in dot-boxplot format so the reader knows how many samples were analyzed for each assay and group. n=3 for some assays/experiments is incredibly low, particularly when considering the heterogeneity in responsiveness to HFD, food intake, etc.

      We will present the data in dot-boxplot format.

      (2) Figure 1E-F: It is unclear when the food intake measure was performed. Mice can alter their feeding behavior based on a myriad of environmental and biological cues. It would also be interesting to show food intake data normalized to body mass over time. Mice can counterregulate anorexigenic cues by altering neuropeptide production over time. It is not clear if this is occurring in these mice, but the timing of measuring food intake is important. Additionally, the VO2 measure appears to be presented as being normalized to total body mass, when in fact, it would probably be more accurate to normalize this to lean body mass. Normalizing to total body mass provides a denominator effect due to excessive adiposity, but white fat is not as metabolically active as other high-glucose-consuming tissues. If my memory serves me right, several reports have discussed appropriate normalizations in circumstances such as this.

      We will determine a more accurate normalization approach.

      (3) Figure 1J-N: It is not all that surprising that fasting glucose and/or TGs were found to be similar between groups. It is well-established that mice have an incredible ability to become hyperinsulinemic in an effort to maintain euglycemia and lipid metabolism dynamics. A few relatively easy assays can be performed to glean better insights into the metabolic status of the authors' model. First, fasting insulin concentrations will be incredibly helpful. Secondly, if the authors want to tease out which adipose depot is most adversely affected by ablation, they could take an additional set of CON and KO mice, fast them for 5-6 hours, provide a bolus injection of insulin (similar to that provided during an insulin tolerance test), and then quickly harvest the animals ~15 minutes after insulin injection, followed by evaluation of AKT phosphorylation. This will really tell them if these tissues have impairments in insulin signaling. The gold-standard approach would be to perform a hyperinsulinemic-euglycemic clamp in the CON and KO mice. I now see GTT and ITT data, but the aforementioned assays could help provide insight.

      We have the data for evaluating AKT phosphorylation and will add it in the revised version.

      (4) Figure 3A: This looks overexposed to me.

      We will replace it with a shorter-exposure image.

      (5) Figures 3-4: It appears that several of these assays could be complemented with culture-based models, which would almost certainly be cleaner. The conditioned media could then be used from hepatocyte cultures to treat differentiated adipocytes.

      We will perform the cell culture experiments for Figures 3-4.

      (6) Figure 4: It is unclear how to interpret the phospho-HSL data because the fasting state can affect this readout. It needs to be made clear how the harvest was done. Moreover, insulin and glucagon were never measured, and these hormones have a significant influence over HSL activity. I suspect the KO mice have established hyperinsulinemia, which would likely affect HSL activity. This provides an example of why performing some of these experiments in a dish would make for cleaner outcomes that are easier to interpret.

      We will perform some of these experiments in cell culture.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) Lack of a cause-and-effect study to generate evidence directly linking hepatocyte DHHC7 and PRG4 in driving adipose expansion and obesity upon HFD feeding.

      We will perform a cause-and-effect study to test the hypothesis.

      (2) Lack of direct evidence to support that PRG4 inhibits adipocyte lipolysis via GPR146. A functional assay demonstrating adipocyte lipolysis is required.

      We will add the direct evidence in the revised version.

      (3) The conclusion is largely based on correlative evidence.

      We will perform causal experiments to strengthen the conclusion.

    1. eLife assessment

      The manuscript presents important findings with theoretical or practical implications beyond a single subfield. The work is overall solid, and the methods, data, and analyses broadly support the claims. Although the novelty of this study and the work put into it are appreciated, there are also clearly some weaknesses that should be addressed.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNciI). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      (e) The authors tested organoids at weeks 3, 4, 8, 15, and 28 in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNA-seq.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      (g) Suggested fixes/experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers)

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes/experiments

      Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

    3. Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH⁺ neurons, GFAP⁺ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNciI). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript. Our study therefore includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNciI</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends. 
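
      The replicate structure described above (technical replicates averaged within each differentiation, with N counted as the number of independent differentiations) can be illustrated with a minimal sketch. This is not the authors' analysis code; the function name `summarize_by_differentiation` and the input layout of (differentiation ID, value) pairs are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_differentiation(measurements):
    """Average technical replicates within each differentiation, then
    summarize across differentiations (the biological replicates).

    `measurements` is an iterable of (differentiation_id, value) pairs;
    this layout is a hypothetical one chosen for the example.
    """
    grouped = defaultdict(list)
    for diff_id, value in measurements:
        grouped[diff_id].append(value)

    # One value per independent differentiation: mean of its organoids/fields.
    per_diff = {diff_id: mean(values) for diff_id, values in grouped.items()}

    batch_means = list(per_diff.values())
    n = len(batch_means)  # N = number of biological replicates
    sd = stdev(batch_means) if n > 1 else float("nan")
    return {"n": n, "per_differentiation": per_diff,
            "mean": mean(batch_means), "sd": sd}


if __name__ == "__main__":
    # Made-up numbers, only to show the call pattern.
    data = [("diff1", 1.0), ("diff1", 3.0), ("diff2", 4.0),
            ("diff3", 6.0), ("diff3", 6.0)]
    print(summarize_by_differentiation(data))
```

      Summarizing at the level of differentiations, rather than individual organoids or fields of view, keeps technical replicates from inflating the apparent sample size.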

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, 2G; Fig. 3F, 3G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested organoids at weeks 3, 4, 8, 15, and 28 in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNA-seq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitors to mature/functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, satisfying the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because GBA1 has a pseudogene (PGBA), located 16 kb downstream, that shares 96-98% sequence similarity with GBA1 (Refs. #1, #2), which complicates precise editing. GBA1 is shorter (~5.7 kb) than PGBA (~7.6 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation; L444P was corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi:10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c]. 

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes/experiments

      Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. The in vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging; CellVue Maroon fluorescence in the tumor persisted for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells, with flow cytometry quantification showing that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated into the cells for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C.-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 17. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead. 

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects. 

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects. 

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to address the incorporation of microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported when it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified the numbers of biological replicates and differentiations in the figure legends [see response to Weaknesses (1)-b, (1)-c].

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated Statistical analysis in the methods as described below:

      “For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cut-offs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.”
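      The test-selection rule quoted above can be read as a small decision procedure. The following is a minimal Python sketch of that rule, not the authors' analysis pipeline (the quoted Methods state that analyses were run in GraphPad Prism 10). The function names, the use of the sample standard deviation for the outlier cut-off, and the representation of each Shapiro-Wilk outcome as a boolean flag are illustrative assumptions.

```python
from statistics import mean, stdev


def exclude_outliers(values, k=2.0):
    """Drop values outside mean +/- k standard deviations.

    Assumption: the sample standard deviation is used; the quoted
    Methods text does not say which estimator was applied.
    """
    if len(values) < 3:
        return list(values)
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def choose_two_group_test(n_a, n_b, normal_a, normal_b, min_n=6):
    """Pick the two-group test according to the quoted rule.

    normal_a / normal_b are hypothetical flags meaning "normality was
    confirmed for this group" (e.g. by a Shapiro-Wilk test run elsewhere).
    """
    if min(n_a, n_b) >= min_n and normal_a and normal_b:
        return "unpaired two-tailed Student's t-test"
    return "Mann-Whitney U test"


def choose_multi_group_test(all_normal):
    """Pick the test for three or more groups according to the quoted rule."""
    if all_normal:
        return "one-way ANOVA with Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


if __name__ == "__main__":
    print(choose_two_group_test(4, 4, True, True))
    print(choose_multi_group_test(all_normal=False))
    print(exclude_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 30]))
```

      Keeping the selection logic separate from the tests themselves makes the branch taken for each panel easy to record alongside the reported p-value.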

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      “Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.”
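      The multilevel averaging described above (FOVs within an organoid, then independent differentiations) can be sketched as follows. This is a hedged illustration in Python rather than the ImageJ-based workflow the authors used: the nested-dictionary layout, the function name, the example counts, and the step of averaging organoid-level means within each differentiation before computing the group mean are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def group_mean_sem(fov_counts):
    """Return (group mean, SEM, n) with n = number of differentiations.

    fov_counts maps differentiation -> organoid -> list of per-FOV counts
    (a hypothetical layout chosen for this sketch).
    """
    differentiation_means = []
    for organoids in fov_counts.values():
        # Step 1: average FOVs within each organoid.
        organoid_means = [mean(fovs) for fovs in organoids.values()]
        # Step 2 (assumed): average organoid-level means per differentiation.
        differentiation_means.append(mean(organoid_means))
    n = len(differentiation_means)
    # Step 3: group mean and SEM across independent differentiations.
    sem = stdev(differentiation_means) / sqrt(n) if n > 1 else float("nan")
    return mean(differentiation_means), sem, n


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = {
        "diff_1": {"org_a": [9, 10, 11], "org_b": [11, 12, 13], "org_c": [13, 14, 15]},
        "diff_2": {"org_a": [13, 14, 15], "org_b": [15, 16, 17], "org_c": [17, 18, 19]},
        "diff_3": {"org_a": [17, 18, 19], "org_b": [19, 20, 21], "org_c": [21, 22, 23]},
    }
    print(group_mean_sem(example))
```

      Taking n as the number of independent differentiations, rather than the number of FOVs or organoids, keeps the SEM from being artificially narrowed by repeated sampling of the same batch.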

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      RNA-seq data are from the same batch. The mapping rate is >90%. GEO accession will be active upon publication. These were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include the replicates and statistical tests for each figure [see response to Weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      The statistical analysis method is provided in the revision [see response to Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated the MLO identity using 1) FOXG1 and 2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but highly expressed in ‘age-matched’ cerebral organoids (CO), suggesting that our culturing method is midbrain region-oriented. In nGD MLO, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). In WT-75.1 MLOs, EN1 reached levels 34-fold and 373-fold higher than in WT-75.1 iPSCs at weeks 3 and 8, respectively. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSC cells and presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH⁺ neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Off-target effects were analyzed during gene editing; the likelihood of editing at off-target sites is low, given the low off-target scores ranked by the MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other off-targets is low due to low off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      The normalization was to the protein of the organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipid analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.
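      The normalization described in the quoted Methods text amounts to two simple ratios. A minimal Python sketch is given below for illustration only; the function names and input values are hypothetical, and the sketch assumes the lipid amount (pmol) and the protein mass (mg, from the BCA assay) refer to the same lysate.

```python
def normalize_to_protein(lipid_pmol, protein_mg):
    """Express a glycosphingolipid amount as pmol per mg of total protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return lipid_pmol / protein_mg


def normalize_to_volume(lipid_pmol, volume_ml):
    """Express a medium glycosphingolipid amount as pmol per mL of medium."""
    if volume_ml <= 0:
        raise ValueError("medium volume must be positive")
    return lipid_pmol / volume_ml


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(normalize_to_protein(150.0, 0.5))  # pmol/mg protein
    print(normalize_to_volume(20.0, 2.0))    # pmol/mL medium
```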

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work. 

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH⁺ neurons, GFAP⁺ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage (https://www.proteinatlas.org/ENSG00000177628-GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

    1. eLife Assessment

      This modelling study tests several hypotheses describing how seasonality and migration drive the epidemiology of Rift Valley Fever Virus among transhumant cattle in The Gambia. The work is methodologically solid, and the findings offer valuable insights into how the movement of cattle in and out of the Gambia River and Sahel ecoregions could lead to source-sink transmission dynamics among cattle subpopulations, sustaining endemic transmission.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses data from a recent RVFV serosurvey among transhumant cattle in The Gambia to inform the development of an RVFV transmission model. The model incorporates several hypotheses that capture the seasonal nature of both vector-borne RVFV transmission and cattle migration. These natural phenomena are driven by contrasting wet and dry seasons in The Gambia's two main ecoregions and are purported to drive cyclical source-sink transmission dynamics. Although the Sahel is hypothesized to be unsuitable for year-long RVFV transmission, findings suggest that cattle returning from the Gambia River to the Sahel at the beginning of the wet season could drive repeated RVFV introductions and ensuing seasonal outbreaks. Upon review, the authors have removed an additional analysis evaluating the potential impacts of cattle movement bans on transmission dynamics, which was poorly supported by the methodological approach.

      Strengths:

      Like most infectious diseases in animal systems in low- and middle-income countries, the transmission dynamics of RVFV in cattle in The Gambia are poorly understood. This study harnesses important data on RVFV seroepidemiology to develop and parameterize a novel transmission model, providing plausible estimates of several epidemiological parameters and transmission dynamic patterns.

      This study is well written and easy to follow.

      The authors consider both deterministic and stochastic formulations of their model, demonstrating potential impacts of random events (e.g. extinctions) and providing confidence regarding model robustness.

      The authors use well-established Bayesian estimation techniques for model fitting and confront their transmission model with a seroepidemiological model to assess model fit.

      Elasticity analyses help to understand the relative importance of competing demographic and epidemiological drivers of transmission in this system.

      Weaknesses:

      The model does not include an impact of infection on cattle birth rates, but the authors justify that this parameter should have limited impact on dynamics given predicted low-level circulation patterns, as opposed to explosive outbreaks, in this region.

      The importance of the RVFV positivity decay rate is highlighted, but loss of immunity is not considered in the SIR model. The authors do discuss uncertainty regarding model structure and a need for future data collection to begin to answer this question.

      The model's structure, including homogeneous mixing within each ecoregion and step-change seasonality, allows for estimation of generalized transmission rates at a macro scale. However, it greatly simplifies the movement process itself and assumes that transhumant cattle movement is the only mechanism for RVF reintroduction into the Sahel region. The authors discuss that integration of more finely-scaled movement and contact data may help to address this limitation in future work.

      This model seems well-suited to be exploited in future work to explore, for example, the impacts of cattle vaccination and the potential differential efficiency of targeting T herds relative to M or L herds.

      Comments on revisions:

      I thank the authors for thoughtfully and thoroughly addressing my concerns. I have no further comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public reviews:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In this revision, we have:

      - clarified that while epidemics occur in other parts of sub-Saharan Africa, our results are consistent with the epidemiological narrative of RVF in The Gambia, characterised by sustained, moderate transmission without resulting in substantial outbreaks (hyperendemicity).

      - discussed how model assumptions (e.g. seasonality, homogeneous mixing) may bias our results toward an endemic quasi-equilibrium dynamic.

      - highlighted the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In this revision we have:

      - clarified this distinction in the manuscript to avoid overinterpretation.

      - emphasized the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause substantial abortions and neonatal deaths, these events occur during sporadic epidemics. In the Gambian context, where we’re not observing large outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process; it is reasonable to assume a stable population since we are not explicitly modelling outbreak-scale reproductive losses. This approach is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we have acknowledged this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for the M subpopulation in the dry season were not included in the appendix due to an oversight, though demographic turnover was implemented in the model code. We have now added the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay) is an important consideration in RVFV serology; however, whether this decline reflects a true loss of protective immunity following natural infection remains unknown. Available evidence suggests that infected cattle likely develop long-lasting immunity, and findings in humans further support this assumption, although longitudinal field data regarding RVFV-specific antibody durability in animals are not available to the best of our knowledge. From a modelling perspective, our objective was to estimate FOI and use it to predict an age-seroprevalence curve consistent with the observed cross-sectional age-seroprevalence patterns. We therefore adopted a parsimonious SIR framework, interpreting loss of seropositivity as a potential explanation for discrepancies between observed and predicted age-seroprevalence rather than explicitly modelling waning immunity. We have now:

      - clarified this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      - highlighted that while an SEIS/SIRS framework could theoretically generate different long-term dynamics, evaluating this approach requires stronger evidence for true immunity loss.
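
      To illustrate the point about seropositivity decay, the following minimal Python sketch compares the age-seroprevalence curve expected under lifelong seropositivity with one in which detectable antibodies are lost at a constant rate. It assumes a constant FOI, which is a simplification for illustration only and not the fitted seasonal transmission model; the function name, parameter names, and numerical values are all hypothetical.

```python
import math


def expected_seroprevalence(age_years, foi, decay=0.0):
    """Expected proportion seropositive at a given age.

    Assumes (for illustration only) a constant force of infection `foi`
    and a constant seropositivity decay rate `decay`, both per year.
    Solves dP/da = foi * (1 - P) - decay * P with P(0) = 0, which gives
    P(a) = foi / (foi + decay) * (1 - exp(-(foi + decay) * a)).
    """
    total = foi + decay
    if total == 0:
        return 0.0
    return foi / total * (1.0 - math.exp(-total * age_years))


# Hypothetical values: FOI of 0.1 per year, decay of 0.1 per year.
ages = [1, 2, 5, 10]
without_decay = [expected_seroprevalence(a, foi=0.1) for a in ages]
with_decay = [expected_seroprevalence(a, foi=0.1, decay=0.1) for a in ages]
```

      Under these assumptions, decay lowers the expected seroprevalence at every age and caps it at foi / (foi + decay), which is the kind of gap between observed and predicted curves that seropositivity decay could explain without requiring a loss of protective immunity.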

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment and for raising an important conceptual point. However, the force of infection in our study is not estimated using a serocatalytic framework. Instead, FOI is estimated mechanistically within the transmission model as a function of the number of infectious cattle, rather than from age-stratified seroprevalence data.

      RVF-induced mortality is accounted for through its effect on the infectious compartment, where increased mortality reduces the number and duration of infectious cattle and therefore indirectly reduces FOI. Consequently, RVF-related cattle death does not need to be explicitly incorporated into the FOI expression itself. Seroreversion similarly does not influence FOI estimation under this modelling framework. We have clarified this distinction in the Methods section to avoid confusion between mechanistic transmission models and serocatalytic approaches.
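
      The distinction can be sketched in a few lines of Python. This is a generic single-population SIR model with frequency-dependent transmission, not the ecoregion-structured seasonal model fitted in the manuscript; the function name, the Euler time step, and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_foi(beta=0.5, gamma=0.1, mu=0.001, alpha=0.0,
                 days=200, dt=0.1, n0=1000.0, i0=1.0):
    """Return the force of infection over time from a simple SIR model.

    beta  : transmission rate per day (hypothetical value)
    gamma : recovery rate per day
    mu    : natural death rate per day, balanced by a constant birth inflow
    alpha : infection-induced death rate per day
    The FOI is computed mechanistically as beta * I / N; alpha does not
    appear in that expression and acts only through the I compartment.
    """
    s, i, r = n0 - i0, i0, 0.0
    foi_series = []
    for _ in range(int(days / dt)):
        n = s + i + r
        foi = beta * i / n  # depends on the infectious fraction only
        foi_series.append(foi)
        ds = mu * n0 - foi * s - mu * s
        di = foi * s - (gamma + mu + alpha) * i
        dr = gamma * i - mu * r
        # Forward Euler update of the three compartments
        s += ds * dt
        i += di * dt
        r += dr * dt
    return foi_series


peak_without_mortality = max(simulate_foi(alpha=0.0))
peak_with_mortality = max(simulate_foi(alpha=0.1))
```

      With these illustrative values, the peak FOI is lower when infection-induced mortality is included, even though the mortality rate never enters the FOI expression directly.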

      (7) Clarifying previous vs. current study components

      We have revised the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We have expanded the limitations section to identify the sparse household movement data as contributing most to uncertainty. We have outlined how these limitations may have implications for our conclusions, and may lead to under- or over-estimation of periods of heightened transmission risk.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not ideally be suited to exploring a movement ban. In this revised manuscript, we have removed this analysis. We are currently developing separate work focused on RVF vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

      Reviewer #1 (Recommendations for the authors):

      We thank the reviewer for the recommendations regarding the Introduction, Methods, Results, and Supplementary Figures. We have addressed these points below and revised the manuscript accordingly.

      (1) Introduction: Should avoid describing as "inaccessible" the regions that are inhabited by nomadic and transhumant pastoralists.

      We have revised the wording to “hard-to-reach” regions.

      (2) Methods: Can the authors state what share of the animals included in the household survey data were cattle as opposed to other small ruminants? It would be helpful to understand what share of the data is "excluded"

      We have now included the total number of cattle sampled, providing clarity on the proportion of data used in the analyses.

      (3) Methods: When introducing the deterministic model, it seems unnecessary to mention the initialization conditions (i.e., introduction of a single infected individual at time 0) when this is later repeated in the Estimation of model parameters section, where it seems simulations were first conducted.

      We have removed the redundant description.

      (4) Results: Could the negative correlation between geographic distance of connected herds and mean seroprevalence simply indicate proximal exposure rather than common risk factors?

      We acknowledge that both mechanisms are plausible. RVFV transmission is strongly influenced by shared environmental factors that shape mosquito dynamics; however, direct transmission between proximal cattle herds may also occur through close contact with infectious tissues, bodily fluids, or contaminated materials. We have clarified this interpretation in the Results section.

      (5) Figure S5: inconsistent notation for the scaling factor parameter (tau), which is expressed in equations and tables as psi.

      We thank the reviewer for identifying this issue and have corrected all instances to ensure consistent use of tau throughout the manuscript.

      (6) Figure S6: Why a density plot, isn't the number of temporary extinctions (x-axis) discrete?

      We have replaced the density plot with a bar plot in Figure S6.

    1. eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this currently is incomplete. The central claim - that trehalose specifically regulates an E2F/Dp-driven myogenic program - is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Mohite et al. used transcriptomic and metabolic profiling of H. armigera and S. frugiperda to link trehalose metabolism to muscle development. They further used several different bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism of metabolite-regulated transcriptional programming for muscle development. The authors have also done rescue experiments where trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, several concerns and gaps leave the current conclusions purely speculative. Although it is difficult to perform genetic experiments in non-model insects, the authors seem to suggest that a similar mechanism could also apply in systems like Drosophila, where it might be possible to perform experiments to fill in some of the missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find several genes, including enzymes of trehalose metabolism, that change. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that the NPP treatment leads to reduced muscle development, and so the proportion of the samples from the muscles in bulk RNA sequencing will be relatively lower, which might have led to the results. So, a confirmatory experiment has to be performed where the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further to overcome the off-target effects of NPP, trehalose rescue experiments could be useful.

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. So it becomes difficult to judge whether the reduction in these energy molecules is a cause or an effect.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      The work seems to connect trehalose metabolism with gene expression changes. Though this is an interesting idea, no experiments in the current version of the manuscript are conclusive. The authors could search for domains in the E2F family of transcription factors that can bind the metabolite; if none are found, ChIP-seq is essential to conclusively establish the role of E2F in regulating gene expression tuned by the metabolites.

      Some of the above concerns are partially addressed in experiments where silencing of E2F/Dp shows similar phenotypes as with NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, and delayed pupation and lethality could not be definitely linked to trehalose-dependent regulation.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      No experiments are performed with non-target control dsRNA. All the experiments are done with an empty vector. But an appropriate control should be a non-target control.

    3. Reviewer #2 (Public review):

      Summary:

      This study shows that the knockdown of the effects of TPS/TPP in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolism equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidopterans and may be muscle-specific, the study largely relies on one species and perturbation methods that are not muscle-specific. The technical limitations arising from investigations outside model systems, where solid methods are available, limit the specificity of inferences that may be drawn from the data.

    4. Reviewer #3 (Public review):

      The hypothesis is that Trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors have identified transcription factor E2F, which was validated through RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counterintuitive to this hypothesis, the investigators perform EMSAs with the E2F protein and promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show a direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype observed as a result of Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp - they show that paramyosin regulates E2F/Dp and E2F/Dp regulates paramyosin. The only plausible way to explain the results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp. But the authors have mentioned nothing in this line. Additionally, I think trehalose metabolism impacts amino acid content in insects, and that will have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies have demonstrated enrichment of several amino acid biosynthetic genes. Yet the authors make no efforts to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development and not considering the above points will be incomplete.

      The result section of the manuscript is quite concise, to my understanding (especially the initial few sections), which misses out on mentioning details that would help readers understand the paper better. While technical details of the methods should be in the Materials and Methods section, the overall experimental strategy for the experiments performed should be explained in adequate detail in the results section itself or in figure legends. I would request authors to include more details in the results section. As an extension of the comment above, many times, abbreviations have been used without introducing them. A thorough check of the manuscript is required regarding this.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. DP is the cofactor of E2F, and they both heterodimerise and sit on the promoter of target genes to regulate them. I would request authors to revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, then further experiments should be conducted to come to a conclusion convincingly. Additionally, these results would need thorough discussion with citations of similar results observed for other transcription factor-cofactor complexes.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking is clear statements about why particular dry-lab experiments were conducted.

      In my judgement, the EMSA analysis presented is technically poor in quality. It lacks positive and negative controls, does not show mutation analysis or super shifts. Also, it lacks any competition assays that are important to prove the binding beyond doubt. I am not sure why protein is not detected at all in lower concentrations. Overall, the EMSA assays need to be redone; I find the current results to be unacceptable.

      GSEA studies clearly indicate enrichment of amino acid synthesis genes in TPP knockdown samples. This supports the plausible theory that a lack of trehalose means a lack of enough nutrients, therefore less of it is converted to amino acids, and therefore muscle development is compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle might also be possible.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Expression of genes, as understood from sequencing analysis in Figure 1D, Figure 2F, and Figure 3D, appears to be binary in nature. This result is extremely surprising given that the qRT-PCR of these genes have revealed a checker and graded expression.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      Finally, I would add that trehalose metabolism regulates cell cycle genes, and muscle development genes establish correlation and causation. The authors should ensure that any comments they make are backed by evidence.

    5. Author response:

      eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this currently is incomplete. The central claim - that trehalose specifically regulates an E2F/Dp-driven myogenic program - is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

      We thank the reviewer for the thoughtful and constructive evaluation of our work and for recognizing its potential relevance to researchers working on insect metabolism and development. We fully agree that our current evidence is preliminary and that the mechanistic link between trehalose and the E2F/Dp‑driven myogenic program needs to be strengthened.

      Our intention was to present trehalose-E2F/Dp coupling as a working model emerging from our data, rather than as a fully established pathway. We agree that systemic manipulations of trehalose and whole‑larval RNA‑seq cannot fully differentiate global metabolic stress from specific effects on myogenic programs. In the revision, we plan to include additional metabolic readouts (e.g., ATP/AMP ratio, key amino acids where available) to better discuss the overall energetic and nutritional state. We will reanalyze our RNA‑seq data to more clearly distinguish broad stress/metabolic signatures from cell‑cycle/myogenic signatures. Furthermore, we will reframe our discussion to explicitly state that we cannot completely rule out a contribution of general energy or amino‑acid scarcity at this stage.

      We acknowledge that, with our current experiments, the specificity for an E2F/Dp‑driven program is inferred mainly from enrichment of E2F targets among differentially expressed genes and from expression changes in canonical E2F partners and downstream cell‑cycle/myogenic regulators. To address this more rigorously, we are performing targeted qRT-PCR for a panel of well‑characterized E2F/Dp target genes and myogenic markers in larval muscle versus non‑muscle tissues, following trehalose perturbation. Where technically feasible, we will also test whether partial knockdown of HaE2F or HaDp modifies the effect of trehalose manipulation on selected myogenic markers. These data, even if limited, will help to provide a more direct functional link, and we will include them in the manuscript if completed in time. In parallel, we will soften statements that imply a fully established, trehalose‑specific regulation of E2F/Dp and instead present this as a strong candidate pathway suggested by the current data.

      We fully agree that tissue‑resolved analyses are essential to move from systemic correlations to causality in muscle. We are in the process of standardizing larval muscle dissections and isolating thoracic/abdominal body wall muscle for trehalose, glycogen, and expression assays. We will also compare expression of key metabolic and myogenic genes in muscle versus fat body and midgut under trehalose manipulation. These tissue‑resolved data will directly address whether the transcriptional changes we report are preferentially localized to muscle.

      We are grateful for the reviewer’s critical but encouraging comments. We will moderate our central claims and explicitly consider and discuss alternative explanations. Further, we will add tissue‑resolved and more focused mechanistic data as far as possible within the current revision. We believe these changes will substantially strengthen the manuscript and better align our conclusions with the evidence we presently have.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Mohite et al. used transcriptomic and metabolic profiling of H. armigera and S. frugiperda to link trehalose metabolism to muscle development. They further used several different bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism of metabolite-regulated transcriptional programming for muscle development. The authors have also done rescue experiments where trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, several concerns and gaps leave the current conclusions purely speculative. Although it is difficult to perform genetic experiments in non-model insects, the authors seem to suggest that a similar mechanism could also apply in systems like Drosophila, where it might be possible to perform experiments to fill in some of the missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find several genes, including enzymes of trehalose metabolism, that change. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that the NPP treatment leads to reduced muscle development, and so the proportion of the samples from the muscles in bulk RNA sequencing will be relatively lower, which might have led to the results. So, a confirmatory experiment has to be performed where the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further to overcome the off-target effects of NPP, trehalose rescue experiments could be useful.

      Thank you for this valuable comment. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. This will help determine whether the gene expression patterns observed in the RNA-seq data are muscle-specific or systemic.
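
      The composition effect raised by the reviewer can be made concrete with a small arithmetic sketch in Python. The tissue fractions and expression values below are hypothetical and serve only to show why bulk (whole-larva) measurements cannot by themselves separate reduced muscle mass from reduced per-cell expression; the function names are illustrative and not part of any analysis pipeline.

```python
import math


def bulk_expression(muscle_fraction, expr_in_muscle, expr_elsewhere=0.0):
    """Bulk signal as a weighted mix of muscle and non-muscle tissue."""
    return (muscle_fraction * expr_in_muscle
            + (1.0 - muscle_fraction) * expr_elsewhere)


def apparent_log2_fold_change(frac_control, frac_treated,
                              expr_in_muscle, expr_elsewhere=0.0):
    """Log2 fold change seen in bulk data when only tissue composition
    changes and per-cell expression stays the same."""
    control = bulk_expression(frac_control, expr_in_muscle, expr_elsewhere)
    treated = bulk_expression(frac_treated, expr_in_muscle, expr_elsewhere)
    return math.log2(treated / control)


# Hypothetical scenario: muscle makes up 40% of control samples but only
# 20% of treated samples, with identical per-cell expression in both.
muscle_specific = apparent_log2_fold_change(0.4, 0.2, expr_in_muscle=100.0)
ubiquitous = apparent_log2_fold_change(0.4, 0.2, expr_in_muscle=100.0,
                                       expr_elsewhere=100.0)
```

      In this hypothetical scenario, a muscle-specific gene appears halved in bulk data although its per-cell expression is unchanged, while a uniformly expressed gene shows no change. Measurements on dissected muscle remove this ambiguity.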

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. So it becomes difficult to judge whether the reduction in these energy molecules is a cause or an effect.

      We thank the reviewer for this thoughtful comment and agree that reduced levels of ADP, NAD, NADH, and NMN could arise either from a disturbance of energy metabolism or from loss of mitochondria‑rich muscles. Our current data cannot fully separate these two possibilities. Still, several studies support the interpretation that perturbing trehalose metabolism causes a primary systemic energy deficit that is coupled to mitochondrial function, not merely a passive consequence of tissue loss.

      For example:

      (1) Our previous study in H. armigera showed that chemical inhibition of trehalose synthesis results in depletion of trehalose, glucose, glucose‑6‑phosphate, and suppression of the TCA cycle, indicating reduced energy levels and dysregulated fatty‑acid oxidation (Tellis et al., 2023).

      (2) Chang et al. (2022) showed that trehalose catabolism and mitochondrial ATP production are mechanistically linked. HaTreh1 localizes to mitochondria and physically interacts with ATP synthase subunit α. 20‑hydroxyecdysone increases HaTreh1 expression, enhances its binding to ATP synthase, and elevates ATP content, while knockdown of HaTreh1 or HaATPs‑α reduces ATP levels.

      (3) Similarly, our previous study showed that inhibition of Treh activity in H. armigera generates an “energy‑deficient condition” characterized by deregulation of carbohydrate, protein, fatty‑acid, and mitochondria‑related pathways, and a concomitant reduction in key energy metabolites (Tellis et al., 2024).

      (4) The starvation study in H. armigera has shown that reduced hemolymph trehalose is associated with respiratory depression and large‑scale reprogramming of glycolysis and fatty‑acid metabolism (Jiang et al., 2019).

      These findings support a direct coupling between trehalose availability and systemic energy/redox state. Therefore, the coordinated decrease in ADP, NAD, NADH, and NMN following TPS/TPP silencing is consistent with a primary disturbance of systemic energy and mitochondrial metabolism rather than exclusively a secondary consequence of muscle loss. We agree, however, that the present whole‑larva metabolite measurements do not allow a quantitative partitioning between changes due to altered muscle mass and those due to intrinsic metabolic impairment at the cellular level. Thus, tissue-specific quantification of these metabolites would allow us to directly test whether altered energy metabolites are a cause or consequence of muscle loss.

      References:

      (1) Tellis, M. B., Mohite, S. D., Nair, V. S., Chaudhari, B. Y., Ahmed, S., Kotkar, H. M., & Joshi, R. S. (2024). Inhibition of Trehalose Synthesis in Lepidoptera Reduces Larval Fitness. Advanced Biology, 8(2), 2300404.

      (2) Chang, Y., Zhang, B., Du, M., Geng, Z., Wei, J., Guan, R., An, S. and Zhao, W., 2022. The vital hormone 20-hydroxyecdysone controls ATP production by upregulating the binding of trehalase 1 with ATP synthase subunit α in Helicoverpa armigera. Journal of Biological Chemistry, 298(2).

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Jiang, T., Ma, L., Liu, X.Y., Xiao, H.J. and Zhang, W.N., 2019. Effects of starvation on respiratory metabolism and energy metabolism in the cotton bollworm Helicoverpa armigera (Hübner)(Lepidoptera: Noctuidae). Journal of Insect Physiology, 119, p.103951.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      We thank the reviewer for this important comment and fully agree that EMSA alone, without appropriate competition and control reactions, cannot establish the specificity or functional relevance of a transcription factor-DNA interaction. In our study, we found the E2F family from GRN analysis of the RNA seq data obtained upon HaTPS/TPP silencing, suggesting a potential regulatory connection. After that, we predicted E2F binding sites on the promoter of HaTPS/TPP. The EMSA experiments were intended as preliminary evidence that E2F can associate with the HaTPS/TPP promoter in vitro. We will clarify this in the manuscript by softening our conclusion to indicate that our data support a “possible E2F-HaTPS/TPP interaction”. We also perform EMSA with specific and non‑specific competitors to confirm the E2F binding to the HaTPS/TPP promoter.

      The work seems to have connected trehalose metabolism with gene expression changes. Though this is an interesting idea, no experiments in the current version of the manuscript are conclusive. The authors could search for domains in the E2F family of transcription factors that can bind the metabolite; if there are none, ChIP-seq is essential to conclusively establish the role of E2F in regulating gene expression tuned by the metabolites.

      A previous study in D. melanogaster (Zappia et al., 2016) showed a vital role of E2F in skeletal muscle that is required for animal viability. The authors showed that Dp knockdown reduced the expression of genes encoding structural and contractile proteins, such as Myosin heavy chain (Mhc), fln, Tropomyosin 1 (Tm1), Tropomyosin 2 (Tm2), Myosin light chain 2 (Mlc2), sarcomere length short (sals) and Act88F, and of myogenic regulators, such as held out wings (how), Limpet (Lmpt), Myocyte enhancer factor 2 (Mef2) and spalt major (salm). In addition, ChIP-qRT-PCR showed that upstream regions of myogenic genes, such as how, fln, Lmpt, sals, Tm1 and Mef2, were specifically enriched with E2f1, E2f2, and Dp antibodies in comparison with a nonspecific antibody. Further, Zappia et al. (2019) reported a ChIP-seq dataset suggesting that E2F/Dp directly activates the expression of glycolytic and mitochondrial genes during muscle development. Zappia et al. (2023) showed that E2F regulates one of the glycolytic genes, Phosphoglycerate kinase (Pgk), during Drosophila development.

      However, the regulation of trehalose metabolic genes by E2F/Dp and vice versa was not studied previously. So here in our study, we tried to understand the correlation of trehalose metabolism and E2F/Dp in the muscle development of H. armigera.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Rogers, A., Islam, A.B. and Frolov, M.V., 2019. Rbf activates the myogenic transcriptional program to promote skeletal muscle differentiation. Cell reports, 26(3), pp.702-719.

      (3) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      Some of the above concerns are partially addressed in experiments where silencing of E2F/Dp shows similar phenotypes as with NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, and delayed pupation and lethality could not be definitely linked to trehalose-dependent regulation.

      Yes. It’s true that silencing of any key transcription factor can have several indirect effects. Our intention was not to argue that delayed pupation and lethality are exclusively due to trehalose-dependent regulation, but that E2F/Dp and HaTPS/TPP silencing showed a consistent set of phenotypes and molecular changes, such as (i) transcriptomic enrichment of E2F targets upon trehalose perturbation, (ii) reduced HaTPS/TPP expression following E2F/Dp silencing, (iii) reduced myogenic gene expression that parallels the phenotypes observed with HaTPS/TPP silencing and (iv) restoration of E2F and Dp expression in E2F/Dp‑silenced insects upon trehalose feeding in the rescue assay. Together, these findings support a functional association between E2F/Dp and trehalose homeostasis. At the same time, we fully acknowledge that these results do not exclude additional, trehalose‑independent roles of E2F/Dp in development.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      Yes, it is possible that some of the fed trehalose is metabolized in the gut. Although trehalase is present in the insect gut, some trehalose will be absorbed via trehalose transporters in the gut lining. The phenotype was not rescued in insects (empty vector and dsHaTPP) fed the control diet, which contains chickpea powder and therefore ample amino acids and carbohydrates. Insects were rescued only on the trehalose-containing diet, not on the control diet, which contains other carbohydrates. We agree that direct measurement of trehalose in target tissues will provide important confirmation. In the revised manuscript, we will measure trehalose levels in muscle, gut, and haemolymph after trehalose feeding.

      No experiments are performed with non-target control dsRNA. All the experiments are done with an empty vector. But an appropriate control should be a non-target control.

      Yes, there was no experiment with non-target dsRNA. We previously optimized a protocol for dsRNA delivery and verified its effectiveness in target knockdown (concentration and time-course experiments), and we have published several research articles using a similar protocol:

      (1) Chaudhari, B.Y., Nichit, V.J., Barvkar, V.T. and Joshi, R.S., 2025. Mechanistic insights in the role of trehalose transporter in metabolic homeostasis in response to dietary trehalose. G3: Genes, Genomes, Genetics, p. jkaf303.

      (2) Barbole, R.S., Sharma, S., Patil, Y., Giri, A.P. and Joshi, R.S., 2024. Chitinase inhibition induces transcriptional dysregulation altering ecdysteroid-mediated control of Spodoptera frugiperda development. Iscience, 27(3).

      (3) Patil, Y.P., Wagh, D.S., Barvkar, V.T., Gawari, S.K., Pisalwar, P.D., Ahmed, S. and Joshi, R.S., 2025. Altered Octopamine synthesis impairs tyrosine metabolism affecting Helicoverpa armigera vitality. Pesticide Biochemistry and Physiology, 208, p.106323.

      (4) Tellis, M.B., Chaudhari, B.Y., Deshpande, S.V., Nikam, S.V., Barvkar, V.T., Kotkar, H.M. and Joshi, R.S., 2023. Trehalose transporter-like gene diversity and dynamics enhances stress response and recovery in Helicoverpa armigera. Gene, 862, p.147259.

      (5) Joshi, K.S., Barvkar, V.T., Hadapad, A.B., Hire, R.S. and Joshi, R.S., 2025. LDH-dsRNA nanocarrier-mediated spray-induced silencing of juvenile hormone degradation pathway genes for targeted control of Helicoverpa armigera. International Journal of Biological Macromolecules, p.148673.

      The same vector backbone and preparation procedures were used for both control and experimental constructs, allowing us to specifically compare the effects of the target dsRNA. The phenotypes and gene expression changes we observed were specific to the target genes and were not seen in the empty vector controls, suggesting that the effects are not due to nonspecific responses of dsRNA delivery or vector components.

      We acknowledge your suggestions, and in future studies, we will keep non-target dsRNA as a control in silencing assays.

      Reviewer #2 (Public review):

      Summary:

      This study shows that the effects of TPS/TPP knockdown in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolism equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidopterans and may be muscle-specific, the study largely relies on one species and perturbation methods that are not muscle-specific. The technical limitations arising from investigations outside model systems, where solid methods are available, limit the specificity of inferences that may be drawn from the data.

      Thank you for pointing out this experimental weakness. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. We will also perform metabolite analysis on muscle samples. This will help to determine whether the observed gene expression patterns and metabolite changes are muscle-specific or systemic.

      Reviewer #3 (Public review):

      The hypothesis is that Trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of Trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors have identified transcription factor E2F, which was validated through RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counterintuitive to this hypothesis, the investigators perform EMSAs with the E2F protein and promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show a direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype observed as a result of Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp - they show that paramyosin regulates E2F/Dp and E2F/Dp regulates paramyosin. The only plausible way to explain the results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp. But the authors have mentioned nothing in this line. Additionally, I think trehalose metabolism impacts amino acid content in insects, and that will have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies have demonstrated enrichment of several amino acid biosynthetic genes. Yet the authors make no effort to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development and not considering the above points will be incomplete.

      We appreciate the reviewer’s careful evaluation of this manuscript and the constructive comments. Based on our data and earlier reports, we do not consider this a linear pathway, “trehalose → E2F → muscle,” but rather a regulatory module in which trehalose metabolism and E2F/Dp form an interdependent circuit controlling myogenic genes. E2F/Dp binds and activates trehalose metabolism genes (TPS/TPP, Treh1) and myogenic structural genes, consistent with the EMSA (TPS/TPP-E2F) and with predicted E2F binding sites on the metabolic genes Treh1 and Pgk and on myogenic genes such as Act88F, Prm, Tm1, Fln, etc. At the same time, perturbing trehalose synthesis reduces E2F/Dp expression and myogenic gene expression, and trehalose feeding partially restores all three. This bidirectional influence is similar to E2F‑dependent control of carbohydrate metabolism and systemic sugar homeostasis described in D. melanogaster, where E2F/Dp both regulates metabolic genes and is itself constrained by metabolic state (Zappia et al., 2023a; Zappia et al., 2021).

      The reciprocal regulation between Prm and E2F/Dp is indeed intriguing. Rather than a paradox, we interpret this as evidence that E2F/Dp couples metabolic genes and structural muscle genes within a shared module, and that key sarcomeric components (such as paramyosin) feed back on this transcriptional program. Similar cross‑talk between E2F‑controlled metabolic programs and tissue function has been documented in D. melanogaster muscle and fat body, where E2F loss in one tissue elicits systemic changes in the other (Zappia et al., 2021). For further confirmation of E2F-regulated Prm, we will perform EMSA on the Prm promoter with appropriate controls.

      We fully agree that amino‑acid metabolism is a critical missing piece. In the revised manuscript, we will quantify the amino acid levels and include the results: “Amino acid levels changed differentially: cysteine, leucine, histidine, valine, and proline showed significant reductions, while isoleucine and lysine showed non-significant reductions upon trehalose metabolism perturbation. These results are consistent with previous reports published by Tellis et al. (2024) and Shi et al. (2016)”. We will reframe our conclusions more cautiously as establishing a trehalose-E2F/Dp-muscle development axis, while stating that “definitive causal links via amino‑acid metabolism remain to be demonstrated”.

      Reference:

      (1) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. elife, 10, p.e67753.

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Shi, J.F., Xu, Q.Y., Sun, Q.K., Meng, Q.W., Mu, L.L., Guo, W.C. and Li, G.Q., 2016. Physiological roles of trehalose in Leptinotarsa larvae revealed by RNA interference of trehalose-6-phosphate synthase and trehalase genes. Insect Biochemistry and Molecular Biology, 77, pp.52-68.

      Author response image 1.

      The result section of the manuscript is quite concise, to my understanding (especially the initial few sections), which misses out on mentioning details that would help readers understand the paper better. While technical details of the methods should be in the Materials and Methods section, the overall experimental strategy for the experiments performed should be explained in adequate detail in the results section itself or in figure legends. I would request authors to include more details in the results section. As an extension of the comment above, many times, abbreviations have been used without introducing them. A thorough check of the manuscript is required regarding this.

      Thank you very much for pointing out this issue. We will revise the manuscript content according to these suggestions.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      We thank the reviewer for this helpful comment. We agree that, in this current version of the manuscript, the S. frugiperda experiments are not sufficiently systematic to support strong claims about conservation beyond H. armigera. Our primary focus in this study is indeed on H. armigera, and the addition of the S. frugiperda data was intended only as preliminary, supportive evidence rather than a central component of our conclusions. To avoid over‑interpretation and to keep the manuscript focused and coherent, we will remove all S. frugiperda data from the revised version, including the corresponding text and figures. We will also adjust the title, abstract, and conclusion to clearly state that our findings are limited to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Thank you for this thoughtful comment and the specific suggestion. We agree that directly targeting E2F would, in principle, be an informative complementary approach. In our study, however, we prioritized Dp knockdown for two main reasons. First, E2F is a large family, and E2F-Dp functions as an obligate heterodimer. Previous work in D. melanogaster has shown that depletion of Dp is sufficient to disrupt E2F-dependent transcription broadly, often with more efficient loss of complex activity than targeting individual E2F isoforms (Zappia et al., 2021; Zappia et al., 2016). Second, in our preliminary trials, we performed a dsRNA feeding assay with dsHaE2F, dsHaDp, and combined dsHaE2F plus dsHaDp. In that assay, dsRNA targeting HaE2F (dsHaE2F) alone did not silence HaE2F. Because E2F is a large family, other E2F isoforms may compensate for the silencing of the targeted HaE2F. However, HaE2F showed significantly reduced expression upon dsHaDp and combined dsHaE2F plus dsHaDp feeding (Figure A), whereas HaDp showed a significant reduction in its expression in all three conditions (Figure B). As we observed reduced expression of both HaE2F and HaDp upon combined feeding of dsHaE2F and dsHaDp, we further performed a rescue assay by exogenous feeding of trehalose. We observed significant upregulation of HaE2F, HaDp, trehalose metabolic genes (HaTPS/TPP and HaTreh1), and myogenic genes (HaPrm and HaTm2) (Figure C). For these reasons, we focused on Dp silencing as a more reliable way to impair E2F/Dp complex function in H. armigera.

      Author response image 2.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. elife, 10, p.e67753.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. DP is the cofactor of E2F, and they both heterodimerise and sit on the promoter of target genes to regulate them. I would request authors to revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, then further experiments should be conducted to come to a conclusion convincingly. Additionally, these results would need thorough discussion with citations of similar results observed for other transcription factor-cofactor complexes.

      Thank you for highlighting this point and for prompting us to examine these data more carefully. Reduced HaE2F mRNA upon HaDp silencing is indeed unexpected if one only considers the canonical view of E2F/Dp as a heterodimer that co-occupies target promoters without strongly regulating each other’s expression. However, several lines of work suggest that transcription factor-cofactor networks frequently include feedback loops in which cofactors influence the expression of their partner TFs. First, in multiple systems, transcription factors and their cofactors are known to regulate each other’s transcription, forming positive or negative feedback loops. For example, in hematopoietic cells, the transcription factor Foxp3 controls the expression of many of its own cofactors, and some of these cofactors in turn facilitate or stabilize Foxp3 expression, forming an interconnected regulatory network rather than a simple one‑way interaction (Rudra et al., 2012). Second, E2F/Dp complexes exhibit non‑canonical regulatory mechanisms and can regulate broad sets of targets, including other transcriptional regulators. Several studies show that E2F/Dp proteins not only control classical cell‑cycle genes but also participate in diverse processes such as DNA damage signaling, mitochondrial function, and differentiation (Guarner et al., 2017; Ambrus et al., 2013; Sánchez-Camargo et al., 2021). In D. melanogaster, complete loss of dDP alters the expression of direct targets of E2F/DP, including dATM (Guarner et al., 2017).

      All these reports indicate that the E2F-Dp complex sits at the top of multi‑layer regulatory hierarchies. Such architectures make it plausible that Dp silencing in H. armigera could modulate HaE2F expression in a non-canonical way.

      References:

      (1) Rudra, D., DeRoos, P., Chaudhry, A., Niec, R.E., Arvey, A., Samstein, R.M., Leslie, C., Shaffer, S.A., Goodlett, D.R. and Rudensky, A.Y., 2012. Transcription factor Foxp3 and its protein partners form a complex regulatory network. Nature immunology, 13(10), pp.1010-1019.

      (2) Guarner, A., Morris, R., Korenjak, M., Boukhali, M., Zappia, M.P., Van Rechem, C., Whetstine, J.R., Ramaswamy, S., Zou, L., Frolov, M.V. and Haas, W., 2017. E2F/DP prevents cell-cycle progression in endocycling fat body cells by suppressing dATM expression. Developmental cell, 43(6), pp.689-703.

      (3) Ambrus, A.M., Islam, A.B., Holmes, K.B., Moon, N.S., Lopez-Bigas, N., Benevolenskaya, E.V. and Frolov, M.V., 2013. Loss of dE2F compromises mitochondrial function. Developmental cell, 27(4), pp.438-451.

      (4) Sánchez-Camargo, V.A., Romero-Rodríguez, S. and Vázquez-Ramos, J.M., 2021. Non-canonical functions of the E2F/DP pathway with emphasis in plants. Phyton, 90(2), p.307.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking is clear statements about why particular dry-lab experiments were conducted.

      We again thank the reviewer for advising us to give a biological context and motivation for every bioinformatics analysis performed. The bioinformatics analyses devised here aim to characterize the systems-level perturbations caused by HaTPS/TPP silencing, in order to explain the observed phenotype and to discover transcription factors potentially modulating the HaTPS/TPP-induced gene regulatory changes.

      (1) Gene set enrichment analyses:

      Differential gene expression analyses of the bulk RNA sequencing data followed by qRT-PCR confirmed the transcriptional changes in myogenic genes and gene expression alterations in metabolic and cell cycle-related genes. These perturbations merely confirmed the effect of HaTPS/TPP silencing on the genes expected to respond. We wanted to see whether an “unbiased” system-level statistical analysis such as gene set enrichment analysis (GSEA) could reveal both expected and novel biological processes that underlie HaTPS/TPP silencing. GSEA revealed large-scale transcriptional changes in 11 enriched processes, including amino acid metabolism, energy metabolism, developmental regulatory processes, and motor protein activity. GSEA not only revealed the transcriptionally enriched pathways overall but also identified the genes undergoing synchronized pathway-level transcriptional change upon HaTPS/TPP silencing.

      (2) Gene regulatory network analysis:

      Although GSEA uncovered potential pathway-level changes, we were also interested in identifying the gene regulatory network associated with such large-scale process-level transcriptional perturbations. Interestingly, the perturbed biological processes were heterogeneous (e.g., motor protein activity, energy metabolism, amino acid metabolism, etc.). We hypothesized that inferring a causal gene regulatory network for the genes in the GSEA-enriched biological processes should predict core/master transcription factors that might synchronously regulate metabolic and non-metabolic processes related to HaTPS/TPP silencing, thereby providing a broad understanding of the perturbed phenotype. The gene regulatory network analysis statistically inferred an “active” gene regulatory network corresponding to the GSEA-enriched KEGG gene sets. Ranking the transcription factors (TFs) by the number of outgoing connections (outdegree centrality) within the active gene regulatory network identified E2F family TFs as the top-ranking, highly connected transcription factors associated with the transcriptionally enriched processes. This suggests that E2F family TFs are central to controlling the flow of regulatory information within this network. Intriguingly, E2F has been previously implicated in muscle development in insects (Zappia et al., 2016). Extracting the regulated targets of E2F family TFs within this network further revealed the mechanistic connection with the 11 enriched processes. This GRN analysis was crucial in discovering and prioritizing E2F TFs as central transcription factors mediating HaTPS/TPP silencing effects, which was not apparent from simpler analyses such as differential gene expression analysis.
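      The outdegree ranking described above can be sketched as follows. This is a minimal illustration that assumes the inferred network is available as a list of directed (regulator, target) edges; the function name `rank_by_outdegree` and the placeholder node names are hypothetical and do not represent the inferred network or the software used in this study.

```python
from collections import defaultdict


def rank_by_outdegree(edges):
    """Rank regulators by outdegree in a directed regulatory network.

    edges: iterable of (regulator, target) pairs.
    Returns (regulator, outdegree) tuples sorted by decreasing outdegree,
    with ties broken alphabetically by regulator name.
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        # A set ensures that duplicated edges are counted only once.
        targets[regulator].add(target)
    ranking = [(regulator, len(tgts)) for regulator, tgts in targets.items()]
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


# Hypothetical toy network with placeholder names, not results from the study.
toy_edges = [
    ("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_A", "gene3"), ("TF_A", "gene4"),
    ("TF_B", "gene1"), ("TF_B", "gene2"),
    ("TF_C", "TF_A"),
    ("TF_A", "gene1"),  # duplicate edge, ignored in the count
]
ranking = rank_by_outdegree(toy_edges)
```

      In this toy network, `TF_A` ranks first because it regulates the largest number of distinct targets, which is the sense in which a highly connected TF is treated as central.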

      As per the reviewer’s suggestions, we will add these outlined points in the text of the manuscript (Results section) to further give context and clarity to the bioinformatics analyses conducted in this study.

      In my judgement, the EMSA analysis presented is technically poor in quality. It lacks positive and negative controls, does not show mutation analysis or super shifts. Also, it lacks any competition assays that are important to prove the binding beyond doubt. I am not sure why protein is not detected at all in lower concentrations. Overall, the EMSA assays need to be redone; I find the current results to be unacceptable.

      Thank you for pointing out this issue. We will repeat the EMSA analysis with appropriate controls. Although the gel image was not clear, a faint protein band (indicated by the white square) was observed in well No. 8, where we used 8 μg of E2F protein and 75 ng of HaTPS/TPP promoter, when the gel was stained with SYPRO Ruby protein stain, suggesting weak HaTPS/TPP-E2F complex formation.

      GSEA studies clearly indicate enrichment of amino acid synthesis genes in TPP knockdown samples. This supports the plausible theory that a lack of Trehalose means a lack of enough nutrients, therefore less of that is converted to amino acids, and therefore muscle development is compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle might also be possible.

      We quantified amino acid levels as per the suggestion, and we observed differential levels of amino acids upon trehalose metabolism perturbation.

      However, we observed that insects were not rescued when fed a control chickpea-based artificial diet that contained the nutrients required for normal growth and development. Based on this observation, we conclude that trehalose deficiency is the only possible cause for the defect in muscle development.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Thank you for the comment. We will revise the color palette as per the suggestion.

      Expression of genes, as understood from sequencing analysis in Figure 1D, Figure 2F, and Figure 3D, appears to be binary in nature. This result is extremely surprising given that the qRT-PCR of these genes have revealed a checker and graded expression.

      Thank you for pointing out this issue. We will revise the scale range for these figures to get more insights about gene expression levels and include figures as per the suggestion.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      We will revise the analyses and include exact statistics as per the suggestion.

      Finally, I would add that trehalose metabolism regulates cell cycle genes, and muscle development genes establish correlation and causation. The authors should ensure that any comments they make are backed by evidence.

      We thank the reviewer for this insightful comment. Although direct evidence in insects is currently lacking, multiple independent studies in yeast, plants and mammalian systems support a regulatory link between trehalose metabolism and the cell cycle. In budding yeast Saccharomyces cerevisiae, neutral trehalase (Nth1) is directly phosphorylated and activated by the major cyclin‑dependent kinase Cdk1 at G1/S, routing stored trehalose into glycolysis to fuel DNA replication and mitosis (Ewald et al., 2016). CDK‑dependent regulation of trehalase activity has also been reported in plants, where CDC28‑mediated phosphorylation channels glucose into biosynthetic pathways necessary for cell proliferation (Lara-Núñez et al., 2025). Furthermore, budding yeast cells accumulate trehalose and glycogen upon entry into quiescence and subsequently mobilize these stores to generate a metabolic “finishing kick” that supports re‑entry into the cell cycle (Silljé et al., 1999; Shi et al., 2010). Exogenous trehalose that perturbs the trehalose cycle impairs glycolysis, reduces ATP, and delays cell cycle progression in S. cerevisiae, highlighting a dose‑ and context‑dependent control of growth versus arrest (Zhang, Zhang and Li, 2020). In mammalian systems, trehalose similarly modulates proliferation-differentiation decisions. In rat airway smooth muscle cells, low trehalose concentrations promote autophagy, whereas higher doses induce S/G2–M arrest, downregulate Cyclin A1/B1, and trigger apoptosis, indicating a shift from controlled growth to cell elimination at higher exposure (Xiao et al., 2021). In human iPSC‑derived neural stem/progenitor cells, low‑dose trehalose enhances neuronal differentiation and VEGF secretion, while higher doses are cytotoxic, again highlighting a tunable impact on cell‑fate outcomes (Roose et al., 2025). In wheat, exogenous trehalose under heat stress reduces growth, lowers auxin, gibberellin, abscisic acid and cytokinin levels, and represses CycD2 and CDC2 expression, suggesting that trehalose signalling integrates with hormone pathways and core cell‑cycle regulators to restrain proliferation during stress (Luo, Liu, and Li, 2021). Together, these studies show the importance of trehalose metabolism in cell‑cycle regulation, which determines whether cells and tissues proliferate, differentiate, or remain quiescent.

      With respect to muscle development, previous work has implicated glycolytic metabolism in myogenesis and muscle growth. Tixier et al. (2013) showed that loss of key glycolytic genes results in abnormally thin muscles, while Bawa et al. (2020) demonstrated that loss of TRIM32 decreases glycolytic flux and reduces muscle tissue size. These findings indicate that carbohydrate and energy metabolism pathways are important determinants of muscle structure and growth. However, there are no previous studies about the role of trehalose metabolism in muscle development, other than as an energy source, so here we specifically set out to establish the involvement of trehalose metabolism in muscle development.

      References:

      (1) Ewald, J.C. et al. (2016) “The yeast cyclin-dependent kinase routes carbon fluxes to fuel cell cycle progression,” Molecular cell, 62(4), pp. 532–545.

      (2) Lara-núñez, A. et al. (2025) “The Cyclin-Dependent Kinase activity modulates the central carbon metabolism in maize during germination,” (January), pp. 1–16.

      (3) Silljé, H.H.W. et al. (1999) “Function of trehalose and glycogen in cell cycle progression and cell viability in Saccharomyces cerevisiae,” Journal of bacteriology, 181(2), pp. 396–400.

      (4) Shi, L. et al. (2010) “Trehalose Is a Key Determinant of the Quiescent Metabolic State That Fuels Cell Cycle Progression upon Return to Growth,” 21, pp. 1982–1990.

      (5) Zhang, X., Zhang, Y. and Li, H. (2020) “Regulation of trehalose, a typical stress protectant, on central metabolisms, cell growth and division of Saccharomyces cerevisiae CEN.PK113-7D,” Food Microbiology, 89, p. 103459.

      (6) Xiao, B. et al. (2021) “Trehalose inhibits proliferation while activates apoptosis and autophagy in rat airway smooth muscle cells,” Acta Histochemica, 123(8), p. 151810.

      (7) Roose, S.K. et al. (2025) “Trehalose enhances neuronal differentiation with VEGF secretion in human iPSC-derived neural stem/progenitor cells,” Regenerative Therapy, 30, pp. 268–277.

      (8) Luo, Y., Liu, X. and Li, W. (2021) “Exogenously-supplied trehalose inhibits the growth of wheat seedlings under high temperature by affecting plant hormone levels and cell cycle processes,” Plant Signaling & Behavior, 16(6).

      (9) Tixier, V. et al. (2013) “Glycolysis supports embryonic muscle growth by promoting myoblast fusion,” Proceedings of the National Academy of Sciences, 110(47), pp. 18982–18987.

      (10) Bawa, S. et al. (2020) “Drosophila TRIM32 cooperates with glycolytic enzymes to promote cell growth,” eLife, 9, p. e52358.

      Finally, we appreciate the meticulous review of this manuscript and the constructive comments. We will perform the recommended experiments and data analysis and revise the manuscript accordingly.

    1. eLife Assessment

      This study offers important methodological advances for CRISPR-based mutagenesis in mice, highlighting the potential of founder animals for early phenotypic characterization. The authors present convincing evidence, supported by rigorous experimental design, that "crispant" (F0) analysis in mice, despite prior concerns about genetic mosaicism, can be utilized to assess protein function.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates the feasibility of using crispant founder mice, first-generation animals directly edited by CRISPR/Cas9, for initial phenotypic assessments. The authors target seven genes known to produce visible recessive traits to test whether mosaicism in founder animals prevents meaningful phenotype-genotype interpretation. Remarkably, they observe clear null phenotypes in founders for six of the seven genes, with high editing efficiencies. These results demonstrate that crispant mice can, under specific conditions, display recessive phenotypes that are readily interpretable. However, this conclusion should be moderated, as the study addresses only one biological context, visible Mendelian traits, and may not generalize to quantitative, subtle, or late-onset phenotypes. The report also examines attempts at multiplex CRISPR targeting, which reduce viability, underscoring limits for concurrent gene disruptions. Finally, the detailed description of diverse alleles generated by CRISPR provides valuable insight into how allelic series can be exploited to investigate protein function.

      Strengths:

      (1) The manuscript provides a comprehensive and technically rigorous description of CRISPR/Cas9‑induced mutations across several loci. The accompanying genotyping, sequencing, and analytical approaches are sound, complementary, and well-detailed, providing a resource that will be valuable to researchers using genome editing beyond the specific application of genetic screening.

      (2) By documenting a wide diversity of alleles and mutation types, the study contributes to understanding how allelic series generated by CRISPR can be leveraged for dissecting protein function, a perspective that has been less systematically presented in prior literature and could be compared to targeted strategies such as those described by Cassidy et al. (2022, DOI: [10.1016/bs.mie.2022.03.053]) or other relevant studies addressing CRISPR-based allelic series generation.

      (3) The work demonstrates technically solid editing and validation workflows, setting a methodological reference point for similar projects across species or trait categories.

      Weaknesses:

      (1) There is a disconnect between the abstract/introduction and the discussion. While both the abstract and introduction focus on the potential use of crispant founders for phenotypic assessment in the context of genetic screening, with the introduction notably emphasizing this framework through a detailed section on ENU-based screens, the discussion devotes relatively little attention to this aspect. Instead, it primarily examines CRISPR mutagenesis outcomes, mutation detection, and allele characterization. Overall, the study's aims are not clearly or explicitly defined, which contributes to the lack of alignment across sections.

      (2) Important limitations of the approach are not sufficiently discussed. For instance, the paper does not address how applicable the findings are beyond visible Mendelian traits, such as for quantitative, late-onset, or more subtle phenotypes, including behavioral ones, or how to interpret wild-type appearing founders. There is little consideration of appropriate experimental controls (e.g., wild-type or mock-edited animals) or of how many animals might be required to robustly establish genotype-phenotype associations.

      (3) The conclusion that this strategy will "dramatically reduce time, resources, and animal numbers" is not quantitatively supported by the data presented and should be expressed more cautiously.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to validate the use of genetic screening pipelines that assess phenotypes in founders (F0, referred to as "crispants") obtained from CRISPR/Cas9 gene editing in 1-cell zygotes. The application of this approach in mice has generally been avoided due to concerns that results would be confounded by genetic mosaicism, but benefits to this approach include reducing animal numbers needed to achieve goals of identifying knockout phenotypes, as well as improved efficiency in the use of time and resources. The authors targeted seven genes associated with visible recessive phenotypes and observed the expected null phenotype in up to 100% of founders for each gene. Although mosaicism was common in the crispants, the various alleles were generally all functional null alleles and, in fact, some in-frame deletions with null phenotypes revealed critical functional motifs within the gene products. The rigorous data presented support using crispants to assess knockout phenotypes when guide RNAs with strong on-target and low off-target scores are used for gene editing in 1-cell mouse embryos.

      Strengths:

      By targeting multiple genes with existing, well-characterized mutations, the authors established a robust system for validating the analysis of crispants to assess gene function.

      Cutting-edge technologies were used to carefully assess the spectrum of mutations generated.

      Weaknesses:

      There could have been some discussion regarding how this approach would be impacted if mutations are dominant or embryonic lethal (for the latter, for example, F0 can be examined as embryos).

    4. Reviewer #3 (Public review):

      Summary:

      The study assesses whether CRISPR-generated founder (F0) "crispant" mice can be reliably used for initial phenotypic assessment and screening. By targeting seven genes with known visible recessive phenotypes, the authors show that, despite genetic mosaicism, the expected null phenotypes were observed in all targeted genes. These findings demonstrate that the phenotyping and screening of F0 "crispant" mice is a valid (and efficient) approach to selecting candidate alleles for follow-up studies, thereby streamlining mouse breeding and animal husbandry-related costs.

      Strengths:

      The study is comprehensive, carefully executed, and provides deep insight into the utility of F0 "crispant" mice for phenotypic screening. The authors evaluated the CRISPR/Cas9 editing outcomes across seven genes using multiple sequencing modalities, providing a robust framework for determining and interpreting complex founder genotypes. Importantly, the study examines/highlights the biological insight gained from compound heterozygous founders and naturally arising allelic series, enabling genotype-phenotype associations and functional inferences about protein domains.

      More broadly, the authors' thorough evaluation of the CRISPR/Cas9-based gene editing events in the founders can serve as a benchmark for others in the field who are engineering their own mouse "crispants."

      Weaknesses:

      The relationship between the sgRNA/Cas9 concentrations delivered to the zygotes and the resulting editing efficiencies are not explicitly investigated.

    5. Author response:

      We would like to thank the reviewers for their detailed reading of our manuscript and for the constructive comments they have provided.

      We plan to make structural changes to the introduction and the discussion. Reviewer #1 describes the “disconnect between the abstract/introduction and the discussion”. We agree that “the study's aims are not clearly or explicitly defined”. We will edit the introduction to state our aim of investigating the factors that affect the use of “crispants” in mouse functional genomics. In the discussion, we described how our findings inform sgRNA choice to ensure biallelic gene disruption in founders and how our extensive genotyping methods enabled us to determine the molecular basis for the observed phenotype (explaining why some founders showed the expected recessive trait and why it was partial or absent in others). We also concluded from our attempts at multiplexing that it had too great an impact on viability to be useful. We will edit the discussion to better address our aim and to elaborate on several points raised by the reviewers (discussed in more detail below). Specifically, we will provide examples of screening situations where generating crispant mice may be useful, e.g. preliminary in vivo studies to follow up candidates identified in large-scale cellular screens. We will also provide more context about the assumptions underlying our statement that the use of crispants will “dramatically reduce time, resources, and animal numbers” compared to ENU mutagenesis (where recessive traits require breeding of G2 females with G1 males to achieve homozygosity of de novo mutations in G3 offspring) and the work needed to validate this. We will more clearly acknowledge that our proof-of-principle study used visible phenotypes that can be assessed in individual animals and then discuss how the use of crispants could be extended to the investigation of quantitative or late-onset traits using cohorts of crispants (discussed further below). We will also discuss the assessment of non-null alleles to dissect protein function, building on our unexpected finding that a single round of CRISPR/Cas9-mediated mutagenesis can generate an allelic series.

      Reviewer #1 asked us to address “how to interpret wild-type appearing founders”. We have discussed the mechanisms underlying the wild-type appearing founders generated in this study. This is linked with concerns in the field that incomplete editing, transcripts escaping nonsense-mediated decay, and/or the presence of in-frame mutations that don’t disrupt protein function may lead to founders that appear wild-type or have a partial phenotype. We have shown that our electroporation protocol results in very high levels of editing, but that this must always be assessed during genotyping. We found that by using an sgRNA that targets a critical protein domain, you can ensure that short in-frame indels also disrupt protein function. In future studies that determine how strain background modifies a phenotype that has been established on one strain (e.g. C57BL/6J), wild-type appearing founders would suggest that the new strain background rescues the null phenotype. In future studies that determine the consequence of targeting a second gene on a mutant background, wild-type appearing founders would indicate that the second mutation suppresses the phenotype associated with the mutant background. We will add this to the discussion section where we describe possible screening situations in which crispant mice would be useful.

      Reviewer #3 states that “the relationship between the sgRNA/Cas9 concentrations delivered to the zygotes and the resulting editing efficiencies are not explicitly investigated.” Members of The Centre for Phenogenomics (TCP) Transgenic Production Core who co-author this study (Lauryl Nutter, Marina Gertsenstein and Lauri Lintott) have published detailed protocols on mouse model production, which we cite in this paper (PMID: 30040228; PMID: 33524495; PMID: 39999224). In PMID: 33524495, they tested a two-fold difference in Cas9 RNP concentrations for generating knock-out alleles. Using their optimised protocols for electroporation of one-cell zygotes with RNPs, we achieved an extremely high editing rate. We did not vary the sgRNA/Cas9 concentrations as part of this study as our goal was to assess the ability to generate “complete” null animals. We do note, however, that by targeting two genes simultaneously whilst keeping the total RNP concentration constant (to avoid reagent toxicity), we halved the amount of each sgRNA and this did not lead to a decrease in editing efficiency. We will highlight this in the results/discussion section (as appropriate).

      Reviewer #1 asks whether the use of crispants is applicable for “quantitative, late-onset, or more subtle phenotypes, including behavioral ones”. We are hopeful that this is possible and it is a priority for future studies. Crucially, cohorts of crispants can be generated in a single round of mutagenesis. Starting an experiment with ten donor females will produce ~100 zygotes, resulting in ~40 crispants. Power calculations must be performed to determine the size of the cohort required for the effect size and variability of the phenotype being studied, but many neurobehavioural studies use ~10 mutants vs ~10 controls. We note that sex and/or background genotype may mean that only some of the ~40 crispants produced can be used for phenotypic testing. This reviewer also raises the question of whether wild-type animals or mock-edited animals serve as the best controls. From work carried out by Lauryl Nutter and her colleagues from the IMPC (PMID: 37301944), we know that “wild-type” controls should ideally be from the same embryo pool as the crispants to avoid differences due to genetic drift within inbred colonies. This study also found that possible off-target mutations from CRISPR/Cas9-mediated mutagenesis are not an issue (despite a lot of attention in the literature). The suggestion of using mock-edited controls, resulting from zygotes that have gone through electroporation without RNP, addresses a possible need to control for the stress of undergoing the electroporation process. Our study shows that additional stress is caused by inducing and repairing a break in a neutral locus (EGFP). Controlling for these stressors may be particularly important when assessing behavioural phenotypes in crispants vs controls.
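      To illustrate the kind of power calculation referred to above, the following minimal sketch uses the standard normal-approximation formula for comparing two group means, n ≈ 2(z_{1−α/2} + z_{power})² / d² animals per group, where d is the standardised effect size (difference in means divided by the standard deviation). The function name, the defaults of α = 0.05 and 80% power, and the example effect sizes are assumptions chosen for illustration and are not values from the study; an exact t-test-based calculation would give slightly larger cohorts.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group for a two-sided, two-group comparison.

    effect_size is the standardised difference (Cohen's d). This uses the
    normal approximation, so it slightly underestimates small cohorts.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)


# Hypothetical effect sizes, for illustration only.
for d in (0.5, 0.8, 1.0, 1.3):
    print(f"d = {d}: about {animals_per_group(d)} animals per group")
```

      Under these assumptions, a cohort of roughly 10 crispants and 10 controls is adequate only for large effects (about 1.3 standard deviations); subtler phenotypes would require more animals per group than a single round of ~40 crispants may provide.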

      Reviewer #2 states that “there could have been some discussion regarding how this approach would be impacted if mutations are dominant or embryonic lethal (for the latter, for example, F0 can be examined as embryos).” Our manuscript discusses how crispants could help with the study of genes that may be essential. Specifically, we stated that when CRISPR/Cas9-mediated mutagenesis fails to produce live pups, phenotypic assessment of crispant embryos could reveal whether targeting the gene impacts embryogenesis. Crispants can only be used to screen for recessive traits since both alleles are edited. The assessment of dominant traits is not addressed in our study and remains a challenge in the field. We note that CRISPRi screens in cultured cells reveal candidates that, when partially downregulated, lead to the desired phenotype. One possibility is to employ this setup in vivo using dCas9-KRAB transgenic mice (JAX stock #030000). We could add this point to the discussion section.

    1. eLife Assessment

      This study presents valuable data suggesting that ATP-induced modulation of alveolar macrophage (AM) functions is associated with NLRP3 inflammasome activation and enhanced phagocytic capacity. While the in vivo and in vitro data reveal an interesting phenotype, the evidence provided is incomplete and does not fully support the paper's conclusions. Additional investigations would be of value in complementing the data and strengthening the interpretation of the results. This study should be of interest to immunologists and the mucosal immunity community.

    2. Reviewer #1 (Public review):

      Summary:

      Alveolar macrophages (AMs) are key sentinel cells in the lungs, representing the first line of defense against infections. There is growing interest within the scientific community in the metabolic and epigenetic reprogramming of innate immune cells following an initial stress, which alters their response upon exposure to a heterologous challenge. In this study, the authors show that exposure to extracellular ATP can shape AM functions by activating the P2X7 receptor. This activation triggers the relocation of the potassium channel TWIK2 to the cell surface, placing macrophages in a heightened state of responsiveness. This leads to the activation of the NLRP3 inflammasome and, upon bacterial internalization, to the translocation of TWIK2 to the phagosomal membrane, enhancing bacterial killing through pH modulation. Through these findings, the authors propose a mechanism by which ATP acts as a danger signal to boost the antimicrobial capacity of AMs.

      Strengths:

      This is a fundamental study in a field of great interest to the scientific community. A growing body of evidence has highlighted the importance of metabolic and epigenetic reprogramming in innate immune cells, which can have long-term effects on their responses to various inflammatory contexts. Exploring the role of ATP in this process represents an important and timely question in basic research. The study combines both in vitro and in vivo investigations and proposes a mechanistic hypothesis to explain the observed phenotype.

      Weaknesses:

      The authors have revised the manuscript to address the comments raised during the first round of review. However, several figures, figure legends, and methodological sections still require additional adjustments and clarification.

      The interpretation of CFU from lysates as 'killing' is unclear; lysate CFUs typically reflect intracellular surviving bacteria and are confounded by differences in uptake. Please include an uptake control (early time point) or time-course to distinguish phagocytosis from intracellular killing. Also, clarify how bacterial burden was calculated (supernatant vs cell-associated vs total). Supernatant alone may not capture adherent bacteria. The normalization as 'fold killing' (mean negative control / sample) is non-standard; please report absolute CFU (log scale) and specify the exact definition of killing/survival.
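      As an illustration of the reporting requested above, the following minimal sketch expresses lysate CFU on a log10 scale and defines intracellular survival relative to an early uptake time point. The function names and all CFU values are hypothetical and are not taken from the manuscript; the point is only that identical late CFU counts can correspond to different survival once uptake is accounted for.

```python
import math


def log10_cfu(cfu: float) -> float:
    """Return log10 of a CFU count (the count must be positive)."""
    if cfu <= 0:
        raise ValueError("CFU must be positive to take a logarithm")
    return math.log10(cfu)


def percent_survival(cfu_uptake: float, cfu_late: float) -> float:
    """Intracellular survival as a percentage of bacteria recovered at the uptake time point."""
    if cfu_uptake <= 0:
        raise ValueError("uptake CFU must be positive")
    return 100.0 * cfu_late / cfu_uptake


# Hypothetical lysate CFU per well: (early uptake time point, later time point).
conditions = {
    "untrained": (2.0e5, 1.0e5),
    "ATP-trained": (4.0e5, 1.0e5),
}

for name, (uptake, late) in conditions.items():
    print(
        f"{name}: uptake {log10_cfu(uptake):.2f} log10 CFU, "
        f"late {log10_cfu(late):.2f} log10 CFU, "
        f"survival {percent_survival(uptake, late):.0f}%"
    )
```

      In this made-up example both conditions yield the same late CFU, yet the condition with higher uptake shows lower survival, which is why an uptake control is needed before lysate CFU can be read as killing.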

      The Methods section is largely incomplete and requires substantial revision. For instance, the authors report quantification of cytokine concentrations, yet no information is provided regarding how these measurements were performed. It is unclear whether cytokines were measured in BALF by ELISA, or assessed at the mRNA level by qPCR from total lung lysates, or by another method. This information must be clearly specified. In addition, the rationale for selecting the measured cytokines should be justified. While the choice of IL-1β and IL-6 is relatively straightforward, the focus on IL-18 requires explicit justification.

      Similarly, the methodology used to quantify immune cell populations presented in Figure 2 is not described. It is not stated how immune cells were isolated and identified (e.g. flow cytometry from lung tissue). No information is provided regarding tissue digestion, cell isolation procedures, or gating strategy (presumably by flow cytometry). These details are essential and should be included, together with the corresponding gating strategy and absolute cell numbers.

      Moreover, immune cell quantification would be expected in the context of the challenge experiment as well. Reporting unchanged percentages of lung immune cells following ATP exposure does not support the conclusion of a training effect, particularly one that is specific to alveolar macrophages (AMs). In addition, AMs are not considered recruited immune cells; this should be corrected in the figure legend and throughout the manuscript where applicable.

      There are inconsistencies throughout the manuscript. For example, the authors report n = 5 for the survival curves in the figure legend, whereas n = 7 is stated in the Methods section. This discrepancy is unclear and should be clarified.

      Supplementary Fig. 1 contains major conceptual errors. The volcano plot represents ATAC-seq peaks (differentially accessible chromatin regions), yet the figure, legend, and color scale repeatedly refer to 'genes' and 'differentially expressed genes'. This conflates chromatin accessibility with gene expression and is misleading. Peaks are secondarily annotated to nearby genes, which should be clearly described as an annotation step rather than the unit of analysis. The figure should be revised to explicitly present peak-level statistics (DARs), with gene names shown only as optional annotations. Additionally, the use of simultaneous P < 0.05 and Q < 0.05 thresholds is non-standard, and the absence of down-regulated regions in the plot requires explanation.
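      To illustrate the point about thresholds, the following minimal sketch assumes that Q denotes Benjamini-Hochberg adjusted p-values; the manuscript may use a different procedure, in which case this does not apply. The function name and the peak-level p-values are hypothetical. Under this assumption every Q is at least as large as its P, so a Q < 0.05 cutoff already implies P < 0.05 and the second filter adds nothing.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * (m / rank))
        qvalues[idx] = running_min
    return qvalues


# Hypothetical peak-level p-values for differentially accessible regions.
peak_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
peak_qvalues = bh_adjust(peak_pvalues)

pass_p = {i for i, p in enumerate(peak_pvalues) if p < 0.05}
pass_q = {i for i, q in enumerate(peak_qvalues) if q < 0.05}

print("peaks with P < 0.05:", len(pass_p))
print("peaks with Q < 0.05:", len(pass_q))
print("peaks passing both: ", len(pass_p & pass_q))
```

      With these example values, the set of peaks passing both thresholds is the same as the set passing Q < 0.05 alone.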

      In Figure 7, trained WT and Nlrp3⁻/⁻ mice display similar levels of bacterial clearance. How should this result be interpreted?

      Overall, while the study addresses an interesting biological question, the manuscript would benefit from substantial revision prior to publication. In particular, clarifications and improvements regarding the methodology, data presentation, and interpretation are required to strengthen the rigor and reproducibility of the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Thompson et al. investigate the impact of prior ATP exposure on later macrophage functions as a mechanism of immune training. They describe that ATP training enhances bactericidal functions, which they connect to the P2X7 ATP receptor, Nlrp3 inflammasome activation, and TWIK2 K+ movement at the cell surface and subsequently at phagosomes during bacterial engulfment. This is an incremental addition to existing literature, which has previously explored how ATP alters TWIK2 and K+, and linked it to Nlrp3 activation. The novelty here is in discovering the persistence of TWIK2 change and exploring the impact this biology may have on bacterial clearance. Additional experiments could strengthen their hypothesis that the in vivo protective effect of ATP-training on bacterial clearance is mediated by alveolar macrophages.

      Strengths:

      The authors demonstrate three novel findings: 1) prolonged persistence of TWIK2 at the macrophage plasma membrane following ATP that can translocate to the phagosome during particle engulfment, 2) a persistent impact of ATP exposure on remodeling chromatin around nlrp3, and 3) administering mice intra-nasal ATP to 'train' lungs protects mice from otherwise fatal bacterial infection.

      Weaknesses:

      (1) Some methods remain unclear including the timing and method by which lung cellularity was assessed in Figure 2. It is also difficult to understand how many mice were used in experiments 1, 2 and 6 and thus how rigorous the design was. A specific number is only provided for 1D and the number of mice stated in legend and methods do not match.

      (2) The study design is not entirely ideal for the authors' in vivo question. Overall, the discussion would benefit from a clear summary of study caveats, which are primarily that 1) in vitro studies attributing ATP training-mediated bacterial killing to persistent TWIK2 relocation, K+ influx, a glycolytic metabolic shift, and epigenetic nlrp3 reprogramming were performed in BMDM or RAW cells and not primary AMs, 2) the data do not eliminate the possibility that non-AM immune or non-immune cells in the lung are "trained" and responsible for ATP-mediated protection in vivo; flow data examined total lung digest, which may obscure important changes in alveolar recruitment, and 3) in vivo work shows data on acute bacterial clearance but does not explore potential risks that "training" for a more responsive inflammasome may have for the severity of lung injury during infection.

      (3) There is some lack of transparency regarding the data and the rigor of the methods. Clear data are not provided regarding the RNA-sequencing results. Specific identities of DEGs are not provided, only one high-level pathway enrichment figure. It would also be ideal if controls were included for subcellular fractionation to confirm pure fractions and for dye microscopy to show negative background.

      (4) In the results describing Figure 5A, the text states that "ATP-induced macrophage training effects, as measured by augmented bactericidal activity, were diminished in macrophages treated with protease inhibitors". However, these data are not identified as significant in the figure; protease dependence can be described as a trend that supports the authors' hypothesis but should not be stated as significant in the text.

      In summary, this work contains some useful data showing how ATP can train macrophages via TWIK2/Nlrp3. Revisions have significantly improved methods reporting, added some data to strengthen the conclusions, and toned down overstatements to bring conclusions more in line with the data presented. The title still overstates what the authors have actually tested, since no macrophage-specific targeting in vivo (no conditional gene deletion, macrophage depletion, etc.) was performed in infection studies. However, in vitro data provide clear evidence that macrophages can be trained by ATP, and though caveats remain, it is plausible that macrophage training is a key mechanism for the protection observed here in the lung.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) First, the concept of training or trained immunity refers to long-term epigenetic reprogramming in innate immune cells, resulting in a modified response upon exposure to a heterologous challenge. The investigations presented demonstrate phenotypic alterations in AMs seven days after ATP exposure; however, they do not assess whether persistent epigenetic remodeling occurs with lasting functional consequences. Therefore, a more cautious and semantically precise interpretation of the findings would be appropriate.

      In response, we have performed epigenetic analysis (ATAC-seq) as requested (Supp Fig. 1).

      (2) Furthermore, the in vivo data should be strengthened by additional analyses to support the authors' conclusions. The authors claim that susceptibility to Pseudomonas aeruginosa infection differs depending on the ATP-induced training effect. Statistical analyses should be provided for the survival curves, as well as additional weight curves or clinical assessments. Moreover, it would be appropriate to complement this clinical characterization with additional measurements, such as immune cell infiltration analysis (by flow cytometry), and quantification of pro-inflammatory cytokines in bronchoalveolar lavage fluid and/or lung homogenates.

      We have added the statistical analyses provided for the survival curves (new Fig. 1D), immune cell infiltration analysis, and quantification of pro-inflammatory cytokines in the lung (new Figs. 1, 2).

      (3) Moreover, the authors attribute the differences in resistance to P. aeruginosa infection to the ATP-induced training effect on AMs, based on a correlation between in vivo survival curves and differences in bacterial killing capacity measured in vitro. These are correlative findings that do not establish a causal role for AMs in the in vivo phenotype. ATP-mediated effects on other (i.e., non-AM) cell populations are omitted, and the possibility that other cells could be affected should be, at least, discussed. Adoptive transfer experiments using AMs would be a suitable approach to directly address this question.

      We have performed additional experiments and found that the numbers of lung macrophages were not significantly altered before and after ATP training (new Fig. 2), indicating the training effects are focused on lung resident macrophages.

      Reviewer #2 (Public review):

      (1) Missing details from methods/reported data: Substantial sections of key methods have not been disclosed (including anything about animal infection models, RNA-sequencing, and western blotting), and the statistical methods, as written, only address two-way comparisons, which would mean analysis was improperly performed. In addition, there is a general lack of transparency - the methods state that only representative data is included in the manuscript, and individual data points are not shown for assays.

      We have revised the methods and statistical analysis.

      (2) Poor experimental design including missing controls: Particularly problematic are the Seahorse assay data (requires normalization to cell numbers to interpret this bulk assay - differences in cell growth/loss between conditions would confound data interpretation) and bacterial killing assays (as written, this method would be heavily biased by bacterial initial binding/phagocytosis which would confound assessment of killing). Controls need to be included for subcellular fractionating to confirm pure fractions and for dye microscopy to show a negative background. Conclusions from these assays may be incorrect, and in some cases, the whole experiment may be uninterpretable.

      Seahorse assay methodology was updated to confirm the order of cell counting, time at seeding and cell counts. Methods were also updated to address the distinction between bacterial killing (Fig. 1B) and overall decrease in bacterial load.

      (3) The conclusions overstate what was tested in the experiments: Conceptually, there are multiple places where the authors draw conclusions or frame arguments in ways that do not match the experiments used. Particularly:

      (a) The authors discuss their findings in the context of importance for AM biology during respiratory infection but in vitro work uses cells that are well-established to be poor mimics of resident AMs (BMDM, RAW), particularly in terms of glycolytic metabolism.

      We have adjusted the text to reflect that the metabolic assay was performed on BMDMs. AMs are fragile for certain manipulations in vitro. We expect that the metabolic change is similar across several macrophage systems as well as the bacterial load reduction.

      (b) In vivo work does not address whether immune cell recruitment is triggered during training.

      We have performed immune cell infiltration analysis (new Fig. 2).

      (c) Figure 3 is used to draw conclusions about K+ in response to bacterial engulfment, but actually assesses fungal zymosan particles.

      We have corrected this in the manuscript.

      (d) Figure 5 is framed in bacterial susceptibility post-viral infection, but the model used is bacterial post-bacterial.

      We have corrected this in the manuscript.

      (e) In their discussion, the authors propose to have shown TWIK2-mediated inflammasome activation. They link these separately to ATP, but their studies do not test if loss of TWIK2 prevents inflammasome activation in response to ATP (Figure 4E does not use TWIK2 KO).

      We have now added the TWIK2 KO results (new Fig. 5E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, it would be advisable to further characterize the in vivo phenotype in order to strengthen the conclusions. Specifically, it would be useful to quantify the bacterial load in the bronchoalveolar lavage fluid and lung homogenates, as well as to measure cytokine levels both in the respiratory compartment and systemically. Additionally, a broader characterization of the immune response in the presence or absence of ATP-induced training would be valuable. In the absence of direct evidence demonstrating that trained AMs mediate the observed phenotype, the authors should adopt a more cautious interpretation of their results. Moreover, careful attention to semantic accuracy is recommended. The concept of trained immunity refers specifically to long-term epigenetic reprogramming that leads to an altered response of target cells upon a secondary challenge, distant from the initial stress. The data presented do not fully demonstrate this phenomenon, and the interpretations should remain aligned with the evidence provided.

      Bacterial load has been quantified (see the Methods for details). We also measured immune cell infiltration and pro-inflammatory cytokines in the lung (new Figs. 1, 2), and performed an epigenetic evaluation of vehicle- and ATP-treated cells (Supp. Fig. 1).

      Reviewer #2 (Recommendations for the authors):

      (1) It cannot be overstated how lacking the methods are. This includes no discussion of IACUC approval for animal procedures, which must be included as part of research ethics. It also needs to be made clear where raw data is being archived. This notably includes an accession for deposited RNA-sequencing data, although unmanipulated microscopy and western blot images should also be shown. Methods should discuss any pre-processing that occurred with images.

      We have revised the methods in the manuscript.

      (2) Per statistics, in addition to generally providing more detail and adjusting analyses if they have not been correctly performed, please disclose if SD or SEM is shown. Reporting aggregate data versus representative data would provide more rigor. Perhaps replicate experiments could be included in the supplemental if they cannot, for some reason, be aggregated. Detailed statistical methods for RNA-seq analysis also need to be included.

      More details have been provided in the methods section.

      (3) It is unclear whether bacterial killing assays were correctly designed and can be interpreted. What does cells collected mean? If the assay was focused on intracellular macrophage bacterial load, it is critical to assess and report phagocytosis since different input loads would confound the assessment of killing. A rigorous wash or an antibiotic to eliminate extracellular bacteria should also have been performed and be described in this case. If the total bacterial burden was assessed, that would use cells+media and also needs to be clear and described. With the information provided, it is unclear whether the assays performed are sufficiently rigorous to assess bacterial killing. In addition, Figure 1B reports using an MOI of 50-100, but all data is compiled in one graph - data from different levels of infection should be separated. Figure 5A shows a model with E.coli followed by PA, but that does not appear to be how the assay was structured in B or C. This also does not match how the experiment is written in the results section, which references S. aureus. It is unclear what tissue (or cells) were assessed in Figure 5. Whole lung? BAL? As written, no data provided regarding bacterial killing is of sufficient quality to be considered valid.

      We have re-written the bacterial killing assay in the manuscript. The methodology was corrected to distinguish bacterial killing from a decrease in bacterial load and to describe the procedure more accurately.

      (4) The in vitro data provide reasonable evidence that BMDM/RAW macrophage training can occur in response to ATP exposure. However, it is unclear whether training is an important mechanism for resident AM in vivo, or whether, in vivo, a broader inflammatory response is generated, recruiting additional immune cells that persist and change infection susceptibility. The authors argue for resident AM immune training, but do not provide sufficient evidence to counter the latter possibility (resident AM are never themselves directly assessed, and the presence of other immune cells in vivo is not excluded). See Iliakis et al 2023 (PMID 37640788) for discussion of how this issue continues to drive uncertainty in the field. For this study, at least providing flow cytometry data quantifying myeloid and lymphoid immune populations in BALF before and after various treatments would help address this caveat. Without knowing this, it also confounds the interpretation of Figure 1B; if BAL is not pure AM after training, perhaps 1B could be repeated with ex vivo training or resident AM could be purified?

      We have performed immune cell infiltration analysis in the lung (both to BALF and in-tissue, new Fig. 2).

      (5) Figure 3A appears to show that fewer than 50% of cells express GFP. Is it expected that only a fraction of RAW cells express TWIK2-GFP? How was this addressed in the analyses for Figure 3? Were cells not appearing to express any significant GFP, included in phagosomal-negative or excluded from analysis? Please include in the methods.

      The RAW cells were transfected with TWIK2-GFP and variable GFP expression was expected. These cells were expressing a non-integrated transgene, which has been added to the methods as well as the consideration of cells for the analysis. Cells without visible GFP expression were excluded.

      (6) Why are many data points in Figure 3D negative? This suggests that settings were not optimized for microscopy - perhaps there is a very high background signal and the ION stain is barely above it. This is concerning for the quality of data. Further, is it expected that only some cells are positive for ION K+? The images shown clearly differentiate phagosomal K with ATP versus the absence of K without, but it is surprising that some cells appear not to contain any ION K+ signal (not completely clear given lack of brightfield or other cell staining) - this may again point to issues with imaging settings that confound data interpretation. This analysis should be carefully assessed.

      This has been updated in the methodology. In old Fig. 3D (new Fig. 4D), the presented data are the net intensity of the phagosome, obtained by subtracting the average cytoplasmic MFI from that of the area corresponding to an engulfed zymosan-af594 bead. Thus, a negative value indicates that the cytoplasmic IonK signal is higher than that of the phagosome.
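
      The subtraction described in this response can be sketched as follows. This is a minimal illustration only: the function name and the pixel intensities are hypothetical, and the authors' actual image-analysis pipeline is not described here.

```python
from statistics import mean


def net_phagosome_intensity(phagosome_pixels, cytoplasm_pixels):
    """Mean phagosomal intensity minus mean cytoplasmic intensity.

    A negative result means the cytoplasmic signal is higher than the
    signal in the phagosome region.
    """
    return mean(phagosome_pixels) - mean(cytoplasm_pixels)


# Hypothetical pixel intensities (arbitrary units), for illustration only.
enriched = net_phagosome_intensity([120, 130, 125], [80, 90, 85])
depleted = net_phagosome_intensity([60, 70, 65], [80, 90, 85])
print(enriched, depleted)
```

      Under this definition, negative data points are expected whenever a phagosome is dimmer than the surrounding cytoplasm, independent of the imaging background.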

      (7) The Discussion states that it will be interesting to test whether ATP-TWIK2 is a common mechanism of training and specifically references LPS as an ATP-generating signal. However, Figure 2D data show that LPS induces only transient TWIK2 translocation; the authors have data suggesting that, in the context of LPS, TWIK2 'training' will not be engaged. This line of discussion shows incomplete consideration of the data.

      We have further limited this language in the text such that this may require differential sensitivity/damage sustained by macrophages as compared to that of epi/endothelial cells in response to bacterial endotoxin.

      (8) For RNA-sequencing, plots of the actual genes changed for the mitochondrial pathways of interest would be helpful information for readers, as would a heat map showing sample purity between groups for macrophage markers versus possible contaminant cells, which can also be generated from precursors in BMDM cultures. In general, information in Methods regarding how the analyses in Figure 4B were run is necessary, per cutoffs used to determine DEGs, number of samples in each group, sex of samples used, etc. Greater transparency of data would be appreciated, so plots that show variation between replicates, such as heat maps, would be ideal. Supplemental tables would also be nice.

      We have added to the methodology of the RNA sequencing analysis.

      (9) The use of alternate DAMPs is a positive addition to the experimental design, but no data is given regarding the concentrations used. Ideally, positive controls showing histones/NAD are used at acutely activating concentrations could be included but at least references supporting the doses chosen or information about how doses were selected should be given. It is easy to find substantial literature on histones as a DAMP, but it was unclear why/how NAD was selected.

      We have added these concentrations and corresponding references.

      (10) The E.coli CFU reported in Figure 5B are extraordinarily low. In addition, CFU are generally shown on a log scale, but this appears to be linear. Please confirm that these data are correct. Perhaps improved methods might explain why? Is the second hit a low dose?

      These have been corrected in the new Fig. 6B.

      (11) Given that loss of either TWIK2 or Nlrp3 ablates bacterial protection, a link should be tested - experiments should test whether loss of TWIK2 prevents inflammasome activation in response to ATP (TWIK2 KO in 4E) and if loss of Nlrp3 changes TWIK2 translocation (Nlrp3 KO in at least some experiments of Figures 2/3).

      We have now added the TWIK2 KO results (new Fig. 5E).

      (12) One of the most striking data pieces is Figure 1D. It would, therefore, strengthen the paper to repeat those experiments (even just with the high-dose ATP) using TWIK2/P2X7/NLRP3 KO mice and really show the importance of these pathways in vivo. This is conceptually Figure 5, but the survival data of Figure 1 is far more convincing than the relatively weak bacterial load data of Figure 5.

      Unfortunately, our previous laboratory has been closed and we have trouble acquiring enough mice for additional survival data during the transition period. However, the bacterial load data has been adjusted to the same bacterial counts per 5 mg lung tissue instead of per individual sampling, giving a more contextual interpretation of the data.
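
      One plausible reading of the normalization mentioned above is a simple rescaling of each sample's colony count to a common tissue mass. The sketch below assumes that reading; the function name, parameter names, and example values are hypothetical and are not taken from the manuscript.

```python
def cfu_per_reference_mass(cfu_count, tissue_mass_mg, reference_mass_mg=5.0):
    """Rescale a colony count to a fixed reference mass of tissue.

    Assumption: each sample has a recorded colony count and a recorded
    tissue mass in milligrams.
    """
    if tissue_mass_mg <= 0:
        raise ValueError("tissue mass must be positive")
    return cfu_count / tissue_mass_mg * reference_mass_mg


# Hypothetical samples: 200 CFU from 10 mg and 150 CFU from 5 mg of tissue.
print(cfu_per_reference_mass(200, 10))
print(cfu_per_reference_mass(150, 5))
```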

    1. eLife Assessment

      This fundamental study provides the first genome-wide characterization of H3K115 acetylation and identifies a striking and previously unappreciated association of this globular-domain histone modification with fragile nucleosomes at CpG island promoters, active enhancers, and CTCF binding sites. While the work is largely descriptive and correlative in nature, the evidence is compelling. The authors present multiple, orthogonal genomic and biochemical analyses that consistently support their central conclusions.

    2. Reviewer #1 (Public review):

      Summary

      The authors set out to define the genomic distribution and potential functional associations of acetylation of histone H3 lysine 115 (H3K115ac), a poorly characterized modification located in the globular domain of histone H3. Using native ChIP-seq and complementary genomic approaches in mouse embryonic stem cells and during differentiation to neural progenitor cells, they report that H3K115ac is enriched at CpG island promoters, active enhancers, and CTCF binding sites, where it preferentially localizes to regions containing fragile or subnucleosomal particles. These observations suggest that H3K115ac marks destabilized nucleosomes at key regulatory elements and may serve as an informative indicator of chromatin accessibility and regulatory activity.

      Strengths

      A major strength of this study is its focus on a histone post-translational modification in the globular domain, an area that has received far less attention than histone tail modifications despite strong evidence from structural and in vitro studies that such marks can directly influence nucleosome stability. The authors employ a wide range of complementary genomic analyses, including paired-end ChIP-seq, fragment size-resolved analyses, contour (V-) plots, and sucrose gradient fractionation, to consistently support the association of H3K115ac with fragile nucleosomes across promoters, enhancers, and architectural elements. The revised manuscript is careful in its interpretation and provides a coherent and internally consistent picture of how H3K115ac differs from other acetylation marks such as H3K27ac and H3K122ac. The datasets generated will be valuable to the chromatin community as a resource for further exploration of nucleosome dynamics at regulatory elements.

      Weaknesses

      The conclusions are largely correlative. While the authors provide strong evidence for the localization of H3K115ac to fragile nucleosomes, the work does not directly test whether this modification causally contributes to nucleosome destabilization or regulatory function in vivo. Questions regarding the enzymes responsible for depositing or removing H3K115ac and its direct functional consequences therefore remain open.

      Overall assessment and impact

      Overall, the authors largely achieve their stated aims by providing a detailed and well-supported characterization of H3K115ac distribution in mammalian chromatin and its association with fragile nucleosomes at regulatory elements. While mechanistic insight remains to be established, the study introduces a compelling new perspective on globular-domain histone acetylation and highlights H3K115ac as a potentially useful marker for identifying functionally important regulatory regions. The work is likely to stimulate further mechanistic studies and will be of broad interest to researchers studying chromatin structure, transcriptional regulation, and genome organization.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Comments on revisions:

      The authors sufficiently addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115 epigenetic mark located on the lateral surface of the histone core domain and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, on active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and so may serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a histone residue in a core (versus on a histone tail) domain may serve a functional role in promoting transcription in CpG islands and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data using appropriate methodology that supports the idea that the H3K115ac mark may function to destabilize nucleosomes and contribute to regulating ESC differentiation. These findings are quite novel.

      Weaknesses:

      Additional experiments to confirm specificity of the antibodies used have been done, improving confidence in the study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.
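
      For readers unfamiliar with how replicate concordance of this kind is typically summarized, a Pearson correlation between two replicates can be computed from binned signal values as sketched below. The binning, the function name, and the example values are assumptions for illustration; the authors' actual procedure is described in their methods and Supplementary Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical read counts per genomic bin for two biological replicates.
replicate_1 = [12, 40, 7, 95, 33, 60]
replicate_2 = [15, 38, 9, 90, 30, 66]
print(round(pearson_r(replicate_1, replicate_2), 3))
```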

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      Histone modifications are difficult to alter genetically because of the high copy number of histone genes and inhibition of HATs/HDACs in general leads to alterations in other histone modifications. It is an inherent challenge in establishing causality of histone modifications, especially histone acetylation marks.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      We have modified the text in response to this point. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

      We agree that the paper does not provide mechanistic details or solid causality of H3K115ac. We have only emphasized the potential role of H3K115ac in nucleosome fragility based on our in vivo data and previously published in vitro experiments (Manohar et al., 2009; Chatterjee et al., 2015). We do provide evidence that H3K115ac is enriched on subnucleosomal particles via sucrose gradient sedimentation of MNase-digested chromatin (Figure 3C-D).

      Reviewer #2 (Public review):

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region. It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      ChIP-qPCR in S1B includes competition from native chromatin and shows high specificity to its target. We have provided antibody validation in three ways:

      - Western blot with dot-blot of synthetic peptides (Figure S1A).

      - Western blots with whole-cell extracts (Figure 4D).

      - ChIP-qPCR on native chromatin spiked with a cocktail of synthetic mono-nucleosomes, each carrying a single acetylation and a specific barcode (SNAP-ChIP K-AcylStat Panel).

      We could not include H3K115ac-marked nucleosomes as they are not available in the panel. Figure S1B shows that the H3K115ac antibody exhibits negligible binding to known K-acyl marks, comparable to an unmodified nucleosome. Because of the absence of an H3K115ac-modified barcoded nucleosome, we used the KLF4 promoter from mESCs as a positive control. In agreement with the ChIP-seq signal shown in the genome browser profile (Figure 1E), the KLF4 promoter shows a significantly higher signal than the gene body.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase-sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac modified nucleosomes are indeed more fragile are missing.

      We have performed ChIP-seq on MNase digested mESC chromatin fractionated on sucrose gradients and this shows that H3K115ac is enriched in fractions containing sub-nucleosomal and fragile nucleosomes but depleted in fractions containing stable nucleosomes (Figure 3D).

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      H3K64ac and H3K122ac datasets were generated by us in a previous publication (Pradeepa et al., 2016) using the same native MNase ChIP protocol as used here. The ChIP-seq datasets for H3K122ac and H3K27ac are processed in an identical manner, with the same computational pipelines, to the H3K115ac data sets generated in this paper.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      We agree with the reviewer’s comment, but we have not claimed causality.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      Due to broad target specificity, redundancies and crosstalk among different classes of HATs and HDACs, it is not tractable to answer this question in the current manuscript.

      Reviewer #3 (Public reviews):

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I have two primary concerns that resound through the entire paper:

      (a) Overall, the manuscript is making strong claims based on entirely correlative datasets. No quantitative analyses are performed to demonstrate co-occupancy/localization. Please see more detailed descriptions below.

      Our responses to specific points are provided against each comment below.

      (b) Lack of paired-end replicates for H3K115ac ChIP-seq. While the reviewer token for the deposited data was not made accessible to me, looking at Supplementary Table 1, it appears there are two H3K115ac ChIP-seq datasets. One is paired-end and one is single-read. So are peaks called with only one replicate of PE? Or are inaccurate peaks called with SR datasets? Either way, this is not a rigorous way to evaluate H3K115ac localization.

      We are sorry that this reviewer was not able to access the data – the token for the GEO accession was provided for reviewers at the journal’s request. All ChIP-seq (and ATAC-seq) experiments (paired and single-end) were performed with two biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. This was indicated in both the main text and in the methods. In the revised manuscript we have tried to make this even clearer and have put the relevant Pearson’s coefficient (r) into the text at the appropriate places. For the reviewer’s information, here is the complete list of data samples in the GEO Accession:

      Author response image 1.

      While I agree that H3K115ac occupancy is high at +CGIs, the authors downplay that H3K122ac and H3K27ac is also more highly enriched at these locations (page 7, last sentence of first paragraph). I imagine this is all due to the more highly transcribed nature of these genes. Sub-stratifying the K27ac and K122ac by transcription (as in Figure 1G) would help to demonstrate a unique nature of H3K115ac. But even better would be to do an analysis that plots H3K115ac enrichment vs transcription for every individual gene rather than aggregate analyses that are biased by single locations. For example, make an XY scatterplot of RNAPII occupancy or 4SU-seq signal vs H3K115ac level, where each point represents a single gene. Because the interpretation that it is CGI-based and not transcription is confounded with the fact that -CGI are more lowly transcribed. So, looking at Figure 1G, even the -CGI occupancy of H3K115ac is correlated with transcription, but it is just more lowly transcribed.

      We thank the reviewer for these suggestions but point out that Figure 1G shows H3K115ac signal for CGI+ and CGI– TSS that are matched for expression levels (quartiles of 4SU-seq). Fig 1F shows that H3K115ac is much more of a discriminator between CGI+ and CGI– than H3K27ac or H3K122ac.

      (2) H3K115ac, H3K27ac, and H3K122ac are all more enriched (in aggregate) at +CGI locations (Fig 1F); so do these locations just have more positioned nucleosomes? More H3.3? So that these PTMs are just more enriched due to the opportunity?

      Positioned nucleosomes are generally found downstream of the TSS of active CpG island promoters, so what the reviewer suggests may well account for the relative enrichment of H3K27ac and H3K122ac at CGI+ vs CGI- promoters in Fig. 1F. But H3K115ac localisation is distinct, with the peak at the nucleosome-depleted region, not the +1 nucleosome. This is also confirmed by the contour plots in Fig 3. Our observation is also not explained by an enrichment of H3.3 at CGI promoters, since we show that H3K115ac is not specific to H3.3 (Fig 4D).

      (3) The authors note in paragraph 2 of page 7 that "H3K115ac does not scale linearly with gene expression..." but the authors never show a quantification of this; stratification in four clusters is not able to make a linear correlation. Furthermore, in the second line of page 7, the authors state that the levels do generally correlate with transcription. To claim it is a specific CGI link and not transcription is tricky, but I encourage the authors to consider more quantifiable ways, rather than correlations, to demonstrate this point, if it is observed.

      We thank the reviewer for this comment, and taking it into consideration, we have decided to re-phrase this paragraph. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) The authors claim on page 7 that "on average, transcription increased from TSS that also gained H3K115ac but to a modest extent, compared with the more substantial loss of H3K115ac from downregulated TSS". However, both upregulated and downregulated are significant; the difference in magnitude could simply be due to more highly or more lowly transcribed locations, meaning that fold change could be more robustly detected. I caution the authors to substantiate claims like this rather than stating a correlation.

      We thank the reviewer for this comment, which relates to the data in Fig 2A. Fig. 2B shows that the association of H3K115ac loss with downregulation is statistically stronger than H3K115ac gain with upregulation, but only for CGI promoters. With regard to the text on the original pg 7 that is referred to, we have now reworded this to read “Average levels of transcription increased from TSS that also gained H3K115ac, and there was loss of H3K115ac from downregulated TSS (Figure 2A).”

      (5) For Figure 2C, the authors argue that H3K115ac correlate with bivalent locations. So this is all qualitative and aggregate localization; please quantitatively demonstrate this claim.

      Figure S2D provides statistics for this (observed/expected and Fisher’s exact test).
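
      As an illustration of the two statistics named in this response, the sketch below computes an observed/expected ratio and a one-sided Fisher's exact p-value for enrichment from a 2x2 overlap table. The table layout, the function names, the example counts, and the choice of a one-sided test are assumptions; the response does not state how the test in Figure S2D was configured.

```python
from math import comb


def observed_over_expected(a, b, c, d):
    """Observed overlap divided by the overlap expected under independence.

    Table layout (hypothetical): a = marked and bivalent, b = marked only,
    c = bivalent only, d = neither.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value: P(overlap >= a) with fixed margins."""
    n = a + b + c + d
    row1 = a + b
    col1 = a + c
    k_max = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1))
    return tail / total


# Small hypothetical table, for illustration only.
print(observed_over_expected(3, 1, 1, 3))
print(fisher_enrichment_p(3, 1, 1, 3))
```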

      (6) The authors claim in Figure 2 that H3K115ac is dynamic during differentiation (title of Figure 2). However, there are locations that gain and lose, or maintain H3K115ac. In fact, the most discussed locations are H3K115ac with no change (2C); which means it is NOT dynamic during differentiation. So what is the message for the role during differentiation? From Supplemental Table 1, it appears there is a single ChIP experiment for H3K115ac in NPC, and it is a single read. So this is also a difficult claim with one replicate. Related to this, in S2A, the authors show K115ac where there is no change in transcription; so what is the role of H3K115ac at TSSs relevant to differentiation - it is at both locations changed and unchanged in transcription, but H3K115ac levels itself do not change at these subsets. So, how is this dynamic? This is very confusing, and clearer analyses and descriptions are necessary to deconvolute these data.

      We apologise for the misleading title for Figure 2. This has now been amended to “Changes in H3K115ac during differentiation”. The message of this figure is that whilst changes in H3K115ac at TSS are small (panels A-C), at enhancers the changes are much more dramatic (panel D). The reviewer is incorrect about the number of replicates for NPCs – there are two biological replicates (see response to point 1b).

      (7) The authors go on to examine H3K115ac enrichment on fragile nucleosomes through sucrose gradient sedimentation. A control for H3K27ac or H3K122ac would be nice for comparison.

      We do not have the material available to perform these experiments.

      (8) When discussing Figures 3 and SF3, the authors mention performing a different MNase for a second ChIP. Showing the MNase distribution for both the more highly digested and the lowly digested would be nice. a) Related to the above, the authors show input in SF3E to argue that the difference in H3K115ac vs H3K27ac is not due to the library, but they do not show the MNase digestion patterns, which is more important for this argument.

      Input libraries (first two graphs of Fig. S3E) are the MNase-digested chromatin. Comparing nucleotide frequencies from millions of reads is a more robust method than comparing fragment length patterns.

      (9) The authors move on to examine H3K115ac at enhancers. Just out of curiosity, given what was found at promoters, is H3K115ac enriched at +CGI enhancers? And what is the correlation with enhancer transcription?

      This is an interesting point, but the number of enhancers associated with CGI is not very high and so we did not focus on this. We have not analysed a correlation with eRNAs in this paper.

      (10) The authors state on page 14 that the most frequent changes in H3K115ac during differentiation are at these enhancers. So do these changes connect with differentiation-specific genes, and/or genes that have altered transcription during differentiation? Just trying to understand the functional role.

      Given the challenges of connecting enhancers with target genes, we have not addressed this question quantitatively. However, we draw the reviewer’s attention to the Genome Browser shots in Figures 2D and S2C, which show clear gain of H3K115ac (and ATAC-seq peaks) at intra and intergenic regions close to genes whose transcription is activated during the differentiation to NPCs.

      (11) Related, at the end of page 14, the authors state that the changes in H3K115ac correlate with changes in ATAC-seq; I imagine this dynamic is not unique for H3K115ac and this is observed for other PTMs (H3K27ac), so assessing and clarifying this, to again get to the specific interest of H3K115ac, would be ideal.

      We have not claimed that chromatin accessibility is unique to H3K115ac. It is the location of H3K115ac which is found inside the ATAC-seq peak region while H3K27ac is found only upstream/downstream of the ATAC peak that is so striking. This is apparent in Fig 4C.

      (12) The authors examine levels of H3K115ac in H3.3 KO cell lines via western blot (Figure 4D), but no replicates and/or quantification are shown.

      We now provide a biological replicate for the Western blot (new Fig. S4H) together with an image of the whole gel for the data in Fig. 4D.

      (13) In Figure S4 and at the end of page 17, the authors are arguing that there is a link to pioneer TF complexes, based on Oct4 binding. First, while Oct4 has pioneering activity, not all Oct4 sites (or motifs) are pioneering; this has been established. So if you want to use Oct4, substratifying by pioneer vs no pioneer is necessary. Second, demonstrating this is unique to pioneer and not to non-pioneer TFs would be an important control.

      In response to the reviewer’s comment, we have removed the term “pioneer” from the manuscript.

      (14) Minor point: Figure 4 A and B, there are some formatting issues with the scale bars.

      We thank the reviewer for pointing this out, and the errors have been corrected in the revised figure.

      (15) Minor point is that it should be clear when single replicates of data are used and when PE/SR sequences are combined or which one is used in each analysis, as this was hard to discern when reading the paper and figure legends.

      We have clearly stated in the text that, after Figure 2, we repeated all experiments in paired-end mode. All processing steps are defined separately for single-end and paired-end datasets in the Methods section. Details of biological replicates are provided in Sup. Table 1. These concerns are also addressed in our response to the Reviewer’s public comment 1.

      (16) Minor point: it is surprising that different MNase and different units were used in the ChIP vs sucrose sedimentation. Could the authors clarify why?

      Chromatin preparations for sucrose gradients were done on a much larger scale than for ChIP-seq and required different setups to obtain the right level of MNase digestion.

      (17) The authors note that fragile nucleosomes contain H2A.Z and H3.3, but they never perform an analysis of available data to demonstrate a correlation (or better a quantifiable correlation) between H3K115ac occupancy and these marks at the locations they identify H3K115ac.

      Since we have shown (Fig. 4) that depletion of H3.3 does not affect overall levels of H3K115ac, we do not think there is value in further quantitative correlative analyses of H3K115ac and variant histones.

      (18) Minor point: What is the overlap in peaks for H3K115ac, H3K122ac, and H3K27ac (Figure 1C)?

      Nearly all H3K115ac peaks overlap with H3K122ac and/or H3K27ac. Its most distinct properties are its association with CGI promoters, fragile nucleosomes and its unique localisation within the NDRs, three points that the manuscript is focussed on.

      Reviewer #3 (Recommendations for the authors):

      (1) The western blot results in Figure 4D probing for H3, H3.3, and H3K115ac use Ponceau S staining, presumably of an area of the membrane where histones might be expected to migrate, as a measure of loading. However, the Ponceau S bands appear uniformly weaker in the H3.3KO lanes, yet despite this, blotting with H3.3 antibody detects a band in H3.3 knockout ESCs, suggesting that the antibody does not have a high degree of specificity. Again, a blocking experiment with appropriate peptides would instill more confidence in the specificity of these reagents, and/or the authors could provide independent validation of the knockout model to differentiate between a partial knockout or antibody cross-reactivity (e.g., by Sanger sequencing).

      In a revised Fig. S4H we now show the whole gel corresponding to this blot but including co-staining with an antibody for H4 to provide a better loading control. We also provide a biological replicate of this Western blot in the lower panel of Fig. S4H.

      (2) The manuscript would benefit from in vitro follow-up and validation, but if the authors intend to keep the manuscript primarily in silico, I suggest dedicating a few lines in each section to explain the plots, their axes, and their purpose, as well as to assist with interpretation, rather than directly discussing the results. This would make the manuscript more accessible and understandable for a broader audience in the field of epigenetics.

      In the revised version, we have tried to improve the text to make the data more accessible to a broad audience.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of OXT (oxytocin) neurons and OXTR (oxytocin receptor) expressions in mammalian brains using an advanced RNAscope at the single transcript level. The evidence supporting the conclusions is compelling using chromogenic assays and state-of-the-art microscopy. The work will be of broad interest to neuroscientists and endocrinologists.

    2. Reviewer #1 (Public Review):

      This study by Ryu et al, provides compelling evidence to demonstrate the distributions of Oxt and Oxtr in the murine brain using an advanced RNAscope technique. Detailed information on the distributions was provided, revealing differences in Oxt and Oxtr expressions between males and females. This study will provide a new platform for investigators to study previously unknown roles of brain-region specific Oxt and Oxtr neurons and signaling in animal behaviors and metabolism, and others.

    3. Reviewer #2 (Public Review):

      This is an exciting study investigating the role of OXT in central nervous system (CNS) regulation of different behaviors and physiological processes. The study clearly shows the expression level of Oxt and Oxtr in different brain nuclei and regions.

      Sex differences in Oxt expression are also well demonstrated.

      Extensions of OXT's function in CNS regulation are sufficiently discussed.

      Overall, this provides a good direction for further investigating OXT's role in CNS regulation of different behaviors and physiological processes.

    1. eLife Assessment

      This potentially important study explores the specificity of olfactory perceptual learning. In keeping with previous work, the authors found that learning to discriminate between two enantiomers does not generalize across the nostrils or to unrelated enantiomers, whereas learning to discriminate odor mixtures does generalize across the nostrils and to other odor mixtures, with this learning effect persisting over at least two weeks. While the evidence presented to support these findings is convincing, it remains unclear why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

    2. Reviewer #1 (Public Review):

      This study extends a previous study by the same group on the generalization of odor discrimination from one nostril to the other. In their earlier study, the group showed that learning to discriminate between two enantiomers does not generalize across nostrils. This was surprising given the Mainland & Sobel 2001 study that found that detecting androstenone in people who do not detect it can generalize across the two nostrils. In this study, they confirmed their previous results and reported that, unlike enantiomers, learning to discriminate odor mixtures generalizes across nostrils, generalizes to other odor mixtures, and is persistent over at least two weeks. This interesting and important result extends our knowledge of this phenomenon and will likely steer more research. It may also help develop new training protocols for people with impairments in their sense of smell.

      The main weakness of this study is its scope, as it does not provide substantial insight into why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

    3. Reviewer #2 (Public Review):

      The manuscript from Chang et al. taps into an important issue in olfactory perceptual plasticity, namely the generalization of perceptual learning effects induced by training with odors. They employed a discrimination training/learning task with either binary odor mixtures or odor enantiomers, and tested for post-training effects at several time intervals. Their results showed contrasting patterns of specificity (enantiomers) and transfer (odor mixtures), and the learning effect persisted at 2 weeks post-training. They demonstrated that the effect was independent of task difficulty, olfactory adaptation and gender.

      Overall this was a well-controlled study and shows novel results. The strength of the study includes the consideration of odor structure and perceptual (dis)similarity and the control training condition. I have two minor issues that I hope the authors could address in the next version of the manuscript.

      1) The author used a binary odor mixture with a ratio of 7:9 or 9:11. Why was this ratio chosen and used for the experiment?

      2) Over the course of training, has the valence of the odor (odor mixture) changed? It would be helpful to include these results in the supplements. As the author indicated in the discussion, the potential site underlying the transfer effect is the OFC, which has been found to represent odor valence previously (Anderson, Christoff et al. 2003). It would be nice to see the author replicate the results with odor/odor mixture valence (change) controlled.

      Anderson, A. K., K. Christoff, I. Stappen, D. Panitz, D. G. Ghahremani, G. Glover, J. D. Gabrieli and N. Sobel (2003). "Dissociated neural representations of intensity and valence in human olfaction." Nat Neurosci 6(2): 196-202.

    4. eLife Assessment

      This potentially important study explores the specificity of olfactory perceptual learning. In keeping with previous work, the authors found that learning to discriminate between two enantiomers does not generalize across the nostrils or to unrelated enantiomers, whereas learning to discriminate odor mixtures does generalize across the nostrils and to other odor mixtures, with this learning effect persisting over at least two weeks. While the evidence presented to support these findings is convincing, it remains unclear why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      Discrimination of odor enantiomers ultimately relies on the enantioselectivity of olfactory receptors, whereas mixture discrimination likely depends on relative differences in perceived configural odor notes. These processes probably engage plasticity at different stages of the olfactory pathway. The revised Discussion (p.16-18) now elaborates on this distinction and the potential underlying mechanisms. Please also refer to our responses to Reviewer 1’s Point 1 and Reviewer 2’s Points 2 and 3 below.

      Reviewer #1 (Public Review):

      This study extends a previous study by the same group on the generalization of odor discrimination from one nostril to the other. In their earlier study, the group showed that learning to discriminate between two enantiomers does not generalize across nostrils. This was surprising given the Mainland & Sobel 2001 study that found that detecting androstenone in people who do not detect it can generalize across the two nostrils. In this study, they confirmed their previous results and reported that, unlike enantiomers, learning to discriminate odor mixtures generalizes across nostrils, generalizes to other odor mixtures, and is persistent over at least two weeks.

      This interesting and important result extends our knowledge of this phenomenon and will likely steer more research. It may also help develop new training protocols for people with impairments in their sense of smell.

      We thank the reviewer for the encouraging remarks.

      The main weakness of this study is its scope, as it does not provide substantial insight into why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      We thank the reviewer for this insightful comment. While the present study does not directly identify the neural mechanisms underlying these differences, it provides behavioral constraints on where specificity and generalization may arise within the olfactory system. Further neuroimaging and neurophysiological work will be needed to fully elucidate the underlying mechanisms.

      Reviewer #2 (Public Review):

      The manuscript from Chang et al. taps into an important issue in olfactory perceptual plasticity, namely the generalization of perceptual learning effects induced by training with odors. They employed a discrimination training/learning task with either binary odor mixtures or odor enantiomers, and tested for post-training effects at several time intervals. Their results showed contrasting patterns of specificity (enantiomers) and transfer (odor mixtures), and the learning effect persisted at 2 weeks post-training. They demonstrated that the effect was independent of task difficulty, olfactory adaptation and gender.

      Overall this was a well-controlled study and shows novel results. The strength of the study includes the consideration of odor structure and perceptual (dis)similarity and the control training condition.

      We appreciate the reviewer’s positive assessment of our work.

      I have two minor issues that I hope the authors could address in the next version of the manuscript.

      (1) The author used a binary odor mixture with a ratio of 7:9 or 9:11. Why was this ratio chosen and used for the experiment?

      This ratio was selected based on pilot testing and practical constraints. During piloting, we evaluated several mixing ratios to identify those that met two key criteria: (1) Baseline indiscriminability: Most participants were unable to reliably discriminate between the two binary mixtures in a:b and b:a ratios at baseline. (2) Trainability: With 1–5 weeks of training, participants could acquire the ability to discriminate between them.

      The a:b ratios of 7:9 and 9:11 were the ratios that met both criteria in our pilot testing, making them suitable for assessing training‑induced improvements in mixture discrimination. This clarification has been added to the revised Olfactory Stimuli subsection of the Materials and Methods (p.19-20 of the revised manuscript).

      (2) Over the course of training, has the valence of the odor (odor mixture) changed? It would be helpful to include these results in the supplements. As the author indicated in the discussion, the potential site underlying the transfer effect is the OFC, which has been found to represent odor valence previously (Anderson, Christoff et al. 2003). It would be nice to see the author replicate the results with odor/odor mixture valence (change) controlled.

      Anderson, A. K., K. Christoff, I. Stappen, D. Panitz, D. G. Ghahremani, G. Glover, J. D. Gabrieli and N. Sobel (2003). "Dissociated neural representations of intensity and valence in human olfaction." Nat Neurosci 6(2): 196-202.

      Odor valence ratings were not collected in Experiments 1 and 2. However, we have since conducted a new experiment examining concentration discrimination learning (see our response to Reviewer 1, Point 1), using the constituents of the mixtures from Experiment 2 as stimuli (i.e., concentration pairs of acetophenone, 2-octanone, methyl salicylate, and isoamyl butyrate). In this new experiment (now incorporated as Experiment 3 in the revised manuscript), unilateral odor valence ratings were collected at baseline (Day 0) and at the post-training test and retests on Days N, N+1, N+3, N+7, and N+14.

      For all odor pairs (training and controls), there was no significant change in perceived valence from baseline to Day N, regardless of nostril (ps > 0.05 for the main effects of session and nostril, as well as their interaction; Figure S5D). Moreover, odor valence ratings remained stable across the five post-training test sessions (ps ≥ 0.29 for the main and interaction effects involving session), showing the same pattern as at baseline (Figure S5D, F). Thus, training appeared to have no measurable influence on odor valence perception. These results have been incorporated into the revised manuscript on p.14-15.

    1. eLife Assessment

      This important study provides evidence, albeit still incomplete, that high-elevation species lose water at slower rates than low-elevation species. The findings imply that egg physiology may be a factor limiting the distributional range of bird species. While this work reinforces the need for all life stages to be considered when evaluating physiological adjustment to climate change, the analyses as presented by the authors do not clearly highlight the specific impact of species differences in influencing these adjustments.

    2. Reviewer #1 (Public Review):

      The authors tested the hypothesis that at high elevations avian eggs will be adapted to prevent desiccation that might arise from loss of water to surrounding drier air. They used a combination of gas diffusion experiments and scanning electron microscopy to examine water vapour conductance rates and eggshell structure, including thickness, pore size, and pore density among 197 bird species distributed along an elevational gradient in the Andes. While there was a correlation between water vapour conductance and elevation among species, a decrease in water vapour conductance with elevation was not associated with eggshell thickness, pore size, and pore density, suggesting the variation in the structure of the eggshells is unlikely to do with among species differences in water loss along elevational gradients. This study is very interesting and timely, especially with increasing water vapour pressure due to climate warming. It is a very well-written study and easy to read. However, I have some concerns about the conclusions drawn from the results.

      There are more than twice as many species in low and medium-elevation sites compared to high-elevation sites, so the amount of variation in low and medium-elevation should be expected to be higher by default. The argument for a wider range of variation in low-elevation species will be stronger if the comparison was a similar sample size. Moreover, the pattern clearly breaks down within families. Note also that for Low and medium elevation there is no difference in the amount of variation in conductance residuals possibly because the sample sizes are similar. The seemingly strong positive correlation between eggshell conductance and egg mass may be driven by the five high and two medium-elevation species with large eggs. There seem to be hardly any high-elevation species with egg mass greater than 12g whereas species in low elevation egg size seem to be as high as 80g (Figure 2a). Since larger eggs (and thus eggs of larger birds) lose more water compared to smaller eggs, the correlation between water vapour conductance and elevation may be more strongly associated with body size distribution along elevational gradients rather than egg structure and function.

      Authors argue that the observed variation in the relationship between water vapour conductance and elevation among and within bird families suggests potential differences in the adaptive response to common selective pressures in terms of eggshell thickness and pore density, and size. The evidence for this is generally weak from the data analyses because the decrease in water vapour conductance with elevation was not consistent across taxonomic groups nor were differences associated with specific patterns in eggshell thickness and pore density, and size.

      It is not clear how the authors expected the relationship between water vapour conductance and elevation to differ among taxonomic groups and there was no attempt to explain the biological implication of these differences among taxonomic groups based on the specific traits of the species or their families. This missing piece of information is crucial to justify the argument that differences among taxonomic groups may be due to differences in adaptive response.

    3. Reviewer #2 (Public Review):

      Many tropical montane species live only within narrow elevational ranges. Rapid climate change has led to considerable interest in determining whether these narrow elevational ranges are the result of physiological specialization: if so, then warming temperatures will have direct fitness consequences. Thus far, studies of tropical montane ectotherms have often found patterns consistent with physiological specialization, while the few field studies of tropical montane birds (endotherms) have not. However, these few studies measured the thermal physiology of adult birds. The early life stages of birds may show physiological specialization, as eggs and nestlings function as ectotherms.

      In this paper, Ocampo and colleagues provide the first test of the hypothesis that bird eggs are physiologically specialized to the climatic conditions of certain elevational zones. They use experiments and observations to measure water vapor conductance rates and eggshell traits in a diverse set of 197 species that live from the lowland Amazon to the high Andes. Ocampo and colleagues present two principal results: (1) high-elevation eggs lose less water over time than do low-elevation eggs (high elevations tend to be less humid than low elevations), and (2) eggshell traits do not show consistent patterns along the elevational gradient. The pattern in water loss is consistent with the hypothesis that high-elevation eggs are physiologically specialized for the slightly drier environments they experience. The finding that eggshell traits did not vary with elevation, however, means that the pattern of water loss is not driven by single eggshell traits (thicker eggshells could reduce water loss rates, as could fewer or smaller eggshell pores).

      This paper represents a strong advance for two main reasons. First, it provides evidence that egg physiology varies with elevation as predicted by the hypothesis that eggs are physiologically adapted to certain climatic conditions. This means egg physiological adaptation is a factor that could influence species' elevational ranges. Second, it is a proof-of-concept study that shows it is possible to measure eggshell physiology for a large number of species in the field in order to test hypotheses. As such, it should inspire many further tests that examine adaptation in egg physiology in the context of species' distributions along environmental gradients.

      There are two caveats that readers should be aware of. First, measuring these traits is difficult, and there remain questions about the efficacy of different methods. For example, the authors note that quantifying eggshell structures is very difficult, with several unresolved questions about their method of using scanning electron microscopy images to measure eggshell pores. Similarly, the authors mention that temperature variation may partially influence their main result that high-elevation eggs lose water at slower rates than low-elevation eggs (temperatures were colder for experiments at high elevations than for low elevations). Second, I regard the analyses of eggshell traits for specific families as exploratory. There are no a priori expectations for how different families might be expected to differ in their patterns. These analyses are fruitful in that they generate additional hypotheses that future work can test. However, it does mean that the statistical significance of eggshell trait relationships with elevation for specific families should be interpreted with caution.

    4. Author response:

      Reviewer #1 (Public Review):

      The authors tested the hypothesis that at high elevations avian eggs will be adapted to prevent desiccation that might arise from loss of water to surrounding drier air. They used a combination of gas diffusion experiments and scanning electron microscopy to examine water vapour conductance rates and eggshell structure, including thickness, pore size, and pore density among 197 bird species distributed along an elevational gradient in the Andes. While there was a correlation between water vapour conductance and elevation among species, a decrease in water vapour conductance with elevation was not associated with eggshell thickness, pore size, and pore density, suggesting the variation in the structure of the eggshells is unlikely to do with among species differences in water loss along elevational gradients. This study is very interesting and timely, especially with increasing water vapour pressure due to climate warming. It is a very well-written study and easy to read. However, I have some concerns about the conclusions drawn from the results.

      There are more than twice as many species in low and medium-elevation sites compared to high-elevation sites, so the amount of variation in low and medium-elevation should be expected to be higher by default. The argument for a wider range of variation in low-elevation species will be stronger if the comparison was a similar sample size. Moreover, the pattern clearly breaks down within families. Note also that for Low and medium elevation there is no difference in the amount of variation in conductance residuals possibly because the sample sizes are similar. The seemingly strong positive correlation between eggshell conductance and egg mass may be driven by the five high and two medium-elevation species with large eggs. There seem to be hardly any high-elevation species with egg mass greater than 12g whereas species in low elevation egg size seem to be as high as 80g (Figure 2a). Since larger eggs (and thus eggs of larger birds) lose more water compared to smaller eggs, the correlation between water vapour conductance and elevation may be more strongly associated with body size distribution along elevational gradients rather than egg structure and function.

      We thank the reviewer for this thoughtful observation. As noted in our response to comment 3, we recognize that the higher number of species at low and mid-elevations reflects the natural turnover in species richness along elevational gradients, and we are transparent about this caveat in our revised Discussion section. Nevertheless, to address this specific concern, we conducted additional analyses excluding the species with large eggs (i.e., egg mass >12g, which are only present at low and mid-elevations in our dataset). These analyses are now included in the Supplementary Figure 1, and the main pattern of lower water vapor conductance at high elevations holds even when larger eggs are excluded.

      We agree that the well-known scaling relationship between egg mass and conductance (recognized since the 1970s) may partially explain the observed trends across the elevational gradient. Our aim was to explore whether the known relationship between egg size and conductance varies when incorporating environmental variables such as elevation, which brings with it changes in humidity and oxygen availability. While we acknowledge the possible confounding effect of body size distributions along the gradient, our results, even after controlling for egg size (residual analysis), still suggest a decrease in conductance at higher elevations, consistent with predictions based on environmental conditions.

      We have clarified these points in the revised Discussion, including the acknowledgment that disentangling the relative contributions of body size and elevation to conductance patterns remains challenging and warrants further study.
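
      The residual analysis and the large-egg exclusion described in the response above can be sketched roughly as follows. This is an illustrative outline only, not the authors' analysis: the log-log fit, the function names, and every data value are assumptions, and only the 12 g cut-off comes from the response itself.

```python
from math import log10


def ols(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def conductance_residuals(egg_mass_g, conductance):
    """Residuals of log10(conductance) after regressing on log10(egg mass).

    A log-log fit is assumed here because conductance is generally
    described as scaling allometrically with egg mass.
    """
    log_mass = [log10(m) for m in egg_mass_g]
    log_cond = [log10(g) for g in conductance]
    slope, intercept = ols(log_mass, log_cond)
    return [yi - (intercept + slope * xi) for xi, yi in zip(log_mass, log_cond)]


def exclude_large_eggs(records, max_mass_g=12.0):
    """Keep (egg_mass_g, conductance, elevation_m) records with egg mass <= max_mass_g."""
    return [r for r in records if r[0] <= max_mass_g]


if __name__ == "__main__":
    # Hypothetical records, for illustration only: (egg mass in g, conductance, elevation in m).
    records = [
        (2.1, 0.52, 400), (3.4, 0.80, 450), (80.0, 9.10, 500),
        (4.0, 0.78, 1500), (6.5, 1.10, 1600), (2.8, 0.50, 3000),
        (5.2, 0.82, 3100), (9.0, 1.20, 3200),
    ]
    small = exclude_large_eggs(records)
    masses, conds, elevations = zip(*small)
    residuals = conductance_residuals(masses, conds)
    slope, _ = ols(list(elevations), residuals)
    print("slope of size-corrected conductance against elevation:", slope)
```

      A negative slope in the final step would be in line with lower size-corrected conductance at higher elevations. A comparative analysis across species would normally also account for phylogenetic non-independence, which this sketch ignores.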

      Authors argue that the observed variation in the relationship between water vapour conductance and elevation among and within bird families suggests potential differences in the adaptive response to common selective pressures in terms of eggshell thickness and pore density, and size. The evidence for this is generally weak from the data analyses because the decrease in water vapour conductance with elevation was not consistent across taxonomic groups nor were differences associated with specific patterns in eggshell thickness and pore density, and size.

      We appreciate the reviewer’s comments on the observed variation in water vapor conductance across taxonomic groups. As mentioned in response to comment 7, we have removed the explicit analyses and figures showing within-family comparisons, as these were exploratory and not directly tied to a specific hypothesis. We have also toned down our speculations regarding the potential adaptive drivers of the observed variation. In the revised Discussion, we emphasize the need for further research to explore these patterns and acknowledge the limitations of our current dataset in making strong conclusions about the adaptive responses to selective pressures.

      It is not clear how the authors expected the relationship between water vapour conductance and elevation to differ among taxonomic groups and there was no attempt to explain the biological implication of these differences among taxonomic groups based on the specific traits of the species or their families. This missing piece of information is crucial to justify the argument that differences among taxonomic groups may be due to differences in adaptive response.

      We appreciate the reviewer’s point. To clarify, we were not expecting the relationship between water vapor conductance and elevation to differ among taxonomic groups. Rather, our primary hypothesis was that water vapor conductance would decrease with elevation due to the drier conditions in highland habitats, and we sought to link this pattern with structural characteristics of the eggshell. The suggestion of potential differences among taxonomic groups arose from the lack of a consistent pattern across families, which prompted us to consider possible adaptive variation. We now address this more clearly in the Discussion section, acknowledging the need for further exploration into the potential selective pressures driving this variation among taxonomic groups.

      Reviewer #2 (Public Review):

      This paper represents a strong advance for two main reasons. First, it provides evidence that egg physiology varies with elevation as predicted by the hypothesis that eggs are physiologically adapted to certain climatic conditions. This means egg physiological adaptation is a factor that could influence species' elevational ranges. Second, it is a proof-of-concept study that shows it is possible to measure eggshell physiology for a large number of species in the field in order to test hypotheses. As such, it should inspire many further tests that examine adaptation in egg physiology in the context of species' distributions along environmental gradients.

      There are two caveats that readers should be aware of. First, measuring these traits is difficult, and there remain questions about the efficacy of different methods. For example, the authors note that quantifying eggshell structures is very difficult, with several unresolved questions about their method of using scanning electron microscopy images to measure eggshell pores. Similarly, the authors mention that temperature variation may partially influence their main result that high-elevation eggs lose water at slower rates than low-elevation eggs (temperatures were colder for experiments at high elevations than for low elevations). Second, I regard the analyses of eggshell traits for specific families as exploratory. There are no a priori expectations for how different families might be expected to differ in their patterns. These analyses are fruitful in that they generate additional hypotheses that future work can test. However, it does mean that the statistical significance of eggshell trait relationships with elevation for specific families should be interpreted with caution.

      We thank Reviewer 2 for these insightful comments. As mentioned earlier, measuring these traits is indeed very challenging, and we acknowledge the limitations of our methods, particularly when it comes to using scanning electron microscopy to quantify eggshell structures. We are aware of the unresolved questions around these techniques, and we plan to continue refining these methods in future studies. Regarding the influence of temperature variation on water loss, we recognize that colder temperatures at high elevations may have influenced our results, and we address this potential confounding factor in the Discussion section, Line 257.

      We also agree with the reviewer’s point regarding the exploratory nature of the family-specific analyses. These analyses were not guided by specific hypotheses, other than the expectation of replicating the overall pattern, and we recognize that they should be interpreted with caution. They serve primarily to generate additional hypotheses for future studies. In the revised manuscript, we have toned down the emphasis on the statistical significance of eggshell trait relationships with elevation for specific families, and we emphasize the need for further research to confirm these patterns.

    1. eLife Assessment

      In this manuscript, Jong et al. provide and validate a very useful resource for performing CRISPR screenings to study neutrophil differentiation and function by generating Hoxb8 cells that constitutively express Cas9. This library-screening approach has the potential to improve on the established lentiviral CRISPR-Cas9 editing of Hoxb8 cells. However, the technical advances provided are only incremental and the results presented in this study do not significantly further our understanding of these cells, but rather provide a good validation of their Cas9+ modified version.

    2. Reviewer #1 (Public Review):

      This study aims to develop a new system to analyze genetic determinants of neutrophil function by large-scale genetic screens. For that, the authors use a genetically-engineered ER-Hoxb8 neutrophil progenitor line that expresses Cas9 to perform CRISPR screens and to identify genes required for neutrophil survival and differentiation.

      A main strength of this study is that the authors develop a pooled CRISPR sgRNA library applicable to neutrophils and show potential determinants of neutrophil differentiation in vitro using this screening methodology.

      A main weakness of this study is that identified candidates associated with neutrophil differentiation, as those indicated in Fig. 4A, were not validated in vivo using neutrophil-specific K.O. models or further characterized in vitro (e.g. transcriptional or epigenetic changes during maturation when compared to non-targeting sgRNA controls).

      As secondary strengths, the authors provide evidence of efficient gene editing in Cas9+ER-Hoxb8 neutrophils both in vivo and in vitro and provide evidence of the specificity of this methodology using a Cas9+ER-Hoxb8 immortalized cell line that differentiates into macrophages.

      In terms of methodology, this study provides a useful tool to explore gene regulatory networks in neutrophils in large-scale genetic screens. However, it falls short in identifying and validating the true potential of this kind of methodology in the biology of the neutrophil.

      Moreover, the technical advances in the field are only incremental, as several studies, including those using CRISPR/Cas9 technology in Hoxb8-immortalized neutrophil progenitor cell lines, have already been performed.

    3. Reviewer #2 (Public Review):

      In this manuscript, Jong et al. provide and validate a very useful resource for performing CRISPR screenings to study neutrophil differentiation and function. The major strength of the paper lies in its careful validation of many aspects of the Hoxb8-immortalized progenitor cells, including their differentiation capacity, their ability to clear bacteria, and their capacity to differentiate in vivo. The authors succeed at this, with results correctly supporting their conclusions. The major weaknesses are its presentation and writing, some of which are poorly organized. Finally, while the potential impact of this resource in the field could be very large, the CRISPR screening results appear half-baked, almost preliminary, and could be better validated, or at least presented. A few other points that warrant revision are included below:

      • The introduction should be better constructed and organized. It should be written with more connectors to present facts in a stream that flows naturally, from the broad general facts to the experimental details implemented in the manuscript. It should also discuss other similar approaches used in the literature, such as LaFleur et al. 2019, and relate in which ways these presented methods could be better.

      • The scheme in Figure 4A should more clearly indicate the timings, doublings, numbers of cells, and other aspects of the experimental design.

      • The volcano plot in Figure 4B is poorly informative and almost redundant. What does one make of it?

      • The representation (normalized reads) of each sgRNA in the library and across multiple experiments, including their correlation, should be checked and plotted, to visualize how robust these replicates are.

      • In Figure 4E, the distribution of the hit sgRNAs should be compared to all other targeting guides (instead of just to non-targeting controls). Linear density distribution plots or scatter plots of all guides are usually the best way, but there are others (for example, see Figure 4 of LaFleur et al. 2019). Ideally, each independent sgRNA for each gene in the library, as well as biological replicates, should be separately shown, with hits clearly highlighted.

      • While in vivo differentiation is shown as possible with these cell lines, it is unclear whether CRISPR screenings could be performed in vivo too. Would sgRNA representation suffice for genome-wide? At least some of the new hits could be validated by testing differentiation in vivo (i.e. WASH complex).

      • In the methods section, the RNA-seq analysis pipeline details are missing (versions, software for alignment, quantification, differential gene expression, and enrichment). Also, parameters for MAGeCK and MAGeCKFlute should be explicit and detailed.

      • The discussion is mostly a summary of the results. It is lacking in detail and thoughtful discussion regarding novelty and impact beyond the validation of the cell line. What about potential applications? What about extending screenings to test bacterial-killing, as suggested in Figure 2? What about limitations compared to other similar methods out there? There is little discussion of such important potential matters. Also, a large part of the discussion is dedicated to discussing details about Cebpe that are all well known in the literature and add little value.

      • Figure legends are typically too succinct and hard to interpret, especially for non-experts. The text should enable the figure reader to correctly interpret what is shown in each panel.

    4. Reviewer #3 (Public Review):

      Primary neutrophils are difficult to modify genetically, whereas the generation of knockout mice to study the role of specific proteins is time-consuming and expensive. CRISPR-Cas9 genetic modification of neutrophil progenitors in vitro offers a platform to study neutrophil biology. Hoxb8 cells are immortalized neutrophil progenitors that differentiate into neutrophils when cultured in the presence of G-CSF, and have been shown to recapitulate the stages of murine neutrophil differentiation. They have also been shown to be amenable to CRISPR-Cas9 genetic editing with successful deletion of key transcriptional regulators of neutrophil maturation and function. The authors of this manuscript offer an extension to this technique, by generating Hoxb8 cells that constitutively express Cas9. This may reduce the variation between the generated knock-out cells by avoiding the introduction of Cas9 in a plasmid every time together with a guide RNA.

      The first part of the manuscript is dedicated to the characterisation of Cas9+HoxB8 cells throughout their differentiation. Considering the existing body of literature on HoxB8 progenitors and their differentiation into neutrophils ex vivo, it does not significantly further our understanding of these cells, but rather provides a good validation of their Cas9+ modified version. Gene editing using Cas9+ Hoxb8 progenitors seems to be highly efficient, which is an important technical point; however, it is hard to assess the degree of improvement in efficiency compared to the published protocols with Cas9 delivery in a plasmid.

      As a test, the authors use Cas9+HoxB8 progenitor to generate a knockout of CEBPE, known for its important role in neutrophil development. They convincingly demonstrate its impact on HoxB8 cell differentiation, with in vivo reconstitution of wild-type and CEBPE-deficient HoxB8 progenitors into irradiated mice being especially elegant. However, the transfer into different recipient mice assumed no differences in the recipient environment, while immunophenotyping for mature neutrophils within the HoxB8 progenitor-derived cells did not account for possible differences in numbers of wt and CEBPE KO surviving cells, limiting the conclusions.

      Finally, the authors put the system to the test by screening the Brie gRNA library of ~80K mouse sgRNAs, targeting almost 20K genes with 4 gRNAs per gene, to identify genes that are needed for the differentiation of Cas9+ERHoxb8 progenitors into mature neutrophils. They identify a number of hits, amongst which the WASH complex and CEBPE are highlighted. A comparison of cell numbers prior to differentiation and at 4 days post differentiation indicates that they are indeed required for neutrophil survival. To validate the role of these hits in neutrophil maturation itself, as they stated in the aims, i.e. "to identify genes that modulate the differentiation of Cas9+ERHoxb8 progenitors into mature neutrophils", phenotypic, functional, and morphological characterization of these cell lines could have been appropriate.

      Overall, this study has the potential to improve on the established lentiviral CRISPR-Cas9 editing of Hoxb8 cells and be valuable for library-screening approaches for neutrophil modulators, which will benefit the community.

    1. eLife Assessment

      This study reports an mRNA-based strategy for restoring sperm motility in a mouse model of monogenic male infertility. The work is technically innovative and potentially valuable, as it demonstrates feasibility of in vivo testicular mRNA delivery without genomic integration of foreign DNA. However, although partial recovery of sperm motility is supported, the evidence for meaningful restoration of fertility remains incomplete, with weak IVF outcomes and difficult-to-interpret ICSI results. In addition, mechanistic questions regarding the persistence of mRNA and the specificity of germ-cell targeting remain insufficiently resolved, limiting the strength of the authors' conclusions.

    2. Reviewer #4 (Public review):

      I maintain that the images in Figure 12 (new Figure 14) do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization (IVF) of Armc2-/- rescued sperm. They are clearly not normal 2-cell embryos and instead look very much like fragmented eggs that can be seen occasionally following in vitro fertilization procedures even when that is done with wild type eggs and sperm. The only portion of current Figure 14 that has normal looking 2-cell embryos is in panel 14A4, where wild type B6D2 sperm were used. Even in that panel, there are some fragmented eggs that the authors identify as 2-cell embryos.

      The authors offer the explanation that CD1 eggs fertilized by B6D2F1 hybrid male sperm do not develop beyond the 2-cell stage, citing a 2008 paper published in Biology of Reproduction by Fernandez-Gonzalez et al. I read through that paper very carefully and even had a colleague read through it in case I missed something, but that paper says nothing at all about strain incompatibilities, much less 2-cell arrest due to them. The only crosses done in that paper are CD1 eggs x CD1 sperm and B6D2 eggs x B6D2 sperm, all by intracytoplasmic sperm injection and not standard in vitro fertilization. [Note that the paper does mention performing in vitro fertilization but says nothing about how it was done or what mouse strains were used.] I even searched the literature for information regarding incompatibility between these strains and could find nothing relevant. But even if the authors are correct and there happens to be a strain incompatibility and 2-cell arrest is expected, what the authors are calling 2-cell embryos are clearly not.

      A second explanation offered by the authors is that they used collagenase to remove the cumulus cells and that this may have affected the appearance of the embryos. This technique is actually used to remove both the cumulus cells and the zona pellucida and has been described as a gentler way to do so than other standard methods (hyaluronidase treatment followed by acid Tyrodes to remove the zona pellucida) (Yamatoya et al., Reprod Med Biol 2011, DOI 10.1007/s12522-011-0075-8). I think it is highly relevant to the current study that the method they used to remove cumulus cells also dissolves the zona, either partially or completely. Given that many of the eggs, fragmented eggs, and 2-cell embryos (from the WT sperm) in Figure 14A are lacking a zona pellucida, it seems very likely that many of the eggs were either zona-free or had partial zona dissolution from the start. In fact, the authors state in the Methods section that "Cumulus-free and zona-free eggs were collected..." for how IVF was done. Partial zona dissolution is standard in some protocols for performing IVF using frozen mouse sperm, which usually have much lower motility and overall efficacy than fresh sperm. In any case, it would improve transparency if the manuscript made clear somewhere other than buried in the Methods that the IVF procedure was done on eggs with partially or fully removed zonas, to allow proper interpretation.

      In the rebuttal, the authors go on to state: "To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2-/- sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2-/- sperm (Supplementary Fig. 9) These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation."

      Their conclusion that the data support partial recovery of fertilization ability following Armc2 mRNA electroporation in my opinion has no basis. This experiment was done only once, and no information is provided regarding how many eggs underwent ICSI or how many reached the blastocyst stage. The authors claim that the rescued sperm were better than the Armc2-/- sperm in producing blastocysts, but this is based on a simple percentage report of 25% vs 13% without any statistical analysis, even on the results from the single experiment presented.
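
      To illustrate why a percentage comparison such as 25% vs 13% cannot be interpreted without the underlying counts, the following is a minimal sketch of a two-sided Fisher's exact test in plain Python. The counts used are hypothetical, since the numbers of injected oocytes and blastocysts are not reported; the function name and the group sizes of 40 and 400 are assumptions chosen only to roughly match the reported proportions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. rescued vs untreated sperm); columns are
    outcomes (blastocyst vs no blastocyst).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# HYPOTHETICAL counts: 10/40 blastocysts (25%) vs 5/40 (12.5%)
p_small = fisher_exact_two_sided(10, 30, 5, 35)
# Same proportions with ten times as many oocytes per group
p_large = fisher_exact_two_sided(100, 300, 50, 350)

print(f"n=40 per group:  p = {p_small:.3f}")
print(f"n=400 per group: p = {p_large:.2e}")
```

      Under these assumed counts, the same pair of percentages is far from conventional significance with 40 oocytes per group and clearly significant with 400, which is why the number of oocytes injected and the number of replicates matter for the interpretation.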

      Overall, the paper shows rescue of some sperm motility by the new method they use, and the new title is therefore appropriate. The authors have also dealt reasonably with many of the original concerns regarding documenting that their methodology was effective in producing protein (at least the GFP marker) in spermatogenic cells. In my view the authors have, however, not shown any indication of functional recovery over what is already known for the knockout sperm, that ICSI can support blastocyst stage embryo development. They also have not, in my view, justified the claims at the end of the abstract "These motile sperm were able to produce embryos by IVF..." and that "...mRNA electroporation can restore...partially fertilizing ability..."

    3. Reviewer #5 (Public review):

      While the study presents an innovative and potentially impactful mRNA-based approach for addressing monogenic causes of male infertility, several significant weaknesses limit the strength of the authors' central conclusions.

      First, the functional evidence for true fertility restoration remains incomplete. Although the authors convincingly demonstrate partial recovery of sperm motility, the downstream reproductive outcomes, particularly for IVF, are weak. Importantly, these concerns are shared by all three reviewers and the former Reviewing Editor, and to my eye they are both thoughtfully articulated and well warranted. The ICSI data show modest improvement, but this rescue is difficult to interpret.

      In parallel, significant mechanistic questions persist regarding the unusually prolonged persistence of naked mRNA and reporter protein expression in germ cells, which is not fully reconciled with established mRNA and protein half-life biology and is supported largely by inference rather than by direct decay measurements.

      Finally, although the authors have conducted additional cellular analyses, concerns about the extent and specificity of germ-cell targeting versus Sertoli-cell expression remain unresolved. Together, these issues do not negate the technical novelty of the work, but they do constrain the confidence with which the current dataset can support the authors' strongest therapeutic claims.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the effectiveness of electroporating mRNA into male germ cells to rescue the expression of proteins required for spermatogenesis progression in individuals where these proteins are mutated or depleted. To set up the methodology, they first evaluated the expression of reporter proteins in wild-type mice, which showed expression in germ cells for over two weeks. Then, they attempted to recover fertility in a model of late spermatogenesis arrest that produces immotile sperm. By electroporating the mutated protein, the authors recovered the motility of ~5% of the sperm; although the sperm regenerated was not able to produce offspring using IVF, the embryos reached the 2-cell state (in contrast to controls that did not progress past the zygote state).

      This is a comprehensive evaluation of the mRNA methodology with multiple strengths. First, the authors show that naked synthetic RNA, purchased from a commercial source or generated in the laboratory with simple methods, is enough to express exogenous proteins in testicular germ cells. The authors compared RNA to DNA electroporation and found that germ cells are efficiently electroporated with RNA, but not DNA. The differences between these constructs were evaluated using in vivo imaging to track the reporter signal in individual animals through time. To understand how the reporter proteins affect the results of the experiments, the authors used different reporters: two fluorescent (eGFP and mCherry) and one bioluminescent (Luciferase). Although they observed differences among reporters, in every case expression lasted for at least two weeks. The authors used a relevant system to study the therapeutic potential of RNA electroporation. The ARMC2-deficient animals have impaired sperm motility phenotype that affects only the later stages of spermatogenesis. The authors showed that sperm motility was recovered to ~5%, which is remarkable due to the small fraction of germ cells electroporated with RNA with the current protocol. The sperm motility parameters were thoroughly assessed by CASA. The 3D reconstruction of an electroporated testis using state-of-the-art methods to show the electroporated regions is compelling.

      The main weakness of the manuscript is that although the authors manage to recover motility in a small fraction of the sperm population, it is unclear whether the increased sperm quality is substantial enough to improve assisted reproduction outcomes. The authors found that the rescued sperm could be used to obtain 2-cell embryos via IVF, but no evidence for more advanced stages of embryo differentiation was provided. The motile rescued sperm was also successfully used to generate blastocysts by ICSI, but the statistical significance of the rate of blastocyst production compared to non-rescued sperm remains unclear. The title is thus an overstatement since fertility was never restored for IVF, and the mutant sperm was already able to produce blastocysts without the electroporation intervention.

      Overall, the authors clearly show that electroporating mRNA can improve spermatogenesis as demonstrated by the generation of motile sperm in the ARMC2 KO mouse model.

      We thank the reviewer for this thoughtful and constructive comment. We agree that our study demonstrates a partial functional recovery of spermatogenesis rather than a complete restoration of fertility. Our main objective was to establish and validate a proof-of-concept approach showing that mRNA electroporation can rescue the expression of a missing or mutated protein in post-meiotic germ cells and result in the production of motile sperm.

      To address the reviewer’s concern, we have revised the title and discussion to more accurately reflect the scope of our findings. The new title reads:

      “Sperm motility in mice with oligo-astheno-teratozoospermia restored by in vivo injection and electroporation of naked mRNA”

      In the manuscript, we now emphasize that while motility recovery was significant, complete fertility restoration was not achieved. We have also clarified that:

      The 5% recovery in motile sperm represents a substantial improvement considering the small population of germ cells reached by the current electroporation method.

      The 2-cell embryo formation observed after IVF serves as a strong indication of partial functional recovery.

      Finally, we now explicitly state in the Discussion that this approach should be considered a therapeutic proof-of-concept, demonstrating feasibility and potential, rather than a fully curative intervention.

      Reviewer #2 (Public review):

      The authors inject, into the rete testes, mRNA and plasmids encoding mRNAs for GFP and then ARMC2 (into infertile Armc2 KO mice) in a gene therapy approach to express exogenous proteins in male germ cells. They do show GFP epifluorescence and ARMC2 protein in KO tissues, although the evidence presented is weak. Overall, the data do not necessarily make sense given the biology of spermatogenesis and more rigorous testing of this model is required to fully support the conclusions, that gene therapy can be used to rescue male infertility.

      In this revision, the authors attempt to respond to the critiques from the first round of reviews. While they did address many of the minor concerns, there are still a number to be addressed. With that said, the data still do not support the conclusions of the manuscript.

      We thank the reviewer for their careful and detailed assessment of our manuscript. We appreciate the concerns raised regarding mRNA stability, GFP localization, and the interpretation of spermatogenesis stages, and we have addressed these points in the manuscript and in the responses below.

      (1) The authors have not satisfactorily provided an explanation for how a naked mRNA can persist and direct expression of GFP or luciferase for ~3 weeks. The most stable mRNAs in mammalian cells have half-lives of ~24-60 hours. The stability of the injected mRNAs should be evaluated and reported using cell lines. GFP protein's half-life is ~26 hours, and luciferase protein's half-life is ~2 hours.
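
      The arithmetic behind this concern can be sketched as follows, using only the half-life values quoted in the comment above. This is a back-of-the-envelope illustration that assumes simple first-order decay with no ongoing synthesis, which is a simplification and not a model of the testis; the function name is hypothetical.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of a molecule left after first-order decay, assuming no resynthesis."""
    return 0.5 ** (hours_elapsed / half_life_hours)


three_weeks = 21 * 24  # hours

# Half-lives taken from the reviewer's comment; everything else is illustrative
for label, half_life in [
    ("mRNA, 24 h half-life", 24),
    ("mRNA, 60 h half-life", 60),
    ("GFP protein, 26 h half-life", 26),
    ("luciferase protein, 2 h half-life", 2),
]:
    print(f"{label}: {fraction_remaining(three_weeks, half_life):.2e} remaining")
```

      Under this assumption, even the longest quoted mRNA half-life leaves well under 1% of the injected transcript after three weeks, so persistent signal would require either markedly slower decay in germ cells or continued translation from a stabilized pool.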

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-qPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells (Fig. 5A). Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells, and the slow protein turnover that is typical of the spermatogenic lineage.

      (2) There is no convincing data shown in Figs. 1-8 that the GFP is even expressed in germ cells, which is obviously a prerequisite for the Armc2 KO rescue experiment shown in the later figures! In fact, to this reviewer the GFP appears to be in Sertoli cell cytoplasm, which spans the epithelium and surrounds germ cells - thus, it can be oft-confused with germ cells. In addition, if it is in germ cells, then the authors should be able to show, on subsequent days, that it is present in clones of germ cells that are maturing. Due to intracellular bridges, a molecule like GFP has been shown to diffuse readily and rapidly (in a matter of minutes) between adjacent germ cells. To clarify, the authors must generate single cell suspensions and immunostain for GFP using any of a number of excellent commercially-available antibodies to verify it is present in germ cells. It should also be present in sperm, if it is indeed in the germline.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig. 10 A-), round spermatids (25%) (Fig. 10B-E), and elongated spermatids (11.4%) (Fig. 10C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig. 10F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming efficient transfection and translation of the delivered mRNA. These results demonstrate that the in vivo injection and electroporation protocol enables effective mRNA transfection, and efficient translation, in germ cells across multiple stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-qPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      Other comments:

      70-1 This is an incorrect interpretation of the findings from Ref 5 - that review stated there were ~2,000 testis-enriched genes, but that does not mean "the whole process involves around two thousand of genes"

      We thank the reviewer for this helpful comment. We agree that our previous phrasing was imprecise. We have revised the sentence to clarify that approximately 2,000 genes show testis-enriched expression, rather than implying that the entire spermatogenic process is limited to these genes. The corrected sentence now reads:

      “Spermatogenesis involves the coordinated expression of a large number of genes, with approximately 2,000 showing testis-enriched expression, about 60% of which are expressed exclusively in the testes”

      74 would specify 'male':

      We have now specified it as you suggested.

      79-84 Are the concerns with ICSI due to the procedure itself, or the fact that it's often used when there is likely to be a genetic issue with the male whose sperm was used? This should be clarified if possible, using references from the literature, as this reviewer imagines this could be a rather contentious issue with clinicians who routinely use this procedure, even in cases where IVF would very likely have worked:

      We thank the reviewer for this important comment. Concerns about ICSI outcomes indeed reflect two partly overlapping causes: the procedure itself (direct sperm injection and associated laboratory manipulations) and the clinical/genetic background of couples undergoing ICSI (especially men with severe male-factor infertility). Large reviews and meta-analyses report a small increase in some perinatal and congenital risks after ART/ICSI, but these studies conclude that it is difficult to fully disentangle procedural effects from parental factors. Importantly, genetic or epigenetic abnormalities in the male (which motivate use of ICSI) likely contribute to adverse outcomes in offspring, while some studies also suggest that ICSI-specific manipulations may alter epigenetic marks in embryos. For these reasons, professional bodies recommend reserving ICSI for appropriate male-factor indications rather than using it as routine insemination for non-male-factor cases.

      We have revised the text accordingly to clarify this distinction:

      “ICSI can efficiently overcome the problems faced. Nevertheless, concerns persist regarding the potential risks associated with this technique, including blastogenesis defect, cardiovascular defect, gastrointestinal defect, musculoskeletal defect, orofacial defect, leukemia, central nervous system tumors, and solid tumors [1-4]. Statistical analyses of birth records have demonstrated an elevated risk of birth defects, with a 30-40% increased likelihood in cases involving ICSI [1-4], and a prevalence of birth defects between 1% and 4% [3]. It is important to note, however, that the origin of these risks remains debated. Several large epidemiological and mechanistic studies indicate that both the procedure itself (direct microinjection and in vitro manipulation) and the underlying genetic or epigenetic abnormalities often present in men requiring ICSI contribute to the observed outcomes [1, 3, 5, 6]. To overcome these drawbacks, a number of experimental strategies have been proposed to bypass ARTs and restore spermatogenesis and fertility, including gene therapy [7-10].”

      199 Codon optimization improvement of mRNA stability needs a reference;

      We have added the references accordingly: [11-15].

      In one study using yeast transcripts, optimization improved RNA stability on the order of minutes (e.g., from ~5 minutes to ~17 minutes); is there some evidence that it could be increased dramatically to days or weeks?

      We agree with the reviewer that codon optimization can enhance mRNA stability, but available evidence indicates that this effect is moderate. In Saccharomyces cerevisiae, Presnyak et al. (2015) [16] showed that codon optimization increased mRNA half-life from approximately 5 minutes to ~17 minutes, representing a several-fold improvement rather than a shift to days or weeks. Similar codon-dependent stabilization has been observed in mammalian systems, where transcripts enriched in optimal codons exhibit longer half-lives and enhanced translation efficiency ([11]; [17]). However, these studies consistently report effects on the scale of minutes to hours. In mammalian cells, the prolonged stability of therapeutic or vaccine mRNAs, lasting for days, is primarily achieved through additional features such as optimized untranslated regions, chemical nucleotide modifications (e.g., N¹-methylpseudouridine), and protective delivery systems, rather than codon usage alone ([18]; [19]).

      Other molecular optimizations that improve in vivo mRNA stability and translation include a poly(A) tail, which binds poly(A)-binding proteins to protect the transcript from 3′ exonuclease degradation and promotes ribosome recycling, and a CleanCap structure at the 5′ end, which mimics the natural Cap 1 configuration, protects against 5′ exonuclease attack, and enhances translational initiation [11-15]. Together, these modifications act synergistically to stabilize the transcript and support efficient translation.
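
      To put the yeast half-lives quoted above (about 5 and 17 minutes) in perspective, the following is a minimal sketch that assumes simple first-order decay of the transcript. The function names and the 1% threshold are illustrative assumptions, not part of the cited study.

```python
import math


def fraction_remaining(t_min, half_life_min):
    """Fraction of transcript left after t_min minutes, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(fraction, half_life_min):
    """Minutes until only `fraction` of the initial transcript remains."""
    return half_life_min * math.log(fraction) / math.log(0.5)


# Half-lives quoted in the response (yeast, Presnyak et al. 2015).
baseline_half_life = 5.0
optimized_half_life = 17.0

# Fold change in half-life attributable to codon optimization.
fold_change = optimized_half_life / baseline_half_life

print(f"fold change in half-life: {fold_change:.1f}")
for label, half_life in (("baseline", baseline_half_life),
                         ("codon-optimized", optimized_half_life)):
    # Hypothetical threshold: time until 1% of the transcript is left.
    t_1pct = time_to_fraction(0.01, half_life)
    print(f"{label}: ~{t_1pct:.0f} min until 1% of the transcript remains")
```

      Under this assumption, even the optimized transcript falls to 1% of its starting level within a few hours, which is consistent with the point that codon usage alone does not extend stability to days or weeks.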

      472-3 The reported half-life of EGFP is ~36 hours - so, if the mRNA is unstable (and not measured, but certainly could be estimated by qRT-PCR detection of the transcript on subsequent days after injection) and EGFP is comparatively more stable (but still hours), how does EGFP persist for 21 days after injection of naked mRNA??

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-qPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells (Fig. 5). Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells, and the slow protein turnover that is typical of the spermatogenic lineage.

      Curious why the authors were unable to get anti-GFP to work in immunostaining?

      We appreciate the reviewer’s question. We attempted to detect GFP using several commercially available anti-GFP antibodies under various standard immunostaining conditions. However, in our hands, these antibodies consistently produced either no signal or high background staining, making the results unreliable. We therefore relied on direct detection of GFP fluorescence, which provides a more accurate and specific readout of protein expression in our system.

      In Fig. 3-4, the GFP signals are unremarkable, in that they cannot be fairly attributed to any structure or cell type - they just look like blobs; and why, in Fig. 4D-E, does the GFP signal appear stronger at 21 days than 15 days? And why is it completely gone by 28 days? This data is unconvincing.

      We would like to thank the reviewer for their comments. Figures 3–4 provide a global overview of GFP expression on the surface of the testis. The entire testis was imaged using an inverted epifluorescence microscope, and the GFP signal represents a composite of multiple seminiferous tubules across the tissue surface. Due to this whole-organ imaging approach, it is not possible to resolve individual structures such as the basement membrane or lumen, which is why the signals may appear as diffuse “blobs.”

      Regarding the time-course in Figure 4D–E, the apparent increase in GFP signal at 21 days compared with 15 days likely reflects accumulation and translation of the delivered mRNA in germ cells over time, whereas the absence of signal at 28 days corresponds to the natural turnover and degradation of GFP protein and mRNA in the tissue. We hope this explanation clarifies the observed patterns of fluorescence.

      If the authors did a single cell suspension, what types or percentage of cells would be GFP+? Since germ cells are not adherent in culture, a simple experiment could be done whereby a single cell suspension could be made, cultured for 4-6 hours, and non-adherent cells "shaken off" and imaged vs adherent cells. Cells could also be fixed and immunostained for GFP, which has worked in many other labs using anti-GFP.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig. 10 A-), round spermatids (25%) (Fig. 10 B-E), and elongated spermatids (11.4%) (Fig. 10 C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig. 10 F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming efficient transfection and translation of the delivered mRNA. These results demonstrate that the in vivo injection and electroporation protocol enables effective mRNA transfection across multiple stages of spermatogenesis.

      These results confirm that the injected mRNA is efficiently translated in germ cells at various stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-qPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      In Fig. 5, what is the half-life of luciferase? From this reviewer's search of the literature, it appears to be ~2-3 h in mammalian cells. With this said, how do the authors envision detectable protein for up to 20 days from a naked mRNA? The stability of the injected mRNAs should be shown in a mammalian cell line - perhaps this mRNA has an incredibly long half-life, which might help explain these results. However, even the most stable endogenous mRNAs (e.g., globin) are ~24-60 hrs.

      We did not directly assess the stability of luciferase mRNA, but we evaluated the persistence of GFP mRNA, which was synthesized and optimized using the same sequence optimization and chemical modification strategy as the luciferase mRNA. In these experiments, mRNA-GFP was detectable in seminiferous tubule cells for up to two weeks after injection. We therefore expect a similar stability profile for the luciferase mRNA. These findings suggest that the prolonged fluorescence or bioluminescence observed in our study likely reflects a combination of factors, including enhanced transcript stability, local translation within germ cells, and the inherently slow protein turnover characteristic of the spermatogenic lineage.
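
      The relationship between transcript half-life, protein half-life, and signal persistence discussed in this exchange can be explored with a minimal kinetic sketch. It assumes a single bolus of mRNA that decays by first-order kinetics, translation proportional to the remaining mRNA, and first-order protein degradation. The 3 h protein half-life is the upper end of the range quoted by the reviewer; the mRNA half-lives, the function name, and the unit synthesis rate are hypothetical choices for illustration only and are not measurements from the study.

```python
import math


def protein_level(t_h, mrna_half_life_h, protein_half_life_h,
                  synthesis_rate=1.0, m0=1.0):
    """Protein amount at time t_h (hours) after a single mRNA bolus.

    Model (an assumption, not the authors' analysis):
        dm/dt = -k_m * m
        dp/dt = synthesis_rate * m - k_p * p,  with p(0) = 0
    """
    k_m = math.log(2) / mrna_half_life_h
    k_p = math.log(2) / protein_half_life_h
    if math.isclose(k_m, k_p):
        # Limiting case of equal rate constants.
        return synthesis_rate * m0 * t_h * math.exp(-k_m * t_h)
    return (synthesis_rate * m0
            * (math.exp(-k_m * t_h) - math.exp(-k_p * t_h)) / (k_p - k_m))


protein_half_life = 3.0  # hours, upper end of the ~2-3 h range quoted by the reviewer
for mrna_half_life in (24.0, 60.0, 120.0):  # hours, hypothetical values
    # Peak signal over a 28-day window sampled hourly.
    peak = max(protein_level(t, mrna_half_life, protein_half_life)
               for t in range(0, 24 * 28))
    day21 = protein_level(21 * 24, mrna_half_life, protein_half_life)
    print(f"mRNA half-life {mrna_half_life:5.0f} h: "
          f"day-21 signal is {day21 / peak:.2e} of peak")
```

      In this simplified model the late-time signal decays at the rate of the slower of the two processes, so with a short-lived protein the persistence of the signal is set almost entirely by the transcript half-life.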

      527-8 The Sertoli cell cytoplasm is not just present along the basement membrane as stated, but also projects all the way to the lumina:

      We clarified the sentence: "Sertoli cells have an oval to elongated nucleus and the cytoplasm presents a complex shape (“tombstone” pattern) along the basement membrane, with long projections that extend toward the lumen."

      529-30 This is incorrect, as round spermatids are never "localized between the spermatocytes and elongated spermatids" - if elongated spermatids are present, rounds are not - they are never coincident in the same testis section:

      We thank the reviewer for this important comment and for drawing attention to the detailed staging of the seminiferous epithelium. We agree that the spatial organization of germ cells varies depending on the stage of spermatogenesis. While round spermatids (steps 1–8) and elongated spermatids (steps 9–16) are typically associated with distinct stages, transitional stages of the seminiferous epithelium can contain both cell types in close proximity, reflecting the continuous and overlapping nature of spermatid differentiation (Meistrich and Hess, 2013, Methods Mol. Biol. 927:299–307 [20]). We have revised the text to clarify this point, indicating that the relative positioning of germ cell types depends on the stage of the seminiferous cycle rather than implying their constant coexistence within the same tubule section.

      Fig. 7. To this reviewer, all of the GFP appears to be in Sertoli cell cytoplasm. In Figs 1-8 there is no convincing evidence presented that GFP is expressed in germ cells! In fact, it appears to be in Sertoli cells.

      We thank the reviewer for their observation. As previously mentioned, we have included an additional experiment specifically demonstrating GFP expression in germ cells (Fig. 10). These new data provide clear evidence that the GFP signal is not restricted to Sertoli cells and confirm successful uptake and translation of GFP mRNA in germ cells.

      Fig. 9 - alpha-tubuline?

      We corrected the figure.

      Fig. 11 - how was sperm morphology/motility not rescued on "days 3, 6, 10, 15, or 28 after surgery", but it was in some at 21 and 35? How does this make sense, given the known kinetics of male germ cell development??

      We note the reviewer’s concern regarding the timing of motile sperm appearance. Variability among treated mice is expected because transfection efficiency differed between spermatogonia and spermatids. Full spermiogenesis requires ~15 days, and epididymal transit adds ~8 days, consistent with motile sperm appearing around 21 days post-injection in some mice.

      And at least one of the sperm in the KO in Fig. B5 looks relatively normal, and the flagellum may be out-of-focus in the image? With only a few sperm for reviewers to see, how can we know these represent the population?

      We thank the reviewer for their comment. Upon closer examination of the image, the flagellum of the spermatozoon in question is clearly abnormally short and this is not due to being out of focus. Furthermore, the supplementary figure shows that the KO consistently lacks normal spermatozoa. These defects are consistent with previous findings from our laboratory [22], confirming that the observed phenotype is representative of the KO population rather than an isolated occurrence.

      Reviewer #3 (Public review):

      Summary:

      The authors used a novel technique to treat male infertility. In a proof-of-concept study, the authors were able to rescue the phenotype of a knockout mouse model with immotile sperm using this technique. This could also be a promising treatment option for infertile men.

      Strengths:

      In their proof-of-concept study, the authors were able to show that the novel technique rescues the infertility phenotype of Armc2 knockout spermatozoa. In the current version of the manuscript, the authors have added data on in vitro fertilisation experiments with Armc2 mRNA-rescued sperm. The authors show that Armc2 mRNA-rescued sperm can successfully fertilise oocytes that develop to the blastocyst stage. This adds another level of reliability to the data.

      Weaknesses:

      Some minor weaknesses identified in my previous report have already been fixed. The technique is new and may not yet be fully established for all issues. Nevertheless, the data presented in this manuscript opens the way for several approaches to immotile spermatozoa to ensure successful fertilisation of oocytes and subsequent appropriate embryo development.

      [Editors' note: The images in Figure 12 do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization. Instead, the cells shown appear to be fragmented, unfertilized eggs. Combined with the lack of further development, it seems highly unlikely that fertilization was successful.]

      We thank the reviewer for their careful evaluation and constructive feedback. We appreciate the acknowledgment of the strengths of our study, particularly the proof-of-concept demonstration that Armc2-mRNA electroporation can rescue sperm motility in Armc2 knockout mice.

      Regarding the concern raised by the editor about Figure 12, we would like to clarify two technical points. First, the IVF experiments were performed using CD1 oocytes and B6D2 sperm. Due to strain-specific incompatibilities, fertilization of CD1 oocytes by B6D2 sperm typically does not progress beyond the two-cell stage (Fernández-González et al., 2008, Biology of Reproduction [23]). Therefore, the observation of two-cell embryos represents the expected limit of development in this cross and serves as a strong indication of successful fertilization, even though further development is not possible. Second, the oocytes used in these experiments were treated with collagenase to remove cumulus cells. This enzymatic treatment can sometimes affect the morphology of early embryos, which may explain why the two-cell embryos in Figure 12 appear less regular or somewhat fragmented. We also included a control showing embryos from B6D2 sperm with the same collagenase treatment on CD1 oocytes, which yielded similar appearances (Fig. 14 A4).

      To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2–/– sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2–/– sperm (Supplementary Fig. 9). These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation.

      We have clarified these points in the revised Results and Discussion sections to emphasize that the IVF data indicate partial functional recovery of rescued sperm rather than full fertility restoration. These clarifications address the editor’s concern while accurately representing the technical limitations of the strain combination used in our experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Fig 12 and Supplementary Fig 9 are mislabeled in the text and rebuttal.

      We thank the reviewer for pointing this out. We have carefully checked the manuscript and the rebuttal text, and corrected all references to Figure 12 and Supplementary Figure 9 to ensure they are accurately labeled and consistent throughout the text.

      Reviewer #3 (Recommendations for the authors):

      The contribution of the newly added authors should be clarified. All other aspects of inadequacy raised in my previous report have been adequately addressed.

      No further comments.

      We thank the reviewer for noting this. The contributions of the newly added authors have been clarified in the Author Contributions section of the revised manuscript. All other points raised in the previous review have been addressed as indicated.

      References

      (1) Hansen, M., et al., Assisted reproductive technologies and the risk of birth defects--a systematic review. Hum Reprod, 2005. 20(2): p. 328-38.

      (2) Halliday, J.L., et al., Increased risk of blastogenesis birth defects, arising in the first 4 weeks of pregnancy, after assisted reproductive technologies. Hum Reprod, 2010. 25(1): p. 59-65.

      (3) Davies, M.J., et al., Reproductive technologies and the risk of birth defects. N Engl J Med, 2012. 366(19): p. 1803-13.

      (4) Kurinczuk, J.J., M. Hansen, and C. Bower, The risk of birth defects in children born after assisted reproductive technologies. Curr Opin Obstet Gynecol, 2004. 16(3): p. 201-9.

      (5) Graham, M.E., et al., Assisted reproductive technology: Short- and long-term outcomes. Dev Med Child Neurol, 2023. 65(1): p. 38-49.

      (6) Palermo, G.D., et al., Intracytoplasmic sperm injection: state of the art in humans. Reproduction, 2017. 154(6): p. F93-F110.

      (7) Usmani, A., et al., A non-surgical approach for male germ cell mediated gene transmission through transgenesis. Sci Rep, 2013. 3: p. 3430.

      (8) Raina, A., et al., Testis mediated gene transfer: in vitro transfection in goat testis by electroporation. Gene, 2015. 554(1): p. 96-100.

      (9) Michaelis, M., A. Sobczak, and J.M. Weitzel, In vivo microinjection and electroporation of mouse testis. J Vis Exp, 2014(90).

      (10) Wang, L., et al., Testis electroporation coupled with autophagy inhibitor to treat non-obstructive azoospermia. Mol Ther Nucleic Acids, 2022. 30: p. 451-464.

      (11) Wu, Q., et al., Translation affects mRNA stability in a codon-dependent manner in human cells. eLife, 2019. 8: p. e45396.

      (12) Gallie, D.R., The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes & Development, 1991. 5(11): p. 2108-2116.

      (13) Henderson, J.M., et al., Cap 1 messenger RNA synthesis with co-transcriptional CleanCap® analog improves protein expression in mammalian cells. Nucleic Acids Research, 2021. 49(8): p. e42.

      (14) Stepinski, J., et al., Synthesis and properties of mRNAs containing novel “anti-reverse” cap analogs. RNA, 2001. 7(10): p. 1486-1495.

      (15) Sachs, A.B., P. Sarnow, and M.W. Hentze, Starting at the beginning, middle, and end: translation initiation in eukaryotes. Cell, 1997. 89(6): p. 831-838.

      (16) Presnyak, V., et al., Codon optimality is a major determinant of mRNA stability. Cell, 2015. 160(6): p. 1111-24.

      (17) Cao, D., et al., Unlock the sustained therapeutic efficacy of mRNA. J Control Release, 2025. 383: p. 113837.

      (18) Karikó, K., et al., Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther, 2008. 16(11): p. 1833-40.

      (19) Pardi, N., et al., mRNA vaccines — a new era in vaccinology. Nature Reviews Drug Discovery, 2018. 17(4): p. 261-279.

      (20) Meistrich, M.L. and R.A. Hess, Assessment of Spermatogenesis Through Staging of Seminiferous Tubules, in Spermatogenesis: Methods and Protocols, D.T. Carrell and K.I. Aston, Editors. 2013, Humana Press: Totowa, NJ. p. 299-307.

      (21) Mäkelä, J.-A., et al., JoVE, 2020(164): p. e61800.

      (22) Coutton, C., et al., Bi-allelic Mutations in ARMC2 Lead to Severe Astheno-Teratozoospermia Due to Sperm Flagellum Malformations in Humans and Mice. Am J Hum Genet, 2019. 104(2): p. 331-340.

      (23) Fernández-González, R., et al., Long-term effects of mouse intracytoplasmic sperm injection with DNA-fragmented sperm on health and behavior of adult offspring. Biol Reprod, 2008. 78(4): p. 761-72.

    1. eLife Assessment

      In their study, Scherer and colleagues aim to use analyses of single-cell clones of murine granulocyte monocyte progenitors that are conditionally immortalized, and analyses of neutrophils derived from those clones to characterize an experimental system for studying neutrophil heterogeneity. The multi-omic and functional analyses reported are valuable but the strength of the evidence presented in support of them is incomplete because the study lacks a rigorous demonstration that the neutrophil-like cells that they derive are fully mature neutrophils.

    2. Reviewer #1 (Public Review):

      The heterogeneity within the neutrophil population is becoming clear. However, it was not clear if neutrophil progenitors are also heterogenous. Because neutrophils are short-lived, it is technically challenging to tackle the question. This study used a system to isolate and expand clonal neutrophil progenitors (granulocyte-monocyte progenitors; GMPs) to achieve molecular and functional profiling. In the study, transcriptional profiling was performed by RNAseq and ATACseq. Functional assays were performed ex vivo to examine phagocytosis, ROS production, NET formation, and neutrophil swarming using Candida albicans, as well as C. glabrata and C. auris. The strengths of this study include the use of the neutrophil clone system to track GMPs developing into neutrophils. The clone-based approach made it possible to evaluate the functions of multiple neutrophil subpopulations. Limitations of this study include the dependency on ex vivo approaches and the modest degree of heterogeneity within presented neutrophils. Nevertheless, the finding - the heterogeneity of neutrophils can be traced back to the GMP stage - is significant.

    3. Reviewer #2 (Public Review):

      The stated goal of the authors is to establish and characterize an experimental system to study neutrophil heterogeneity in a manner that allows for functional outcomes to be probed. To do so, they start with murine GMPs that are conditionally immortalized by ER-HoxB8 expression and make single-cell clonal populations to ask whether those GMPs or neutrophils derived by differentiating such clonal GMPs harbor heterogeneity. At a conceptual level, this is an innovative approach that could shed light on mechanisms of neutrophil heterogeneity that have been described in both health and disease. They perform bulk multi-omics and functional analyses of both the clonal GMPs and neutrophil-like cells, including transcriptional and epigenetic profiling. However, the major weakness of the study is that the authors do not provide rigorous or convincing data that the cells they derive are truly mature neutrophils. To the contrary, the neutrophil-like cells lack Ly6G expression and so the authors fall back on using CD11b as the primary marker for delineating neutrophils; however, CD11b is expressed by both myeloid progenitors and some premature and mature myeloid lineages that are not neutrophils. They acknowledge this shortcoming, but they make an assumption that Ly6G expression is the only way in which the cells they derive are different from primary neutrophils without presenting any evidence indicating such. The authors use only SCF during the maturation of ER-HoxB8 GMPs into leukocytes, rather than including other cytokines such as G-CSF (or use in vivo maturation) that could have better-induced differentiation and maturation into granulocytes/neutrophils. The authors did not use their transcriptional analyses to further establish that the cells they derive from ER-HoxB8 GMPs are similar/different from primary murine neutrophils. 
      Unfortunately, this shortcoming means that all of the analyses of neutrophil-like cells derived from clonal GMPs may or may not represent the transcriptional, epigenetic, etc. profile of a true mature neutrophil. It is also not rigorously addressed whether what they call PMNs derived from clonal GMPs are a transcriptionally uniform population or if they harbor heterogeneity within the bulk population. Overall, while conceptually intriguing and in pursuit of an experimental system that would be impactful for the field, the study as performed has critical flaws.

    4. Author response:

      Reviewer #1 (Public Review):

      The heterogeneity within the neutrophil population is becoming clear. However, it was not clear if neutrophil progenitors are also heterogenous. Because neutrophils are short-lived, it is technically challenging to tackle the question. This study used a system to isolate and expand clonal neutrophil progenitors (granulocyte-monocyte progenitors; GMPs) to achieve molecular and functional profiling. In the study, transcriptional profiling was performed by RNAseq and ATACseq. Functional assays were performed ex vivo to examine phagocytosis, ROS production, NET formation, and neutrophil swarming using Candida albicans, as well as C. glabrata and C. auris. The strengths of this study include the use of the neutrophil clone system to track GMPs developing into neutrophils. The clone-based approach made it possible to evaluate the functions of multiple neutrophil subpopulations. Limitations of this study include the dependency on ex vivo approaches and the modest degree of heterogeneity within presented neutrophils. Nevertheless, the finding - the heterogeneity of neutrophils can be traced back to the GMP stage - is significant.

      Reviewer #2 (Public Review):

      The stated goal of the authors is to establish and characterize an experimental system to study neutrophil heterogeneity in a manner that allows for functional outcomes to be probed. To do so, they start with murine GMPs that are conditionally immortalized by ER-HoxB8 expression and make single-cell clonal populations to ask whether those GMPs or neutrophils derived by differentiating such clonal GMPs harbor heterogeneity. At a conceptual level, this is an innovative approach that could shed light on mechanisms of neutrophil heterogeneity that have been described in both health and disease. They perform bulk multi-omics and functional analyses of both the clonal GMPs and neutrophil-like cells, including transcriptional and epigenetic profiling. However, the major weakness of the study is that the authors do not provide rigorous or convincing data that the cells they derive are truly mature neutrophils. To the contrary, the neutrophil-like cells lack Ly6G expression and so the authors fall back on using CD11b as the primary marker for delineating neutrophils; however, CD11b is expressed by both myeloid progenitors and some premature and mature myeloid lineages that are not neutrophils. They acknowledge this shortcoming, but they make an assumption that Ly6G expression is the only way in which the cells they derive are different from primary neutrophils without presenting any evidence indicating such. The authors use only SCF during the maturation of ER-HoxB8 GMPs into leukocytes, rather than including other cytokines such as G-CSF (or use in vivo maturation) that could have better-induced differentiation and maturation into granulocytes/neutrophils.

      Thank you. Of note, reviewer #1 also commented on the question of including other cytokines during the neutrophil differentiation process. We have included our response to reviewer #1 below, which includes the use of GM-CSF and IL-4.

      “We have now demonstrated enhanced Ly6G expression with GM-CSF and IL-4 treatment in a new Supplementary Figure 1.

      GMPs were washed out of estradiol-containing media and placed in fresh media containing 10 ng/ml GM-CSF and/or 1 ng/ml IL-4 for four days. Cells were collected and stained with CD117 (APC), F4/80 (AlexaFluor 488), Ly6G (PE), and CD11b (BV421). Neutrophil clones were run in biological triplicates, and undifferentiated GMPs were included as a negative control.

      GMPs stain as CD117POS / F4/80NEG / Ly6GNEG / CD11bNEG, indicating they are immature. The clones removed from estradiol differentiate and lose their CD117 expression. The mature cells remain F4/80NEG, as expected for mature neutrophils.

      The addition of GM-CSF to the media led to a significant increase in the expression of Ly6G. The addition of both GM-CSF + IL-4 did not further increase the proportion of Ly6G+ cells, and we have altered our statement slightly in the main text to reflect this finding (line 139).”

      The authors did not use their transcriptional analyses to further establish that the cells they derive from ER-HoxB8 GMPs are similar/different from primary murine neutrophils. Unfortunately, this shortcoming means that all of the analyses of neutrophil-like cells derived from clonal GMPs may or may not represent the transcriptional, epigenetic, etc. profile of a true mature neutrophil.

      Thank you. The ER-HoxB8 system has been well characterized by many authors at the functional and transcriptional levels, confirming that the cells closely reflect the same gene expression pattern as mature neutrophils. This was recently reviewed by Lail et al. (Traffic, 2022, PMID: 36117140). In terms of our analysis, we used transcriptional profiling to examine heterogeneity between different single-cell clones and not to re-validate the similarity with primary neutrophils.

      It is also not rigorously addressed whether what they call PMNs derived from clonal GMPs are a transcriptionally uniform population or if they harbor heterogeneity within the bulk population.

      Thank you. The reviewer poses an interesting, albeit challenging, question of whether even a single GMP clone can differentiate and result in mature neutrophil heterogeneity. Addressing this would require single-cell sequencing of the resulting cells, which we did not perform. We relied on single-cell subcloning of the immature granulocyte-monocyte progenitors to ensure a genetically identical clonal population. This was then additionally confirmed by the retroviral insertional analysis. These analyses confirmed the clonal nature of our starting population, from which we posed the question of whether the neutrophils derived from these clonal GMPs resulted in mature cells with consistent functional heterogeneity, which was indeed the case.

      Overall, while conceptually intriguing and in pursuit of an experimental system that would be impactful for the field, the study as performed has critical flaws.

    1. eLife Assessment

      This important study tackles an interesting aspect of fungal physiology: how a mitochondria-associated gene influences production of the secondary metabolite DON and fungicide sensitivity. The authors have improved the manuscript and the supporting evidence is convincing, although some uncertainties remain around descriptions of the methods.

    2. Reviewer #1 (Public review):

      Summary:

      In their study the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely and the manuscript is well written, albeit in some cases details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease of the DON content associated with deletion of FgDML1: Although some growth data are shown in figure 6 - indicating a severe growth defect - the DON production presented in figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to a decreased growth and specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation on the same conditions to the DON amount detected. Only then a conclusion as to an altered production in the mutant strains can be drawn.
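
      The normalization requested here can be illustrated with a minimal sketch. All numbers and names below are hypothetical and chosen only to show how a lower total DON amount can coincide with unchanged production per unit of biomass; they are not data from the manuscript.

```python
def specific_production(don_ug, dry_biomass_mg):
    """DON produced per unit of biomass (ug DON per mg dry weight)."""
    if dry_biomass_mg <= 0:
        raise ValueError("biomass must be positive")
    return don_ug / dry_biomass_mg


# Hypothetical measurements taken under identical growth conditions.
wild_type = {"don_ug": 120.0, "dry_biomass_mg": 60.0}
mutant = {"don_ug": 30.0, "dry_biomass_mg": 15.0}  # severe growth defect

for name, sample in (("wild type", wild_type), ("mutant", mutant)):
    print(f"{name}: total DON = {sample['don_ug']} ug, "
          f"specific production = {specific_production(**sample)} ug/mg")
```

      In this made-up case the mutant yields four times less DON in total, yet its specific production equals that of the wild type, which is the scenario that cannot be excluded without biomass data from the same conditions.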

      Comments to the revised manuscript:

      The authors carefully revised the manuscript and provided explanations for methods in several cases. However, there are still some problems - probably due to misunderstanding - that need revision.

      (1) A major problem of the first version of the manuscript was the lack of appropriate description of biomass analysis and the consideration of the respective results for evaluation of production of DON and other metabolites. Although the authors provide some explanation in the response to reviews, I could not find a corresponding explanation or description in the manuscript. It is not sufficient to explain the problem to me, but a detailed explanation and description of the method has to be provided in the manuscript along with the definition of one "unit of mycelium". It is still not entirely clear to me what such a "unit of mycelium" is.

      Please clarify this and any other uncertainties that were commented on by me and other reviewers in the manuscript, not only in the response to reviews. Also adjust the reference list accordingly.

      (2) Another problem was that the authors considered FgDML1 a regulator of DON production. As mentioned by me and reviewer 3, FgDML1 is crucial to numerous functions in F. graminearum and its lack causes a plethora of problems for fungal physiology. Hence, although it is clear that the lack of FgDML1 causes alterations in DON production, it is not appropriate to designate this factor as a "regulator". It seems to me that the authors are afraid that if FgDML1 would not be a "regulator" that this would decrease the value of their study, which is not the case. This is a matter of correct wording. Therefore, please revise the wording accordingly, starting with the title:

      ...FgDML1 impacts DON toxin biosynthesis...

      Moreover, for sure the manuscript might benefit from more detailed description of the whole cascade leading from FgDML1 to DON biosynthesis and production of the other metabolites that change upon deletion. Such explanation can help the reader grasp the relevance of FgDML for regulatory processes as well as on more general versus specific effects.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      Comments on revisions:

      The author has addressed my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In their study, the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely, and the manuscript is well written, albeit in some cases, details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease in the DON content associated with the deletion of FgDML1. Although some growth data are shown in Figure 6, indicating a severe growth defect, the DON production presented in Figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to decreased growth, and the specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation under the same conditions to the DON amount detected. Only then can a conclusion as to an altered production in the mutant strains be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions. We have done our best to revise the manuscript accordingly. Our point-by-point responses to the reviewer’s comments are listed below. Our method for DON quantification was based on the amount per unit of mycelium. After obtaining the absorbance value from the ELISA reaction, the concentration of DON was calculated according to a standard curve and a formula, then divided by the dry weight of the mycelium to obtain the DON content per unit of mycelium, with the results finally expressed in µg/g.

      (1) Line 139f

      ... FgDML1 is a critical positive regulator of virulence ....

      Clearly, the deletion of FgDML1 impacts virulence, but it is too much of a general effect to say it is a regulator. DML1 acts high up in the cascade, impacting numerous processes, one of which is virulence. Generally, it has to be considered that deletion of DML1 causes a severe growth defect, which in turn is likely to lead to a plethora of effects. Besides discussing this fact, please also revise the manuscript to avoid references to "direct effects" or "regulator".

      Thank you very much for your advice. Our method for determining the amount of DON is based on the amount per unit of mycelium. After obtaining the absorbance value through the ELISA reaction, we calculate the concentration of DON toxin according to the established standard curve and formula. Then, we divide it by the dry weight of mycelium to obtain the DON toxin content per unit mycelium, and finally present the results in µg/g. In summary, we conclude that the decrease in DON production by ΔFgDML1 is not due to slower hyphal growth, but rather to a decrease in the ability of unit hyphae to produce DON toxin compared to the wild type. Given the decrease in DON toxin synthesis caused by FgDML1 deficiency, we believe that using the term "regulator" is reasonable.
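
      The normalization described in this response can be illustrated with a minimal sketch. It assumes a linear standard curve relating absorbance to DON concentration; the response does not give the actual curve or formula, so the function name, the slope and intercept parameters, and the extract volume are hypothetical placeholders rather than the authors' method. Real ELISA kit curves are often non-linear.

```python
def don_per_gram(absorbance, slope, intercept, extract_volume_ml, dry_weight_g):
    """Return DON content in micrograms per gram of dry mycelium.

    Assumption (not from the source): a linear standard curve,
    concentration (ug/mL) = slope * absorbance + intercept.
    """
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    concentration_ug_per_ml = slope * absorbance + intercept
    total_don_ug = concentration_ug_per_ml * extract_volume_ml
    # Dividing by dry weight expresses production per unit of biomass,
    # so a strain that simply grows less is not scored as a low producer.
    return total_don_ug / dry_weight_g


# Hypothetical numbers: same absorbance, different biomass.
high_biomass = don_per_gram(0.5, 2.0, 0.0, 10.0, 0.5)
low_biomass = don_per_gram(0.5, 2.0, 0.0, 10.0, 0.25)
print(high_biomass, low_biomass)
```

      In this sketch, two samples with identical absorbance but different dry weights give different per-gram values, which is the distinction between total and specific DON production that the reviewer asks the authors to document.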

      (2) Line 143

      Please define "toxin-producing conditions".

      Thank you very much for your advice. We have defined the toxin-producing conditions in the manuscript: 'toxin-inducing conditions (28°C, 145 ×g, 7 days incubation)' (in L163-164).

      (3) Line 149

      A brief intro on toxisomes should be provided in the introduction to better integrate this into the manuscript's results.

      Thank you very much for your advice. We have added corresponding content about toxisomes in the introduction section: 'The biosynthesis of DON entails a reorganization of the endoplasmic reticulum into a specialized compartment termed the "toxisome" (Tang et al., 2018). The assembly of the toxisome coincides with the aggregation of key biosynthetic enzymes, which in turn enhances the efficiency of DON production. Concurrently, this compartmentalization serves as a self-defense mechanism, protecting the fungus from the autotoxicity of TRI pathway intermediates (Boenisch et al., 2017). The proteins TRI1, TRI4, TRI14, and Hmr1 are confirmed constituents of this structure (Kistler and Broz, 2015; Menke et al., 2013).' (in L86-93)

      (4) Line 153

      DON production decreases by about 80 %, but not to 0. Consequently, DML1 is important, but NOT essential for DON production.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'FgDML1 is essential for the biosynthesis of the DON toxin. '(in L161)

      (5) Line 168ff

      Please provide a reference for FgDnm1 being critical for mitochondrial fission and state whether such an interaction has been shown in other organisms.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'FgDnm1 is a key dynamin-related protein mediating mitochondrial fission (Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi.' (in L277-283)

      (6) Line 178

      Please specify whether Complex III activity was related to biomass and provide a p-value or standard deviation for the value.

      Thank you very much for your question. Complex III activity was determined using a Complex III enzyme activity kit (Solarbio, Beijing, China) (Li et al., 2022; Wang et al., 2022). We used 0.1 g of standardized mycelium as the sample for each experiment. Given that the mycelium was homogenized, we believe that there is no necessary correlation between Complex III activity and biomass. We also refined the specific measurement steps in the article: 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,100 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. Each treatment group contains three biological replicates and three technical replicates.' (in L511-517)

      Li C, et al. Amino acid catabolism regulates hematopoietic stem cell proteostasis via a GCN2-eIF2 axis. Cell Stem Cell. 2022 Jul 7; 29(7):1119-1134.e7. doi: 10.1016/j.stem.2022.06.004. PMID: 35803229.

      Wang K, et al. Locally organised and activated Fth1hi neutrophils aggravate inflammation of acute lung injury in an IL-10-dependent manner. Nat Commun. 2022 Dec 13;13(1):7703. doi: 10.1038/s41467-022-35492-y. PMID: 36513690; PMCID: PMC9745290

      (7) Line 185

      Albeit this headline is a reasonable hypothesis, you actually did not show that the conformation is altered. Please reword accordingly.

      Please also add references for cyazofamid acting on the QI site versus other fungicides acting on the QO site.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'Overexpression of FgQCR2, FgQCR8, and FgQCR9 may alter the conformation of the QI site, resulting in reduced sensitivity to cyazofamid.' (in L212-213). For fungicides targeting the Qi and Qo sites, we have added corresponding descriptions in the respective sections: 'Numerous fungicides have been developed to inhibit the Qo site (e.g., pyraclostrobin, azoxystrobin) (Nuwamanya et al., 2022; Peng et al., 2022) and the Qi site (e.g., cyazofamid) (Mitani et al., 2001) of the cytochrome bc1 complex.' (in L327-329)

      (8) Line 200

      This section on growth should be moved up right after introducing the mutant strain.

      Thank you very much for your advice. We have moved the section on vegetative growth and sexual and asexual development ahead of the DON toxin section to promote better reading and understanding. We had originally arranged the sequence to emphasize the new discovery linking mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in toxisome formation, and downregulation of FgTRIs at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does indeed lead to a decrease in the content of DON toxin, and FgDML1 plays a regulatory role in the synthesis of DON toxin. In addition, our measurements of DON toxin, acetyl-CoA, ATP and other indicators are all based on the amount per unit hyphae, excluding differences caused by hyphal biomass or growth. We have further refined the materials and methods to facilitate better reading and understanding.

      (9) Line 203

      "... significantly reduced growth rates ..."

      This is not what was measured here. Figure 6A shows a plate assay that can be used to assess hyphal extension. In the figure, it is also visible that the mycelium of the deletion mutant is much denser, maybe due to increased hyphal branching. Please reword.

      Additionally, it is important to include a biomass measurement here under the conditions used for DON assessment. Hyphal extension measurements cannot be used instead of biomass.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'The ΔFgDML1 strain displayed a distinct growth phenotype characterized by retardation in radial growth and the formation of more compact, denser hyphal networks on all tested media compared to the PH-1 and ΔFgDML-C strains. '(in L136-138).

      (10) Line 217

      Please include information on how long the cultures were monitored. Given the very slow growth of the mutant, perithecia formation may be considerably delayed beyond 14 days.

      Thank you very much for your advice. Based on your suggestion, we have extended the incubation time for sexual reproduction to 21 days to evaluate sexual reproduction ability more accurately. Our results show that even after 21 days, ΔFgDML1 still cannot produce ascospores, which indicates that the absence of FgDML1 does indeed cause sexual reproduction defects in F. graminearum.

      Author response image 1.

      Discussion

      (11) Please mention your summary Figure 8 early on in the discussion, and explain conclusions with this figure in mind. Please avoid repetition of the results section as much as possible.

      Also, please state clearly what was already known from previous research and is in agreement with your results, and what is new (in fungi or generally).

      Thank you very much for your advice. Based on your suggestion, we mentioned Fig. 8 earlier in the first half of the discussion and provided guidance for the following text. We also conducted a more comprehensive discussion by analyzing our research results and comparing them with previous studies. 'Our study defines a novel mechanism through which FgDML1 governs mitochondrial homeostasis. We demonstrate that FgDML1 directly interacts with the key mitochondrial fission regulator FgDnm1 and positively modulates cellular bioenergetic metabolism, as evidenced by elevated ATP and acetyl-CoA levels (Fig. 8).' (in L250-253). 'The Misato/DML1 protein family is evolutionarily conserved from yeast to humans and plays a critical role in mitochondrial regulation. In S. cerevisiae, DML1 is an essential gene; its deletion is lethal, while its overexpression results in fragmented mitochondrial networks and aberrant cellular morphology, underscoring its necessity for normal mitochondrial function (Gurvitz et al., 2002). Similarly, in Homo sapiens, the homolog Misato localizes to the mitochondrial outer membrane, and both its depletion and overexpression are sufficient to disrupt mitochondrial morphology and distribution (Kimura and Okano, 2007).' (in L241-244).

      (12) Line 262ff

      Please specify if this interaction was shown previously in other organisms and provide references.

      Thank you very much for your advice. We have clearly stated in the corresponding section that the interaction between FgDML1 and FgDnm1 is reported here for the first time and that, to our knowledge, no relevant reports have been found in other species so far. 'Notably, FgDML1 was found to interact with FgDnm1 (Fig. 5E). FgDnm1 is a key dynamin-related protein mediating mitochondrial fission (Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi.' (in L276-283)

      (13) Line 287ff

      There is no result that would justify this speculation. Please remove.

      Thank you very much for your advice. We have modified the corresponding wording in the corresponding section. 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355)

      Materials and methods

      (14) A table with all primer sequences used in the study and their purpose is missing. For every experiment, the number of technical and biological replicates needs to be stated.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1 (in Table S1). We added the number of technical and biological replicates in the material and method descriptions for each experiment. 'For each sample, a total of 200 conidia were counted. The experiment included three biological replicates with three technical replicates each.' (in L434-436). 'Each treatment group contains three biological replicates.' (in L444-445). 'Each treatment group contains three biological replicates and three technical replicates.' (in L463-464). 'Each treatment group contains three biological replicates and three technical replicates.' (in L474-475). 'Each treatment group contains three biological replicates.' (in L483). 'Each treatment group contains three biological replicates and three technical replicates.' (in L501-502). 'Each treatment group contains three biological replicates and three technical replicates.' (in L516-517). 'The experiment was independently repeated three times.' (in L533-534).

      (15) Line 369ff

      Please provide final concentrations used for assays here.

      Thank you very much for your advice. The final concentrations are shown in the figures (in Fig. 6A, B) (in Fig. S3). We have also provided Supplementary Table 2 to present the concentrations in a more intuitive way (in Table S2).

      (16) Line 383

      Please provide a reference or data on the use of F2du for transformant selection and explain the abbreviation.

      Thank you very much for your advice. Based on your suggestion, we have provided the full name and references of F2du. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China)(Zhao et al., 2022). '(in L405-407).

      (17) Line 407

      Please provide a reference for the method and at least a brief description.

      Thank you very much for your advice. Based on your suggestion, we have added references and provided a brief introduction to the method. 'As previously described (Tang et al., 2020; Wang et al., 2025), coleoptiles were inoculated with conidial suspensions and incubated for 14 days, while leaves were inoculated with fresh mycelial plugs and incubated for 5 days, followed by observation and quantification of disease symptoms. DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018).' (in L466-471)

      (18) Line 414ff

      Also, here, the amount of biomass has to be considered for the measurement to be able to distinguish if actually less of the compounds were produced or if the effect seen was merely due to an altered amount of biomass present.

      Thank you very much for your advice. We believe that biomass is not within the scope of our measurement indicators, as we have measured and calculated based on unit hyphae. Therefore, we have ruled out experimental bias caused by a decrease in biomass.

      RNA and RT-qPCR

      (19) Line 461

      When the strains were transferred to AEA medium, was the biomass measured, at least wet weight, and in which culture volume was it done? It makes a big difference if the amount of (wet) biomass dilutes a small amount of fungicide-containing culture or if biomass is added in at least roughly equal amounts in sufficient growth medium to ensure equal conditions.

      Thank you very much for your question. During sample processing, we controlled the wet weight of the samples before dosing, ensuring that the wet weight of the mycelium obtained from each sample was 0.2 g. The mycelium was cultured in 100 mL of AEA. This ensured consistency in the initial biomass between groups before dosing, and also ensured the accuracy of the drug concentration.

      (20) Line 466

      Please provide the name and supplier of the kit.

      Thank you very much for your advice. We have added corresponding content in the corresponding location. 'Mycelium was collected and total RNA was extracted following the instructions provided by the Total RNA Extraction Kit (Tiangen, Beijing, China).' (in L523-524).

      (21) All primer sequences must be provided in a table.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1. (in Table S1).

      (22) For RT qPCR it is essential to check the RNA quality to be sure that the obtained results are not artifacts due to varying quality, which may exceed differences. Please state how quality control was done and which threshold was applied for high-quality RNA to be used in RTqPCR (like RIN factor, etc).

      Thank you very much for your question. We performed stringent quality control on the extracted total RNA. First, a micro-spectrophotometer was used to measure RNA concentration and purity, confirming that the A260/A280 ratio was between 1.8 and 2.0 and the A260/A230 ratio was greater than 2.0, indicating good RNA purity without significant protein or organic solvent contamination. Subsequently, verification by agarose gel electrophoresis revealed distinct 28S and 18S rRNA bands, demonstrating good RNA integrity and absence of degradation.
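
      The purity thresholds stated in this response can be expressed as a simple acceptance check. The sketch below encodes only the two spectrophotometric criteria given above; the function name is hypothetical, treating the 1.8 to 2.0 range as inclusive is an assumption, and the gel-based integrity check is not modeled.

```python
def rna_passes_purity_qc(a260_a280, a260_a230):
    """Apply the purity thresholds quoted in the response.

    Assumption: the A260/A280 range of 1.8 to 2.0 is inclusive at both ends.
    The A260/A230 ratio must be strictly greater than 2.0.
    """
    protein_ok = 1.8 <= a260_a280 <= 2.0   # low protein contamination
    solvent_ok = a260_a230 > 2.0           # low organic solvent carry-over
    return protein_ok and solvent_ok


# Hypothetical sample readings as (A260/A280, A260/A230) pairs.
samples = {"sample_A": (1.9, 2.1), "sample_B": (1.7, 2.2)}
for name, (r280, r230) in samples.items():
    print(name, rna_passes_purity_qc(r280, r230))
```

      A check of this kind only screens purity; it does not replace an integrity measure such as the RIN value that the reviewer mentions.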

      Author response image 2.

      (B): Minor Comments:

      (1) Please increase the font size of the labels and annotations of the figures; it is hard to read as it is now.

      Thank you very much for your advice. We have increased the size of annotations or numerical labels in the corresponding images for better reading.

      (2) Throughout the manuscript: Please check that all abbreviations are explained at first use.

      Thank you very much for your advice. We have checked the entire text to ensure that abbreviations have their full names when they first appear.

      (3) I do hope that the authors can clarify all concerns and provide an amended manuscript of this interesting story.

      Thank you very much for your advice. Sincerely thank you for your suggestions and questions, which have been very helpful to us.

      Reviewer #2:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      We greatly appreciate the time you spent on our paper and your helpful suggestions. We have done our best to revise the manuscript accordingly, with the changes marked in red text. In our responses, the positions of the revised parts in the manuscript are indicated by red line numbers. Our point-by-point responses to the reviewer’s comments are listed below.

      Weaknesses:

      (1) The authors speculate that cyazofamid treatment caused upregulation of the assembly factors, leading to a change in the conformation of the Qi protein, thus restoring the enzyme activity of complex III. But no speculation was given in the discussion as to why this would lead to the upregulation of assembly factors, and how the upregulation of assembly factors would change the protein conformation, and is there any literature reporting a similar phenomenon? I would suggest adding this to the discussion.

      Thank you very much for your advice. Based on your suggestion, we have added content related to the assembly factor of complex III in the discussion section and made modifications to the corresponding wording. 'Previous studies have reported that mutations in the Complex III assembly factors TTC19, UQCC2, and UQCC3 impair the assembly and activity of Complex III (Feichtinger et al., 2017; Wanschers et al., 2014). '(in L345-347). 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355).

      (2) Would increased sensitivity of the mutant to cell wall stress be responsible for the excessive curvature of the mycelium?

      Thank you very much for your question. We believe that the sensitivity of ΔFgDML1 to osmotic stress is reduced, which may not be related to hyphal bending, as shown in Author response image 3. At the conidial stage, ΔFgDML1 cannot germinate in YEPD, while the application of 1 M sorbitol promotes its germination. However, this is caused by unknown internal mechanisms, which will be the focus of our future research.

      Author response image 3.

      (3) The vertical coordinates of Figure 7B need to be modified with positive inhibition rates for the mutants.

      Thank you very much for your advice. Figure 7B accurately reflects the inhibition rate. In the ΔFgDML1 mutant, the inhibition rate becomes negative under osmotic stress treatment, indicating that colony growth is greater than that of the untreated control (CK). Therefore, the negative inhibition rate is shown in Figure 7B.
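
      A short sketch shows why an inhibition rate can be negative. It assumes the common definition based on colony diameter, (control - treated) / control × 100; the response does not state the formula actually used, so the function name and the diameters below are hypothetical.

```python
def inhibition_rate(control_diameter_mm, treated_diameter_mm):
    """Percent growth inhibition relative to the untreated control.

    Assumption (not from the source): rate = (control - treated) / control * 100.
    A negative value means the treated colony grew larger than the control.
    """
    if control_diameter_mm <= 0:
        raise ValueError("control diameter must be positive")
    return (control_diameter_mm - treated_diameter_mm) / control_diameter_mm * 100


# Hypothetical colony diameters in mm.
print(inhibition_rate(50.0, 25.0))  # treated colony smaller than control
print(inhibition_rate(40.0, 50.0))  # treated colony larger than control
```

      Under this definition, a treated colony that outgrows its control yields a value below zero, consistent with the negative bars described for the mutant.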

      (1) In Figure 1B, Figure 3C, and Figure 6C, the scale below the picture is not clear. In Figure 5D, the histogram is unclear, and it is recommended to redraw the graph.

      Thank you very much for your advice. The issue with the above images may be due to Word compression. We have changed the settings and enlarged the images as much as possible to better display them.

      (2) The full Latin name of the strain should be used in the title of figures and tables.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (3) Proteins in line 117 should be abbreviated.

      Thank you very much for your advice. Based on your suggestion, we have abbreviated the corresponding positions. 'The DML1 protein from S. cerevisiae was used as a query for a BLAST search against the Fusarium genome database, resulting in the identification of the putative DML1 gene FgDML1 (FGSG_05390) in F. graminearum. '(in L118-120).

      (4) The sentence in lines 187-189, which is supposed to introduce why the test is sensitive to the three drugs, is currently illogical.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections. 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated. ' (in L214-216).

      (5) The expression of FgQCR2, FgQCR7, and FgQCR8 was significantly upregulated in ΔFgDML1 at transcription levels. Do FgQCR2, FgQCR8, and FgQCR9 show upregulated expression at the protein level?

      Thank you very much for your question. Based on your suggestion, we evaluated the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in PH-1 and ΔFgDML1, and we found that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1. (in Fig. 6F).

      (6) In Figure 7B, it is recommended to adjust the position of the horizontal axis labels in the histogram.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections (in Fig. 7B).

      (7) There are numerous errors in the writing of gene names in the text. Please check the full text and change the writing of gene names and mutant names to italic.

      Thank you very much for your advice. We have checked the entire text to ensure that all genes have been italicized.

      (8) All acronyms should be spelled out in figure and table captions. e.g., F. graminearum.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (9) In line 492, P should be lowercase and italic.

      Thank you very much for your advice. Based on your suggestion, we have made adjustments to the corresponding content.

      Reviewer #3:

      Summary:

      The manuscript "Mitochondrial protein FgDML1 regulates DON toxin biosynthesis and cyazofamid sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" describes the construction of a null mutant for the FgDML1 gene in F. graminearum and assays characterising the effects of this mutation on the pathogen's infection process and lifecycle. While FgDML1 remains underexplored with an unclear role in the biology of filamentous fungi, and although the authors performed several experiments, there are fundamental issues with the experimental design and execution, and interpretation of the results.

      Strengths:

      FgDML1 is an interesting target, and there are novel aspects in this manuscript. Studies in other organisms have shown that this protein plays important roles in mitochondrial DNA (mtDNA) inheritance, mitochondrial compartmentalisation, chromosome segregation, mitochondrial distribution, mitochondrial fusion, and overall mitochondrial dynamics. Indeed, in Saccharomyces cerevisiae, the mutation is lethal. The authors have carried out multi-faceted experiments to characterise the mutants.

      Weaknesses:

      However, I have concerns about how the study was conceived. Given the fundamental importance of mitochondrial function in eukaryotic cells and how the absence of this protein impacts these processes, it is unsurprising that deletion of this gene in F. graminearum profoundly affects fungal biology. Therefore, it is misleading to claim a direct link between FgDML1 and DON toxin biosynthesis (and virulence), as the observed effects are likely indirect consequences of compromised mitochondrial function. In fact, it is reasonable to assume that the production of all secondary metabolites is affected to some extent in the mutant strains and that such a strain would not be competitive at all under non-laboratory conditions. The order in which the authors present the results can be misleading, too. The results on vegetative growth rate appeared much later in the manuscript, which should have come first, as the FgDML1 mutant exhibited significant growth defects, and subsequent results should be discussed in that context. Moreover, the methodologies are not described properly, making the manuscript hard to follow and difficult to replicate.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript accordingly. Revisions are marked in red in the manuscript, and in the responses below the positions of the revised passages are indicated by red line numbers. Our point-by-point responses to the reviewer's comments are listed below.

      Regarding the weaknesses, we arranged the results in this order to emphasize the novel link between mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in the formation of toxin-producing bodies, and downregulation of FgTRis at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does indeed lead to a decrease in DON toxin content, and that FgDML1 plays a regulatory role in the synthesis of DON toxin. In addition, our measurements of DON toxin, acetyl-CoA, ATP and other indicators are all expressed per unit of hyphae, which excludes differences caused by hyphal biomass or growth. We have further refined the Materials and Methods to make them easier to read and understand.

      (1) Lines 37-39: The disease itself does not produce toxins; it is the fungus that causes the disease that produces toxins. Moreover, the disease symptoms observed are likely caused by the toxins produced by the fungus.

      Thank you very much for your advice. We have modified the wording of the corresponding section: 'Studies have shown that increased DON levels are positively correlated with the pathogenicity rate of F. graminearum.' (in L36-37).

      (2) Lines 82-87: While it is challenging to summarise the role of ATP in just a few words, this section needs improvement for clarity and accuracy. Additionally, I do not believe that drawing a direct link between mitochondrial defects and toxin production is an appropriate strategy in this case.

      Thank you very much for your advice. Based on your suggestion, we have added descriptions at the corresponding positions to provide more information on the relationship between ATP and toxins and to better set up the text that follows. 'Pathogen-intrinsic ATP homeostasis is recognized as a critical, rate-limiting determinant for toxin biosynthesis. Previous studies indicate that dual-target inhibition of ATP synthase (AtpA) and adenine deaminase (Ade) by a specific small-molecule probe effectively depletes intracellular ATP, consequently suppressing the synthesis of key virulence factors TcdA and TcdB transcriptionally and translationally (Marreddy et al., 2024). The systemic toxicity of Anthrax Edema Toxin (ET) is primarily attributed to its catalytic activity, which depletes the host cell's ATP reservoir, thereby triggering a bioenergetic collapse that culminates in cell lysis and death (Liu et al., 2025).' (in L78-86).

      (3) Lines 125-126: The manuscript does not clearly describe how subcellular localisation was determined. This methodology needs to be properly detailed.

      Thank you very much for your advice. The subcellular localization was validated through co-localization analysis with MitoTracker Red CMXRos, a mitochondrial-specific dye. The observed overlap between the FgDML1-GFP signal and the mitochondrial marker confirmed mitochondrial localization. Based on these results, we determined that FgDML1 is definitively localized to the mitochondria. We have incorporated this description in the appropriate section of the manuscript. 'Furthermore, subcellular localization studies confirmed that FgDML1 localizes to mitochondria, as demonstrated by colocalization with a mitochondria-specific dye MitoTracker Red CMXRos (Fig. 1B).' (in L125-127).

      (4) Regarding the organisation of the Results section, it needs to be revised. While I understand the authors' intention to emphasise the impact on virulence, the results showing how FgDML1 deletion affects vegetative growth, asexual and sexual reproduction, and sensitivity to stressors should be presented before the virulence assays and effects on DON production. Additionally, the authors do not provide any clear evidence that FgDML1 directly interacts with proteins involved in asexual or sexual reproduction, stress responses, or virulence. Therefore, it is misleading to suggest that FgDML1 directly regulates these processes. The observed phenotypes are, rather, a consequence of severely impaired mitochondrial function. Without functional mitochondria, the cell cannot operate properly, leading to widespread physiological defects. In this regard, statements such as those in lines 139-140 and 343-344 are misleading.

      Thank you very much for your advice. We have adjusted the order of the figures based on your suggestion, placing the characterization of ΔFgDML1 in vegetative growth, sexual reproduction, and other aspects before the DON toxin results. We have also adjusted the corresponding statements. 'These findings demonstrate that FgDML1 is a positive regulator of virulence in F. graminearum.' (in L140-141).

      (5) Lines 185-186: The authors do not provide sufficient evidence to support the claim that FgQCR2, FgQCR8, and FgQCR9 overexpression is the main cause of reduced cyazofamid sensitivity. Although expression of these genes is altered, reduced sensitivity may result from changes in other proteins or pathways. To strengthen this claim, overexpression of FgQCR2, 8, and 9 in the wild-type background, followed by assessment of cyazofamid resistance, would be necessary. As it stands, there is no support for the claim presented in lines 329-332.

      Thank you very much for your advice. To establish a causal link between the overexpression of FgQCR2, FgQCR7, and FgQCR8 and the observed reduction in cyazofamid sensitivity, we first quantified the protein levels of these assembly factors. Western blot analysis confirmed their elevated expression in the ΔFgDML1 mutant compared to the wild-type PH-1. We further generated individual overexpression strains for FgQCR2, FgQCR7, and FgQCR8 in the wild-type PH-1 background. Fungicide sensitivity assays revealed that all three overexpression mutants displayed significantly reduced sensitivity to cyazofamid compared to the parental strain. These genetic complementation experiments confirm that upregulation of FgQCR2, FgQCR7, and FgQCR8 is sufficient to confer reduced cyazofamid sensitivity. We have incorporated these explanations and provided supporting images in the appropriate section of the manuscript. 'To further clarify whether the upregulated expression of FgQCR2, FgQCR7, and FgQCR8 genes affects their protein expression levels, we measured the protein levels. The results showed that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1 (Fig. 6F). Subsequently, we overexpressed FgQCR2, FgQCR7, and FgQCR8 in the wild-type background, and the corresponding overexpression mutants exhibited reduced sensitivity to cyazofamid (Fig. 6E).' (in L205-211; Fig. 6E, F)

      (6) Lines 187-190: This segment is confusing and difficult to follow. It requires rewriting for clarity.

      Thank you very much for your advice. Based on your suggestion, we have revised the corresponding passage: 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated.' (in L214-216)

      (7) Lines 345-346: The authors state that in this study, FgDML1 is localised in mitochondria, which implies that in other studies, its localisation was different. Is this accurate? Clarification is needed.

      Thank you very much for your question. In previous studies, whether in yeast or in Drosophila melanogaster, the localization of this protein was not clearly defined; only its functional association with mitochondria was emphasized (Miklos et al., 1997; Gurvitz et al., 2002).

      Miklos GLG, Yamamoto M-T, Burns RG, Maleszka R. 1997. An essential cell division gene of Drosophila, absent from Saccharomyces, encodes an unusual protein with tubulin-like and myosin-like peptide motifs. Proc Natl Acad Sci 94:5189–5194. doi:10.1073/pnas.94.10.5189

      Gurvitz A, Hartig A, Ruis H, Hamilton B, de Couet HG. 2002. Preliminary characterisation of DML1, an essential Saccharomyces cerevisiae gene related to misato of Drosophila melanogaster. FEMS Yeast Res 2:123–135. doi:10.1016/S1567-1356(02)00083-1

      Material and Methods Section

      (8) In general, the methods require more detailed descriptions, including the brands and catalog numbers of reagents and kits used. Simply stating that procedures were performed according to manufacturers' instructions is insufficient, particularly when the specific brand or kit is not identified.

      Thank you very much for your advice. Based on your suggestion, we have added the reagent brands and full product names. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China) (Zhao et al., 2022).' (in L405-407). 'DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018).' (in L469-471)

      (9) Line 364: What do CM and MM stand for? Please define.

      Thank you very much for your advice. Based on your suggestion, we have made modifications in the corresponding locations. 'To evaluate vegetative growth, complete medium (CM), minimal medium (MM), and V8 Juice Agar (V8) media were prepared as described previously (Tang et al., 2020).' (in L385-387)

      Generation of Deletion and Complemented Mutants:

      (10) This section lacks detail. For example, were PCR products used directly for PEG-mediated transformation, or were the fragments cloned into a plasmid?

      Thank you very much for your question. We used the fused fragments directly for protoplast transformation after confirming them by sequencing. We now state the form of the fragment used for transformation at the corresponding location. 'The resulting fusion fragment was transformed into the wild-type F. graminearum PH-1 strain via polyethylene glycol (PEG)-mediated protoplast transformation.' (in L403-405).

      (11) PCR and Southern blot validation results should be included as supplementary material, along with clear interpretations of these results.

      Thank you very much for your advice. In the supplementary material we submitted, Supplementary Figure 2 already includes the results of PCR and Southern blot validation (Fig. S2).

      (12) There is almost no description of how the mutants mentioned in lines 388-390 were generated.

      Thank you very much for your advice. Based on your suggestions, we have added relevant content in the appropriate sections to more comprehensively and clearly reflect the experimental process. 'Specifically, FgDML1, including its native promoter region and open reading frame (ORF) (excluding the stop codon), was amplified. The PCR product was then fused with the XhoI-digested pYF11 vector. After transformation into E. coli and sequence verification, the plasmid was extracted and subsequently introduced into PH-1 protoplasts. For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. The overexpression mutant was constructed according to a previously described method. Specifically, the ORF of FgDML1 was amplified and the PCR product was ligated into the SacII-digested pSXS overexpression vector. The resulting plasmid was then transformed into PH-1 protoplasts (Shi et al., 2023). For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively.' (in L413-426).

      Vegetative Growth and Conidiation Assays:

      (13) There is no information about how long the plates were incubated before photos were taken. Judging by the images, it appears that different incubation times may have been used.

      Thank you very much for your advice. Because ΔFgDML1 grows more slowly, we used different incubation periods and have added this information to the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to its slower growth, the ΔFgDML1 mutant required a 5-day incubation period compared to the 3 days used for PH-1 and ΔFgDML1-C.' (in L490-493).

      (14) There is no description of the MBL medium.

      Thank you very much for your advice. Based on your suggestion, we have added the corresponding content at the corresponding positions. 'Mung bean liquid (MBL) medium was used for conidial production, while carrot agar (CA) medium was utilized to assess sexual reproduction (Wang et al., 2011).' (in L387-389).

      DON Production and Pathogenicity Assays:

      (15) Were DON levels normalised to mycelial biomass? The vegetative growth assays show that FgDML1 null mutants exhibit reduced growth on all tested media. If mutant and wild-type strains were incubated for the same period under the same conditions, it is reasonable to assume that the mutants accumulated significantly less biomass. Therefore, results related to DON production, as well as acetyl-CoA and ATP levels, must be normalised to biomass.

      Thank you very much for your question. We have taken into account the differences in mycelial biomass. Therefore, when measuring DON, acetyl-CoA, and ATP levels, all data were normalized to mycelial mass and calculated as amounts per unit of mycelium, thereby avoiding discrepancies arising from variations in biomass.
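      The normalisation described in this response amounts to dividing each measured amount by the mycelial mass of the same sample. A minimal sketch of that arithmetic follows; the function name and all numbers are hypothetical illustrations, not values or code from the study.

```python
def per_unit_biomass(amount, mycelial_mass_g):
    """Return the amount of an analyte per gram of mycelium."""
    if mycelial_mass_g <= 0:
        raise ValueError("mycelial mass must be positive")
    return amount / mycelial_mass_g


# Hypothetical numbers, for illustration only (not data from the study).
wild_type = per_unit_biomass(amount=12.0, mycelial_mass_g=0.40)
mutant = per_unit_biomass(amount=3.0, mycelial_mass_g=0.20)

# Raw amounts differ 4-fold, but per unit biomass the difference is 2-fold.
ratio = mutant / wild_type
```

      In this toy case the raw amounts alone would overstate the reduction, which is the concern raised in comment (15).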

      Sensitivity Assays:

      (16) While the authors mention that gradient concentrations were used, the specific concentrations and ranges are not provided. Importantly, have the plates shown in Figure 5 been grown for different periods or lengths? Given the significantly reduced growth rate shown in Figure 6A, the mutants should not have grown to the same size as the WT (PH-1) as shown in Figures 5A and 5B unless the pictures have been taken on different days. This needs to be explained.

      Thank you very much for your question. Because ΔFgDML1 grows more slowly, we used different incubation periods and have added this information to the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to its slower growth, the ΔFgDML1 mutant required a 5-day incubation period compared to the 3 days used for PH-1 and ΔFgDML1-C.' (in L490-493).

      (17) Additionally, was inhibition measured similarly for both stress agents and fungicides? This should be clarified.

      Thank you very much for your question. We have added the specific concentration gradients of the fungicides. 'The concentration gradients for each fungicide in the sensitivity assays were set up according to Supplementary Table S2.' (in L493-494; Table S2).

      Complex III Enzyme Activity:

      (18) A more detailed description of how this assay was performed is needed.

      Thank you very much for your advice. We have provided further detailed descriptions of the corresponding sections. 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,000 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. '(in L511-517)

      (19) Were protein concentrations standardised prior to the assay?

      Thank you very much for your question. Protein concentrations for all Western blot samples were quantified using a BCA assay kit to ensure equal loading.

      (20) Line 448: Are ΔFgDML1::Tri1+GFP and ΔFgDML1+GFP the same strain? ΔFgDML1::Tri1+GFP has not been previously described.

      Thank you very much for your question. These two strains are not the same strain, and we have supplemented their construction process in the corresponding section. 'For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively. '(in L423-426)

      (21) Lines 460 and 468: Please adopt a consistent nomenclature, either RT-qPCR or qRT-PCR.

      Thank you very much for your advice. We have unified the term and modified the corresponding content in the corresponding sections. 'Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) was carried out using the QuantStudio 6 Flex real-time PCR system (Thermo Fisher Scientific, USA) to assess the relative expression of three subunits of Complex III (FgCytb, FgCytc1, FgISP), five assembly factors (FgQCR2, FgQCR6, FgQCR7, FgQCR8, FgQCR9), and DON biosynthesis-related genes (FgTri5 and FgTri6).' (in L526-531)

      (22) Lines 472-473: Why was FgCox1 used as a reference for FgCytb? Clarification is needed.

      Thank you very much for your question. FgCytb (cytochrome b) and FgCOX1 (cytochrome c oxidase subunit I) are both encoded by the mitochondrial genome and serve as core components of the oxidative phosphorylation system (Complex III and Complex IV, respectively). Their transcription is co-regulated by mitochondrial-specific mechanisms in response to cellular energy status. Consequently, under experimental conditions that perturb energy homeostasis, FgCOX1 expression exhibits relative, context-dependent stability with FgCytb, or at least co-varies directionally, making it a superior reference for normalizing target gene expression. In contrast, FgGapdh operates within a distinct genetic and regulatory system. Using FgCOX1 ensures that both reference and target genes reside within the same mitochondrial compartment and functional module, thereby preventing normalization artifacts arising from independent variation across disparate pathways.
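      The response does not state how relative expression was calculated. Assuming the widely used 2^-ΔΔCt calculation with roughly 100% primer efficiency (an assumption, not something the authors report), the sketch below shows why the choice of reference gene matters: the same target Ct values give different fold changes depending on whether the reference itself shifts between strains. All Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Assumes primer efficiencies close to 100% for target and reference.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target Ct is 24 in the mutant and 22 in the wild type.
# Reference that co-varies with the target (Ct 20 in mutant, 19 in wild type):
fold_covarying_ref = relative_expression(24.0, 20.0, 22.0, 19.0)

# Reference that is unchanged between strains (Ct 18 in both):
fold_stable_ref = relative_expression(24.0, 18.0, 22.0, 18.0)
```

      With these illustrative numbers the co-varying reference reports a 0.5-fold change and the unchanged reference a 0.25-fold change, so the normalisation strategy should be stated alongside the result.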

      (23) Lines 476-477: This step requires a clearer and more detailed explanation.

      Thank you very much for your advice. We provided detailed descriptions of them in their respective positions. 'For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. '(in L417-419). 'The FgDnm1-3×Flag fragment was introduced into PH-1 and FgDML1+GFP protoplasts, respectively, to obtain single-tagged and double-tagged strains. '(in L541-543)

      Western blotting:

      (24) Uncropped Western blot images should be provided as supplementary material.

      Thank you very much for your advice. All Western blot images will be submitted to the supplementary material package.

      (25) Lines 485-489: A more thorough description of the antibodies used (including source, catalogue number, and dilution) is necessary.

      Thank you very much for your advice. The antibodies used are clearly stated in terms of brand, catalog number, and dilution. We have added the dilution ratio. 'All antibodies were diluted as follows: primary antibodies at 1:1000 and secondary antibodies at 1:10000. '(in L550-551)

      (26) The Western blot shown in Figure 3D appears problematic, particularly the anti-GAPDH band for FgDML1::FgTri1+GFP. Are both anti-GAPDH bands derived from the same gel?

      Thank you very much for your advice. We are unequivocally certain that these data derive from the same gel. Therefore, we are providing the original image for your inspection.

      Author response image 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      We have revised the paper for clarity at all levels: motivation, application, and parameterization. We clarify that there is a large unmet need for using RSA in a trial-wise manner, and that this approach indeed offers benefits to any team interested in decoding trial-wise representational information linked to behavioral responses, and as such is not a problem specific to a single memory study.

      (2) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is very related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      We appreciate the summary of relevant literature and have included a revised Introduction to address this bounty of relevant work. While much is owed to these authors, new developments from a diverse array of researchers outside of a single group can aid in new research questions, and should always have a place in our research landscape. We owe much to the work of Kriegeskorte’s group, and in fact, Schütt et al., 2023 served as a very relevant touchpoint in the Discussion and helped to highlight specific needs not addressed by the assessment of the “representational geometry” of an entire presented stimulus set. Principal amongst these needs is the application of trial-wise representational information that can be related to trial-wise behavioral responses and thus used to address specific questions on brain-behavior relationships. We invite the Reviewer to consider the utility of this shift with the following revisions to the Introduction.

      Page 3. “Recently, methodological advancements have addressed many known limitations in cRSA. For example, cross-validated distance measures (e.g., Euclidean distance) have improved the reliability of representational dissimilarities in the presence of noise and trial imbalance (Walther et al., 2016; Nili et al., 2014; Diedrichsen et al., 2021). Bayesian approaches such as pattern component modeling (Diedrichsen, Yokoi, & Arbuckle, 2018) have extended representational approaches to accommodate continuous stimulus features or temporal variation. Further, model comparison RSA strategies (Diedrichsen et al., 2021) and generalization techniques across stimuli (Schütt et al., 2023) have improved sensitivity and inference. Nevertheless, a common feature shared across most of these improvements is that they require stimulus repetition to examine the representational structure. This requirement limits their ability to probe brain-behavior questions at the level of individual events”.

      Page 8. “While several extensions of RSA have addressed key limitations in noise sensitivity, stimulus variance, and modeling (e.g., Diedrichsen et al., 2021; Schütt et al., 2023), our tRSA approach introduces a new methodological step by estimating representational strength at the trial level. This accounts for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way”.

      Page 44. “Despite such prevalent appreciation for the neurocognitive relevance of stimulus properties, cRSA often does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and produces statistically dependent data, an issue addressed by Schütt et al., 2023, who developed cross validation and bootstrap methods that explicitly model dependence across both subjects and stimulus conditions”.

      (3) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. With this, the authors define the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.
      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right in the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
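      The estimator the reviewer describes in comment (3) can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the reviewer's or the authors' code: it takes zero-mean, i.i.d. Gaussian signal and noise, two measurements per stimulus, and hypothetical function names and parameter values. Under those assumptions both the cross-run product of same-stimulus patterns and half the mean cross-validated squared Euclidean distance have expectation $\sigma^2_u$.

```python
import random


def simulate(n_stim=40, n_vox=200, sigma_u=1.0, sigma_e=2.0, seed=0):
    """Two measurements per stimulus: y[n][j] = u[j] + e[j][n]."""
    rng = random.Random(seed)
    u = [[rng.gauss(0.0, sigma_u) for _ in range(n_vox)] for _ in range(n_stim)]
    runs = [[[v + rng.gauss(0.0, sigma_e) for v in pattern] for pattern in u]
            for _ in range(2)]
    return runs


def signal_variance_crossprod(run1, run2):
    """Mean cross-run product of same-stimulus patterns.

    Noise is independent across runs, so the expectation is sigma_u**2
    (valid here because the simulated patterns have zero mean).
    """
    total = sum(a * b for p1, p2 in zip(run1, run2) for a, b in zip(p1, p2))
    return total / (len(run1) * len(run1[0]))


def signal_variance_crossval_distance(run1, run2):
    """Half the mean cross-validated squared Euclidean distance per voxel,
    averaged over all stimulus pairs; its expectation is also sigma_u**2."""
    n_stim, n_vox = len(run1), len(run1[0])
    total, n_pairs = 0.0, 0
    for i in range(n_stim):
        for j in range(i + 1, n_stim):
            total += sum((run1[i][k] - run1[j][k]) * (run2[i][k] - run2[j][k])
                         for k in range(n_vox)) / n_vox
            n_pairs += 1
    return 0.5 * total / n_pairs


run1, run2 = simulate()
est_crossprod = signal_variance_crossprod(run1, run2)
est_distance = signal_variance_crossval_distance(run1, run2)
```

      Both estimates should land near the simulated signal variance of 1 even though the noise variance is four times larger, which is what a second measurement per stimulus buys. With a single measurement, neither estimator can be computed.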

      At a general level, our approach rests on the premise that there is meaningful information present in a single presentation of a given stimulus. This assumption may have less utility when the research goals are more focused on estimating the fidelity of signal patterns for RSA, as in designs with multiple repetitions. But it is an exaggeration to state that such a trial-wise approach cannot address the difference between “true” stimulus patterns and noise. This trial-wise approach has explicit utility in relating trial-wise brain information to trial-wise behavior, across multiple cognitions (not only memory studies, as applied here). We have added substantial text to the Introduction distinguishing cRSA, which is widely employed, often in cases with a single repetition per stimulus, and model comparative methods that employ multiple repetitions. We clarify that we do not consider tRSA an alternative to the model comparative approach, and discuss that operational definitions of representational strength are constrained by the study design.

      Page 3. “In this paper, we present an advancement termed trial-level RSA, or tRSA, which addresses these limitations in cRSA (not model comparison approaches) and may be utilized in paradigms with or without repeated stimuli”.

      Page 4. “Representational geometry usually refers to the structure of similarities among repeated presentations of the same stimulus in the neural data (as captured in the brain RSM) and is often estimated utilizing a model comparison approach, whereas representational strength is a derived measure that quantifies how strongly this geometry aligns with a hypothesized model RSM. In other words, geometry characterizes the pattern space itself, while representational strength reflects the degree of correspondence between that space and the theoretical model under test”.

      Finally, we clarified that in our simulation methods we assume a true underlying activity pattern and a random error pattern. The model RSM is computed based on the true pattern, whereas the brain RSM comes from the noisy pattern, not the model RSM itself.

      Page 9. “Then, we generated two sets of noise patterns, which were controlled by parameters $\sigma_A$ and $\sigma_B$, respectively, one for each condition”.

      (4) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that the $\sigma_{ijk}$ are the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as they seem to be done here).

      We have added notations for true and measured activity patterns to differentiate them from our notation for variance. We agree that multilevel models are usually defined at the level of means rather than at the level of variances and we include a Figure (Fig 1D) that describes the model in terms of the means. We clarify that the σ ($\sigma$) terms used in the manuscript were not variances/standard deviations themselves; rather, they were meant to denote components of the actual (multilevel) variance parameter. Each component was sampled from a normal distribution, and together they summed to the final variance parameter for each trial. We have modified our notation for each component to the lowercase letter s to minimize confusion. We have also made our R code publicly available on our lab GitHub, which should provide more clarity on the exact simulation process.

      (5) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.

      We have added justification of the mixed-effects model given the potential assumption violations. We caution readers to investigate the robustness of their models, and to employ permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. Finally, we agree that the first simulation setting lacks several properties of realistic RDMs/RSMs; however, we believe that there is utility in understanding the mathematical properties of correlations, an essential component of RSA, in a straightforward simulation where the ground truth is known. We have therefore moved this simulation to Appendix 1.

      (6) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      The Reviewer is correct that the voxel values in the true pattern are drawn from i.i.d. standard normal distributions. We take the Reviewer’s suggestion of a “condition-specific pattern” to mean that there could be a condition-by-voxel interaction in two non-mutually exclusive ways. The first is additive: a common underlying multi-voxel pattern such as [6, 34, -52, …, 8] for all condition A trials, a different such pattern for all condition B trials, and so on. The second is multiplicative: a vector of scaling factors such as [x1.5, x0.5, x0.8, …, x2.7] for all condition A trials, a different such vector for all condition B trials, and so on. Both possibilities could indeed affect tRSA as much as they would affect cRSA.
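
      The additive case can be illustrated with a small sketch. It is a hypothetical example in standard-library Python (the function names and parameter values are assumptions, not taken from the manuscript): every trial in a condition shares one common voxel pattern plus i.i.d. trial-specific values, and the mean pairwise correlation within the condition is returned. Only the additive variant is shown; the multiplicative variant is not implemented here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_within_corr(shared_sd, n_trials=8, n_voxels=300, seed=1):
    """Mean pairwise correlation among trials of a single condition.

    shared_sd scales an additive condition-specific pattern that is common
    to all trials; shared_sd = 0 reproduces the purely i.i.d. setting.
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0, shared_sd) for _ in range(n_voxels)]
    trials = [[s + rng.gauss(0, 1) for s in shared] for _ in range(n_trials)]
    pairs = [(i, j) for i in range(n_trials) for j in range(i + 1, n_trials)]
    return sum(pearson(trials[i], trials[j]) for i, j in pairs) / len(pairs)
```

      Without a shared pattern the within-condition correlations hover around zero, as the Reviewer notes; with a shared pattern of the same variance as the trial-specific part, they rise toward roughly 0.5 in expectation.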

      Importantly, if such a strong condition-specific pattern is expected, one can build a condition-specific model RDM using one-hot coding of conditions (see example figure; src: https://www.newbi4fmri.com/tutorial-9-mvpa-rsa), either to capture this interesting phenomenon or to remove it as a confounding factor. This practice has been applied in multiple regression cRSA approaches (e.g., Cichy et al., 2013) and can also be applied to tRSA.
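
      A condition-specific model RDM of this kind can be sketched in a few lines. The helper name and the 0/1 coding are assumptions for illustration: trials from the same condition receive a dissimilarity of 0 and trials from different conditions receive 1.

```python
def condition_model_rdm(labels):
    """Binary condition RDM: 0 for same-condition pairs, 1 otherwise."""
    return [[0 if a == b else 1 for b in labels] for a in labels]


# Example: two trials of condition A followed by one trial of condition B.
example = condition_model_rdm(["A", "A", "B"])
```

      Such a matrix could then be entered alongside the model of interest as an additional predictor, so that variance explained by condition membership alone is modeled separately.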

      (7) The trial-level brain RDM to model Spearman correlations was analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, which is an assumption of the mixed effect model. This does not seem to induce an increase in Type I errors in the conditions studied, but there is no clear justification for this procedure, which needs to be justified.

      We appreciate this important warning, and now caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the supplement.

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models. The multilevel structure of RSA data introduces potential dependencies across subjects, stimuli, and trials, which can violate assumptions of independence if not properly modeled. In the present study, we used a model that included random intercepts for both subjects and stimuli, which accounts for variance at these levels and improves the generalizability of fixed-effect estimates. Still, there is a potential for systematic dependence across trials within a subject. To ensure that the model assumptions were satisfied, we conducted a series of diagnostic checks on an exemplar ROI (right LOC; middle occipital gyrus) in the Object Perception dataset, including visual inspection of residual distributions and autocorrelation (Appendix 3, Figure 13). These diagnostics supported the assumptions of normality, homoscedasticity, and conditional independence of residuals. In addition, we conducted permutation-based inference, similar to prior improvements to cRSA (Nili et al. 2014), using a nested model comparison to test whether the mean similarity in this ROI was significantly greater than zero. The observed likelihood ratio test statistic fell in the extreme tail of the null distribution (Appendix 3, Figure 14), providing strong nonparametric evidence for the reliability of the observed effect. We emphasize that this type of model checking and permutation testing is not merely confirmatory but can help validate key assumptions in RSA modeling, especially when applying mixed-effects models to neural similarity data. Researchers are encouraged to adopt similar procedures to ensure the robustness and interpretability of their findings”.

      Exemplar Permutation Testing

      To test whether the mean representational strength in the ROI right LOC (middle occipital gyrus) was significantly greater than zero, we used a permutation-based likelihood ratio test implemented via the permlmer function. This test compares two nested linear mixed-effects models fit using the lmer function from the lme4 package, both including random intercepts for Participant and Stimulus ID to account for between-subject and between-item variability.

      The null model excluded a fixed intercept term, effectively constraining the mean similarity to zero after accounting for random effects:

      ROI ~ 0 + (1 | Participant) + (1 | Stimulus)

      The full model included the same random effects structure but allowed the intercept to be freely estimated:

      ROI ~ 1 + (1 | Participant) + (1 | Stimulus)

      By comparing the fit of these two models, we directly tested whether the average similarity in this ROI was significantly different from zero. Permutation testing (1,000 permutations) was used to generate a nonparametric p-value, providing inference without relying on normality assumptions. The full model, which estimated a nonzero mean similarity in the right LOC (middle occipital gyrus), showed a significantly better fit to the data than the null model that fixed the mean at zero (χ²(1) = 17.60, p = 2.72 × 10⁻⁵). The permutation-based p-value obtained from permlmer confirmed this effect as statistically significant (p = 0.0099), indicating that the mean similarity in this ROI was reliably greater than zero. These results support the conclusion that the right LOC contains representational structure consistent with the HMAXc2 RSM. A density plot of the permuted likelihood ratio statistics is shown along with the observed likelihood ratio statistic in Appendix 3, Figure 14.
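
      The logic of a permutation-based p-value can be illustrated with a deliberately simplified stand-in. The sketch below is not the permlmer procedure and fits no mixed-effects model: it assumes a plain list of similarity values and uses random sign flips to build a null distribution for the mean, which is a different and much cruder technique. The function name, arguments, and the common (count + 1) / (permutations + 1) estimator are choices made for this example.

```python
import random


def sign_flip_p_value(values, n_perm=1000, seed=0):
    """One-sided permutation p-value for the hypothesis mean(values) > 0.

    Under the null of a distribution symmetric around zero, the sign of
    each value is exchangeable, so signs are flipped at random.
    """
    rng = random.Random(seed)
    observed = sum(values) / len(values)
    exceed = 0
    for _ in range(n_perm):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if sum(flipped) / len(flipped) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

      In the nested-model setting described above, the test statistic would be the likelihood ratio between the full and null models rather than the raw mean, and the permutation scheme would need to respect the random-effects structure.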

      (8) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      We agree that a comparison between mean row-wise Spearman correlations and the matrix-wise Spearman correlation is needed. We believe that the simulations are the best approach for this comparison, since they are much more robust than the empirical dataset and have the advantage that the true pattern and noise levels are known. We expand on our comparison of mean tRSA values and matrix-wise Spearman correlations on page 42.

      Page 42. “Although tRSA and cRSA both aim to quantify representational strength, they differ in how they operationalize this concept. cRSA summarizes the correspondence between RSMs as a single measure, such as the matrix-wise Spearman correlation. In contrast, tRSA computes such correspondence for each trial, enabling estimates at the level of individual observations. This flexibility allows trial-level variability to be modeled directly, but also introduces subtle differences in what is being measured. Nonetheless, our simulations showed that, although numerical differences occasionally emerged—particularly when comparing between-condition tRSA estimates to within-condition cRSA estimates—the magnitude of divergence was small and did not affect the outcome of downstream statistical tests”.
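
      The two quantities being compared can be written down explicitly. The following sketch, in standard-library Python with hypothetical function names, computes a single matrix-wise Spearman correlation over the upper triangle (the cRSA-style summary) and one Spearman correlation per row with the diagonal excluded (the tRSA-style trial-level values). It assumes square, symmetric matrices and does not guard against rows with zero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def matrix_wise(brain, model):
    """One correlation over all upper-triangle cells."""
    n = len(brain)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([brain[i][j] for i, j in idx], [model[i][j] for i, j in idx])


def row_wise(brain, model):
    """One correlation per trial (row), excluding the diagonal cell."""
    n = len(brain)
    return [
        spearman(
            [brain[i][j] for j in range(n) if j != i],
            [model[i][j] for j in range(n) if j != i],
        )
        for i in range(n)
    ]
```

      Because each row is ranked separately, the mean of the row-wise values need not equal the matrix-wise value, which ranks all cells jointly.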

      (9) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      We recognize the potential for condition-specific patterns and chose to constrain the analyses to those most comparable with cRSA. However, depending on their hypotheses, researchers may consider testing condition RSMs and utilizing a model comparison approach, or employ the z-scored approach used in the simulations above. Regarding potential run confounds, this is always a concern in RSA and is why we exclude within-run comparisons. We have also added to the Discussion the suggestion to include run as a covariate in mixed-effects models. However, we do not employ this covariate here, as we preferred the most parsimonious model for comparison with cRSA.
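
      Excluding within-run comparisons amounts to keeping only those trial pairs whose run labels differ. A minimal sketch, assuming one run label per trial (the function name is hypothetical):

```python
def between_run_pairs(runs):
    """Index pairs (i, j), i < j, whose trials come from different runs."""
    n = len(runs)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if runs[i] != runs[j]]


# Example: trials 0 and 1 share a run, so only pairs involving trial 2 remain.
kept = between_run_pairs([1, 1, 2])
```

      Only the RSM cells indexed by the returned pairs would then enter the similarity computation.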

      Page 46 - 47. “Further, while analyses here were largely employed to be comparable with cRSA, researchers should consider taking advantage of the flexibility of the mixed-effects models and include covariates of non-interest (run, trial order, etc.)”.

      (10) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli.

      We agree that studies advancing designs with multiple repetitions of a given stimulus image are useful in estimating the reliability of concept representations. We would argue, however, that model comparison in RSA is not restricted to such data. Many extant studies do not in fact have the multiple repetitions per stimulus per subject that would allow for that type of model-comparative approach (Wang et al., 2018 https://doi.org/10.1088/1741-2552/abecc3, Gao et al, 2022 https://doi.org/10.1093/cercor/bhac058, Li et al, 2022 https://doi.org/10.1002/hbm.26195, Staples & Graves, 2020 https://doi.org/10.1162/nol_a_00018). While beneficial in terms of noise estimation, having multiple presentations was not a requirement for implementing cRSA (Kriegeskorte, 2008 https://doi.org/10.3389/neuro.06.004.2008). The aim of this manuscript is to introduce the tRSA approach to the broad community of researchers whose research questions and datasets could vary vastly, including but not limited to the number of repeated presentations and the balance of trial counts across conditions.

      (11) Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here.

      We have added language on the value of cross-validation approaches to RSA in the Discussion:

      Page 47. “Additionally, we note that while our proposed tRSA framework provides a flexible and statistically principled approach for modeling trial-level representational strength, we acknowledge that there are alternative methods for addressing trial-level variability in RSA. In particular, the use of cross-validated distance metrics (e.g., crossnobis distance) has become increasingly popular for controlling differences in measurement noise variance and accounting for possible covariance structures across trials (Walther et al., 2016). These metrics offer several advantages, including unbiased estimation of representational dissimilarities under Gaussian noise assumptions and improved generalization to unseen data. However, cross-validated distances are conceptually distinct from the approach taken here: whereas cross-validation aims to correct for noise-related biases in representational dissimilarity matrices, our trial-level RSA method focuses on estimating and modeling the variability in representation strength across individual trials using mixed-effects modeling. Rather than proposing a replacement for cross-validated RSA, tRSA adds a complementary tool to the methodological toolkit—one that supports hypothesis-driven inference about condition effects and trial-level covariates, while leveraging the full structure of the data”.

      (12) One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.

      Indeed, the chosen model RSM may not be the true RSM, but as the noise level increases the correlation between RSMs practically becomes zero. In our simulations we assume this to be true as a straightforward way to manipulate the correspondence between the brain data and the model. However, just like cRSA, tRSA is constrained by the model selections the researchers employ. We encourage researchers to have carefully considered theoretically-motivated models and, if their research questions require, consider multiple and potentially competing models. Furthermore, the trial-wise estimates produced by tRSA encourage testing competing models within the multiple regression framework. We have added this language to the Discussion.

      Page 46. “…choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives…”.

      Pages 45-46. “While a number of studies have addressed the validity of measuring representational geometry using designs with multiple repetitions, a conceptual benefit of the tRSA approach is the reliance on a regression framework that engenders the testing of competing conceptual models of stimulus representation (e.g., taxonomic vs. encyclopedic semantic features, as in Davis et al., 2021)”.

      Reviewer #2 (Public review):

      (1) While I generally welcome the contribution, I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      We apologize for the unintended accusatory tone. We have clarified the many robust approaches to RSA and have made our Introduction and Discussion more nuanced throughout (see also 3, 11 and 16).

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor has RSA ever been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2018).

      We agree there are numerous methods that go beyond cRSA to address these limitations, and we have added discussion of them to our manuscript, as well as an example analysis implementing permutation tests on tRSA data (see response to 7). We thank the reviewer for bringing King et al., 2014 and their temporal generalization method to our attention; we have added a reference acknowledging their decoding-based temporal generalization approach.

      Page 8. “It is also important to note that some prior work has examined similarly fine-grained representations in time-resolved neuroimaging data, such as the temporal generalization method introduced by King et al. (see King & Dehaene, 2014). Their approach trains classifiers at each time point and tests them across all others, resulting in a temporal generalization matrix that reflects decoding accuracy over time. While such matrices share some structural similarity with RSMs, they do not involve correlating trial-level pattern vectors with model RSMs nor do their second-level models include trial-wise, subject-wise, and item-wise variability simultaneously”.

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      We appreciate the opportunity to offer more specific prescriptions for those employing a tRSA technique, and have added them to the Discussion:

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models and choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      To clarify, our empirical approach uses two different tasks: an Object Perception task more akin to the classic RSA datasets employing passive viewing, and a Conceptual Retrieval task that more directly addresses the benefits of the trialwise approach. We felt that our Object Perception dataset is a simpler empirical fMRI dataset without explicit task conditions or a dichotomous behavioral outcome, whereas the Retrieval dataset is more involved (though old/new recognition is the most common form of memory retrieval testing) and dependent on behavioral outcomes. However, we recognize the utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      We recognize the potential danger of not meeting model assumptions. Though our simulation results and model checks suggest this is not a fatal flaw in the model design, we caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. See response to R1.

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      We agree this jargon may make the paper difficult to understand. We have expanded or added definitions of these terms throughout the Methods and Results sections.

      Page 12. “Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> = 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the correct inference should be a failure to reject the null hypothesis of b<sub>B-𝐴</sub> = 0; any significant (𝑝 < 0.05) result in either direction was considered a false positive (spurious effect, or Type I error). Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> ≠ 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the inference was considered correct if it rejected the null hypothesis of b<sub>B-𝐴</sub> = 0 and yielded the expected sign of the estimated contrast (b<sub>B-𝐴</sub> < 0). A significant result with the reverse sign of the estimated contrast (b<sub>B-𝐴</sub> > 0) was considered a Type I error, and a nonsignificant (𝑝 ≥ 0.05) result was considered a false negative (failure to detect a true effect, or Type II error)”.

      Page 2. “Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to (better able to detect) true effects”.

      Page 25. “The performance of cRSA and tRSA was quantified with their specificity (the ability to avoid false positives; 1 - Type I error rate) and sensitivity (the ability to avoid false negatives; 1 - Type II error rate)”.
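
      The scoring rules quoted above can be restated as a short sketch. The function names, outcome labels, and the way simulation results are passed in are assumptions for illustration; the decision rules themselves (a 0.05 threshold, an expected negative contrast, and reversed-sign significant results counted as Type I errors) follow the quoted text.

```python
def classify(p, estimate, true_effect, alpha=0.05):
    """Label one simulated inference.

    p           : p-value of the condition contrast
    estimate    : estimated contrast (expected to be negative when an
                  effect truly exists)
    true_effect : whether the data were generated with a real difference
    """
    significant = p < alpha
    if not true_effect:
        return "false_positive" if significant else "correct"
    if not significant:
        return "false_negative"
    # Significant with the wrong sign also counts as a Type I error.
    return "correct" if estimate < 0 else "false_positive"


def rate(outcomes, label):
    """Proportion of simulations that received the given label."""
    return sum(o == label for o in outcomes) / len(outcomes)


def specificity(outcomes):
    """1 - Type I error rate."""
    return 1 - rate(outcomes, "false_positive")


def sensitivity(outcomes):
    """1 - Type II error rate."""
    return 1 - rate(outcomes, "false_negative")
```

      Here `specificity` would typically be evaluated on simulations generated without a true effect and `sensitivity` on simulations generated with one.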

      Page 6. “One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity or homogeneity of variance; that is, all residuals should have equal variance”.

      Page 11. “Specifically, a linear mixed-effects model with a fixed effect of condition (which estimates the average effect across the entire sample, capturing the overall effect of interest) and random effects of both subjects and stimuli (which model variation in responses due to differences between individual subjects and items, allowing generalization beyond the sample) was fitted to tRSA estimates via the `lme4 1.1-35.3` package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the `lmerTest 3.1-3` package (Kuznetsova et al., 2017)”.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.

      We thank the reviewer for pointing out this oversight. We have added our code and data availability statements.

      Page 9. “Data are available upon request to the corresponding author, and our simulations and example tRSA code are available at https://github.com/electricdinolab”.

      Reviewer #1 (Recommendations for the authors):

      (13) Page 4: The limitations of cRSA seem to be based on the assumption that within each different experimental condition, there are different stimuli, which get combined into the condition. The framework of RSA, however, does not dictate whether you calculate a condition x condition RDM or a larger and more complete stimulus x stimulus RDM. Indeed, in practice we often do the latter? Or are you assuming that each stimulus is only shown once overall? It would be useful at this point to spell out these implicit assumptions.

      We agree that stimulus x stimulus RDMs can be constructed and are often used. However, as we mentioned in the Introduction, researchers are often interested in the difference between two (or more) conditions, such as “remembered” vs. “forgotten” (Davis et al., https://doi.org/10.1093/cercor/bhaa269) or “high cognitive load” vs. “low cognitive load” (Beynel et al., https://doi.org/10.1523/JNEUROSCI.0531-20.2020). In those cases, the most common practice with cRSA is to construct condition-specific RDMs, compute cRSA scores separately for each condition, and then compare the scores at the group level. The number of times each stimulus gets presented does not prevent one from creating a model RDM that has the same rows and columns as the brain RDM, either in the same condition (“high load”) or across different conditions.

      (14) Page 5: The difference between condition-level and stimulus-level is not clear. Indeed, this definition seems to be a function of the exact experimental design and is certainly up for interpretation. For example, if I conduct a study looking at the activity patterns for 4 different hand actions, each repeated multiple times, are these actions considered stimuli or conditions?

      We have added clarifying language about what is considered stimuli vs. conditions. Indeed, this will depend on the specific research questions being asked and will affect how researchers construct their models. In this specific example, one would most likely consider each different hand action a condition, treating them as fixed effects rather than random effects, given their very limited number and the lack of need to generalize findings to the broader “hand actions” category.

      Page 5. “Critically, the distinction between condition-level and stimulus level is not always clear as researchers may manipulate stimulus-level features themselves. In these cases, what researchers ultimately consider condition-level and stimulus-level will depend on their specific research questions. For example, researchers intending to study generalized object representation may consider object category a stimulus-level feature, while researchers interested in if/how object representation varies by category may consider the same category variable condition-level”.

      (15) Page 5: The fact that different numbers of trials / different levels of measurement noise / noise-covariance of different conditions biases non-cross-validated distances is well known and repeatedly expressed in the literature. We have shown that cross-validation of distances effectively removes such biases - of course, it does not remove the increased estimation variability of these distances (for a formal analysis of estimation noise on condition patterns and variance of the cross-nobis estimator, see (Diedrichsen et al. 2021)).

      We thank the reviewer for drawing our attention to this literature and have added discussions of these methods.

      (16) Page 5: "Most studies present subjects with a fixed set of stimuli, which are supposedly samples representative of some broader category". This may be the case for a certain type of RSA experiments in the visual domain, but it would be unfair to say that this is a feature of RSA studies in general. In most studies I have been involved in, we use a "stimulus" x "stimulus" RDM.

      We have edited this sentence to avoid the “most” characterization. We also added substantial text to the Introduction and Discussion distinguishing cRSA, which is nonetheless widely employed, especially in cases with a single repetition per stimulus (Macklin et al., 2023; Liu et al., 2024), from the model-comparative method, and explicitly stating that we do not consider tRSA an alternative to the model-comparative approach.

      (17) Page 5: I agree that "stimuli" should ideally be considered a random effect if "stimuli" can be thought of as sampled from a larger population and one wants to make inferences about that larger population. Sometimes stimuli/conditions are more appropriately considered a fixed effect (for example, when studying the response to stimulation of the 5 fingers of the right hand). Techniques to consider stimuli/conditions as a random effect have been published by the group of Niko Kriegeskorte (Schütt et al. 2023).

      Indeed, in some cases what may be thought of as “stimuli” would be more appropriately entered into the model as a fixed effect; such questions are increasingly relevant given the focus on item-wise stimulus properties (Bainbridge et al., Westfall & Yarkoni). We have added text on this issue to the Discussion and caution researchers to employ models that most directly answer their research questions.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question. An effect is fixed when the levels represent the specific conditions of theoretical interest (e.g., task condition) and the goal is to estimate and interpret those differences directly. In contrast, an effect is random when the levels are sampled from a broader population (e.g., subjects) and the goal is to account for their variability while generalizing beyond the sample tested. Note that the same variable (e.g., stimuli) may be considered fixed or random depending on the research questions”.

      (18) Page 6: It is correct that the "classical" RSA depends on a categorical assignment of different trials to different stimuli/conditions, such that a stimulus x stimulus RDM can be computed. However, both Pattern Component Modelling (PCM) and Encoding models are ideally set up to deal with variables that vary continuously on a trial-by-trial or moment-by-moment basis. tRSA should be compared to these approaches, or - as it should be clarified - that the problem setting is actually quite a different one.

      We agree that PCM and encoding models offer a flexible approach and handle continuous trial-by-trial variables. We have clarified on page 6 that the problem setting of cRSA is distinct, and we have added a discussion of the robustness of encoding models and their limitations to the Discussion.

      Page 6. “While other approaches such as Pattern Component Modeling (PCM) (Diedrichsen et al., 2018) and encoding models (Naselaris et al., 2011) are well-suited to analyzing variables that vary continuously on a trial-by-trial or moment-by-moment basis, these frameworks address different inferential goals. Specifically, PCM and encoding models focus on estimating variance components or predicting activation from features, while cRSA is designed to evaluate representational geometry. Thus, cRSA as well as our proposed approach address a problem setting distinct from PCM and encoding models”.

      (19) Page 8: "Then, we generated two noise patterns, which were controlled by parameters 𝜎𝐴 and 𝜎𝐵, respectively, one for each condition." This makes little sense to me. The noise patterns should be unique to each trial - you should generate n_a + n_b noise patterns, no?

      We clarify that the “noise patterns” here are n_voxel x n_trial in size; in other words, all trial-level noise patterns are generated together and each trial has its own unique noise pattern. We have revised our description as “two sets of noise patterns” for clarity starting on page 9.
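
      As a minimal sketch of the setup described above, assuming Gaussian noise and hypothetical names and values (`n_voxel`, `n_a`, `n_b`, `sigma_a`, `sigma_b` are illustrative, not taken from the manuscript), each set of noise patterns can be drawn as one n_voxel x n_trial matrix so that every trial (column) receives its own pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and noise levels, chosen only for illustration.
n_voxel, n_a, n_b = 50, 20, 30
sigma_a, sigma_b = 1.0, 2.0

# One set of noise patterns per condition; each column is one trial's pattern.
noise_a = rng.normal(0.0, sigma_a, size=(n_voxel, n_a))
noise_b = rng.normal(0.0, sigma_b, size=(n_voxel, n_b))

# All n_a + n_b trial-level noise patterns, stacked column-wise.
noise = np.hstack([noise_a, noise_b])
```

      Under these assumptions, "two sets of noise patterns" still yields n_a + n_b distinct trial-level patterns, which is the reviewer's requirement.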

      (20) Page 9: First, I assume if this is supposed to be a hierarchical level model, the "noise parameters" here correspond to variances? Or do these \sigma values mean to signify standard deviations? The latter would make little sense. Or is it the noise pattern itself?

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (21) Page 10: your formula states "𝜎𝑠𝑢𝑏𝑗 ~ 𝙽(0, 0.5^2)". This conflicts with your previous mention that the \sigmas are noise "levels"; are they the noise patterns themselves now? Variances cannot be normally distributed, as they cannot be negative.

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (22) Page 13: What was the task of the subject in the Memory retrieval task? Old/new judgements relative to encoding of object perception?

      We apologize for the lack of clarity about the Memory Retrieval task and have added that information and clarified that the old/new judgements were relative to a separate encoding phase, the brain data for which has been reported elsewhere.

      Page 14. “Memory Retrieval took place one day after Memory Encoding and involved testing participants’ memory of the objects seen in the Encoding phase. Neural data during the Encoding phase has been reported elsewhere. In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. Participants performed old/new judgements, as well as their confidence in those judgements on a four-point scale (1 = Definitely New, 2 = Probably New, 3 = Probably Old, 4 = Definitely Old)”.

      (23) Page 13: If "Memory Retrieval consisted of three scanning runs", then some of the stimulus x stimulus correlations for the RSM must have been calculated within a run and some between runs, correct? Given that all within-run estimates share a common baseline, they share some dependence. Was there a systematic difference between the within-run and the between-run correlations?

      We have clarified in this portion of the methods that within-run comparisons were excluded from our analyses. We also double-checked that the within-run exclusion was included in the description of the Neural RSMs.

      Page 14. “Retrieval consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds (within-run comparisons were later excluded from RSA analyses)”.

      Page 18. “This was done by vectorizing the voxel-level activation values within each region and calculating their correlations using Pearson’s r, excluding all within-run comparisons.”
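
      The within-run exclusion quoted above can be sketched as follows. This is an illustration under assumptions rather than the authors' pipeline: the function name `neural_rsm`, the 100-voxel region, and the random data are hypothetical, and the run layout simply follows the "three runs of 38 trials" wording.

```python
import numpy as np

def neural_rsm(patterns, runs):
    """Pearson RSM over trials, with within-run cells set to NaN.

    patterns: (n_trial, n_voxel) array of vectorized activation values.
    runs:     (n_trial,) array of run labels.
    """
    rsm = np.corrcoef(patterns)                 # each row is one trial
    same_run = runs[:, None] == runs[None, :]   # also covers the diagonal
    rsm[same_run] = np.nan                      # exclude within-run comparisons
    return rsm

rng = np.random.default_rng(1)
runs = np.repeat([1, 2, 3], 38)                 # three runs of 38 trials
patterns = rng.normal(size=(runs.size, 100))    # hypothetical 100-voxel region
rsm = neural_rsm(patterns, runs)
```

      Only between-run cells remain finite, so any later statistic has to skip the NaN entries.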

      (24) Page 20: It is not clear why the mean estimate of "representational strength" (i.e., model-brain RSM correlations) is important at all. This comes back to Major point #2, namely that you are trying to solve a very different problem from model-comparative RSA.

      We have clarified that our approach is not an alternative to model-comparative RSA, and that depending on the task constraints researchers may choose to compare models with tRSA or other approaches requiring stimulus repetition (see 3).

      (25) Page 21: I believe the problems of simulating correlation matrices directly in the way that the authors in their first simulation did should be well known and should be moved to an appendix at best. Better yet, the authors could start with the correct simulation right away.

      We agree the paper is more concise with these simulations being moved to the appendix and more briefly discussed. We have implemented these changes (Appendix 1). However, we are not certain that this problem is unknown, and have several anecdotes of researchers inquiring about this “alternative” approach in talks with colleagues, thus we do still discuss the issues with this method.

      (26) Page 26: Is the "underlying continuous noise variable 𝜎𝑡𝑟𝑖𝑎𝑙 that was measured by 𝑣𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑 " the variance of the noise pattern or the noise pattern itself? What does it mean it was "measured" - how?

      𝜎𝑡𝑟𝑖𝑎𝑙 is a vector of standard deviations for different trials, and 𝜎𝑡𝑟𝑖𝑎𝑙 i would be used to generate the noise patterns for trial i. v_measured is a hypothetical measurement of trial-level variability, such as “memorability” or “heartbeat variability”. We have revised our description to clarify our methods.

      Reviewer #2 (Recommendations for the authors):

      (8) It would be helpful to provide more clarity earlier on in the manuscript on what is a 'trial': in my experience, a row or column of the RDM is usually referred to as 'stimulus condition', which is typically estimated on multiple trials (instances or repeats) of that stimulus condition (or exemplars from that stimulus class) being presented to the subject. Here, a 'trial' is both one measurement (i.e., single, individual presentation of a stimulus) and also an entry in the RDM, but is this the most typical scenario for cRSA? There is a section in the Discussion that discusses repetitions, but I would welcome more clarity on this from the get-go.

      We have added discussion of stimulus repetition methods and datasets to the Introduction and clarified our use of the terms.

      Page 8. “Critically, in single-presentation designs, a “trial” refers to one stimulus presentation, and corresponds to a row or column in the RSM. In studies with repeated stimuli, these rows are often called “conditions” and may reflect aggregated patterns across trials. tRSA is compatible with both cases: whether rows represent individual trials or averaged trials that create “conditions”, tRSA estimates are computed at the row level”.
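
      To make "computed at the row level" concrete, here is a hedged sketch. The response does not specify the estimator, so the example assumes a Pearson correlation between matching rows of the neural and model RSMs with the diagonal cell removed; `row_level_estimates` and the simulated features are hypothetical.

```python
import numpy as np

def row_level_estimates(neural_rsm, model_rsm):
    """Correlate each row of the neural RSM with the same row of the model RSM."""
    n = neural_rsm.shape[0]
    estimates = np.full(n, np.nan)
    for i in range(n):
        keep = np.arange(n) != i                               # skip the diagonal cell
        keep &= ~np.isnan(neural_rsm[i]) & ~np.isnan(model_rsm[i])
        if keep.sum() > 2:
            estimates[i] = np.corrcoef(neural_rsm[i, keep], model_rsm[i, keep])[0, 1]
    return estimates

rng = np.random.default_rng(2)
features = rng.normal(size=(12, 5))          # hypothetical model features per trial
model_rsm = np.corrcoef(features)
noisy = features + rng.normal(scale=0.5, size=features.shape)
neural_rsm = np.corrcoef(noisy)
estimates = row_level_estimates(neural_rsm, model_rsm)   # one value per row
```

      The result is one estimate per row, whether that row stands for a single trial or an averaged "condition". These per-row values are what a mixed-effects model would then take as input.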

      (9) The quality of the results figures can be improved. For example, axes labels are hard to read in Figure 3A/B, panels 3C/D are hard to read in general. In Figure 7E, it's not possible to identify the 'dark red' brain regions in addition to the light red ones.

      We thank the reviewer for raising these and have edited the figures to be more readable in the manner suggested.

      (10) I would be interested to see a comparison between tRSA and cRSA in other fMRI (or other modality) datasets that have been extensively reported in the literature. These could be the original Kriegeskorte 96 stimulus monkey/fMRI datasets, commonly used open datasets in visual perception (e.g., THINGS, NSD), or the above-mentioned King et al. dataset, which has been analyzed in various papers.

      We recognize the great utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (11) On P39, the authors suggest 'researchers can confidently replace their existing cRSA analysis with tRSA': Please discuss/comment on how researchers should navigate the choice of modeling parameters in tRSA's linear mixed effects setting.

      We have added discussion of the mixed-effects parameters and encourage researchers to follow best practices for their model selection.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (12) The final part of the Results section, demonstrating the tRSA results for the continuous memorability factor in the real fMRI data, could benefit from some substantiation/elaboration. It wasn't clear to me, for example, to what extent the observed significant association between representational strength and item memorability in this dataset is to be 'believed'; the Discussion section (p38). Was there any evidence in the original paper for this association? Or do we just assume this is likely true in the brain, based on prior literature by e.g. Bainbridge et al (who probably did not use tRSA but rather classic methods)?

      Indeed, memorability effects have been replicated in the literature, but not using the tRSA method. We have expanded our discussion to clarify the relationship of our findings and the relevant literature and methods it has employed.

      Page 38. “Critically, memorability is a robust stimulus property that is consistent across participants and paradigms (Bainbridge, 2022). Moreover, object memorability effects have been replicated using a variety of methods aside from tRSA, including univariate analyses and representational analyses of neural activity patterns where trial-level neural activity pattern estimates are correlated directly with object memorability (Slayton et al, 2025).”

      (13) The abstract could benefit from more nuance; I'm not sure if RSA can indeed be said to be 'the principal method', and whether it's about assessing 'quality' of representations (more commonly, the term 'geometry' or 'structure' is used).

      We have edited the abstract to reflect the true nuance in the current approaches.

      Abstract. Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, a prominent method of studying neural representations is representational similarity analysis (RSA). While there are several recent advances in RSA, the classic RSA (cRSA) approach examines the structure of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): usually one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data.

      (14) RSA is also not necessarily about models vs. neural data; it can also be between two neural systems (e.g., monkey vs. human as in Kriegeskorte et al., 2008) or model systems (see Sucholutsky et al., 2023). This statement is also repeated in the Introduction paragraph 1 (later on, it is correctly stated that comparing brain vs. model is most likely the 'most common' approach).

      We have added these examples in our introduction to RSA.

      Page 3. “One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013), or, in some cases, compares the representational geometry of two neural systems (e.g., Kriegeskorte et al., 2008) or two model systems (Sucholutsky et al., 2023)”.

      (15) 'theoretically appropriate' is an ambiguous statement, appropriate for what theory?

      We apologize for the ambiguous wording, and have corrected the text:

      Page 11. “Critically, tRSA estimates were submitted to a mixed-effects model which is statistically appropriate for modeling the hierarchical structure of the data, where observations are nested within both subjects and stimuli (Baayen et al., 2008; Chen et al., 2021)”.

      (16) I found the statement that cRSA "cannot model representation at the level of individual trials" confusing, as it made me think, what prohibits one from creating an RDM based on single-trial responses? Later on, I understood that what the authors are trying to say here (I think) is that cRSA cannot weigh the contributions of individual rows/columns to the overall representational strength differently.

      We thank the reviewer for their clarifying language and have added it to this section of the manuscript.

      Abstract. “However, because cRSA cannot weigh the contributions of individual trials (RSM rows/columns), it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation”.

      (17) Why use "RSM" instead of "RDM"? If the pairwise comparison metric is distance-based (e..g, 1-correlation as described by the authors), RDM is more appropriate.

      We apologize for the error, and have clarified the Methods text:

      Page 3-4. First, brain activity responses to a series of N trials are compared against each other (typically using Pearson’s r) to form an N×N representational similarity matrix.

      (18) Figure 2: please write 'Correlation estimate' in the y-axis label rather than 'Estimate'.

      We have edited the label in Figure 2.

      (19) Page 6 'leaving uncertain the directionality of any findings' - I do not follow this argument. Obviously one can generate an RDM or RSM from vector v or vector -v. How does that invalidate drawing conclusions where one e.g., partials out the (dis)similarity in e.g., pleasantness ratings out of another RDM/RSM of interest?

      We agree such an approach does not invalidate the partial method; we have clarified what we mean by “directionality”.

      Page 8. “For instance, even though a univariate random variable, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable, leaving uncertain the directionality (or if representation is strongest for pleasant or unpleasant items) of any findings with the RSM (see also Bainbridge & Rissman, 2018)”.
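
      The directionality point can be checked with a small sketch. It assumes one common choice of pairwise metric (negative absolute difference); the ratings and the helper `rsm_from_variable` are hypothetical.

```python
import numpy as np

def rsm_from_variable(v):
    """Similarity as the negative pairwise absolute difference (one possible metric)."""
    v = np.asarray(v, dtype=float)
    return -np.abs(v[:, None] - v[None, :])

ratings = np.array([1.0, 2.5, 4.0, 4.5])     # hypothetical pleasantness ratings
same = np.array_equal(rsm_from_variable(ratings), rsm_from_variable(-ratings))
```

      Because the metric depends only on pairwise differences, flipping the sign of the variable leaves the RSM unchanged, so the matrix alone cannot say which end of the scale drives an effect.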

      (20) P7 'sampled 19900 pairs of values from a bi-variate normal distribution', but the rows/columns in an RDM are not independent samples - shouldn't this be included in the simulation? I.e., shouldn't you simulate first the n=200 vectors, and then draw samples from those, as in the next analysis?

      This section has been moved to Appendix 1 (see responses to Reviewer 1.13).
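
      For reference, the 19900 pairs mentioned by the reviewer match the number of unique off-diagonal cells of an RSM built from n = 200 items:

```latex
\binom{200}{2} = \frac{200 \times 199}{2} = 19900
```

      Each item contributes to 199 of these cells, which is the source of the dependence between cells that the reviewer points out.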

      (21) Under data acquisition, please state explicitly that the paper is re-using data from prior experiments, rather than collecting data anew for validating tRSA.

      We have clarified this in the data acquisition section.

      Page 13. “A pre-existing dataset was analyzed to evaluate tRSA. Main study findings have been reported elsewhere (S. Huang, Bogdan, et al., 2024)”.

      (22) Figure 4 could benefit from some more explanation in-text. It wasn't clear to me, for example, how to interpret the asterisks depicted in the right part of the figure.

      We clarified the meaning of the asterisks in the main text in addition to the existent text in the figure caption.

      Page 26. “see Figure 4, off-diagonal cells in blue; asterisks indicate where tRSA was statistically more sensitive than cRSA)”.

      (23) Page 38 "the outcome of tRSA's improved characterization can be seen in multiple empirical outcomes:" it seems there is one mention of 'outcomes' too many here.

      We have revised this sentence.

      Page 41. “tRSA's improved characterization can be seen in multiple empirical outcomes”.

      (24) Page 38 "model fits became the strongest" it's not clear what aspect of the reported results in the paragraph before this is referring to - the Appendix?

      Yes, the model fits are in the Appendix; we have added this in-text citation.

      Moreover, model-fits became the strongest when the models also incorporated trial-level variables such as fMRI run and reaction time (Appendix 3, Table 6).

      References

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    2. Reviewer #2 (Public review):

      This paper proposes two changes to classic RSA, a popular method to probe neural representation in neuroimaging experiments: computing RSA at row/column level of RDM, and using linear mixed modeling to compute second level statistics, using the individual row/columns to estimate a random effect of stimulus. The benefit of the new method is demonstrated using simulations and a re-analysis of a prior fMRI dataset on object perception and memory encoding.

      The authors' claim that tRSA is a promising approach to perform more complete modeling of cogneuro data, and to conceptualize representation at the single trial/event level (cf Discussion section on P42), is appealing.

      In their revised manuscript, the authors have addressed some previous concerns, now referencing more literature aiming to improve RSA and its associated statistical inferences, and providing more guidance on methodological considerations in the Discussion. However, I wish the authors had more extensively edited the Introduction to better contextualize the work and clarify the specific settings in which they see the method as being beneficial over classic RSA. For example, some of the limitations of cRSA mentioned on page 6, e.g. related to presenting the same stimuli to multiple subjects, seem to be quite specific to settings where the researcher expects differential responses across subjects to fundamentally alter the interpretation, rather than something that will just average out by repeatedly offering the same stimulus, or combining data across subjects. It's not clear to me how the switch from 'matrix-level' to 'row-level' analysis in tRSA necessarily addresses this problem. It would be very helpful if the authors would more explicitly outline what problem the row-level aspect of tRSA is solving; what problem statistical inference via LMM is solving; and walk the reader through a very specific use case (perhaps a toy version of the real-data experiment which is now at the end of the paper). Explaining the utility of tRSA for experimental settings in which assessing representational strength for single events is crucial would clarify the contribution of this new method better.

      A few weaknesses mentioned in my previous review were not adequately addressed. To demonstrate the utility of the method on real neural recordings, only a single dataset is used with a quite complicated experimental design; it's not clear if there is any benefit of using tRSA on a simpler real dataset. Moreover, the cells of an RDM/RSM reflect pairwise comparisons between response patterns. Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. While the authors show examples that failure to meet independence assumptions does not affect results in their specific dataset, it does not get acknowledged as a problem at a more fundamental level. Finally, while the paper now states that 'simulations and example tRSA code' are publicly available, the link points to the lab's general github page containing many lab repositories, in which I could not identify a specific repository related to this paper. This is disappointing given that the main goal of this manuscript is to provide a new method that they encourage others to use; a clear pointer to available code is only a minimal requirement to achieve that goal. A dedicated repository, including documentation, READMEs and tutorials/demos to run simulations, compare methods, etc. would greatly enhance the paper's contribution.

    3. eLife Assessment

      This study proposes a potentially useful improvement on a popular fMRI method for quantifying representational similarity in brain measurements by focusing on representational strength at the single trial level and adding linear mixed effects modeling for group-level inference. The manuscript provides solid evidence of increased sensitivity with no loss of precision compared to more classic versions of the method. However, several assumptions are insufficiently motivated, and it is unclear to what extent the approach would generalize to other paradigms.

    1. eLife Assessment

      This is an important study that provides compelling data from a diverse set of approaches from single cell transcriptome data and network analysis from genetically diverse mouse cells to identify novel driver genes underlying human GWAS associations. The authors present evidence that network analysis of scRNA-seq data from genetically diverse mouse bone-marrow derived stromal cells can be informative for identifying human BMD GWAS driver genes. Their approach should be broadly used and applicable to other GWAS studies.

    2. Reviewer #1 (Public review):

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease.

      Specifically, they utilize a large single cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses.

      The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilized a much larger scRNA-seq dataset from 80 DO BMSC-OBs, inferred co-expression based on Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that co-localized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below.

      Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking.

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3).

      Potential drawbacks of the authors' approach include their focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but reason that eQTLs can be shared across many tissues - this assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type specific eQTLs. Another issue concerns potential model overfitting in the iterativeWGCNA analysis of mesenchymal cell type-specific co-expression, which identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per cell read depth (400-6200 reads/cell) and drop outs, it's surprising that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting is responsible for these results.

      Overall, though, these concerns are minor relative to the many strengths of the study design and results. Indeed, I expect the analytical framework employed by the authors here will be valuable to -- and replicated by -- researchers in other disease areas.

      Comments on revisions:

      Thank you for addressing my concerns. This is an impressive study and manuscript that you should be proud of.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Farber and colleagues have performed single cell RNAseq analysis on bone marrow derived stem cells from DO mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach.

      Strengths:

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting.

      Weaknesses:

      The manuscript is, in parts, hard to read due to the use of acronyms and there are some questions about data analysis that still need to be addressed.

      Comments on revisions:

      Dillard et al have made several improvements to their manuscript.

      (1) We previously asked the authors to determine whether any cell types were enriched for BMD-related traits since the premise of the paper is that 'many genes impacting BMD do so by influencing osteogenic differentiation or ... adipogenic differentiation'. Given the potential for the cell culture method to skew the cell type distribution non-physiologically, it is important to establish which cell types in their assay are most closely associated with BMD traits. The new CELLECT analysis and Figure 1E address this point nicely. However, it would still be nice to see the correlations between these cell types and BMD traits in the mice as this would provide independent evidence to support their physiological importance more broadly.

      (2) Shortening the introduction.

      (3) Addressing limitations that arise from not accounting for founder genome SNPs when aligning scRNA-seq data.

      (4) The main take-away of this paper is, to us, the development of a single cell approach to studying BMD-related traits. It is encouraging that the cells post-culture appear to be representative of those pre-culture (supplemental figure 3).

      However, the authors seem to have neglected several comments made by both reviewers. While we share the authors' enthusiasm for the single cell analytical approach, we do not understand their reluctance to perform further statistical tests. We feel that the following comments have still not been addressed:

      (1) The manuscript still contains the following:

      "To provide further support that tradeSeq-identified genes are involved in differentiation, we performed a cell type-specific expression quantitative trait locus (eQTL) analysis for each mesenchymal cell type from the 80 DO mice. We identified 563 genes (eGenes) regulated by a significant cis-eQTL in specific cell types of the BMSC-OB scRNA-seq data (Supplementary Table S14). In total, 73 eGenes were also tradeSeq-identified genes in one or more cell type boundaries along their respective trajectories (Supplementary Table S9)."

      The purpose of this paragraph is to convince readers that the eGenes approach aligns with the tradeSeq approach (and that their approach can therefore be trusted). It is essential that such claims are supported by statistical reasoning. Given that it would be very simple to perform permutation/enrichment analyses to address this point, and both reviewers requested similar analyses, we do not understand the authors' reluctance here. Otherwise, this section should be rewritten so that it does not imply that the identification of these genes provides support for their approach.

      (2) Given that a central purpose of this manuscript is to establish a systematic workflow for identifying candidate genes, the manuscript could still benefit from more explanation as to why the authors chose to highlight Tpx2 and Fgfrl1. Tpx2 does already have a role in bone physiology through the IMPC. The authors should comment on why they did not explore Kremen1, for instance, as this gene seems important for the transition to both OB1 and 2.

      A final minor comment is that it would be very helpful if the authors could indicate if the DDGs in Table 1 are also eGenes for the relevant cell type. This is much more meaningful than looking through GTEx.
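      As an illustration of the permutation analysis requested in comment (1), the following is a minimal sketch rather than the authors' pipeline. It assumes that the background is the set of genes eligible for both the eQTL and tradeSeq analyses; the function name, the gene identifiers, and the toy inputs are hypothetical and carry no data from the study.

```python
import random


def permutation_overlap_pvalue(list_a, list_b, background, n_perm=10000, seed=0):
    """Empirical p-value for the overlap between two gene lists.

    Random gene sets of the same size as list_a are drawn from the
    background, and the overlap of each with list_b is compared with
    the observed overlap.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    background = sorted(set(background))
    set_a, set_b = set(list_a), set(list_b)
    if len(set_a) > len(background):
        raise ValueError("list_a cannot be larger than the background")
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background, len(set_a))
        if sum(gene in set_b for gene in sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy input only (hypothetical identifiers, not genes from the study).
background = [f"gene{i}" for i in range(100)]
list_a = background[:10]
list_b = background[:10]
observed, p_value = permutation_overlap_pvalue(list_a, list_b, background, n_perm=1000)
print(observed, p_value)
```

      For the overlap quoted above, list_a would hold the 563 eGenes and list_b the tradeSeq-identified genes, giving an observed overlap of 73. The choice of background matters: restricting it to genes expressed in the relevant cell types would give a more conservative test than using all annotated genes.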

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease. Specifically, they utilize a large single-cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses. The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilize a much larger scRNA-seq dataset from 80 DO BMSC-OBs, infer co-expression-based and Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that colocalized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single-cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. 
Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below. Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking. 

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3). One weakness involves the focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but they reason that eQTLs can be shared across many tissues. This assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type-specific eQTLs. Furthermore, the mesenchymal cell type-specific co-expression analysis by iterative WGCNA identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per-cell read depth (400-6200 reads/cell) and dropouts, it's hard to believe that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting here and would expect that many/most of these identified modules have very few gene members, but the methods list a minimum module size of 20 genes. How do the numbers of modules identified in this study compare to other published scRNA-seq studies that use iterative WGCNA? 

      In the section "Identification of differentiation driver genes (DDGs)", the authors identified 408 significant DDGs and found that 49 (12%) were reported by the International Mouse Knockout [sic] Consortium (IMPC) as having a significant effect on whole-body BMD when knocked out in mice. Is this enrichment significant? E.g., what is the background percentage of IMPC gene knockouts that show an effect on whole-body BMD? Similarly, they found that 21 of the 408 DDGs were genes that have BMD GWAS associations that colocalize with GTEx eQTLs/sQTLs. Given that there are > 1,000 BMD GWAS associations, is this enrichment (21/408) significant? Recommend performing a hypergeometric test to provide statistical context to the reported overlaps here. 
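      To make the recommended hypergeometric test concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not an analysis performed in the manuscript: the function name is hypothetical, and the toy numbers in the example are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(overlap, draws, successes, population):
    """Return P(X >= overlap) for sampling without replacement.

    population: number of genes in the background (e.g. all genes tested)
    successes:  background genes carrying the annotation of interest
    draws:      size of the gene list being tested (e.g. the DDG list)
    overlap:    annotated genes observed in that list
    """
    if not 0 <= successes <= population or not 0 <= draws <= population:
        raise ValueError("successes and draws must lie between 0 and population")
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Toy numbers only (NOT values from the study): 10 background genes,
# 5 of them annotated, and a list of 5 genes that are all annotated.
toy_p = hypergeom_upper_tail(overlap=5, draws=5, successes=5, population=10)
print(toy_p)
```

      For the IMPC comparison, draws would be 408 and overlap 49; successes (knockouts with a significant whole-body BMD effect) and population (all knockouts tested for BMD) would have to be taken from the IMPC itself, since neither value is reported here.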

      We thank the reviewer for their constructive feedback and thoughtful questions. With regard to iterativeWGCNA, a larger number of modules is sometimes an outcome of the analysis, as reported in the iterativeWGCNA preprint (Greenfest-Allen et al., 2017). While we did not make a comparison to other works leveraging this tool for scRNA-seq, it has been used broadly across other published studies, such as PMID: 39640571, 40075303, 33677398, 33653874. While model overfitting, as you mention, may be a cause for more modules, the Bayesian network analysis we perform after iterativeWGCNA highlights smaller aspects of coexpression modules, as opposed to focusing on the entirety of any given module.

      We did not perform enrichment or statistical tests as our goal was to simply highlight attributes or unique features of these genes for additional context.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Farber and colleagues have performed single-cell RNAseq analysis on bone marrow-derived stem cells from DO Mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach. 

      Strengths: 

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting. 

      Weaknesses: 

      The manuscript is in parts hard to read due to the use of acronyms and there are some questions about data analysis that need to be addressed. 

      We thank the reviewer for their feedback and shared enthusiasm for our work. We tried to minimize the use of technical acronyms as much as we could without compromising readability. Additionally, we addressed questions regarding aspects of data analysis. 

      Reviewer #1 (Recommendations for the authors):

      (1) For increased transparency and to allow reproducibility, it would be necessary for the scripts used in the analysis to be shared along with the publication of the preprint. Also, where feasible, sharing the processed data in addition to the raw data would allow the community greater access to the results and be highly beneficial. 

      Thank you for this suggestion. The raw data will be available via GEO accession codes listed in the data availability statement. We will make available scripts for some analyses on our GitHub (https://github.com/Farber-Lab/DO80_project) and processed scRNA-seq data in a Seurat object (.rds) on Zenodo (https://zenodo.org/records/15299631).

      (2) Lines 55-76: I think the summary of previous work here is too long. I understand that they would like to cover what has been done previously, but this seems like overkill. 

      Good suggestion. We have streamlined some of the summary of our previous work.

      (3) Did the authors try to map QTL for cell-type proportion differences in their BMSC-OBs? While 80 samples certainly limit mapping power, the data shown in Figs 4C/D suggest that you might identify a large-effect modifier of LMP/OB1 proportions. 

      We did try to map QTL for cell type proportion differences, but no significant associations were identified. 

      (4) Methods question: Does the read alignment method used in your analysis account for SNPs/indels that segregate among the DO/CC founder strains? If not, the authors may wish to include this in their discussion of study limitations and speculate on how unmapped reads could affect expression results. 

      The read alignment method we used does not account for SNPs/indels from the DO founder strains that fall in RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (lines 422-424). 

      (5) Much of the discussion reads as an overview of the methods, while a discussion of the results and their context to the existing BMD literature is relatively lacking in comparison.

      We have added additional explanation of the results and context to the discussion (lines 381-382, 396-407). 

      (6) Figure 1E and lines 146-149: Adjusted p values should be reported in the figure and accompanying text instead of switching between unadjusted and adjusted p values. 

      We updated Figure 1e to portray adjusted p-values, listed the adjusted p-values in the legend of Figure 1e, and listed them in the main text (lines 153-154).

      (7) Why do the authors bring the IMPC KO gene list into the analysis so late? This seems like a highly relevant data resource (moreso than the GTEx eQTLs/sQTLs) that could have been used much earlier to help identify DDGs. 

      Given that our scRNA-seq data is also from mice, we did choose to integrate information from the IMPC to highlight supplemental features of genes in networks (i.e., genes that have an experimentally-tested and significant effect on BMD in mice). However, our primary goal was to inform human GWAS and leverage our previous work in which we identified colocalizations between human BMD GWAS and eQTL/sQTL in a human GTEx tissue, which is why this information was used to guide our network analysis.

      (8) Does Fgfrl1 and/or Tpx2 have a cis-eQTL in your BMSC-OB scRNA-seq dataset? 

      We did not identify cis-eQTL effects for Fgfrl1 and Tpx2.

      (9) Figure 4B-C: These eQTLs may be real, but based on the diplotype patterns in Figure 4C, I suspect they are artifacts of low mapping power that are driven by rare genotype classes with one or two samples having outlier expression results. For example, if you look at the results in Fig 4C for S100a1 expression, the genotype classes with the highest/lowest expression have lower sample numbers. In the case of Pkm eQTL showing a PWK-low effect, the PWK genome has many SNPs that differ from the reference genome in the 3' UTR of this gene, and I wonder if reads overlapping these SNPs are not aligning correctly (see point 4 above) and resulting (falsely) in lower expression values for samples with a PWK haplotype. 

      As mentioned above, our alignment method did not consider DO founder genetic variation that is specifically located in the 3’ end of RNA transcripts in the scRNA-seq data. We have included this as a limitation in our discussion (lines 422-424).

      In future studies, we intend to include larger populations of mice to potentially overcome, as you mention, any artifacts that may be attributable to low statistical power, rare genotype classes, or outlier expression.

      Reviewer #2 (Recommendations for the authors):

      Major Points 

      (1) The authors hypothesize "that many genes impacting BMD do so by influencing osteogenic differentiation or possibly bone marrow adipogenic differentiation". However, cell type itself does not correlate with any bone trait. Does this indicate that the hypothesis is not entirely correct, as genes that drive these phenotypes would not be enriched in one particular cell type? The authors have previously identified "high-priority target genes". So, are there any cell types that are enriched for these target genes? If not, this would indicate that all these genes are more ubiquitously expressed and this is probably why they would have a greater effect on the overall bone traits. Furthermore, are the 73 eGenes (so genes with eQTLs in a particular cell type that change around cell type boundaries) or the DDGs (Table 1) enriched for these high-priority target genes? 

      The bone traits measured in the DO mice are complex and impacted by many factors, including the differentiation propensity and abundance of certain cell types, both within and outside of bone. Though we did not identify correlations between cell type abundance and the bone traits we measured, we tailored our investigations to focus on cellular differentiation using the scRNA-seq data. However, future studies would need to be performed to investigate any connections between cellular differentiation, cell type abundance, and bone traits.

      We did not perform enrichment analyses of either the target genes identified from our other work or eGenes identified here, but instead used the target gene list to center our network analysis and the eGenes to showcase the utility of the DO mouse population.

      (2) The readability of the paper could be improved by minimising the use of acronyms and there are several instances of confusing wording throughout the paper. In many cases, this can be solved by re-organising sentences and adding a bit more detail. For example, it was unclear how you arrived at Fgfrl1 or Tpx2.

      One of the goals of our study was to identify genes that have (to our knowledge) little to no known connection to BMD. We chose to highlight Fgfrl1 and Tpx2 because there is minimal literature characterizing these genes in the context of bone, which we speak to in the results (lines 296-297). Additionally, we prioritized these genes in our previous work, and they were identified in this study through our network analyses of the scRNA-seq data, which we mention in the results (lines 276-279).

      (3) Technical aspects of the assay. In Figure 1d you show that the cell populations vary considerably between different DO mice. It would be useful to give some sense of the technical variance of this assay given that the assay involves culturing the cells in an exogenous environment. This could take the form of tests between mice within the same inbred strain, or even between different legs of the same DO mice to show that results are technically very consistent. It might also be prudent to identify that this is a potential limitation of the approach as in vitro culturing has the potential to substantially change the cell populations that are present. 

      We agree that in vitro culturing and the preparation of single cells for scRNA-seq are unavoidable sources of technical variation in this study. However, the total number of cells contributed by each of the 80 DO mice after data processing does not appear to be skewed and the distribution appears normal (see added figures, now included as Supplemental Figure 3). Therefore, technical variation is at least consistent across all samples. Nevertheless, we have mentioned the potential for technical variation artifacts in our study in the discussion (lines 414-416).

      (4) Need for permutation testing. "We identified 563 genes regulated by a significant eQTL in specific cell types. In total, 73 genes with eQTLs were also tradeSeq-identified genes in one or more cell type boundaries". These types of statements are fine but they need to be backed up with permutation testing to show that this level of enrichment is greater than one would expect by chance. 

      We did not perform enrichment tests because our only goals were (1) to determine whether eQTL could be resolved in the DO mouse population using our scRNA-seq data and (2) to predict the cell type in which each eQTL and its associated eGene may have an effect.

      (5) The main novelty of the paper seems to be that you have used single-cell RNA seq (given that you appear to have already detailed the candidates at the end). I don't think this makes the paper less interesting, but I think you need to reframe the paper more about the approach, and not the specific results. How you landed on these candidates is also not clear. So the paper might be improved by more robustly establishing the workflow and providing guidelines for how studies like this should be conducted in the future. 

      We sought to not only devise a rigorous approach to analyze our single cell data, but also showcase the utility of the approach in practice by highlighting targets for future research (i.e., Fgfrl1 and Tpx2).

      Our goal was to identify novel genes, and we landed on these candidate genes (Fgfrl1 and Tpx2) because they had substantial data supporting their causality and have yet to be fully characterized in the context of bone and BMD (lines 295-297).

      With regard to establishing the workflow, we have included the rationale for specific aspects of our approach throughout the paper. For example, Figure 2 itemizes each step of our network analysis, and we explain why each step is used in various parts of the results (e.g., lines 168-170, 179-181, 191-193, 202-203, 257-260, 276-277).

      We have added a statement advocating for large-scale scRNA-seq from genetically diverse samples and network analyses for future studies (lines 436-438).

      Minor Points 

      (1) In the summary you use the word "trajectory". Trajectories for what? I assume the transition between cell types, but this is not clear. 

      We added text to clarify the use of trajectory in the summary (line 34).

      (2) This sentence: "By identifying networks enriched for genes implicated in GWAS we predicted putatively causal genes for hundreds of BMD associations based on their membership in enriched modules." is also not clear. Do you mean: "we predicted putatively causal genes by identifying clusters of co-expressed genes that were enriched for GWAS genes"? It is not clear how you identify the causal gene in the network. Is this just based on the hub gene? 

      The aforementioned sentence has since been removed to streamline the introduction, as suggested by Reviewer 1.

      With regard to causal gene identification, it is not based on whether a gene is a hub gene. We prioritized a DDG (and its associated network) if it was a causal gene that we identified in our previous work as having an eQTL/sQTL in a GTEx tissue that colocalizes with human BMD GWAS.

      (3) Figure 3C. This is good but the labels are quite small. Would be good to make all the font sizes larger. 

      We have enlarged Figure 3C.

      (4) Line 341 in the Discussion should be "pseudotemporal". 

      We have edited “temporal” to “pseudotemporal”.

    1. eLife Assessment

      This study presents a valuable finding on the neural representation of time from two distinct egocentric and allocentric reference frames. The evidence is solid and largely supports the hypothesis, with one caveat that the task differences could impact the observed effects. The work will be of interest to cognitive neuroscientists working on the perception and memory of time.

    2. Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible temporal construals. For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and participants were instructed to consider the event from "an internal" or from "an external" perspective. The authors found distinct patterns of brain activity in the posterior parietal cortex (PPC) and anterior hippocampus for the internal and the external viewpoint. Specifically, activation in the posterior parietal cortex positively correlated with distance during the external-perspective task, but negatively during the internal-perspective task. The anterior hippocampus positively correlated with distance in both perspectives. The authors conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are supported by the parietal cortex.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and the work tackles them from the perspective of construals theory.

      Weaknesses:

      Although the work uses two distinct psychological tasks, the authors do not elaborate on the cognitive operationalization the tasks entail, nor the implication of the task design for the observed neural activation.

    3. Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      Suggestions:

      The authors have done a commendable job addressing my previous comments. In particular, the additional analyses elucidating the potential contribution of boundary effects to the behavioural data, the impact of incorporating RT into the fMRI GLMs, and the differential contributions of RT and sequential distance to neural activity (i.e., in PPC) are valuable and strengthen the authors' interpretation of their findings.

      My one remaining suggestion pertains to the potential contribution of boundary effects. While the new analyses suggest that the RT findings are driven by sequential distance and duration independent of a boundary effect (i.e., Same vs. Different factor), I'm wondering whether the same applies to the neural findings? In other words, have the authors run a GLM in which the Same vs. Different factor is incorporated alongside distance and duration?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible "temporal construals". For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and they were instructed to consider the event from "an internal" or from "an external" perspective. The authors found opposite patterns of brain activity in the posterior parietal cortex and the anterior hippocampus for the internal and the external viewpoint. They conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are used in the parietal cortex. The claims align with previous fMRI work addressing this question.

      We appreciate the reviewer's concise summary of our research. We would like to offer two clarifications to prevent any potential misunderstandings.

      First, the activity patterns in the parietal cortex and hippocampus are not entirely opposite across internal and external perspectives. Specifically, the activation level in the posterior parietal cortex shows a positive correlation with sequential distance during external-perspective tasks, but a negative correlation during internal-perspective tasks. In contrast, the activation level in the anterior hippocampus positively correlates with sequential distance, irrespective of the observer's perspective. Therefore, our results suggest that the parietal cortex, with its perspective-dependent activity, supports egocentric representation; the hippocampus, with its consistent activity across perspectives, supports allocentric representation.

      Second, while some of our findings align with previous fMRI studies, to our knowledge, no prior research has explicitly investigated how the neural representation of time may vary depending on the observer's viewpoint. This gap in the literature is the primary motivation for our current study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and this work seems to want to tackle some of them.

      We appreciate the reviewer's acknowledgment of the theoretical significance of our study.

      Weaknesses:

      The current writing is fuzzy both conceptually and experimentally. I cannot provide a sufficiently well-informed assessment of the quality of the experimental work because there is a paucity of details provided in the report. Any future revisions will likely improve transparency.

      (1) Improving writing and presentation:

      The abstract and the introduction make use of loaded terms such as "construals", "mental timeline", "panoramic views" in very metaphoric and unexplained ways. The authors do not provide a comprehensive and scholarly overview of these terms, which results in verbiage and keywords/name-dropping without a clear general framework being presented. Some of these terms are not metaphors. They do refer to computational concepts that the authors should didactically explain to their readership. This is all the more important given that some statements in the Introduction are misattributed or factually incorrect; some statements lack attributions (uncited published work). Once the theory, the question, and the working hypothesis are clarified, the authors should carefully explain the task.

      We appreciate the reviewer's critique.

      The formulation of the scientific question in the introduction is grounded in the spatial construals of time hypothesis and conceptual metaphor theory (e.g., Traugott, 1978; Lakoff & Johnson, 1980; see recent reviews by Núñez & Cooperrider, 2013; Bender & Beller, 2014). These frameworks were originally developed through analyses of how spatial metaphors are used to describe temporal concepts in natural language. Consequently, it is theoretically motivated and largely unavoidable to introduce the two primary temporal construals—mental time travel and mental time watching— using metaphorical expressions.

      However, we do agree with the reviewer that the introduction in the original manuscript was overly long and that the working hypothesis was not clearly stated. In the revised manuscript, we have streamlined the introduction and substantially revised the following two paragraphs to clarify the formulation of our working hypothesis (Pages 5-6):

      “Recent studies have already begun to investigate the neural representation of the memorized event sequence (e.g., Deuker et al., 2016; Thavabalasingam et al., 2018; Bellmund et al., 2019, 2022; see reviews by Cohn-Sheehy & Ranganath, 2017; Bellmund et al., 2020). Yet, the neural mechanisms that enable the brain to construct distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which differentiates the neural representation in allocentric and egocentric reference frames. According to an influential neurocomputational model (Byrne et al., 2007; Bicanski & Burgess, 2018; Bicanski & Burgess, 2020), allocentric and egocentric spatial representations are dissociable in the brain—they are respectively implemented in the medial temporal lobe (MTL)—including the hippocampus—and the parietal cortex. Various egocentric representations in the parietal cortex derived from different viewpoints can be transformed and integrated into a unified allocentric representation and stored in the MTL (i.e., bottom-up process). Conversely, the allocentric representation in the MTL can serve as a template for reconstructing diverse egocentric representations across different viewpoints in the parietal cortex (i.e., top-down process).”

      “In line with the spatial construals of time hypothesis, several authors have recently suggested that such mutually engaged egocentric and allocentric reference frames (in the parietal cortex and the medial temporal lobe, respectively) proposed in the spatial domain might also apply to the temporal one (e.g., Gauthier & van Wassenhove, 2016ab; Gauthier et al., 2019, 2020; Bottini & Doeller, 2020). If this hypothesis holds, it could explain how the brain flexibly generates diverse construals of the same event sequence. Specifically, the hippocampus may encode a consistent representation of an event sequence that is independent of whether an individual adopts an internal or external perspective, reflecting an allocentric representation of time. In contrast, parietal cortical representations are expected to vary flexibly with the adopted perspective that is shaped by task demands, reflecting an egocentric representation of time.”

      In the revised manuscript, we also corrected statements in the Introduction that may have been misattributed (see Reviewer 2, comment 4(ii)) and added several relevant and important publications.

      (2) The experimental approach lacks sufficient details to be comprehensible to a general audience. In my opinion, the results are thus currently uninterpretable. I highlight only a couple of specific points (out of many). I recommend revision and clarification.

      (a) No explanation of the narrative is being provided. The authors report a distribution of durations with no clear description of the actual sequence of events. The authors should provide the text that was used, how they controlled for low-level and high-level linguistic confounds.

      We thank the reviewer for the suggestions. The event sequence for the odd-numbered participants is shown in the original Figure 1. In the revised manuscript, we added to Figure 1 the figure supplement 1 to illustrate the actual sequence of events for the participants with both odd and even numbers. We also added the narratives used in the reading phase of the learning procedures for the participants with both odd and even numbers (Figure 1—source data 1).

      To control for low-level linguistic confounds, we included the number of syllables as a covariate in the first-level general linear model in the fMRI analysis. To address high-level linguistic confounds, such as semantic information (which is difficult to quantify), we randomly assigned event labels to the 15 events twice, creating two counterbalanced versions for participants with even and odd numbers (see Comment 2b below).

      (b) The authors state, "we randomly assigned 15 phrases to the events twice". It is impossible to comprehend what this means. Were these considered stimuli? Controls? It is also not clear which event or stimulus is part of the "learning set" and whether these were indicated to be such to participants.

      We apologize for any confusion in the Results section and the legend of Figure 1. Our motivation was explained in the "Stimuli" section of the Methods. In the revised manuscript, we have clarified this by adding an explanation to the legend of Figure 1 and including Figure 1—figure supplement 1: "To minimize potential confounds between the semantic content of the event phrases and the temporal structure of the events, we randomly assigned the phrases to the events, creating two versions for participants with even and odd ID numbers. Both versions can be seen in Figure 1—figure supplement 1 and Figure 1—source data 1."

      (c) The left/right counterbalancing is not being clearly explained. The authors state that there is counterbalancing, but do not sufficiently explain what it means concretely in the experiment. If a weak correlation exists between sequential position and distance, it also means that the position and the distance have not been equated within. How do the authors control for these?

      We thank the reviewer for highlighting this point and apologize for the lack of clarity in the original manuscript. In the current version (Page 40), we have provided further clarification: “We carefully selected two sets of 20 event pairs from the 210 possible combinations, assigning them to the odd and even runs of the fMRI experiment. Using a brute-force search, we identified 20 pairs in which sequential distance showed only weak correlations with positional information for both reference and target events (ranging from 1 to 15), as well as with behavioral responses (Same vs. Different or Future vs. Past, coded as 0 and 1), with all correlation coefficients below 0.2. At the same time, we balanced the proportion of correct responses across conditions: for the external-perspective task, Same/Different = 11/9 and 12/8; for the internal-perspective task, Future/Past = 12/8 and 8/12. Under these constraints, the sequential distances in both sets ranged from 1 to 5. To further mitigate spatial response biases, we pseudorandomized the left/right on-screen positions of the two response options within each task block, while ensuring an equal number of correct responses mapped to the left and right buttons (i.e., 10 per block).”
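For illustration only, the selection criterion described above can be sketched in Python. This is not the authors' code: it swaps the brute-force search for random sampling with a fixed seed, restricts candidates to sequential distances of 1 to 5 in advance, checks only the Future/Past response, and omits the response-balancing constraints. All function and variable names are hypothetical.

```python
import itertools
import math
import random

# Assumed settings taken from the description: 15 events, 20 pairs, |r| < 0.2.
N_EVENTS, MAX_DISTANCE, N_PAIRS, THRESHOLD = 15, 5, 20, 0.2

def pearson(x, y):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def candidate_pairs():
    """Ordered (reference, target) pairs with sequential distance 1..MAX_DISTANCE."""
    events = range(1, N_EVENTS + 1)
    return [(r, t) for r, t in itertools.permutations(events, 2)
            if abs(t - r) <= MAX_DISTANCE]

def max_abs_corr(pairs):
    """Largest |r| between distance and reference position, target position, response."""
    dist = [abs(t - r) for r, t in pairs]
    others = (
        [r for r, _ in pairs],                  # reference position
        [t for _, t in pairs],                  # target position
        [1 if t > r else 0 for r, t in pairs],  # Future/Past; coding direction does not change |r|
    )
    rs = [pearson(dist, o) for o in others]
    if any(r is None for r in rs):
        return float("inf")
    return max(abs(r) for r in rs)

def find_pair_set(seed=0, max_tries=100_000):
    """Sample pair sets at random until one satisfies the correlation threshold."""
    rng = random.Random(seed)
    pool = candidate_pairs()
    for _ in range(max_tries):
        sample = rng.sample(pool, N_PAIRS)
        if max_abs_corr(sample) < THRESHOLD:
            return sample
    return None

pairs = find_pair_set()
```

A 0/1-coded response correlated with a continuous variable in this way is a point-biserial correlation, so the same helper covers both positional and response checks.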

      The event pairs we selected already represent the best possible choice given all the criteria we aimed to satisfy. It is impossible to completely eliminate all potential correlations. For instance, if the target event occurs near the beginning of the day, it will tend to fall in the past, whereas if it occurs near the end of the day, it is more likely to fall in the future. To further ensure that the significant results were not driven by these weak confounding factors, we constructed another GLM that included three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.

      (d) The authors used two tasks. In the "external perspective" one, the authors asked participants to report whether events were part of the same or a different part of the day. In the "internal perspective" one, the authors asked participants to project themselves to the reference event and to determine whether the target event occurred before or after the projected viewpoint. The first task is a same/different recognition task. The second task is a temporal order task (e.g., Arzy et al. 2009). These two tasks are radically different and do not require the same operationalization. The authors should minimally provide a comprehensive comparison of task requirements, their operationalization, and, more importantly, assess the behavioral biases inherent to each of these tasks that may confound brain activity observed with fMRI.

      We understand the reviewer’s concern. We agree that there is a substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Rather, the neural correlates of temporal distance were first identified as brain regions showing a significant correlation between neural activity and temporal distance using the parametric modulation analysis. We then compared these neural correlates between the two tasks. Therefore, any general differences between the tasks should not be a confound for our main results. Our aim was to examine whether the hippocampal representation of temporal distance remains consistent across different perspectives, and whether the parietal representation of temporal distance varies as a function of the perspective adopted.

      Therefore, the main aim of our task manipulation was to ensure that participants adopted either an external or an internal perspective on the event sequence, depending on the task condition. In the Introduction (Pages 6–7), we clarify this manipulation as follows: “In the external-perspective task, participants localized events with respect to external temporal boundaries, judging whether the target event occurred in the same or a different part of the day as the reference event. In the internal-perspective task, participants were instructed to mentally project themselves into the reference event and localize the target event relative to their own temporal point, judging whether the target event happened in the future or the past of the reference event (see Methods for details of the scanning procedure).”

      We believe this task manipulation was successful. Behaviorally, the two tasks showed opposite correlations between reaction time and temporal distance, resembling the symbolic distance versus mental scanning effect. Neurally, contrasting the internal- and external-perspective tasks revealed activation of the default mode network, which is known to play a central role in self-projection (Buckner et al., 2017).

      (e) The authors systematically report interpreted results, not factual data. For instance, while not showing the results on behavioral outcomes, the authors directly interpret them as symbolic distance effects.

      Thank you for this comment. In the original paper, we reported the relevant statistics before our interpretation: “Sequential Distance was correlated positively with RT in the external-perspective task (z = 3.80, p < 0.001) but negatively in the internal-perspective task (z = -3.71, p < 0.001).” However, these statistics may have been difficult to notice, so we have added a figure for the RT analysis to the revised manuscript.

      Crucially, the authors do not comment on the obvious differences in task difficulty in these two tasks, which demonstrates a substantial lack of control in the experimental design. The same/different task (task 1 called "external perspective") comes with known biases in psychophysics that are not present in the temporal order task (task 2 called "internal perspective"). The authors also did not discuss or try to match the performance level in these two tasks. Accordingly, the authors claim that participants had greater accuracy in the external (same/different) task than in the internal task, although no data are shown and provided to support this report. Further, the behavioral effect is trivialized by the report of a performance accuracy trade-off that further illustrates that there is a difference in the task requirements, preventing accurate comparison of the two tasks.

      As noted in Question 2d, we acknowledge the substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Instead, we first identified the neural correlates of temporal distance as brain regions showing a significant correlation between neural activity and temporal distance, independent of task demands. We then compared these neural correlates across the two task conditions, which were designed to engage different temporal perspectives. Therefore, any general differences between the tasks should not be a confound for our main findings and interpretation.

      Our aim was to investigate whether the hippocampal representation of temporal distance remains consistent across different perspectives and whether the parietal representation of temporal distance varies as a function of the perspective adopted. We do not see how this double-dissociation pattern could be explained by differences in task difficulty.

      While we do not consider the overall difference in task difficulty between the two tasks to be a confounding factor, we acknowledge the potential confound posed by variations in task difficulty across temporal distances (1 to 5). This concern arises from the similarity between the activity patterns in the posterior parietal cortex and reaction time across temporal distances. To address this, we conducted control analyses to test this hypothesis (see the second and third points from Reviewer 2 for details).

      On page 8, we present the behavioral accuracy data: “Participants showed significantly higher accuracy in the external-perspective task than in the internal-perspective task (external-perspective task: M = 93.5%, SD = 4.7%; internal-perspective task: M = 89.5%, SD = 8.1%; paired t(31) = 3.33, p = 0.002).”

      All fMRI contrasts are also confounded by this experimental shortcoming, seeing as they are all reported at the interaction level across a task. For instance, in Figure 4, the authors report a significant beta difference between internal and external tasks. It is impossible to disentangle whether this effect is simply due to task difference or to an actual processing of the duration that differs across tasks, or to the nature of the representation (the most difficult to tackle, and the one chosen by the authors).

      We thank the reviewer for pointing out this important issue. Like temporal distance, the neural correlates of duration were not derived from a direct contrast between the two tasks. Instead, they were identified by detecting brain regions showing a significant correlation between neural activity and the implied duration of each event using the parametric modulation analysis. Therefore, what is shown in Figure 4 reflects the significant differences in these neural correlations with duration between the two tasks.

      The observed difference in the neural representation of duration between the two tasks was unexpected. In the original manuscript, we provided a post hoc explanation: “Since the external-perspective task in the current study encouraged the participants to compare the event sequence with the external parallel temporal landmarks, duration representation in the hippocampus may be dampened.”

      However, we agree that this difference might also arise from other factors distinguishing the two tasks. In the revised manuscript, we have clarified this possibility as follows: “The difference in duration representation between the two tasks remains open to interpretation. One possible explanation is that the hippocampus is preferentially involved in memory for durations embedded within event sequences (see review by Lee et al., 2020). In the internal-perspective task, participants indeed localized events within the event sequence itself. In contrast, the external-perspective task encouraged participants to compare the event sequence with external temporal landmarks, which may have attenuated the hippocampal representation of duration.”

      Conclusion:

      In conclusion, the current experimental work is confounded and lacks controls. Any behavioral or fMRI contrasts between the two proposed tasks can be parsimoniously accounted for by difficulty or attentional differences, not the claim of representational differences being argued for here.

      We hope that our explanations and clarifications above adequately address the reviewer’s concerns. We would like to reiterate that we did not directly compare the two tasks. Rather, we first identified the neural representations of sequential distance and duration, and then examined how these representations differed across tasks. It is unclear to us how the overall difference in task difficulty or attentional demands could lead to the observed pattern of results.

      By determining where the neural representations were consistent and where they diverged, we were able to differentiate brain regions that encode temporal information allocentrically from those that represent temporal information in a perspective-dependent manner, modulated by task demands.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task, but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out, and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses, as well as neuroimaging analyses, is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses/Suggestions:

      Although the design and analysis choices are generally solid, there are a few finer details/nuances that merit further clarification or consideration in order to strengthen the readers' confidence in the authors' interpretation of their data.

      (1) Given the known behavioural and neural effects of boundaries in sequence memory, I was wondering whether the number of traversed context boundaries (i.e., between morning-afternoon, and afternoon-evening) was controlled for across sequential length in the internal perspective condition? Or, was it the case that reference-target event pairs with higher sequential numbers were more likely to span across two parts of the day compared to lower sequential numbers? Similarly, did the authors examine any potential differences, whether behaviourally or neurally, for day part same vs. day part different external task trials?

      We thank the reviewer for the thoughtful comments. When we designed the experiment, we minimized the correlation between the sequential distance between the target and reference events and whether the reference and target events occurred within the same or different parts of the day (coded as Same = 0, Different = 1). The point-biserial correlation coefficient between these two variables across all the trials within the same run was kept below 0.2.

      To investigate the effect of day-part boundaries on behavior, as well as the contribution of other factors, we conducted a new linear mixed-effects model analysis incorporating four additional variables: whether the target and the reference events are within the same or different parts of the day (i.e., Same vs. Different), whether the target event is in the future or the past of the reference event (i.e., Future vs. Past), and the interactions of these two factors with Task Type (i.e., internal- vs. external-perspective task).

      The results are largely the same as those in the original table: there was a significant main effect of Syllable Length, and the interactions between Task Type and Sequence Distance and between Task Type and Duration remained significant. In addition, we found a significant interaction between Task Type and Same vs. Different.

      As shown in Figure 2—figure supplement 1, this Same vs. Different effect was in line with the effect of Sequential Distance, with two events in the same and different parts of the day corresponding to the short and long sequential distances. Given that Sequential Distance had already been considered in the model, the effect of parts of the day should result from the boundary effect across day parts or the chunking effect within day parts, i.e., the sequential distance across different parts of the day was perceived as longer, while the sequential distance within the same part of the day was perceived as shorter. We have incorporated these findings into the manuscript.

      Neurally, to further verify that the significant effects of sequential distance were not driven by its weak correlation with the Same/Different judgment or other potential confounding factors, we constructed another GLM that incorporated three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.

      (2) I would appreciate further insight into the authors' decision to model their task trials as stick functions with duration 0 in their GLMs, as opposed to boxcar functions with varying durations, given the potential benefits of the latter (e.g., Grinband et al., 2008). I concur that in certain paradigms, RT is considered a potential confound and is taken into account as a nuisance covariate (as the authors have done here). However, given that RTs appear to be critical to the authors' interpretation of participant behavioural performance, it would imply that variations in RT actually reflect variations in cognitive processes of interest, and hence, it may be worth modelling trials as boxcar functions with varying durations.

      We appreciate the reviewer’s insightful comment on this important issue. Whether to control for RT’s influence on fMRI activation is indeed a long-standing paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For example, regions within the multiple-demand network are often positively correlated with RT across different cognitive domains.

      Our strategy in the manuscript is to first present the results without including RT as a control variable and then examine whether the effects are preserved after controlling for RT. In the revised manuscript, we have clarified this approach (Page 13): “Here, changes in activity levels within the PPC were found to align with RT. Whether to control for RT’s influence on fMRI activation represents a well-known paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For instance, regions within the multiple-demand network are often positively correlated with RT and task difficulty across diverse cognitive domains (e.g., Fedorenko et al., 2013; Mumford et al., 2024). To evaluate the second possibility, we conducted an additional control analysis by including trial-by-trial RT as a parametric modulator in the first-level model (see Methods). Notably, the same PPC region remained the only area in the entire brain showing a significant interaction between Task Type and Sequential Distance (voxel-level p < 0.001, cluster-level FWE-corrected p < 0.05). This finding indicates that PPC activity cannot be fully attributed to RT. Furthermore, we do not interpret the effect as reflecting a domain-general RT influence, as regions within the multiple-demand system—typically sensitive to RT and task difficulty—did not exhibit significant activation in our data.”

      The reason we did not use boxcar functions with varying durations in our original manuscript is that we also applied parametric modulation in the same model. In the parametric modulation, all parametric modulators inherit the onsets and durations of the events being modulated. Consequently, the modulators would also take the form of boxcar functions rather than stick functions—the height of each boxcar reflecting the parameter value and its length reflecting the RT. We were uncertain whether this approach would be appropriate, as we have not encountered other studies implementing parametric modulation in this manner.
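To make the distinction concrete, the following Python sketch builds the pre-convolution regressors for a trial regressor and its parametric modulator under both a stick (duration 0) and a boxcar (duration = RT) specification. It is a simplified illustration, not the analysis pipeline used in the study: HRF convolution is omitted, the modulator is mean-centered as a common convention, and the onsets, distances, and RTs are made-up values.

```python
def build_regressors(onsets, durations, modulator, n_samples, dt=0.1):
    """Return (main, pmod) regressors sampled every dt seconds, before HRF convolution.

    The parametric modulator inherits each trial's onset and duration, so with
    boxcar durations its area depends on both the parameter value and the duration.
    """
    mean_mod = sum(modulator) / len(modulator)
    main = [0.0] * n_samples
    pmod = [0.0] * n_samples
    for onset, dur, m in zip(onsets, durations, modulator):
        start = int(round(onset / dt))
        width = max(1, int(round(dur / dt)))  # duration 0 becomes a one-sample stick
        for i in range(start, min(start + width, n_samples)):
            main[i] = 1.0
            pmod[i] = m - mean_mod            # mean-centered parameter value
    return main, pmod

# Hypothetical trials: onsets in seconds, sequential distances, RTs in seconds.
onsets, distances, rts = [1.0, 5.0], [1, 5], [0.8, 1.6]

stick_main, stick_pmod = build_regressors(onsets, [0, 0], distances, n_samples=100)
box_main, box_pmod = build_regressors(onsets, rts, distances, n_samples=100)
```

In the stick version the modulator contributes one sample per trial, whereas in the boxcar version a slow trial contributes proportionally more signal for the same parameter value, which is the coupling between parameter and RT described above.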

      For exploratory purposes, we also conducted a first-level analysis using boxcar functions with variable durations. The same PPC region remained the area showing the strongest interaction between Task Type and Sequential Distance in the entire brain. However, the cluster size was slightly reduced (voxel-level p < 0.001, cluster-level FWE-corrected p = 0.0610; see Author response image 1 below). The cross indicates the MNI coordinates at [38, –69, 35], identical to those shown in the main results (Figure 4A).

      Author response image 1.

      (3) The activity pattern across tasks and sequential distance in the posterior parietal cortex appears to parallel the RT data. Have the authors examined potential relationships between the two (e.g., individual participant slopes for RT across sequential distance vs. activity betas in the posterior parietal cortex)?

      We thank the reviewer for this helpful suggestion. As shown in Author response image 2, the interaction between Task Type and Sequential Distance was a stronger predictor of PPC activation than of RT. Because PPC activation and RT are measured on different scales, we compared their standardized slopes (standardized β), which express the change in a dependent variable, in standard deviations, for a one-standard-deviation increase in an independent variable. The standardized β for the Task Type × Sequential Distance interaction was −0.30 (95% CI [−0.42, −0.19]) for PPC activation and −0.21 (95% CI [−0.30, −0.13]) for RT. The larger standardized effect for PPC activation indicates that the Task Type × Sequential Distance interaction was a stronger predictor of neural activation than of behavioral RT.
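As a minimal illustration of why standardized slopes allow comparison across measurement scales, the sketch below computes a standardized slope for a single-predictor regression. It assumes ordinary least squares rather than the mixed-effects model used in the response, and the data are invented solely to show that rescaling the outcome changes the raw slope but not the standardized one.

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Raw least-squares slope of y on x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Slope in SD units: change in y (in SDs) per one-SD increase in x."""
    return ols_slope(x, y) * stdev(x) / stdev(y)

# Hypothetical data: the same outcome expressed in milliseconds and in seconds.
distance = [1, 2, 3, 4, 5]
outcome_ms = [900.0, 950.0, 1010.0, 1040.0, 1100.0]
outcome_s = [v / 1000.0 for v in outcome_ms]

raw_ms, raw_s = ols_slope(distance, outcome_ms), ols_slope(distance, outcome_s)
std_ms, std_s = standardized_slope(distance, outcome_ms), standardized_slope(distance, outcome_s)
```

Here `raw_ms` and `raw_s` differ by a factor of 1000, while `std_ms` and `std_s` agree up to floating-point error.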

      Author response image 2.

      A more relevant question is whether PPC activation can be explained by temporal information (i.e., the sequential distance) independently of RT. To test this, we included both Sequential Distance and RT in the same linear mixed-effects model predicting PPC Activation Level. As shown in Author response table 1, although RT independently influenced PPC activation (F(1, 288) = 4.687, p = 0.031), the interaction between Task Type and Sequential Distance was a much stronger independent predictor (F(1, 290) = 19.319, p < 0.001).

      Author response table 1.

      PPC Activation Level Predicted by Sequential Distance and RT

      Linear Mixed Model Formula: PPC Activation Level ~ 1 + Task Type * (Sequential Distance + RT) + (1 | Participant)
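Read in the usual Wilkinson notation, where `A * (B + C)` expands to the main effects plus the `A:B` and `A:C` interactions, the formula corresponds to the model written out below. The symbols are introduced here for illustration and are not the authors' notation: $y_{ij}$ is PPC activation for observation $i$ of participant $j$, $T$ is Task Type, $D$ is Sequential Distance, $R$ is RT, and $u_j$ is a participant-specific random intercept.

```latex
y_{ij} = \beta_0 + \beta_1 T_{ij} + \beta_2 D_{ij} + \beta_3 R_{ij}
       + \beta_4 \, T_{ij} D_{ij} + \beta_5 \, T_{ij} R_{ij}
       + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this notation, the Task Type × Sequential Distance interaction discussed above is the coefficient $\beta_4$, estimated while RT and its interaction with Task Type ($\beta_3$, $\beta_5$) are held in the model.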

      (4) There were a few places in the manuscript where the writing/discussion of the wider literature could perhaps be tightened or expanded. For instance:

      (i) On page 16, the authors state 'The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study (Gauthier & van Wassenhove, 2016b). The authors found a similar region (the reported MNI coordinate of the peak voxel was 42, -70, 40, and the MNI coordinate of the peak voxel in the present study was 39, -70, 35), of which the activation level went up when the target event got closer to the self-positioned event. This finding aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.' Without providing a little more detail here about the Gauthier & van Wassenhove study and what participants were required to do (i.e., mentally position themselves at a temporal location and make 'occurred before' vs. 'occurred after' judgements of a target event), it could be a little tricky for readers to follow why this convergence in finding supports a role for the posterior parietal cortex in egocentric representations.

      We appreciate the reviewer’s comments. In the revised manuscript, we have provided a more detailed explanation of Gauthier and van Wassenhove’s study (Page 17): “The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study by Gauthier & van Wassenhove (2016b). In their study, the participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that time point. The authors identified a similar brain region (reported MNI coordinates of the peak voxel: 42, −70, 40), closely matching the activation observed in the present study (MNI coordinates of the peak voxel: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, which aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.”

      (ii) Although the authors discuss the Lee et al. (2020) review and related studies with respect to retrospective memory, it is critical to note that this work has also often used prospective paradigms, pointing towards sequential processing being the critical determinant of hippocampal involvement, rather than the distinction between retrospective vs. prospective processing.

      We sincerely thank the reviewer for highlighting these important points. In response, we have revised the section of the Introduction discussing the neural underpinnings of duration (Pages 3-4). “Neurocognitive evidence suggests that the neural representation of duration engages distinct brain systems. The motor system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the hippocampus is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      (iii) The authors make an interesting suggestion with respect to hippocampal longitudinal differences in the representation of event sequences, and may wish to relate this to Montagrin et al. (2024), who make an argument for the representation of distant goals in the anterior hippocampus and immediate goals in the posterior hippocampus.

      We thank the reviewer for bringing this intriguing and relevant study to our attention. We have incorporated it into the Discussion of the revised manuscript (Page 21): “Evidence from the spatial domain has suggested that the anterior hippocampus (or the ventral rodent hippocampus) implements global and gist-like representations (e.g., larger receptive fields), whereas the posterior hippocampus (or the dorsal rodent hippocampus) implements local and detailed ones (e.g., finer receptive fields) (e.g., Jung et al., 1994; Kjelstrup et al., 2008; Collin et al., 2015; see reviews by Poppenk et al., 2013; Robin & Moscovitch, 2017; see Strange et al., 2014 for a different opinion). Recent evidence further shows that the organizational principle observed along the hippocampal long axis may also extend to the temporal domain (Montagrin et al., 2024). In that study, the anterior hippocampus showed greater activation for remote goals, whereas the posterior hippocampus was more strongly engaged for current goals, which are presumed to be represented in finer detail.”

      Reviewing Editor Comments:

      While both reviewers acknowledged the significance of the topic, they raised several important concerns. We believe that providing conceptual clarification, adding important methodological details, as well as addressing potential confounds will further strengthen this paper.

      We thank the editor for the suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please, provide the actual ethical approval #.

      We have added the ethical approval number in the revised manuscript (P 36): “The ethical committee of the University of Trento approved the experimental protocol (Approval Number 2019-018),”

      (2) Thirty-two participants were tested. Please report how you estimated the sample size was sufficient to test your working hypothesis.

      We thank the editor for pointing out this omission. In the revised manuscript, we have added an explanation for our choice of sample size (p. 36): “The sample size was chosen to align with the upper range of participant numbers reported in previous fMRI studies that successfully detected sequence or distance effects in the hippocampus (N = 15–34; e.g., Morgan et al., 2011; Howard et al., 2014; Deuker et al., 2016; Garvert et al., 2017; Theves et al., 2019; Park et al., 2021; Cristoforetti et al., 2022).”

      (3) All MRI figures: please orient the reader; left/right should be stated.

      In the revised manuscript, we have added labels to all MRI figures to indicate the left and right hemispheres.

      (4) In Figure 3A-B, the clear lateralization of the activation is not discussed in the Results or in the Discussion. Was it predicted?

      We thank the editors for highlighting this important point regarding hemispheric lateralization. The right-lateralization observed in our findings is indeed consistent with previous literature. In the revised manuscript, we have expanded our discussion to emphasize this aspect more clearly.

      For the parietal cortex, we now note (Page 17-18): “The negative correlation between activation in the right posterior parietal cortex (PPC) and sequential distance has previously been reported in an fMRI study by Gauthier and van Wassenhove (2016b). In their paradigm, participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that point. The authors identified a similar region (peak voxel MNI coordinates: 42, −70, 40), closely corresponding to the activation observed in the present study (peak voxel MNI coordinates: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, consistent with evidence suggesting that the posterior parietal cortex supports egocentric representations. Neuropsychological studies have further shown that patients with lesions in the bilateral or right PPC exhibit ‘egocentric disorientation’ (Aguirre & D’Esposito, 1999), characterized by an inability to localize objects relative to themselves (e.g., Case 2: Levine et al., 1985; Patient DW: Stark, 1996; Patients MU: Wilson et al., 1997, 2005).”

      For the hippocampus, we have added (Page 19): “Previous research has shown that hippocampal activation correlates with distance (e.g., Morgan et al., 2011; Howard et al., 2014; Garvert et al., 2017; Theves et al., 2019; Viganò et al., 2023), and that distributed hippocampal activity encodes distance information (e.g., Deuker et al., 2016; Park et al., 2021). Most studies have reported hippocampal effects either bilaterally or predominantly in the right hemisphere, whereas only one study (Morgan et al., 2011) found the effect localized to the left hippocampus.”

    1. eLife Assessment

      This manuscript describes a useful integrated proteogenomics pipeline to enable the discovery of novel peptides in cancer cell lines. The method combines long-read RNA sequencing with a multi-protease digestion and proteomics approach. The method is a further development of the authors' previous approaches to identify cancer-specific peptides; however, the current study focuses on a single cell line, and the characterization remains incomplete and lacks validation for candidate alterations. The manuscript will be of interest to scientists focusing on identifying unique alterations in cancer cells.

    2. Reviewer #1 (Public review):

      In this study, the authors provide an integrated proteogenomics pipeline to enable the discovery of novel peptides in an Ewing sarcoma cell line (A673). To identify novel full-length resolved isoforms, they performed long-read RNA sequencing (Oxford Nanopore Technology). Then, to increase the chance of detecting Ewing-specific neopeptides, the authors combined two approaches: a multi-protease digestion and a multi-dimensional proteomics approach.

      Given the importance of novel isoforms and cryptic sites in neoantigen discovery and its putative applications in immunotherapy, this method and resource paper is of interest for the Ewing sarcoma community and potentially for a broader cancer audience. The originality of this paper relies mostly on this optimized method to discover novel peptides (long-read sequencing with multi-protease, multi-dimensional trapped ion mobility spectrometry parallel accumulation-serial fragmentation mass spectrometry). Although, to my knowledge, no study combining long-read sequencing and proteomics methods has been published on Ewing sarcoma, this study appears limited in a few respects:

      (1) The study is restricted to the analysis of a single cell line (A673). The authors should consider extending the analysis to other Ewing cell lines.

      (2) The characterization of the 1121 non-canonical transcripts can be improved. How many are just splice variants of known genes, and how many are bona fide neogenes? In this respect, the definition of what the authors call neogene is quite unclear. Is a transcript with a new exon reported as a neogene? Is a transcript with a new start site reported as a neogene? It should be clearly indicated which categories of Figure 4B are reported on Figure 4D. A general flow chart would be very useful to help follow the analysis process.

      (3) Similarly, the authors detect 3216 A673 specific proteins with no match in SwissProt. This number decreases to 72 "putative non-canonical proteoforms with unique peptides after BLASTp" against Uniprot. Again, a flow chart would conveniently enable one to follow the step-by-step analysis.

      (4) Finally, only 17 spectral matches are suggested to be derived from non-canonical proteoforms. It would be important to compare the spectrum of these detected peptides with that of synthetic peptides. Such an analysis would enable us to assess the number of reliably detected proteoforms that can be expected in an Ewing sarcoma cell line.

      (5) It is very unclear what the authors want to highlight in Supplementary Figure 5. Is it that non-canonical transcripts are broadly expressed in normal tissue? Which again raises the question of definitions of neogenes, non-canonical... Apparently, this figure shows that these non-canonical transcripts contain a large part of canonical sequences, which account for the strong signal in many normal tissues. A similar heatmap could be presented, including only the non-canonical sequences of the non-canonical transcripts. This figure should also include Ewing sarcoma samples.

    3. Reviewer #2 (Public review):

      The paper from Kulej et al. reports a set of tools for proteogenomic analysis of cancer proteomes. Their approach utilizes modern methods in long-read RNA sequencing to assemble a proteome database that is specific to Ewing sarcoma-derived A673 cells. To maximize proteome coverage and therefore increase the odds of detecting cancer-specific alterations at the protein level, the authors use multiple enzymes (trypsin, gluC, etc.) to digest cellular proteins and then perform multidimensional peptide fractionation. Peptide samples are then analyzed by LC-MS/MS using data-dependent and data-independent schemes on a timstof mass spectrometer. Proteogenomics is an important area of investigation for cancer research and does require new informatics tools.

      The authors describe an end-to-end workflow where they claim to have optimized four different steps:

      (1) Assembly of a sample-specific protein database using long-read transcriptomic data.

      (2) Use of 8 different proteolytic enzymes to maximize diversity of peptides.

      (3) Multiple stages of peptide fractionation using SCX and high pH rp chromatography.

      (4) Utilize acquisition methods on the timstof mass spec to provide MS/MS data from single-charged peptides and multiply-charged peptides.

      The authors published two earlier versions of ProteomeGenerator (versions 1 and 2) in the Journal of Proteome Research. In these earlier versions, 'ProteomeGenerator' was the set of software tools designed to integrate DNA and RNA sequencing to create a sample-specific protein database. To test the performance of each ProteomeGenerator version, the authors generated LC-MS/MS data using a combination of trypsin and LysC in one paper, and trypsin, LysC, and GluC in the other. In both papers, they performed some level of peptide fractionation prior to LC-MS/MS. They acquired LC-MS/MS data on a Thermo Q-Exactive in one paper and a Thermo Orbitrap mass spec in the other paper.

      In the current paper, the primary innovation is the use of long-read sequencing to potentially improve the quality of the sample-specific protein database. The other three components noted above are incremental compared to the authors' previous two papers and generally accepted practices in the field of proteomics. To note one example, the authors previously digested proteins using three enzymes and now use eight. Similarly, they are now using a Bruker timsTOF mass spec instead of one from Thermo. The detailed descriptions around the use of many enzymes and peptide fractionation, etc., create a very technically oriented paper, similar to or more so than the authors' earlier papers in J. Proteome Research. So, while there is enthusiasm for the use of long-read sequencing across biomedical research, the impact here for proteogenomic applications is somewhat lost amid the technical description of experimental details that are not particularly innovative. In this respect, the report is not well matched to a broad readership.

    4. Author response:

      We thank the editors and reviewers for their thoughtful, constructive, and fair evaluation of our manuscript. We appreciate the recognition of the value of an end-to-end proteogenomics framework integrating long-read transcriptomics with deep proteomic analysis, and we are grateful for the specific guidance on how to strengthen clarity, generality, and impact for a broad scientific readership. We outline below the key revisions we plan to undertake in response to the public reviews.

      Reviewer #1

      We thank the reviewer for their positive assessment of the relevance of this work to Ewing sarcoma and cancer proteogenomics.

      Scope and generality.

      We agree that analysis of a single cell line limits generalization. In the revised manuscript, we will extend the ProteomeGenerator3 workflow to additional tumor specimens, including Ewing sarcoma tumors, to assess reproducibility and biological relevance beyond a single test cancer cell line.

      Definitions and analytical clarity.

      We will clarify definitions of non-canonical transcripts, alternative splice isoforms, and neogenes, and explicitly distinguish these categories throughout the manuscript. We will add a summary flow diagram that tracks transcripts through classification, ORF prediction, and proteoform detection, clarifying how Figures 4B and 4D relate.

      Proteoform filtering and confidence.

      To improve transparency, we will add a step-wise schematic summarizing how candidate non-canonical proteoforms are filtered to a high-confidence subset, including SwissProt comparison, BLASTp filtering, peptide uniqueness, and competitive database searches.

      Validation.

      We agree that orthogonal validation is important. We will include additional analyses of non-canonical proteoforms detected recurrently in additional tumor specimens to provide an empirical estimate of reliably detectable non-canonical proteoforms.

      Supplementary Figure 5.

      We will revise the presentation and explanation of this figure to avoid misinterpretation, including analyses focused specifically on non-canonical sequence segments and inclusion of tumor samples for direct comparison.

      Reviewer #2

      We thank the reviewer for placing this work in context with our prior ProteomeGenerator publications and for their guidance on framing the manuscript for a broad audience.

      Emphasizing the central conceptual advance.

      We agree that the primary innovation is the use of long-read transcriptomics to generate sample-specific proteogenomic databases. In the revised manuscript, we will directly compare long-read-derived and short-read-derived databases applied to the same samples and proteomic data, explicitly demonstrating where long-read sequencing enables discovery inaccessible to short-read approaches.

      Manuscript reorganization.

      We will substantially revise the manuscript to foreground the biological and conceptual consequences of long-read-enabled proteogenomics, using focused examples. Detailed descriptions of protease selection, fractionation, and acquisition optimization will be moved to supplementary methods, while retaining key conclusions about their impact on discovery.

      Positioning of technical advances.

      We will frame multi-protease and acquisition strategies as general principles required for unbiased proteoform discovery, rather than as static technical prescriptions, emphasizing their relevance across evolving proteomics platforms.

      Overall Significance

      In the revised manuscript, we will more clearly articulate that this work establishes long-read-informed, sample-specific proteogenomics as a discovery-grade framework, revealing cancer-specific proteoforms that are systematically invisible to reference-based and short-read-driven approaches, with broad implications for cancer biology and biomarker discovery.

      We thank the editors and reviewers again for their constructive feedback, which we believe will substantially strengthen the clarity and broad impact of this work.

    1. eLife Assessment

      This study provides an important contribution by showing that whiteflies and planthoppers use salivary effectors to suppress plant immunity through the receptor-like protein RLP4, suggesting convergent evolution in these insect lineages. The topic is of clear interest for understanding plant-insect interactions and offers ideas that could stimulate further research in the field. The authors provide mostly solid evidence for the functional roles of the salivary effectors; however, the interpretation of the physiological function of RLP4 in plant defense requires clarification.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

    3. Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that whiteflies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein. Unlike previously reported receptor-like proteins with large ligand-binding domains, the NtRLP4 here has a malectin LRR domain. Interestingly, it also associates with the adaptor SOBIR1. While the function of this protein remains to be further explored, the authors provide strong evidence showing that it is the target of salivary proteins as part of the insects' survival strategy.

      Major points:

      The authors mixed the concepts of LRR-RLPs with malectin LRR-RLPs. These are two different types of receptors. While LRR-RLPs are well studied, little is known about malectin LRR-RLPs. The authors should not simply apply the mode of function of LRR-RLPs to RLP4, which is a malectin LRR-RLP. In addition, LRR-RLPs that function as ligand-binding receptors typically possess >20 LRRs, whereas RLP4 in this work has a rather small ectodomain. It remains unclear whether it will function as a PRR.

      I can't agree with the authors' logic of testing uninfested plants to prove a PRR's function. The function of a pattern recognition receptor depends on perceiving the corresponding ligand. As shown by the data provided, RLP4-OE plants have an altered transcriptional profile indicating activated defense, suggesting that RLP4 is unlikely to be a PRR. An alternative explanation is needed.

      More work on BAK1 will also help to clarify the ideas proposed by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Two minor comments:

      In line 140, yeast two-hybrid (Y2H) was used to screen for interacting proteins in plants. However, it is generally difficult to identify membrane receptors using Y2H. Please provide more methodological details to justify this approach, or alternatively, include a discussion explaining this.

      In Figure S12C, the interaction between the two proteins appears to be present in the nucleus as well. Please provide a possible explanation for this observation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a well-structured and interesting manuscript that investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Weaknesses:

      Western blot evidence for effector secretion is weak. The possibility of contamination from insect tissues during the sample preparation should be avoided.

      Below are some specific comments and suggestions to strengthen the manuscript.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      (1) Western blot evidence for effector secretion:

      The western blot evidence in Figure 1, which aims to show that the insect protein is secreted into plants, is not fully convincing. The band of the expected size (~30 kDa) in the infested tissues is very weak. Furthermore, the high and low molecular weight bands that appear in the infested tissues do not match the size of the protein in the insects themselves, and a high molecular weight band also appears in the uninfested control tissues. It is difficult to draw a definitive conclusion that this protein is secreted into the plants based on this evidence. The authors should also address the possibility of contamination from insect tissues during the sample preparation and explain how they have excluded this possibility.

      Thank you for pointing this out. One or two bands between 25 and 35 kDa were specifically detected in B. tabaci-infested plants but not in non-infested plants, and the smaller, high-intensity band is the same size as BtRDP in salivary glands. This experiment has been repeated six times. For the current version, we repeated the experiment and included a salivary gland sample as a positive control, which showed the same molecular weight as the specific band in the infested sample. It is noteworthy that in the experiment shown in the current version, only the smaller, high-intensity band appeared, whereas the low-intensity band did not. The detection of a protein within infested plant tissue is a key criterion for validating the secretion of salivary effectors, an approach supported by numerous studies in this field. Furthermore, our previous LC-MS/MS analysis of B. tabaci watery saliva identified six unique peptides matching BtRDP, providing independent evidence for its presence in saliva. Therefore, as we now state in the manuscript, “the detection of BtRDP in infested plants (Fig. 1a) and in watery saliva (Fig. S1) collectively indicates that BtRDP is a salivary protein”.

      Regarding the higher molecular weight band that is present in both infested and non-infested samples, we agree that it most likely represents a non-specific band, which is a common occurrence in Western blot assays. Such bands are sometimes used to indicate comparable sample loading. To address the possibility of contamination by insect tissues, we wish to clarify that all insects and deposited eggs were carefully removed from the infested leaves prior to sample processing. Moreover, BtRDP is undetectable at the egg stage, so no BtRDP-associated band would be detected even if eggs contaminated the sample. We have revised the Methods section to explicitly state this procedure:

      “After feeding, the eggs deposited on the infested tobacco leaves were removed. The leaves showing no visible insect contamination were immediately frozen in liquid nitrogen and ground to a fine powder.”

      (2) Inconsistent conclusion (Line 156 and Figure 3c):

      The statement in line 156 is inconsistent with the data presented in Figure 3c. The figure clearly shows that the LRR domain of the protein is the one responsible for the interaction with BtRDP, not the region mentioned in the text. This is a critical misrepresentation of the experimental findings and must be corrected. The conclusion in the text should accurately reflect the data from the figure.

      We apologize for any confusion caused by the original phrasing. In our previous manuscript, the description “NtRLP4 without signal peptides and transmembrane domains” referred specifically to the truncated construct NtRLP4<sub>(23-541)</sub> used in the experiment. To prevent any misunderstanding, we have revised the sentence in the updated version to state explicitly: “Point-to-point Y2H assays reveal that NtRLP4<sub>(23-541)</sub> (a truncated version lacking the signal peptide and transmembrane domains) interacts with BtRDP<sup>-sp</sup>”.

      (3) Role of SOBIR1 in the RLP4/SOBIR1 Complex:

      The authors demonstrate that the salivary effectors destabilize the RLP4 receptor, leading to a decrease in its protein levels and a reduction in the RLP4/SOBIR1 complex. A key question remains regarding the fate of SOBIR1 within this complex. The authors should clarify what happens to the SOBIR1 protein after the destabilization of RLP4. Does SOBIR1 become unbound, targeted for degradation itself, or does it simply lose its function without RLP4? This would provide further insight into the mechanism of action of the effectors.

      Thank you for the suggestion. In the current version, we assessed the impact of BtRDP on NtSOBIR1 following NtRLP4 destabilization. The results showed that while NtRLP4-myc accumulation was markedly reduced, NtSOBIR1-flag levels remained unchanged, suggesting that destabilization of NtRLP4 did not affect NtSOBIR1 accumulation.

      (4) Clarification on specificity and evolutionary claims:

      The paper's most significant claim is that the effectors from both whiteflies and planthoppers "independently evolved" to target RLP4. While the functional data is compelling, this evolutionary claim would be more convincing with stronger evidence. Showing that two different effector proteins target the same host protein is a fascinating finding but without a robust phylogenetic analysis, the claim of independent evolution is not fully supported. It would be valuable to provide a more detailed evolutionary analysis, such as a phylogenetic tree of the effector proteins, showing their relationship to other known insect proteins, to definitively rule out a shared, but highly divergent, common ancestor.

      We appreciate the reviewer’s valuable suggestion to investigate a potential evolutionary link between BtRDP and NlSP104. Our initial analysis already indicated no detectable sequence similarity. To address this point more thoroughly, we attempted a phylogenetic analysis. However, we were unable to generate a meaningful alignment due to a complete lack of conserved amino acid sequences. Therefore, we conducted a comparative genomics analysis by blasting both proteins against the genomic or transcriptomic data of 30 diverse insect species. This analysis revealed that RDP is exclusively present in Aleyrodidae species, and SP104 is exclusively present in Delphacidae species (Table S1). Given the absence of sequence similarity, their distinct protein structures, and their lineage-specific distributions, we conclude that BtRDP and NlSP104 are highly unlikely to be homologous and thus did not originate from a common ancestor.

      (5) Role of SOBIR1 in the interaction:

      The results suggest that the effectors disrupt the RLP4/SOBIR1 complex. It is not entirely clear if the effectors are specifically targeting RLP4, SOBIR1, or both. Further experiments, such as a co-immunoprecipitation assay with just RLP4 and the effector, could clarify if the effector can bind to RLP4 in the absence of SOBIR1. This would help to definitively place RLP4 as the primary target.

      We appreciate the reviewer’s insightful comments regarding whether the effector preferentially targets RLP4, SOBIR1, or both. In our study, we conducted reciprocal co-immunoprecipitation assays using RLP4 and BtRDP as controls. These assays showed that BtRDP interacts with RLP4 but does not interact with SOBIR1, supporting the conclusion that SOBIR1 is unlikely to be a direct target of BtRDP. We fully agree that testing the interaction between RLP4 and BtRDP in the absence of SOBIR1 would further strengthen the conclusion. However, we were unable to obtain N. tabacum SOBIR1 knockout mutants, and therefore could not experimentally assess whether the RLP4–BtRDP interaction persists in planta without SOBIR1. Nevertheless, our yeast two-hybrid assays demonstrate that RLP4 and BtRDP can directly interact, indicating that their association does not strictly depend on SOBIR1. Together, these results support the interpretation that RLP4 is the primary target of BtRDP, while SOBIR1 is not directly engaged by the effector.

      (6) Transcriptome analysis (Lines 130-143):

      The transcriptome analysis section feels disconnected from the rest of the manuscript. The findings, or lack thereof, from this analysis do not seem to be directly linked to the other major conclusions of the paper. This section could be removed to improve the manuscript's overall focus and flow. If the authors believe this data is critical, they should more clearly and explicitly connect the conclusions of the transcriptome analysis to the core findings about the effector-RLP4 interaction.

      Thank you for the suggestion. As you and Reviewer #2 pointed out, the transcriptomic analysis was not closely linked to the major conclusions of the paper and yielded little information. Therefore, we have removed these analyses to improve the manuscript’s overall focus and flow.

      (7) Signal peptide experiments (Lines 145 and beyond):

      The experiments conducted with the signal peptide (SP) are questionable. The SP is typically cleaved before the protein reaches its final destination. As such, conducting experiments with the SP attached to the protein may have produced biased observations and could lead to unjustified conclusions about the protein's function within the plant cell. We suggest the authors remove the experiments that include the signal peptide.

      Thank you for pointing this out. The SP was retained to direct the target proteins to the extracellular space of plant cells. Theoretically, the SP is cleaved in the mature protein. This methodology is widely used in effector biology. For example, the SP directs Meloidogyne graminicola Mg01965 to the apoplast, where it functions in immune suppression, whereas Mg01965 without the SP fails to exert this function (doi: 10.1111/mpp.12759). In our study, the SP of BtRDP was expected to guide the target protein to the extracellular space, facilitating its interaction with RLP4. Moreover, the observed protein sizes of BtRDP with and without the SP in transgenic plants were identical, suggesting successful SP cleavage. Therefore, we have retained the experiments involving the SP in the current version.

      (8) Overly strong conclusion and unclear evidence (Line 176):

      The use of the word "must" on line 176 is very strong and presents a definitive conclusion without sufficient evidence. The authors state that the proteins must interact with SOBIR1, but they do not provide a clear justification for this claim. Is SOBIR1 the only interaction partner for NtRLP4? The authors should provide a specific reason for focusing on SOBIR1 instead of demonstrating an interaction with NtRLP4 first. Additionally, do BtRDP or NlSP694 also interact with SOBIR1 directly? The authors should either tone down their language to reflect the evidence or provide a clearer justification for this strong claim.

      Thank you for pointing this out. In the current version, the word “must” has been toned down to “may” due to insufficient supporting evidence. In this study, SOBIR1 was chosen because it has been widely reported to be required for the function of several RLPs involved in innate immunity. However, it remains unclear whether SOBIR1 is the only interaction partner of NtRLP4. In the current version, we have clarified the rationale for focusing on SOBIR1 prior to the experiments: “The receptor-like kinase SOBIR1, which contains a kinase domain, has been widely reported to be required for the function of RLPs involved in innate immunity (Gust & Felix, 2014)”, and discussed that “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1. Therefore, a more definitive assessment of NtRLP4 signaling will therefore require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.” In addition, the direct interaction between BtRDP and SOBIR1 was experimentally tested, and the results showed that BtRDP failed to interact with SOBIR1.

      Minor Comments

      (9) The statement in the abstract, "However, it remains unclear how these invaders are able to overcome receptor perception and disable the plant signaling pathways," is not entirely accurate. The fields of effector biology and host-pathogen interactions have provided significant insight into how pathogens and pests manipulate both Pattern-Triggered Immunity (PTI) and Effector-Triggered Immunity (ETI). While the specific mechanism described in this paper is novel, the broader claim that the field is unclear on these processes weakens the initial hook of the paper. A more precise framing of the problem would be beneficial, perhaps by stating that the specific mechanisms used by these particular herbivores to target RLP4 were previously unknown.

      Thank you for this insightful comment. We agree that the original statement in the abstract overstated the lack of understanding in the field. In the current version, we have refined the sentence to more accurately reflect the current state of knowledge, emphasizing that while microbial suppression of plant immunity has been extensively studied, the strategies used by herbivorous insects to overcome receptor-mediated defenses remain less understood. The revised sentence now reads as follows: “Although the mechanisms used by microbial pathogens to suppress plant immunity are well studied, how herbivorous insects overcome receptor-mediated defenses remains unclear”.

      (10) The introduction is heavily focused on Pattern Recognition Receptors (PRRs), which, while central to the paper's findings, gives a somewhat narrow view of the plant's defense against herbivores. It would be beneficial to briefly acknowledge the broader context of plant defenses, such as physical barriers, direct chemical toxicity, and indirect defenses, before narrowing the focus to the specific molecular interactions of PRRs that are the core of this study. This would provide a more complete picture of the "arms race" between plants and herbivores.

      Thank you for this valuable suggestion. We agree that the original introduction focused too narrowly on pattern-recognition receptors (PRRs). In the current version, we have expanded the introductory section to provide a broader overview of plant defense mechanisms. Specifically, we now acknowledge the multiple layers of plant defenses, including physical barriers (e.g., cuticle and cell wall), chemical defenses (e.g., toxic secondary metabolites and anti-nutritive compounds), and indirect defenses mediated by herbivore-induced volatiles. This addition provides a more complete context for understanding the molecular interactions discussed in this study. The revised paragraph now reads as follows: “Plants have evolved sophisticated defense systems to survive constant attacks from pathogens and herbivorous insects. These defenses operate at multiple levels, including physical barriers such as the cuticle and cell wall, chemical defenses involving toxic secondary metabolites and anti-nutritive compounds, and indirect defenses that attract natural enemies of herbivores through the emission of herbivore-induced volatiles. Beyond these general strategies, plants also rely on highly specialized molecular immune responses that allow them to detect and respond rapidly to invaders.”

      (11) The figure legends are generally clear, but some could be more detailed. For instance, in Figure 2, it would be helpful to explicitly state what each bar represents in the graph and to include the statistical test used. Please ensure all panels in all figures have clear labels.

      Thank you for this helpful suggestion. We have revised the legend of Fig. 2 and other figures to provide more detailed information for each panel. Specifically, we now explicitly describe what each bar represents in the graphs and specify the statistical test used. In addition, we ensured that all panels are clearly labeled. These changes improve clarity and allow readers to better interpret the data.

      (12) The methods section is comprehensive, but it would be helpful to include more specifics on the statistical analyses used. For example, the type of statistical test (e.g., t-test, ANOVA) and the software used should be mentioned for each experiment.

      Thank you for your suggestion. We have revised the Methods section (Statistical analysis) to provide more detailed information on the statistical analysis used for each experiment.

      (13) The manuscript's overall impact is weakened by the inclusion of unnecessary words and a few grammatical issues. A focused revision to tighten the language would make the major findings stand out more clearly. For example, on page 2, line 18, "in whitefly Bemisia tabaci, BtRDP is an Aleyrod..." seems to have an incomplete sentence. A thorough proofreading for typos and grammatical errors is highly recommended to improve the overall readability.

      Thank you for your suggestion. We have carefully revised the abstract and the manuscript to improve clarity, readability, and grammatical correctness. In addition, we sought the assistance of a professional English editor to thoroughly proofread and polish the manuscript, ensuring that the language meets high academic standards.

      (14) The discussion section is strong, but it could benefit from a more explicit connection between the findings and the broader ecological implications. For instance, how might the independent evolution of these effectors in different insect species impact plant-insect co-evolutionary dynamics?

      We thank the reviewer for the valuable suggestion. In the current version, we have added a paragraph in the Discussion section highlighting the broader ecological and evolutionary implications of our findings. Specifically, we discuss how the independent evolution of RLP4-targeting effectors in different insect lineages may drive plant-insect co-evolution, influence selection pressures on both plants and herbivores, and potentially shape defense diversification across plant communities. This addition helps to link our molecular findings to ecological outcomes and co-evolutionary dynamics.

      (15) The sentence on line 98, which reads " A few salivary proteins have been reported to attach to salivary sheath after secretion" seems to serve an unclear purpose in the introduction. It would be helpful for the authors to clarify its relevance to the surrounding context or to the paper's overall argument. Its inclusion currently disrupts the flow of the introduction and makes it difficult for the reader to understand its intended purpose.

      We thank the reviewer for the comment. We have revised the paragraph to clarify the relevance of salivary sheath localization to the study. Specifically, we now introduce the role of the salivary sheath as a potential scaffold for effector delivery and explicitly link previous reports of sheath-associated salivary proteins to our observation that BtRDP localizes to the salivary sheath after secretion.

      (16) The writing in lines 104-106 is both grammatically inconsistent and overly wordy. The authors switch between present and past tense ("is" and "was"), and the sentences could be made more concise to improve the clarity and flow of the text. Also check entire paper.

      We thank the reviewer for pointing this out. We have revised the sentence to improve grammatical consistency and clarity; it is now split into two concise statements. In addition, we have thoroughly checked the entire manuscript for similar tense inconsistencies and overly wordy sentences, and have made revisions throughout to ensure consistent past tense usage and improved readability.

      (16) The sentences on lines 111-113 are quite wordy. The core conclusion, which is that the protein affects the insect's feeding probe, could be expressed more simply and directly to improve clarity and flow. I suggest rephrasing this section to be more concise and to highlight the primary finding without the added language.

      We thank the reviewer for the helpful suggestion. We have revised the sentences to make them more concise and to emphasize the main finding that BtRDP influences the whitefly’s feeding behavior, as follows: “Compared with the dsGFP control, dsBtRDP-treated B. tabaci showed a marked reduction in phloem ingestion and a longer pathway duration, indicating that BtRDP is required for efficient feeding (Fig. 2c).”

      (17) On line 118, the authors mention "subcellular location." It is not clear where the protein is localized. The authors should explicitly state the specific subcellular compartment of the protein, as this is crucial for understanding its function and interaction with other proteins.

      We thank the reviewer for this valuable comment. To clarify the subcellular localization of BtRDP, we have revised the manuscript accordingly. In the transgenic line overexpressing full-length BtRDP including the signal peptide (oeBtRDP), the protein is expected to localize to the apoplast (extracellular space), whereas in the line expressing BtRDP without the signal peptide (oeBtRDP<sup>-sp</sup>), the protein is likely retained in the cytoplasm.

      (18) Lines 121-128, the description of the fecundity and choice assays in this section is overly wordy. The authors should present the main conclusion of these experiments more directly and concisely. The key finding is that the protein affects feeding behavior; this central point is somewhat lost in the detailed, and sometimes repetitive, phrasing.

      We thank the reviewer for this suggestion. In the revised manuscript, we have simplified the description of the fecundity and two-choice assays to highlight the main conclusion, as follows: “Fecundity and two-choice assays showed that BtRDP, whether localized in the apoplast (oeBtRDP) or cytoplasm (oeBtRDP<sup>-sp</sup>), enhanced whitefly settling and oviposition compared with EV controls (Fig. 2d-i; Fig. S10), indicating that BtRDP promotes whitefly feeding behavior regardless of its subcellular location.”

      (19) Line 148, the manuscript mentions experiments involving transformation, but the transformation efficiency is not provided. Please include the transformation efficiency for all transformation experiments, as this is crucial for the reproducibility of the results.

      We thank the reviewer for raising this point. We would like to clarify that no transformation experiments were performed in this section. The experiments described involved Y2H screening using BtRDP<sup>-sp</sup> as bait to identify interacting proteins from an N. benthamiana cDNA library. Therefore, there is no transformation efficiency to report.

      (20) Line 159, the manuscript refers to a sequence similarity around line 159 but does not provide the specific data. It is important to show the actual sequence similarity, perhaps in a supplementary figure or table, to support the claims being made.

      We thank the reviewer for this suggestion. To support our statement regarding sequence similarity, we have added the corresponding alignment figure as Fig. S11.

      (21) Line 159, the manuscript refers to "three randomly selected salivary proteins." It is unclear from where these proteins were selected. The authors should clarify the source of this selection (e.g., a specific database or a previous study) to ensure the methodology is transparent and the results are reproducible.

      We thank the reviewer for raising this point. These proteins were selected based on previous reports (10.1093/molbev/msad221; 10.1111/1744-7917.12856). In the current version, we provide the accession numbers of these proteins in the manuscript.

      (22) Line 160, the description "NtcCf9 without signal peptide and transmembrane domains" is difficult to understand. It would be clearer and more consistent to use a term like "truncated NtcCf9" and then specify which domains were removed, as this is a standard practice in molecular biology for describing protein constructs.

      We thank the reviewer for this suggestion. We have revised the manuscript to describe the construct as “truncated NtCf9” and specified that the signal peptide and transmembrane domains were removed.

      (23) The phrase "incubated with anti-flag beads" on line 172 is a detail of a routine method. Such details are more appropriate for the Methods section rather than the main text, which should focus on the results and their implications. Please remove such descriptions from the main text to improve readability and flow.

      We thank the reviewer for this suggestion. We have removed the methodological detail from the main text to improve readability. We also checked for similar details throughout the manuscript.

      I am excited about the potential of this work and look forward to seeing the current version.

      We sincerely thank the reviewer for the positive feedback and encouragement. We appreciate your time and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that white flies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein.

      Strengths:

      The authors used a wide range of methods to dissect the function of the white fly protein BtRDP and identify its host target NtRLP4.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) Serious concerns about protein work.

      I did not find the indicated protein bands for anti-BtRDP in Figures 1a and 1b in the original blot pictures shown in Figure S30. In Figure 1a, I can't get the point of showing an unspecific protein band with a size of ~190 kD as a loading control for a protein of ~ 30 kD.

      The data discrepancy led me to check other Western blot pictures. Similarly, Figures 2d, 3b, 3d, and S15b (anti-Myc) do not correspond to the original blots shown. In addition, the anti-Myc blot in Figure 4i, all blot pictures in Figures 5b, 5h, and S19a appeared to be compressed vertically. These data raised concerns about the quality of the manuscript.

      Blots shown in Figure 3d, 4f, 4g, and 4h appeared to be done at a different exposure rate compared to the complete blot shown in Figure S30. The undesirable connection between Western blot pictures shown in the figures and the original data might be due to the reduced quality of compressed figures during submission. Nevertheless, clarification will be necessary to support the strength of the data provided.

      We sincerely thank the reviewer for carefully examining our Western blot data and for pointing out these inconsistencies. The discrepancy between the figures in the main text and the original blots (Figure S30) resulted from an oversight during manuscript revision. This manuscript had undergone multiple rounds of revision after submission to another journal. During this process, the main figures and supplementary figures were updated separately, and we mistakenly failed to replace the original blot files with the corresponding current versions.

      Regarding the different exposure rates, the blots shown in the main text were adjusted for overall contrast and brightness to enhance band visibility and presentation clarity, whereas the original images in Figure S30 were raw, unprocessed scans directly from the imaging system. For example, in Author response image 1 below, the output figure was adjusted for overall contrast and brightness to visualize the loading of the input sample. Such adjustment, applied across the entire image, is acceptable image processing (https://www.nature.com/nature-portfolio/editorial-policies/image-integrity).

      Author response image 1.

      The same figure with brightness and contrast changes across the entire image.

      Regarding the vertical compression, some images in the previous version were vertically compressed for layout purposes to make the composite figures appear more visually balanced. However, after consulting relevant publication guidelines, we realized that such one-dimensional compression is not encouraged by certain journals, as it may alter the original aspect ratio of the image. Therefore, in the revised manuscript, we have avoided any non-proportional scaling and retained the original aspect ratio of all images.

      We have now carefully rechecked all Western blot data, replaced the outdated raw blot images with the correct corresponding ones, avoided vertical compression, and ensured that the processed figures in the main text match their original data. The revised supplementary figures now accurately reflect the raw experimental results.

      (2) Misinterpretation of data.

      I am afraid the authors misunderstood pattern-triggered immunity through receptor-like proteins. It is true that several LRR-type RLPs constitutively associate with SOBIR1, and further recruit BAK1 or other SERKs upon ligand binding. One should not take it for granted that every RLP works this way. To test the hypothesis that NtRLP4 confers resistance to B.tabaci infestation, the author compared transcriptional profiles between an EV plant line and an RLP4 overexpression line. If I understood the methods and figure legends correctly, this was done without B. tabaci treatment. This experimental design is seriously flawed. To provide convincing genetic evidence, independent mutant lines (optionally independent overexpression lines) in combination with different treatments will be necessary. Otherwise, one can only conclude that overexpressing the RLP4 protein generated a nervous plant. In addition, ROS burst, but not H2O2 accumulation, is a common immune response in pattern-triggered immunity.

      We agree with the reviewer that not every RLP functions through the same mechanism as the canonical SOBIR1–BAK1 pathway. In the current version, we further examined the interaction between the whitefly salivary protein and SOBIR1, and found that they do not interact. However, our interaction assays clearly demonstrated that NtRLP4 does interact with SOBIR1. Whether NtRLP4 functions through, or exclusively through, SOBIR1 remains uncertain, and we have emphasized this limitation in the Discussion section as follows: “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1 [39]. Therefore, a more definitive assessment of NtRLP4 signaling will therefore require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.”

      Regarding the transcriptome analysis, our original aim was to explore why B. tabaci showed such a pronounced preference among tobacco plants. As this preference was assessed using uninfested plants, we also performed transcriptome sequencing using plants without B. tabaci treatment. The enrichment analysis demonstrated that the majority of up-regulated DEGs were associated with plant–pathogen interaction, environmental adaptation, MAPK signaling, and signal transduction pathways, while down-regulated DEGs were enriched in glutathione, carbohydrate, and amino acid metabolism. Notably, many DEGs were annotated as RLK/RLPs or WRKY transcription factors, most of which were upregulated, suggesting an enhanced defense state in the NtRLP4-overexpressing plants. The altered expression of JA- and SA-related genes (e.g., upregulation of FAD7 and downregulation of PAL and NPR1) further supported this enhanced defense and hormonal crosstalk. We agree that combining overexpression or knockout lines with insect infestation treatments would provide more direct genetic evidence for NtRLP4-mediated resistance, and we have acknowledged this as an important future direction. Nevertheless, our current data are consistent with the conclusion that NtRLP4 overexpression confers increased resistance to B. tabaci infestation.

      Finally, DAB staining for H<sub>2</sub>O<sub>2</sub> accumulation is also a well-established indicator of PTI responses, and many studies have shown that overexpression of salivary elicitors can trigger such accumulation.

      (3) Lack of logic coherence.

      The written language needs substantial improvement. This impeded the readability of the work. More importantly, the logic throughout the manuscript appeared scattered. The choice of testing protein domains for protein-protein interactions, using plants overexpressing an insect protein to study its subcellular localization, switching back and forth between using proteins with signal peptides and without signal peptides, among others, lacks a clear explanation.

      We appreciate the reviewer’s careful reading and valuable comments regarding the logical coherence of our manuscript.

      (1) To improve the English quality, the entire manuscript has been professionally edited by a certified language-editing service.

      (2) Regarding the rationale for testing protein domains in the protein–protein interaction assays: NtRLP4 is a membrane-anchored receptor-like protein composed of extracellular, transmembrane, and short intracellular domains. We aimed to determine which region of NtRLP4 is responsible for interacting with the salivary protein, as this would help infer the likely site of interaction in planta. In addition, not all RLPs contain a malectin-like domain, and we sought to verify whether the BtRDP–NtRLP4 interaction depends on this domain. To enhance the logical flow, we introduced a brief statement explaining the experimental purpose before presenting the interaction assays in the current version, as follows: “These findings raised the question of which domain of NtRLP4 is responsible for binding BtRDP, as identifying the interacting domain could help infer where the salivary protein contacts the receptor in planta. We therefore dissected the NtRLP4 domains accordingly.”

      (3) With respect to using plants overexpressing an insect protein to examine subcellular localization: since both the brown planthopper and the whitefly are non-model species for which stable genetic transformation is technically unfeasible, many previous studies have used Agrobacterium-mediated transient expression or transgenic plant systems to investigate the subcellular localization of insect salivary proteins within host cells. Following these precedents, our study also employed plant systems to determine the localization of the insect protein and to assess how different localizations affect plant defense responses.

      (4) As for switching between constructs with or without signal peptides: the subcellular localization of effectors can influence their biological activity and interactions. Previous studies have used the presence or absence of signal peptides, or replacement with a PR1 signal peptide, to direct protein targeting (for example, Frontiers in Plant Science, 2022, 13:813181). Because salivary sheaths are generally considered to localize in the apoplastic space, we generated two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), expected to be secreted into the apoplast, and another lacking the signal peptide (oeBtRDP<sup>-sp</sup>), likely retained in the cytoplasm. In the current version, we clarified this rationale and added references to similar studies to improve the manuscript’s logic and readability. Details are as follows: “To investigate the role of BtRDP in different subcellular location of host plants, we constructed two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), which is expected to be secreted into the apoplast (extracellular space), and the other lacking the signal peptide (oeBtRDP<sup>-sp</sup>), which is likely retained in the cytoplasm.”

      Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) I found the naming of BtRDP and NlSP694 somewhat confusing. The authors defined BtRDP as "B. tabaci RLP-degrading protein," whereas NlSP694 appears to have been named after the last three digits of its GenBank accession number (MF278694, presumably). Is there a standard convention for naming newly identified proteins, for example, based on functional motifs or sequence characteristics? As it stands, the inconsistency makes it difficult for readers to clearly distinguish these proteins from those reported in other studies.

      Thank you for your comment. These are species-specific salivary proteins that have not been reported or annotated in previous studies. Because no homologous genes could be identified in other species, there are no existing names or annotations for these proteins. For such lineage-specific salivary proteins, it is common in recent studies to name them according to their experimentally identified functions. For example, a recently reported salivary protein was named SR45-interacting salivary protein (SISP) based on its function (10.1111/nph.70668). Following this convention, we adopted a similar functional naming strategy in this study. We acknowledge that there may not yet be a standardized rule for naming such proteins, and we would be glad to follow a more authoritative naming guideline if possible.

      (2) Figure 2 and other figures. Transgenic experiments require at least two independent lines, because results from a single line may be confounded by position effects or unintended genomic alterations, and multiple lines provide stronger evidence for reproducibility and reliability.

      We appreciate the reviewer’s suggestion. In our study, two independent transgenic lines were used to ensure the reproducibility and reliability of the results. One representative line was presented in the main figures, while data from the second independent line were included in the supplementary figures. To make this clearer, we have emphasized in the manuscript that bioassays were conducted using two independent transgenic lines.

      (3) Figure 3e. Quantitative analysis of NtRLP4 was required. Additionally, since only one band was observed in oeRLP, were any tags included in the construct?

      Thank you for your comment. In the current version, quantitative analysis of NtRLP4 expression has been performed and is now presented in Figure 3. For the oeRLP plants, no tag was fused to NtRLP4; thus, anti-RLP serum was used to detect the target bands. In contrast, oeBtRDP and oeBtRDP-sp were fused with C-terminal FLAG tags, and their detection was carried out using anti-FLAG serum. This information has been clarified in the revised Methods section as follows: “The oeBtRDP and oeBtRDP<sup>-sp</sup> were fused with C-terminal FLAG tags, while no tag was fused to oeNtRLP4.”

      (4) Figure 4a. The RNAi effect appears to be well rescued in Line 1 but poorly in Line 2. Could the authors clarify the reason for this difference?

      Thank you for pointing this out. We also noticed that the RNAi effect appeared to be better rescued in Line 2 than in Line 1. Based on our measurements, the silencing efficiency of NtRLP4 in RNAi-RLP4 Line 1 was markedly weaker than in Line 2, which likely explains the difference in rescue efficiency. In the current version, we have clarified this point as follows: “Both RNAi-RLP lines showed reduced NtRLP4 levels compared with EV plants, with RNAi-RLP#2 exhibiting a stronger silencing effect (Fig. S19a).” “The differential rescue effect between the two RNAi lines likely resulted from their different NtRLP4 silencing efficiencies, with the lower NtRLP4 level in RNAi-RLP#2 leading to a more complete rescue phenotype.”

      (5) ROS accumulation is shown for only a single leaf. A quantitative analysis of ROS accumulation across multiple samples would be necessary to support the conclusion. The same applies to Figure 16f.

      Thank you for pointing this out. The H<sub>2</sub>O<sub>2</sub> accumulation experiments in Figure 4 and Figure S16f were repeated five times. In the current version, we state in the figure legends that “the experiment is repeated five times with similar results”.

      (6) Figure 4f: NtRLP4 abundance was significantly reduced in oeBtRDP plants but not in oeBtRDP-SP. Although coexpression analysis suggests that BtRDP promotes NtRLP4 degradation in an ubiquitin-dependent manner, the reduced NtRLP4 levels may not result from a direct interaction between BtRDP and NtRLP4. It is possible that BtRDP influences other factors that indirectly affect NtRLP4 abundance. The authors should discuss this possibility.

      Thank you for your valuable suggestion. We agree that the reduced NtRLP4 abundance may not necessarily result from a direct interaction between BtRDP and NtRLP4. In the manuscript, we have further discussed this possibility as follows: “Notably, BtRDP and NlSP104 shared no sequence or structural similarity and lack resemblance to known eukaryotic ubiquitin-ligase domains. Their interaction with RLP4s occurs in the extracellular space (Fig. 3d; Fig. 5c), whereas the ubiquitin-proteasome system primarily functions in the cytosol and nucleus [46]. Furthermore, NtRLP4 reduction is observed only in oeBtRDP transgenic plants, not in oeBtRDP-sp plants (Fig. 4f), suggesting that BtRDP exerts its influence on NtRLP4 in the extracellular space. These observations collectively argue against the possibility that BtRDP or NlSP694 possesses intrinsic E3 ligase activity capable of directly ubiquitinating RLP4s within plant cells. Importantly, the reduced NtRLP4 levels may not result from a direct physical interaction between BtRDP and NtRLP4. Instead, BtRDP may indirectly affect RLP4 post-translational modification, thereby accelerating its degradation, which warrants further investigation”

      (7) The statement in lines 335-336 that 'Overexpression of NtRLP4 or NtSOBIR1 enhances insect feeding, while silencing of either gene exerts the opposite effect' is not supported by the results shown in Figures S16-S19. The authors should revise this description to accurately reflect the data.

      Thank you for pointing this out. We agree that our original statement was not precise, as we measured the insect settling preference and oviposition on transgenic plants, but did not directly assess the feeding behavior of B. tabaci. Therefore, we have revised the description in the manuscript to more accurately reflect our data as follows: “Overexpression of NtRLP4 or NtSOBIR1 in N. tabacum is attractive to B. tabaci and promotes insect reproduction, whereas silencing of either gene exerts the opposite effect.”

      (8) BtRDP is reported to attach to the salivary sheath. Does the planthopper NlSP694 exhibit a similar secretion localization (e.g., attachment to the salivary sheath)? The authors should supplement this information or discuss the potential implications of any differences in secretion localization between BtRDP and NlSP694 for their respective modes of action.

      Thank you for your insightful suggestion. We agree that determining the secretion localization of NlSP694 would provide valuable information for understanding its potential mode of action. Immunohistochemical (IHC) staining is indeed a critical approach for such analysis. However, in this study, we were unable to express NlSP694 in Escherichia coli, and the antibody generated using a synthesized peptide did not show sufficient specificity or sensitivity for IHC detection. Consequently, we were unable to determine whether NlSP694 is attached to the salivary sheath. Whether BtRDP and NlSP694 act through different modes therefore requires further investigation.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1e. The BtRDP-labeled fluorescent signal is difficult to discern. An enlarged view of the target region would be helpful for clarity.

      Thank you for your suggestion. In the current version, an enlarged view of the target region was provided below the figure.

      (2) The finding that BtRDP accumulates in the salivary sheath secreted by Bemisia tabaci is important for understanding the subcellular localization of this protein during actual insect feeding. I suggest moving Figure S5 to the main text.

      Thank you for your suggestion. Figure S5 has been moved to Fig. 1f in the current version.

      (3) Please carefully cross-check the figure numbering to ensure that all in-text citations correspond to the correct figures and panels, e.g., lines 136, 188, 192, and 194.

      Thank you for pointing this out. We corrected them in the current version.

    1. eLife Assessment

      This study demonstrates that endothelial toll-like receptor 4 is a central regulator of leptomeningeal inflammation in the context of neonatal E. coli meningitis. The data are derived from cell type-specific gene knockout in mice as well as from cultured endothelial cells, and are generally solid, with only minor weaknesses in analysis and interpretation. This work is important as it advances our understanding of host cellular processes and molecular pathways underlying meningitis pathogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Seegren and colleagues demonstrate that in a mouse model of neonatal E. coli meningitis, loss of endothelial toll-like receptor 4 (TLR4) leads to a marked decrease in transcriptional dysregulation across multiple leptomeningeal cell types, a decrease in vascular permeability, and a decrease in macrophage abundance. In contrast, loss of macrophage TLR4 had less pronounced effects. Using cultured wild-type and TLR4-knockout endothelial cells, the authors further demonstrate that TLR4-NF-κB signaling leads to reversible internalization of the tight junction protein claudin-5, establishing a potential mechanism of increased vascular permeability. Finally, the authors use RNA-sequencing of wild-type and TLR4-knockout endothelial cells to define the TLR4-dependent cell-autonomous transcriptional response to E. coli.

      Strengths:

      (1) The authors address an important, well-motivated hypothesis related to the cellular and molecular mechanisms of leptomeningeal inflammation.

      (2) The authors use model systems (mouse conditional knockouts and cultured endothelial cells) that are appropriate to address their hypotheses. The data are of high quality.

      Weaknesses:

      (1) The authors perform single-nucleus RNA-seq on dissected leptomeninges from control and E. coli-infected mice across three genotypes (WT, Tlr4MKO, and Tlr4ECKO). A major discovery from this experiment, as summarized by the authors, is: "Tlr4ECKO mice exhibited a global attenuation of infection-induced transcriptional responses across all major leptomeningeal cell types, as judged by the positions of cell clusters in the UMAP." This conclusion could be considerably strengthened by improving the qualitative and quantitative analysis.

      (2) The authors interpret E. coli infection-induced increases in leptomeningeal sulfo-NHS-biotin as evidence of compromised BBB integrity (i.e., extravasation from the vasculature) (Results, page 7), but another possible route in this context is sulfo-NHS-biotin entry from the dura across a compromised arachnoid barrier. The complete rescue in Tlr4ECKOs is strongly suggestive that the vascular route dominates, but it would strengthen the work if the authors could assess arachnoid barrier fidelity (e.g., via immunohistochemistry). At a minimum, the authors should mention that the sulfo-NHS-biotin signal in this context may represent both vascular and arachnoid barrier extravasation.

      (3) The authors state that "deletion of TLR4 prevented both NF-κB nuclear translocation and Cldn5 internalization in response to E. coli (Figure 4A-D)" (Results, page 9). In Figures 4C and D, however, there is no indicator of a statistical test directly comparing the two genotypes. A comparison of within-genotype P-values should not be used to support a genotype difference (PMID: 34726155).

      (4) In the first paragraph of the Results, the authors summarize the meningeal layers as (1) pia, (2) subarachnoid space, (3) arachnoid, and (4) dura, and then state "The second and third layers constitute the leptomeninges." This definition of leptomeninges seems to omit the pia, which is widely considered part of the leptomeninges (PMID: 37776854).

      (5) The Cdh5-CreER/+;Tlr4 fl/- mouse lacks TLR4 in all endothelial cells (i.e., in peripheral organs as well as CNS/leptomeninges), and, as the authors note, the periphery is exposed to E. coli. It would be helpful if the authors could comment in the Discussion on the possibility that peripheral effects (e.g., peripheral endothelial cytokine production, changes to blood composition as a result of changes to peripheral endothelial permeability) may contribute to the observed leptomeningeal phenotypes.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a postnatal mouse model of E. coli bacterial meningitis and a mouse brain endothelioma cell line combined with cell-type-specific gene deletion to study the function of endothelial TLR4, a cell surface receptor that recognizes Gram-negative bacterial wall components, in the local leptomeningeal (LPM) response, with a focus on endothelial barrier breakdown mediated by TLR4. Single-cell transcriptional profiling and imaging studies using whole-mount preps of the LPM support that the inflammatory response of LPM endothelial cells, CD206+ local macrophages, LPM fibroblasts, and arachnoid barrier cells is abrogated in the endothelial-specific KO of TLR4, pointing to a role for endothelial TLR4 in the local LPM response. Culture studies using Bend3.1 cells (a mouse brain endothelioma cell line) support a direct role for TLR4 in the bacteria-mediated inflammatory response and in internalization of Cldn5 via the endosomal-lysosomal pathway, resulting in loss of barrier integrity.

      Strengths:

      The local LPM cell response in meningitis and the role of specific LPM cells in inflammation and CNS barrier breakdown have not been extensively studied, despite ample evidence for primary immune response in the meninges in human patients and in animal models. The authors employ a robust, multi-model approach using both in vivo and in vitro models with cell-type-specific knockout to study the function of TLR4 in brain endothelial cell response. The authors nicely combine functional barrier assays with IF for junctional localization in their experimental design, and they delve into potential mechanisms of Cldn5 internalization using markers of endosomal-lysosomal pathway localization. The authors also describe a new type of barrier assay using a streptavidin-coated plate upon which barrier-forming cell cultures can be plated; this could be a very useful alternative or complement to other size-selective barrier assays and presumably could work for other barrier-forming cell types, likely epithelial cells.

      Weaknesses:

      (1) There are no measures of bacterial burden in peripheral organs, blood, the LPM, or brain in the TLR4 endothelial cKO mice. Lack of TLR4 in endothelial cells could prevent bacterial 'access' into the LPM and brain, essentially preventing meningitis and leading to a lack of inflammatory responses in the LPM-located cells simply because there are no bacteria present. Bacteremia may also be reduced, as might inflammatory responses in peripheral organs with TLR4-deficient peripheral endothelium. Bacterial counts and inflammatory measures in peripheral organs and blood are important to better understand the mechanism(s) underlying the reduced inflammatory profile in LPM cells and the absence of LPM endothelial breakdown in the Tlr4 endothelial cKO mice. In other words, does deleting TLR4 in ECs protect against the development of meningitis by somehow blocking bacterial access to the LPM (this would be supported by low or no CFU counts in infected Tlr4 endothelial cKO mice), or is it, as the authors appear to propose in Figure 1J, that TLR4-expressing ECs are the only cells responding to the bacteria to trigger the immune cascade in the LPM? More data are needed to resolve this, as this is a major claim of the paper.

      (2) The authors look at the underlying cortical response (cerebral vasculature for ICAM and immune cells) but do not use markers that could identify microglia (Iba1), the primary resident immune cell (CD206 is not useful, at this stage, in perivascular macrophages that are extremely sparse in the postnatal brain). This would be important to better study the impact on CNS resident immune cell morphological activation.

      (3) The authors suggest that Cldn5 junctional localization is selectively disrupted upon bacterial exposure, mediated by TLR4. They base this on studying PECAM, GLUT-1, ZO-1, and β-catenin (all normally junction- or cell surface-located in cultured Bend3.1 cells) in relation to Cldn5 localization (normally high). It is possible that these are also impacted by bacterial exposure, perhaps through different mechanisms. A better measure would be to apply the same cyto/PM measure used for Cldn5 in Fig. 4D to these proteins, or to use intensity measurements.

      (4) The discussion could benefit from delving more into the prior literature on E. coli-mediated breakdown of junctions in cultured human brain microvascular endothelial cell models and critical host-pathogen interactions of the bacteria with ECs (PMID: 14593586), and how this might involve TLR4.

      (5) It would be important to discuss how their results relate to earlier studies on TLR4-/- and TLR2-/- global knockout mice and protection vs vulnerability to development of meningitis (see PMCID: PMC3524395). That paper showed that TLR4 global KO mice have increased susceptibility to die from meningitis and have much higher CFU counts in the CNS. In this manuscript and their prior work (Wang et al., 2023), this group has shown that both global TLR4-/- mutants and their EC-specific KO have reduced barrier permeability, but we don't have any information about CFU or susceptibility to death from meningitis in their models.

    4. Reviewer #3 (Public review):

      Summary:

      This study investigates the molecular underpinnings of immune responses in the leptomeninges in neonatal bacterial meningitis. Bacterial meningitis is a major disease burden, particularly for neonates, and it has previously been noted that the meningeal immune environment in infants is permissive to opportunistic infection (Kim et al., Sci Immunol, 2023). Less is known about the contribution of the stromal compartment to meningeal immune responses. Seegren et al. interrogate the role of leptomeningeal endothelium in host defence in E. coli-infected neonatal mice using mouse genetic tools to delete the LPS receptor Tlr4 from either endothelial cells (using Cdh5-CreER) or macrophages (using LysM-Cre). The authors use snRNAseq, cleared cortical mounts, and in vitro work to define the impact of E. coli infection on leptomeningeal endothelial cells. This study uses a range of innovative techniques to probe the role of the stromal compartment in meningitis.

      Strengths:

      This study makes excellent use of cleared cortical mounts to examine the biology of the leptomeninges, in particular changes to the endothelium, in unprecedented detail. In combination with high-quality sequencing data, these mounts provide new insights into the impact of meningitis on the leptomeninges. The data presented by the authors are of very high quality.

      Weaknesses:

      The weaknesses of the study were in terms of interpretation and perhaps study design.

      (1) Most importantly, the authors need to provide additional validation of their conditional knockout models. The authors need to confirm that the Cdh5-CreER does not impact leptomeningeal fibroblasts and to confirm gene deletion in macrophages.

      (2) The authors could also strengthen the paper by providing data on the impact of these conditional knockout models on the course of meningitis and bacterial burden.

      (3) Finally, it is perhaps not surprising that Tlr4 is required for meningitis responses with E. coli. However, it is unclear if these findings can be generalised to other, more common, meningitis infections (streptococcal/pneumococcal).

      (4) There are additional minor issues; for instance, the arachnoid fibroblast 2 population appears to closely resemble dural border cells.

      (5) The cell line model (bEnd.3) is a relatively low-fidelity model of BBB endothelial cells, and this should be acknowledged.

      With these caveats, it is difficult to be certain that the endothelium alone is the driver of meningeal immune responses in meningitis, and what the impact of these is.

    1. eLife Assessment

      This fundamental study advances our understanding of how dietary patterns shape cancer immunity by identifying a link between a Mediterranean-mimicking diet, gut bacteria, and a metabolite that enhances anti-tumor immune responses. The evidence supporting the main conclusions is solid, based on carefully controlled diet experiments, measurements of gut-derived molecules, and functional immune analyses across multiple models, together with supportive observations in human data. The work will be of broad interest to biologists working on microbiota and cancer. However, there are several issues that the authors should address to improve the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      In brief, this manuscript addresses a very interesting topic, namely, the impact of the Mediterranean diet on the development of cancer. Using one mouse model and three tumor cell lines, the data show that a Mediterranean diet is sufficient to promote an anti-tumor response mediated by the microbiota, metabolites, and the immune system. Mechanistically, the Mediterranean diet promotes the expansion of Bacteroides thetaiotaomicron (B. theta for short), which converts tryptophan into 3-IAA. Both B. theta and the metabolite are sufficient to phenocopy the effect of the Mediterranean diet on cancer growth in vivo. The manuscript also shows that this effect is mediated by CD8 T cells and suggests, by way of in vitro assays, that 3-IAA sustains the functionality of CD8 T cells, preventing their exhaustion and blocking the ISR pathway.

      Strengths:

      The conclusions of this manuscript are potentially interesting and of potential clinical relevance.

      Weaknesses:

      For a full technical evaluation of the strength of the data, I am missing important technical and experimental details (e.g., number of independent experiments, statistics), and found some legends with potential labelling inaccuracies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the mechanistic link between a Mediterranean-mimicking diet (MedDiet), specifically the synergy between high fiber and fish oil, and its ability to suppress tumor growth. They successfully identify that this dietary combination alters the gut microbiome to favor the expansion of Bacteroides thetaiotaomicron. This bacterium metabolizes dietary tryptophan into indole-3-acetic acid (3-IAA), which then acts systemically to prevent CD8+ T-cell exhaustion.

      Strengths:

      The study integrates controlled dietary interventions, microbiome perturbation, metabolite profiling, and immune functional analyses into a coherent and well-organized framework, making the overall logic of the work easy to follow. The dietary design is carefully controlled, allowing clear interpretation of which broad dietary features are associated with the observed antitumor effects. The immune dependence of the phenotype is addressed using appropriate experimental approaches, and the results broadly support a role for gut microbiota-derived metabolites in shaping immune cell function. In addition, analyses of human datasets provide important context and enhance the potential relevance and usefulness of the findings for a broader research community.

      Weaknesses:

      While the manuscript provides strong support for a role of the microbial metabolite indole-3-acetic acid and downstream stress signaling in shaping immune cell function, the upstream mechanism by which this metabolite exerts its effects remains unresolved. In particular, the specific molecular sensor or binding target through which the metabolite acts has not been identified, and this uncertainty limits mechanistic precision. Framing this point more explicitly as an open question would help align the interpretation with the current data.

      In addition, at several points, the presentation may imply that a single microbial species is uniquely responsible for the observed effects. However, the experimental evidence more directly demonstrates sufficiency under the tested conditions rather than necessity. A clearer distinction between "sufficient" and "necessary" claims would help readers better assess the generality of the findings and their applicability to more complex microbial communities.

      The interpretation of the human data also warrants some caution. The diet-associated score applied to human datasets is derived from gene-expression signatures identified in mouse models and therefore represents an indirect proxy rather than a direct measure of dietary intake. Although the score correlates with clinical outcomes, it does not establish that patient survival is driven by consumption of specific dietary components such as fiber and fish oil.

    1. eLife Assessment

      This study presents a valuable finding that depletion of the rRNA methyltransferase METTL5 enhances anti-tumor immunity through a novel mechanism involving neoantigen generation from non-canonical translation. The evidence supporting the central conclusions is solid, with comprehensive multi-omics data including ribosome profiling, immunopeptidomics, TCR sequencing, and multiple in vivo tumor models demonstrating synergy with immune checkpoint blockade.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Zhang et al. demonstrate that depletion of the 18S rRNA m6A methyltransferase Mettl5 compromises translation fidelity and consequently increases neoantigen generation, thereby uncovering an unexpected role for Mettl5 in tumor immunity. Mettl5-KO tumors exhibit enhanced CD8⁺ T-cell infiltration and show improved responses to immune checkpoint blockade. Mechanistically, loss of Mettl5 perturbs the local structure of 18S rRNA and disrupts the ribosome's ability to perform accurate translation. Subsequent ribosome profiling and mass spectrometry analyses provide compelling evidence that Mettl5 functions as a previously unrecognized regulator of translation to participate in tumor immune evasion.

      Strengths:

      This study presents a comprehensive set of experimental data supporting a mechanistic link between rRNA modification, translation fidelity, and neoantigen generation. The observed synergistic effect of Mettl5 depletion and anti-PD-1 therapy highlights the potential translational relevance of targeting rRNA modifications in cancer immunotherapy.

      Weaknesses:

      (1) In light of the principal function of Mettl5, which is to methylate 18S rRNA within the small ribosomal subunit, the authors focus primarily on translation fidelity, largely associated with elongation, but provide limited exploration of potential effects on translation initiation. Loss of Mettl5 may alter the initiation landscape, potentially promoting alternative or noncanonical initiation events (e.g., initiation at CUG codons), which could also contribute to the observed neoantigen repertoire changes. Further investigation into initiation-level alterations would strengthen the mechanistic interpretation.

      (2) Given the broad involvement of rRNA methyltransferases in ribosome function, the authors should incorporate a parallel analysis using another enzyme (e.g., Zcchc4 or Nsun5) as a negative control. Such an experiment is essential to demonstrate that the tumor immunity phenotype observed is specific to Mettl5 rather than a general consequence of perturbing rRNA modification.

    3. Reviewer #2 (Public review):

      Summary:

      This study demonstrates that METTL5-mediated rRNA m⁶A1832 modification regulates tumor neoantigen generation by maintaining translational fidelity. Loss of METTL5 in tumor cells promotes immune cell infiltration into the tumor microenvironment and enhances the therapeutic efficacy of anti-PD-1 treatment, identifying a novel and potentially important target for cancer immunotherapy.

      Strengths:

      In murine tumor models, the authors found that Mettl5 depletion increases CD8⁺ T cell infiltration and T cell receptor (TCR) repertoire diversity, and revealed a novel mechanism by which reduced ribosomal translation fidelity enhances non-canonical translation, thereby promoting the production of tumor neoantigens.

      Weaknesses:

      (1) While Mettl5 knockout enhances T-cell infiltration into tumors, it remains unclear whether loss of Mettl5 affects the expression of chemokines involved in immune cell recruitment.

      (2) Although the authors report a significant reduction in tumor cell growth as well as tumor volume and weight, direct evidence demonstrating T-cell-mediated cytotoxicity is lacking.

    1. eLife Assessment

      The importance of uterine natural killer (NK) cells in reproductive success has been demonstrated in mice and humans; however, it is still unclear how uterine NK cells are developed. In this valuable manuscript, the authors provide convincing evidence that TGF-β signaling in NK cells supports normal pregnancy in mice by the conversion of conventional NK cells into uterine tissue-resident NK cells. There are some concerns about the paper, particularly around Figures 1A, 1C, and 2E.

    2. Reviewer #1 (Public review):

      This is an excellent paper from Dr. Yokoyama and colleagues. The experiments are technically demanding, given the very low cell numbers and the challenges of working with implantation sites at gestational days 6.5, 10.5, and 14.5. Overall, the impact of TGF-β receptor II deficiency in the NK lineage on uterine trNK cell numbers and litter size is convincing, and the authors' conclusions are well supported by the data. Less convincing, however, is the claim that the decrease in trNK cells is compensated by an increase in cNK cells; rather, the absence of TGF-β receptor II appears to result in an overall reduction of NK/ILC1 cells.

      Major Points:

      (1) Figure 1A and B

      Although a trend is evident, it does not appear that the absolute number of cNK cells at day 14.5 is significantly changed from day 6.5.

      (2) Figure 2E

      The authors state, "This reduction of uterine trNK cells was accompanied by a concomitant increase in the absolute number and frequency of CD49b+Eomes+ cNK cells within the pregnant uterus of TGF-βRIINcr1Δ dams (Figure 2 D, E)." The number of cNK cells appears relatively low (visually ~1,000-1,300), and although the difference is statistically significant, its physiological relevance is unclear. More importantly, this modest increase does not correlate with the marked decrease in trNK and ILC1 populations, as cNK cells do not appear to accumulate. In my opinion, the conclusion "Collectively, these findings indicate that a TGF-β-driven differentiation pathway directs the conversion of peripheral cNK cells into uterine trNK cells during murine pregnancy" should be slightly toned down.

      (3) Figures 2-4

      It is unclear whether the littermate controls are floxed mice or floxhet-Ncr1iCre mice. This distinction is important, as Ncr1iCre expression itself could potentially lead to a phenotype.

    3. Reviewer #2 (Public review):

      In their manuscript "TGF-β drives the conversion of conventional NK cells into uterine tissue-resident NK cells to support murine pregnancy", Yokoyama and colleagues investigate the role of Tgfbr2 expression by NK cells in the formation of tissue-resident uterine NK cells and subsequent importance in murine pregnancy. By transferring congenic splenic conventional NK cells into pregnant mice, they show conversion of circulating NK cells into uterine ivCD45 negative tissue-resident NK cells. When interfering with the formation of uterine trNK cells, spiral artery remodelling was impaired, fetal resorption rates were increased, and litter sizes were reduced.

      Generally, this is a research topic of high interest, yet the manuscript is lacking detailed mechanistic insights, and some questions remain open. At the current state, the data represent an interesting characterisation of the Tgfbr2-fl/fl Ncr1-Cre mice in pregnancy, but considering (a) the recent publication by the group (Reference 17) on the role of Eomes+ cNK cells during pregnancy, (b) the previously described role of Tgfbr2 and autocrine TGFb expression for uterine NK cell differentiation in virgin mice (also cited by the authors), and (c) the well-known relevance of uterine NK cells during pregnancy, additional experiments addressing the specific role of Tgfb during pregnancy would help to improve novelty and significance of the manuscript. To this end, the following aspects should be discussed and, where applicable, experimentally addressed by the authors:

      (1) The authors suggest cNK extravasation and local differentiation into iv- trNK.

      Can it be estimated how much this process contributes to the trNK pool vs. a potential local proliferation of already existing trNK? How do absolute numbers of CD49a+ Eomes+ trNK change during pregnancies? (In Figure 1A, the cell numbers of CD49a+ Eomes+ trNK seem to go down dramatically between gd 6.5 and 14.5). The plot in 1B could also include absolute numbers of ILC1s and trNKs. Would recruited cNK cells compensate for a potential loss of CD49a+ Eomes+ trNK?

      (2) Figure 1C

      2.5 million cNK cells have been transferred, but only very few cells can be detected within the uterus (concatenated FACS plot shown). What may represent the limit to generate uterine trNK out of cNK? Is the niche supporting cNK-trNK differentiation limited? Is it only a specific subset of (splenic) cNK capable of differentiating into trNK? Is gd 0.5 the optimal timepoint for the transfer? Is there continuous recruitment of cNK into the uterus and differentiation into trNK, or is it enhanced at specific timepoints of pregnancy? Could there be local proliferation of cNK-derived trNK? This could be studied by proliferation dye dilution of WT cNK cells in this transfer setup.

      (3) The authors should consider inducible Tgfbr2 deletion (e.g. with Tamoxifen-inducible Cre) to enable development of the uterine NK compartment in virgin mice and only ablate trNK differentiation during pregnancy. This could help to estimate the turnover of cNK into trNK, or to understand if constant cNK recruitment is required to form the uterine trNK compartment during pregnancy.

      (4) Did the authors consider transfer of Tgfbr2-floxed Ncr1-Cre cNK in the same setup as in Fig. 1C? This experiment could confirm the requirement of Tgfbr-dependent signalling for cNK to trNK conversion during pregnancy versus effects of Tgfb signals on trNK numbers in the uterus at steady state (before pregnancy).

      (5) Figures 2D/E

      The authors should state that ILC1s are reduced in the virgin uterus of female Tgfbr2-floxed or Tgfb1-floxed Ncr1-Cre mice and cite the relevant work (the Ref #29 discussed in this context did not show that?). It would be helpful to include an analysis of all three uterine ILC subsets in steady state. This could help to answer the question if the cNK cell changes are pregnancy-specific or a general phenomenon in Tgfbr2-floxed Ncr1-Cre mice.

      (6) Figure 2E

      Please phrase more carefully about the "concomitant increase" of cNKs, since this increase is much less pronounced compared to the very strong reduction (absence) of trNKs in Tgfbr2-floxed Ncr1-Cre mice. Do the authors suggest that cNKs are halted at this stage and cannot differentiate into trNK, based on these data?

      (7) Figure 3/4

      Can the reduced litter size and the abnormal spiral artery formation be rescued by transfer of WT cNK into Tgfbr2-floxed Ncr1-Cre mice?

    1. eLife Assessment

      This important study reports that the human posterior inferotemporal cortex (hPIT) functions as an attentional priority map, integrating both top-down and bottom-up attentional signals rather than serving solely as an object-processing region. The experiments and analyses are well conducted and provide compelling evidence that hPIT bridges dorsal and ventral attention networks and is robustly modulated by attention across diverse visual tasks. The study will be relevant for researchers investigating visual attention, high-level visual cortex, and the neural mechanisms that integrate endogenous and exogenous attentional control.

    2. Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      Comments on revisions:

      The authors have successfully addressed my previous questions and concerns. The public comments above reflect my views on the initial submission and, in my opinion, will remain helpful for general readers. Given this, I do not have additional public comments and will keep my previous public review unchanged.

    3. Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Comments on revisions:

      The authors have addressed our comments in their revised manuscript and in their response to the reviewers. We don't have any further suggestions or comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      Thank you for a nice summary of the key points of our study. Below you will find our reply to your questions.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      Great suggestions! For the first suggestion, we have included a clearer justification in the Discussion section of the manuscript (lines 405-406). For the second, all participants received task practice prior to scanning, and task accuracy exceeded 90%, suggesting that the tasks were not overly demanding. Although ceiling effects limit the interpretability of behavioral-performance correlations, we argue that higher task demands would likely require greater attentional effort, leading to stronger modulation in hPIT, which aligns with our findings.

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      The response of hPIT is not sensitive to stimulus category, but attentional modulation in hPIT is slightly stronger for faces than for scenes and scrambled images. Although the faces used in the task had neutral expressions and the scene pictures were also neutral, we acknowledge that we cannot entirely rule out the possibility that semantic familiarity or emotional salience contributed to the subtle category-related effects in the results of Experiment 3. This limitation is noted in the Discussion section of the manuscript (lines 440-442).

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      Yes, individual differences exist. In the manuscript, we have included individual subject data points in Figure 6B. No data point exceeded three standard deviations from the group mean, suggesting that the attentional modulation effects were generally consistent across participants.
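
The three-standard-deviation screen mentioned in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `flag_outliers`, the use of the sample standard deviation, and the 15 modulation values are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Return the indices of values lying more than n_sd sample SDs from the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1 denominator); an assumption
    if spread == 0:
        return []  # all values identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - centre) > n_sd * spread]


# Hypothetical attentional-modulation values for 15 participants (not real data).
modulation = [0.42, 0.35, 0.51, 0.28, 0.47, 0.39, 0.60, 0.33,
              0.45, 0.55, 0.31, 0.49, 0.38, 0.72, 0.44]

print(flag_outliers(modulation))
```

An empty list means every participant falls within the criterion and would be retained.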

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      Although it is hard to infer the directional flow of information from fMRI because of its low temporal resolution, we agree that, beyond resting-state connectivity, task-based functional connectivity analyses have the potential to provide additional information about whether hPIT serves as a convergence node or a control hub. We have conducted task-based functional connectivity analyses, specifically psychophysiological interaction (PPI) analyses, using data from Experiment 2, which revealed task-modulated right hPIT connectivity with FFA, LOp, and TPJ, suggesting that hPIT may allocate attentional resources to object-processing regions following priority map generation (lines 378-383). Given the limited number of significant PPI results and the inherent constraints of fMRI in capturing fast or transient attention-related interactions, the present data do not allow us to determine which of these roles hPIT plays. Future studies combining effective connectivity or causal perturbation methods (e.g., DCM, TMS-fMRI) would be ideal to test whether hPIT acts as a control node influencing activity within DAN and VAN.

      We also observed modest hemispheric asymmetries in connectivity: for instance, both left and right hPIT showed stronger connectivity with right-hemisphere attention nodes. This is described in the Results section of the manuscript (lines 373-377).

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      Supplementary Figure 1 shows a relatively consistent size and location of hPIT across subjects; the voxel size and location for individual subjects are reported there. This consistency is also demonstrated by Figure 4C.

      We avoided overlap with the FFA and LOp by manually delineating hPIT, which was defined by conjunction maps across the three tasks, and by excluding overlapping voxels. The FFA was defined using an independent contrast (Exp3 contrast [face-scene]), and the LOp location was defined by anatomical parcellation (Glasser et al., 2016).

      Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Thank you for a nice summary of the key points and strengths of our study.

      Weaknesses

      (1) The sample size is relatively small (n = 15), and inter-subject variability is big in Figures 5 and 6, as seen in the spread of individual data points and error bars. The analysis of attention-modulated voxel map intersections appears to be influenced by multiple outliers.

      We agree that the sample size (n = 15) is not ideal, and we acknowledge that some data points in Figures 5 and 6 appear to be potential outliers. However, according to conventional outlier detection criteria, all data points fell within three standard deviations of the group mean and were therefore retained for analysis.

      Moreover, the attention-modulated voxel intersection map shown in Figure 4C is insensitive to outliers, because the intersection plotted is based on the number of subjects.

      (2) The authors acknowledge important limitations, including the lack of exploration of feature-based attention and the temporal constraints inherent to fMRI.

      Yes, we have mentioned these limitations in the discussion.

      (3) Prior research has established that regions such as the prefrontal cortex (PFC) and posterior parietal cortex (PPC) are involved in both endogenous and exogenous attention and have been proposed as attentional priority maps. It remains unclear what is uniquely contributed by hPIT, how it functionally interacts with these classical attentional hubs, and whether its role is complementary or redundant. The study would benefit from more direct comparisons with these regions.

      In this study, we defined the ROI based on the intersection across three different types of spatial attention tasks, which is a stricter criterion, and the results did not reveal spatial attentional modulation across tasks in any region besides PITd. This could be due to the lack of lateralized responses in PFC/PPC. To evaluate whether a region qualifies as a priority map, we applied four widely accepted criteria (as mentioned in the Introduction). While dorsal and ventral attention network (DAN and VAN) regions can be considered supportive components of the priority map system, our findings suggest that among the regions tested, only hPIT fully meets all criteria. In Experiment 2, we included regions such as VFC (as part of PFC) and IPS (as part of PPC), and our findings suggest these areas are more involved in top-down attention. In the revision, we have performed additional analyses on PPC (IPS) and PFC (FEF, VFC), shown in Figure S2.

      (4) The functional connectivity analysis is only performed on resting-state data, and this approach does not capture context-dependent interactions. Task-based data analysis can provide stronger evidence.

      We acknowledge that resting-state FC is limited in assessing task-specific communication. To further investigate the role of hPIT, we have conducted a PPI analysis, which revealed task-modulated right hPIT connectivity in attention allocation (lines 378-383).

      (5) The study does not report whether attentional modulation in hPIT is consistent across the two hemispheres. A comparison of hemispheric effects could provide important insight into lateralization and inter-individual variability, especially given the bilateral localization of hPIT.

      We thank the reviewer for this suggestion. hPIT was localized bilaterally using the same intersection-based method in Experiment 1. We have now performed additional analyses and found hemispheric differences in hPIT attentional modulation (Experiment 2). In Experiment 3, we also found that the difference in load modulation (averaged across stimulus categories) between left and right hPIT was not significant. These results are reported in the Results section of the manuscript (lines 347-351).

    1. eLife Assessment

      This is an important study that advances our understanding of the transition from quadrupedal to bipedal gait in a neuromechanical model of the Japanese macaque. The method and results are solid; the neuromusculoskeletal model successfully reproduces experimental data, and the stability analysis based on an inverted pendulum model effectively explains the effects of different transition strategies. However, the study would benefit from a more comprehensive sensitivity analysis. The findings are highly relevant for researchers in motor control, comparative physiology/biomechanics, and robotics.

    2. Reviewer #1 (Public review):

      Summary:

      The article investigates how the Japanese macaque makes gait transitions between quadruped and biped gaits. It presents a compelling neuromechanical simulation that replicates the transition and an interesting analysis based on an inverted pendulum that can explain why some transition strategies are successful and others are not.

      Strengths:

      I enjoyed reading this article. I think it presents an interesting study and elegant modeling approaches (musculoskeletal + inverted pendulum). The study is well conducted, and the results are interesting. I particularly liked how the success of gait transitions could be predicted based on the inverted pendulum and its saddle node stability. I think it makes a useful and interesting contribution to the state of the art.

      Weaknesses:

      The article is already in great shape, but could be improved a bit by:

      (1) Strengthening the comparison to animal data. In particular, videos of the real animal should be included + snapshots of their gaits (quadruped, biped, and transitions).

      (2) Exploring and testing a broader range of conditions. I think it would be very interesting to test gaits and gait transitions on up and down slopes (both with the musculoskeletal model and with the inverted pendulum model). This could be used to make predictions on how the real animal adapts to those conditions. Ideally, this should be tested on the animal as well. I think this could increase (even more) the impact of this work.

      (3) Better explaining several aspects of the PSO optimization.

      (4) (Ideally) performing a sensitivity analysis on the optimized parameters (e.g. variations of ±5, 10, 20%) in order to determine their respective importance and how much their instantiated values have influenced the results.

      (5) Running a spell checker, as there are quite a few typos.

    3. Reviewer #2 (Public review):

      Summary:

      This article presents a neuromusculoskeletal (NMS) model of the Japanese macaque. The model is augmented with a neural feedforward controller based on CPGs and synergies that reproduces quadrupedal and bipedal gait as well as the transition between them. The model and controller were validated using experimental data. Results were also compared to an inverted pendulum model to show that the macaque's transition from quadrupedal to bipedal gait relies on this kind of mechanism for transition and stability. Overall, the article is very interesting, but it sometimes lacks clarity.

      Strengths:

      The model produces impressive results for quadrupedal gait, bipedal gait, and the transition between them, validated by experimental data. NMS controllers based on feedforward control are very difficult to fine-tune.

      Weaknesses:

      (1) The movement regulator is not clear and should be better explained. At first, it seems that it is just a new CPG/synergy (feedforward) added, but in the methods, it seems to be a feedback controller.

      (2) It is also not clear what is meant by discretizing the weight for the trigger limb from 0 to 1 (page 8).

      (3) The controller is mainly using a feedforward controller, allowing only anticipatory movement. Animals are also using a reflex-based feedback controller. A controller with feedback/reflex could reduce failed attempts in training and better represent the transition.

      (4) There are small typos throughout the article that should be corrected.

    4. Reviewer #3 (Public review):

      Summary:

      The purpose of this study was to test the hypothesis that the inverted pendulum mechanism contributes to the gait transition from quadrupedal to bipedal gait in Japanese macaques. The authors use a neuromusculoskeletal model to generate different motor tasks by varying motor command parameters during forward dynamics simulations. After the simulations were done, the authors used dynamical system analysis of the inverted pendulum model to reveal the underlying principles of gait transition control. The authors showed that successful gait transition from quadrupedal to bipedal gait mostly depends on increased step length of a hindlimb.

      Strengths:

      This study is important not only for understanding gait transition, but also to understand stability control of bipedal gaits. Another advantage of this study is that it allows us to estimate the effect of one control mechanism and find its effect and limits. In animal studies, we also have a combination of compensatory stability control mechanisms.

      Weaknesses:

      No simulation is perfect, so discrepancies from experimental data are expected. A 2D model is used, but the advantage of using a 3D model is not clear, and it is much more complicated.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about the interpretation of group differences across neutral and negative conditions limit the interpretability of the results.

      We are grateful for this improved assessment. Below, we provide detailed responses that we believe address the noted concerns about interpreting group differences across conditions. If these clarifications resolve the interpretability concerns, we would be grateful if the editors would consider updating the eLife assessment accordingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the Drift and Diffusion Model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options in individuals with bulimia nervosa (BN) and healthy participants

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has potential to improve the understanding of pathological food choices.

      Weaknesses:

      I thank the authors for revising their manuscript.

      However, I still have major concerns.

      The authors say that they removed any causal claims in their revised version of the manuscript. The sentence before the last one of the abstract still says "bias for high-fat foods predicted more frequent subjective binge episodes over three months". This is a causal claim that I already highlighted in my previous review, specifically for that sentence (see my second sentence of my major point 2 of my previous review).

      We appreciate the Reviewer's continued attention to causal language. We acknowledge that our use of the term 'predicted', though intended to refer to statistical prediction in a regression model, could be misinterpreted as implying causation. We have therefore revised this sentence to read: 'bias for high-fat foods was associated with more frequent subjective binge episodes over three months'.

      I also noticed that a comment that I added was not sent to the authors. In this comment I was highlighting that in Figure 2 of Galibri et al., I was uncertain about a difference between neutral and negative inductions of the average negative rating after the induction in the BN group (i.e. comparing the negative rating after negative induction in BN to the negative rating after neutral induction in BN). Figure 2 of Galibri et al. looks to me that:

      (1) The BN participants were more negative before the induction when they came to the neutral session than when they came to the negative session.

      (2) The BN participants looked almost negatively similar (taking into account the error bars reported) after the induction in both sessions

      These observations are of high importance because they may support the fact that BN patients were likely in a similar negative state to run the food decision task in both conditions (negative and neutral). Therefore, the lack of difference in food choices in BN patients is unsurprising and nothing could be concluded from the DDM analyses. Moreover, the strong negative ratings of BN patients in the neutral condition as compared to healthy participants together with almost similar negative ratings after the two inductions contradict the authors' last sentence of their abstract.

      I appreciate that the authors reproduced an analysis of their initial paper regarding the negative ratings (i.e. Table S1). It partly answers my aforementioned point but does not address the fact that BN patients may have been in a similar negative state in both conditions (neutral and negative) when running the food decision task: if BN patients were similarly negative after both inductions (neutral and negative), nothing can be concluded from the differences in their results obtained from the DDM. As the authors put it, "not all loss-of-control eating occurs in the context of negative state"; I add that far from all negative states lead to loss-of-control eating in BN patients. This grounds all my aforementioned remarks and the remarks of my first review.

      A solution for that is to run a paired t-test in BN patients only comparing the score after the induction in the two conditions (neutral and negative) reported in Figure 2 of their initial article.

      We appreciate the reviewer’s concern. We understand how the visual representation in Figure 2, which displays between-subject error bars, might suggest similar post-induction affect levels. However, the within-subject paired comparison (which appropriately accounts for individual differences in baseline affect) reveals a significant difference, which we detail below.

      While BN participants did report higher baseline negative affect than the HC group prior to the mood inductions, this does not negate the effectiveness of the manipulation. The critical comparison is the within-subject change from pre- to post-induction (detailed below) which shows that negative affect was significantly higher after the negative induction than the neutral induction.

      As we reported in the Supplementary Information (Table S1), our initial analyses of self-reported affect ratings used a linear mixed-effects model with group (HC = 0, BN = 1), condition (Neutral = 0, Negative = 1), and time (pre-induction = 0, post-induction = 1) as fixed effects, including all interactions, and random intercepts for participants. This approach accounts for individual differences in baseline affect.

      However, to address the reviewer's concerns, we conducted two simple effects analyses using estimated marginal means. As the reviewer suggested, we directly compared post-induction affect between conditions within the BN group (described in the second analysis below). In the first analysis, we examined the diagnosis × time interaction within each condition separately. In the Negative condition, individuals with BN demonstrated a substantial increase in negative affect from pre- to post-induction (mean difference = 20.36, t = 4.84, p < 0.0001, Cohen’s d = 0.97). In the second analysis, we examined the condition × time interaction within each group separately. Among the BN group, we found that reported affect was significantly higher following the negative mood induction than after the neutral affect induction (mean difference = -17.40, t = -4.13, p = 0.0003, Cohen’s d = 0.83). This difference in post-induction negative affect between conditions within the BN group represents a meaningful and statistically robust difference in affective states. These within-group effects confirm that the negative mood induction was (1) effective in the BN group and (2) produced significantly greater negative affect than the neutral mood induction.

      These findings confirm that participants completed the food decision task under meaningfully different affective states, supporting the interpretability of the subsequent DDM analyses. We now report these analyses in the Supplementary Information.
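
For readers who want to see what the reviewer's suggested paired comparison computes, here is a minimal sketch. It is a simplified stand-in, not the authors' method: the response reports simple effects from estimated marginal means of the mixed model, whereas this example runs a plain paired t statistic with the paired-samples effect size d_z on made-up ratings. The function name `paired_summary` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(a, b):
    """Mean difference, paired t statistic, and Cohen's d_z for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)               # SD of the within-person differences
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff            # equivalently t_stat / sqrt(n)
    return mean_diff, t_stat, d_z


# Hypothetical post-induction negative-affect ratings for the same six people.
post_negative = [62, 55, 71, 48, 66, 59]
post_neutral = [41, 50, 52, 45, 40, 47]

mean_diff, t_stat, d_z = paired_summary(post_negative, post_neutral)
print(f"mean difference = {mean_diff:.2f}, t({len(post_negative) - 1}) = {t_stat:.2f}, d_z = {d_z:.2f}")
```

Because each difference is taken within a person, stable between-person differences in baseline affect cancel out, which is why between-subject error bars in a figure can look more similar than the within-subject test suggests.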

      I appreciate the analysis that the authors added with the restrictive subscale of the EDE-Q.

      That this analysis does not show any association with the parameters of interest does not show that there is a difference in the link between self reported restrictions and self reported binges. Only such a difference would allow us to claim that the results the authors report may be related to binges.

      We thank the reviewer for raising this important point about specificity. To address this concern, we examined the correlation between self-reported binge frequency (both subjective binge episodes and objective binge episodes over the past three months) and EDE-Q Restraint subscale in our BN sample.

      The correlations between these measures were modest and non-significant (subjective binge frequency: Spearman's ρ = 0.21, p = 0.306; objective binge frequency: Spearman's ρ = 0.05, p = 0.806), indicating that both binge frequency measures and dietary restraint were relatively independent dimensions of eating pathology in our sample. This dissociation supports the specificity of our findings: the fact that our DDM parameters were associated with binge frequency but not with dietary restraint suggests that the affect-induced changes in decision-making we observed are specifically related to binge-eating behavior rather than reflecting a correlate of dietary restraint. We now report this analysis in the Supplementary Information.
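
The rank correlation used in this response can be illustrated with a short sketch. It assumes the standard definition of Spearman's coefficient (the Pearson correlation of average ranks, with ties sharing a rank); the helper names and the example scores are hypothetical, and no p-value is computed.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical binge-episode counts and restraint scores (not real data).
binge_episodes = [4, 12, 7, 20, 9, 15]
restraint_score = [2.4, 3.0, 1.8, 2.6, 3.4, 2.2]
print(round(spearman_rho(binge_episodes, restraint_score), 2))
```

Working on ranks makes the coefficient robust to skewed count data such as episode frequencies, which is a common reason to prefer it over Pearson's r in this setting.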

      I appreciate the wording of the answer of the authors to my third point: "the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms". This sentence is crystal clear and sums very well the limits of the associations the authors report with binge eating frequency. However, I do not see this sentence in the manuscript. I think the manuscript would benefit substantially from adding it.

      We thank the reviewer for the suggestion. We have added the following sentences that convey this information to the end of the third paragraph of the discussion:

      “These results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic. However, our correlational design does not allow us to determine whether this reactivity causes the symptoms.”

      Statistical analyses:

      If I understood well the mixed models performed, analyses of supplementary tables S1 and S27 to S32 are considering all measures as independent which means that the considered score of each condition (neutral vs negative) and each time (before vs after induction) which have been rated by the same participants are independent. Such type of analyses does not take into account the potential correlation between the 4 scores of a given participant. As a consequence, results may lead to false positives that a linear mixed model does not address. The appropriate analysis would be to run adapted statistical tests pairing the data without running any mixed model.

      We appreciate the reviewer's attention to the statistical approach. However, we respectfully note that mixed-effects models do account for within-subject correlations, contrary to the reviewer’s interpretation.

      The linear mixed-effects model we employed explicitly accounts for the correlation among repeated measures from the same participant through the random intercept term. This random effect structure models the non-independence of observations within participants, allowing for correlated errors within individuals while assuming independence between individuals. This is a standard and appropriate approach for analyzing repeated-measures data (Bates et al., 2015).
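
      To make the random-intercept argument concrete, the following minimal Python sketch (an illustration under stated assumptions, not the code used for the manuscript) computes the marginal covariance that a random-intercept model implies for the four scores of one participant. The function names and the variance values are hypothetical.

```python
def random_intercept_covariance(var_between, var_within, n_obs=4):
    """Marginal covariance of one participant's n_obs scores under
    y_ij = fixed_ij + b_i + e_ij, where Var(b_i) = var_between
    (random intercept) and Var(e_ij) = var_within (residual)."""
    return [
        [var_between + (var_within if j == k else 0.0) for k in range(n_obs)]
        for j in range(n_obs)
    ]


def intraclass_correlation(var_between, var_within):
    """Correlation between any two scores of the same participant."""
    return var_between / (var_between + var_within)


# Hypothetical variance components, chosen only for illustration.
cov = random_intercept_covariance(var_between=2.0, var_within=1.0)
icc = intraclass_correlation(var_between=2.0, var_within=1.0)
```

      Under these assumptions, every pair of scores from the same participant shares the covariance var_between, so the implied within-participant correlation is var_between / (var_between + var_within). A random intercept alone therefore implies the same correlation for every pair of a participant's scores (compound symmetry).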

      The mixed-effects model is, in fact, more appropriate than separate paired t-tests for our design because it:

      (1) Simultaneously models all fixed effects (group, condition, time) and their interactions in a single unified framework;

      (2) Properly partitions variance into within-subject and between-subject components;

      (3) Provides greater statistical power and more precise estimates by using all available data simultaneously; and

      (4) Allows for direct testing of three-way interactions that cannot be assessed through pairwise comparisons alone.

      Paired tests (e.g., t-tests), as the reviewer suggests, would require multiple separate analyses and would not allow us to test our primary hypotheses about group × condition × time interactions. The mixed-effects approach provides a more comprehensive and statistically rigorous analysis of our repeated-measures design. To clarify this even further in the manuscript, we have added the following in our methods when describing our model, “participant-level random intercepts were included to account for within-subject correlations across repeated measurements.”

      Notes:

      The fact that specific methods, such as correlating self-reported measures aggregated over long periods with almost instantaneous behaviors (such as tasks), have been used extensively in studies does not mean that these methods are suited to answering a given scientific question. Measures aggregated over long periods miss the variations in instantaneous behaviors over these periods.

      We acknowledge the reviewer’s concern about the temporal mismatch between our session-level task measures and the 3-month aggregated symptom reports. This is a valid limitation of cross-sectional designs, and we agree that examining how task performance fluctuates in relation to real-time symptom variation would provide richer insights into the potential dynamics of these relationships.

      We agree that we cannot capture how daily changes in task performance relate to momentary symptom occurrence. In response to previous rounds of helpful reviews, we added this limitation to the Discussion section, noting that future research employing ecological momentary assessment (EMA) or daily diary methods could examine whether the decision-making processes we identified also fluctuate in relation to real-time symptom occurrence.

      We note that our finding that affect-induced changes in decision-making parameters were associated with subjective binge frequency suggests that this laboratory-measured reactivity may reflect a stable individual difference that manifests across contexts and time periods. While our current study provides initial evidence that individual differences in affect-related decision-making are associated with symptom severity, we acknowledge that longitudinal designs with repeated assessments would strengthen causal and temporal inferences.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well-understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      Sample size was relatively small, and participants were all women with BN, which limits generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. These limitations are adequately noted in the discussion.

      We are grateful to Reviewer #2 for their careful and supportive review of our manuscript. We appreciate their recognition that computational modeling can reveal nuanced alterations in decision-making processes that may not be apparent in overt behavioral choices. Their balanced assessment of both the strengths and limitations of our work has been helpful in contextualizing our findings appropriately. We have carefully considered their comments regarding sample size and the potential limitations of our mood induction procedure, both of which we discuss in detail in the manuscript's limitations section.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach, the diffusion decision model, to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding, that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness, offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We are grateful to Reviewer #3 for their thoughtful evaluation of our work. We appreciate their recognition that the diffusion decision model provides a novel analytical lens for understanding how negative affect influences the dynamics of food-related decision-making in bulimia nervosa. Their balanced assessment of both the methodological strengths of our design (counterbalancing, rigorous statistical corrections) and its limitations (sample size, mood induction efficacy) has been valuable in ensuring we appropriately contextualize our findings and their implications. Specifically, we have taken their comments regarding sample size and the relative efficacy of different mood induction methods seriously, and we address these important methodological considerations in our discussion of the study's limitations.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed my previous comments, and I do not have any additional suggestions for improvement.

      We thank the reviewer for their time, effort, and insightful feedback.

      Reviewer #3 (Recommendations for the authors):

      The authors have adequately addressed my feedback. I have no further comments.

      We thank the reviewer for their time, effort, and insightful feedback.

    2. eLife Assessment

      This study makes a valuable contribution to understanding how negative affect shapes food-choice decision making in bulimia nervosa by leveraging a mechanistic drift diffusion model to quantify the weighting of tastiness and healthiness attributes. The evidence is solid, supported by a randomized crossover design and generally appropriate statistical analyses. However, the interpretability of the findings is limited by ambiguities in the affect manipulation, particularly regarding whether neutral and negative inductions yielded reliably distinct affective states at the time of task performance in the bulimia nervosa group. Consequently, session-related differences in model parameters cannot be unequivocally attributed to negative affect rather than to uncontrolled state or contextual factors, and clearer separation of affective conditions alongside analyses aligned with the paired data structure would strengthen the conclusions.

    3. Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the Drift and Diffusion Model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options in individuals with bulimia nervosa (BN) and healthy participants

      (2) The weighting of the perceived healthiness and tastiness of these options.
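
      The two quantities above can be illustrated with a minimal simulation sketch of a diffusion process in which each attribute has its own weight and its own onset time. This is an illustration only, not the authors' model specification: the function name, parameter names, default values, and the sign convention for choices are all assumptions.

```python
import random


def simulate_trial(taste_diff, health_diff, w_taste, w_health,
                   t_taste, t_health, threshold=1.0, noise=0.1,
                   dt=0.001, max_time=5.0, rng=None):
    """Simulate one choice between two foods (hypothetical sketch).

    taste_diff / health_diff: rating difference between the options.
    w_taste / w_health: weight of each attribute on the drift rate.
    t_taste / t_health: time (s) at which each attribute starts to
    contribute to evidence accumulation.
    Returns (choice, response_time) with choice +1 (upper boundary),
    -1 (lower boundary), or 0 if no boundary is reached by max_time.
    """
    rng = rng or random.Random(0)
    evidence = 0.0
    n_steps = round(max_time / dt)
    for step in range(n_steps):
        t = step * dt
        drift = 0.0
        if t >= t_taste:  # tastiness has entered the decision
            drift += w_taste * taste_diff
        if t >= t_health:  # healthiness has entered the decision
            drift += w_health * health_diff
        # Euler-Maruyama update: deterministic drift plus Gaussian noise.
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if abs(evidence) >= threshold:
            return (1 if evidence > 0 else -1), t + dt
    return 0, max_time
```

      In this sketch, an earlier onset for tastiness than for healthiness gives tastiness a head start in accumulated evidence, which is distinct from giving it a larger weight.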

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has potential to improve the understanding of pathological food choices.

      Weaknesses:

      I thank the authors for revising their manuscript.

      I still notice that the authors did not go through their manuscript to look for wordings referring to a predictive interpretation of their results, although I already highlighted the inappropriateness of this wording in my first two rounds of review: e.g., there is still "we used zero-inflated negative binomial models to predict the three-month frequency", and I can find other statements like this. The design of their study does not allow such claims.

      The authors answered my major concern regarding the experimental induction towards a negative or a neutral state before running the food decision task. My concern is: BN patients already seemed to be in a high negative state before undergoing the neutral induction, while these patients were in a lower negative state before undergoing the negative induction. It is therefore not surprising that patients seem to report a similar level of negative state after the two inductions (according to the figure in the authors' previous article). Of note, the additional analysis the authors ran within the BN group only provides a significant result: this result shows that there has been an induction but does not rule out that patients were in a negative state of the exact same magnitude when performing the task, as the figure in their previously published article suggests. The major issue is to show that:

      (1) The change in negative state from before to after the induction is larger for the negative induction than for the neutral induction.

      (2) The magnitude of the negative state after the negative induction is higher than the magnitude of the negative state after the neutral induction.

      The first point shows that the induction worked. The second point shows that the participants are in two distinct states. Without showing the second point, it remains possible that one induction raises the negative state of participants to the same level as that following the other induction, which has not increased anything.

      Within this context, how is it possible to associate, in patients, a difference in the DDM between the two sessions with a negative state (which is one of the main focuses of the article) rather than with another parameter that has not been captured? A similar situation would arise in an experiment studying the consequences of stress: a stressful induction applied to participants who arrive at the lab relaxed has a high chance of raising their level of stress to the same level as the one that the same participants would experience after a neutral induction when they arrive at the lab with an already high level of stress. In that case, would it be appropriate to claim that a difference in a task performed after the induction is related to stress when the participants were at the same level of stress while performing the task, despite the fact that the induction worked?

      In the experiment performed by the authors, the additional analysis to perform would be a paired-sample t-test (or the appropriate non-parametric test) to check whether the magnitude of the negative state of BN patients differed between the negative and neutral conditions after the induction only. If not, associating the difference in the DDM with negative states in BN is highly misleading.
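
      A minimal sketch of this check, assuming one post-induction negative affect rating per participant and session. The function name and the example ratings are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(after_negative, after_neutral):
    """Paired-sample t statistic and degrees of freedom for
    post-induction ratings from the same participants."""
    if len(after_negative) != len(after_neutral):
        raise ValueError("both lists need one rating per participant")
    diffs = [a - b for a, b in zip(after_negative, after_neutral)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Made-up ratings for four participants, for illustration only.
t_stat, df = paired_t([3, 5, 4, 6], [2, 3, 3, 3])
```

      A p-value would then be obtained from the t distribution with the returned degrees of freedom, for example with scipy.stats.ttest_rel, which performs the same paired test.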

      I carefully read the authors' answer related to mixed models: they claim that mixed models take into account correlations within their repeated data. The specification of the structure of the covariance matrix allows one to control only partly for that. I notice that the authors did not specify the structure of that matrix: the article they refer to in order to justify the appropriateness of their analyses is not suitable. The specification of the structure of the covariance matrix needs to address, in a mixed model, the difference between handling 4 repeated measures per participant that cannot be paired and 4 repeated measures that can be paired (two per session, with one before and one after the neutral or negative priming sessions, if I count right). Of note, a covariance structure that is left unconstrained when fitting the model does not appropriately capture the pairing of the data: it is very likely to capture the covariance in a different way. And a constrained covariance structure is more likely to lead to a model that cannot be estimated because the algorithms fail to converge.

      By the way, a single two-sample t-test (or a Mann-Whitney test if appropriate), and not a set of multiple paired-sample t-tests as the authors suggest, would answer the authors' goal of testing what they call the three-way interaction in their comment. This test would be performed between the two groups of participants (BN/controls), with the following computed for each participant separately: (assessment after neutral induction - assessment before neutral induction) - (assessment after negative induction - assessment before negative induction). This analysis answers points 1, 2 and 4 they raise, together with my point of controlling for the paired data. I would have agreed with their choice of a mixed model if they had an unbalanced dataset within each participant.
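
      The computation described above can be sketched as follows. This is a minimal illustration that uses a pooled-variance (Student) two-sample t statistic; the function names and the example values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance


def induction_contrast(pre_neutral, post_neutral, pre_negative, post_negative):
    """Per-participant difference of differences:
    (post - pre, neutral session) - (post - pre, negative session)."""
    return (post_neutral - pre_neutral) - (post_negative - pre_negative)


def two_sample_t(x, y):
    """Student's two-sample t statistic (pooled variance) and its
    degrees of freedom, comparing contrasts between two groups."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t_stat = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t_stat, nx + ny - 2


# One made-up participant: ratings before/after each induction.
example_contrast = induction_contrast(2, 3, 2, 5)

# Made-up per-participant contrasts for two groups, for illustration only.
t_stat, df = two_sample_t([1, 2, 3], [2, 4, 6])
```

      Because each participant contributes a single contrast, the pairing of the four ratings is handled before the between-group comparison is made.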

    4. Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well-understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      Sample size was relatively small, and participants were all women with BN, which limits generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. These limitations are adequately noted in the discussion.

    1. eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

    2. Reviewer #1 (Public review):

      Summary:

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms - in particular at high optic flow velocities - while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.
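
      As a minimal sketch of this readout, the sum and difference of the two wing beat amplitude (WBA) traces can be computed sample by sample. The function name, the left-minus-right sign convention, and the example values are assumptions and are not taken from the manuscript.

```python
def wba_readouts(left_wba, right_wba):
    """Return (wba_sum, wba_diff) for two equally long WBA traces.

    wba_sum  = left + right, used as a readout of lift or thrust.
    wba_diff = left - right, used as a readout of yaw turning
               (sign convention assumed here).
    """
    if len(left_wba) != len(right_wba):
        raise ValueError("traces must have the same number of samples")
    wba_sum = [l + r for l, r in zip(left_wba, right_wba)]
    wba_diff = [l - r for l, r in zip(left_wba, right_wba)]
    return wba_sum, wba_diff


# Two made-up samples (arbitrary units): no turn, then an asymmetry.
wba_sum, wba_diff = wba_readouts([60.0, 62.0], [60.0, 58.0])
```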

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations - even at the level of descending neurons to motoneurons - are complex and less straightforward than one might expect.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

    3. Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuit underlying the dimorphic visuomotor transformations. Nicholas et al focus on two types of visual descending neurons (DNs) in hoverflies, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their response to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting in their own right but will also likely inspire those working in other systems, particularly Drosophila.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the LPDs near the center of the receptive field (median within the contour in Fig.1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs that transmit optic flow information to the motor center. I'd also like to point out that there are other lobula plate tangential cells (LPTCs) than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors. Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.
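
      For illustration, a single summary direction can be computed from a set of local preferred directions with circular statistics. Note that this sketch uses a circular mean rather than the median described above, and the function name and example angles are hypothetical.

```python
import math


def circular_mean_deg(directions_deg):
    """Circular mean of directions given in degrees, in [0, 360).

    Averages unit vectors instead of raw angles, so that directions
    on either side of 0/360 degrees do not cancel incorrectly.
    """
    s = sum(math.sin(math.radians(d)) for d in directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Made-up local preferred directions near the receptive field center.
summary_direction = circular_mean_deg([80.0, 100.0])
```

      A plain arithmetic average of 350 and 10 degrees would give 180 degrees, whereas the circular mean gives a direction at the 0/360 boundary, which is why angular data are usually summarized this way.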

      Line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexually dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results as well. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are more suited to detect wide-field optic flows, those induced by self-motion as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

    4. Author response:

      eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

      This statement perfectly recapitulates our findings.

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary: 

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms - in particular at high optic flow velocities - while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      Thank you for this summary. Most of the statement perfectly recapitulates the main findings of our paper. However, we want to emphasize that some hoverfly flight behaviors are strongly sexually dimorphic, especially those related to courtship and mating. Indeed, only male hoverflies pursue targets at high speed, chase away territorial intruders, and pursue females for mating. However, other flight behaviors, such as those related to optomotor responses and flights between flowers when feeding, are not sexually dimorphic. We will amend the Introduction to make the difference between flight behaviors clear.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.

      Thank you for this very nice summary of our work. We want to clarify that LPTC input to DN1 and DN2 has not been shown directly in hoverflies using, e.g., dye coupling or dual recordings. Instead, the presumed HS and VS input is inferred from morphological and physiological DN evidence, and comparisons to similar data in Drosophila and blowflies. We will amend the Introduction to clarify this. The rest of the paragraph perfectly recapitulates the main findings of our paper.

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations - even at the level of descending neurons to motoneurons - are complex and less straightforward than one might expect.

      Thank you.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      This is a great point. There are around 1000 fly DNs, of which many could respond to widefield motion, without being specifically tuned to widefield motion. For example, many looming sensitive neurons also respond to widefield motion, and could therefore be involved in the WBA movements that we measured here. In addition, there are many multimodal neurons that could be involved in optomotor responses in free flight, but these may not have been stimulated when we only provided visual input. Furthermore, many visual neurons are modulated by proprioceptive feedback, which is lacking in immobilized physiology preps. Finally, in blowflies, up to 5 optic flow sensitive DNs have been identified morphologically, and in Drosophila 3 have been identified morphologically and physiologically. In summary, it is more than likely that other neurons project visual widefield motion information to the wing neuropil. We will amend our Introduction and Discussion to make this important point clear to the readers.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

      These are all great points. We did not see any clear arborizations to the frontal nerve, where we would expect to find the neck motor neurons (NMNs). In addition, while we did see fine arborizations throughout the length of the thoracic ganglion, we saw no strong outputs projecting directly to the haltere nerve (HN). In the revised version of the MS we will modify Figure 4 (morphological characterization) to clarify.

      There are important differences between the morphology of DN1 and DN2 in hoverflies and DNHS1 and DNOVS2 in Drosophila, in terms of their projections in the thoracic ganglion. For example, in Drosophila DNOVS2, there are several fine branches along the length of the neuron in the thoracic ganglia. Similarly, we found fine branches in Eristalis tenax DN2. In addition, however, we found a wide branch projecting to the area of the thoracic ganglion where the prothoracic and pterothoracic nerves likely get their inputs (Figure 4), suggesting that the neuron could contribute to controlling the wings and/or the forelegs (which is why we quantified the WBA). In Drosophila DNHS1, there is a similar fat branch to the prothoracic and pterothoracic nerves, which we also found in Eristalis tenax OFS DN1 (Figure 4). Indeed, while Drosophila DNHS1 and DNOVS2 have quite strikingly different morphology, DN1 and DN2 in Eristalis looked quite similar. We will modify the Results section to make this clear.

      In addition, to investigate this further, in the revised version of the MS we will include analysis of the movement of different body parts (including the head) to investigate the presence of any potential sexual dimorphism. Unfortunately, however, this will not include the halteres, as they cannot be seen well in the videos.

      Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuit underlying the dimorphic visuomotor transformations. Nicholas et al focus on two types of visual descending neurons (DNs) in hoverflies, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their response to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      This statement perfectly recapitulates the main findings of our paper. However, as mentioned above, while hoverfly flight behaviors related to courtship and mating are strongly sexually dimorphic, other flight behaviors, such as those related to optomotor responses and flights between flowers when feeding, are not. We will amend the Introduction to make the difference between flight behaviors clear.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting in their own right but will also likely inspire those working in other systems, particularly Drosophila.

      Thank you.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Thank you.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the LPDs near the center of the receptive field (median within the contour in Fig.1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs that transmit optic flow information to the motor center. I'd also like to point out that there are other lobula plate tangential cells (LPTCs) than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors.

      This is very true, and an important point. As mentioned above, in blowflies, up to 5 optic flow sensitive DNs have been identified morphologically; however, whether these correspond to 5 different physiological types remains unclear. In both blowflies and Drosophila 3 have been identified morphologically and physiologically (DNHS1, DNOVS1, DNOVS2). Importantly, in both blowflies and fruit flies DNOVS1 gives graded responses, and no action potentials, meaning that we would not be able to record from it using extracellular electrophysiology.

      We previously used clustering techniques to show that in Eristalis, we can reliably distinguish two types of optic flow sensitive DNs from extracellular electrophysiological data, based on a range of receptive field parameters, and we think that these correspond to DNHS1 and DNOVS2 in Drosophila (Nicholas et al, J Comp Physiol A, 2020, cited in paper). As mentioned above in response to Reviewer 1, this does not mean that there are no other neurons that could respond to widefield optic flow, and which might be involved in the WBA we recorded in the paper. However, the point of this paper was not to conclusively show that there are only two optic flow sensitive descending neurons. The point was to say that there are two quite distinct optic flow sensitive neurons that have similar receptive fields in males and females, while the responses to widefield motion show differences between males and females.

      We will modify the Introduction and Discussion to make these important points clear to the Reader, including the discussion of the 45-60 LPTCs that exist in the lobula plate, and what their role might be.

      Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.

      This is a great point, and we did an extensive analysis of other receptive field properties in this study (shown in supp fig 1). In addition, and as mentioned above, we have published a clustering analysis across receptive field properties of these neurons (Nicholas et al, J Comp Physiol A, 2020, cited in paper). The point that we attempted to make in this paper was that by using two strikingly simple metrics, we can reliably distinguish which of the two neuron types we are recording from (if we accept that there are two main types that we are likely to record from) simply based on location and overall directional preference. This makes automated analysis very easy and straightforward. Indeed, we now use this routinely to ID what neuron we are recording from, rather than making a human-based assumption.

      However, we agree that further in-depth analysis is warranted. Therefore, to address this, we will provide additional receptive field analysis and clustering in the revised version of the MS. In addition, we want to highlight that all data is uploaded to DataDryad for anyone interested in doing additional in-depth analyses.

      Line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexually dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results as well. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are more suited to detect wide-field optic flows, those induced by self-motion as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

      Thank you for mentioning these important points. Our reasoning for using full-screen stimuli for the analysis on line 131 was that since we used the small sinusoidal gratings for mapping the receptive fields, and to subsequently classify the neurons, it would be unfair to use the same data to investigate potential sexual dimorphism. I.e., we selected neurons that fulfilled certain criteria, and then we cannot rightfully use the same criteria to determine differences. This was not explicitly mentioned in the paper, so we will modify the text to make this clear to the Reader.

      However, in Supp Figure 1d/e we show that there are no striking receptive field differences between males and females in terms of receptive field center or directional preference. In Supp Figure 1f we show that there is no difference between male and female receptive field height and width. We will modify the text to draw the Reader’s attention to this figure, and also mention the additional analysis done in response to the comment above.

      As a side note, I personally expected at least DNHS1 to have a smaller receptive field in males, as the hoverfly HSN is strikingly sexually dimorphic (Nordström et al, Curr Biol 2008), and also very sensitive to small objects. However, while optic flow sensitive DNs do respond to small objects (see e.g. the J Comp Physiol paper mentioned above) we did not detect any obvious sexual dimorphism in receptive field properties. Indeed, we think that a different subset of DNs control target pursuit behavior (target selective DNs (TSDNs)). This will be addressed in the modified version of the paper.

    1. eLife Assessment

      This valuable study reports results showing how different neurons in the dysgranular retrosplenial cortex code spatial orientation. Specifically, the paper reports that some neurons maintain tuning for a single head direction across multi-compartmental environments, while other neurons are tuned to different head directions that reflect the geometry within each compartment. The study was viewed as likely to expand the field's understanding of directional tuning of neurons, but incomplete evidence was provided to support the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The dysgranular retrosplenial cortex (RSD) and hippocampus both encode information related to an animal's navigation through space. Here, the authors study the different ways in which these two brain regions represent spatial information when animals navigate through interconnected rooms. Most importantly, they find that the RSD contains a small fraction of neurons that encode properties of interconnected rooms by firing in different head directions within each room. This direction is shifted by 180 degrees in 2-room environments, and by 90 degrees in 4-room environments. While it cannot be definitively proven that this encoding is not just related to the presence of exits (doors) in each room, this is a noteworthy finding and will motivate further study in more complex and well-controlled environments to understand this coding scheme in the RSD. The recordings and analyses used to identify these multi-directional cells are mostly solid. Additional conclusions regarding the rotational symmetry across rooms seen in the RSD neurons that do not encode direction (representing the majority of RSD neurons) remain incomplete, given the evidence presented thus far. The differences between RSD and hippocampus encoding of space are clear and consistent with prior observations.

      Strengths:

      (1) Use of tetrode recordings from the RSD to identify multi-direction cells that only encode one direction in each room, but shift the preferred direction by either 180 or 90 degrees depending on the number of rooms in the environment.

      (2) Solid controls to show that this multi-direction encoding is stable over time and across some environmental manipulations.

      (3) Convincing evidence that these multi-direction cells can co-exist with single-direction head direction cells in the RSD (as both cell types can be simultaneously recorded).

      (4) Convincing evidence for clear differences between directional and spatial encoding in the RSD versus hippocampus, consistent with prior observations.

      Weaknesses:

      (1) The paper mostly uses the term "retrosplenial cortex", but it is important to clarify that the study is only focused on the dysgranular retrosplenial cortex (RSD; Brodmann Area 30) and not the granular retrosplenial cortex (Brodmann Area 29). These are two distinct regions (despite the similar names), each with distinct connectivity and distinct behavioral encoding and function, so it is important to clarify in the abstract and title that the present study is solely about the RSD to prevent confusion in the literature.

      (2) The proportion of each observed cell type is not clearly stated, although it is clear that the multi-directional cells are in the minority. Having the proportion of well-isolated neurons in distinct sessions that encode each type of information (e.g., multi vs single direction encoding) would greatly aid the interpretation of the result and help the field know how common each cell type is in the RSD.

      (3) The authors state that "MDCs [multi-directional cells] never exhibited multidirectional activity within a single room" - but many of the single room examples from the 4-room environment (shown in Figures 2E and 2F) reveal multi-peaked directional encoding. This suggests that the multi-direction encoding may be more compatible with encoding some property of the number of exits rather than relative room orientations.

      (4) The spatial rotation analyses of non-directional cells are considered incomplete. This is impacted by the slower speed at the doors and hence altered firing rates (as evidenced in spatial rate plots). The population rate is not relevant as the correlational analyses are done on a single cell level. Since some cells fire more with increasing speed and others fire less, that will necessarily result in a population rate map that minimizes firing rate differences near the doorway, where the animals move more slowly. But on a single cell level, that reduced speed is having a big effect, as evidenced by individual rate map examples, and the rooms will need to be rotated to obtain a higher correlation by overlapping the doorway regions. This does not necessarily say anything about spatial coding across the two or four interconnected rooms being rotationally symmetric, and it would appear difficult to draw any conclusions related to spatial encoding from those analyses.

    3. Reviewer #2 (Public review):

      Summary:

      Laurent et al. perform in vivo electrophysiological recordings in the retrosplenial cortex of rats foraging in multi-compartment environments with either identical or unique visual features. The authors characterize two types of directional signals in the area that they have previously reported: classic head direction cells anchored to the global allocentric reference frame and multi-direction cells (MDCs), which have a rotationally preserved directional field anchored to local compartments. The primary finding of this work is that MDCs seem sensitive to local environmental geometry rather than visual context. They also show that MDC tuning persists in the absence of hippocampal place field repetition, further dissociating the RSC local directional signal from the broader allocentric representation of space. A novel observation is that RSC non-directional spatial signals are anchored to the local environment, which could and should be explored further. While the data is solid and the analyses are mostly appropriate, the primary findings are incremental, and more interesting novel claims are not explored in detail or not explicitly tested.

      Strengths:

      The environmental manipulations clearly demonstrate that tuning is not modulated by complex visual information.

      The finding that RSC two-dimensional spatial responses are stable and anchored to environmental features is novel and can be further explored in future work.

      Weaknesses:

      The observation that BDCs and MDCs are insensitive to visual context builds upon the authors' previous work (and replicates aspects of Zhang et al., 2022) but leaves many open questions that are not addressed with the current set of experiments. Specifically, what exactly are MDCs anchoring to? The primary theory is that they anchor to environmental geometry, but there are no explicit experimental manipulations to test this theory. It is important to note that 2- and 4-compartment environments share many features, including the same cardinal axes, making any differences/similarities in these two conditions difficult to interpret.

      The main finding presented with respect to BDC/MDC tuning is that they are not sensitive to visual context as manipulated by distinct visual patterns on the wall and floor in multicompartment environments. One could argue that the individual rooms are, in actuality, quite similar in low-level visual features - each possesses a large white background square visual feature on a single wall with a fixed relationship to the door(s). How can the authors rule out that i) BDC/MDC responses are modulated by these low-level features rather than geometry and/or ii) that the rats are not paying attention to any visual features at all? There is no task requiring them to indicate which room they are in. Furthermore, the doorways themselves are prominent visual features that are present in each context. It would be interesting to see if MDC/BDC tuning persisted in a square room where the number of doorways was manipulated to rule out this possibility.

      A strong possibility is that the rotational symmetry of both MDCs and non-directional spatial neurons is related to 1) door-related firing, 2) stereotyped movement, and 3) stereotyped directional sampling. In Supplemental Figure 8, the authors begin to address this by comparing a 'population ratemap' to a 'population speed map.' I do not think this is sufficient and is difficult to interpret. Instead, the authors should assess whether MDCs and BDCs fire more at doorways and what the overlap is with the speed-modulated cells they report. Moreover, they should assess whether the spatial speed profile itself is rotationally symmetric within each session. It would also be useful to look at the confluence of the variables simultaneously using some form of regression analysis. The authors could generate a directional predictor that captures the main response property of these cells and see if it accounts for greater variability in spiking than speed or x,y position. Finally, rotationally symmetric directional sampling biases could arise from the doors being present on the same two walls in each room. The authors should assess whether MDC tuning is still present if directional sampling is randomly downsampled to match directional observations in each compartment.
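
      The downsampling control suggested in the last sentence above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' or the reviewer's actual procedure: the function name `match_direction_sampling` and its input format (one array of head-direction bin indices per compartment, expressed in a common reference frame) are hypothetical.

```python
import numpy as np

def match_direction_sampling(hd_bins_by_room, rng):
    """Return, per room, the indices of tracking samples to keep so that every
    room ends up with the same number of samples in each head-direction bin.

    hd_bins_by_room: list of non-empty 1-D integer arrays (hypothetical input),
    one per compartment, holding the head-direction bin index of each sample.
    """
    n_bins = max(int(b.max()) for b in hd_bins_by_room) + 1
    # Occupancy per room and bin, then the minimum across rooms for each bin.
    counts = np.array([np.bincount(b, minlength=n_bins) for b in hd_bins_by_room])
    target = counts.min(axis=0)

    kept = []
    for bins in hd_bins_by_room:
        idx = []
        for k in range(n_bins):
            candidates = np.flatnonzero(bins == k)
            # Randomly keep only as many samples as the least-sampled room has.
            idx.append(rng.permutation(candidates)[: target[k]])
        kept.append(np.sort(np.concatenate(idx)))
    return kept
```

      Tuning curves would then be recomputed from the retained samples only, ideally over many random draws, to check whether the multi-directional pattern survives equalized sampling.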

      Recent work has demonstrated that neurons with egocentric corner or boundary tuning are observed in RSC. The authors do not address whether egocentric tuning contributes to MDC signals. An explicit analysis of the relationship and potential overlap of MDC and egocentric populations is warranted.

      Many of the MDCs presented in the main figures are not especially compelling. This includes alterations to MDC tuning in Figure 2, which is a key datapoint. The authors should show significantly more (if not all) examples of MDCs in each environment. It would similarly be useful to see all/more examples of non-directional spatially tuned neurons with rotationally symmetric firing patterns.

      "One might hypothesize that specific environmental cues, such as door orientation or landmark positioning, drive these tuning shifts. However, our results argue against this interpretation. In four-room environments, each room had multiple entry points, yet MDCs never exhibited multidirectional activity within a single room."

      I do not understand the logic here. Can the authors unpack this? Also, it is clear that some of the example cells have more than one peak in individual compartments. How is this quantified?

    4. Reviewer #3 (Public review):

      Summary:

      The authors examine firing of dysgranular retrosplenial cortex (dRSC) neurons in relation to head orientation and location for rats exploring open-field environments. One environment utilized was a square arena with high walls that is split into two rectangular spaces connected by a doorway. Another environment is a square arena split into quadrants connected by doors near the center. For each, the different sub-spaces of the environments are either identical in terms of visual and tactile cues or different. For head direction neurons, the authors present one population where each neuron maintains a single tuning direction for the two or four sub-compartments of the two environments. A second population exhibits what is termed multi-directional firing, wherein neurons exhibit (overall) two or four head direction peaks in firing. For such neurons, firing in each of the sub-compartments is associated with only a single preferred direction, but the directions across compartments are shown to be at 180-degree (two-compartment environment) or 90-degree offsets. The offsets evidence tuning to the "same" orientation for the sub-compartments that are, in the global reference frame, oriented at 180 or 90 degree offsets. The results are similar whether or not the sub-compartments have the same or different tactile and visual cues. Thus, the first population is said to be global in its head direction tuning, while the second relates to each local environment in a way that is systematic across sub-compartments. Spatially-specific activity of another population of non-direction-tuned RSC neurons is examined, and comparisons of sub-compartment spatial firing maps suggest that spatial tuning in RSC also repeats across compartments when the firing maps for the compartments are rotated to match each other (as in physical space). Finally, a population of hippocampal "place" cells exhibited different location mapping across sub-compartments. 
      The findings are interpreted to indicate that RSC can simultaneously map orientation in both local and global reference frames, possibly forming a mechanism whereby the sub-compartments' shared geometry (given by the boundary shapes and the door locations) can be related to each other and to the global space they share.

      Strengths:

      This paper addresses an interesting problem and expands how the field will think about directional tuning.

      Weaknesses:

      It is not clear that the experimental design allows for a clear interpretation of the data. Rates for preferred tuning are low, as are ratemap correlations for spatially-tuned neurons.

      (1) It is concerning that the neurons with head direction tuning have fairly low peak firing rates (mean close to 5 Hz), where prior studies examining head direction tuning in dRSC found head direction-tuned neurons with peak rates more than an order of magnitude higher (100 Hz or more). Under circumstances where neurons are tuned well to variables other than head direction (for example, angular velocity of movement), weak head direction tuning may be observed if those other variables are not sampled equally across head directions. The manuscript contains no rigorous control for this possibility. One place to start to address this issue would be to map out variables such as angular velocity by head orientation, and to test whether such relationships also carry 90 and 180 degree offsets.

      (2) There is some question as to whether dRSC neurons (spatial or directional) following the sub-compartment "geometry" is appropriate in terms of interpreting the data. In the condition with sub-compartments carrying different tactile and visual cues, it seems that such cues pertain only to the floor of the environments. The distal visual space of the boundaries appears to be identical. One is left to wonder whether distinguishing environments according to boundary wall visual cues would lead to different results. The CA1 data does not help to rule this possibility out. A second reason to doubt the "shared geometry" interpretation is that there is no condition where sub-compartment geometry is varied. It is also the case that the sub-compartment doorways may stand as the only salient distal visual cue linking the environments. Local sensory cues and geometry seem not so disentangled in this study, but this is a major claim in the abstract.

      (3) There is some concern with the interpretation that the spatial tuning of some dRSC neurons repeats in rotated form across sub-compartments. The firing rate map correlations are very low on average (~0.2), and far lower than the population of CA1 having repeating fields across the same vs different visual/tactile cue conditions. The authors should define the chance level of ratemap correlation by shuffling neuron identities. Apologies if this is indeed the current approach, but it seems not to be (I was left a bit lost by the description in the methods). For any population of hippocampal place cells, the cross-neuron correlations of firing rate maps are typically not zero, and correlations at 0.2 would normally be evidence for remapping.

      (4) A somewhat picky point here that is not meant to claim that multi-compartment studies are not useful - the introduction states that real-world environments typically consist of multi-compartment rooms. This is certainly not true for rodents and is only sometimes true in humans.

      (5) The discussion lacks a consideration of how such dRSC output might impact the target structures of dRSC.

      (6) The discussion speaks to the idea that multi-directional neurons may aid in transitioning between contexts (sub-compartments). It is notable, however, that none of the multidirectional neurons have multi-directional tuning in all sub-compartments, whereas such firing was seen in the 2017 Nature Neuroscience study by Jacob/Jeffery. The discussion should address this difference and perhaps posit a means by which the firing of global and local head direction neurons can be related to each other to yield navigation that depends on both scales.

      (7) The authors should provide the size of the smoothing function for spatial firing rate maps.

      (8) The authors should devise a measure to define directional tuning in 4 directions (with 90-degree offsets).

      (9) Figures 2D and 2H - The offsets in preferred tuning across sub-compartments are rather variable.
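
      Points (3) and (8) above can be illustrated with a rough sketch. It assumes two things that are not taken from the manuscript: an n-fold resultant vector length (angles multiplied by `n_fold` before averaging) as one possible measure of four-direction tuning, and a chance level for rate-map correlations obtained by permuting cell identities between compartments. All function names and input formats are hypothetical.

```python
import numpy as np

def nfold_vector_length(bin_centers_rad, rates, n_fold):
    """Rate-weighted mean resultant length after multiplying angles by n_fold.

    n_fold=4 yields values near 1 for four peaks spaced 90 degrees apart,
    n_fold=2 for two peaks spaced 180 degrees apart, n_fold=1 for one peak.
    """
    w = np.asarray(rates, dtype=float)
    theta = np.asarray(bin_centers_rad, dtype=float)
    z = np.sum(w * np.exp(1j * n_fold * theta))
    return float(np.abs(z) / np.sum(w))

def ratemap_corr(map_a, map_b):
    """Pearson correlation over bins visited in both maps (NaN = unvisited)."""
    ok = ~np.isnan(map_a) & ~np.isnan(map_b)
    return float(np.corrcoef(map_a[ok], map_b[ok])[0, 1])

def identity_shuffle_null(maps_a, maps_b, n_shuffles, rng):
    """Null distribution of the mean cross-compartment correlation obtained by
    pairing each cell's map in compartment A with another cell's map in B.

    maps_b is assumed to be already rotated into the frame of maps_a.
    A random permutation can leave a few cells paired with themselves, which
    makes this null slightly conservative.
    """
    n = len(maps_a)
    null = np.empty(n_shuffles)
    for s in range(n_shuffles):
        perm = rng.permutation(n)
        null[s] = np.mean([ratemap_corr(maps_a[i], maps_b[perm[i]])
                           for i in range(n)])
    return null
```

      The observed mean correlation of correctly matched cells would then be compared against a high percentile of the shuffled distribution, and the n-fold vector length could itself be thresholded against a spike-time shuffle.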

    1. eLife Assessment

      This important study substantially advances the imaging toolbox available to neuroscientists by presenting a tunable Bessel (tBessel-TPFM) platform that enables high-speed volumetric two-photon imaging. The evidence supporting the novel methodology is convincing, with rigorous benchmarking and demonstrations of a wide range of neuroimaging applications covering vascular dynamics, neurovascular coupling, optogenetic perturbation, and microglial responses. The work will be of broad interest to neuroscientists and imaging system tool developers.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents a tunable Bessel-beam two-photon fluorescence microscopy (tBessel-TPFM) platform that enables high-speed volumetric imaging with stable axial focus. The work is technically strong and broadly significant, as it substantially improves the flexibility and practicality of Bessel-beam-based two-photon microscopy. The demonstrations are generally strong and bridge a wide range of neuroimaging applications, namely vascular dynamics, neurovascular coupling, optogenetic perturbation, and microglial responses. These convincingly show that the approach enables biological measurements that are difficult or impractical with existing methods.

      The evidence supporting the technical and biological claims is generally strong. The optical design is carefully motivated, clearly described, and validated through a combination of simulations and experimental characterization. The biological applications are diverse and well chosen to highlight the strengths of the proposed method, and the data are of high quality, with appropriate controls and comparative measurements where relevant.

      Strengths:

      (1) The optical innovation addresses a well-recognized limitation of existing Bessel-TPFM implementations, namely axial focus drift during tuning, and does so using a relatively simple, light-efficient, and cost-effective design.

      (2) The manuscript provides convincing experimental evidence for this being a versatile platform to map flow dynamics across diverse vessel sizes and orientations in both healthy and pathological states.

      (3) Biological demonstrations are comprehensive and span multiple domains such as hemodynamics, neurovascular coupling, and neuroimmune responses.

      (4) Quantitative analyses of blood flow across vessel sizes and orientations, including kilohertz line scanning, are particularly compelling and clearly beyond the reach of standard Gaussian TPFM.

      (5) Particular advantages are that higher blood flow speeds become measurable up to 23 mm/s (20x more than conventional frame scanning), and that simultaneous (Bessel-)imaging and (Gaussian-)perturbation are possible because of the stable axial focus.

      Weaknesses:

      (1) At present, the paper does not properly position the new Bessel-beam method against previous work, and fails to compare it to alternative fast volumetric imaging methods without Bessel beams.

      (2) The cost-effectiveness of the proposed method is not well described or supported by evidence; it would be useful to include more detail or remove this claim.

      (3) Some biological conclusions, e.g., regarding novel features of microglial dynamics (i.e., the observed two-wave responses and coordinated extension-retraction), are based on relatively limited sample size and would benefit from clearer discussion of variability across animals and fields of view.

      (4) The use of neural network-based denoising for microglial imaging is reasonable but introduces potential concerns about trustworthiness; additional clarification of validation or failure modes would strengthen confidence in these results.

      To conclude, most of the authors' claims are well supported by the data. The central conclusion, namely that tBessel-TPFM provides tunable volumetric imaging enabling experiments not feasible with existing two-photon approaches, is justified. Some biological interpretations would benefit from a more cautious framing, but they do not undermine the main technical and methodological contributions of the study. This is a strong and technically rigorous manuscript that makes a substantial methodological advance with clear relevance to neuroscience and intravital imaging. Minor clarifications and a slightly more measured discussion of certain biological findings are recommended.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe a tunable Bessel beam two-photon microscope (tBessel-TPFM) designed to overcome a common limitation of Bessel-based volumetric imaging: axial shifts of the effective focus during Bessel beam parameter tuning. Their optical design allows independent control of axial beam length and resolution while keeping the axial center fixed. This is extensively validated through simulations and experiments.

      Strengths:

      A major strength of the work is the breadth of validation combined with the level of technical detail provided. The authors carefully characterize the optical performance of the system and clearly explain the design choices and underlying derivations, which will make it easier for others to understand and implement. The authors demonstrate the utility of the method across several in vivo applications, including neurovascular imaging, blood flow measurements, optogenetic stimulation, and microglial dynamics.

      Weaknesses:

      In the in vivo demonstrations, the authors employ different Bessel beam configurations across experiments, but the beam parameters are not dynamically tuned during live imaging. A video example showing continuous or interactive tuning of the Bessel beam within a single in vivo imaging sequence would further highlight the practical advantages of this platform and strengthen the case for its potential applications. In addition, while excitation powers are reported, the manuscript does not place these values in the broader context of known photodamage thresholds for two-photon microscopy, which would be helpful to readers. Denoising/image restoration is applied in one of the in vivo examples, but it is unclear why this step was used specifically for this dataset and whether it was necessary to achieve adequate SNR or primarily included as an additional demonstration.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents an elegant and cost-effective approach for generating a tunable Bessel beam on a conventional two-photon microscope. The authors assemble a compact optical module comprising three axicons and a series of lenses that permits rapid adjustment of both lateral resolution and axial extent without modifying the focal plane. This flexibility enables the system to be readily adapted to a variety of biological preparations. As a proof of concept, the authors employ the device to record blood flow velocities in cortical microcapillaries, arterioles, and venules, thereby directly visualizing vasodilatation and vasoconstriction dynamics and permitting quantitative analysis of neurovascular coupling across cortical layers in awake mice.

      The authors demonstrate that the tunability of the Bessel beam can be exploited to match the numerical aperture to the vessel type: a high NA configuration, albeit slower scan, is optimal for resolving flow in capillaries, whereas a low NA setting provides faster acquisition suitable for arterioles and venules. By implementing a one-dimensional line scan with the Bessel beam, they achieve an imaging speed that is twentyfold faster than conventional frame-by-frame scanning, which proves sufficient to capture hemodynamic transients before and after an induced ischemic stroke.

      In addition to pure observation, the authors integrate a co-propagating Gaussian line into the system, allowing simultaneous imaging and photostimulation within the same focal plane. This capability addresses a common limitation of other Bessel beam implementations, in which the observation and perturbation planes often become misaligned when the Bessel beam is altered. The manuscript also emphasizes the advantage of Bessel beam excitation for calcium imaging after a perturbation, because it captures neuronal activity in planes both above and below the nominal focal plane, signals that would be missed with a standard Gaussian focus. Finally, the authors apply the technique to investigate the neuroimmune response following targeted microglial ablation; they report that adjacent microglia extend processes toward the injury site while retracting processes in the opposite direction.

      Overall, the work offers a technically straightforward yet powerful extension to existing two-photon platforms, providing high-speed, volumetric imaging and stimulation capabilities that are well-suited to a broad range of neurovascular and neuroimmune studies. The experimental validation is quite thorough, and the presented data convincingly illustrate the benefits of the approach.

      Strengths:

      The authors present a truly clever and inexpensive optical module that can be integrated into almost any two-photon microscope, providing a tunable Bessel beam with a minimal modification of the existing system. The experimental data and accompanying quantitative analysis convincingly demonstrate that the system can reveal physiological events, such as capillary flow, calcium transients across multiple axial planes, and microglial process dynamics, that are difficult or impossible to capture with a conventional Gaussian beam. The breadth of experiments chosen for the manuscript illustrates the practical utility of the device and supports the authors' conclusions that it extends the functional repertoire of standard two-photon microscopy.

      Weaknesses:

      The manuscript would benefit from a more detailed contextualisation of the claimed speed advantage. Although the authors mention other techniques in the introduction, they do not provide any direct comparison with other state-of-the-art high-speed two-photon approaches such as light beads microscopy (Demas et al., Nat. Methods 2021), temporal multiplexing schemes (Weisenburger et al., Cell 2019), or random access microscopy (Villette et al., Cell 2019). A brief comparison of imaging speed, spatial resolution, and instrumental complexity would enable readers to assess the relative merits of the present method.

      A second limitation that warrants discussion is the inherent trade-off between volumetric coverage and image specificity. Because the Bessel beam excites fluorescence throughout an extended axial range, the detector inevitably integrates signal from a three-dimensional volume into a two-dimensional image. In densely labelled tissue, this can lead to significant signal crosstalk, reducing contrast and complicating quantitative interpretation. A brief analysis of how labeling density affects the fidelity of flow or calcium measurements, or suggestions for mitigating crosstalk (e.g., computational deconvolution, adaptive excitation shaping, or combinatorial sparse labeling), would broaden the applicability of the technique.

    1. eLife Assessment

      The study investigates, from multiple angles, the still-debated function of insect rhodopsin-7 (Rh7). The authors present compelling results for its ancient phylogenetic origin across pan-arthropods, a non-visual role based on expression analyses in the fly brain, an unusual G-protein signalling pathway, and - using behavioural genetics - that Rh7 affects how Drosophila melanogaster interprets and responds to light-dark transitions. Through this, the work provides fundamental new insights into the evolution and function of non-visual opsins.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigates the Drosophila non-visual light receptor rhodopsin7 with regard to its role in light information processing and resulting consequences for behavioral patterns and circadian clock function. Using behavioral, in situ staining, and receptor activation assays together with different fly mutants, the authors show that rhodopsin7 is an important determinant of activity under and response to darkness, which likely signals via a pathway distinct from other, visual Drosophila rhodopsins. Based on phylogenetic analysis, the authors further discuss a potentially conserved functional role of non-visual photoreceptors like rhodopsin7 and the mammalian melanopsin in light information processing and circadian clock modulation.

      Strengths:

      The manuscript follows a very clear structure with all investigations logically building onto each other. Background information and methodology are provided in appropriate detail so that readers can fully understand why and how experiments were conducted. It is further praiseworthy that the authors provide the details that also allow non-experts in the field to fully understand their approaches. Experimental work was conducted in a highly standardized manner, and also considered potential "side-aspects" like the consequences of temperature cycles and changed photoperiods. The detailed and clear description of the obtained results makes them very convincing, with (almost) all observable patterns being addressed.

      By highlighting the evolutionarily old phylogenetic position of rhodopsin7 and its conservation across numerous clades, the authors provide strong reasoning for the relevance of their work, also pointing out the similarities to the mammalian melanopsin. The postulated hypotheses regarding protein structure and functioning, as well as the role in light information processing and behavioral and circadian clock modulation, are well grounded in the authors' observations, and speculative aspects are correctly pointed out.

      Weaknesses:

      Where the manuscript still has potential for improvement is the discussion, which in its current form does seem slightly self-contained and does not fully integrate the findings of previous studies on Drosophila rhodopsin7. As the introduction specifically points out that previous findings have been contradictory, this seems like a missed opportunity. Further details on this are provided in the recommendations below.

      Similarly, the manuscript currently lacks a discussion of the possible relevance of rhodopsin7 (and other non-visual light receptors in other organisms) in the context of a species' environment and lifestyle, i.e., what is the relevance/benefit of having rhodopsin7 in the fly's everyday life? While this clearly involves speculation, when done carefully, it can elevate the paper's relevance from a primarily academic to a societal one.

      An additional point concerns the title and abstract, which postulate rhodopsin7 roles in contrast vision as well as motion and brightness perception. Contrast remains poorly defined in the text, leaving it ambiguous whether it refers to bright/dark contrasts, e.g., along edges, or the temporal contrast that results from dark pulses (startle response). While the latter seems to apply here, the former is likely more intuitive. Thus, this aspect should be rephrased (also in the title) or properly clarified early on. Regarding motion detection, this is backed up by the optomotor response results, but the findings stand somewhat isolated from the other results, lacking a clear connection aside from general visual processing. Lastly, brightness perception is mentioned in the abstract, but never again, possibly due to inconsistent phrasing throughout the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This is a very interesting paper bringing new and important information about the poorly understood rhodopsin 7 photoreceptive molecule. The very ancient origin of the gene is revealed in addition to data supporting a signaling pathway that is different from the one known for the canonical rhodopsins. Precise expression data, particularly in the optic lobe of the fly, as well as clear behavioral phenotypes in responses to light changes, make this study a strong contribution to the understanding of the still-debated function of rhodopsin 7.

      Specific comments

      (1) Title and abstract: Contribution of Rh7 to circadian clock regulation

      (a) It is not that clear to me what rhodopsin does in terms of circadian regulation (even though its function might be circadianly regulated). The clear role in the light/dark distribution of activity might not be circadian per se, but mostly light/dark-driven, and there is no evidence here for a role in the entrainment of the clock.

      (b) The authors should cite Lazopulo et al. (2019), which nicely shows that Rh7 has an important role in peripheral neurons to allow flies to escape from blue light (see below).

      (2) Figure 2 C

      The finding that Galphaz but not Galphaq can trigger signaling from light-excited Rh7 is very intriguing and helps to better understand Rh7 function. Since Galphaz is related to Gi/o, it would be interesting to test those, for example, by expressing RNAi with Rh7-gal4 and testing the light-dark or light-off response behavior.

      (3) Figures 3-4

      The change in the locomotor activity distribution between light and dark in LD conditions provides a nice assay for Rh7 function. Since Lazopulo et al. (2019) have shown that wild-type but not Rh7 mutants do escape from blue light, it would be important to compare and discuss these LD behavior data with the Lazopulo results. Precisely, is this nighttime preference linked to blue light?

      The expression data are really nice and show that Rh7 is mostly a non-retinal photoreceptor. However, the paper would be strongly reinforced by correlating this with the LD behavior. The LD phenotype should be tested in flies with Rh7 expression rescued under Rh7gal4 control (as done for the startle response). This is important to show whether the expression pattern is likely responsible for the described Rh7 function in LD. If L5 and/or M11 drivers are available, they should be used to rescue Rh7. Since expression in some clock neurons is shown, the rescue experiment should also be done with a clock neuron driver.

      Along the same lines, can the LD phenotype (or startle response phenotype of Figure 4) be restored by expressing Rh7 under ppk control, as shown for the blue light avoidance phenotype by Lazopulo et al.?

      Finally, the Rh7 "darkfly" rescued flies should be tested in LD.

    4. Reviewer #3 (Public review):

      Summary:

      While our knowledge regarding visual opsins is largely very good, a lot more uncertainty exists around the role of non-visual opsins. Using the power of the Drosophila melanogaster model system, Kirsh et al. investigate the role of the non-visual opsin Rhodopsin7 (Rh7). Expression analysis, based on Rh7-Gal4>UAS-GFP and HCR in situ staining, reveals strong expression in the optic lobes and somewhat weaker, but nevertheless extensive expression in the brain. An investigation of motor activity reveals that loss of function leads to an altered day and night rhythm, specifically decreasing activity during the dark phase. These flies were also less sensitive, but still responsive to a light-induced startle response and showed deficiencies in the optomotor response. To further investigate how Rh7 may modulate these responses, inspired by the Dark line of flies (which were kept in the dark for ~1400 generations) and which has accumulated C-terminal related losses, the authors conducted rescues with an intact and a C-terminal-deficient Rh7 and were able to pinpoint that region as an important driver of related behavioral shifts. These findings are particularly intriguing as Rh7 represents an ancient opsin with phylogenetic and mechanistic parallels to mammalian melanopsin.

      Strengths:

      The paper is well-written and contains high-quality data with appropriate sample sizes, and the conclusions are well supported.

      Weaknesses:

      No weaknesses were identified by this reviewer, but the following recommendations are made:

      (1) The authors should clarify exactly what tissues were taken for the comparative qPCR. This is particularly interesting in terms of the retina. Since Rh7 appears not to be expressed within the photoreceptor cells of the retina, this raises the important question as to which cells it is expressed in. To address this important question, it would also be helpful to include an expression analysis of the retina itself (by extending the Rh7-GFP expression patterns and/or adding HCR in situ of the ommatidia array). The cell types of the retina are very well classified, and some evidence already exists for Rh7 expression in support cells (e.g., Charlton-Perkins et al. (2017); PMID: 28562601). This study has a unique opportunity to investigate this further by adding these critical data for a more complete picture of Rh7.

      (2) Mammalian opsins should be included in the phylogenetic analysis illustrated in Figure 2A, and their position on the tree should be indicated. This will allow readers to better put the authors' statements regarding the intermediate position of Rh7 into perspective. In addition, note that the distinction between red and deep red is easy to miss regarding the Rh7 cluster. Perhaps the authors could use a more distinct colour scheme, for example, orange and deep red.

      (3) More details should be provided on the optomotor response experiments. Specifically, the frequencies used for the optomotor response need to be specified. Results show a relatively large level of variation, which may be due to different angular perspectives that flies may have had while viewing the stimulus. If possible, provide videos as examples, as they will make it clearer to viewers how much flies could move around in the setup (from the methods, it seems they could move within the 2.2 of the 3 cm diameter of the arena, which would lead to substantial differences in the visual angle of the viewed grating).

    1. eLife Assessment

      This useful study raises interesting questions but provides inadequate evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The findings are intriguing but they are correlative and hypothesis-generating with the strong possibility of residual confounding.

      [Note: The final version has been published in Brain, Behavior, and Immunity: https://doi.org/10.1016/j.bbi.2026.106473]

    2. Author response:

      [Note: The final version has been published in Brain, Behavior, and Immunity: https://doi.org/10.1016/j.bbi.2026.106473]

      eLife Assessment

      This useful study raises interesting questions but provides inadequate evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The findings are intriguing but they are correlative and hypothesis-generating with the strong possibility of residual confounding.

      We thank the editors and reviewers for characterizing our work as useful and for the opportunity to publish a Reviewed Preprint with a corresponding response. However, the statements in the Assessment characterizing the evidence as ‘inadequate’ and asserting a ‘strong possibility of residual confounding’ are factually incorrect as applied to our data and incompatible with the empirical findings presented in the manuscript. We have notified the editors of this factual inaccuracy. As the Assessment will be published as originally written, we provide clarification here to ensure an accurate scientific record for readers of the Reviewed Preprint.

      Our study shows that the association between atovaquone–proguanil (A/P) exposure and reduced dementia risk, first identified in a rigorously matched national cohort in Israel, is robustly reproduced across three independently constructed age-stratified cohorts in the U.S. TriNetX network (with exposure at ages 50–59, 60–69, and 70–79). In each cohort, individuals exposed to A/P were compared with rigorously matched individuals who received another medication at the same age and were then followed over a decade for incident dementia. Cases and controls were matched on all major established dementia risk factors: age, sex, race/ethnicity, diabetes, hypertension, obesity, and smoking status.

      Across all three strata, each containing more than 10,000 exposed individuals with an equal number of matched controls, we observed substantial and consistent reductions in cumulative dementia incidence (HR 0.34–0.51), extremely low P-values (10⁻¹⁶ to 10⁻⁴⁰), and continuously widening divergence of Kaplan–Meier curves over the follow-up period. To more rigorously exclude the possibility of unmeasured baseline differences in health status, we additionally performed, for the purpose of this response, comparative analyses of key indicators of frailty and clinical utilization, including emergency and inpatient encounters, as well as the prevalence of mild cognitive impairment prior to medication exposure (values provided below in response to Reviewer #2, Weakness 1). These analyses provide clear evidence showing no pattern suggestive of exposed individuals being medically or cognitively healthier at baseline.

      Taken together, these findings constitute a rigorously matched and independently replicated association across two national health systems, using TriNetX, the most widely cited real-world evidence platform in published cohort studies. Replication across three age strata, each with >10,000 exposed individuals, followed for a decade, and matched on all major known risk factors for dementia, meets the accepted epidemiologic definition of strong and reproducible evidence.

      Although we disagree with elements of the editorial Assessment that appear inconsistent with the empirical findings, we will proceed with publication of the current manuscript as a Reviewed Preprint in order to ensure timely dissemination of findings with meaningful implications for public health and dementia prevention. In this initial public version, the point-by-point responses below provide concise explanations addressing the critiques underlying the Assessment. A revised manuscript, incorporating expanded baseline comparisons across each TriNetX age stratum, additional stringent exclusions, and an expanded discussion that will address the remarks presented in this review, will be submitted shortly.

      Reviewer #1 (Public review):

      Summary:

      This useful study provides incomplete evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The study reinforces findings that the VZV vaccine lowers AD risk and suggests that this vaccine may be an effect modifier of A-P's protective effect. Strengths of the study include two extremely large cohorts, including a massive validation cohort in the US. Statistical analyses are sound, and the effect sizes are significant and meaningful. The CI curves are certainly impressive.

      Weaknesses include the inability to control for potentially important confounding variables. In my view, the findings are intriguing but remain correlative / hypothesis generating rather than causative. Significant mechanistic work needs to be done to link interventions which limit the impact of Toxoplasmosis and VZV reactivation on AD.

      We thank the reviewer for describing our study as useful and for highlighting several of its strengths, including the very large cohorts, sound statistical analyses, meaningful effect sizes, and the impressive CI curves. We also appreciate the reviewer’s recognition that our findings reinforce prior evidence linking VZV vaccination to reduced AD risk.

      Regarding the statement that the evidence remains incomplete due to “inability to control for potentially important confounding variables,” we refer to our introductory explanation above. As noted there, our analyses meet the accepted criteria for reproducible epidemiological evidence, and the assumption of uncontrolled confounding is contradicted by rigorous matching and by additional baseline evaluations. We fully agree that mechanistic work is warranted, and our epidemiologic findings strongly motivate such efforts.

      We address the reviewer’s specific comments in detail below.

      (1) Most of the individuals in the study received A-P for malaria prophylaxis as it is not first line for Toxo treatment. Many (probably most) of these individuals were likely to be Toxo negative (~15% seropositive in the US), thereby eliminating a potential benefit of the drug in most people in the cohort. Finally, A-P is not a first line treatment for Toxo because of lower efficacy.

      We agree that individuals in our cohort received Atovaquone-Proguanil (A-P) for malaria prophylaxis rather than for treatment of toxoplasmosis. However, this does not contradict our interpretation. Because latent CNS colonization by T. gondii is not currently considered clinically actionable, asymptomatic carriers are not offered treatment, and therefore would only receive an anti-Toxoplasma regimen unintentionally, through a medication prescribed for another indication such as malaria prophylaxis. Importantly, atovaquone is an established therapy for toxoplasmosis, including CNS disease, with documented efficacy and CNS penetration in current treatment guidelines. It is therefore reasonable to assume that, during the multi-week course typically administered for malaria prophylaxis, A-P would exert significant anti-Toxoplasma activity in individuals with latent CNS infection, potentially reducing or eliminating parasite burden even though the medication was not prescribed for that purpose.

      The reviewer notes that only ~15% of individuals in the U.S. are Toxoplasma-seropositive, based on surveys performed primarily in young adults of reproductive age (serologic testing is most commonly obtained in women during prenatal care). However, seropositivity increases cumulatively over the lifespan, and few reliable estimates exist for the age groups in which Alzheimer’s disease and dementia occur. Even if we accept the lower estimate of ~15% latent colonization in older adults, this proportion is still smaller than the lifetime cumulative incidence of dementia in the general population.

      Therefore, if latent toxoplasmosis contributes causally to dementia risk, and A-P is capable of eliminating latent Toxoplasma in the subset of individuals who harbor it, then a multi-week course of treatment—such as the one routinely taken for malaria prophylaxis—would be expected to produce a substantial reduction in dementia incidence at the population level, of the same order of magnitude reported here. A protective effect concentrated in a minority of exposed individuals is fully compatible with, and can mechanistically explain, the large overall reduction in risk that we observe.

      Finally, the reviewer notes that A-P is not a first-line treatment for toxoplasmosis due to assumed lower efficacy. This point does not undermine our results. Even a second-line agent, when administered over several weeks, as is routinely done for malaria prophylaxis, is expected to exert substantial anti-Toxoplasma activity. The long duration of exposure in large populations receiving A-P for travel provides a unique natural experiment that does not exist for other anti-Toxoplasma medications, which, when prescribed for their non-Toxoplasma indications, are not taken for more than a few days. Thus, the widespread use of A-P for malaria prophylaxis offers a rare opportunity to evaluate long-term outcomes following inadvertent anti-Toxoplasma treatment.

      Moreover, “first line” recommendations in clinical guidelines refer to treatment of acute toxoplasmosis in immunosuppressed individuals, where tachyzoites are actively replicating. These guidelines do not consider efficacy against latent CNS colonization, which is dominated by bradyzoites, a biologically distinct form, in immunocompetent individuals. Therefore, the guideline hierarchy is not informative regarding which medication is more effective at clearing latent brain infection, the stage we consider most relevant to dementia risk.

      (2) A-P exposure may be a marker of subtle demographic features not captured in the dataset such as wealth allowing for global travel and/or genetic predisposition to AD. This raises my suspicion of correlative rather than causal relationships between A-P exposure and AD reduction. The size of the cohort does not eliminate this issue, but rather narrows confidence intervals around potentially misleading odds ratios which have not been adjusted for the multitude of other variables driving incident AD.

      We agree that prior to matching, A-P exposure may be associated with demographic features such as health status or the ability to travel internationally. However, this does not apply after matching. In all age-stratified analyses, exposed and control individuals were rigorously matched on all major risk factors known to influence dementia risk, including age, sex, race/ethnicity, smoking status, hypertension, diabetes, and obesity. Owing to the extremely large pool of individuals in TriNetX (~120M), our matching was performed stringently, producing exposed and unexposed cohorts that are near-identical with respect to the established determinants of dementia risk.

      The reviewer correctly identifies that large cohorts alone do not eliminate confounding; however, confounding must still be biologically and epidemiologically plausible. Any hypothetical confounder capable of producing a 50–70% reduction in dementia incidence over a decade would need to: (1) produce a very large protective effect against dementia; (2) be strongly associated with A-P exposure; and (3) remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension and obesity, which have been rigorously matched. No such factor has been proposed. The suggestion that an unspecified ‘subtle demographic feature’ could produce effects of this magnitude remains hypothetical, and no such factor has been described in the dementia risk literature.

      If a specific evidence-supported confounder is proposed that meets these criteria, we would be pleased to test it empirically in our cohorts. In the absence of such a proposal, the interpretation that the association is merely “correlative rather than causal” remains speculative and does not negate the strength of a replicated, rigorously matched, long-term association across large cohorts in two national health systems.
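
      One common way to put a number on how strong an unmeasured confounder would have to be is the E-value of VanderWeele and Ding. The sketch below is illustrative only and is not an analysis reported in the manuscript or in this response: it applies the standard E-value formula to the two ends of the hazard ratio range quoted above (0.34 and 0.51), and it assumes that the hazard ratio can be read as a risk ratio, which is reasonable only when the outcome is relatively rare over follow-up. The function name `e_value` is a hypothetical helper introduced here.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Assumption: a hazard ratio is treated as a risk ratio (rare outcome).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Protective estimates (< 1) are inverted so the formula sees a ratio >= 1.
    rr = 1.0 / ratio if ratio < 1 else ratio
    # Standard formula: E = RR + sqrt(RR * (RR - 1)).
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # Ends of the hazard ratio range quoted in the response (HR 0.34-0.51).
    for hr in (0.34, 0.51):
        print(f"HR {hr:.2f} -> E-value {e_value(hr):.2f}")
```

      Under these assumptions, the E-values come out at roughly 5.3 and 3.3. Read in the usual way, an unmeasured confounder would need to be associated with both A-P exposure and dementia by risk ratios of about that size, beyond the matched covariates, to fully account for the point estimates. The calculation addresses unmeasured confounding only; it says nothing about selection effects or outcome misclassification.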

      (3) The relationship between herpes virus reactivation and Toxo reactivation seems speculative.

      We respectfully disagree with the characterization of the herpesvirus–Toxoplasma interaction as speculative. The mechanism we describe is biologically valid, based on established virology and parasitology literature showing that latent T. gondii infection can reactivate from its bradyzoite state under inflammatory or immune-modifying conditions, including viral triggers. A published clinical report has documented CNS co-reactivation of T. gondii and a herpesvirus, explicitly noting that HHV-6 reactivation can promote Toxoplasma reactivation in neural tissue (Chaupis et al., Int J Infect Dis, 2016).

      Moreover, this mechanism is the only currently evidence-supported explanation that simultaneously and parsimoniously accounts for all of the epidemiologic observations in our study:

      (1) Substantially higher cumulative incidence of dementia in individuals with positive Toxoplasma serology, indicating that latent infection is a risk factor for subsequent cognitive decline;

      (2) Strong protective association following A-P exposure, a medication with established activity against Toxoplasma gondii, including in the CNS;

      (3) Independent protection conferred by VZV vaccination, observed consistently for two vaccines with distinct formulations (one live attenuated, one recombinant protein), whose only shared property is suppression of VZV reactivation;

      (4) Greater protective effect of A-P among individuals who were not vaccinated against VZV, consistent with a model in which dementia risk requires both herpesvirus reactivation and persistent latent Toxoplasma infection—such that reducing either factor alone (via VZV vaccination or anti-Toxoplasma suppression) substantially lowers risk.

      Taken together, these observations are difficult to reconcile under any alternative hypothesis.

      To date, we are unaware of any other biologically coherent mechanism that can explain all four findings simultaneously. We would welcome any alternative explanation capable of accounting for these converging epidemiologic signals, as such a proposal could meaningfully advance the scientific discussion. In the absence of a competing explanation, the interaction between latent toxoplasmosis and herpesvirus reactivation remains the most parsimonious hypothesis supported by current knowledge.

      Finally, while observational studies are inherently limited in their ability to provide causal inference, the mechanism we propose is biologically grounded and experimentally testable. Our results provide a strong rationale for mechanistic studies and clinical trials, and warrant publication precisely because they generate a verifiable hypothesis that can now be evaluated directly.

      (4) A direct effect of A-P on AD lesions independent of infection is not considered as a hypothesis. Given the limitations above and effects on metabolic pathways, it probably should be. The Toxo hypothesis would be more convincing if the authors could demonstrate an enhanced effect of the drug in Toxo positive individuals with no effect in Toxo negative individuals.

      A direct effect of A-P on established AD lesions is indeed possible, and this hypothesis would be of significant therapeutic interest. However, we did not consider it within the scope of our epidemiologic analyses because all cohorts explicitly excluded individuals with existing dementia. Under these conditions, proposing a disease-modifying effect on established Alzheimer’s lesions based on our data would itself be speculative. Such a mechanism would be better evaluated through mechanistic or interventional studies than through inference from populations without baseline disease.

      We also agree that demonstrating a stronger protective effect among Toxoplasma-positive individuals would be informative. Unfortunately, this “natural experiment” cannot be performed using the available data: Toxoplasma serology is rarely ordered in older adults, and A-P exposure is itself uncommon, resulting in a cohort overlap far too small to yield valid statistical inference (n≈25 in TriNetX).

      Thus, while both proposed hypotheses are scientifically attractive and merit further study, neither can be resolved using currently available real-world clinical data. Our findings provide the rationale to investigate both hypotheses experimentally, and we hope our report will motivate such studies.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines the association between atovaquone/proguanil use, zoster vaccination, toxoplasmosis serostatus and Alzheimer's Disease, using 2 databases of claims data. The manuscript is well written and concise. The major concerns about the manuscript center around the indications of atovaquone/proguanil use, which would not typically be active against toxoplasmosis at doses given, and the lack of control for potential confounders in the analysis.

      Strengths:

      (1) Use of 2 databases of claims data.

      (2) Unbiased review of medications associated with AD, which identified zoster vaccination associated with decreased risk of AD, replicating findings from other studies.

      We thank the reviewer for the thoughtful assessment and for noting key strengths of our work, including (1) the use of two large national databases, and (2) the unbiased discovery approach that replicated the widely reported association between zoster vaccination and reduced Alzheimer’s disease (AD) risk. We agree that these features highlight the validity and reproducibility of the analytic framework.

      Below we respond to the reviewer’s perceived weaknesses.

      Weaknesses:

      (1) Given that atovaquone/proguanil is likely to be given to a healthy population who is able to travel, concern that there are unmeasured confounders driving the association.

      We agree that, prior to matching, A-P exposure may correlate with demographic or health-related differences (e.g., ability to travel). However, this potential bias was explicitly controlled for in the study design. Across all three age-stratified TriNetX cohorts, exposed and unexposed individuals were rigorously matched on all major established dementia risk factors: age, sex, race/ethnicity, smoking status, obesity, diabetes mellitus, and hypertension. Comparative analyses confirm that these risk factors are equivalently distributed at baseline.

      As noted in our response to Reviewer #1, for any hypothetical unmeasured confounder to explain the results, it would need to satisfy three conditions simultaneously:

      (1) Be capable of producing a 50–70% reduction in dementia incidence sustained over a decade and across three distinct age strata (ages 50–79);

      (2) Be strongly associated with likelihood of receiving A-P;

      (3) Remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension, or obesity, all of which were rigorously matched and balanced at baseline.

      No such factor has been proposed in the literature or by the reviewer. Thus, the concern remains hypothetical and unsupported by any measurable demographic or biological mechanism.

      Importantly, empirical evidence contradicts the notion of a “healthy traveler” bias:

      Emergency and inpatient encounter rates prior to exposure were comparable between A-P users and controls. Across the three age-stratified cohorts, emergency visits were similar or slightly higher among A-P users (EMER: 19.6% vs 16.4%, 19.9% vs 14.2%, 22.0% vs 14.8%), and inpatient encounters were effectively equivalent (IMP: 14.8% vs 15.2%, 17.7% vs 17.6%, 22.1% vs 22.2%). These patterns directly contradict the suggestion that A-P users were a healthier or less medically burdened population at baseline.

      Prevalence of mild cognitive impairment was not lower among A-P users and was, in fact, slightly higher in the oldest cohort. Across the three age groups, baseline diagnoses of mild cognitive impairment (MCI) were comparable or slightly higher among exposed individuals (0.1% vs 0.1%, 0.3% vs 0.2%, 1.1% vs 0.6%). These data contradict the suggestion that A-P users had superior baseline cognition.

      The strongest protective association occurred in the youngest stratum (age 50–59; HR 0.34). At this age, when nearly all individuals are sufficiently healthy to travel internationally, A-P use is least likely to act as a marker of better health. A frailty-based “healthy traveler” hypothesis would instead predict the opposite pattern, with older adults showing the greatest apparent benefit, since health limitations are more likely to restrict travel in later life. In contrast, the protective association weakens with increasing age, empirically contradicting any explanation based on differential travel capacity.

      In conclusion, the empirical evidence directly contradicts the existence of a ‘healthy traveler’ effect.

      (2) The dose of atovaquone in atovaquone/proguanil is unlikely to be adequate suppression of toxo (much less for treatment/elimination of toxo), raising questions about the mechanism.

      A few important points should address the reviewer’s concern:

      In our cohorts, A-P was prescribed for malaria prophylaxis, as correctly noted. In this setting, it is taken for the entire duration of travel, plus several days before and after, typically resulting in many weeks of continuous exposure. This creates an unintentional but scientifically valuable natural experiment, in which a CNS-penetrating anti-Toxoplasma agent is administered for long durations.

      Atovaquone is an established treatment for CNS toxoplasmosis, has strong CNS penetration, and is included in current clinical guidelines for acute toxoplasmosis in immunocompromised patients, although at higher doses. Because latent, asymptomatic CNS colonization is not treated in clinical practice, there are currently no data establishing the dose required to eliminate bradyzoite-stage Toxoplasma in immunocompetent individuals.

      Our observations concern atovaquone–proguanil (A-P), a fixed-dose combination of atovaquone with proguanil, a DHFR inhibitor targeting a key metabolic pathway shared by malaria parasites and T. gondii. The combination has well-established synergistic effects in malaria prophylaxis and the same mechanism would be expected to enhance anti-Toxoplasma activity. This fixed-dose regimen has never been formally evaluated for toxoplasmosis treatment at prolonged durations or against latent bradyzoite infection.

      Our hypothesis does not require or imply complete eradication of Toxoplasma. A clinically meaningful reduction in latent cyst burden among the subset of colonized individuals may be sufficient to alter long-term disease trajectories. Thus, a population-level decrease in dementia incidence does not require universal clearance of infection, but only partial suppression or reduction of parasite load in susceptible individuals, which is entirely compatible with the known pharmacology and duration of A-P exposure.

      (3) Unmeasured bias in the small number of people who had toxoplasma serology in the TriNetX cohort.

      The relatively small number of older adults with Toxoplasma serology stems from current clinical practice: serologic testing is mostly performed in women during reproductive years due to risks in pregnancy, whereas in older adults a positive result has no clinical consequence and therefore testing is rarely ordered.

      Importantly, the seropositive and seronegative groups were drawn from the same underlying population of individuals who underwent serology testing, and the only difference between groups is the test result itself. Because the decision to order a test is made prior to and independent of the result, there is no plausible rationale by which the serology outcome (positive or negative) would introduce a bias favoring either group beyond the result of the test itself.

      Furthermore, here too the two groups were rigorously matched on all major dementia risk factors, including age, sex, race/ethnicity, smoking, diabetes, hypertension, and BMI, and these characteristics are similarly distributed between groups. A small sample size does not imply bias; it simply reduces statistical power. Despite this limitation, the observed association (HR = 2.43, p = 0.001) remains strongly significant.

      Finally, this result is consistent with multiple published studies reporting higher rates of Toxoplasma seropositivity among individuals with Alzheimer’s disease, dementia, and even mild cognitive impairment, such that our finding reinforces a broader and independently observed epidemiologic pattern. Importantly, in our cohort the serology testing clearly preceded dementia diagnosis, which supports the plausibility of a causal rather than merely correlative relationship between latent toxoplasmosis and cognitive decline.

      To conclude our provisional response, we thank the editor and reviewers for raising points that will be further addressed and expanded upon in the discussion of the forthcoming revision. We welcome transparent scientific dialogue and acknowledge that, as with all observational research, residual confounding cannot be eliminated with absolute certainty. However, we disagree with the overall Assessment and emphasize that our findings, reproduced independently across two national health systems and three age-stratified cohorts, each rigorously matched on all major determinants of dementia risk, meet and in many respects exceed current standards for high-quality observational evidence.

      Assigning the results to “residual confounding” requires more than speculation: it requires identification of a confounding factor that is (1) anchored in established dementia risk literature, (2) empirically plausible, and (3) quantitatively capable of generating a sustained ~50 percent reduction in dementia incidence over a decade. No such factor has been identified to date. We note that the assertion of “residual confounding” has not been supported by a specific, quantitatively plausible mechanism. A hypothetical bias that is both extremely large in effect and uncorrelated with all major risk factors is not statistically or biologically credible.

      The explanation we propose, reduction in dementia risk through elimination of latent Toxoplasma gondii, is biologically grounded, directly supported by independent epidemiologic literature, and uniquely capable of accounting for all convergent observations in our data. No alternative hypothesis has been put forward that can plausibly explain these findings.

      A revised version of the manuscript will be submitted shortly, incorporating expanded baseline analyses, with the strictest possible exclusion criteria (including congenital, vascular, chromosomal, and neurodegenerative disorders such as Parkinson’s disease), and complete tabulated comparisons. These data will further reinforce that the observed protective associations are not attributable to any measurable confounding. We also plan to enhance the discussion in order to address the points raised by the reviewers.

      In light of the expanded analyses, any reservations expressed in the initial Assessment can now be re-evaluated on the basis of the empirical evidence. The findings reported in our study meet, and in several respects exceed, current epidemiologic standards for high-quality observational research, clearly warrant publication, and provide a robust scientific foundation for future mechanistic and interventional studies to determine whether elimination of latent toxoplasmosis can prevent or treat dementia.

    3. Reviewer #1 (Public review):

      Summary:

      This useful study provides incomplete evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The study reinforces findings that VZ vaccine lowers AD risk and suggests that this vaccine may be an effect modifier of A-P's protective effect. Strengths of the study include two extremely large cohorts, including a massive validation cohort in the US. Statistical analyses are sound, and the effect sizes are significant and meaningful. The CI curves are certainly impressive.

      Weaknesses include the inability to control for potentially important confounding variables. In my view, the findings are intriguing but remain correlative / hypothesis generating rather than causative. Significant mechanistic work needs to be done to link interventions which limit the impact of Toxoplasmosis and VZV reactivation on AD.

      Weaknesses:

      Major:

      (1) Most of the individuals in the study received A-P for malaria prophylaxis as it is not first line for Toxo treatment. Many (probably most) of these individuals were likely to be Toxo negative (~15% seropositive in the US), thereby eliminating a potential benefit of the drug in most people in the cohort. Finally, A-P is not a first line treatment for Toxo because of lower efficacy.

      (2) A-P exposure may be a marker of subtle demographic features not captured in the dataset such as wealth allowing for global travel and/or genetic predisposition to AD. This raises my suspicion of correlative rather than causal relationships between A-P exposure and AD reduction. The size of the cohort does not eliminate this issue, but rather narrows confidence intervals around potentially misleading odds ratios which have not been adjusted for the multitude of other variables driving incident AD.

      (3) The relationship between herpes virus reactivation and Toxo reactivation seems speculative.

      (4) A direct effect of A-P on AD lesions independent of infection is not considered as a hypothesis. Given the limitations above and effects on metabolic pathways, it probably should be. The Toxo hypothesis would be more convincing if the authors could demonstrate an enhanced effect of the drug in Toxo positive individuals with no effect in Toxo negative individuals.

      Minor:

      (5) "Clinically meaningful" should be eliminated from the discussion given that this is correlative evidence.

    4. Reviewer #2 (Public review):

      Summary:

      This manuscript examines the association between atovaquone/proguanil use, zoster vaccination, toxoplasmosis serostatus and Alzheimer's Disease, using 2 databases of claims data. The manuscript is well written and concise. The major concerns about the manuscript center around the indications of atovaquone/proguanil use, which would not typically be active against toxoplasmosis at doses given, and the lack of control for potential confounders in the analysis.

      Strengths:

      (1) Use of 2 databases of claims data.

      (2) Unbiased review of medications associated with AD, which identified zoster vaccination associated with decreased risk of AD, replicating findings from other studies.

      Weaknesses:

      (1) Given that atovaquone/proguanil is likely to be given to a healthy population who is able to travel, concern that there are unmeasured confounders driving the association.

      (2) The dose of atovaquone in atovaquone/proguanil is unlikely to be adequate suppression of toxo (much less for treatment/elimination of toxo), raising questions about the mechanism.

      (3) Unmeasured bias in the small number of people who had toxoplasma serology in the TriNetX cohort.

    1. eLife Assessment

      This important work establishes a connection between PRMT1 and SFPQ by identifying common phenotypes downstream of their inactivation. In the resubmission, authors now include NMD as a contributor to aberrant gene expression underpinning craniofacial development. The complementary experiments help strengthen some solid conclusions. This paper describes an interesting mechanism for the regulation of RNA levels, which is of interest to the readers of eLife.

    2. Reviewer #1 (Public review):

      The current manuscript investigates a regulatory axis containing Prmt1, which methylates RNA binding proteins and alters intron splicing outcomes and expression of matrix genes. The authors test the effects of deficient Prmt1, Sfpq, and various other factors, using a combination of bioinformatic analyses and wet-lab validation approaches. The authors show that intron retention often triggers NMD, contributing to aberrant gene expression regulation and craniofacial development. The revised manuscript introduces several complementary experiments that help to strengthen conclusions. For example, the authors directly investigate NMD-mediated transcript turnover to better understand how retention contributes to expression changes in genes of interest, and they assess several additional factors downstream of Prmt1 to justify a centralized interest in the PRMT1/SFPQ axis.

      Weaknesses:

      However, some points remain unaddressed or unexplored, which could bolster conclusions. For example, the transcriptome data from knockdown experiments indicate robust exon skipping, suggesting that analysis of these patterns in parallel with intron retention could provide additional insights into the responsive gene programs. Given that SFPQ is known to have multiple regulatory roles, a more thorough investigation of its possible mechanisms of action during craniofacial development would allow for definitive conclusions about the isolated impact of SFPQ-dependent splicing. Although authors employ CUT&Tag analysis of Pol II binding at the promoters and across the gene body, at the current scope, no change in Pol II association (i.e., absence of transcriptional repression) does not directly indicate a lack of transcriptional regulation by other means (pause release, elongation rate or processivity, transcription termination, etc.). Without a more thorough investigation of these mechanisms, this confounds definitive claims about their relative contributions to the gene expression landscape.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Lima et al examines the role of Prmt1 and SFPQ in craniofacial development. Specifically, the authors test the idea that Prmt1 directly methylates specific proteins that results in intron retention in matrix proteins. The protein SFPQ is methylated by Prmt1 and functions downstream to mediate Prmt1 activity. The genes with retained introns activate the NMD pathway to reduce the RNA levels. This paper describes an interesting mechanism for the regulation of RNA levels during development.

      Strengths:

      The phenotypes support what the authors claim that Prmt1 is involved in craniofacial development and splicing. The use of state-of-the-art sequencing to determine the specific genes that have intron retention and changes in gene expression is a strength.

      Weaknesses:

      The results now support the conclusions; however, it is still unclear how direct the relationship is between Prmt1 and SFPQ.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of the current study lies in their establishing the molecular mechanism through which PRMT1 could alter craniofacial development through regulation of the transcriptome, but the data presented to support the claim that a PRMT1-SFPQ axis directly regulates intron retention of the relevant gene networks should be robust and with multiple forms of clear validation. For example, elevated intron retention findings are based on the intron retention index, and according to the manuscript, are assessed considering the relative expression of exons and introns from a given transcript. However, delineating between intron retention and other forms of alternative splicing (i.e., cryptic splice site recognition) requires a more comprehensive consideration of the intron splicing defects that could be represented in data. A certain threshold of intron read coverage (i.e., the percent of an intron that is covered by mapped reads) is needed to ascertain if those that are proximal to exons could represent alternative introns ends rather than full intron retention events. In other words, intron retention is a type of alternative splicing that can be difficult to analyze in isolation given the confounding influence of cryptic splicing and cryptic exon inclusion. If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made.

      This manuscript is a mechanistic exploration that follows previous work we published on the role of Prmt1 in craniofacial development, in which genetic deletion of Prmt1 in CNCCs leads to cleft palate and mandibular hypoplasia (PMID: 29986157).

      As the reviewer pointed out, a certain threshold of intron read coverage is needed to assess intron retention events. We employed IRTools to assess the collective changes in intron retention between cell states associated with a given biological function or pathway. IRTools accounts for intron read coverage by checking the evenness of read distribution within an intron. Specifically, every constitutive intronic region (CIR) is divided into 10 equally sized bins and the proportion of reads that map to each bin is calculated. CIRs are then ranked according to the imbalance in their bin-wise read distribution, represented by the proportion of reads in the most populated bin. Those in the top 1% are considered to contain potentially false IR events and are excluded. We further addressed this question by developing another measure of intron retention, the intron retention coefficient (IRC), which assesses IR events using junction reads (Supplemental Figure-S8). Junction reads that straddle two exons are called exon-exon junction reads (spliced reads), and those that straddle an exon and a neighboring intron are called exon-intron junction reads (retained reads). The IRC of an intron is defined as the fraction of junction reads that are exon-intron junction reads: IRC = exon-intron read-count / (exon-exon read-count + exon-intron read-count), where exon-intron read-count = (5’ exon-intron read-count + 3’ exon-intron read-count) / 2. The IRC of a gene is defined as the exon-intron fraction of all junction reads overlapping or spanning the constitutive introns of this gene. In calculating the IRC, only exon-intron junction reads that cover the junction point and overlap each side by at least 8 bp were counted, and only exon-exon junction reads that span the relevant junction points and overlap each of the respective exons by at least 8 bp were counted. In this process, the evenness of the proportion of exon-intron junction reads that are 5’ or 3’ exon-intron junction reads is taken into account. As shown in Supplemental Figure S7A and S7B, IRC analysis generated results consistent with those obtained using IRI (Figure 3A and 3I).
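      As an illustration only, the two checks described above (the bin-wise evenness measure and the IRC formula) can be sketched in Python. This is a minimal sketch and not the IRTools implementation: the function names, argument layout, and use of read start positions are assumptions made for the example, while the 10-bin split, the 8 bp overhang, and the IRC formula are taken from the text.

```python
from typing import Optional, Sequence

MIN_OVERHANG_BP = 8  # minimum overlap on each side of a junction (value from the text)


def bin_imbalance(read_positions: Sequence[int], start: int, end: int,
                  n_bins: int = 10) -> float:
    """Proportion of reads falling in the most populated of n_bins equal bins.

    Hypothetical helper: reads are represented by a single coordinate each,
    and the intronic region is the half-open interval [start, end).
    """
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")
    counts = [0] * n_bins
    for pos in read_positions:
        if start <= pos < end:
            idx = min((pos - start) * n_bins // length, n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return max(counts) / total if total else 0.0


def passes_overhang(left_overlap_bp: int, right_overlap_bp: int) -> bool:
    """A junction read is counted only if it overlaps both sides by >= 8 bp."""
    return left_overlap_bp >= MIN_OVERHANG_BP and right_overlap_bp >= MIN_OVERHANG_BP


def intron_retention_coefficient(exon_exon: int, exon_intron_5p: int,
                                 exon_intron_3p: int) -> Optional[float]:
    """IRC = EI / (EE + EI), with EI the mean of the 5' and 3' exon-intron counts.

    Returns None when no junction reads are available.
    """
    exon_intron = (exon_intron_5p + exon_intron_3p) / 2
    denominator = exon_exon + exon_intron
    if denominator == 0:
        return None
    return exon_intron / denominator


if __name__ == "__main__":
    # 30 spliced reads, 8 reads on the 5' boundary, 12 on the 3' boundary
    print(intron_retention_coefficient(30, 8, 12))
    # four reads in a 100 bp region, two of them in the first bin
    print(bin_imbalance([0, 5, 15, 95], 0, 100))
```

      In this sketch, introns whose imbalance score ranks in the top 1% would be dropped before retention is assessed, and the IRC ranges from 0 (all junction reads spliced) to 1 (all junction reads retained).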

      In addition, as the reviewer pointed out, intron retention can be difficult to analyze in isolation. We followed the reviewer’s suggestion that “If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made” and analyzed other forms of alternative splicing for all ECM and GAG genes with a significant IRI increase (genes highlighted in Figure-3A and 3I) using rMATS (Supplemental Figure-S9). Among these genes, only 5 genes (Cthcr1, Mmp23, Adamts10, Ccdc80 and Col25a1) showed statistically significant changes in skipped exons (SE), 1 gene (Bmp7) showed significant changes in mutually exclusive exons (MXE), and none showed significant changes in alternative 5’ or 3’ splicing. The SE and MXE changes detected were marginal (Supplemental figure S8), while the majority of matrix genes with significant intron retention did not exhibit other forms of alternative splicing, further supporting the confidence of the intron retention calls.

      While data presented to support the PRMT1-SFPQ activation axis is quite compelling, that this is directly responsible for the elevated intron retention remains enigmatic. First, in characterizing their PRMT1 knockout model, it is unclear whether the elevated intron retention events directly correspond to downregulated genes.

      In the revised manuscript, we demonstrate IR-triggered NMD as a mechanism for transcript decay and downregulation of matrix genes. When IR-triggered NMD was blocked by the chemical inhibitor NMDI14, the intron-retaining transcripts showed significant accumulation (new Figure-4). NMD is an RNA surveillance system that degrades aberrant RNAs. Intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy, including immunotherapy. During embryonic development, the functional significance of the NMD machinery is suggested by human genetic findings and mouse genetic models. NMD is driven by a protein complex composed of SMG and UPF proteins. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (PMID: 29272451). SMG9 mutation in human patients causes malformation in the face, hand, heart and brain (PMID: 27018474).

      We show that in CNCCs NMD functions both as a physiological mechanism and as a response invoked by molecular insult. Blocking NMD in CNCCs caused significant accumulation of intron-retaining Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting a basal role for NMD in degrading intron-retaining transcripts (Figure-4Ba-4Bf). We further demonstrated the accumulation of Adamts2 and Fbln5 using semi-quantitative PCR, which detected a longer product from Adamts2 intron 19 and Fbln5 intron 7 (Figure-4Ca-4Ch). In CNCCs and ST2 cells, NMD is further invoked by Prmt1 and Sfpq deficiency. In Prmt1-deficient CNCCs, NMD blockage led to higher accumulation of intron-retaining Adamts2 and Alpl transcripts, suggesting that Prmt1 deficiency triggers NMD to reduce intron-containing transcripts (Figure-4Aa, 4Ab). In Sfpq-depleted ST2 cells, blocking NMD caused accumulation of the intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Figure-9B, 9C).

      Moreover, intron splicing is a well-documented node for gene regulation during embryogenesis and in other proliferation models, and craniofacial defects are known to be associated with 'spliceosomopathies'. However, reproduction of this phenotype does not suggest that the targets of interest are inherently splicing factors, and a more robust assessment is needed to determine the exact nature of alternative splicing in this system. Because there are several known splicing factors downstream of PRMT1 and presented in the supplemental data, the specific attribution of retention to SFPQ would be additionally served by separating its splicing footprint from that of other factors that are primed to cause alternative splicing.

      We have previously shown that a group of splicing factors depends on Prmt1 for arginine methylation, including SFPQ (PMID: 31451547). We tested additional splicing factors that are highly expressed in CNCCs and depend on PRMT1 for arginine methylation: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Figure-5, 6 and 10). Among these factors, EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5 and Supplemental Figure-S3B, S3C). We were not able to assess TAF15 methylation because of the lack of an efficient antibody for the PLA assay. We also demonstrated that, unlike SFPQ, their protein expression and subcellular localization were not altered by Prmt1 deletion in CNCCs (Supplemental Figure-S4). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns. This revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 in intron retention when compared to SFPQ (Fig. 10F-10S, Supplemental Figure S7A-S7F, Supplemental Tables S4-S6). ECM genes are significantly downregulated by all four splicing factors (Fig. 10F-10I), but EWSR1, TRA2B and TAF15 function through IR-independent mechanisms, such as exon skipping, as exemplified by Postn (Fig. 10J-10S).

      Clarifying the relationship between SFPQ and splicing regulation is important given that the observed splicing defects are incongruous with published data presented by Takeuchi et al., (2018) regarding SFPQ control of neuronal apoptosis in mice. In this system, SFPQ was more specifically attributed to the regulation of transcription elongation over long introns and its knockout did not result in significant splicing changes. Thus, to establish the specificity for the SFPQ in regulating these retention events, authors would need to show that the same phenotype is not achieved by mis-regulation of other splicing factors. That the authors chose SFPQ based on its binding profile is understandable but potentially confounding given its mechanism of action in transcription of long introns (Takeuchi 2018). Because mechanisms and rates of transcription can influence splicing and exon definition interactions, the role of SFPQ as a transcription elongation factor versus a splicing factor is inadequately disentangled by authors.

      To test whether SFPQ acts as a transcription elongation factor, we performed Pol II Cut&Tag in ST2 cells and demonstrated that depletion of SFPQ only caused marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). This finding is distinct from SFPQ function in neurons (PMID: 29719248), suggesting that the activation or recruitment of SFPQ in transcriptional regulation may involve tissue-specific factors in neurons.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Lima et al examines the role of Prmt1 and SFPQ in craniofacial development. Specifically, the authors test the idea that Prmt1 directly methylates specific proteins that results in intron retention in matrix proteins. The protein SFPQ is methylated by Prmt1 and functions downstream to mediate Prmt1 activity. The genes with retained introns activate the NMD pathway to reduce the RNA levels. This paper describes an interesting mechanism for the regulation of RNA levels during development.

      Strengths:

      The phenotypes support what the authors claim that Prmt1 is involved in craniofacial development and splicing. The use of state-of-the-art sequencing to determine the specific genes that have intron retention and changes in gene expression is a strength.

      Weaknesses:

      Some of the data seems to contradict the conclusions. And it is unclear how direct the relationships are between Prmt1 and SFPQ.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      First, the claims regarding the effect of PRMT1 loss on splicing are unclear by the section title. In other words, does loss of PRMT1 change the incidence of baseline alternative splicing events, or does it introduce new retention events that are responsible for underwriting the craniofacial phenotype? Consistent with this idea, the narrative could benefit from more cellular and/or histological validations of the transcriptomic defects discovered in the RNAseq, which could help contextualize the bioinformatics data with the developmental defects. Moreover, the conclusions drawn about intron retention could be clarified in terms of how applicable the mechanism is likely to be outside of this tissue-specific set of responsive introns.

      Loss of Prmt1 did not cause a global shift in intron retention, as shown in Supplemental Figure S2. Instead, Prmt1 deletion increased intron retention specifically in genes enriched in cartilage development, glycosaminoglycan biology, dendrite and axon, and decreased intron retention in mitochondria and metabolism genes (Table S1). We also tested matrix protein expression by histology to confirm that transcriptomic defects revealed at the RNA level resulted in lower protein production. The new data are in Figure 3E-3H.

      Additionally, invoking NMD to align splicing and differential gene expression data is understandable but lacks sufficient controls to be conclusive, such as positive control genes to confirm inhibition of NMD.

      To validate the blockage of NMD, glutathione peroxidase 1 (Gpx1) intron 1, a well-documented substrate for NMD, was tested as a positive control (Fig 4Ac, 4Ad, 9B).

      Additionally, it should be clarified whether NMD is a basal mechanism for the regulation of these introns or whether it is an induced mechanism that is invoked by the molecular insult.

      In CNCCs, NMD functions both as a physiological mechanism and as a mechanism invoked by molecular insult. Please refer to responses to Reviewer 1’s public review for detailed explanations.

      Further, authors present data downstream of two siRNAs for the same gene target, but it remains unclear how siRNAs for the same gene target produce different effects. It may be helpful for authors to clarify how many of the transcriptomic defects are shared versus unique between the siRNAs.

      To address this question, we used bioinformatic analysis of the whole genome data to assess the similarity in changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.
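
      The concordance between two siRNAs described above can be summarized numerically as well as visually. The following is a minimal sketch, not the authors' pipeline: it assumes that each gene has one change value per siRNA (for example a log2 fold change or a change in IRI) and computes a Pearson correlation across genes. The function name `pearson_r` and the toy values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between per-gene changes from two siRNAs.

    xs, ys: equal-length sequences of per-gene values (assumed here to be
    log2 fold changes or IRI changes; the source does not specify).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

# Toy, made-up values for four genes under two siRNAs (illustration only).
sirna_1 = [-1.2, -0.8, 0.1, 0.9]
sirna_2 = [-1.0, -0.9, 0.2, 1.1]
r = pearson_r(sirna_1, sirna_2)
```

      A value of r close to 1 would indicate that the two siRNAs shift the same genes in the same direction, which is the pattern a tight diagonal scatter plot conveys.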

      Finally, we stress the importance of presenting the full conceptual basis for SFPQ's potential role in splicing and gene expression. It is significant to note that SFPQ has been previously studied as a splicing factor and was instead determined to function in support of transcription elongation rather than in splicing. Thus, if the authors are confident that SFPQ acts directly on splicing, they bear the burden of proof to show that neither its role in transcription nor another splicing factor is driving the splicing changes.

      We demonstrated that depletion of SFPQ only caused marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). Please refer to responses to Reviewer 1’s public review for detailed explanations.

      Reviewer #2 (Recommendations for the authors):

      (1) It is not clear why the authors focused on intron retention targets vs the other possibilities. Skipped Exon is much higher in terms of the number of changes, please clarify. For the intron retention how is this quantified? The traces are nice, but it is hard to tell which part is retained at this magnification. Also, because the focus is on extracellular matrix (ECM) and NMD it would be nice to show some of those targets here. In the tbx1 trace, some are up and some are down. What does that mean for the gene expression?

      We initially investigated SE and found that genes with significant changes in Prmt1 CKO CNCCs fall into diverse functional pathways. Among them, a few genes are critical for skeletal formation, including Postn and Fn, and the function of their exon skipping has been documented. For example, the two exons that are skipped in Postn, exons 17 and 21, have been shown to regulate craniofacial skeleton shape and mandibular condyle hypertrophic zone thickness using transgenic mouse models (PMID: 36859617). As illustrated by Figure 10, the skipped exon of Postn is regulated by multiple splicing factors that may perform overlapping functions in vivo.

      Intron retention of each gene is quantified by the ratio of the overall read density of its constitutive intronic regions (CIRs) to the overall read density of its constitutive exonic regions (CERs) and defined as the intron retention index (IRI). In the first section of Response to Reviewer 1’s comments, we explained additional bioinformatic analysis that was performed to address reviewers’ questions, support the confidence of intron event calls and rule out the possibility of other alternative splicing mechanisms, such as by SE, MXE, A5SS or A3SS (Supplemental Figure S5, S6, Table S7).
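
      The IRI definition above can be written out as a short calculation. This is a minimal sketch under stated assumptions rather than the authors' implementation: "read density" is taken here to mean reads per base over the pooled constitutive regions, which the response does not specify, and the function name and example numbers are hypothetical.

```python
def intron_retention_index(cir_reads, cir_length, cer_reads, cer_length):
    """IRI = read density over CIRs divided by read density over CERs.

    cir_reads / cer_reads: read counts summed over a gene's constitutive
    intronic / exonic regions.
    cir_length / cer_length: total length of those regions in bases.
    Density as reads per base is an assumption made for this sketch.
    """
    if cir_length <= 0 or cer_length <= 0:
        raise ValueError("region lengths must be positive")
    cir_density = cir_reads / cir_length
    cer_density = cer_reads / cer_length
    if cer_density == 0:
        # No exonic signal: the ratio is undefined, so return NaN.
        return float("nan")
    return cir_density / cer_density

# Made-up counts: 64 intronic reads over 1024 bases, 512 exonic reads over 1024 bases.
iri = intron_retention_index(64, 1024, 512, 1024)
```

      Under this definition a higher IRI means relatively more intronic signal for the same exonic coverage, so an increase in IRI between control and mutant is read as increased intron retention.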

      (2) RNA-Sequencing of Prmt1 mutants nicely shows gene expression changes, including in ECM and GAG genes. While validation of the sequencing results is not necessarily required, it would be very interesting to show the expression in situ. In addition, the heat map shows both downregulated but also upregulated transcripts. This is expected since this protein regulates many genes. However, the volcano plot shows a significant number of genes upregulated. It would be interesting to show what the upregulated genes are. And what is the proposed mechanism for Prmt1 regulation of upregulated genes?

      Validation for the transcriptomic changes is shown in Fig. 3E-3H using immunostaining.

      As for upregulated genes in Prmt1 mutants, top pathways include cytokine-mediated signaling pathway, signal transduction by p53 signaling pathway and cell morphogenesis (Figure 2E), which are consistent with our previous reports that Prmt1 deletion induces cytokine production in oral epithelium and leads to p53 accumulation in embryonic epicardium (PMID: 32521264, 29420098). Besides these pathways, Prmt1 deletion also caused upregulation of genes involved in adult behavior, postsynaptic organization and apoptotic process, which is consistent with findings from other labs on PRMT1 function in neuronal and cancer cells (PMID: 34619150, 33127433).

      (3) Specific transcripts involved in the ECM and GAG pathway were shown to have elevated intron retention. However, Figure 3D seems to show the opposite, with intronic expression decreased and exonic expression increased. This is very important to the final conclusion of the paper. In addition, is there a direct relationship between increased intron retention and downregulation of this specific gene expression? It seems a bit correlational as it could also be an indirect mechanism. One way to test this is to do in vitro translation with and without the specific intron to test if it results in lower expression.

      We apologize for the mislabeling in the previous version of Figure 3D, which is now corrected. We also tried to test the direct relationship between intron retention and downregulation of matrix genes such as Adamts2 using in vitro experiments. However, the introns of matrix genes with high retention tend to be long, many 10 to 50 kb in length, making it challenging to generate mini-gene constructs for molecular analysis. We used a different approach and demonstrated that inhibition of NMD with the chemical inhibitor NMDI14 caused dramatic accumulation of the Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting that retained introns triggered NMD to regulate gene expression and that this mechanism acts at a physiological level in CNCCs (Fig. 4). We also blocked NMD in control and Prmt1 null CNCCs, where NMD blockage led to higher accumulation of Adamts2 and Alpl transcripts, suggesting that upon Prmt1 deficiency, NMD is further utilized to degrade intron-containing transcripts (Fig. 4). Similarly, in Sfpq-depleted ST2 cells, blocking NMD caused accumulation of intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Fig. 9A, 9B).

      (4) While Figure 4 nicely shows the methylation of SFPQ is reduced in Prmt1 CKO cells, it is unclear on which residue this methylation occurs. Also, the overall expression of SFPQ is down, so it is possible that the methylation is indirect, i.e., Prmt1 regulates some other methyltransferase that regulates SFPQ. Or that because the overall level of SFPQ is down, there is no protein to methylate. How do the authors differentiate between these possibilities?

      Previously, arginine methylation of SFPQ was characterized using in vitro reactions and cell lines with biochemical assays by Snijders et al. in 2015 (PMID: 25605962). Among all PRMTs that catalyze asymmetric arginine dimethylation (ADMA), SFPQ is methylated by only PRMT1 and PRMT3, with PRMT1 showing higher efficiency and PRMT3 showing lower efficiency. However, PRMT3 is mainly cytosolic. Its expression in CNCCs is about 100-fold lower than PRMT1 (Fig. 1). Based on this knowledge, PRMT1 is the primary arginine methyltransferase for SFPQ, a nuclear protein in CNCCs. We and others have shown in a previous publication that SFPQ methylation on arginine 7 and 9 depends on PRMT1 (PMID: 31451547).

      To investigate SFPQ protein degradation in CNCCs, we used MG132 to block proteasomal degradation and observed a partial rescue of SFPQ protein levels in Prmt1 mutant embryos, suggesting that SFPQ is degraded through a proteasome-mediated mechanism. To address the relationship between SFPQ methylation and protein expression, we assessed arginine methylation of SFPQ that accumulated after MG132 treatment. The accumulated SFPQ was not methylated, confirming the absence of methylation even when SFPQ protein expression is restored.

      Snijders et al. also showed that citrullination induced by PADI4 regulates SFPQ stability (Snijders 2015). We considered this possibility and assessed the expression levels of PADIs. In E13.5 and E15.5 CNCCs, PADI1-4 mRNA expression levels are very low (TPM<5), suggesting that PADIs may not regulate SFPQ stability in CNCCs. A detailed mechanism as to how PRMT1-mediated SFPQ methylation controls stability awaits further investigation.

      (5) For the Sfpq-depletion experiment, it seems that the two knockdowns are not similar in their gene targets, and the GO terms differ except for Wnt signaling. This makes the data difficult to interpret. The genes identified as having intron retention are different from the ones identified in Prmt1 deletion and are not reduced as much. How does this fit in with the Prmt1 story? If Prmt1 works through Sfpq, one would expect the targets to be similar and more than 8% to be in common.

      To address the first concern, we used bioinformatic analysis of the whole genome data to assess the similarity in changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.

      We have previously identified a group of splicing factors that depends on PRMT1 for arginine methylation, including SFPQ (PMID: 31451547). In the new data in Figures 5, 6 and 10, we tested an additional five PRMT1-dependent splicing factors that are highly expressed in CNCCs: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Fig. 5, 6 and 10). Among these factors, SRSF1 and G3BP1 are predominantly expressed in the cytosol of NCCs at E13.5. As splicing activity in the nucleus is needed for pre-mRNA splicing, we excluded these two and focused on the other three proteins. EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5). We weren’t able to assess TAF15 methylation because of lack of efficient antibody for the PLA assay. We also demonstrated that their protein expression or subcellular localization was not altered by Prmt1 deletion in CNCCs, unlike SFPQ (Fig. S2). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns, which revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 on intron retention when compared to SFPQ (Fig. 10F-10I, Supplemental Figure S7A-S7F). ECM genes are significantly downregulated by all four splicing factors (Fig. 10J-10M), but EWSR1, TRA2B and TAF15 regulate transcription or exon skipping instead of IR, as exemplified by Alpl and Postn (Fig. 10N-10T).

      (6) The addition of an NMD mechanism is interesting but not surprising that when inhibiting the pathway broadly, there is an increase in gene expression in the mesoderm cell line. How specific is this to craniofacial development?

      NMD is driven by a protein complex composed of SMG and UPF proteins. We show in the revised manuscript that NMD is both a physiological mechanism in CNCCs and triggered by genetic disturbance (Fig. 4). These data are in line with human patient reports where SMG9 mutation in humans causes malformation in the face, hand, heart and brain (PMID: 27018474). Mouse genetic studies also demonstrated roles of NMD components during embryonic development. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (Han 2018). Additionally, intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy and recently cancer immunotherapy. Our findings highlight matrix genes as one of the key targets for NMD during craniofacial development.

      Minor:

      (1) The supplemental figures are difficult to understand. In the first upload there are many figures and tables, some excel files that are separate uploads and some not. Please upload as separate files so it is clear. And also put them in order that they are in the manuscript.

      (2) For the heat map in Figure 2B, it would be good to show all the genes or none at all. It seems a bit like cherry-picking to highlight only a few. And they are not labeled where they are located in the graph. Are these the top lines? If so, please label.

      (3) Gene names in Figure 3A are difficult to read. I would also not consider BMP7 an ECM gene.

      (4) A summary diagram of the interactions proposed will help to make this more understandable.

      The supplemental figures are reorganized and uploaded as separate Word and Excel documents. For the heat map in Fig. 2B, we have removed the gene names. For Fig. 3A, only the most significantly changed genes are labeled in red dots with names. We didn’t label all the genes because of the large number of genes. For the new Figure 3B, we have replaced BMP7. A schematic summary is also added to Supplemental Fig. S9 to illustrate the PRMT1-SFPQ pathway.

    1. eLife Assessment

      This important study provides one mechanism that can explain the rapid diversification of poison-antidote pairs in fission yeast: recombination between existing pairs. The evidence is largely solid, but the study needs to tune down its claims (as it is not shown that the novel poison-antidote can serve as a meiotic driver), and to address small experimental requests. The work is of interest to scientists studying genetic incompatibilities.

    2. Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the MS. If not, I suggest the authors do these simple experiments and add this information.

      Strengths:

      The most interesting data (Fig. 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Weaknesses:

      Some minor rewriting is needed.

      Comments on Revision:

      (1) The parameter for "maximum growth rate" in Figure 2D needs to be defined and put on the graph.

      (2) On page 8, line 182, the authors should consider testing the hybrid wtf in meiosis using strain 975 of Leupold, which is h+, or another standard h+ strain. I don't think the antidote allele is needed; rather, it seems to me it would counter the lethality of the poison protein and should be omitted to test drive of the hybrid wtf. This is a simple experiment and would add considerably to the paper.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Revision update:

      Having read the response to the reviewers, I believe the major issues have been addressed. However, I would strongly suggest toning down the claim regarding the chimeric WTF element in the abstract, which currently reads

      "As proof-of-principle, we generate a novel meiotic driver through artificial recombination between wtf drivers, and its encoded poison cannot be detoxified by the antidotes encoded by their parental wtf genes but can be detoxified by its own antidote."

      As the authors report in their response, despite various attempts, it was not possible to show that this chimeric WTF element was indeed capable of meiotic drive in a natural context (as opposed to a transgenic overexpression experiment). Thus, the authors should not claim they generated "a novel meiotic driver".

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details

      Revision update:

      The authors measured the fitness of the deletion strains using growth curves (Fig. 2C and D) and no significant differences were found, further supporting their claims. The requested information (details on the generation of the deletion strains) is now available in the methods section.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      Revision update:

      The authors report that the expression of the construct was measured. However, they do not make reference to any specific figure or section of the main text. It would be very useful if the authors explicitly referenced where exactly changes were made (this is true for all changes made).

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test." Reporting a p-value of 0 is not appropriate. Please report exact P-values.

      Revision update:

      This has been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Thanks for the great summary! All the wtf genes have been tested for meiotic drive phenotypes previously by Bravo Nunez et al. (2020; http://doi.org/10.1371/journal.pgen.1008350). The reference was cited in our original manuscript, and we added the details in the revised manuscript.  

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

      We did the rewriting as this reviewer suggested.

      Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Thanks!

      Strengths:

      The authors analyzed the diversity of wtf in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that indeed a new wtf can be generated: the poison cannot be detoxified by the antidotes encoded by parental wtf genes but can be detoxified by its own antidote.

      Thanks for the great summary!

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Thanks! Please see the following for our point-by-point response.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Thanks for the great summary!

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details

      We generated growth curves for all 25 wtf deletion strains and provided the details of the wtf gene knockouts. However, with 25 wtf genes, there are too many pairwise combinations to edit, and it is technically challenging to knock out multiple wtf genes together. Nevertheless, our results suggest that single wtf genes have little effect on host fitness under normal conditions.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      We verified the expression of the chimeric genes. The last exon of wtf18 is too small (128 bp) to construct further meaningful chimeras.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test." Reporting a p-value of 0 is not appropriate. Exact P-values should be reported. 

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.
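
      The reporting convention described above can be applied consistently with a small helper. This is a minimal sketch, not part of the authors' workflow: the floor of 0.0001 is taken from the response, while the function name `format_p_value` and the four-decimal formatting of larger values are assumptions made for illustration.

```python
def format_p_value(p, floor=1e-4):
    """Format a p-value for reporting, never printing an exact zero.

    Values below `floor` (including a software-reported 0.0) are shown
    as "<floor"; all other values are shown with four decimal places.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < floor:
        return f"<{floor:g}"
    return f"{p:.4f}"

# A software-reported 0.0 is shown as an upper bound instead of an exact zero.
reported = format_p_value(0.0)
```

      Reporting a bound rather than zero reflects that permutation-style tests can only resolve p-values down to a limit set by the number of permutations performed.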

      Recommendations for the authors:

      Reviewing Editor Comments:

      Regarding the synthetic chimeric wtf gene constructed by combining exons of wtf23 and wtf18, the authors did not explicitly test whether it acts as a meiotic driver in the natural context of a cross. Instead, they examined this possibility only through transgenic overexpression experiments. Given that this is arguably the most important claim of the paper, it is critical that the authors perform, report, and discuss such an experiment in a natural context, regardless of the outcome. It is not necessary to test other recombinants or other wtf loci.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #1 (Recommendations for the authors):

      The paper is very well written, but some minor points should be corrected or checked.

      (1) Line 95: Why "Putative"? Is it not clear what a wtf pseudogene is?

      “Putative” was removed.

      (2) Line 105: Does "known functional" mean they are active (i.e., have been tested and shown to be active)? If so, a reference should be added.

      We now use “known meiotic drivers” and have added a reference here.

      (3) Line 135: "no recombination signal was tested". Do the authors mean no signal was inferred? 

      We changed “tested” to “detected”.

      (4) Line 147: References for "known functional meiotic drivers (wtf23) and artificially generated meiotic driver (wtf18)" should be given. A statement of how wtf18 was "artificially generated" is essential so the reader knows how that element differs from the wtfC4 generated here.

      We have added a reference for wtf23. As for wtf18, we have specified this in the following text, namely “we artificially introduced an in-frame ATG codon right before the start of exon 2, generating wtf18poison/-0M.”

      (5) Lines 154 and 424 say an ATG codon was introduced "right before the start of exon 2," but Figure 4B shows it before exon 1.

      We thank the reviewer. The introduced ATG is the second start codon in the long transcript and the first in the short transcript. The right panel of Figure 4B shows the short transcript, so the text and figure are consistent.

      (6) Line 159: The wtf18 mutant with this additional ATG codon should be tested in meiosis, to see if "putative" is correct.

      Thanks. As with wtfC4, we encountered technical challenges in showing the driver phenotype in a natural setting, and thus removed this statement.

      (7) Line 181: change "driver" to "drive".

      Driver is correct.

      (8) Line 184: insert to read "wtf genes tested". Also, what is the basis for proposing that "the last exon might be crucial for antidote function"?

      “Tested” was added, and the statement was removed.

      (9) Line 198: change to read "detects only large differences".

      Done as suggested.

      (10) Line 204: change "removed" to "removal".

      Done as suggested.

      (11) Lines 242 and 243: Are "Splittree4" and "SplitsTree4" different, or is this a misprint?

      Corrected!

      (12) Lines 274-5 and 412 -3 would read better as "strains were diluted in five 10-fold steps” and “...μL of each dilution spotted on” “…to assay for…"

      Done as suggested.

      (13) Line 284 says "No new data were generated." This is clearly wrong. Perhaps the authors mean there are no supplementary data files.

      Corrected!

      (14) Line 406: Change "is" to "are".

      Corrected!

      (15) Line 413: Surely, they were spotted onto YE agar medium, not liquid medium.

      Corrected!

      (16) Figure 3C: Define "Rho" and the scale used.

      The definition of Rho has been added to the Methods section in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The evidence is largely solid, but the study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic driver phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins.

      Reviewer #3 (Recommendations for the authors):

      I strongly recommend the authors provide all the details concerning the generation of the knock-out strains, including specific primers used (for both the deletion and validation), the result of these validations, and the specific genotype (and ID) of the strains generated.

      These details are now included in the Materials and Methods section and in Supplementary.

      Please also provide exact P-values (see point 3).

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.

    1. eLife Assessment

      This valuable study uses tools of population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Analyses of computationally predicted human-specific lncRNAs and their genomic targets lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. Compared to previous versions, the conclusions regarding evolutionary acceleration and adaptation have become more solid by more fully taking data and literature on human/chimpanzee genetics and functional genomics into account.

    2. Joint Public Review:

      While DNA sequence divergence, differential expression and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study the authors computationally predict HSlncRNAs as well as their DNA binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analyses are straightforward, and after identifying these regions/HSlncRNAs they examined their effects using different external datasets.

      Comments on the latest version from Reviewer #2:

      I think this is as good as it is going to get, and I do appreciate that the authors are still engaging in good faith after all these rounds of revision, so I am happy to stop here! I do think the paper is significantly improved from the last time around, and the conclusions have been tempered significantly.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As before, I appreciate the changes made in response to my comments, and I think everyone is approaching this in the spirit of arriving at the best possible manuscript, but we still have some deep disagreements on the nature of the relevant statistical approach and defining adequate controls. I highlight a couple of places that I think are particularly relevant, but note that given the authors disagree with my interpretation, they should feel free to not respond!

      (1) On the subject of the 0.034 threshold, I had previously stated: "I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below."

      In their reply to me, the authors state:

      "What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377."

      I have some notes here. First, Agoglia et al, Nature, 2021, also examined the nature of cis vs trans regulatory differences between human and chimps using a very similar set up to Song et al; their Supplementary Table 4 enables the discovery of genes with cis vs trans effects although admittedly this is less straightforward than the Song et al data. Second, I can't actually tell how the 4377 number is arrived at. From Song et al, "Of 4,671 genes with regulatory changes between human-only and chimpanzee-only iPSC lines, 44.4% (2,073 genes) were regulated primarily in cis, 31.4% (1,465 genes) were regulated primarily in trans, and the remaining 1,133 genes were regulated both in cis and in trans (Fig. 2C). This final category was further broken down into a cis+trans category (cis- and transregulatory changes acting in the same direction) and a cis-trans category (cis- and trans-regulatory changes acting in opposite directions)." Even when combining trans-only and cis&trans genes that gives 2,598 genes with evidence for some trans regulation. I cannot find 4,377 in the main text of the Song et al paper.

      Elsewhere in their response, the authors respond to my comment that 0.034 is an arbitrary threshold by repeating the analyses using a cutoff of 0.035. I appreciate the sentiment here, but I would not expect this to make any great difference, given how similar those numbers are! A better approach, and what I had in mind when I mentioned this, would be to test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size.

      (1) We sincerely thank the reviewer for this critical point. Our initial purpose, based on DBS distances from the human genome to chimpanzee genome and archaic genomes, was that genes with large DBS distances may have contributed more to human evolution. However, our ORA (overrepresentation analysis) explored only genes with large DBS distances (the legend of old Figure 2 was “1256 target genes whose DBSs have the largest distances from modern humans to chimpanzees and Altai Neanderthals are enriched in different Biological Processes GO terms”), with the use of the cutoff (threshold) of 0.034 for defining large distance. The cutoff is not totally unreasonable (as our new results and the following sensitivity analysis indicate), but this approach was indirect and flawed.

      (2) We have now performed ORA using two methods. The first uses only DBS distances. Instead of using a cutoff, we now sort genes by DBS distance (human-chimpanzee distances and human-Altai Neanderthal distances, respectively; see Supplementary Table 5) and use the top 25% and bottom 25% of genes to perform ORA. This directly examines whether DBS distances alone indicate that genes with large DBS distances contribute more to human evolution than genes with small DBS distances. The second also explores the ASE genes (allele-specific expression, genes undergoing human/chimpanzee-specific regulation in the tetraploid human–chimpanzee hybrid iPS) reported by Agoglia et al. 2021. We select the top 50% and bottom 50% of genes with large and small DBS distances, intersect them with ASE genes from Agoglia et al. 2021 (their Supplementary Table 4), and apply ORA to the intersections. Both analyses show that: (a) more GO terms are obtained from genes with large DBS distances, (b) more human evolution-related GO terms are obtained from genes with large DBS distances (Supplementary Table 5,6,7; Figure 2; Supplementary Fig. 15). These results directly suggest that genes with large DBS distances contribute more to human evolution than genes with small DBS distances, which is a key theme of the study.

      (3) Regarding Song et al 2021, the statement of “we differentiated…allotetraploid (H1C1a, H1C1b, H2C2a, H2C2b) lines into ectoderm, mesoderm, and endoderm” made us assume that their differentiated hybrid cell lines cover more tissue types than those of Agoglia et al. 2021. Now, upon re-examining Supplementary Table 5 of Song et al. and Supplementary Table 4 of Agoglia et al. 2021, we find that the latter more clearly indicates significant ASE genes (p-adj<0.01 and |LFC|>0.5 in GRCh38 and PanTro5).

      (4) We have also performed two additional analyses in response to the suggestion of “test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size”. First, we performed a multi-threshold sensitivity analysis using a spectrum of cutoffs (0.03, 0.034, 0.04, 0.05), and tracked the number of genes identified and the enrichment significance of key GO terms (e.g., "neuron projection development," "behavior") across these thresholds. The result confirms that while the absolute number of genes varies with the cutoffs, the core biological conclusion (specifically, the significant enrichment of target genes in neurodevelopmental and cognitive functions) remains stable and significant. For instance, "behavior" maintains strong statistical significance (FDR<0.01) in both the human-chimpanzee and human-Altai Neanderthal comparisons across all tested cutoffs, and "Neuron projection development" also remains significant across three (0.03, 0.034, 0.04) of the four cutoffs in the Altai comparison. This pattern suggests that our core findings regarding neurodevelopmental functions are robust across a range of cutoffs. Nevertheless, we did not extend the analysis to smaller cutoffs (e.g., 0.01 or 0.02) because such values would identify an excessively large number of genes (>10000) for ORA, which would render the GO-term enrichment analysis less meaningful due to a loss of specificity.

      Second, we have performed an additional validation to directly evaluate whether the 0.034 cutoff itself represents a stringent and biologically meaningful value. We sought to empirically determine how often a DBS sequence distance of 0.034 or greater might occur by chance in promoter regions, thereby testing its significance as a marker of potential evolutionary divergence. We randomly sampled 10,000 windows from annotated promoter regions across the hg38 genome, each with a size matching the average length of DBSs (147 bp). We then calculated the per-base sequence distances for these random windows between modern humans and chimpanzees, as well as between modern humans and the three archaic humans (Altai, Denisovan, Vindija). The analysis reveals that a distance of ≥0.034 is a rare event in random promoter sequences: for Human-Chimp, Human-Altai, Human-Denisovan, and Human-Vindija, 5.49% (549/10000), 0.31% (31/10000), 4.47% (447/10000), and 0.03% (3/10000) of random windows, respectively, reach this distance. This empirical evidence suggests that 0.034 is a sufficiently strong cutoff for defining large DBS distance: such a distance is very unlikely to occur in a random genomic background (P<0.1 for chimpanzee and P<0.05 for the archaic humans), and DBSs exceeding this cutoff are significantly enriched for sequences that have undergone substantial evolutionary change instead of being random neutral variations.

      (5) We present new Figure 2, Supplementary Table 5,6,7, and Supplementary Fig. 15. We have substantially revised section 2.3, related sections in Results, Supplementary Note 3, and Supplementary Table 8. We have removed related descriptions and explanations in the main text and Supplementary Notes. The results of the above two analyses are presented here as Author response table 1 and Author response image 1.
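
      The quantile selection in point (2) and the multi-threshold check in point (4) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: it assumes a plain one-sided hypergeometric over-representation test without multiple-testing correction, and the gene names, distances, GO-term membership, and the function names `top_and_bottom`, `hypergeom_tail`, and `sensitivity` are all hypothetical.

```python
from math import comb


def top_and_bottom(distances, fraction):
    """Return (top, bottom) gene sets after ranking genes by DBS distance."""
    ranked = sorted(distances, key=distances.get, reverse=True)
    k = int(len(ranked) * fraction)
    if k == 0:
        return set(), set()
    return set(ranked[:k]), set(ranked[-k:])


def hypergeom_tail(hits, drawn, annotated, total):
    """P(X >= hits) when `drawn` genes are sampled from `total` genes,
    of which `annotated` carry the GO term (one-sided ORA p-value)."""
    upper = min(drawn, annotated)
    numerator = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return numerator / comb(total, drawn)


def sensitivity(distances, term_genes, cutoffs):
    """For each cutoff, report (cutoff, n_selected, n_hits, p_value)."""
    background = set(distances)
    annotated = len(term_genes & background)
    rows = []
    for cutoff in cutoffs:
        selected = {g for g, d in distances.items() if d >= cutoff}
        hits = len(selected & term_genes)
        p = hypergeom_tail(hits, len(selected), annotated, len(background))
        rows.append((cutoff, len(selected), hits, p))
    return rows


# Toy data (hypothetical): eight genes with made-up DBS distances.
distances = {
    "g1": 0.060, "g2": 0.045, "g3": 0.036, "g4": 0.031,
    "g5": 0.020, "g6": 0.012, "g7": 0.005, "g8": 0.001,
}
term_genes = {"g1", "g2", "g4"}  # genes annotated with one GO term

top, bottom = top_and_bottom(distances, 0.25)
rows = sensitivity(distances, term_genes, (0.03, 0.034, 0.04, 0.05))
for cutoff, n_selected, n_hits, p in rows:
    print(f"cutoff={cutoff:<6} selected={n_selected} hits={n_hits} p={p:.4f}")
```

      A real analysis would replace the toy dictionary with the per-gene distances and would apply an FDR correction across GO terms, as the reported table does.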

      Author response table 1.

      Sensitivity analysis of GO-term enrichment across different DBS sequence distance cutoffs. The table shows the numbers of target genes identified and the false discovery rates (FDR) for the enrichment of three selected GO terms at four different distance cutoffs. Note that, unlike in the old Figure 2, the results for chimpanzees and Altai Neanderthals are not directly comparable here, as the numbers of target genes used for the enrichment analysis differ between them at each cutoff.

      Author response image 1.

      Distribution of per-base sequence distances for DBS size-matched random genomic windows in Ensembl-annotated promoter regions, calculated between modern humans and (A) chimpanzee, (B) Altai Neanderthal, (C) Denisovan, and (D) Vindija Neanderthal genomes.
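
      The empirical check behind this image can be sketched as below. The sketch assumes that "per-base sequence distance" means the fraction of mismatched positions in an ungapped alignment of equal-length windows, which may differ from the authors' exact definition; the function names are hypothetical. The counts are the ones reported in the response above.

```python
def per_base_distance(seq_a, seq_b):
    """Fraction of aligned positions that differ (assumes equal-length, ungapped windows)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def exceedance_fraction(window_distances, cutoff=0.034):
    """Share of random windows whose distance is at least `cutoff`."""
    return sum(d >= cutoff for d in window_distances) / len(window_distances)


# Under the mismatch-fraction assumption, a 147-bp window needs at least
# five differing bases to reach the cutoff (5/147 >= 0.034 > 4/147).
min_mismatches = next(m for m in range(148) if m / 147 >= 0.034)

# Counts of windows (out of 10,000) reaching the cutoff, as reported above.
reported = {
    "Human-Chimp": 549,
    "Human-Altai": 31,
    "Human-Denisovan": 447,
    "Human-Vindija": 3,
}
percentages = {pair: 100 * n / 10000 for pair, n in reported.items()}

toy_distance = per_base_distance("ACGTACGTAC", "ACGTACGTAA")  # one mismatch in ten
toy_fraction = exceedance_fraction([0.0, 0.01, 0.034, 0.05])

for pair, pct in percentages.items():
    print(f"{pair}: {pct:.2f}% of random windows reach the cutoff")
```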

      (2) The authors have introduced a new TFBS section, as a control for their lncRNAs - this is welcome, though again I would ask for caution when interpreting results. For instance, in their reply to me the authors state: "The number of HS TFs and HS lncRNAs (5 vs 66) <HS TF vs all HS lncRNAs> alone lends strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>)."

      But this assumes the denominator is the same! There are 35899 lncRNAs according to the current GENCODE build; 66/35899 = 0.0018, so, 0.18% of lncRNAs are HS. The authors compare this to 5 TFs. There are 19433 protein coding genes in the current GENCODE build, which naively (5/19433) gives a big depletion (0.026%) relative to the lnc number. However, this assumes all protein coding genes are TFs, which is not the case. A quick search suggests that ~2000 protein coding genes are TFs (see, eg, https://pubmed.ncbi.nlm.nih.gov/34755879/); which gives an enrichment (although I doubt it is a statistically significant one!) of HS TFs over HS lncRNAs (5/2000 = 0.0025). Hence my emphasis on needing to be sure the controls are robust and valid throughout!
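
      The denominator argument in the comment above can be checked with a few lines of arithmetic. This is only a sketch that restates the counts quoted in the comment (the ~2000 TF figure is the reviewer's rough estimate); the variable names are hypothetical, and no significance test is attempted.

```python
def share(count, total):
    """Proportion of `total` items that fall in the category of interest."""
    return count / total


hs_lncrna = share(66, 35899)         # HS lncRNAs among all annotated lncRNAs
hs_vs_all_coding = share(5, 19433)   # HS TFs over all protein-coding genes (naive)
hs_tf = share(5, 2000)               # HS TFs over the approximate number of TFs

print(f"HS lncRNAs:            {hs_lncrna:.4%}")
print(f"HS TFs / coding genes: {hs_vs_all_coding:.4%}")
print(f"HS TFs / all TFs:      {hs_tf:.4%}")
print(f"TF share relative to lncRNA share: {hs_tf / hs_lncrna:.2f}x")
```

      With the TF-specific denominator the HS share of TFs comes out slightly higher than the HS share of lncRNAs, which is the reviewer's point about choosing comparable denominators.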

      We thank the reviewer for this comment. While 5 vs 66 reveals a difference, a direct comparison is too simplified. The real take-home message of the new TFBS section is not the numbers but the distributions of HS TFs’ targets and HS lncRNAs’ targets across GTEx organs and tissues (Figure 3 and Supplementary Figures 24, 25) - correlated HS lncRNA-target transcript pairs are highly enriched in brain regions, but correlated HS TF-target transcript pairs are distributed broadly across GTEx tissues and organs. We have now removed the simple comparison of “5 vs 66” and more carefully explained our comparison in section 2.6.

      (3) In my original review I said: line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      In their reply to me, the authors state:

      Here, we actually made an analogy but not an inference; therefore, we used such words as "suggesting" and "similar" instead of using more confirmatory words. We have revised the latter half sentence, saying "raising the possibility that these sequences have evolved considerably during human evolution".

      Is the aim here to draw attention to the ~2.2% of DBS that do not have a counterpart? In that case, it would be better to rewrite the sentence to emphasise those, not the ones that are shared between the two species? I do appreciate the revised wording, though.

      (1) Our original phrasing may be misleading, and we agree entirely that “pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee”. As explained in that reply, we know and think that DBSs and HARs are two different classes of sequences, and indeed, identifying HARs and acceleration relies on a far more thorough methodology. Yet, three factors prompted us to compare them. First, both suggest the importance of sequences outside genes. Second, both are quite “old” sequences and have undergone considerable evolution recently (although the references are different). Third, both have contributed greatly to human brain evolution.  

      (2) Here, our emphasis is on the 97.81%, not the 2.2%, and we have now stated this analogy more clearly and cautiously. Relevant revisions have been made in the Results, Discussion, and Methods sections.

      (3) We have also determined whether the 2.2% of DBSs are human-specific gains by analyzing them using the UCSC Multiz Alignments of 100 Vertebrates. The result confirms that all 2248 DBSs are present in the human genome but are absent from the chimpanzee genome and all other aligned vertebrate genomes. We have added this result to the manuscript.

      (4) Finally, Line 408: "Ensembl-annotated transcripts (release 79)" Release 79 is dated to March 2015, which is quite a few releases and genome builds ago. Is this a typo? Both the human and the chimpanzee genome have been significantly improved since then!

      (1) We thank the reviewer for this comment, which prompts us to provide further explanation and additional data. First, we began predicting HS lncRNAs’ DBSs when Ensembl release 79 was available, but did not re-predict DBSs when new Ensembl releases were published because (a) these new Ensembl releases are also based on hg38, (b) we did not find any fault in the LongTarget program during our use, nor did we receive any fault reports from users, (c) predicting lncRNAs’ DBSs using the LongTarget program is highly time-consuming.

      (2) Second, to assess the influence of newer Ensembl releases, we compared the promoters annotated in release 79 and in release 115. We found that the vast majority (87.3%) of promoters newly annotated in release 115 belong to non-coding genes. Thus, using release 115 may predict more DBSs in non-coding genes, but downstream analyses based on protein-coding genes would be essentially the same (meaning that all figures and tables would be the same).

      (3) Third, a key element of this study is GTEx data analysis, and these data were also published years ago.  

      (4) Finally, some lncRNA genes have new gene symbols in new Ensembl releases. To allow researchers to use our data conveniently, we have added a new column titled "Gene symbol (Ensembl release115)" to Supplementary Tables 2A and 2B.  

      Summary:

      Major changes based on Reviewer’s comments:

      (1) The following revisions are made to address the comment on “the 0.034 threshold”: (a) Section 2.3, section 2.4, Supplementary Note 3, and related contents in Discussion and Methods are revised, (b) new Figure 2, Supplementary Figure 15, new Supplementary Table 5,6,7, (c) Table 2 and Supplementary Table 8 are revised.

      (2) To address the comment on “new TFBS section”, section 2.6 and section 4.13 are revised.  

      (3) To address the comment on “97.81% and 2.2% of DBSs”, section 2.3 is revised.

      (4) The following revisions are made to address the comment on “release 79”: (a) the old Supplementary Table 2, 3 are merged into Supplementary Table 2AB, and the new column "Gene symbol (Ensembl release115)" is added to Supplementary Table 2AB, (b) accordingly, Supplementary Table 4,5 are renamed to Supplementary Table 3,4.

      Additional revisions:

      (1) Section 2.5 “Young weak DBSs may have greatly promoted recent human evolution” is moved into Supplementary Note 3 (which now has the subtitle “Target genes with specific DBS features are enriched in specific functions”), because this section is short and lacks sufficient cross-validation.

      (2) Considerable minor revisions of sentences have been made.

      (3) Since there are many supplementary figures, the main text now cites only Supplementary Notes, as the reader can easily access supplementary figures in Supplementary Notes.

    1. eLife Assessment

      This important study provides evidence supporting the hypothesis that postnatal visual experience shapes the patterns of functional connectivity between extrastriate visual cortex and frontal regions, by comparing neonates, blind and sighted adults using resting-state fMRI. The evidence supporting the main claim is convincing, and the authors' interpretations are appropriately calibrated in the discussion. Nevertheless, the study design and methodology are inherently limited to resolve the underlying mechanisms driving connectivity changes during neurodevelopment (experience-related plasticity vs post-natal experience-independent maturation). This study will be of broad interest to neuroscientists and neuroimaging researchers studying vision, plasticity and brain development.

    2. Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience shapes these cortical networks through activity-dependent plasticity. This study provides novel insights into the impact of visual experience on the development of temporal correlations in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The neonatal data offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant and adult data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The methodology cannot determine whether group differences in correlations reflect direct changes in communication between visual and frontal regions or indirect effects mediated by other structures.

      The cross-sectional design cannot reveal the timecourse over which visual experience shapes connectivity between infancy and adulthood.

      Whether the infant resting-state patterns imply similar functional capacity to blind adults (e.g., cross-modal task responses) remains untested.

      Comments on revisions:

      The authors have done a fantastic job addressing my remaining questions.

    3. Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organisation arises over development, asking whether the infant brain looks more like the blind adult pattern, or more like the sighted adult pattern. Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the "starting state" in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that the age range studied is a clear-cut "starting point" for development, after which all change can be attributed to experience.

    4. Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      - The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      - Overall, the presented analyses are solid and well-detailed, and the results and discussion are convincing.

      Weaknesses

      - While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      - The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is unclear when the goal is to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, the preterm group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). Moreover, preterms show systematically reduced connectivity strength for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthened the study design.

      - The rationale for presenting results on the connectivity of secondary visual cortices before the one of primary cortices (V1) could be clarified.

      - The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

      Comments on revisions:

      The authors have addressed my specific recommendations, but some weaknesses in the study remain, particularly the inclusion of preterm infants alongside full-term neonates.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

      Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      - The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      - Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      - While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      - The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is unclear when the goal is to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, the preterm group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). Moreover, preterms show systematically reduced connectivity strength for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthened the study design.

      - The rationale for presenting results on the connectivity of secondary visual cortices before the one of primary cortices (V1) could be clarified.

      - The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are appropriately cautious in many parts of the discussion and include several helpful control analyses. Nonetheless, additional clarification of key assumptions and potential confounds would strengthen the paper.

      (1) The current framing labels vision as "instructive" and blindness as "reorganizing," but it is unclear why these two experiential factors are characterized differently. Both involve activity-dependent changes to functional architecture from a shared immature scaffold. Labeling them differently risks conflating divergent outcomes with distinct underlying mechanisms. Just because visual and blind adults show different patterns of functional connectivity does not mean they reflect separate processes. While the discussion briefly acknowledges the possibility of shared plasticity mechanisms, much of the framing across the manuscript, including in the abstract and introduction, implies a dichotomy. A clearer articulation of the criteria used to assign these labels, or reconsideration of whether such a distinction is warranted, would improve conceptual clarity. The current framing appears analogous to saying that "heat causes expansion" and "cold causes contraction" as if these were separate mechanisms, when they are actually two directions of change along a single factor: temperature. A more parsimonious framework, such as activity-dependent reweighting of pre-existing connectivity, may better capture the nature of plasticity at play in both sighted and blind development.

      Following the reviewer’s suggestion, we have revised the manuscript to clarify that both vision and blindness can be understood as manifestations of a common framework of experience-driven plasticity. We removed all mentions of reorganization and modified the wording throughout for clarity.

      Specifically:

      Abstract: “Are infant visual cortices functionally like those of sighted adults, with blindness leading to functional change? We find, on the contrary, that secondary visual cortices of infants are functionally more like those of blind adults: stronger coupling with PFC than with nonvisual sensory-motor networks, suggesting that visual experience modifies elements of the sighted-adult long-range functional connectivity profile. Infant primary visual cortices are in-between blind and sighted adults i.e., more balanced PFC and sensory-motor connectivity than either adult group. The lateralization of occipital-to-frontal connectivity in infants resembles the sighted adults, consistent with the idea that blindness leads to functional change. These results suggest that both vision and blindness modify functional connectivity through experience-driven (i.e., activity-dependent) plasticity.” (Page 1, Line 13)

      Introduction: We replaced “blindness leads to functional reorganization” with “blindness modifies this functional connectivity” (Page 2, Line 52), and the following sentence has also been modified to: “lifetime visual experience shapes connectivity toward the sighted-adult pattern” (Page 2, Line 54). For the lateralization patterns, we now describe them as “blindness-related modification” rather than “reorganization”, to keep the interpretation descriptive rather than mechanistic (Page 4, Line 114).

      (2) In interpreting the functional correlation differences, the discussion should more explicitly consider how statistical interdependence between areas could influence the observed results. For example, an increase in shared variance between visual and motor areas, such as might result from visually guided action, could result in a reduction in the apparent strength of visual-prefrontal temporal correlation (at the resolution of fMRI) without any true biological change in communication between visual-prefrontal cortex. This possibility is not ruled out by reporting groupwise patterns of relative connectivity. A more cautious systems-level framing could help clarify the distinction between neural plasticity and statistical redistribution of variance.

      We thank the reviewer for raising this important point. We agree that resting-state fMRI provides a measure of statistical synchrony in BOLD signals rather than direct causal interactions between regions. This is a fundamental limitation of resting-state fMRI, which we now note in the Discussion section. Such changes in correlation are consistent with a variety of underlying biological mechanisms. The task being performed during scanning is one factor that influences cross-region correlations. In the current study, both blind and sighted groups were measured while blindfolded and were not performing visually guided actions during the resting-state fMRI scans. It is possible that past experience with visually guided action changes the resting-state correlations of sighted participants. Indeed, this is one interesting hypothesis.

      In the revised Discussion, we now explicitly note this limitation and clarify that differences in FC do not by themselves establish whether or how underlying neurophysiological mechanisms are changed. We also emphasize that future work will need to investigate whether FC changes are accompanied by alterations in structural connectivity and to probe causal interactions and mechanistic underpinnings as follows:

      “Resting-state functional connectivity captures synchrony in BOLD signal fluctuations rather than causal interactions, and differences in functional connectivity cannot on their own reveal how underlying neurophysiological mechanisms are modified.” (page 13, line 342)

      “Future studies will be needed to determine whether these functional changes are accompanied by alterations in structural connectivity, and to probe causal interactions and mechanistic underpinnings.” (page 13, line 350)
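
      The reviewer's point about statistical redistribution of variance can be illustrated with a toy calculation. The sketch below is not part of the study's analysis. It assumes a hypothetical linear model in which a visual signal V is a weighted sum of a prefrontal signal P, a motor signal M, and independent noise, all with unit variance unless stated otherwise. Under that assumption the correlation between V and P has a closed form, and it falls as the motor weight `b` grows even though the prefrontal weight `a` is held constant. The function name and parameters are illustrative only.

```python
import math


def visual_prefrontal_corr(a, b, noise_sd=1.0):
    """Correlation between V and P under the toy model V = a*P + b*M + e.

    Assumes P, M and e are mutually independent, Var(P) = Var(M) = 1 and
    Var(e) = noise_sd**2, so that Cov(V, P) = a and
    Var(V) = a**2 + b**2 + noise_sd**2.
    """
    return a / math.sqrt(a ** 2 + b ** 2 + noise_sd ** 2)


# Hold the prefrontal weight fixed and increase only the motor weight.
for motor_weight in (0.0, 0.5, 1.0, 2.0):
    r = visual_prefrontal_corr(a=0.8, b=motor_weight)
    print(f"b = {motor_weight:.1f} -> corr(V, P) = {r:.3f}")
```

      In this toy model the visual-prefrontal coupling weight never changes, yet the measured correlation decreases, which is why differences in correlation alone do not establish a change in the underlying communication between two regions.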

      (3) The mechanistic interpretation of group differences in visual-motor coupling would benefit from stronger network-level justification. Direct connections between these areas are sparse in primates. If effects reflect indirect polysynaptic interactions or shared thalamic input, as the authors suggest, one might expect corresponding group differences in intermediate regions (e.g., parietal cortex, thalamus) that mediate these interactions. Is there any evidence for this in the data?

      We thank the reviewer for raising this point. We agree and, as noted above, resting-state fMRI cannot distinguish direct causal interactions between two regions from interactions in which a mediating region is involved. This is a fundamental limitation of resting-state fMRI. The current study further focused on testing a specific hypothesis motivated by previously observed group differences between blind and sighted adults, and our analyses focused on ROI-to-ROI connectivity between occipital, frontal, and sensory-motor cortices and did not include these additional regions. In prior work, we and others have looked at effects in parietal cortices (Abboud & Cohen, 2019; Bedny et al., 2009; Deen et al., 2015; Kanjlia et al., 2016, 2021; Sen et al., 2022). In blindness, parietal networks show increased correlations with some visual areas, rather than decreased. Regarding the thalamus, the evidence is less clear, and ongoing work is trying to address this question. A couple of studies suggest that there is indeed increased connectivity between some parts of the thalamus and visual cortex in blindness. Although the anatomical information is limited, some of the work suggests that this increase is with higher-cognitive nuclei of the thalamus (Bedny et al., 2011; Liu et al., 2007).

      We agree that this is an important direction for future work. To acknowledge this point, we have revised the manuscript to highlight the potential role of cortical and subcortical hub regions in mediating connectivity changes. The text has been modified as follows:

      “Connectivity changes between two areas could be mediated by ‘third-party’ hub regions. For example, posterior parietal cortex serves as a cortical hub for multisensory integration and visuo-motor coordination and could mediate occipital-to-sensory-motor communication (Rolls et al., 2023; Sereno & Huang, 2014). Subcortical structures such as the thalamus could also play a mediating role (Vega-Zuniga et al., 2025).” (page 13, line 345)

      (4) The discussion would benefit from deeper engagement with prior work on experience-dependent plasticity, particularly the longstanding distinction between instructive and permissive roles of experience. While the authors briefly define these concepts and reference their historical use, a more explicit consideration of how their findings relate to this broader literature would help clarify whether such distinctions are necessary or appropriate.

      We thank the reviewer for this thoughtful suggestion to engage more explicitly with the longstanding literature on instructive versus permissive roles of experience. However, most of this literature comes from animal models, where experimental manipulations of the anatomical structure, of experience itself (e.g., controlled rearing studies) and sometimes of neural activity patterns allow clear tests of these mechanisms. Such manipulations are not feasible in humans. The terminology in the animal literature does not directly map onto the methods and data available in the present study or in other work with humans. For this reason, the current data do not allow us to fully engage with the debates in the animal literature, and doing so risks overinterpreting our findings.

      Nevertheless, we agree that once the instructive/permissive framework has been introduced, it is important to clarify how our results relate to it, rather than only providing definitions. We have therefore added the following text to the discussion:

      “In humans, such manipulations are not feasible, leaving us to study only the consequences of the presence or absence of vision. Under an instructive account, visual and multisensory experience could strengthen coupling between visual and other non-visual sensory-motor cortices through coordinated activity, thereby establishing the sighted-adult connectivity pattern. In the absence of visual input, by contrast, the lack of such coordinated activity may prevent these couplings from being established. Alternatively, vision may act permissively, indirectly enabling maturational processes that shift connectivity toward the sighted-adult configuration.” (page 14, line 362)

      (5) The revised discussion acknowledges the divergence between resting-state and task-based findings, but does not fully frame the theoretical implications of this discrepancy. Although this study cannot resolve the issue with its own data, a more integrative discussion could help clarify whether these measures reflect distinct functional states, developmental trajectories, or mechanisms of plasticity. Without such framing, readers are left without clear guidance on how to reconcile the present results with prior work on cross-modal recruitment in blindness.

      We thank the reviewer for this thoughtful comment. We agree that knowing how resting-state evidence relates to task-based evidence is a fundamentally important issue. We now discuss this more in the Introduction as well as in the Discussion.

      There is a sizable literature of both task-based and resting-state studies. Some prior studies have measured resting-state and task-based data within the same participants and found relationships (Kanjlia et al., 2016, 2021; Lane et al., 2015). We now clarify this in the introduction. These studies find that within visual cortices of blind people, the task-based profile of a cortical area is related to its resting-state connectivity pattern (Abboud & Cohen, 2019; Deen et al., 2015; Kanjlia et al., 2016, 2021). This suggests that these two measures are related. However, the timecourse of this relationship, the developmental trajectory and the mechanism of plasticity are not known. We note this now in the introduction on page 2. Primarily this is because there is very little relevant developmental evidence. For example, in the current study we find that the resting-state profile of secondary visual networks in infants is similar to that of blind adults. However, we do not know whether the visual cortices of infants show task-based cross-modal responses. To our knowledge nobody has tested this question. We agree with the reviewer that raising this question in the paper is better than not commenting on the relationship at all.

      To address the reviewer’s comment, we have expanded the discussion to situate our results within a developmental framework, highlighting how early intrinsic connectivity may scaffold alternative trajectories shaped by either visual experience or blindness. The revised text now reads as follows:

      “Conversely, for people who remain blind throughout life, visual-PFC connectivity could enable recruitment of visual cortices for higher-order non-visual functions, such as language and executive control (Bedny et al., 2011; Kanjlia et al., 2021). Our results suggest that blind adults may build on connectivity patterns already present in infancy: like blind adults, sighted infants show stronger occipital–PFC than occipital–sensory–motor coupling. Repeated engagement of occipital networks during higher cognitive tasks in early development could in turn enhance connectivity and specialization of visual networks for non-visual higher-order functions.

      Some prior studies have measured resting-state and task-based functional profiles in the same participants. These studies find that within visual cortices of blind people, the task-based profile of a cortical area is related to its resting state connectivity pattern (citations.) This suggests that these two measures are related. However, the timecourse of this relationship, the developmental trajectory and mechanism of plasticity is not known. Primarily this is because there is very little relevant developmental evidence. For example, in the current study we find that the resting state profile of secondary visual networks in infants is similar to that of blind adults. However, we do not know whether the visual cortices of infants show enhanced task-based cross modal responses, relative to sighted adults and how this compares to responses observed in blind adults. Future work with infants and children would be able to address this question.

      In the current study, the clearest evidence for functional change driven by blindness was observed for laterality. Connectivity lateralization in sighted infants resembles that of sighted adults, in both V1 and secondary visual cortices. Relative to both sighted infants and sighted adults, blind adults show more lateralized connectivity patterns between occipital and prefrontal cortices. Previous studies suggest that in people born blind occipital and non-occipital language responses are co-lateralized (Lane et al., 2017; Tian et al., 2023). We speculate that habitual activation of visual cortices by higher-cognitive tasks, such as language, which are themselves highly lateralized, contributes to this biased connectivity pattern of occipital cortex in blindness. Taken together, these results suggest a developmental framework in which intrinsic connectivity present in infancy provides a scaffold that is subsequently shaped and reinforced by experience-dependent recruitment, through either visual experience or the lifelong absence of vision in blindness. Longitudinal work across successive developmental stages will be crucial to test how the alternative trajectories shaped by visual experience versus blindness unfold over development.” (page 14-15)

      (6) The split-half reliability analysis is a valuable control. Additional details would clarify what these noise ceilings reflect. Were the rsFC patterns for each ROI calculated only for the ROIs included in the current study or was a broader assessment across the whole brain performed? It also would be helpful to report whether reliability differed for individual ROIs within and between groups. Even if global reliability is matched, selective differences could influence group comparisons. Several infants in the dHCP dataset were scanned twice. Were any second scans included in the current analyses? Comparing first versus second scans directly could strengthen the claim that several weeks of visual experience are insufficient to shift connectivity toward a sighted adult profile.

      We thank the reviewer for these comments on the reliability of the current study.

      In the present study, the noise ceiling was computed from the reliability of the ROI-wise FC profiles used across all analyses. Reliability was estimated using a split-half procedure: each rs-fMRI time series was divided into two equal halves, FC among all ROIs included in the study was computed separately for each half, and the noise ceiling for each ROI was defined as the Pearson correlation between its two FC profiles. We then averaged these ROI-wise noise ceilings to evaluate group-level reliability, which exceeded 0.70 in all three groups, and found no significant difference across groups. This provides an estimate of the upper bound on explainable variance for the exact FC features subjected to statistical testing (Lage-Castellanos et al., 2019). A brief description has been added to the manuscript (page 19, line 518).
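
      For readers who want to see the procedure concretely, the split-half noise ceiling described above could be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes each participant's data are available as one time series per ROI, it uses plain Pearson correlation for both the FC values and the profile comparison, and all names (`pearson`, `fc_profiles`, `split_half_noise_ceiling`, the ROI labels and the synthetic data) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_profiles(ts):
    """Map each ROI to its FC profile: correlations with every other ROI."""
    names = sorted(ts)
    return {r: [pearson(ts[r], ts[o]) for o in names if o != r] for r in names}


def split_half_noise_ceiling(ts):
    """Per-ROI noise ceiling: correlation of the FC profiles from the two halves."""
    n_timepoints = len(next(iter(ts.values())))
    half = n_timepoints // 2
    first = {r: s[:half] for r, s in ts.items()}
    second = {r: s[half:2 * half] for r, s in ts.items()}
    fc_first, fc_second = fc_profiles(first), fc_profiles(second)
    return {r: pearson(fc_first[r], fc_second[r]) for r in ts}


# Synthetic example (assumed data): five ROIs sharing one latent signal
# with different loadings, plus independent noise.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(400)]
loadings = {"roi_a": 0.9, "roi_b": 0.7, "roi_c": 0.5, "roi_d": 0.3, "roi_e": 0.1}
ts = {r: [w * v + rng.gauss(0, 1) for v in latent] for r, w in loadings.items()}

ceilings = split_half_noise_ceiling(ts)
participant_level = sum(ceilings.values()) / len(ceilings)  # average over ROIs
print(ceilings, participant_level)
```

      A group-level value would then be obtained by averaging these participant-level estimates within each group. Whether correlations were Fisher-transformed before averaging is not stated in the response, so the sketch averages raw values.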

      Regarding the reviewer’s question about the scope of rsFC features used in the noise-ceiling analysis: we computed noise ceilings only for the ROIs included in the present study, because all analyses in this work were conducted at the ROI–ROI level and did not involve voxelwise whole-brain FC. Thus, the noise-ceiling estimates correspond directly to the full set of FC features on which all statistical comparisons were based.

      As suggested by the reviewer, we examined noise ceilings for each ROI separately. All ROIs showed high absolute reliability (noise ceiling > 0.80) across the three groups, indicating that the ROI-wise FC estimates are generally robust across participants. Although many ROIs exhibited statistically significant group differences in noise ceiling (one-way ANOVA, p < 0.05), the effect sizes were small to moderate (partial η² < 0.14). These differences indicate that reliability may vary modestly across groups at the ROI level, and we cannot fully determine whether such variability contributes to the observed different FC patterns across groups. We have included this point in the revised manuscript (page 19, line 525), along with the full statistical results for the ROI-wise noise ceilings in the Supplementary Table S2.
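
      As a reminder of how the reported effect size relates to the ANOVA, the sketch below computes the F statistic and partial η² for a one-way design from the between-group and within-group sums of squares. It is an illustration under stated assumptions, not the study's analysis code: the function name is hypothetical and the noise-ceiling values are invented for the example. In a one-way design, partial η² equals SS_between / (SS_between + SS_within).

```python
def one_way_anova_effect(groups):
    """Return (F, partial eta squared) for a one-way ANOVA.

    groups: list of lists, one list of observations per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    partial_eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, partial_eta_sq


# Invented noise ceilings for one ROI in three groups (illustrative only).
sighted = [0.84, 0.86, 0.83, 0.88, 0.85]
blind = [0.82, 0.85, 0.81, 0.86, 0.84]
infants = [0.81, 0.83, 0.85, 0.82, 0.80]
f_stat, eta_sq = one_way_anova_effect([sighted, blind, infants])
print(f"F = {f_stat:.2f}, partial eta squared = {eta_sq:.3f}")
```

      The p-value would additionally require the F distribution with the two degrees of freedom shown above, which is omitted here to keep the sketch dependency-free.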

      Last, we fully agree that longitudinal comparisons across multiple time points can provide important insights into how early visual experience shapes connectivity. At the same time, in the present dataset, the first scan occurred at a preterm age and the second at term-equivalent age. The differences between the first and second scans would reflect not only additional weeks of visual input, but also differences in prematurity status and overall neurodevelopmental maturity, which would make the interpretation of such comparisons difficult in the context of our current aims. We have clarified in the revised manuscript that only term-equivalent (second) scans were included. We see careful longitudinal work as an important avenue for addressing this question more directly.

      (7) The signal dropout assessment in the infant dataset is a valuable quality control step. Applying the same metric to the adult datasets would help harmonize preprocessing across groups and increase confidence in group-level comparisons.

      Thank you for this valuable suggestion. Following your comment, we applied the same signal dropout assessment to the adult datasets. One participant in the sighted adult group and two participants in the blind adult group showed signal dropout in one ROI each. The corresponding results are now included in the Supplementary Materials (Figure S13). The findings remain unchanged after this additional control analysis. We also added the relevant text to the Methods section as follows:

      “The same signal dropout assessment was also applied to the blind and sighted adults to ensure consistent quality control across groups. One participant in the sighted adult group and two participants in the blind adult group exhibited signal dropout in one ROI each. Excluding these participants did not alter the group-level results (see Figure S13).” (page 16, line 449)

      Minor:

      (8) The authors added accurate anatomical descriptions to the methods but a less precise characterization remains in the introduction: "Anatomically, these regions correspond roughly to the location of areas such as motion area V5/MT+, the lateral occipital complex (LO), V3a and V4v in sighted people."

      We thank the reviewer for this helpful comment. We have revised the Introduction to provide a fuller anatomical description, consistent with the Methods. The text now reads:

      “Anatomically, these regions in sighted people approximately correspond to the locations of motion-sensitive V5/MT+ and the lateral occipital complex (LO), as well as ventral portions of occipito-temporal cortex including V4v and dorsal portions including V3a. The occipital ROI also extends ventrally into the middle portion of the ventral temporal lobe and dorsally into the intraparietal sulcus and superior parietal lobule.” (page 3, line 88)

      (9) Typo: "lager effect" should be "larger effect."

      Secondary visual cortices showed a significant within > between difference in both groups, with a lager effect in the blind group (post-hoc tests, Bonferroni-corrected paired: t-test: sighted adults within hemisphere > between hemisphere: t (49) = 7.441, p = 0.012; blind adults within hemisphere > between hemisphere: t (29) = 10.735, p < 0.001; V1: F(1, 78) =87.211, p < 0.001).

      We thank the reviewer for catching this typo. We have corrected “lager effect” to “larger effect” in the revised manuscript. (page 9, line 214)

      Reviewer #2 (Recommendations for the authors):

      All of my other concerns were adequately addressed.

      We thank the reviewer for their positive evaluation, and we are glad that our revisions have addressed their concerns.

      Reviewer #3 (Recommendations for the authors):

      In my view, qualifying infants as "sighted" is confusing and unnecessary: why not simplify and homogenize the wording throughout the manuscript and figures?

      We thank the reviewer for this suggestion. We agree and have revised the manuscript to use consistent wording, avoiding the qualification of infants as “sighted.”

      l188, I don't understand the sentence "By contrast, in sighted adults, this cross-hemisphere difference is weak or absent."

      We thank the reviewer for noting that this sentence was unclear. We have revised the text to provide a more precise explanation. The text now reads:

      “By contrast, in sighted adults this lateralized pattern is weaker: visual areas in each hemisphere show only a modest preference for ipsilateral prefrontal cortices, and connectivity with the contralateral PFC remains comparatively strong.” (page 8, line 207)

      l193: "Secondary visual cortices showed a significant within > between difference in both groups, with a lager effect in the blind group": providing effect sizes for the 2 groups would strengthen this result (+ note the typo laRger).

      - Figure S7, S11: Please add titles of y-axes.

      Thank you for this helpful suggestion. We have corrected the typo and added the effect sizes for both groups in the revised text. The revised sentence now reads as follows:

      “Secondary visual cortices showed a significant within > between difference in both groups, with a larger effect in the blind group (post-hoc tests, Bonferroni-corrected paired: t-test: sighted adults within hemisphere > between hemisphere: t (49) = 7.441, p = 0.012, Cohen’s d = 0.817; blind adults within hemisphere > between hemisphere: t (29) = 10.735, p < 0.001, Cohen’s d = 1.96).” (page 9, line 214)

      Titles of the y-axes have also been added to Figures S7 and S11.
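
      For readers unfamiliar with effect sizes for paired comparisons, the sketch below shows one common variant, Cohen's d for paired samples (often written d_z): the mean of the paired differences divided by their standard deviation. The response does not state which variant of Cohen's d was used, so this is an illustration under that assumption rather than a reproduction of the reported values; the function name and the connectivity values are hypothetical.

```python
import statistics


def cohens_d_paired(x, y):
    """Cohen's d_z for paired samples: mean(x - y) / SD(x - y).

    This is one of several conventions for paired designs; others
    standardize by the average or pooled SD of the two conditions.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    # statistics.stdev uses the sample (n - 1) denominator.
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical per-participant connectivity values (not data from the study).
within_hemisphere = [0.42, 0.38, 0.51, 0.47, 0.40]
between_hemisphere = [0.30, 0.31, 0.35, 0.36, 0.33]

d_z = cohens_d_paired(within_hemisphere, between_hemisphere)
print(f"Cohen's d_z = {d_z:.2f}")
```

      Because different conventions standardize by different quantities, effect sizes are only comparable across groups when the same variant is used for each.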

    1. eLife Assessment

      This important work describes wing mechanosensory neurons in detail, extending our understanding of sensorimotor processing in the fruit fly. The evidence presented convincingly supports the authors' identification of these neurons and leverages state-of-the-art methods to generate a near-complete map of wing mechanosensory circuitry. Overall, this study provides new hypotheses and invaluable tools for investigating proprioceptive motor control of the wing in Drosophila.

    2. Reviewer #1 (Public review):

      Summary:

      Lesser et al. provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This "tour-de-force" provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Fig. 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I'm not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      Overall, I consider this an exceptional analysis which will be extremely valuable to the community.

    3. Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing both connectivity information with linked genetic drive lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. Comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Unfortunately, this dataset has damage to the right side, making such comparisons unreliable.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end organ origin in the fly's wing of all sensory neurons in the Anterior Dorsal Mesothoracic nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome and identify their origin with review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy neuron morphology, connectomics and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state of the art methods allow to create a near complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome the authors create a lot of hypotheses on neuronal function partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly's wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections and find no indication of sexual dimorphism at the sensory neuron level. Further, together with their companion paper Dhawan et al., 2025 describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This “tour-de-force” provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Figure 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      We agree that systematic comparisons will provide valuable insights as more connectome datasets become available. However, the primary goal of this study was to link central axon morphology with peripheral structures in the wing. We deliberately omitted more detailed and quantitative analyses of the downstream VNC circuitry, apart from providing a global view of the connectivity matrix and using it to cluster the sensory axon types. A more detailed and systematic comparison of wing sensorimotor circuit connectivity across different connectome datasets (FANC, MANC, BANC, IMAC) is the subject of ongoing work in our lab, which we feel is beyond the scope of this study. Here, we chose to match the wing proprioceptors to axons in MANC to demonstrate their stereotypy across individuals and to make them more accessible to other researchers. We found no obvious sexual dimorphism at the level of wing sensory neurons. We now note this in the Discussion.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I’m not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      We have moved speculation about mitochondria and gap junctions to the Discussion.

      (3) I’m intrigued by how the tegula CO is negative for iav. I wonder if authors tried other CO labeling genes like nompc. And what does this mean for the nature of this CO. Some more discussion on this anomaly would be helpful.

      Based on this suggestion, we have added an image showing that tegula CO neurons are labeled by nompC-Gal4.

      (4) The authors conclude there are no proprioceptive neurons in sclerite pterale C based on Chat-Gal4 expression analysis. It would be much more rigorous if authors also tried a pan-neuronal driver like nsyb/elav or other neurotransmitter drivers (Vglut, GAD, etc) to really rule this out. (I hope I didn’t miss this somewhere.)

      To address this, we imaged OK371-GFP, which labels glutamatergic neurons, in the wing and wing hinge. We saw expression in the wing, as others have reported (Neukomm et al., 2014), but we saw no expression at the wing hinge. Apart from a handful of glutamatergic gustatory neurons in the leg, we are not aware of any other sensory neurons in the fly that are not labeled by Chat-Gal4.

      Overall, I consider this an exceptional analysis that will be extremely valuable to the community.

      We sincerely appreciate the reviewer’s positive feedback.

      Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing both connectivity information with linked genetic drive lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      (1) With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. The comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Here, the authors have not compared the left and right side sensory axons from the wing nerve, leaving potential for developmental variability across samples and left/right hemisegments.

      The right ADMN nerve in the FANC dataset is partially severed, making left/right comparisons unreliable (see Azevedo 2024, Extended Data Figure 4). We have updated the text to explain this within the Methods section of the paper.

      (2) Not all links between the EM reconstructions and driver lines are convincing. To strengthen these, for all EM-LM matches in Figures 3-7, rotated views of the driver line (matching the rotated EM views) should be shown to provide a clearer comparison of the data. In particular, Figure 3G and Figure 7B are not very convincing based on the images shown. MCFO imaging of the driver lines in Figure 3G and 7B would make this position stronger if a clone that matches the EM reconstruction could be identified.

      Many of the z-stack images in the paper are from the Janelia FlyLight collection, and unfortunately their imaging parameters were not optimized for orthogonal views. Rotated views are blurry and not especially helpful for comparison to EM reconstruction. We now point out in the text that interested readers can access the z-stacks from FlyLight to see the dorsal-ventral projections.

      Regarding Figure 3G and 7B, we have added markers to the image with corresponding descriptions in the legend to guide the reader through the image of the busy driver line. Although these lines label many cells in the VNC as a whole, they sparsely label cells in the ADMN, making them nonetheless useful for identifying peripheral sensory neurons.

      (3) Figure 7B looks like the driver line might have stochastic expression in the sensory neuron, which further reduces confidence in the result shown in Figure 7C. Is this expression pattern in the wing consistently seen? Many split-GAL4s have stochastic expressions. The evidence would be strengthened if the authors presented multiple examples (~4-5) of each driver line’s expression pattern in the supplement.

      Figure 7B shows sparse labeling of the driver line using the MCFO technique, as specified in the legend. Its unilateral expression is therefore not due to stochastic expression of the Gal4 line. We have added the “MCFO” label to the image to clarify.

      (4) Certain claims in this work lack quantitative evidence. On line 128, for instance, “Overall, our comprehensive reconstruction revealed many morphological subgroups with overlapping postsynaptic partners, suggesting a high degree of integration within wing sensorimotor circuits.” If a claim of subgroups having shared postsynaptic partners is being made, there should have been quantitative evidence. For example, cosine similarity amongst members of each group compared to the cosine similarity of shuffled/randomised sets of axons from different groups. The heat map of cosine similarity in Figure 2B alone is not sufficient.

      We agree that illustrating the extent of shared postsynaptic partners across subgroups strengthens this point. We added a visualization showing pairwise similarity scores for within- and between-cluster neuron pairs (Figure 2B inset). We also performed a permutation test to determine that within-cluster similarity is significantly higher than between clusters, and we report the test in the results as well as the figure legend. This analysis provides a more quantitative summary of the qualitative trends in connectivity that are summarized in Figure 2B.
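
      The within- versus between-cluster comparison described above can be illustrated with a label-shuffling permutation test. The following is only a minimal sketch, not the authors' analysis script: the toy connectivity vectors, the function names, the test statistic (mean within-cluster minus mean between-cluster cosine similarity), and the number of permutations are all assumptions made for illustration.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def within_minus_between(sims, labels):
    """Mean similarity of same-label pairs minus mean of different-label pairs."""
    within, between = [], []
    for (i, j), s in sims.items():
        (within if labels[i] == labels[j] else between).append(s)
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_test(vectors, labels, n_permutations=2000, seed=0):
    """One-sided test: is within-cluster similarity higher than expected by chance?"""
    n = len(vectors)
    # Pairwise similarities are computed once; only the labels are shuffled.
    sims = {(i, j): cosine_similarity(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}
    observed = within_minus_between(sims, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if within_minus_between(sims, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical synapse counts: rows are sensory axons, columns are
# postsynaptic partners (not data from the study).
toy_vectors = [
    [9, 8, 0, 1], [8, 9, 1, 0], [9, 7, 0, 0], [7, 9, 0, 1],  # cluster A
    [0, 1, 9, 8], [1, 0, 8, 9], [0, 0, 7, 9], [1, 0, 9, 7],  # cluster B
]
toy_labels = ["A"] * 4 + ["B"] * 4

observed, p_value = permutation_test(toy_vectors, toy_labels)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

      Shuffling labels keeps cluster sizes fixed, so the null distribution reflects only the assignment of axons to clusters, not the similarity values themselves.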

      (5) Similarly, claims about putative electrical connections to b1 motor neurons are very speculative. The authors state that “their terminals contain very densely packed mitochondria compared to other cells”, without providing a quantitative comparison to other sensory axons. There is also no quantitative comparison to the one example of another putative electrical connection from the literature. Further, it should be noted that this connection from Trimarchi and Murphey, 1997, is also stated as putative on line 167, which further weakens this evidence. Quantification would strongly strengthen this position. Identification of an example of high mitochondrial density at a confirmed electrical connection would be even better. In the related discussion section “A potential metabolic specialization for flight circuitry”, it should be more clearly noted that the dense mitochondria could be unrelated to a putative electrical connection. If the authors have an alternative hypothesis about the mitochondria density, this should be stated as well.

      We agree with the reviewer that the link between mitochondrial density and metabolic specialization is purely speculative in this context. Based on reviewer feedback, we have moved all mention of the relationship between mitochondrial density and gap junction coupling to the Discussion. We acknowledge that this may seem like a somewhat random and not quantitatively supported observation. However, we found the coincidence striking and worthy of mention, though it is only tangentially relevant to the rest of the paper. From conversations with colleagues, we have also heard that this relationship is consistent with as yet unpublished work in other model organisms (e.g., zebrafish, mouse).

      The electrical coupling to b1 motor neurons is well-established (Fayyazuddin and Dickinson, 1999), and we have updated the text to state this more clearly. However, we agree that whether the specific neurons we have identified based on their anatomy are the same ones functionally identified through whole-nerve recordings remains unknown.

      (6) It would be appropriate to cite previous work using a similar strategy to match sensory axons to their cell bodies/dendrites at the periphery using driver lines and connectomics (see Figure 5 for example in the following paper: https://doi.org/10.7554/eLife.40247 ).

      At this point, there are now dozens of papers that match the axons of sensory neurons to their cell bodies/dendrites in the periphery by comparing light microscopy and connectomics. When we dug in, we found examples in C. elegans, Ciona intestinalis, zebrafish, and mouse, all published prior to the study cited above. For basically every animal for which scientists have acquired EM volumes of neural tissue, they have used other anatomical labeling methods to determine cell types inside and outside the imaged volume. In summary, we found it difficult to establish a single primary citation for this approach. In lieu of this, we have added a citation to an earlier review by a pioneer in EM connectomics that discusses the general approach of matching cells across different labeling/imaging modalities (Meinertzhagen et al., 2009).

      The methods section is very sparse. For the sake of replicability, all sections should be expanded upon.

      We have expanded the Methods section and added a STAR Methods table.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end-organ origin in the fly’s wing of all sensory neurons in the anterior dorsal mesothoracic nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome, and identify their origin with a review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near-complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy, neuron morphology, connectomics, and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow to create a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome, the authors create a lot of hypotheses on neuronal function, partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly’s wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections. Further, together with their companion paper, Dhawan et al. 2025, describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

      Weaknesses:

      The connectomic data are only available upon request; the inclusion of a connectivity table of the reconstructed neurons would aid analysis reproducibility and cross-dataset comparisons.

      We have added a connectivity table as well as analysis scripts to the GitHub repository for the paper (https://github.com/EllenLesser/Lesser_eLife_2025).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The methods section should be expanded in every aspect. Most pressing sections are:

      (1) Data and Code availability: All code should be included as a Zenodo database, the suggestion to ask authors for code upon request is inappropriate.

      We have added all code to a public GitHub repository, which is now linked in the Methods section.

      (2) Samples: Standard cornmeal and molasses medium should have a reference, as many institutes use different recipes.

      The recipe used by the University of Washington fly kitchen is based on the Bloomington standard Cornmeal, Molasses and Yeast Medium recipe, which can be found at https://bdsc.indiana.edu/information/recipes/molassesfood.html. The UW recipe is slightly modified for different antifungal ingredients and includes tegosept, propionic acid, and phosphoric acid.

      (3) Table 3: Driver lines labelling wing sensory neurons: The genetic driver lines should have associated Bloomington stock centre numbers. Additionally, relevant information for effector lines used should be included in the methods.

      We now include the Bloomington stock numbers and more information on effector lines in the STAR methods table.

      Minor corrections:

      (1) Lines 119-120: “Notably, many of the axons do not form crisp cluster boundaries, suggesting that multimodal sensory information is integrated at early stages of sensory processing.” We do not follow the logic of this statement and suspect it is a bit too speculative.

      We removed this sentence from the manuscript.

      (2) Figure 1: The ADMN is missing in the schematics and would be helpful to depict for non-experts. Is this what is highlighted in Figure 1D?

      Yes, and we now label 1D as the ADMN wing nerve.

      (3) Figure 1B: Which driver lines are being depicted here? Looking at Table 3 does not clarify. It should be specified at least in the figure legend.

      As stated in the legend, we include a table of all of the driver lines we screened and which sensory structures they label.

      (4) Figure 1C: There are some minor placement issues with the text in the schematic. There is an arrow very close to the “CO” on the top right, which makes the “O” look like the symbol for male. “ax ii” is a bit too close to the wing hinge

      We updated the figure to address this issue.

      (5) Figure 1D: The outlined grey masks are not clear. The use of colour would be very useful for the reader to help understand what the authors are referring to here

      We now use color for the masks.

      (6) Figure 2A: It is unclear if the descending neuron and non-motor efferent neuron are not shown because they are under the described threshold, or to simplify the plot. They should be included in the plot if over the threshold.

      We have updated the legend to specify that the descending and non-motor efferent neurons are excluded to visually simplify the plot. We include the percentage of sensory output to each of these neurons in the legend, and they are included in the connectivity matrix data in the public GitHub repository associated with the paper, which is linked in the Methods.

      (7) Figure 2B: What clustering is used specifically? The method says it’s from Scikit-learn, but there are many types of clustering available in this package.

      We now include the specific clustering type used in the Methods section, which is agglomerative clustering.
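
      Agglomerative clustering builds clusters bottom-up by repeatedly merging the two closest groups. The sketch below is a small pure-Python illustration of that idea; it is not scikit-learn's implementation and not the authors' code. The cosine distance, average linkage, toy vectors, and target number of clusters are assumptions, since the response does not state them.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance.

    Starts with one cluster per vector and merges the closest pair of
    clusters until only n_clusters remain. Returns sorted lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [cosine_distance(vectors[i], vectors[j])
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # b > a, so deleting b does not shift the position of a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical synapse counts: rows are sensory axons, columns are
# postsynaptic partners (not data from the study).
toy_vectors = [
    [9, 8, 0, 1], [8, 9, 1, 0], [9, 7, 0, 0],
    [0, 1, 9, 8], [1, 0, 8, 9], [0, 0, 7, 9],
]

clusters = agglomerative_clusters(toy_vectors, n_clusters=2)
print(clusters)
```

      This brute-force version recomputes every inter-cluster distance at each merge, which is fine for a handful of axons; a library implementation is preferable for real connectivity matrices.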

      (8) Figure 3A: What does the green box behind the plot represent?

      The green box represents the tegula CO axons, which we now specify in the legend.

      (9) Figure 3C: the “C” is clipped at the top.

      We updated the figure to address this issue.

      (10) Figure 4A: the main text says a “group of four axons” (line 203) while the figure says 5 axons.

      We updated the text to address this issue.

      (11) Line 360: “We found that the campaniform sensilla on the tegula provide the most direct feedback onto wing steering motor neurons”. We struggled to find where this was directly shown, because several sensory axon types directly synapse onto motor neurons.

      We now specify in the text that this finding is shown in Figure 3.

      Reviewer #3 (Recommendations for the authors):

      I would like to congratulate the authors on their beautiful, easy-to-read, and easy-to-comprehend manuscript, with clear figures and nice visualizations. This work provides a valuable resource that will contribute to the interpretability of connectomic data and further to connectome-based modeling of fly behavior.

      We sincerely appreciate the reviewer’s positive feedback.

    1. eLife Assessment

      This important work examines the effects of side-wall confinement on chemotaxis of swimming bacteria in a shallow microfluidic channel. The authors present convincing experimental evidence, combined with geometric analysis and numerical simulations of simplified models, showing that chemotaxis is enhanced when the distance between the side walls is comparable to the intrinsic radius of chiral circular swimming near open surfaces. This study should be of interest to scientists specializing in bacteria-surface interactions.

    2. Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

    3. Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This article deals with the chemotactic behavior of E. coli bacteria in thin channels (a situation close to 2D). It combines experiments and simulations.

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, close to the average radius of the circle trajectories of the unconfined bacteria in 2D. It is known that these circles are chiral and impose that the bacteria swim preferentially along the right-side wall when there is no chemotactic gradient. In the presence of a chemotactic gradient, this larger proportion of bacteria swimming on the right wall yields chemotaxis. This effect is backed by numerical simulations and a geometrical analysis.

      While the conclusions drawn from the experiments presented in this article seem clear and interesting, I find that the key elements of the mechanism of this wall-directed chemotaxis are not sufficiently emphasized. Moreover, the paper would be clearer with more details on the hypotheses and the essential ingredients of the analyses.

      We thank the reviewer for these constructive suggestions. We agree that emphasizing the underlying mechanism is crucial for the clarity of our findings. In the revised manuscript, we have now explicitly highlighted the critical roles of chiral circular motion and the alignment effect following side-wall collisions in both the Abstract (lines 25-27) and the Discussion (lines 391-393). Furthermore, we have added a new analysis of bacterial trajectories post-collision (Fig. S2), which demonstrates that cells predominantly align with and swim along the sidewalls. We have also clarified the assumptions in our numerical simulations, specifically how the radius of circular trajectories and the alignment effect are incorporated into the equations of motion. Please refer to our detailed responses in the "Recommendations for the authors" section for further specifics.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigated the chemotaxis of E. coli swimming close to the bottom surface in gradients of attractant in channels of increasingly smaller width but fixed height = 30 µm and length ~160 µm. In relatively large channels, they find that on average the cells drift in response to the gradient, despite cells close to the surface away from the walls being known to not be chemotactic because they swim in circles.

      They find that this average drift is due to the cell localization close to the side walls, where they slide along the wall. Whereas the bacteria away from the walls have no chemotaxis (as shown before), the ones on the left-side wall (LSW) go down-gradient on average, but the ones on the right-side wall (RSW) go up-gradient faster, hence the average drift. They then study the effect of reducing channel width. They find that chemotaxis is higher in channels with a width of about 8 µm, which approximately corresponds to the radius of the circular swimming R. This higher chemotactic drift is concomitant with an increased density of cells on the RSW. They do simulations and modeling to suggest that the disruption of circular swimming upon collision with the wall increases the density of cells on the RSW, with a maximal effect at w ≈ 2/3 R, which is a good match for their experiments.

      Strengths:

      The overall result that confinement at the edge stabilises bacterial motion and allows chemotaxis is very interesting although not entirely unexpected. It is also important for understanding bacterial motility and chemotaxis under ecologically relevant conditions, where bacteria frequently swim under confinement (although its relevance for controlling infections could be questioned). The experimental part of the study is nicely supported by the model.

      Weaknesses:

      Several points of this study, in particular the interpretation of the width effect, need better clarification:

      (1) Context:

      There are a number of highly relevant previous publications that should have been acknowledged and discussed in relation to the current work:

      https://pubs.rsc.org/en/content/articlehtml/2023/sm/d3sm00286a

      https://link.springer.com/article/10.1140/epje/s10189-024-00450-7

      https://doi.org/10.1016/j.bpj.2022.04.008

      https://doi.org/10.1073/pnas.1816315116

      https://www.pnas.org/doi/full/10.1073/pnas.0907542106

      https://doi.org/10.1038/s41467-020-15711-0

      http://doi.org/10.1038/s41467-020-15711-0

      http://doi.org/10.1039/c5sm00939a

      We appreciate the reviewer bringing these important publications to our attention. We have now cited and discussed these works in the Introduction (lines 55-62 and 76-85) to better contextualize our study regarding bacterial motility and chemotaxis in confined geometries.

      (2) Experimental setup:

      a) The channels are built with asymmetric entrances (Figure 1), which could trigger a ratchet effect (because bacteria swim in circle) that could bias the rate at which cells enter into the channel, and which side they follow preferentially, especially for the narrow channel. Since the channel is short (160 µm), that would reflect on the statistics of cell distribution. Controls with straight entrances or with a reversed symmetry of the channel need to be performed to ensure that the reported results are not affected by this asymmetry.

      We appreciate the reviewer's insight regarding the potential ratchet effect caused by asymmetric entrances. To rule this out, we fabricated a control device with straight entrances and repeated the measurements. As shown in Figure S3, the chemotactic drift velocity follows the same trend as observed in the original setup, confirming an optimal width of ~9 µm. These results demonstrate that the entrance geometry does not bias the reported statistics. We have updated the manuscript text at lines 233-235.

      b) The authors say the motile bacteria accumulate mostly at the bottom surface. This is strange, for a small height of 30 µm, the bacteria should be more-or-less evenly spread between the top and bottom surface. How can this be explained?

      We apologize for not explaining this clearly in the text. As shown by Wei et al., Phys. Rev. Lett. 135, 188401 (2025), significant surface accumulation occurs in channels with heights exceeding 20 µm. In our specific experimental setup, we did not use Percoll to counteract gravity. Therefore, the bacteria accumulated mostly at the bottom surface under the combined influence of gravity and hydrodynamic attraction. This bottom-surface localization is supported by our observation that the bacterial trajectories were predominantly clockwise (characteristic of the bottom surface) rather than counter-clockwise (characteristic of the top surface). We have added this explanation to Line 141.

      c) At the edge, some of the bacteria could escape up in the third dimension (http://doi.org/10.1039/c5sm00939a). What is the magnitude of this phenomenon in the current setup? Does it have an effect?

      We thank the reviewer for raising this important point regarding 3D escape. We have quantified this phenomenon and found the escape rate from the edge into the third dimension to be 0.127 s<sup>-1</sup>. This corresponds to a mean residence time that allows a cell moving at 20 µm/s to travel approximately 157.5 µm along the edge. Since this distance is comparable to the full length of our lanes (~160 µm), most cells traverse the entire edge without escaping. Furthermore, our analysis is based on the average drift of the surface trajectories per unit of time; this metric is independent of the absolute number of cells present. Therefore, the escape phenomenon does not significantly impact our conclusions. We have added a statement clarifying this at line 154.
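
      The arithmetic behind this estimate can be checked with a short sketch. It assumes that escape is a memoryless (Poisson) process, so that the mean residence time is the inverse of the escape rate; the variable names are illustrative.

```python
# Back-of-envelope check of the edge-escape estimate quoted in the text.
escape_rate = 0.127   # s^-1, escape rate from the edge (value from the text)
speed = 20.0          # µm/s, swimming speed along the edge (value from the text)
lane_length = 160.0   # µm, approximate lane length (value from the text)

# Assumption: escape is a Poisson process, so mean residence time = 1 / rate.
mean_residence_time = 1.0 / escape_rate       # seconds
mean_travel = speed * mean_residence_time     # µm travelled before escaping

print(f"mean residence time: {mean_residence_time:.2f} s")
print(f"mean distance along the edge: {mean_travel:.1f} µm")
print(f"ratio to lane length: {mean_travel / lane_length:.2f}")
```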

      d) What is the cell density in the device? Should we expect cell-cell interactions to play a role here? If not, I would suggest to de-emphasize the connection to chemotaxis in the swarming paper in the introduction and discussion, which doesn't feel very relevant here, and rather focus on the other papers mentioned in point 1.

      The cell density in our experiments was approximately 1.3×10<sup>-3</sup> μm<sup>-2</sup>. Given this low density, we do not expect cell-cell interactions to play a role in the observed behaviors.

      Regarding the connection to swarming chemotaxis: We agree that our low-density setup differs from a high-density swarm; however, we believe the comparison remains relevant for two reasons. First, it provides a necessary contrast to studies showing surface inhibition of chemotaxis. Second, while we eliminate cell-cell interactions, we isolate the geometric aspect of swarming. In a swarm, cells move within narrow lanes created by their neighbors. Our device mimics this specific physical confinement by replacing neighboring cells with PDMS sidewalls. This allows us to decouple the effects of physical confinement from cell-cell interactions. We have added text (line 370) to clarify this rationale and have incorporated the additional references in the Introduction, as suggested in point 1.

      e) We are not entirely convinced by the interpretation of the results in narrow channels. What is the causal relationship between the increased density on the RSW and the higher chemotactic drift? The authors seem to attribute higher drift to this increased RSW density, which emerges due to the geometric reasons. But if there is no initial bias, the same geometric argument would induce the same increased density of down-gradient swimmers on the LSW, and so, no imbalance between RSW and LSW density. Could it be the opposite that the increased RSW density results from chemotaxis (and maybe reinforces it), not the other way around? Confinement could then deplete one wall due to the proximity of the other, and/or modify the swimming pattern - 8 µm is very close to the size of the body + flagellum. To clarify this point, we suggest measuring the bacterial distributions in the absence of a gradient for all channel widths as a control.

      We thank the reviewer for this insightful comment regarding the causal relationship between cell density and chemotactic drift. We apologize if the initial explanation was unclear.

      Regarding the no-gradient control: Without an attractant gradient (and no initial bias), there is no breaking of symmetry and the labels "LSW" and "RSW" are arbitrary. Therefore, there will be no asymmetry between the bacterial distributions on the two sides (within experimental fluctuations) in the absence of a gradient for any channel width.

      Regarding the causality and density imbalance: We agree that the increased RSW density is a result of chemotaxis, which is then reinforced by the lane geometry especially at narrow lane width. The mechanism relies on the coupling of chemotactic bias with surface circularity. The angle ranges that lead to RSW-UG accumulation (Fig. 6A-C) coincide with the up-gradient direction. Because these cells experience suppressed tumbling (longer runs), they can maintain the steady circular trajectories required to reach and align with the RSW. Conversely, while pure geometric analysis suggests a similar potential for LSW-DG accumulation, these trajectories coincide with the down-gradient direction. These cells experience enhanced tumbling, which distorts the circular trajectories. This prevents them from effectively reaching the LSW and also increases the probability of them leaving the wall. Therefore, the causality is indeed a positive feedback loop: the attractant gradient creates an initial bias that allows the RSW-UG fraction to form stable trajectories; the optimal lane width (matching the swimming radius) then maximizes this capture efficiency, further enriching the RSW fraction and enhancing the overall drift.

      We have added clarifications regarding these points in the revised manuscript (the last paragraph of “Results”).

      (3) Simulations:

      The simulations treat the wall interaction very crudely. We would suggest treating it as a mechanical object that exerts elastic or "hard sphere" forces and torques on the bacteria for more realistic modeling.

      We appreciate the reviewer's suggestion to incorporate more detailed mechanical interactions, such as elastic or hard-sphere forces, for the wall collisions. While we agree that a full hydrodynamic or mechanical model would offer higher fidelity, our experimental observations suggest that a simplified kinematic approach is sufficient for the specific phenomena studied here.

      As shown in the new Fig. S2, our analysis of cell trajectories in the 44-µm-wide channels reveals that cells colliding with the sidewalls tend to align with the surface almost instantaneously. The timescale required for this alignment is negligible compared to the typical wall residence time (see also Ref. 6). Consequently, to maintain computational efficiency without sacrificing the essential physics of the accumulation effect, we employed a coarse-grained phenomenological model where a bacterium immediately aligns parallel to the wall upon contact, similar to approaches used previously (Ref. 43). We have added relevant text to the manuscript on lines 168-171.

      Notably, the simulations have a constant (chemotaxis independent) rate of wall escape by tumbling. We would expect that reduced tumbling due to up-gradient motility induces a longer dwell time at the wall.

      We apologize for the confusion. The chemotaxis effect is indeed fully integrated into our simulation. Specifically, the simulated cells sense the chemical gradient and adjust their motor CW bias (B) accordingly. This adjustment directly modulates the tumble rate (k), calculated as k = B/0.31 s<sup>-1</sup>. Consequently, the wall escape rate is not constant but varies with the chemotactic response. We also imposed a maximum detention time limit which, when combined with the variable tumble rate, results in an average wall residence time of approximately 2 s, consistent with our experimental observations (Fig. S6B). We have clarified these details in the final section of 'Materials and Methods'.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming confined in channels of different widths. The chemotactic drift velocity clearly reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      While the data analysis is reasonably convincing, I think that the authors could make much better use of what must be voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, I would like to see much more analysis of how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis. In essence, there needs to be a much clearer control analysis of trajectories without sidewalls to understand the mechanism in their presence.

      We thank the reviewer for this insightful suggestion. We agree that understanding how circular trajectories are interrupted by wall collisions is central to explaining the enhanced chemotaxis. While we did not explicitly formulate a Fokker-Planck equation, we have addressed the reviewer's core point by employing two complementary mathematical approaches that model the probability distribution of swimming directions and wall interactions:

      (1) Stochastic simulations (Langevin approach): As detailed in the "Simulation of E. coli chemotaxis within lane confinements" subsection of “Results” and Figure 5, we modeled cells as self-propelled particles performing random walks. This model explicitly accounts for the "interruption" of circular trajectories by incorporating a constant angular velocity (circular swimming) and an alignment effect upon collision with sidewalls. These simulations successfully reproduced the experimental trends, confirming that the interplay between circular radius and lane width determines the optimal drift velocity.

      (2) Geometric probability analysis: To provide the "intuitive understanding", we included a specific Geometrical Analysis section (the last subsection of “Results”) and Figure 6. This analysis mathematically formulates the problem by calculating the exact proportion of swimming angles that allow a cell to transition from a circular trajectory in the bulk to an up-gradient trajectory along the Right Sidewall (RSW). By integrating over the possible swimming directions, we derived the probability of wall interception as a function of lane width (w) and swimming radius (r). This analysis reveals that the interruption of circular paths is most favorable for chemotaxis when w ≈ (0.7-0.8)×r.

      (3) Control analysis: regarding the "control analysis of trajectories without sidewalls," we utilized the cells in the Middle Area (MA) of the wide lanes as an internal control. As shown in Fig. 2B and 4A, these cells exhibit typical surface-associated circular swimming (Fig. 3B) but generate zero net drift. This serves as the baseline "no sidewall" condition, demonstrating that the chemotactic enhancement is strictly driven by the rectification of circular swimming into wall-aligned motion at the boundaries.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. Yet, each of these would be characterized by significant heterogeneity in pore sizes and geometries, and thus it is very unclear whether or how the findings in this work would carry over to those situations.

      We thank the reviewer for this important observation regarding environmental heterogeneity. We agree that we should be cautious about directly extrapolating to complex ecological contexts without qualification. We have revised the last sentence of the abstract to adopt a more measured tone: "Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Key elements of the mechanism of wall-directed chemotaxis are not sufficiently emphasized:

      For instance, the chirality of the trajectories is an essential part of the analysis but is mentioned only briefly in the introduction. In the geometrical analysis, I understand that one of the critical parameters is the angle at which bacteria "collide" with the walls. But, again, this remains largely implicit in the discussion. This comes to the point that these ideas are not even mentioned in the abstract which doesn't provide any hint of a mechanism. An analysis of the actual trajectories of the cells after they hit the walls, as a function of their initial angle would be helpful in comparison with the simulations and the geometrical analysis.

      We appreciate the reviewer's insightful comment regarding the need to better emphasize the mechanism of wall-directed chemotaxis. We agree that the chirality of trajectories and the geometry of wall collisions are central to our analysis and were previously under-emphasized.

      To address this, we have made the following revisions:

      (1) We have revised the Abstract (lines 25-27) and the Discussion (lines 391-393) to explicitly highlight the crucial role of chiral circular motion and the alignment effect following sidewall collisions.

      (2) We further analyzed bacterial trajectories at different collision angles. Typical examples are shown in Supplementary Fig. S2. We observed that cells tend to align with and swim along the sidewalls regardless of their initial collision angles. This finding is now described in the main text at lines 168-171.

      The motion of the bacteria is modelled as run-and-tumble at several places in the manuscript, and in particular in the simulations. Yet, the trajectories of the bacteria seem to be smooth in this almost 2D geometry, except of course when they directly interact with the walls (I hardly see tumbles in the MA region in Figure 1B). Can the authors elaborate on the assumptions made in the numerical simulations? In particular, how is the radius of the trajectories included in these equations of motion (line 514)?

      We apologize for the lack of clarity regarding the bacterial motion model. It has been established that while bacteria do tumble near solid surfaces, they exhibit a smaller reorientation angle compared to bulk fluids; in fact, the most probable reorientation angle on a surface is zero (Ref. 41). Consequently, tumbles are often difficult to distinguish from runs with the naked eye. Additionally, the trajectories in Figure 1B are plotted on a 44 µm × 150 µm canvas with unequal coordinate scales, which may further obscure the visual distinctness of tumbling events.

      Regarding the equations of motion: We modeled the bacteria as self-propelled particles governed by the internal chemotaxis pathway, alternating between run and tumble states. As noted in the equations on lines 286 & 578, we incorporated the circular motion by introducing a constant angular velocity, −ν<sub>0</sub>/r, during the run state. Here, ν<sub>0</sub> represents the swimming speed, r denotes the radius of circular swimming, and the negative sign indicates clockwise chirality. Furthermore, to model the hydrodynamic interaction with the boundaries, we assumed that when a cell collides with a sidewall, its velocity vector instantly aligns parallel to that wall.
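
      A minimal sketch of one run-state update under the assumptions just described is given below. The time-stepping scheme, the function name `run_step`, the wall positions at y = 0 and y = width, and the choice to keep the sign of the along-lane velocity on alignment are all illustrative assumptions rather than details taken from the manuscript; tumbling and chemotactic signalling are omitted.

```python
import math

def run_step(x, y, theta, v0, r, dt, width):
    """Advance one run-state step and return the new (x, y, theta).

    Hypothetical helper: the lane runs along x, with sidewalls at
    y = 0 and y = width.
    """
    # Clockwise circular swimming: constant angular velocity -v0 / r.
    theta = theta - (v0 / r) * dt
    # Position update using the freshly updated heading.
    x = x + v0 * math.cos(theta) * dt
    y = y + v0 * math.sin(theta) * dt
    # Wall contact: clamp the position onto the wall and align the heading
    # parallel to it, keeping the sign of the along-lane velocity.
    if y <= 0.0 or y >= width:
        y = min(max(y, 0.0), width)
        theta = 0.0 if math.cos(theta) >= 0.0 else math.pi
    return x, y, theta

# Usage: a cell on the y = 0 wall heading in +x (wall on its right-hand side).
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = run_step(*state, v0=20.0, r=10.0, dt=0.01, width=8.0)
print(state)
```

      With clockwise turning, a cell aligned with the wall on its right-hand side is steered back into that wall at every step and keeps sliding along it, whereas a cell aligned with the wall on its left turns away from it. This is the rectification that the lane geometry exploits.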

      The comparison of Figure 5B (simulations) with Figure 4B (experiments) does not strike me as so "similar". Why are the points at small widths so noisy (Figure 5AB)? Figure 5C is cut at these widths, it should be plotted over the entire scale.

      We acknowledge that the agreement between simulation and experiment is less robust in the narrowest channels. The discrepancy and "noise" at small widths in Figure 5 arise from the limitations of the self-propelled particle model in highly confined geometries. Specifically, our simulation treats bacteria as point particles and does not explicitly calculate the physical exclusion (steric effects) caused by the finite size of the flagella and cell body.

      In the experimental setup, steric constraints within narrow channels (comparable to the cell size) restrict the cells' ability to turn freely, effectively stabilizing their motion. However, because our model allows particles to reorient more freely than actual cells would in such confined spaces, it produces fluctuations and an overestimation of the drift velocity at small widths. If these confinement effects were fully incorporated, the cell density mismatch between the left and right sidewalls would be reduced, leading to lower drift velocities that match the experimental data more closely.

      Regarding Figure 5C: Since the "active particle" assumption loses physical validity in channels narrower than the scale of the bacterium, the simulation results in this regime are not representative of biological reality. Plotting these non-physical points would distort the analysis. Therefore, we have maintained the truncation of Figure 5C at 4 µm to ensure the data presented are physically meaningful. We have added a clear discussion of these model limitations to the manuscript at lines 310-314.

      These important precisions should be added to the text or in a supplementary section. A validated mechanism describing in detail the impact of the walls on the cell trajectories would greatly improve the conclusions.

      We thank the reviewer for the suggestions. As noted in the responses above, we have incorporated the details concerning the simulation assumptions and the model limitations at narrow widths into the revised manuscript. We have also performed further analysis of the collision trajectories between bacteria and the sidewalls. As illustrated in the new Fig. S2, the data confirm that cells tend to align with and swim along the sidewalls following a collision, regardless of the initial impact angle.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Related to swimming in 3D: The authors should specify the depth of field of the objective in their setup.

      We thank the reviewer for pointing this out. We have calculated the depth of field (DOF) of our objective to be approximately 3.7 µm. This estimate is based on the standard formula:

      DOF = λ·n/NA<sup>2</sup> + n·e/(M·NA)

      where λ = 610 nm (emission wavelength), n = 1.0 (refractive index), NA = 0.45 (numerical aperture), M = 20 (magnification), and e = 6.5 µm (camera resolution). We have added this specification to the "Microscopy and Data Acquisition" section of “Materials and Methods”.
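
      As a quick numerical check of the 3.7 µm figure, the sketch below assumes the widely used two-term expression DOF = λn/NA² + ne/(M·NA), that is, a diffraction term plus a detector term. This specific form is an assumption here, chosen because it uses exactly the symbols listed above; the variable names are illustrative.

```python
# Parameter values quoted in the text.
wavelength_um = 0.610   # emission wavelength λ (610 nm), in µm
n = 1.0                 # refractive index
NA = 0.45               # numerical aperture
M = 20.0                # magnification
e_um = 6.5              # camera resolution e, in µm

# Assumed two-term depth-of-field expression (diffraction + detector terms).
diffraction_term = wavelength_um * n / NA**2
detector_term = n * e_um / (M * NA)
dof_um = diffraction_term + detector_term

print(f"diffraction term: {diffraction_term:.2f} µm")
print(f"detector term:    {detector_term:.2f} µm")
print(f"total DOF:        {dof_um:.2f} µm")
```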

      (2) Related to the interpretation of the width effect: We think plotting the cell enrichment, ie the probabilities P in Figure 4B normalized to the expected value if cells were homogeneously distributed ((3µm)/w for the side walls, (w - 6µm)/w for the middle) would help understand the strength of the wall 'siphoning' effect.

      We thank the reviewer for the suggestion. We have calculated the cell enrichment by normalizing the observed probabilities against the expected values for a homogeneous distribution, as suggested. The resulting relationship between cell enrichment and lane width is presented in Figure S4.
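
      The normalization suggested by the reviewer can be sketched as follows, assuming 3-µm-wide sidewall zones as in the reviewer's expressions. The function names and the observed probabilities in the usage example are hypothetical placeholders, not measured values.

```python
def expected_fractions(width_um, wall_zone_um=3.0):
    """Region fractions expected if cells were spread homogeneously
    across a lane of the given width (hypothetical helper)."""
    side = wall_zone_um / width_um
    middle = (width_um - 2.0 * wall_zone_um) / width_um
    return {"LSW": side, "MA": middle, "RSW": side}

def enrichment(observed, width_um, wall_zone_um=3.0):
    """Observed probability divided by the homogeneous expectation.

    Regions with zero expected fraction (e.g. the middle area when the
    lane is exactly two wall zones wide) are skipped.
    """
    expected = expected_fractions(width_um, wall_zone_um)
    return {region: observed[region] / expected[region]
            for region in observed if expected[region] > 0.0}

# Usage with placeholder probabilities for a 12-µm-wide lane.
observed = {"LSW": 0.2, "MA": 0.3, "RSW": 0.5}   # hypothetical values
print(enrichment(observed, width_um=12.0))
```

      An enrichment above 1 indicates that a region holds more cells than a uniform distribution would predict, which makes lanes of different widths directly comparable.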

      Related to simulations:

      (1) Showing vd for the 3 regions in Figure S5 would be helpful also to understand the underlying mechanism.

      We thank the reviewer for the suggestion. The V<sub>d</sub> values for the three regions are shown in Fig. S5.

      (2) Figure 5B vs 4B: There is a mismatch in the right vs left side density at w=6µm in the simulations that is not here in the experiments. What could explain this difference?

      We appreciate the reviewer pointing this out. The mismatch in the simulations is due to the simplified treatment of cells as self-propelled particles, which overlooks the physical volume of the cell body and flagella. In narrow channels (w = 6 µm), these physical constraints would restrict the cells' ability to change direction freely - a factor not fully captured in the simulation. Accounting for these steric effects would trap cells more effectively against the walls, reducing the density asymmetry between the LSW and RSW and lowering the drift velocity. This would bring the simulation results closer to the experimental observations. We have added a discussion of these limitations and effects to the revised manuscript (lines 310-314).

      (3) The simulations essentially assume that the density of motile cells is homogeneous and equal at both x=0 and x=L open ends of the channel. Is it the case in the experiments, even with the gradient, and the walls creating some cell transport?

      We thank the reviewer for pointing this out. The simulation assumption is consistent with our experimental observations. Our data were recorded within 160-μm-long lanes located in the center of the wider (400 μm) cell channel. In this central region, the cells maintain a continuous flux. Furthermore, experiments were performed within 8 min of flow, limiting the time for significant cell density gradients to establish. As illustrated in Author response image 1, the inhomogeneity in the measured cell density distribution is insignificant across the length of the observation window, indicating that the walls and gradient do not create significant heterogeneity at the boundaries of the region of interest.

      Author response image 1.

      The cell density distribution along the gradient field from the data of 44-μm-wide lane.

      (4) Line 506: There is something strange with the definition of the bias. B cannot be the tumbling bias if k=B/0.31 s<sup>-1</sup> and the tumble-to-run rate is 5/s, because then the tumbling bias is B/0.31 / (B/0.31 + 5). Please clarify.

      We apologize for the confusion caused by the notation. In our model, B represents the CW bias of the individual flagellar motor, not the macroscopic tumbling bias of the cell. We assume the run-to-tumble rate is equivalent to the motor CCW-to-CW switching rate (k). Previous studies have shown that this rate increases linearly with the motor CW bias according to k = B/τ, where τ is a characteristic time (Ref. 50).

      Based on experimental data for wildtype cells, the average run time in the near-surface region is ~2.0 s (corresponding to a run-to-tumble rate of ~0.5 s<sup>-1</sup>) (Ref. 11), and the steady-state wildtype CW bias is ~0.15. Using these values, we determined τ ~ 0.31 s. Consequently, the switching rate is defined as k = B/0.31 s<sup>-1</sup>. Since the tumble duration is constant (0.2 s) (Ref. 51), the tumble-to-run rate is fixed at 5 s<sup>-1</sup>. We have clarified these definitions and parameter values in lines 569-573.
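
      These definitions can be made concrete with a short sketch that uses only the parameter values quoted above (characteristic time 0.31 s, tumble duration 0.2 s). The function names are illustrative, and treating run and tumble as a two-state process with constant rates is a simplifying assumption.

```python
CHARACTERISTIC_TIME = 0.31   # s, value quoted in the text
TUMBLE_DURATION = 0.2        # s, value quoted in the text

def run_to_tumble_rate(cw_bias):
    """Run-to-tumble rate k = B / 0.31 s^-1 for motor CW bias B."""
    return cw_bias / CHARACTERISTIC_TIME

def tumble_time_fraction(cw_bias):
    """Fraction of time spent tumbling in a two-state run/tumble process
    (assumption: constant rates in both directions)."""
    k_run_to_tumble = run_to_tumble_rate(cw_bias)
    k_tumble_to_run = 1.0 / TUMBLE_DURATION   # 5 s^-1
    return k_run_to_tumble / (k_run_to_tumble + k_tumble_to_run)

# Usage: steady-state wildtype CW bias quoted in the text.
B = 0.15
print(f"run-to-tumble rate: {run_to_tumble_rate(B):.3f} s^-1")
print(f"mean run time:      {1.0 / run_to_tumble_rate(B):.2f} s")
print(f"tumble fraction:    {tumble_time_fraction(B):.3f}")
```

      For a motor CW bias of 0.15 this gives a run-to-tumble rate close to 0.5 s<sup>-1</sup> and a fraction of time spent tumbling below 0.1, which illustrates why B should not be read as the tumbling bias of the cell.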

      Other minor comments:

      (1) Line 20 and lines 34-35: We think that the connection to infection is questionable here and should be toned down.

      Thank you for the suggestion. We have revised Line 20 to read: “Understanding bacterial behavior in confined environments is helpful to elucidating microbial ecology and developing strategies to manage bacterial infections.” Additionally, we modified lines 34-35 to state: “Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections.”

      (2) Line 49: Consider highlighting the change in the sense of rotation at the air-liquid interface.

      Thank you for the suggestion. We have now highlighted the difference in chirality between trajectories at the air-liquid interface and those at the liquid-solid interface. The text has been updated to read: “For example, E. coli swim clockwise when observed from above a solid surface, whereas Caulobacter crescentus move in tight, counter-clockwise circles when viewed from the liquid side.”

      (3) Lines 58-59: The sentence should be better formulated, explaining what is CheY-P and that its concentration changes because of a change in phosphorylation (P).

      Thank you for the suggestion. We have reformulated this section to explicitly define CheY-P and explain how its concentration is regulated through phosphorylation. The revised text reads: “The transmembrane chemoreceptors detect attractants or repellents and transmit signals into the cell by modulating the autophosphorylation of the histidine kinase CheA. Attractant binding suppresses CheA autophosphorylation, while repellent binding promotes it. This modulation alters the concentration of the phosphorylated response regulator protein, CheY-P.”

      (4) Lines 63-64: CheR CheB do a bit more than "facilitating" adaptation, they mediate it. The notation CheB(p) may be confusing, since "-P" was used above for CheY.

      Thank you for pointing this out. We have corrected the notation and strengthened the description of the enzymes' roles. The revised text is: “The adaptation enzymes CheR and CheB methylate and demethylate the receptors, respectively, mediating sensory adaptation.”

      (5) Line 130: there must be a typo in the formula.

      We have replaced the ambiguous lag time variable in Fig. 1C with _n_Δt to ensure mathematical consistency.

      (6) Additionally, \Delta t is both the time between the frame here and the lag time in Figure 1.

      Thank you for highlighting this ambiguity. We have updated the notation to distinguish these two values. The lag time in Figure 1 is now explicitly denoted as _n_Δt, while Δt remains the time interval between individual frames.

      (7) Line 162: "Consistent with previous reports," a reference to said reports is missing.

      Thank you for pointing this out. We have now added the reference (Ref. 41) to support this statement.

      (8) Figure 1B: Are these tracks in the presence of a gradient? Same as used in panel C? This needs to be explained.

      Response: Thank you for this question. We confirm that the tracks shown in Figure 1B were indeed recorded in the presence of a gradient and represent a subset of the data used in Figure 1C. We have clarified this in the figure legend as follows: "Thirty bacterial trajectories selected from the data of the 44-mm-wide lane in gradient assays. These represent a subset of the trajectories analyzed in panel C."

      (9) Simulations: the equation for x(t) should also be given for completeness.

      Thank you for the suggestion. For completeness, we have added the position updating equations for the run state to the Materials and Methods section (lines 579-580). The equations are defined as:

      (10) Figure S2: For the swimming directions that are more unstable due to the surface friction torque, RSW-DG, and LSW-UG, one would have expected that the Up-gradient motion is more persistent than the down gradient one. It seems to be the opposite. Is it significant, and what could be the reason for this?

      We apologize for the lack of clarity in our original explanation. While we would generally expect up-gradient motion to be more persistent than down-gradient motion in bulk fluid, our measurements near the surface show a different trend due to the specific contributions of run and tumble states to the escape rate. Cells swimming up-gradient (UG) in the LSW experience higher probability of running. Consequently, they are subjected to the destabilizing surface friction torque for a greater proportion of time compared to cells swimming down-gradient (DG) in the RSW. This can be explained mathematically. The escape rates for RSW-DG and LSW-UG can be expressed as:

      where B<sup>+</sup> and B<sup>−</sup> represent the tumble bias (probability of tumbling) when swimming up-gradient and down-gradient, respectively, and k<sub>T</sub> and k<sub>R</sub> denote the escape rates during a tumble and a run, respectively. Due to the chemotactic response, 0 ≤ B<sup>+</sup> < B<sup>−</sup> ≤ 1. Crucially, our system is characterized by k<sub>R</sub> > k<sub>T</sub> (the escape rate is higher during a run than during a tumble). Therefore, the lower tumble bias during up-gradient swimming (B<sup>+</sup> < B<sup>−</sup>) increases the weight of the run-state escape term ((1−B<sup>+</sup>)k<sub>R</sub>), leading to a higher overall escape rate for LSW-UG compared to RSW-DG. We have added an intuitive explanation of k<sub>R</sub> > k<sub>T</sub> in the Supplemental text.
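
      The comparison can be made explicit with a short derivation. It is a sketch that assumes each escape rate is the bias-weighted average of the tumble-state and run-state rates, which is the form implied by the definitions above and by the run-state term (1−B<sup>+</sup>)k<sub>R</sub>. The labels on k<sub>esc</sub> are notation introduced here for illustration.

```latex
\begin{align*}
k_{\mathrm{esc}}^{\mathrm{LSW\text{-}UG}} &= B^{+} k_{T} + (1 - B^{+})\, k_{R}, \\
k_{\mathrm{esc}}^{\mathrm{RSW\text{-}DG}} &= B^{-} k_{T} + (1 - B^{-})\, k_{R}, \\
k_{\mathrm{esc}}^{\mathrm{LSW\text{-}UG}} - k_{\mathrm{esc}}^{\mathrm{RSW\text{-}DG}}
  &= (B^{-} - B^{+})\,(k_{R} - k_{T}) > 0 .
\end{align*}
```

      Both factors in the last line are positive when B<sup>+</sup> < B<sup>−</sup> and k<sub>R</sub> > k<sub>T</sub>, so under this assumption the LSW-UG escape rate exceeds the RSW-DG escape rate.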

    1. eLife Assessment

      This fundamental work reveals that the accessibility of the unstructured C-terminal tail of α-tubulin differs with the state of the microtubule lattice. Accessibility increases with the expansion of the lattice induced by GTP and certain MAPs, which can then dictate the subsequent interactions between MAPs and microtubules, and post-translational modifications of tubulin tails. The evidence supporting the conclusion is compelling, although the characterisation of the probes does not answer whether they directly affect the lattice or expose the C-terminal tail of α-tubulin. The probes can be used as tools in the future to study differences in microtubule lattice regulation under different conditions both in vitro and in vivo. This work will be of great interest to the cytoskeleton field.

    2. Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

    3. Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths and weaknesses:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification. Future work is required to determine whether CTT exposure in the microtubule lattice is sensitive to additional factors present in vivo but not in vitro.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      [Editors' note: the authors have responded to the reviewers and this version was assessed by the editors.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Weaknesses:

      There is no information on the status of the beta tubulin CTTs. The study is done with mixed isotype microtubules, both in cells and in vitro. It remains unclear whether all the alpha tubulins in a mixed isotype microtubule lattice behave equivalently, or whether the effect is tubulin isotype-dependent. It remains unclear whether local binding of effectors can locally expand the lattice and locally expose the alpha CTTs.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

      We thank the reviewer for their positive response to our work. We agree that it will be important to determine if the βCTT is subject to regulation similar to the αCTT. However, this will first require the development of sensors that report on the accessibility of the βCTT, which is a significant undertaking for future work.

      We also agree that it will be important to examine whether all tubulin isotypes behave equivalently in terms of exposure of the αCTT in response to conformational switching of the microtubule lattice.

      We thank the reviewer for the comment about local expansion of the microtubule lattice. We believe that Figure 3 does show that local binding of effectors can locally expand the lattice and locally expose the alpha-CTTs. We have added text to clarify this.

      Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      Weaknesses:

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

      We thank the reviewer for their positive response to our work. We are encouraged that the reviewer feels that the Discussion section does a good job of putting the findings, challenges, and possibility of confounding factors and indirect effects in context. 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      Weaknesses:

      There are a number of weaknesses in the paper, many of which can be addressed textually. Some of the supporting evidence is preliminary and would benefit from additional experimental validation and clearer presentation before the conclusions can be considered fully supported. In particular, the authors should directly test in vitro whether Taxol addition can induce lattice exchange (see comments below).

      We thank the reviewer for their positive response to our work. We have altered the text and provided additional experimental validation as requested (see below).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The resolution of the figures is insufficient.

      (2) The provision of scale bars is inconsistent and insufficient.

      (3) Figure 1E, the scale bar looks like an MT.

      (4) Figure 2C, what does the grey bar indicate?

      (5) Figure 2E, missing scale bar.

      (6) Figure 3 C, D, significance brackets misaligned.

      (7) Figure 3E, consider using the same alpha-beta tubulin / MT graphic as in Figure 1B.

      (8) Figure 5E, show cell boundaries for consistency?

      (9) Figure 6D, stray box above the y-axis.

      (11) Figure S3A, scale bar wrong unit again.

      (12) S3B "fixed" and mount missing scale bar in the inset.

      (13) S4 scale bars without scale, inconsistency in scale bars throughout all the figures.

      We apologize for issues with the figures. We have corrected all of the issues indicated by the reviewer.

      (10) Figure 6F, surprising that 300 mM KCl washes out rigor binding kinesin

      We thank the reviewer for this important point. To address the reviewer’s concern, we have added a new supplementary figure (new Figure 6 – Figure Supplement 1) which shows that the washing step removes strongly-bound (apo) KIF5C(1-560)-Halo<sup>554</sup> protein from the microtubules. In addition, we have made a correction to the Materials and Methods section noting that ATP was added in addition to the KCl in the wash buffer. We apologize for omitting this detail in the original submission. We also added text noting that the wash out step was based on Shima et al., 2018 where the observation chamber was washed with either 1 mM ATP and 300 mM K-Pipes or with 10 mM ATP and 500 mM K-Pipes buffer. In our case, the chamber was washed with 3 mM ATP and 300 mM KCl. It is likely that the addition of ATP facilitates the detachment of strongly-bound KIF5C.

      (14) Supplementary movie, please identify alpha and beta tubulins for clarity. Please identify residues lighting up in interaction sites 1, 2 & 3.

      Thank you for the suggestions. We have made the requested changes to the movie.

      Reviewer #2 (Recommendations for the authors):

      There appear to have been some minor issues (perhaps with .pdf conversion) that leave some text and images pixelated in the .pdf provided, alongside some slightly jarring text and image positioning (e.g., Figure 5E panels). The authors should carefully look at the figures to ensure that they are presented in the clearest way possible.

      We apologize for these issues with the figures. We have reviewed the figures carefully to ensure that they are presented in the clearest way possible.

      The authors might consider providing a more definitive structural description of compact vs expanded lattice, highlighting what specific parameters are generally thought to change and by what magnitude. Do these differ between taxol-mediated expansion or the effects of MAPs?

      Thank you for the suggestion. We have added additional information to the Introduction section.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 should include a schematic overview of all constructs used in the study. A clear illustration showing the probe design, including the origin and function of each component (e.g., tags, domains), would improve clarity.

      Thank you for the suggestion. We have added new illustrations to Figure 1 showing the origin and design (including domains and tags) of each probe.

      (2) Add Western blot data for the 4×CAP-Gly construct to Figure 1C for completeness.

      We thank the reviewer for this suggestion. We carried out a far-western blot using the purified 4xCAPGly-mEGFP protein to probe GST-Y, GST-ΔY, and GST-ΔC2 proteins (new Figure 1 – Figure Supplement 1C). We note that some bleed-through signal can be seen in the lanes containing GST-ΔY and GST-ΔC2 protein due to the imaging requirements and exposure needed to visualize the 4xCAPGly-mEGFP protein. Nevertheless, the blot shows that the purified CAPGly sensor specifically recognizes the native (tyrosinated) CTT sequence of TUBA1A.

      (3) Essential background information on the CAP-Gly domain, SXIP motif, and EB proteins is missing from the Introduction. These concepts appear abruptly in the Results and should be properly introduced.

      Thank you for the suggestion. We have added additional information to the Introduction section about the CAP-Gly domain. However, we feel that introducing the SXIP motif and EB proteins at this point would detract from the flow of the Introduction and we have elected to retain this information in the Results section when we detail development of the 4xCAPGly probe.

      (4) In Figure 2E, it remains possible that the CAP-Gly domain displacement simply follows the displacement of EB proteins. An experiment comparing EB protein localization upon Taxol treatment would clarify this relationship.

      We thank the reviewer for raising this important point. To address the reviewer’s concern, we utilized HeLa cells stably expressing EB3-GFP. We performed live-cell imaging before and after Taxol addition (new Figure 2 – Figure Supplement 1C). EB3-EGFP was lost from the microtubule plus ends within minutes and did not localize to the now-expanded lattice.

      (5) Statements such as "significantly increased" (e.g., line 195) should be replaced with quantitative information (e.g., "1.5-fold increase").

      We have made the suggested changes to the text.

      (6) Phrases like "became accessible" should be revised to "became more accessible," as the observed changes are relative, not absolute. The current wording implies a binary shift, whereas the data show a modest (~1.5-fold) increase.

      We have made the suggested changes to the text.

      (7) Similarly, at line 209, the terms "minimally accessible" versus "accessible" should be rephrased to reflect the small relative change observed; saturation of accessibility is not demonstrated.

      We have made the suggested changes to the text.

      (8) Statements that MAP7 "expands the lattice" (line 222) should be made cautiously; to my knowledge, that has not been clearly established in the literature.

      We thank the reviewer for this important comment. We have added text indicating that MAP7’s ability to induce or preserve an expanded lattice has not been clearly established.

      (9) In Figures 3 and 4, the overexpression of MAP7 results in a strikingly peripheral microtubule network. Why is there this unusual morphology?

      The reviewer raises an interesting question. We are not sure why the overexpression of MAP7 results in a strikingly peripheral microtubule network but we suspect this is unique to the HeLa cells we are using. We have observed a more uniform MAP7 localization in other cell types [e.g. COS-7 cells (Tymanskyj et al. 2018)], consistent with the literature [e.g. BEAS-2B cells (Shen and Ori-McKenney 2024), HeLa cells (Hooikaas et al. 2019)].

      (10) In Supplementary Figure 5C, the Western blot of detyrosination levels is inconsistent with the text. Untreated cells appear to have higher detyrosination than both wild-type and E254A-overexpressing cells. Do you have any explanation?

      We thank the reviewer for this important comment. We do not have an explanation at this point but plan to revisit this experiment. Unfortunately, the authors who carried out this work recently moved to a new institution and it will be several months before they are able to get the cell lines going and repeat the experiment. We thus elected to remove what was Supp Fig 5C until we can revisit the results. We believe that the important results are in what is now Figure 5 - Figure Supplement 1A,B, which shows that the expression levels of the WT and E254A proteins are similar to each other.

      (11) The image analysis method in Figures 5B and 5D requires clarification. It appears that "density" was calculated from skeletonized probe length over total area, potentially using a strict intensity threshold. It looks like low-intensity binding has been excluded; otherwise, the density would be the same from the images. If so, this should be stated explicitly. A more appropriate analysis might skeletonize and integrate total fluorescence intensity relative to the overall microtubule network.

      We have added additional information to the Materials and Methods section to clarify the image analysis. We appreciate the reviewer’s valuable feedback and the suggestion to use the integrated total fluorescence intensity, which is a theoretically sound approach. While we agree that integrated intensity is a valid metric for specific applications, its appropriate use depends on two main preconditions:

      (1) Consistent microscopy image acquisition conditions.

      (2) Consistent probe expression levels across all cells and experiments.

      We successfully maintained consistent image acquisition conditions (e.g., exposure time) throughout the experiment. However, despite generating stably expressing sensor cell lines to minimize variation, there remains an inherent biological variability in probe expression levels between individual cells. Integrated intensity is highly susceptible to this cell-to-cell variability. Relying on it would lead to a systematic error where differences in the total amount of expressed probe would be mistaken for differences in Y-αCTT accessibility.

      The density metric (skeletonized probe length / total cell area) was deliberately chosen as it serves as a geometric measure rather than an intensity-based normalization. The density metric quantifies the proportion of the microtubule network that is occupied by Y-αCTT-labeled structures, independent of fluorescence intensity. Thus, the density metric provides a more robust and interpretable measure of Y-αCTT accessibility under the variable expression conditions inherent to our experimental system. Therefore, we believe that this geometric approach represents the most appropriate analysis for our image dataset.
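
      The distinction between the two metrics can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study: the function names, the toy 4x4 masks, and the pixel values are all hypothetical, and the skeleton mask is taken as given, so the sketch does not address how thresholding affects which filaments are detected in the first place.

```python
def skeleton_density(skeleton_mask, cell_mask, pixel_size_um=1.0):
    """Skeleton length per unit cell area (1/um).

    Both masks are equal-shaped 2D lists of 0/1 values (hypothetical inputs).
    """
    skeleton_pixels = sum(
        1
        for sk_row, cell_row in zip(skeleton_mask, cell_mask)
        for sk, inside in zip(sk_row, cell_row)
        if sk and inside
    )
    cell_pixels = sum(1 for row in cell_mask for inside in row if inside)
    if cell_pixels == 0:
        raise ValueError("cell mask is empty")
    # Crude length estimate: each skeleton pixel contributes one pixel width.
    length_um = skeleton_pixels * pixel_size_um
    area_um2 = cell_pixels * pixel_size_um ** 2
    return length_um / area_um2


def integrated_intensity(image, cell_mask):
    """Sum of pixel values inside the cell mask."""
    return sum(
        value
        for img_row, cell_row in zip(image, cell_mask)
        for value, inside in zip(img_row, cell_row)
        if inside
    )


# Toy 4x4 "cell" containing one horizontal filament; all values are invented.
cell = [[1] * 4 for _ in range(4)]
skeleton = [[0] * 4, [1] * 4, [0] * 4, [0] * 4]
dim_image = [[10 * s for s in row] for row in skeleton]     # low probe expression
bright_image = [[30 * s for s in row] for row in skeleton]  # high probe expression

density = skeleton_density(skeleton, cell)
dim_total = integrated_intensity(dim_image, cell)
bright_total = integrated_intensity(bright_image, cell)
```

      In this toy case the integrated intensity triples between the dim and bright images, while the density computed from the same skeleton is unchanged.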

      (12) In Figure 5D, the fold-change data are difficult to interpret due to the compressed scale. Replotting is recommended. The text should also discuss the relative fold changes between E254A and Taxol conditions, Figure 2H.

      We appreciate the reviewer's insightful comment. We agree that the presence of significant outliers led to a compressed Y-axis scale in Figure 5D, obscuring the clear difference between the WT-tubulin and E254A-tubulin groups. As suggested, we have replotted Figure 5D using a broken Y-axis to effectively expand the relevant lower range of the data while still accurately representing all data points, including the outliers. We believe that the revised graph significantly enhances the clarity and interpretability of these results. For Figure 2, we have added the relative fold changes to the text as requested.

      (13) Figure 6. The authors should directly test in vitro whether Taxol addition can induce lattice exchange, for example, by adding Taxol to GDP-microtubules and monitoring probe binding. Including such an assay would provide critical mechanistic evidence and substantially strengthen the conclusions. I was waiting for this experiment since Figure 2.

      We thank the reviewer for this suggestion. As suggested, we generated GDP-MTs from HeLa tubulin and added them to two flow chambers. We then flowed the YL1/2<sup>Fab</sup>-EGFP probe into the chambers in the presence of DMSO (vehicle control) or Taxol. Static images were taken and the fluorescence intensity of the probe on microtubules in each chamber was quantified. There was a slight but not statistically significant difference in probe binding between control and Taxol-treated GDP-MTs (Author response image 1). While disappointing, these results underscore our conclusion (Discussion section) that microtubule assembly in vitro may not produce a lattice state resembling that in cells, either due to differences in protofilament number and/or buffer conditions and/or the lack of MAPs during polymerization.

      Author response image 1.

      References

      Hooikaas, P. J., Martin, M., Muhlethaler, T., Kuijntjes, G. J., Peeters, C. A. E., Katrukha, E. A., Ferrari, L., Stucchi, R., Verhagen, D. G. F., van Riel, W. E., Grigoriev, I., Altelaar, A. F. M., Hoogenraad, C. C., Rudiger, S. G. D., Steinmetz, M. O., Kapitein, L. C. and Akhmanova, A. (2019). MAP7 family proteins regulate kinesin-1 recruitment and activation. J Cell Biol, 218, 1298-1318.

      Shen, Y. and Ori-McKenney, K. M. (2024). Microtubule-associated protein MAP7 promotes tubulin posttranslational modifications and cargo transport to enable osmotic adaptation. Dev Cell, 59, 1553-1570.

      Tymanskyj, S. R., Yang, B. H., Verhey, K. J. and Ma, L. (2018). MAP7 regulates axon morphogenesis by recruiting kinesin-1 to microtubules and modulating organelle transport. Elife, 7.

    1. eLife Assessment

      In this important study, the authors conducted extensive sets of computational and experimental investigations of the mechanism of cholesterol transport in the smoothened (SMO) protein. The computational component integrated multiple state-of-the-art approaches such as adaptive sampling, free energy simulations, and Markov state modeling, providing compelling support for the proposed mechanistic model, which is further validated with solid experimental mutagenesis data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses primarily simulation tools to probe the pathway of cholesterol transport with the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly discovered and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations and cholesterol-residue interactions are clearly described.

      Weaknesses:

      None. I find the revised manuscript strong and the work should be published.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      A wide range of computational techniques is used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied in a rigorous manner and the data are very convincing. The computational work is an exemplar of a well-executed study.

      Their computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      Their methods are described well, and many of their analysis methods have been made available via GitHub, which is an additional strength.

    4. Reviewer #3 (Public review):

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation.

      The MD simulations and biochemical experiments are carefully executed and provide useful data.

      Comments on revisions:

      I appreciate the authors' detailed response and the substantial revisions made to the manuscript. The changes addressing Comments 3.1-3.5 have significantly improved the balance and framing of the work, and my primary concerns regarding overstatement and selective interpretation have been satisfactorily addressed.

      The authors' rebuttal to my initial review includes extended argumentation regarding specific interpretations of prior studies and broader models of SMO regulation. These issues represent longstanding differences in interpretation that have already been discussed extensively in the literature and are not essential to evaluating the quality or conclusions of the present study.

      For readers seeking a comprehensive and balanced overview of cholesterol-dependent SMO activation that integrates both CRD- and TMD-centered models, I would point to recent review articles (e.g., Zhang and Beachy, Nat Rev Mol Cell Biol 2023). I do not feel it is productive to rehash these debates further in the context of this review, and I have no additional substantive concerns with the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses primarily simulation tools to probe the pathway of cholesterol transport with the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly discovered, and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations, and cholesterol-residue interactions are clearly described.

      We thank the reviewer for their kind words.

      (1) Membrane Model: The authors decided to use a rather simple symmetric membrane with just cholesterol, POPC, and PSM at the same concentration for the inner and outer leaflets. This is not representative of asymmetry known to exist in plasma membranes (SM only in the outer leaflet and more cholesterol in this leaflet). This may also be important to the free energy pathway into SMO. Moreover, PE and anionic lipids are present in the inner leaflet and are ignored. While I am not requesting new simulations, I would suggest that the authors should clearly state that their model does not consider lipid concentration leaflet asymmetry, which might play an important role.

      We thank the reviewer for their comment. Membrane asymmetry is inherent in endogenous systems; we acknowledge that as a limitation of our current model. We have addressed the comment by adding this limitation to our discussion in the manuscript.

      Added lines: (End of paragraph 6, Results subsection 2):

      “One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”

      (2) Statistical comparison of barriers: The barriers for pathways 1 and 2 are compared in the text, suggesting that pathway 2 has a slightly higher barrier than pathway 1. However, are these statistically different? If so, the authors should state the p-value. If not, then the text in the manuscript should not state that one pathway is preferred over the other.

      We thank the reviewer for their comment. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.0013)”
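      As a hedged illustration of the comparison requested in comment (2), the sketch below computes a Welch t statistic and its approximate degrees of freedom from two means and spreads. The response does not state how many independent estimates entered the test, or whether the ± values are standard deviations. The sample count `n_estimates` and the reading of ± as a standard deviation are therefore assumptions made here for illustration only. The sketch is not the authors' procedure and does not attempt to reproduce the reported p-value.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    # Squared standard error contributed by each group.
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Barriers quoted in the response (kcal/mol); the sample count is hypothetical.
n_estimates = 20
t_stat, dof = welch_t(5.8, 0.7, n_estimates, 6.5, 0.8, n_estimates)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      The resulting statistic depends strongly on the assumed sample count, which is why reporting the number of independent estimates alongside the p-value makes such a comparison easier to interpret.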

      (3) Barrier of cholesterol (reasoning): The authors on page 7 argue that there is an enthalpy barrier between the membrane and SMO due to the change in environment. However, cholesterol lies in the membrane with its hydroxyl interacting with the hydrophilic part of the membrane and the other parts in the hydrophobic part. How is the SMO surface any different? It has both characteristics and is likely balanced similarly to uptake cholesterol. Unless this can be better quantified, I would suggest that this logic be removed.

      We thank the reviewer for this suggestion. We have removed the line to avoid confusion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      (1) A wide range of computational techniques is used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied rigorously, and the data are very convincing. The computational work is an exemplar of a well-carried out study.

      (2) The computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      (3) The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      (4) The methods are described well, and many of their analysis methods have been made available via GitHub, which is an additional strength.

      Weaknesses:

      (1) Some of the data could be presented a little more clearly. In particular, Figure 7 needs additional annotation to be interpretable. Can the position of the cholesterol be shown on the graph so that we can see the diameter change more clearly?

      We thank the reviewer for this suggestion. We have added the cholesterol positions as requested.

      Changes made: (Caption, Figure 7)

      “The tunnel profile during cholesterol translocation in SMO. (a) Free energy plot of the z-coordinate v/s the tunnel diameter when cholesterol is present in the core TMD. The tunnel shows a spike in the radius in the TMD, indicating the presence of a cholesterol-accommodating cavity. (b) Representative figure for the tunnel when a cholesterol molecule is in the TMD. (c) Same as (a), when cholesterol is at the TMD-CRD interface. (d) Same as (b), when cholesterol is at the TMD-CRD interface. (e) Same as (a), when cholesterol is at the CRD binding site. (f) Same as (b), when cholesterol is at the CRD binding site. Tunnel diameters shown as spheres. Cholesterol positions marked on plots using dotted lines. All snapshots presented are frames taken from MD simulations.”

      (2) In Figure 3C, it doesn’t look like the Met is constricting the tunnel at all. What residue is constricting the tunnel here? Can we see the Ala and Met panels from the same angle to compare the landscapes? Or does the mutation significantly change the tunnel? Why not A283 to a bulkier residue? Finally, the legend says that the figure shows that cholesterol can still pass this residue, but it doesn’t really show this. Perhaps if the HOLE graph was plotted, we could see the narrowest point of the tunnel and compare it to the size of cholesterol.

      We thank the reviewer for this suggestion. A283 was mutated to methionine as it presents with a longer heavy tail containing sulfur. We have plotted the tunnel radii for both WT and A283M mutants and added them as a supplemental figure. As shown in the figure, the presence of methionine doesn’t completely block the tunnel, but occludes it, thereby increasing the barrier for cholesterol transport slightly.

      Changes made: (End of Results subsection 1)

      “When we calculated the PMF for cholesterol entry, the A<sup>2.60f</sup>M mutant showed a restricted tunnel, but the mutation did not fully block the tunnel (Figure 3—figure Supplement 3).”

      (3) The PMF axis in 3b and d confused me for a bit. Looking at the Supplementary data, it’s clear that, e.g., the F455I change increases the energy barrier for chol entering the receptor. But in 3d this is shown as a -ve change, i.e., favourable. This seems the wrong way around for me. Either switch the sign or make this clearer in the legend, please.

      We thank the reviewer for this suggestion. We measured ∆PMF as PMF<sub>WT</sub> − PMF<sub>mutant</sub>, hence the negative values. We have added additional text to the legend to clarify this.

      Changes made: (Caption, Figure 3)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for the mutants in Pathway 1. (c) Example mutant A<sup>2.60f</sup>M shows that cholesterol can enter SMO through Pathway 1 even on a bulky mutation. (d) Same as (b) but for Pathway 2. (e) Example mutant L<sup>5.62f</sup>A shows that cholesterol can enter SMO through Pathway 2 due to lesser steric hindrance. All snapshots presented are frames taken from MD simulations.”

      Changes made: (Caption, Figure 6)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for mutants along the TMD-CRD pathway. (c, d) Example mutants Y<sup>LD</sup>A and F<sup>5.65f</sup>A show that cholesterol is unable to translocate through this pathway because of the loss of crucial hydrophobic contacts provided by Y207 and F484 along the solvent-exposed pathway.”
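      The sign convention stated in the captions above can be checked with a short sketch. The wild-type value is the Pathway 1 barrier quoted elsewhere in this response. The mutant value is a hypothetical number chosen only to show the sign and is not taken from the manuscript.

```python
def delta_pmf(pmf_wt, pmf_mutant):
    """Delta PMF as defined in the captions: peak PMF of WT minus peak PMF of the mutant."""
    return pmf_wt - pmf_mutant


pmf_wt = 5.8                   # kcal/mol, Pathway 1 barrier quoted in the response
pmf_mutant_hypothetical = 7.0  # kcal/mol, illustrative value only

change = delta_pmf(pmf_wt, pmf_mutant_hypothetical)
# A mutation that raises the barrier yields a negative value under this convention.
print(f"delta PMF = {change:+.1f} kcal/mol")
```

      Under this convention a negative ∆PMF means the mutation made cholesterol entry less favourable, which is the reading the reviewer asked to have spelled out in the legend.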

      (4) The impact of G280V is put down to a decrease in flexibility, but it could also be a steric hindrance. This should be discussed.

      We thank the reviewer for this suggestion. We have added it as a possible mechanism of the decrease in activity of SMO.

      Changes made: (Paragraph 5, Results subsection 1)

      “We mutated G280<sup>2.57f</sup> to valine (G<sup>2.57f</sup>V) to test whether reducing the flexibility of TM2 prevents cholesterol entry into the TMD. Consequently, the activity of mSMO showed a decrease. However, this decrease could also be attributed to steric hindrance added by the presence of a bulky isopropyl group in valine.”

      (5) Are the reported energy barriers of the two pathways (5.8 ± 0.7 and 6.5 ± 0.8 kcal/mol) significantly and/or substantially different enough to favour one over the other? This could be discussed in the manuscript.

      We thank the reviewer for this suggestion. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.001)”

      (6) Are the energy barriers consistent with a passive diffusion-driven process? It feels like, without a source of free energy input (e.g., ion or ATP), these barriers would be difficult to overcome. This could be discussed.

      We thank the reviewer for this suggestion. We have added a discussion to further clarify this point.

      Discussion: (Paragraph 6, Results subsection 2)

      “These values are comparable to ATP-Binding Cassette (ABC) transporters of membrane lipids, which use ATP hydrolysis (-7.54 ± 0.3 kcal/mol) (Meurer et al., 2017) to drive lipid transport from the membrane to an extracellular acceptor. Some of these transporters share the same mechanism as SMO, where the lipid from the inner leaflet is flipped and transported to the extracellular acceptor protein (Tarling et al., 2013). Additionally, for secondary active transporters that do not use ATP for the transport of substrates, a thermodynamic barrier of 5-6 kcal/mol has been reported in the literature (Chan et al., 2022; Selvam et al., 2019; McComas et al., 2023; Thangapandian et al., 2025).”
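      To put the quoted barriers on a thermal scale, the following minimal sketch expresses them in units of RT and evaluates the corresponding Boltzmann factor. The temperature of 310 K is an assumption made here, not a value stated in the response. The calculation is a back-of-the-envelope aid rather than part of the authors' analysis, and it gives no rate, since a rate would also require a kinetic prefactor.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def barrier_in_kt(barrier_kcal, temperature_k=310.0):
    """Express a free energy barrier (kcal/mol) in units of RT."""
    return barrier_kcal / (R_KCAL * temperature_k)


def boltzmann_factor(barrier_kcal, temperature_k=310.0):
    """Relative equilibrium weight exp(-dG/RT) of the barrier top versus the basin."""
    return math.exp(-barrier_in_kt(barrier_kcal, temperature_k))


# Barriers quoted in the response for the two pathways (kcal/mol).
for label, barrier in (("pathway 1", 5.8), ("pathway 2", 6.5)):
    print(f"{label}: {barrier_in_kt(barrier):.1f} RT, "
          f"Boltzmann factor {boltzmann_factor(barrier):.1e}")
```

      Both barriers come out at roughly ten times the thermal energy under this assumption, which is the scale on which the comparison with transporters in the quoted passage is being made.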

      (7) Regarding the kinetics from MSM, it is stated that the values seen here are similar to MFS transporters, but this then references another MSM study. A comparison to experimental values would support this section a lot.

      We thank the reviewer for this suggestion. We have added a discussion of the millisecond-scale timescales measured for MFS transporters.

      Changes made: (Paragraph 2, Results subsection 5)

      “These timescales are comparable to the substrate transport timescales of Major Facilitator Superfamily (MFS) transporters (Chan et al., 2022). Furthermore, several experimental studies have also resolved the millisecond-scale kinetics of MFS transporters (Blodgett and Carruthers, 2005; Körner et al., 2024; Bazzone et al., 2022; Smirnova et al., 2014; Zhu et al., 2019), further corroborating the results from our study.”

      Reviewer #2 (Recommendations for the authors):

      (1) The heatmaps in Figures 2a and 4a are great. On these, an arrow denotes what looks like a minimum energy path. Is it possible to see this plotted, as this might show the height of the energy barriers more clearly?

      We thank the reviewer for this suggestion. We have computed the minimum energy paths for both pathways and presented them in a supplementary figure.

      Added lines: (Paragraph 4, Results subsection 1):

      “For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3a,b).”

      Added lines: (Paragraph 4, Results subsection 2):

      “For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3c,d).”

      (2) The tICA data in S15 is first referred to on line 137, but the technique isn’t introduced until line 222. This makes understanding the data a little confusing. Reordering this might improve readability.

      We thank the reviewer for this suggestion. We have reordered the text to make it clearer.

      Changes made: (Paragraph 2, Results subsection 1)

      “This provides evidence for multiple stable poses along the pathway, consistent with the multiple stable poses of cholesterol observed in cryo-EM structures of SMO bound to sterols (Deshpande et al., 2019; Qi et al., 2019b, 2020). A reliable estimate of the barriers comes from using the time-lagged Independent Components (tICs), which project the entire dataset along the slowest kinetic degrees of freedom. Overall, the highest barrier along Pathway 1 is 5.8 ± 0.7 kcal/mol, and it is associated with the entry of cholesterol into the TMD (Figure 2—Figure Supplement 2).”

      Changes made: (Paragraph 3, Results subsection 2)

      “On plotting the first two components of tICs, (Figure 2—Figure Supplement 2), we observe that the energetic barrier between η and θ is ∼6.5 ± 0.8 kcal/mol.”

      (3) Missing bracket on line 577.

      We thank the reviewer for this suggestion. The typo has been fixed.

      (4) Line 577: Fig. S2nd?

      We thank the reviewer for this suggestion. This typo has been fixed.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation. The MD simulations and biochemical experiments are carefully executed and provide useful data.

      Weaknesses:

      However, the manuscript is significantly weakened by a narrow and selective interpretation of the literature, overstatement of certain conclusions, and a lack of appropriate engagement with alternative models that are well-supported by published data, including data from prior work by several of the coauthors of this manuscript. In its current form, the manuscript gives a biased impression of the field and overemphasizes the role of the CRD in cholesterol-mediated SMO activation. Below, I provide specific points where revisions are needed to ensure a more accurate and comprehensive treatment of the biology.

      (1) Overstatement of the CRD as the Orthosteric Site of SMO Activation

      The manuscript repeatedly implies or states that the CRD is the orthosteric site of SMO activation, without adequate acknowledgment of alternative models. To give just a few examples (of many in this manuscript):

      (a) “PTCH is proposed to modulate the Hh signal by decreasing the ability of membrane cholesterol to access SMO’s extracellular cysteine-rich domain (CRD)” (p. 3).

      (b) “In recent years, there has been a vigorous debate on the orthosteric site of SMO” (p. 3).

      (c) “cholesterol must travel through the SMO TMD to reach the orthosteric site in the CRD” (p. 4).

      (d) “we observe cholesterol moving along TM6 to the TMD-CRD interface (common pathway, Fig. 1d) to access the orthosteric binding site in the CRD” (p. 6).

      While the second quote in this list at least acknowledges a debate, the surrounding text suggests that this debate has been entirely resolved in favor of the CRD model. This is misleading and not reflective of the views of other investigators in the field (see, for example, a recent comprehensive review from Zhang and Beachy, Nature Reviews Molecular Cell Biology 2023, which makes the point that both the CRD and 7TM sites are critical for cholesterol activation of SMO as well as PTCH-mediated regulation of SMO-cholesterol interactions).

      In contrast, a large body of literature supports a dual-site model in which both the CRD and the TMD are bona fide cholesterol-binding sites essential for SMO activation. Examples include:

      (a) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study).

      (b) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig. 4 Byrne et al, Nature 2016).

      (c) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules).

      Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al, Nature 2016; Myers et al, Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in the Kinnebrew et al 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al is not subject to this caveat because it used an untagged, unmodified form of SMO.

      Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      Recommendation: The authors should revise the manuscript to provide a more balanced overview of the field and explicitly acknowledge that the CRD is not the sole activation site. Instead, a dual-site model is more consistent with available structural, mutational, and functional data. In addition, the authors should reframe their interpretation of their MD studies to reflect this broader and more accurate view of how cholesterol binds and activates SMO.

      We thank the reviewer for this comprehensive overview of the existing literature. We agree that cholesterol binding to both the TMD and CRD sites is required for full activation of SMO. As described below in responses to comments, we have made changes to the manuscript to make this point clear. For instance, in the revised manuscript, we refrain from calling the CRD cholesterol binding site the “orthosteric site”. Instead, we highlight that the goal of the manuscript is not to resolve the debate over whether the TMD or CRD site is more important for SMO regulation by PTCH1 but rather to use molecular dynamics to understand the fascinating question of how cholesterol in the membrane can reach the CRD, located at a significant distance above the outer leaflet of the membrane. We believe that this is an important goal since there is an abundance of evidence that supports the view that PTCH1 inhibits SMO by reducing cholesterol access to the CRD. This evidence is now summarized succinctly in the introduction:

      Changes made: (Paragraph 4, Introduction)

      “While cholesterol binding to both the TMD and CRD sites is required for full SMO activation, our work focuses on how cholesterol gains access to the CRD site, perched above the outer leaflet of the membrane (Luchetti et al., 2016; Kinnebrew et al., 2022). Multiple lines of evidence suggest that PTCH1-regulated cholesterol binding to the CRD plays an instructive role in SMO regulation both in cells and animals. Mutations in residues predicted to make hydrogen bonds with the hydroxyl group of cholesterol bound to the CRD reduced both the potency and efficacy of SHH in cellular signaling assays (Kinnebrew et al., 2022; Byrne et al., 2016) and, more importantly, eliminated HH signaling in mouse embryos (Xiao et al., 2017). Experiments using both covalent and photocrosslinkable sterol probes in live cells directly show that PTCH1 activity reduces sterol access to the CRD (Kinnebrew et al., 2022; Xiao et al., 2017). Notably, our simulations evaluate a path of cholesterol translocation that includes both the TMD and CRD sites: cholesterol first enters the 7-transmembrane domain bundle from the membrane; it then engages the TMD site before continuing along a conduit to the CRD site. Thus, we analyze translocation energetics and residue-level contacts along a path that includes both the TMD and the CRD.”

      However, Reviewer 3 makes several comments below that are biased, inaccurate, or selective. We feel it is important to address these so readers can approach the literature from a balanced perspective. Indeed, the eLife review forum provides an ideal venue to present contrasting views on a scientific model. We encourage the editors to publish both Reviewer 3’s comments and our response in full so readers can read the original papers and reach their own conclusions. It is important to note these issues are not relevant to the quality of the computational and experimental data presented in this paper.

      We have now removed the term “orthosteric” to describe the CRD site throughout the paper and clearly state in the introduction that “both the CRD and TMD sites are required for SMO activation” but that our focus is on how cholesterol moves from the membrane to the CRD site. There is no doubt that cholesterol binding to the CRD plays a key role in SMO activation; our focus on this path is justified and does not devalue the importance of the TMD site. Our prior models (see Figure 7 of Kinnebrew et al., 2022) explicitly include contributions of both sites.

      Now we respond to some of the concerns outlined, individually:

      (1) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study)

      The fact that a point mutation dramatically diminishes (but does not abolish) signaling does not mean that the CRD cholesterol binding site is not important for SMO regulation. Indeed, the reviewer fails to mention that Song et al. (Molecular Cell, 2017) found that a SMO protein carrying a subtle mutation at D99 (D95/99N, a residue that makes a hydrogen bond with the cholesterol hydroxyl) completely abolishes SMO signaling in mouse embryos. Thus, the CRD site is critical for SMO activation in an intact animal, justifying our focus on evaluating the path of cholesterol translocation to the CRD site.

      (2) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig 4 Byrne et al, Nature 2016).

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal (i.e., ligand-independent) activity compared to full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-100-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. We encourage the reviewer to read our previous paper (Kinnebrew et al., 2022), which presents a unified view of how the TMD and CRD sites together regulate SMO activation.

      A more physiological experiment, reported in Kinnebrew et al. (2022), tested mutations in residues that make hydrogen bonds with cholesterol at the CRD and TMD sites in the context of full-length SMO. These mutants were stably expressed at moderate levels in Smo<sup>−/−</sup> cells. Mutations at the CRD site reduced the fold-increase in signaling output in response to SHH, as would be expected for a PTCH1-regulated site. In contrast, analogous mutations in the TMD site reduced the magnitude of both basal and maximal signaling, without affecting the fold-change in response to SHH. In signaling assays, the key parameter in evaluating the impact of a mutation is whether it impacts the change in output in response to a signal (in this case PTCH1 inactivation by SHH). A mutation in SMO that affects PTCH1 regulation is expected to decrease the fold-change in signaling in response to SHH, a criterion that is fulfilled by mutations in the CRD site. Accordingly, mutations in the CRD site abolish SMO signaling in mouse embryos (Xiao et al., 2017).

      (3) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Bulky mutations at the TMD site (V333F) that abolish SMO activity were first reported by Byrne et al. (2016) and were used to markedly increase the stability of SMO for protein expression. These mutations indeed stabilize the inactive state of SMO, increasing protein abundance and completely preventing its localization at primary cilia. SMO variants carrying such bulky mutations cannot be used to infer the importance of the TMD site since they do not distinguish between the following possibilities: (1) SMO is inactive because the sterol cannot bind, or (2) SMO is inactive because it is locked in an inactive conformation, or (3) SMO is inactive because it cannot localize to primary cilia (where it must be localized to activate downstream signaling).

      As described in Response 3.3, a better evaluation of the importance of the TMD site is the use of mutations in residues that make hydrogen bonds with the hydroxyl group of TMD cholesterol. These mutations do not markedly increase protein stability or prevent ciliary localization (Kinnebrew et al., 2022, Fig. S2). While a TMD site mutation decreases the magnitude of maximal (and basal) SMO signaling, it does not impact the fold-increase in signal output in response to Hh ligands (the key parameter that should be used to evaluate PTCH1 activity).

      (4) Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules)

      The reference has now been added at this location in the manuscript.

      (5) Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al., Nature 2016; Myers et al., Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in Kinnebrew et al. 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al. is not subject to this caveat because it used an untagged, unmodified form of SMO.

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal activity than full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. Please see Response 3.3 for further details.

      Reviewer 3 presents an incomplete picture of the extensive experiments reported in Kinnebrew et al. to establish the functionality of YFP-tagged delta-CRD SMO. Most importantly, a TMD-selective sterol analog (KK174) can fully activate YFP-tagged delta-CRD, showing conclusively that the YFP fusion does not block sterol access to the TMD site. The fact that this protein is nearly unresponsive to SHH highlights the critical role of the CRD-bound cholesterol in SMO regulation by PTCH1. Indeed, the YFP-tagged, CRD-deleted SMO was made purposefully to test the requirement of the CRD in a construct that had normal basal activity. Again, these data justify the value of investigating the path of cholesterol movement from the membrane via the TMD site to the CRD.

      (6) Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      This comment is inaccurate. The data presented in Deshpande et al. (and prior work in Myers et al.) used transient transfection to overexpress SMO in Smo<sup>−/−</sup> cells. At the individual cell level, transient transfection produces expression levels that are markedly higher (10-1000-fold) than stable expression (in addition to being more variable). Most scientists would agree that stable expression (as used in Kinnebrew 2022) at a moderate expression level is a better system to compare mutant phenotypes, assess basal and activated signaling, and provide an accurate measure of the fold-change in signal output in response to SHH. Notably, introduction of a mutation in the CRD cholesterol binding site at the endogenous mouse Smo locus (an even better experiment than stable expression) leads to complete loss of SMO activity (PMID 28344083). This result again justifies our investigation of the pathway of cholesterol movement from the membrane to the CRD site.

      We have changed the initial discussion to reflect a more general outlook.

      Changes made: (Paragraph 1, Introduction)

      “PTCH modulates the availability of accessible cholesterol at the primary cilium and thereby regulates SMO, with models invoking effects on both the CRD and 7TM pockets.”

      Changes made: (Results subsection 3, paragraph 1)

      “According to the dual-site model, to reach the binding site in the CRD (ζ), cholesterol translocation along the TMD-CRD interface from the TM binding site (α∗) is required.”

      Added lines: (Paragraph 5, Results subsection 3):

      “The computational investigation shown here covers the dual-site model, where cholesterol reaches the CRD site via binding to the TM binding site first. In comparison to the CRD site, the TM site is more stable by ∼2 kcal/mol (Figure 2—Figure Supplement 3b, d).”

      Added lines: (Paragraph 2, Conclusions):

      “Here we have explored the role the CRD-site plays in SMO activation. In addition, through simulating the CRD site-dependent SMO activation hypothesis, we have also simulated the TMD site-dependent activation. We show that the overall stability of cholesterol is higher at the TMD site than at the CRD site by ∼2 kcal/mol.”

      (2) Bias in Presentation of Translocation Pathways

      The manuscript presents the model of cholesterol translocation through SMO to the CRD as the predominant (if not sole) mechanism of activation. Statements such as: "Cholesterol traverses SMO to ultimately reach the CRD binding site" (p. 6) suggest an exclusivity that is not supported by prior literature in the field. Indeed, the authors’ own MD data presented here demonstrate more stable cholesterol binding at the TMD than at the CRD (p 17), and binding of cholesterol to the TMD site is essential for SMO activation. As such, it is appropriate to acknowledge that cholesterol may activate SMO by translocating through the TM5/6 tunnel, then binding to the TMD site, as this is a likely route of SMO activation in addition to the CRD translocation route they highlight in their discussion.

      The authors describe two possible translocation pathways (Pathway 1: TM2/3 entry to TMD; Pathway 2: TM5/6 entry and direct CRD transfer), but do not sufficiently acknowledge that their own empirical data support Pathway 2 as more relevant. Indeed, because their experimental data suggest Pathway 2 is more strongly linked to SMO activation, this pathway should be weighted more heavily in the authors’ discussion. In addition, Pathway 2 is linked to cholesterol binding to both the TMD and CRD sites (the former because the TMD binding site is at the terminus of the hydrophobic tunnel, the latter via the translocation pathway described in the present manuscript), so it is appropriate that Pathway 2 figures more prominently than Pathway 1 in the authors’ discussion.

      The authors also claim that "there is no experimental structure with cholesterol in the inner leaflet region of SMO TMD" (p 16). However, a structural study of apo-SMO from the Manglik and Cheng labs (Zhang et al., Nat Comm, 2022) identified a cholesterol molecule docked at the TM5/6 interface and also proposed a "squeezing" mechanism by which cholesterol could enter the TM5/6 pocket from the membrane. The authors do not consider this SMO conformation in their models, nor do they discuss the possibility that conformational dynamics at the TM5/6 interface could facilitate cholesterol flipping and translocation into the hydrophobic conduit, despite both possibilities having precedent in the 2022 empirical cryoEM structural analysis.

      Recommendation: The authors should avoid oversimplifying the SMO cholesterol activation process, either by tempering these claims or broadening their discussion to better reflect the complexity and multiplicity of cholesterol access and activation routes for SMO. They should also consider the 2022 apo-SMO cryoEM structure in their analysis of the TM5/6 translocation pathway.

      We thank the reviewer for this comprehensive overview of the existing literature and of the parts we had failed to include in the discussion. We agree with the reviewer, since our data show that both pathways are probable. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. While we agree that experimental evidence suggests the inner leaflet pathway is possible, we cannot discount the observations made in previous studies that support the outer leaflet pathway, particularly Hedger et al. (2019), Bansal et al. (2023), and Kinnebrew et al. (2021). Therefore, considering the reviewer’s comments, we have made the following changes:

      (1) Added lines: (Paragraph 3, Conclusions):

      “We show that the barriers associated with the pathway starting from the outer leaflet are lower by ∼0.7 kcal/mol (p = 0.0013). We also provide evidence that cholesterol can enter SMO via both leaflets, considering that multiple computational and experimental studies have found cholesterol entry sites and activation modulation via the outer leaflet, between TM2-TM3. This is countered by evidence from multiple experimental and computational studies corroborating entry via the inner leaflet, between TM5-TM6, including this study. Overall, we posit that cholesterol translocation from either pathway is feasible.”

      (2) Changes made: (Paragraph 6, Results subsection 2)

      “Based on our experimental and computational data, we conclude that cholesterol translocation can happen via either pathway. This is supported on the basis of the following observations: mutations along pathway 2 affect SMO activity more significantly, and the presence of a direct conduit that connects the inner leaflet to the TMD binding site. In addition, a resolved structure of SMO in the presence of cholesterol shows a cholesterol situated at the entry point from the membrane into the protein between TM5 and TM6, in the inner leaflet. However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol vs. 6.5 ± 0.8 kcal/mol, p = 0.0013). Additionally, PTCH1 controls cholesterol accessibility in the outer leaflet. This shows that there is a possibility for transport from both leaflets. One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”

      (3) Changes made: (Paragraph 1, Results subsection 2)

      “In a structure resolved in 2022, cholesterol was observed at the interface between the protein and the membrane, in the inner leaflet, between TMs 5 and 6. However, cholesterol in the inner leaflet has a downward orientation, with the polar hydroxyl group pointing intracellularly (η). A striking observation is that this cholesterol binding site pose was never used as a starting point for simulations and was discovered independent of the pose described in Zhang et al. (2022) (Figure 4—Figure Supplement 1).”

      (3) Alternative Possibility: Direct Membrane Access to CRD

      The possibility that the CRD extracts cholesterol directly from the membrane outer leaflet is not considered. While the crystal structures place the CRD in a stable pose above the membrane, multiple cryo-EM studies suggest that the CRD is dynamic and adopts a variety of conformations, raising the possibility that the stability of the CRD in the crystal structures is a result of crystal packing and that the CRD may be far more dynamic under more physiological conditions.

      Recommendation: The authors should explicitly acknowledge and evaluate this potential mechanism and, if feasible, assess its plausibility through MD simulations.

      We thank the reviewer for the suggestion. We have addressed this comment by calculating the distance from the lipid headgroups for each lipid in the membrane to the cholesterol binding site. We show that in our study, we do not observe any bending of the CRD over the membrane, precluding any cholesterol from being extracted from the membrane directly.

      Added lines: (Paragraph 3, Conclusions):

      “An alternative possibility states that the flexibility associated with the CRD would allow it to directly access the membrane, and consequently, cholesterol. In the extensive simulations reported in this study, the binding site of cholesterol in the CRD remains at least 20 Å away from the nearest lipid head group in the membrane, suggesting that such direct extraction and the bending of the CRD do not occur within the timescales sampled (Appendix 2 – Figure 6).

      The mechanistic details of this process are still unexplored and form the basis of future work.”
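      The distance check described in this response can be sketched as follows. This is only an illustrative NumPy sketch, not the authors' analysis code: the function name `min_site_to_headgroup_distance`, the array layout, and the choice to represent the CRD binding site by the centroid of its atoms are all assumptions, and periodic boundary conditions are ignored for simplicity.

```python
import numpy as np

def min_site_to_headgroup_distance(site_xyz, headgroup_xyz):
    """Per-frame minimum distance (in the input units, e.g. Å) between the
    centroid of the binding-site atoms and any lipid headgroup atom.

    site_xyz:      array of shape (n_frames, n_site_atoms, 3)   [hypothetical layout]
    headgroup_xyz: array of shape (n_frames, n_headgroups, 3)   [hypothetical layout]
    Assumption: coordinates are already unwrapped; no periodic imaging is applied.
    """
    site_xyz = np.asarray(site_xyz, dtype=float)
    headgroup_xyz = np.asarray(headgroup_xyz, dtype=float)
    # Centroid of the binding site in every frame: (n_frames, 3)
    centroid = site_xyz.mean(axis=1)
    # Displacement from the centroid to every headgroup: (n_frames, n_headgroups, 3)
    diff = headgroup_xyz - centroid[:, None, :]
    # Euclidean distances, then the closest headgroup per frame
    dist = np.linalg.norm(diff, axis=2)
    return dist.min(axis=1)

# Toy single-frame example: site centroid 30 Å above two headgroups at z = 0
site = np.array([[[-1.0, 0.0, 30.0], [1.0, 0.0, 30.0]]])
heads = np.array([[[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]])
d_min = min_site_to_headgroup_distance(site, heads)
```

      Under these assumptions, a statement such as "at least 20 Å away" corresponds to the smallest value of the returned array over all frames staying at or above 20.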

      (4) Inconsistent Framing of Study Scope and Limitations

      The discussion contains some contradictory and misleading language. For example, the authors state that "In this study we only focused on the cholesterol movement from the membrane to the CRD binding site," and then several sentences later state that "We outline the entire translocation mechanism from a kinetic and thermodynamic perspective." These statements are at odds. The former appropriately (albeit briefly) notes the limited scope of the modeling, while the latter overstates the generality of the findings.

      In addition, the authors’ narrow focus on the CRD site constitutes a major caveat to the entire work. It should be acknowledged much earlier in the manuscript, preferably in the introduction, rather than mentioned as an aside in the penultimate paragraph of the conclusion.

      Recommendation: The authors should clarify the scope of the study and expand the discussion of its limitations. They should explicitly acknowledge that the study models one of several cholesterol access routes and that the findings do not rule out alternative pathways.

      We thank the reviewer for the suggestion. We have addressed this comment by explicitly mentioning the scope of the study.

      Changes made: (Paragraph 3, Conclusions)

      “We outline the entire translocation mechanism from a kinetic and thermodynamic perspective for one of the leading hypotheses for the activation mechanism of SMO.”

      (5) Summary:

      This study has the potential to make a useful contribution to our understanding of cholesterol translocation and SMO activation. However, in its current form, the manuscript presents an overly narrow and, at times, misleading view of the literature and biological models; as such, it is not nearly as impactful as it could be. I strongly encourage the authors to revise the manuscript to include:

      (1) A more balanced discussion of the CRD vs. TMD binding sites.

      (2) Acknowledgment of alternative cholesterol access pathways.

      (3) More comprehensive citation of prior structural and functional studies.

      (4) Clarification of assumptions and scope.

      Of note, the above suggestions require little to no additional MD simulations or experimental studies, but would significantly enhance the rigor and impact of the work.

      We thank the reviewer for the suggestions. We have taken into account the literature and diverse viewpoints. We have changed the initial discussion to reflect a more general outlook. In the revised version of the manuscript, we have refrained from referring to the CRD site as the orthosteric site. Instead, we refer to it as the CRD sterol-binding site. To better represent the dual-site model, we add further discussion in the Introduction. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. We explicitly mention the scope of the study.

    1. eLife Assessment

      This study addresses a key, long-standing question about how visual feature selectivity is organized in mid-level visual cortex, using an ambitious combination of large-scale neural recordings and image synthesis. It provides important insights into the complexity of single-neuron selectivity and suggests a structured organization across cortical depth. While the evidence is generally solid and technically impressive, several key claims would be strengthened by additional controls, particularly regarding the sources of similarity across neurons and the dependence of the results on modeling choices.

    2. Reviewer #1 (Public review):

      Willeke et al. hypothesize that macaque V4, like other visual areas, may exhibit a topographic functional organization. One challenge to studying the functional (tuning) organization of V4 is that neurons in V4 are selective for complex visual stimuli that are hard to parameterize. Thus, the authors leverage an approach comprising digital twins and most exciting images (MEIs) that they have pioneered. This data-driven, deep-learning framework can effectively handle the difficulty of parametrizing relevant stimuli. They verify that the model-synthesized MEIs indeed drive V4 neurons more effectively than matched natural image controls. They then perform psychophysics experiments (on humans) along with the application of contrastive learning to illustrate that anatomically neighboring neurons often care about similar stimuli. Importantly, the weaknesses of the approach are clearly appreciated and discussed.

      Comments:

      (1) The correlation between predictions and data is 0.43. I'd agree with the authors that this is "reliable" and would recommend that they discuss how the fact that performance is not saturated influences the results.

      (2) Modeling V4 using a CNN and claiming that the identified functional groups look like those found in artificial vision systems may be a bit circular.

      (3) No architecture other than ResNet-50 was tested. This might be a major drawback, since the MEIs could very well be reflections of the architecture and also the statistics of the dataset, rather than intrinsic biological properties. Do the authors find the same result with different architectures as the basis of the goal-driven model?

      (4) The closed-loop analysis seems to be using a much smaller sample of the recorded neurons - "resulting in n=55 neurons for the analysis of the closed-loop paradigm".

      (5) A discussion on adversarial machine learning and the adversarial training that was used is lacking.

    3. Reviewer #2 (Public review):

      This is an ambitious and technically powerful study, investigating a long-standing question about the functional organization of area V4. The project combined large-scale single-unit electrophysiology in macaque V4 with deep learning-based activation maximization to characterize neuronal tuning in natural image space. The authors built predictive encoding models for V4 neurons and used these models to synthesize most exciting images (MEIs), which are subsequently validated in vivo using a closed-loop experimental paradigm.

      Overall, the manuscript advances three main claims:

      (1) Individual V4 neurons showed complex and highly structured selectivity for naturalistic visual features, including textures, curvatures, repeating patterns, and apparently eye-like motifs.

      (2) Neurons recorded along the same linear probe penetration tended to have more similar MEIs than neurons recorded at different cortical locations (this similarity was supported by human psychophysics and by distances in a learned, contrastive image embedding space).

      (3) MEIs clustered into a limited number of functional groups that resembled feature visualizations observed in deep convolutional neural networks.

      Strengths:

      (1) The study is important in that it is the first to apply activation maximization to neurons sampled at such fine spatial resolution. The authors used 32-channel linear silicon probes, spanning approximately 2 mm of cortical depth, with inter-contact spacing of roughly 60 µm. This enabled fine sampling across most of the cortical thickness of V4, substantially finer resolution than prior Utah-array or surface-biased approaches.

      (2) A key strength is the direct in vivo validation of model-derived synthetic images by stimulating the same neurons used to build the models, a critical step often absent in other neural network-based encoding studies.

      (3) More broadly, the study highlights the value of probing neuronal selectivity with rich, naturalistic stimulus spaces rather than relying exclusively on oversimplified stimuli such as Gabors.

      Weaknesses:

      (1) A central claim is that neurons sampled within the same penetration shared MEI tuning properties, compared to neurons sampled in different penetrations, because of functional organization. I am concerned that correlations in activity could arise from technical or methodological factors (for example, shared reference or grounding) rather than from functional organization alone. These recordings were obtained with linear silicon probes, and there have been observations that neuronal activity along this type of probe (including Neuropixels probes) may be more correlated than prior work using manually advanced single electrodes showed. For example, Fujita et al. (1992) showed finer micro-domains and systematic changes in selectivity along a cortical penetration, and it is not clear if that is true or detectable here. I think that the manuscript would be strengthened by a more thorough and explicit characterization of lower-level response correlations (at the neuronal electrophysiology level) prior to fitting models. In particular, the authors could examine noise correlations along the electrode shaft (using the repeated test images, for example), as well as signal correlations in tuning, both within and across sessions. It would also be helpful to clarify whether these correlations depended on penetration day, recording chamber hole (how many were used?), or spatial separation between penetrations, and whether repeated use of the same hole yielded stable or changing correlations. Illustrations of the peristimulus time histogram changes across the shaft and across penetrations would also help. All of this would help us understand whether the reports of clustering were technically inevitable given the recording technique.

      (2) It is difficult to understand a story of visual cortex neurons without more information about their receptive field locations and widths, particularly given that the stimulus was full-screen. I understand that there was a sparse random dot stimulus used to find the population RF, so it should be possible to visualize the individual and population RFs. Also, the investigators inferred the locations of the important patches using a masking algorithm, but where were those masks relative to the retinal image, and how distributed were they as a function of the shaft location? This would help us understand how similar each contact was.

      (3) A major claim is that V4 MEIs formed groups that were comparable to those produced by artificial vision systems, "suggesting potential shared encoding strategies." The issue is that the "shared encoding strategy" might be the authors' use of this same class of models in the first place. It would be useful to know if different functional groups arise as a function of other encoding neural network models, beyond the robust-trained ResNet-50. I am unsure to what extent the reported clustering, depth-wise similarity, and correspondence to artificial features depended on architectural and training bias. It would substantially strengthen the manuscript to test whether a similar organizational structure would emerge using alternative encoding models, such as attention-based vision transformers, self-supervised visual representations, or other non-convolutional architectures. Another important point of contrast would be to examine the functional groups encoded by the ResNet architecture before its activations were fit to V4 neuronal activity: put simply, is ResNet just re-stating what it already knows?

      (4) Several comparisons to prior work are presented largely at a qualitative level, without quantitative support. For example, the authors state that their MEIs are consistent with known tuning properties of macaque V4, such as selectivity for shape, curvature, and texture. However, this claim is not supported by explicit image analyses or metrics that would substantiate these correspondences beyond appeal to visual inspection. Incorporating quantitative analyses, for instance, measures of curvature, texture statistics, or comparisons to established stimulus sets, would strengthen these links to prior literature and clarify the relationship between the synthesized MEIs and previously characterized V4 tuning properties.
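      As an illustration of the noise- and signal-correlation analysis suggested in Weakness (1), a minimal NumPy sketch is given here. It is a simplified example under stated assumptions, not a description of the authors' pipeline: the function name `signal_and_noise_correlations` and the `(n_neurons, n_images, n_repeats)` layout are hypothetical, and noise correlations are computed by pooling trial-to-trial residuals across images without per-image z-scoring, which is only one of several conventions.

```python
import numpy as np

def signal_and_noise_correlations(responses):
    """Pairwise signal and noise correlations from repeated presentations.

    responses: array of shape (n_neurons, n_images, n_repeats) holding, e.g.,
               spike counts to repeated test images  [hypothetical layout].
    Returns (signal_corr, noise_corr), each of shape (n_neurons, n_neurons).
    """
    responses = np.asarray(responses, dtype=float)
    n_neurons = responses.shape[0]
    # Signal: correlation of trial-averaged tuning across images
    mean_resp = responses.mean(axis=2)                 # (n_neurons, n_images)
    signal_corr = np.corrcoef(mean_resp)
    # Noise: correlation of trial-to-trial residuals around each image mean,
    # pooled over all images and repeats (simplifying assumption)
    residuals = responses - mean_resp[:, :, None]
    noise_corr = np.corrcoef(residuals.reshape(n_neurons, -1))
    return signal_corr, noise_corr

# Toy example: two units sharing tuning but with exactly opposite trial noise
rng = np.random.default_rng(0)
tuning = rng.normal(size=(1, 5, 1))
noise = rng.normal(size=(1, 5, 4))
toy = np.concatenate([tuning + noise, tuning - noise], axis=0)
sig, noi = signal_and_noise_correlations(toy)
```

      In a sketch like this, the off-diagonal entries of the noise-correlation matrix could be plotted against inter-contact distance along the shaft, which is the kind of comparison the reviewer asks for.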