  1. Jun 2025
    1. eLife Assessment

      This valuable work proposes a novel, rapid S. aureus entry mechanism via Ca²⁺-dependent lysosomal exocytosis and acid sphingomyelinase release, which influences bacterial sub-cellular fate. However, reliance on chemical inhibitors and the absence of a knockout phenotype weaken the overall impact, making the study incomplete.

    2. Reviewer #2 (Public review):

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen are dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while the continued reliance on inhibitor treatment, combined with the lack of a phenotype for the genetic knockout, is a major weakness. While the authors argue for a role of undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, the results do not rule out an SM-independent role in the cellular uptake phenotype of ASM inhibitors.

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      I acknowledge the authors' argument that different ASM inhibitors showing similar phenotypes across different assays points to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      The authors allude to a lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion that the uptake and intracellular trafficking phenotypes are tightly linked. As they point out, these results might be hard to interpret. Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could macrophages provide a system to decouple the uptake and intracellular trafficking effects of ASM (or its inhibitor treatment)?

      The role of ASM on the cell surface remains unclear. The hypothesis proposed by the authors, that localized generation of Cer on the surface by released ASM leads to Cer-enriched platforms, could be plausible, but it is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM-independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      The reviewer acknowledges the technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. A genetically encoded lysosomal Ca2+ sensor such as GCaMP3-ML1 might provide a better way to directly visualize this during inhibitor treatment or S. aureus infection.

    1. eLife Assessment

      The authors modified a common method to induce epilepsy in mice to provide an improved approach to screen new drugs for epilepsy. This is important because of the need to develop new drugs for patients who are refractory to current medications. The authors' method evokes seizures to circumvent a low rate of spontaneous seizures and the approach was validated using two common anti-seizure medications. The strength of evidence was solid, making the study invaluable, but there were some limitations to the approach and methods.

    2. Reviewer #1 (Public review):

      Summary:

      This important study by Chen et al. describes a novel approach for optogenetically evoking seizures in an etiologically relevant mouse model of epilepsy. The authors developed a model that can trigger seizures "on demand" using optogenetic stimulation of CA1 principal cells in mice rendered epileptic by an intra-hippocampal kainate (IHK) injection into CA3. The authors discuss their model in the context of the limitations of current animal models used in epilepsy drug development. In particular, their model addresses concerns regarding existing models where testing typically involves inducing acute seizures in healthy animals or waiting on infrequent, spontaneous seizures in epileptic animals.

      Strengths:

      A strength of this manuscript is that this approach may facilitate the evaluation of novel therapeutics since these evoked seizures, despite having some features that were significantly different from spontaneous seizures, are suggested to be sufficiently similar to spontaneous seizures, which are more laborious to analyze. The data demonstrating the commonality of pharmacology and EEG features between evoked seizures and spontaneous seizures in epileptic mice, while also being different from evoked seizures in naïve mice, are convincing. The structural, functional, and behavioral differences between a seizure-naïve and epileptic mouse, which emerge due to the enduring changes occurring during epileptogenesis, are complex and important. Accordingly, this study highlights the importance of using mice that have undergone epileptogenesis as model organisms for testing novel therapeutics. Furthermore, this study positively impacts the wider epilepsy research community by investigating seizure semiology in these populations.

      Weaknesses:

      This study convincingly demonstrates that the feature space measurements for stimulus-evoked seizures in epileptic mice were significantly different from those in naïve mice; this result allows the authors to conclude that "seizures induced in chronically epileptic animals differed from those in naïve animals". However, the authors also conclude that "induced seizures resembled naturally occurring spontaneous seizures in epileptic animals" despite their own data demonstrating similar, albeit fewer, significant differences in feature space measurements. It is unclear whether there is a threshold, and if so what it is, beyond which significant differences in these feature space measurements are deemed meaningful, as in the comparison of epileptic and naïve mice, rather than not meaningful, as in the comparison of evoked and spontaneous seizures.

    3. Reviewer #2 (Public review):

      The authors aimed to develop an animal model of temporal lobe epilepsy (TLE) that will generate "on-demand" seizures and an improved platform to advance our ability to find new anti-seizure drugs (ASDs) for drug-resistant epilepsy (DRE). Unlike some of the work in this field, the authors are studying actual seizures, and hopefully events that are similar to actual epileptic seizures. To develop an optimized screening tool, however, one also needs high-throughput systems with actual seizures as a quantitative, rigorous, and reproducible outcome measure. The authors aim to provide such a model; however, this approach may be over-stated here and seems unlikely to address the critical issue of drug resistance, which is their most important claim.

      Strengths:

      - The authors have generated an animal model of "on demand" seizures, which could be used to screen new ASDs and potentially other therapies. The authors and their model make a good-faith effort to emulate the epileptic condition and to use seizure susceptibility or probability as a quantitative output measure.

      - The events considered to be seizures appear to be actual seizures, with some evidence that the seizures are different from seizures in the naïve brain. Their effort to determine how different ASDs raise seizure probability or threshold to an optogenetic stimulus to the CA1 area of the rodent hippocampus is focused on an important problem, as many if not most ASD screening uses surrogate measures that may not be as well linked to actual epileptic seizures.

      - Another concern is their stimulation of dorsal hippocampus, while ventral hippocampus would seem more appropriate.

      - Use of optogenetic techniques allows specific stimulation of the targeted CA1 pyramidal cells, and it appears that this approach is reproducible and reliable with quantitative rigor.

      - The authors have taken on a critically important problem, and have made a good-faith effort to address many of the technical concerns raised in the reviews, but the underlying problem of DRE remains.

      Weaknesses:

      - Although the model has potential advantages, it also has disadvantages. As stated by the authors, the pre-test work-load to prepare the model may not be worth the apparent advantages. And most important, the paper frequently mentions DRE but does not directly address it, and yet drug resistance is the critical issue in this field.

      - Although the paper shows examples of actual seizures, there remains some concern that some of the events might not be seizures - or a homogeneous population of seizures. More quantitative assessment of the electrical properties (e.g., duration) of the seizures and their probability is likely to be more useful than the proposed future quantification of the behavioral seizure stages, because the former could be both more objective and automated, while the behavioral analysis of the seizures will likely be more subjective, less reliable, and fraught with analytical problems. Nonetheless, the authors point out that the presence of "Racine 3 or above" behavioral seizures (in addition to their electrical data) is a good argument that many (if not all) of the "seizures" are actual epileptic seizures.

      - Optogenetic stimulation of CA1 provides cell-specificity for the stimulation, but it is not clear that this method would actually be better than electrical stimulation of a kindled rodent with superimposed hippocampal injury. The reader is unfortunately left with the concern of whether this model would be easier and more efficacious than kindling.

      - Although the authors have taken on a critically important problem, and have combined a variety of technologies, this approach may facilitate more rapid screening of ASDs against actual seizures (beneficial), but it does not really address the fundamentally critical yet difficult problem of DRE. A critical issue for DRE that is not well-addressed relates to adverse effects, which is often why many ASDs are not well tolerated by many patients (e.g., LEV). Thus, we are left with: how does this address anti-seizure DRE?

      - The focus of this paper seems to be more on seizures than on epilepsy. In the absence of seizure spontaneity, the work seems to primarily address the issues of seizure spread and duration. Although this is useful, it does not seem to be addressing the question of what trips the system to generate a seizure.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      - The authors seem to have developed a new and useful model; however, it is not clear how this will address the core problem of DRE, which was their stated aim.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      - As stated before in the original review, the potential impact would primarily be aimed at the ETSP or a drug-testing CRO; however, much more work will be required to convince the epilepsy community that this approach will actually identify new ASDs for DRE. The approach is potentially time-consuming with a steep and potentially difficult optimization curve, and thus may not be readily adaptable to the typical epilepsy-models neuroscience laboratory.

      Any additional context you think would help readers interpret or understand the significance of the work:

      - The problem of DRE is much more complicated than described by the authors here; however, the paper could end up being more useful than is currently apparent. Although this work could be seen as technically - and maybe conceptually - elegant and a technical tour de force, will it "deliver on the promise"? Is it better than kindling for DRE? In attempting to improve the discovery process, how will the model move us to another level? Will this model really be any better than others, such as kindling?

    4. Reviewer #3 (Public review):

      This revised paper develops and characterizes a new approach for screening drugs for epilepsy. The idea is to increase the ability to study seizures in animals with epilepsy because most animal models have rare seizures. Thus, the authors use the existing intrahippocampal kainic acid (IHKA) mouse model, which can have very unpredictable seizures with long periods of time between seizures. This approach is of clear utility to researchers who may need to observe many seizure events per mouse during screening of antiseizure medications. A key strength is also that more utility can be derived from each individual mouse. The authors modified the IHKA model to inject KA into CA3 instead of CA1 in order to preserve the CA1 pyramidal cells that they will later stimulate. To express the excitatory opsin channelrhodopsin (ChR2) in area CA1, they use a virus that expresses ChR2 in cells that express the Thy-1 promoter. The authors demonstrate that CA3 delivery of KA can induce a very similar chronic epilepsy phenotype to the injection of KA in CA1 and show that optical excitation of CA1 can reliably induce seizures. The authors evaluate the impact of repeated stimulation on the reliability of seizure induction and show that seizures can be reliably induced by CA1 stimulation, at least for the short term (up to 16 days). These are strengths of the study.

      However, there are several limitations: the seizures are evoked, not spontaneous. It is not clear how induced seizures can be used to investigate if antiseizure medication can reduce spontaneous seizures. Although seizure inducibility and severity can be assessed, the lack of spontaneous seizures is a limitation. To their credit, the authors show that electrophysiological signatures of induced vs spontaneous seizures are similar in many ways, but the authors also show several differences. Notably, the induced seizures are robustly inhibited by the antiseizure medication levetiracetam and variably but significantly inhibited by diazepam, similar to many mouse models with chronic recurrent seizure activity. One also wonders if using a mouse model with numerous seizures (such as the pilocarpine model) might be more efficient than using a modified IHKA protocol.

      In this revised manuscript, the authors address some previous concerns related to definitions of seizures and events that are trains of spikes, sex as a biological variable, and present new images of ChR2 expression (but these images could be improved to see the cells more clearly). A few key concerns remain unaddressed, however. For example, it is still not clear that evoked seizures triggered by stimulating CA1 are similar to spontaneous seizures, regardless of the idea that CA1 plays a role in seizure disorders. It also remains unclear whether repeated activation of the hippocampal circuit will result in additional alterations to this circuit that affect the seizure phenotype over prolonged intervals (after 16 days). Furthermore, the use of SVM with the number of seizures being used as replicates (instead of number of mice) is inappropriate. Another theoretical concern is whether the authors are correct in suggesting that one will be able to re-use the mice for screening multiple drugs in a row.

      Strengths:

      - The authors show that the IHKA model of chronic epilepsy can be modified to preserve CA1 pyramidal cells, allowing optogenetic stimulation of CA1 to trigger seizures.

      - The authors show that repeated optogenetic stimulation of CA1 in untreated mice can promote kindling and induce seizures, indeed generating two mouse models in total.

      - Many electrophysiological signatures are similar between the induced and spontaneous seizures, and induced seizures reliably respond to treatment with antiseizure medications.

      - Given that more seizures can be observed per mouse using on-demand optogenetics, this model enhances the utility of each individual mouse.

      - Mice of each sex were used.

      Weaknesses:

      - Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently justified when using number of seizures as the statistical replicate (vs mice).

      - Related to the first concern, the utility of increasing number of seizures for enhancing statistical power is limited because standard practice is for sample size to be numbers of mice.

      - The term "seizure burden" usually refers to the number of spontaneous seizures per day, not the severity of the seizures themselves. Because the authors are evoking the seizures being studied, this study design precludes assessment of seizure burden.

      - It seems likely that repeatedly inducing seizures will have a long-term effect, especially in light of the downward slope at day 13-16 for induced seizures seen in Figure 4C. A duration of evaluation that is longer than 16 days is warranted.

      - Human epilepsy is extensively heterogeneous in both etiology and individual phenotype, and it may be hard to generalize the approach.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Weaknesses:

      While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant. Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.

      In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (Residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals, while accounting for the variability in seizures within each animal. Therefore, the results we find are of changes that happen across animals, not of individual seizures. We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).
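
      To make the between-animal versus within-animal split concrete, the following is a minimal sketch of how variance can be partitioned in a balanced one-way random-effects design using method-of-moments (ANOVA) estimates. It is not the authors' linear mixed-effects model, which is fitted differently and includes additional terms; the function names, the animal IDs, and the toy values are all hypothetical.

```python
from statistics import mean


def variance_components(by_animal):
    """Method-of-moments estimates for a balanced one-way random-effects design.

    by_animal maps an animal ID to a list of per-seizure feature values.
    Returns (between_animal_variance, within_animal_variance).
    """
    groups = list(by_animal.values())
    k = len(groups)      # number of animals
    n = len(groups[0])   # seizures per animal (must be equal across animals)
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 animals, each with the same number (>= 2) of seizures")
    grand = mean(v for g in groups for v in g)
    # Mean square between animals and mean square within animals.
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    # Negative estimates of the between-animal variance are truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within


def within_fraction(by_animal):
    """Share of total variance attributable to within-animal (residual) variation."""
    var_between, var_within = variance_components(by_animal)
    return var_within / (var_between + var_within)


# Hypothetical toy data: three animals, three seizures each.
toy = {
    "mouse_a": [1.0, 2.0, 3.0],
    "mouse_b": [2.0, 3.0, 4.0],
    "mouse_c": [3.0, 4.0, 5.0],
}
print(within_fraction(toy))
```

      On the toy data the within-animal share is about 0.6. A value near 0.9, as the authors report for their residual term, would mean that seizures differ far more from one another within a mouse than mice differ from each other.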

      Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening also is a key advantage of our induced seizure model.  

      Reviewer 1 (Recommendations for the authors):

      (1) Address why the EEG data comparisons were performed between seizures and not between animals (as explicitly described in the public review). Further, a discussion of the biological significance (or lack thereof) of the effect size differences observed is warranted. This is especially concerning when the authors make the claim that spontaneous and induced seizures are essentially the same while their analysis shows all evaluated feature space parameters were significantly different in the initial 1/3 of the EEG waveforms.

      We made text edits to clarify the use of the linear mixed effects model (page 6, second paragraph, and page 11, first paragraph)

      (2) The authors place great emphasis on the use of clinically/etiologically relevant epilepsy models in drug discovery research. There is discussion criticizing the time points required to enact kindling and the artificial nature of acute seizure induction methods. However, the combination IHK-opto seizure induction model also requires a lengthy timeline. A more tempered discussion of this novel model's strengths may benefit readers.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16.

      (3) The authors should further emphasize the benefit of having an inducible seizure model of focal epilepsy since other mouse models (e.g., genetic or TBI models) may have superior etiological relevance (construct and face validity) but may not be amenable to their optogenetic stimulation approach.

      Thank you for the suggestion. We revised the manuscript to better emphasize the potential significance of our approach. We added a discussion in the 'Application of Models...' section on page 15, second paragraph. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation.

      (4) Suggestion: Provide immunolabeled imagery demonstrating ChR2 presence in Thy1 cells.

      Thank you for the suggestion. We added a fluorescence image showing ChR2 expression in Fig. 2A

      (5) It might be prudent to mention any potential effects of laser heat on hippocampal cell damage, although the 10 Hz, ~10 mW, and 6 s stim is unlikely to cause any substantial burns. Without knowing the diameter and material of the optic fiber, this is left up to some interpretation.

      Thank you for the comments. In the Methods section, we listed the optical fiber diameter as 400 microns (page 17, EEG and Fiber Implantation section). Using 5–18 mW laser power with a relatively large fiber diameter of 400 microns, the power density falls within the range of commonly employed channelrhodopsin activation conditions in vivo. That said, we would like to investigate potential heat effects or cell damage in a follow-up study.
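
      The power density implied by these numbers can be checked with a short calculation. The sketch below assumes that the stated 5–18 mW is the power at the fiber tip and that it is spread uniformly over the 400-micron core; neither assumption is stated in the response, and scattering and absorption in tissue are ignored. The function name is hypothetical.

```python
import math


def irradiance_mw_per_mm2(power_mw, core_diameter_um):
    """Irradiance at the fiber tip, assuming uniform output over the core area."""
    radius_mm = core_diameter_um / 1000.0 / 2.0
    area_mm2 = math.pi * radius_mm ** 2
    return power_mw / area_mm2


# Lower and upper ends of the power range quoted in the response (assumed to be tip power).
for power in (5.0, 18.0):
    print(f"{power:.0f} mW -> {irradiance_mw_per_mm2(power, 400.0):.1f} mW/mm^2")
# prints:
# 5 mW -> 39.8 mW/mm^2
# 18 mW -> 143.2 mW/mm^2
```

      Under these assumptions the tip irradiance spans roughly 40 to 143 mW/mm², and it falls off with depth in tissue.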

      (6) There are instances in the manuscript where the authors describe experimental and analytical parameters vaguely (e.g. "Seizures were induced several times a day", "stimulation was performed every 1 - 3 hours over many days"). These descriptions can and should be more precise.

      Thank you for the comments. To enhance clarity, we added the stimulation protocol in a flowchart format in Fig. S2A, describing how we determined the threshold and proceeded to the drug test. Following this protocol, there was variability in the number of stimulations per day.

      (7) In the second to last paragraph of the discussion, the authors state "However, HPDs are not generalizable across species - they are specific to the mouse model (55)." This statement is inaccurate. The paper cited comes from Dr. Corrine Roucard's lab at Synapcell. In fact, Dr. Roucard argues the opposite (See Neurochem Res (2017) 42:1919-1925).

      Thank you for pointing out the mistake. On page 16, in the first paragraph, reference 55 (now 58 in the revised version) was intended to refer to 'quickly produce dose-response curves with high confidence.' In the revision, we cited another paper reporting that hippocampal spikes were not reproduced in the rat IHK model. R. Klee, C. Brandt, K. Töllner, W. Löscher, Various modifications of the intrahippocampal kainate model of mesial temporal lobe epilepsy in rats fail to resolve the marked rat-to-mouse differences in type and frequency of spontaneous seizures in this model. Epilepsy Behav. 68, 129–140 (2017).

      (8) In the discussion, Levetiracetam is highlighted as an ASM that would not be detected in acute induced seizure models; the authors point out its lack of effect in MES and PTZ. However, LEV is effective in the 6Hz test (also an acute-induced seizure model). This should be stated.

      Thank you for the comments. We highlighted the discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (9) The results text indicates that 9 epileptic mice were used to test LEV and DZP. However, the individual data points illustrated in Figure 5B show N=8 mice. Please correct.

      Thank you for the comments. A total of nine epileptic mice were used to assess two drugs, with the animals being re-used as indicated in the schematic. A total of eight assessments were conducted for DZP with six mice and eight assessments for LEV with five mice. Each assessment included hourly ChR2 activations without an ASM and hourly ChR2 activations after ASM injection.

      (10) Figure 4D: Naïve mice are labeled as solid blue circles in the legend while the data points are solid blue triangles. Please correct.

      Thank you. We corrected the marker in Fig. 4D.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs. This seems to be a problem in other areas of the paper, also.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9, which shows behavioral seizure severity scores observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (2) The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.

      Thank you for the comments. IHK-injected mice had spontaneous tonic-clonic seizures before the start of optical stimulation, as shown in Figure S1.

      (3) The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.

      We appreciate the reviewer’s insights. We added a discussion comparing our model with other existing models in the Discussion section (pages 15 and 16, 'Comparison to Other Seizure Models Used in Pharmacologic Screening' section). In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      (4) The outcome measure for testing LEV and DZP on seizures was essentially the fraction of unsuccessful or successful activations of seizures, where high ASD efficacy is based on showing that the optogenetic stimulation causes fewer seizures when the drug is present. The final outcome measure is thus a percentage, which would still lead to a large number of tests to be assured of adequate statistical power. Thus, there is a concern about whether this proposed approach will have high enough resolution to be more useful than conventional screening methods so that one can obtain actual dose-response data on ASDs.
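
      The statistical-power concern can be made concrete with the standard normal-approximation sample size for comparing two proportions. This is a rough sketch only: it treats each stimulation as an independent trial, whereas in the study stimulations are repeated within the same animals, so a real power analysis would need to account for that clustering. The success probabilities below are hypothetical, not values from the paper.

```python
import math
from statistics import NormalDist


def n_per_condition(p_control, p_drug, alpha=0.05, power=0.80):
    """Stimulations needed per condition to detect p_control vs p_drug.

    Uses the two-sided normal-approximation formula for two independent
    proportions; independence of trials is an assumption.
    """
    if p_control == p_drug:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_drug) / 2
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p_control * (1 - p_control) + p_drug * (1 - p_drug))
    return math.ceil((term_null + term_alt) ** 2 / (p_control - p_drug) ** 2)


# Hypothetical induction probabilities: 90% without drug, 50% or 70% with drug.
print(n_per_condition(0.9, 0.5))
print(n_per_condition(0.9, 0.7))
```

      The required number of stimulations grows quickly as the drug effect shrinks, which is the resolution problem the reviewer raises for building dose-response curves from a percentage outcome.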

      Thank you for the comments. In this revision, we added Supplemental Figure S9, showing the severity of behavioral seizures observed before and during ASM testing for each animal. We observed a reduction in behavioral seizure severity for each subject. We would like to explore using behavioral severity as an outcome measure in a follow-up study.

      (5) The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.

      We appreciate the reviewer’s insights. We revised the manuscript to better emphasize the potential significance of our approach (page 15, second paragraph). The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents a critical goal. However, we believe that a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript, and we would like to explore it in a follow-up study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors should explain why 10 Hz was chosen as the stimulation frequency.

      Thank you for the comment. A frequency of 10 Hz was determined based on previous work using anesthetized animals prepared in an acute in vivo setting. To simplify the paper and avoid confusion, we did not include a discussion on how we determined the frequency. Instead, we added a detailed description of how we optimized the power in a flowchart format in Supplemental Figure S2. We hope this improves reproducibility.

      (2) After micro-injection of KA, morphological changes were observed in the hippocampus, but no comparison of ChR2 expression was made in naïve animals vs KA-injected animals. Presumably, the Thy1-ChR2 mouse expresses GFP in cells that express ChR2. Thus, it may be useful to show the expression of ChR2 in animals with hippocampal sclerosis. This may explain the lack of dramatic difference between stimulation parameters in naïve vs epileptic animals, as shown in supplemental Figure S2.

      Thank you for the suggestion. We added a fluorescence image of ChR2 expression in CA1, ipsilateral to the KA-injected site, in Fig. 2A.

      (3) The authors state that "During epileptogenesis, neural networks in the brain undergo various changes ranging from modification of membrane receptors to the formation of new synapses" and that these changes are critical for successful "on-demand" seizure induction. However, it is not clear or well-discussed whether changes in neuronal cell densities that occur during sclerosis are important for "on-demand" seizure induction as well. Also, the authors showed that naïve animals exhibit a kindling-like effect, but it was unclear whether a similar effect was present in epileptic animals (i.e. do stimulation thresholds to seizure induction change as the animal gets more induction stimulations)? If present, would the secondary kindling affect drug-testing studies (e.g., would the drug effect be different on induced seizure #2 vs induced seizure #20)?

      Thank you for the suggestion. Since this is an important aspect of the model, we would like to address the kindling effect, the secondary kindling effect, and histopathology in a longer-term setting (several weeks) in a follow-up study.

      (4) The authors show that in their model, LEV and DZP were both efficacious. The authors do not seem to mention that, over 25 years ago, LEV was originally missed in the standard ETSP screens, and that it was only discovered outside of the ETSP with the kindling model. The kindling model is now used to screen ASDs. The authors should consider adding this point to the Discussion. It remains unclear, however, if the authors' screening strategy shows advantages over kindling and other such approaches in the field.

      Thank you for the suggestion. We added a discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (5) P8 paragraph 2. The authors state values for naïve animals, but they should also provide values for epileptic animals since they state that the groups were not significantly different (p>0.05). It would be useful to show values for both and state the actual p-value from the test. This issue of stating mean/median values with SD and sample size should be addressed for all data throughout the paper. Additionally, Figure S2 should be added to the manuscript and discussed, as it has data that may be valuable for the reproducibility of the paper.

      Thank you for the suggestion. Figure S2 shows the threshold power required to induce electrographic activity for n = 10 epileptic animals (9.14 ± 4.75 mW) and n = 6 naïve animals (6.17 ± 1.58 mW) (Wilcoxon rank-sum test, p = 0.137). The threshold duration was comparable between the same epileptic animals (6.30 ± 1.64 s) and naïve animals (5.67 ± 1.03 s) (Wilcoxon rank-sum test, p = 0.7133). 
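
      For readers who want to see how a comparison of this kind is computed, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction. It is an illustration only: the software and the exact or approximate variant used in the manuscript are not stated here, and the values at the bottom are made up, not the study's data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic for x, p-value). The p-value uses the normal
    approximation with tie correction and no continuity correction; for
    very small samples an exact test is usually preferable.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Made-up threshold values (mW) for two hypothetical groups.
u_stat, p_value = rank_sum_test([9.0, 12.5, 6.0, 15.0], [5.5, 6.5, 7.0])
```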

      (6) In addition to the other stated references on synaptic reorganization in the CA1 area, the authors should mention similar studies from Esclapez et al. (1999, J Comp Neurol).

      Thank you. We have included the reference in the revision.

      (7) All of the raw EEG data on the seizures should be accessible to the readers.

      Thank you for the suggestion. We will consider depositing EEG data in a publicly accessible site.

      Reviewer 3 (Public review):

      Weaknesses:

      (1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.

      Thank you for the comment. We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9 to show the types of seizures observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.

      In this work, we used a linear mixed-effects model to address two levels of variability—between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model.

      (4) Seizure burden is not easily tested.

      Thank you for the comment. We added Supplemental Figure S9 to summarize the severity of behavioral seizures before and during ASM testing. This addresses the reviewer’s comment on seizure burden. In a follow-up study, we would like to explore this type of outcome measure for drug screening.

      Reviewer 3 (Recommendations for the authors):

      (1) Provide a stronger rationale to use area CA1. For example, the authors mention that CA1 is active during seizure activity, but can seizures originate from CA1? That would make the approach logical and also explain why induced and spontaneous seizures are similar.

      Thank you for the comment. We discussed it in the Discussion section (page 14, first and second paragraphs).

      (2) Explain the use of SVM classifiers so it is more convincing that induced and spontaneous seizures are similar. Or, if they are not similar, explain that this is a limitation.

      We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (3) If feasible, extend the duration over which seizure induction reliability is assessed so that the long-term utility of the model can be demonstrated.

      Thank you for the suggestion. We would like to assess long-term utility in a follow-up study.

      (4) The GitHub link is not yet active. The authors will be required to supply their relevant code for peer evaluation as well as publication.

      Thank you. The GitHub repository is now active.

      (5) State and assess the impacts of sex as a biological variable.

      Thank you for pointing this out. Both female and male animals were included in this study: Epileptic cohort: 7 males, 3 females; Naïve cohort: 3 males, 4 females.

    1. eLife Assessment

      This useful manuscript reports on a new mouse model for LAMA2-MD, a rare but very severe congenital muscular dystrophy. The knockout mice were generated by removing exon3 in the Lama2 gene, which results in a frameshift in exon4 and a premature stop codon. These animals lack any laminin-alpha2 protein and confirm results from previous Lama2 knockout models. Additionally, this study includes weak transcriptomics data that might be a good resource for the field. However, experimental evidence, methods, and data analyses supporting the main claims of the manuscript are incomplete.

    2. Reviewer #1 (Public review):

      Strengths:

      This work adds another mouse model for LAMA2-MD that re-iterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Comments on revisions:

      This is the second revision of a paper focusing on the generation of a CRISPR/Cas9-engineered mouse model for LAMA2-MD. I have reviewed the initial submission, the first revision, and now this second revision. While there have been improvements, several issues still need to be addressed by the authors. I will outline these points without dividing them into major and minor categories:

      Introduction:

      The statement regarding existing mouse models requires correction: The claim, "They were established in the pre-gene therapy era, leaving trace of engineering, such as bacterial elements in the Lama2 gene locus, thus unsuitable for testing various gene therapy strategies," is inaccurate. Current mouse models can indeed be used for testing gene therapy strategies, regardless of whether they contain elements in the Lama2 locus. The primary consideration is whether or not they express laminin-alpha2. Please revise this statement.

      Results Section:

      scRNA-seq:

      The authors note that they analyzed "a total of 8,111 cells from the dyH/dyH mouse brain and 8,127 cells from the WT mouse brain were captured using the 10X Genomics platform (Figure supplement 4A, B)." This is too few cells to support firm conclusions. Furthermore, there is a discrepancy in the referred figure S4, which indicates that 10,094 cells were analyzed for dyH/dyH mice and 10,496 for wild-type mice. Please correct this inconsistency.

      Figure 5C displays differences in cell populations between wild-type and dyH/dyH mice. Given the low number of cells analyzed and the lack of replicates, these differences cannot be considered reliable. More samples should be analyzed to support these findings.

      The data suggest a defect in the BBB for dyH/dyH mice, but this conclusion is based on minimal cell counts and remains purely correlative. If BBB issues exist, experimental validation is necessary, such as injecting dyes into the bloodstream to detect any leakage. I have previously highlighted this in my comments on earlier manuscript versions.

      Bulk RNA-seq:

      The number of samples analyzed here is substantial, making the data potentially more robust. These data could serve as a valuable resource for other researchers. However, it is important to note that all data are correlative and do not provide functional insights.

      Overall:

      The manuscript still lacks significant insights, partly because existing mouse models for LAMA2-MD have been extensively analyzed. While the bulk RNA-seq data offer some value as a resource, I recommend that the authors re-assess their writing and further temper their interpretations of the findings.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      This work adds another mouse model for LAMA2-MD that re-iterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      Thank you for the valuable comments and good suggestions you have proposed, and we have added information and analysis of another mouse model for LAMA2-MD in the updated version 2 of this manuscript.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Thank you for the good comments you have proposed, and we have carefully corrected the overinterpretation and overstatements in the previous updated version.

      Unfortunately, the data on RNA-seq and scRNA-seq are still rather weak. scRNA-seq was conducted with only one mouse resulting in only 8000 nuclei. I am not convinced that the data allow us to interpret them to the extent of the authors. Similar to the first version, the authors infer function by examining expression. Although they are a bit more cautious, they still argue that the BBB is not functional in dy<sup>H</sup>/dy<sup>H</sup> mice without showing leakiness. Such experiments can be done using dyes, such as Evans-blue or Cadaverin. Hence, I would suggest that they formulate the text still more carefully.

      Thank you for the valuable suggestions. We agree that we should perform related functional experiments, such as Evans-blue or Cadaverin assays, to confirm the impaired BBB. However, these experiments have not been done because the first author has been working in the clinic. We have instead added a "Limitations" section, which states: "Even though RNA-seq and scRNA-seq have been performed, the data of scRNA-seq are still insufficient due to the limited number of mouse brains. This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed".

      A similar lack of evidence is true for the suggested cobblestone-like lissencephaly of the mice. There is no strong evidence that this is indeed occurring in the mice (might also be a problem because mice die early). Hence, the conclusions need to be formulated in such a way that readers understand that these are interpretations and not facts.

      Thank you for the valuable suggestions. We agree with this comment and have added the following statement to the Limitations: "This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed". In addition, regarding the cobblestone-like lissencephaly, which has been reported in LAMA2-CMD patients but was not found in the mouse model, we have added the following to the Discussion: "Though the cortical malformations were not found in the dy<sup>H</sup>/dy<sup>H</sup> brains by MRI analysis probably due to the small volume in within 1 month old, Thus, the changes in transcriptomes and protein levels provided potentially useful data for the hypothesis of the impaired gliovascular basal lamina of the BBB, which might be associated with occipital pachygyria in LAMA2-CMD patients."

      Finally, I am surprised that the only improvement in the main figures is the Western blot for laminin-alpha2. The histology of skeletal muscle still looks rather poor. I do not know what the problems are but suggest that the authors try to make sections from fresh-frozen tissue. I anticipate that the mice were eventually perfused with PFA before muscles were isolated. This often results in the big gaps in the sections.

      Thank you for the valuable suggestions. We agree with this comment and acknowledge that the sections should have been made from fresh-frozen tissue. We have therefore added the following statement to the Limitations: "Moreover, due to making sections with PFA before muscles isolated, and not from fresh-frozen tissue, there have been big gaps in the sections which do affect the histology of skeletal muscle to some extent."

      Overall, the work is improved but still would need additional experiments to make it really an important addition to the literature in the LAMA-MD field.

      Thank you for all your good comments and the valuable suggestions.

      Reviewer #2 (Public Review):

      This revised manuscript describes the production of a mouse model for LAMA2-Related Muscular Dystrophy. The authors investigate changes in transcripts within the brain and blood barrier. The authors also investigate changes in the transcriptome associated with the muscle cytoskeleton. Strengths: (1) The authors produced a mouse model of LAMA2-CMD using CRISPR-Cas9. (2) The authors identify cellular changes that disrupted the blood-brain barrier.

      Thank you for your good comments.

      Weaknesses:

      The authors throughout the manuscript overstate "discoveries" which have been previously described, published and not appropriately cited.

      Thank you for your great suggestion. We have toned down the interpretations and overstatements throughout the manuscript, and added words such as "potentially", "possible", "some potential clues", "was speculated to probably", and so on.

      Alterations in the blood-brain barrier and in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published in the literature and are not cited appropriately.

      Thank you for your great suggestion. We agree that alterations in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published, and the related literature has been cited in the updated version 2.0. However, alterations in the blood-brain barrier in LAMA2-CMD have not been extensively studied; only a few papers (such as PMID: 25392494, PMID: 32792907) have investigated or discussed this issue.

      The authors have increased the animal number to N=6, but based on power analysis this is still insufficient, which can result in statistical errors and conclusions that may be incorrect.

      Thank you for your great suggestion. We do agree that the animal number should be increased for Power analysis, and we have added statements in the Limitations with "Finally, due to the limited number of animal samples for the Power analysis, the statistical errors and conclusions might be affected."

      The use of "novel mouse model" in the manuscript overstates the impact of the study.

      Thank you for your great suggestion. We have changed the statement "novel mouse model" throughout the manuscript except the title.

      All studies presented are descriptive and do not add more to the field except for producing yet another mouse model of LAMA2-CMD that is the same as all the others produced.

      Thank you for your comment. We do agree that further functional experiments have not been performed to reveal and confirm the pathogenesis. However, the analysis of phenotype was systematic and comprehensive, including survival time, motor function, serum CK, muscle MRI, muscle histopathology in different stages, and brain histopathology. Moreover, RNA-seq and scRNA-seq in LAMA2-CMD have been seldom performed before, and the data in this study could provide potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD.

      Grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, which is better achieved using ex vivo or in vivo muscle contractility studies.

      Thank you for your great suggestion. We agree that grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength. We have added a related statement to the Limitations: "Grip strength measurements used in this study are considered error prone and do not give an accurate measurement of muscle strength, which would be better achieved using ex vivo or in vivo muscle contractility studies."

      A lack of blinded studies, as pointed out by the authors, is a concern for the scientific rigor of the study.

      Thank you for your great suggestion. We performed the studies with those scoring outcome measures not blinded to the groups. In practice, it was very easy to distinguish the dy<sup>H</sup>/dy<sup>H</sup> mice from the WT/Het mice, because the dy<sup>H</sup>/dy<sup>H</sup> mice were much smaller than the other groups from as early as P7.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      There are multiple grammatical errors throughout the manuscript which should be corrected.

      Thank you for your recommendation. We have carefully corrected the grammatical errors within the manuscript.

      The authors mention no changes in intestinal muscles, but it is unclear if they are referring to skeletal or smooth muscle.

      Thank you for your good comment. The intestinal muscles with no changes in this study refer to smooth muscle, and we have changed the description to "intestinal smooth muscles".

    1. eLife Assessment

      The authors present useful findings on the use of a single-fly behavioral paradigm for assessing different Drosophila genetic models of neurodegeneration. The experimental design and analyses are solid and can be used for quick behavioral assessment in fly models of various neurodegenerative diseases, especially those having an impact on locomotion. The work will be of interest to Drosophila biologists using behavior as a readout for their studies.

    2. Reviewer #1 (Public review):

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioural assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioural data is detailed, and the analysis parameters are well-explained.

      Weaknesses:

      The authors have yet to link cellular physiology to behaviour. It will be interesting to see how future use of this assay helps uncover connections between cellular pathology and behavioural changes.

    3. Reviewer #2 (Public review):

      The manifestation and progression of neurodegenerative disorders are poorly understood. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits, and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The present study very nicely uses the flies' behavioral response to predator-mimicking passing shadows to measure subtle changes in their behavior. The data from various fly genetic models of Parkinson's disease supports their claim. This single trial method is useful to capture the individual animal's response to the threatening stimuli but stops short of capturing the fine ambulatory responses which could provide further information on an individual's behavioral response. By capturing the fine features, the authors could get detailed observations, such as posture, gait or wing positioning, for a better understanding of the behavioral response to the passing shadow.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the possibility to address the Reviewers’ points in this rebuttal. We 

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-setup assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex- is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information compared to bulk assays. We think that this is exactly going one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with testing for potential differences in data dispersion, demonstrated in the novel Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al investigated the use of a single-animal trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease of Drosophila. Different genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within 40 minutes duration and fly responses were defined into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.
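
      As an illustration of how randomized inter-trial intervals can be fitted into a fixed session, the sketch below draws 40 intervals that each exceed a minimum gap and together fit within 40 minutes. The sampling scheme, the minimum interval, and all names are assumptions made for this example; the distribution actually used in the study is not described here.

```python
import random


def randomized_intervals(n_trials=40, session_s=40 * 60, min_iti_s=10.0, seed=None):
    """Return n_trials inter-trial intervals (seconds) in random order.

    Each interval is at least min_iti_s (a hypothetical value), and the
    intervals sum to no more than session_s.
    """
    slack = session_s - n_trials * min_iti_s
    if slack < 0:
        raise ValueError("session too short for the requested minimum interval")
    rng = random.Random(seed)
    # Sorted uniform cut points split the slack time into non-negative gaps.
    cuts = sorted(rng.uniform(0, slack) for _ in range(n_trials))
    intervals, previous = [], 0.0
    for cut in cuts:
        intervals.append(min_iti_s + (cut - previous))
        previous = cut
    return intervals


# Stimulus onset times follow from the cumulative sum of the intervals.
onsets, t = [], 0.0
for iti in randomized_intervals(seed=1):
    t += iti
    onsets.append(t)
```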

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used a looming stimulus, that is, a concentrically expanding shadow, mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not lend themselves easily to provide a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and putting individual flies to linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to accurately determine when individual flies encountered the stimulus, aiding data analysis and scalability.
We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli.  

      (2) Parkinson's disease mutants should be validated with other GAL-4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines, and their respective controls. These yielded largely similar results to the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are demonstrated in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w1118; PBac{PB}DopEcRc02142/TM6B, Tb1. Balancer chromosomes, such as TM6B, Tb can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcRc02142 mutant allele over the balancer chromosome. “We recognize a limitation in using PBac{PB}DopEcRc02142 over the  TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcRc02142 mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we have now corrected in the text. We observed that the arena did not restrict the flies' walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have hyperparameters typical in machine learning applications, the only external parameter being the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds impact the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or if additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
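
      The nested procedure quoted above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the fixed-width speed binning (`bin_width`), the pooling of all trials for the outer model, and the toy trial data are all hypothetical, and only the chi-squared statistic (not its p value) is computed.

```python
import random
from collections import Counter, defaultdict


def fit_outer_model(trials, bin_width):
    """Outer model: P(response | speed bin), estimated from pooled trials.

    `trials` is a list of (speed, response) pairs; binning speeds with a
    fixed `bin_width` is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for speed, response in trials:
        counts[int(speed // bin_width)][response] += 1
    return {
        b: {r: n / sum(c.values()) for r, n in c.items()}
        for b, c in counts.items()
    }


def simulate_genotype(speeds, outer, bin_width, n_virtual=3000, rng=None):
    """Inner model: simulate virtual flies for one genotype.

    Each virtual fly draws a speed from the genotype's empirical speeds,
    then a response from the outer model at that speed. `speeds` must be
    drawn from trials that were used to fit `outer`, so every bin exists.
    """
    rng = rng or random.Random()
    tally = Counter()
    for _ in range(n_virtual):
        speed = rng.choice(speeds)
        probs = outer[int(speed // bin_width)]
        responses = list(probs)
        weights = [probs[r] for r in responses]
        tally[rng.choices(responses, weights=weights)[0]] += 1
    return {r: n / n_virtual for r, n in tally.items()}


def chi_squared_statistic(observed_counts, expected_probs):
    """Pearson chi-squared statistic of observed counts vs. expected probabilities."""
    total = sum(observed_counts.values())
    return sum(
        (observed_counts.get(r, 0) - total * p) ** 2 / (total * p)
        for r, p in expected_probs.items()
        if p > 0
    )


# Toy, made-up data: (speed before the shadow, response type) per trial.
pooled_trials = [(0.5, "stop"), (0.7, "stop"), (2.1, "slow"),
                 (2.4, "run"), (4.2, "run"), (4.8, "run")]
genotype_trials = pooled_trials[:4]  # hypothetical genotype subset

outer = fit_outer_model(pooled_trials, bin_width=1.0)
simulated = simulate_genotype([s for s, _ in genotype_trials], outer,
                              bin_width=1.0, rng=random.Random(0))
observed = Counter(r for _, r in genotype_trials)
print(simulated, chi_squared_statistic(observed, simulated))
```

      If the simulated and observed response distributions for a genotype agree, differences in response types between genotypes can be attributed to differences in walking speed alone; a large statistic would point to additional behavioral factors.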

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed the Kolmogorov-Smirnov test for normality on the data distributions underlying Figure 3, and normality was rejected for all data distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that the ANOVA hypothesis test design is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). While the Kruskal-Wallis test is considered a reasonable non-parametric alternative of one-way ANOVA, there is no clear consensus for a non-parametric alternative of two-way ANOVA. Therefore, we left the two-way ANOVA for Figure 5 in place; however, to increase the statistical confidence in our conclusions, we performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes in accordance with the ANOVA, confirming the results (Stop frequency, DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing down frequency, DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 0.0382; Speeding up frequency, DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 1.36 × 10<sup>-5</sup>). We also changed the post hoc Tukey-tests to post hoc Mann-Whitney tests in the text to be consistent with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey-tests.
Of note, there isn’t a straightforward way of correcting for multiple comparisons in this case, as opposed to Tukey’s ‘honest significance’ approach; we thus report uncorrected p values and suggest considering them at p = 0.01, which minimizes type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear on the Drosophila lines with DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carried reported mutations in dopamine receptors, most likely generating partial knockdown of the respective receptors. We made this clearer by including the full names at the first occurrence of the lines in Results (beyond those in Methods) and adding references to each of the lines.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now emphasize more that the main advantage of our single-trial-based approach lies in the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions, e.g. there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. The average walking speed was added in a scatter plot format, as suggested in the next point of the Reviewer. 

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig.2c that indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos such as in lines 252, 449, 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073-1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. eLife Assessment

      This is an important study that combines replications of findings and novel detailed MRI investigations to assess the impact of environmental enrichment and maternal behavior on mice brain structure at different stages of development. The results and evidence supporting the conclusions are convincing, but in detail, the interpretation is challenging, in particular due to inter-individual and inter-litter variability. The extent to which maternal care mediates the impact of enrichment on brain development during the perinatal period also remains unclear because behavior was observed only during short periods, and the performed analyses are still incomplete. This study will nevertheless be of significant interest to neuroscientists and researchers interested in neurodevelopment in relation to environmental factors because of its in-depth use of MRI to study brain plasticity in mice.

    2. Reviewer #1 (Public review):

      Kaller et al. (2025) explore the impact of environmental enrichment (EE) on the developing mouse brain, specifically during the perinatal period. The authors use high-resolution MRI to examine structural brain changes in neonates (postnatal day 7, P7) and compare these changes to those observed in adulthood. A key aspect of the study is the investigation of maternal care as a potential mediating factor in the effects of perinatal EE on neonatal brain development.

      The work exhibits the following notable strengths:

      (1) The study addresses a significant gap in the literature by investigating the effects of perinatal EE on whole-brain structure in neonates. Previous research has primarily focused on the effects of EE on the adult brain or specific aspects of early development, such as the visual system.

      (2) The authors employ a combination of high-resolution MRI and behavioral analysis of maternal care, providing a comprehensive view of the effects of EE.

      (3) The study reveals that EE affects brain structure as early as P7, with distinct regional changes compared to adulthood. The finding that maternal care influences neonatal brain structure and correlates with the effects of EE is particularly noteworthy.

      (4) The paper is clearly written, well-organized, and easy to follow. The figures and tables are informative and effectively illustrate the key findings.

      However, some weaknesses should be addressed to improve the quality of this study:

      (1) While the study includes an assessment of maternal care, the observational period is relatively short. A more extended or continuous assessment of maternal behavior could provide a more comprehensive understanding of its role in mediating the effects of EE.

      (2) The study primarily focuses on structural brain changes. Investigating the functional consequences of these changes could provide further insights into the long-term impact of perinatal EE.

      (3) The study demonstrates a correlation between maternal care and neonatal brain structure but does not elucidate the underlying mechanisms. Future studies could explore potential molecular or cellular mechanisms involved in these effects.

    3. Reviewer #2 (Public review):

      This paper by Kaller and colleagues combines an interesting replication of findings on the importance of maternal behavior on brain development in the offspring with a state-of-the-art MRI analysis and a novel comparison between such perinatal and early postnatal enrichment via the activity of the mother and a classical enriched environment in the adult. In general, the observations are as one would have expected. Early postnatal enrichment and adult enrichment have differential effects, which is plausible because the source of these changes is environmental, and "environmental" means very different things at these different stages. The three data sets presented are really interesting, and while the comparison between them might not always be as straightforward as it seems, the cross-sectional phenotyping with MRI already provides very important material and allows for interesting insight. Most interesting is possibly the massive effect of housing conditions at P7.

      In particular, the role of individual behavior differs. The authors highlight this role of the interaction with the environment, rather than the environment alone. Maternal care is a process that involves the pup.

      Importantly, the study shows that being born into an enriched environment predates certain changes that are still available after exposure at a later stage, but that there are also important differences. Detailed interpretation of these effects is not easy, however.

      Notably, the study does not include a condition of enrichment from birth into adulthood, and no analysis of the perinatal enrichment effects at an adult age. The timeline can be guessed from Figure 1b, but the authors might in places be more explicit about the fact that, indirectly and sometimes directly, animals of different ages (young adult versus adult) are compared. There is obviously no experience of maternal care in adulthood and no active exploration, etc in childhood. In part, this is what this paper is about, but it requires some thought for the reader to separate the more trivial from the more profound conclusions. Some more guidance would probably be welcome here. In general, Figure 4 is a great idea (and visually very appealing), but the content is not quite clear. "Adults born in EE vs. switched to EE in adulthood": this has, as far as I can tell, not been studied. What is compared are EE effects at two different time-points with two supposedly different mechanisms.

      From such a more mechanistic side, the authors might, for example, want to relate the observed patterns to what is known about the developmental (and plastic) dynamics in the respective brain regions at the given time. But age is a confounder here.

      There is another interesting point that the authors might discuss more prominently. The inter-individual differences in Z-score are dramatic within essentially all groups. So while the mean effects might still be statistically different, a large proportion of animals are within a range of values that could be found in either experimental group. The same is also true for the effects of maternal care, as depicted in Figure 3. While there is, for this ROI, a clear trend that overall relative volume decreases with maternal contact time at each time point, there is a large range of values for each maternal contact time bin. Consequently, neither genetics nor maternal care per se can be the driver of this variation. Part of it will be technical, but the trend in the data indicates that certainly not all of this is noise and technical error.

      This study has some open ends but also provides a very important and interesting direction for future study, corroborating the idea that behavior, maternal and own, does matter.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to investigate the effect of environmental enrichment (EE) during the critical perinatal period on the developing brain structure and compare it with other periods. Different datasets of mice with EE or standard housing (SH) were compared with post-mortem MRI: dataset A (MRI at P96; 13 animals in EE during adulthood P53-P96, 14 animals in SH), dataset P (MRI at P43; 24 animals in EE during perinatal period and adulthood E17-P43, 25 animals in SH) and dataset N (MRI at P7; 52 animals in EE during perinatal period E13-P7, 67 animals in SH / resulting from 5 dams with 2 litters: 4 dams in EE and 6 dams in SH). The study replicated the effects observed during adulthood (main neuroanatomical EE/SH difference in datasets A and P: increase in the hippocampus volume) but also showed that volumetric changes for some regions differ between datasets A and P, suggesting different mechanisms of brain responses to enrichment depending on the period when EE was applied. Results on dataset N further showed that EE leads to lower brain size and differences for various regions: volume reduction in striatum, frontal, parietal, and occipital regions, hippocampus; volume increase for a few thalamic nuclei and hindbrain, suggesting different patterns of perinatal EE effects in datasets P and N. Since mice at P7 show little engagement with their environment, the authors further explored the hypothesis that the dams' behavior and interaction with neonates could be a mediator of brain differences observed at P7 between EE and SH animals. Maternal contact time was related to the P7 volumes for some regions (striatum, brainstem), but the variability and low sample size prevented a clear separation between EE and SH in terms of maternal behaviors.

      Strengths:

      (1) The question raised by this article is important at a fundamental level for our understanding of the complex interactions between the brain, behavior, and the environment.

      (2) This study replicates previous observations on the effects of EE in adult mice.

      (3) While some studies have been performed on neonates of dams exposed to EE during gestation, it is the first time that the effects of perinatal EE are investigated, in both the developing and mature brains with MRI. From a translational perspective, this is crucial for our understanding of human neurodevelopment in interaction with the environment.

      (4) The analyses carried out are numerous and detailed.

      Weaknesses:

      (1) The analyses carried out do not allow us to fully assess whether differences in maternal care mediate the effects of EE on brain structure during development. The observations support this causal hypothesis, but a complete mediation analysis would be useful if permitted by the sample size and the variability observed between litters.

      (2) The article is quite dense to read, given the number of analyses carried out. It is difficult at first reading to get a global view of the results. Figure 4 could be highlighted earlier to present the hypotheses and tests carried out.

      (3) The figures could be more explicit in terms of legends (particularly the supplementary figures).

    1. eLife Assessment

      This manuscript aims to identify the pacemaker cells in the lymphatic collecting vessels - the cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the exemplary use of existing approaches (genetic deletions and cytosolic calcium detection in multiple cell types), the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. The inclusion of scRNAseq and membrane potential data enhances a tremendous study. This fundamental discovery establishes a new standard for the field of lymphatic physiology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the multiple cell types present in the wall of murine collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction.

      Strengths:

      The experiments are rigorously performed, the data justify the conclusions and the limitations of the study are appropriately discussed.

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention.

      Comments on revisions: The authors have addressed all of the reviewer comments. They should be congratulated on their precise and comprehensive study.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine collecting lymphatics. Using state of the art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then using targeted expression of Channel Rhodopsin and GCaMP, the authors convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients as would be expected of a pacemaker in these lymphatics.

      Strengths:

      The use of targeted expression of channel rhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels.

      Weaknesses:

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. These weaknesses have been resolved by revision and addition of new and novel RNAseq data, additional colocalization data and membrane potential measurements.

      Comments on revisions: No additional concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. The authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels.

      Strengths:

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.

      Weaknesses:

      -  More quantitative measurements.

      -  Possible mechanisms associated with the pacemaker activity.

      -  Membrane potential measurements.

      Comments on revisions: I do not have any additional comments.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have done an impressive job in responding to the previous critique and even gone beyond what was asked. I have only very minor comments on this excellent manuscript. The manuscript also needs some light editing for grammar and readability.

      We have worked to improve the grammar and readability of the manuscript.

      Comments:

      Lines 227-234: At what age was tamoxifen administered to the various CreERTM mice?

      We have updated the ages of the mice used in this study in the methods sections.

      UMAP in Figure 5A is missing label for cluster 19.

      The UMAP in Figure 5A has the label for cluster 19 at the center-bottom of the image.

      Supplement Figure 6: Cluster 10 seems to be separate from the other AdvC clusters, and it includes some expression of Myh11 and Notch3. Further, there is low expression of Pdgfra in this cluster, which can be seen in panel B and panels D-I. Are the Pdgfra negative cells in the pie charts from cluster 10? Could the cells in this cluster be more LMC-like than AdvC-like?

      We agree with the reviewer that subcluster 10 of the fibroblasts is intriguing, if only a minor population. This population comprises 77 cells out of 2261 total: 40 of the 77 were Pdgfra+, and of the remaining 37 Pdgfra- cells, 11 were still CD34+. Thus at least half of these cells could be expected to have the PdgfraCreERTM. Only 8 of the 37 were Pdgfra-Notch3+ while 12 cells were Pdgfra+Notch3+, and only 3 were Pdgfra-Myh11+ while 3 were Pdgfra+Myh11+. 26 of 77 cells were Pdgfra+Pdgfrb+ double positive, while 12 of 37 Pdgfra- cells were still Pdgfrb+. Additionally, of the 77 cells in subcluster 10, 17 were positive for Scn3a (Nav1.3), 21 were positive for Kcnj8 (Kir6.1), and 33 were positive for Cacna1c (Cav1.2). These are typically LMC markers, which would support the reviewer's thinking that this group contains a fibroblast-LMC transitional cell type. Only 2 of 77 cells were positive for the BK subunit (Kcnma1), which is a classic smooth muscle marker. Another possibility is that this population represents the Pdgfra+Pdgfrb+ valve interstitial cells we identified in our IF staining and in our reporter mice. Of note, almost all cells in this cluster were Col3a1+ and Vim+. Even though we performed QC analysis to remove doublets, it is also possible that some of these cells represent doublets or contaminants; however, the low percentage of cells expressing Myh11, a very highly expressed gene in LMCs, especially compared to ion channels, suggests this is less likely. Assessing the presence of this particular cell cluster in future RNAseq or with spatial transcriptomics will be enlightening.
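
      The proportions behind these counts can be checked with simple arithmetic. The sketch below is illustrative only: it uses the cell counts quoted in this response, while the variable and function names are hypothetical and not part of the authors' analysis pipeline. The response does not specify what the 2261 total refers to beyond "total", so it is treated here as a plain denominator.

```python
# Cell counts quoted in the response for subcluster 10 (names are hypothetical).
TOTAL_CELLS = 2261          # "2261 total" as stated; exact scope not specified
SUBCLUSTER_10 = 77
PDGFRA_POS = 40
PDGFRA_NEG = SUBCLUSTER_10 - PDGFRA_POS   # remaining Pdgfra- cells
PDGFRA_NEG_CD34_POS = 11


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {
    "subcluster_10_of_total": pct(SUBCLUSTER_10, TOTAL_CELLS),
    "pdgfra_pos": pct(PDGFRA_POS, SUBCLUSTER_10),
    "pdgfra_pos_or_cd34_pos": pct(PDGFRA_POS + PDGFRA_NEG_CD34_POS, SUBCLUSTER_10),
    "scn3a_pos": pct(17, SUBCLUSTER_10),
    "kcnj8_pos": pct(21, SUBCLUSTER_10),
    "cacna1c_pos": pct(33, SUBCLUSTER_10),
    "kcnma1_pos": pct(2, SUBCLUSTER_10),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

      On these numbers, Pdgfra+ cells make up just over half of the subcluster, which is consistent with the "at least half" statement above.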

      Line 360. Proofread section title.

      We have simplified this title to read “Optogenetic Stimulation of iCre-driven Channel Rhodopsin 2”

      Lines 370-371. Are the length units supposed to be microns or millimeters?

      We have corrected this to microns as was intended. Thank you for catching this error.

      The resolution for each UMAP analysis should be stated, particularly for the identification of subclusters. How was the resolution chosen?

      To select the optimal cluster resolution, we used Clustree across a range of resolutions. We examined the resulting tree to identify a resolution where the clusters were well-separated and biologically meaningful, ensuring minimal merging or splitting at higher resolutions. Our goal was to find a resolution that captures relevant cell subpopulations while maintaining distinct clusters without excessive fragmentation. We have now stated the resolution for the subclustering of the LECs, LMCs, and fibroblasts. We have also added greater detail regarding the total number of cells, QC analysis, and the marker identification criteria used to the methods sections. We used a resolution of 0.5 for sub-clustering LMCs, 0.87 for LECs, and 1.0 for fibroblasts. These details are now added to the manuscript.
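
      The idea of reading a clustering tree for stability can be sketched in a few lines. Clustree itself is an R package; the Python below is not the authors' code and does not call Clustree. It is a minimal illustration, under stated assumptions, of one quantity such a tree conveys: for each cluster at a higher resolution, the fraction of its cells that come from a single cluster at the lower resolution. The toy labels, the function names, and the 0.9 threshold are all hypothetical.

```python
from collections import Counter


def in_proportions(low_res, high_res):
    """Map each (parent, child) edge to the fraction of the child cluster's
    cells that come from that parent cluster."""
    edge_counts = Counter(zip(low_res, high_res))
    child_sizes = Counter(high_res)
    return {edge: n / child_sizes[edge[1]] for edge, n in edge_counts.items()}


def is_stable(low_res, high_res, threshold=0.9):
    """True if every child cluster draws at least `threshold` of its cells
    from one parent (threshold is an arbitrary, illustrative choice)."""
    best = {}
    for (_, child), prop in in_proportions(low_res, high_res).items():
        best[child] = max(best.get(child, 0.0), prop)
    return all(prop >= threshold for prop in best.values())


# Toy cluster labels for 8 cells at three hypothetical resolutions.
res_low = [0, 0, 0, 0, 1, 1, 1, 1]
res_mid = [0, 0, 1, 1, 2, 2, 2, 2]   # cluster 0 splits cleanly in two
res_high = [0, 1, 1, 2, 2, 3, 3, 3]  # new clusters mix cells from several parents

print(is_stable(res_low, res_mid))
print(is_stable(res_mid, res_high))
```

      In this toy case the step from the low to the middle resolution is a clean split, whereas the step to the highest resolution mixes parents, which is the kind of over-fragmentation the response describes trying to avoid.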

    1. eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in terms of identifying mechanism as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

    2. Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with an appropriate number of mice, robust phenotypes, and interesting conclusions, and the text is very well written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), should be studied in any future explorations using this model.

      The authors have recognized these limitations to the study in their discussion.

    3. Reviewer #3 (Public review):

      This communication from Sukhina et al. argues that a period of malnutrition (modeled by caloric restriction) causes lasting immune deficiencies (myelopoiesis) not rescued by re-feeding. This is a potentially important paper exploring the effects of malnutrition on immunity, which is a clinically important topic. The revised study adds some details with respect to kinetics of immune compartment and body weight changes, but most aspects raised by the referees were deferred experimentally. Several textual changes have been made to avoid over-interpreting their data. My overall assessment of this revised study is similar to my impression before, which is that while the observations are interesting, there is both a lack of mechanistic understanding of the phenomena and a lack of resolution/detail about the phenomena themselves.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and noting the experimental rigor.  The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential role cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena itself. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR? At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR, i.e., does low energy with relatively more (e.g.) protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (i.e., is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting, as the referee notes. We also agree that whether non-hematopoietic cells contribute, and the role of soluble factors such as G-CSF and leptin in mediating the immunodeficiency, warrant further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding how the timing of the immunosuppression corresponds to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutropenia continuing to worsen under prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8 weeks old C57/Bl6 mice were used that mimic adult malnutrition. I do not understand then why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes cause systemic infection, so bioload was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Discrepancies exist with the absolute cell number and relative abundance data, except for the neutrophil and monocyte data, which makes the data difficult to interpret. For example, for B cells, CD4 and CD8 cells.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests for neutrophil or monocyte function exist to explain the higher bacterial burden in the liver or to connect the numbers with the overall pathogen load

      The rationale for examining both innate and adaptive immunity is not clear; it is even more unclear since the exact same timelines (D0 and D5) were used for examining both innate and adaptive immunity.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of m while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details missing

      (14) Be clear on which marks were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to ease in following the text, as there are areas when there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense. Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." however, your BM data is significant, so this statement doesn't reflect the data you present, please correct.

      b) Figure 2

      - Figure 2d - what tissue is this from, mentioned in the figure? And measure cellularity there. The rationale for why you look only at the spleen here is weak. Also, we would benefit from including the groups without infection here for comparison purposes.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You say the peak of the adaptive immune response, but you never looked at the peak of adaptive immune - when is this? If you have the data, please show it. You also only show d0 and d5 post-infection data for adaptive immunity, so I am unsure where this statement comes from.

      -  How did you identify neutrophils and monocytes through flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. i.e. monocyte numbers reduced, and relative abundance increased, but your text doesn't say this.

      -  Show the flow graph first, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true. Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion about the rationale. We based our methodology on the general guidelines for refeeding protocols for malnourished people. We elected to increase food intake 10% daily to avoid risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, or electrolytes, nor the precise caloric content, that would be given to a human patient. The citation provided offers information from the WHO regarding the complications that can arise during refeeding syndrome; while it is from a document on pediatric care, we did not mean to imply that our method modeled refeeding intervention for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune system as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition all can impact the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14 at the time that the response undergoes contraction. However, since we find that bacterial burden is not properly controlled at earlier time points (day 5), when it is understood the innate immune system is more critical for mediating pathogen clearance, we elected to better characterize the effect malnutrition had on innate immune populations, something less well described in the literature. As phenotypes both in bacterial burden and within innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later time points when readouts could be further confounded by secondary or compounding effects by the lack of early control of infection. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver and the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune response in this model, it is common practice in the field to phenotype immune cell populations in the spleen, while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge this does not provide the full scope of bacterial infection or the immune response in every potentially affected tissue, but nonetheless believe the interpretation that malnourished and previously malnourished animals do not properly control infection and their immune responses are blunted compared to controls still stands.

      The reviewer raised several points about differences in the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy yet the frequency of peripheral T-cells does not decline. It should be noted that absolute number can change when frequency does not and vice versa, due to changes in other cell types within the studied population of cells. As in the case of peripheral lymphocytes in our study, the frequency can stay the same or even increase when the absolute number declines (Supplemental 1). This can occur if other populations of cells decrease further, which is indeed the case as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated, despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations and is why it is important to report both frequency and absolute number, as we have done.
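
      A short worked example may make the frequency-versus-number point concrete. The cell counts below are invented purely for illustration and are not data from the study; the function and variable names are likewise hypothetical. The sketch shows how a population can halve in absolute number while its relative frequency rises, provided another population shrinks more.

```python
def frequencies(counts):
    """Convert absolute cell counts into fractions of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical absolute counts (NOT measured values from the manuscript).
control = {"lymphocytes": 60_000, "myeloid": 40_000}
restricted = {"lymphocytes": 30_000, "myeloid": 10_000}

control_freq = frequencies(control)
restricted_freq = frequencies(restricted)

for name in control:
    fold = restricted[name] / control[name]
    print(
        f"{name}: absolute x{fold:.2f}, "
        f"frequency {control_freq[name]:.0%} -> {restricted_freq[name]:.0%}"
    )
```

      In this toy case lymphocytes fall by half in absolute terms, but because myeloid cells fall to a quarter, the lymphocyte frequency increases.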

      We have made the requested changes to the text to address the reviewer's concerns as noted to improve clarity and accuracy for the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author Response:

      In the Weaknesses, Reviewer 3 suggests that in the Discussion, we comment upon whether WRN ATPase/3’-5’ helicase and WRNIP1 ATPase work on Y-family Pols additively or synergistically to raise fidelity. However, in the Discussion on page 20, we do comment on the role of WRN and WRNIP1 ATPase activities in conferring an additive increase in the fidelity of TLS by Y-family Pols.

    2. eLife Assessment

      This manuscript reports an important finding for understanding the molecular mechanisms of mutagenesis, carcinogenesis, and senescence. It follows a previous report showing that the Werner syndrome protein WRN and its interacting protein WRNIP1 are indispensable for translesion DNA synthesis (TLS) by Y-family DNA polymerases (Pols). The manuscript provides convincing evidence that WRN and WRNIP1 ATPases, in addition to the previously reported role of the WRN 3'>5' exonuclease activity, are essential for promoting the fidelity of replication through DNA lesions by Y-family Pols in human cells.

    3. Reviewer #1 (Public review):

      Summary:

      Y-family polymerases, such as polymerases eta, iota, and kappa, have low fidelity relative to other polymerases involved in DNA replication and repair. This is believed to be due to their active sites being less constrained than those of other polymerases. Paradoxically, work by this lab and others shows that in vivo, these Y-family polymerases are more error-free (less error-prone) during DNA damage bypass than would be expected given their low fidelity. For this reason, the authors have been focusing on other cellular factors that may increase the fidelity of Y-family polymerases. The current paper focuses on two such factors: WRN, which possesses exonuclease and helicase activities, and WRNIP1, which possesses a DNA-dependent ATPase.

      Previously, this group showed that defects in the exonuclease function of WRN lead to a loss in the fidelity of polymerases eta and iota during DNA damage bypass, presumably by removing nucleotide misinsertions. The current paper extends this work by considering the ATPase activities of WRN and WRNIP1. The authors looked at the impact of various amino acid substitutions in these proteins on the fidelity of DNA damage bypass by Y-family polymerases. They did this by both measuring the mutation frequencies of these cell lines as well as the mutation spectra observed in them. They showed that the ATPase activities of both WRN and WRNIP1, as well as the exonuclease activities of WRN, are necessary for the high fidelity of Y-family polymerases in cells. They specifically examined the bypass of cyclobutane pyrimidine dimers by polymerase eta, the bypass of 6-4 photoproducts by polymerases eta and iota, and the bypass of ethenoadenine by polymerase iota. Moreover, they showed that WRNIP1 ATPase defects impair the WRN exonuclease from removing misinsertions by polymerase iota at thymine glycol lesions. These defects generally do not affect the efficiency of the bypass, only its fidelity.

      Strengths:

      The manuscript by Yoon et al is the latest in a series of important and impactful papers by this research group examining the cellular factors that enhance the fidelity of translesion synthesis by Y-family polymerases in human cell lines. Overall, the study is well designed, the data are clearly presented, and the conclusions are well supported and convincing. The authors also discuss a reasonable possibility that complex formation between the WRN and WRNIP1 proteins and Y-family polymerases could tighten the active sites of these polymerases to improve fidelity. Further studies are required to demonstrate this model, but it is a very exciting model that is well supported by the current data.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #2 (Public review):

      The authors of the present study are responsible for a previous study, which also showed that in response to DNA damage, Werner syndrome protein WRN, WRN interacting protein WRNIP1, and Rev1 assemble together with Y-family Pols (Polη, Polι, or Polκ), and that they are indispensable for Trans-Lesion-Synthesis (TLS) (Genes Dev 2024). They also identified a role of WRN's 3'→5' exonuclease activity in the high in vivo fidelity of TLS by Y-family, through UV-induced CPDs by Polη, through N6 ethenodeoxyadenosine (εdA) by Polι, through thymine glycol by Polκ, and through UV-induced (6-4) photoproducts by Polη and Polι. Thus, by removing nucleotides misinserted opposite DNA lesions by the Y-family Pols, WRN's 3'→5' exonuclease activity improves the fidelity of TLS by these Pols. The present work, which follows up on this previous work, reports the crucial role also of the ATPase activities of WRN and WRNIP1 in raising the fidelity of TLS by Y family Pols, in addition to the exonuclease activity, with an entirely different mechanism, which normally consists in unwinding of DNA containing secondary structures.

      By using adequate cell line models and methodologies, notably DNA fiber, TLS, and mutation analyses assays, as well as specific ATPase point mutations, they found that progression of the replication forks through UV lesions was not affected in cells lacking the WRN exonuclease activity as well as the WRN and WRNIP1 ATPase activities, but occurs with a vast increase in error-prone TLS, notably through CPDs by Polη, with differential impacts on the nature of mutations between WRN ATPase and WRNIP1 ATPase. The relative contributions of these activities (exonuclease and ATPase) to the fidelity of TLS Pols, however, vary, depending upon the DNA lesion and the TLS Pol involved. Additionally, defects in these ATPase activities cause mutational hot spot formation in different sequence contexts. The authors provide evidence that the combined action of WRN and WRNIP1 ATPases, along with WRN 3' to 5' exonuclease, confers an enormous rise in the fidelity of TLS by Y-family Pols. They identify the means by which these otherwise highly error-prone TLS Pols have been adapted to function in an error-free manner. They suggest that WRNIP1 ATPases prevent misincorporations while WRN exonuclease removes misinserted nucleotides. This combination confers a vast increase in the fidelity of Y-family Pols, essential for genome stability.

      Overall, this is a comprehensive and thoughtful manuscript, and all the findings reported are convincing and well supported. The data cannot be considered entirely novel, as they follow up on the recent 2024 publication by the same authors who unveiled that the exonuclease activity of WRN and WRNIP1 confers accuracy of TLS. The experimental methods are multiple and rigorous.

    5. Reviewer #3 (Public review):

      Summary:

      Replication through DNA lesions such as UV-induced pyrimidine dimers is mainly performed by Y-family pols. These translesion synthesis (TLS) pols are intrinsically error-prone. However, in living cells, TLS must be conducted in an error-free manner. This manuscript demonstrated that WRN and WRNIP1 ATPases play an important role in addition to WRN 3'>5' exonuclease in human cells.

      Strengths:

      The authors made use of WT human fibroblasts and WRN-deficient cell line for TLS assays in human cells and siRNA knock-down experiments to analyze TLS efficiency. For the cII mutation assay, the big blue mouse embryonic fibroblasts were used. These materials, as well as other Materials and Methods, had already been well established by this group or other groups. The authors used Pol eta, iota, kappa, and theta as TLS pols, and used UV-induced CPD, (6-4)PP, epsilon dA, and thymine glycol as DNA lesions. Thus, the authors examined the generality of their results in terms of TLS pols and DNA lesions.

      Weaknesses:

      Although the main part of this manuscript is the impact of the deficiencies of WRN and WRNIP1 ATPases on TLS by Y-family DNA polymerases, especially on TLS efficiency and mutation spectrum, many readers would be interested in how these ATPases could change the molecular structure of Pol eta, because its structure has been studied for some time.

    1. Author Response:

      We thank the reviewers for their thoughtful feedback and appreciate their recognition of the value of our findings. In response, we are refining the manuscript to clarify key terminology, more clearly describe our image analysis workflows, and temper the interpretation of our results where appropriate. We are planning to perform additional experiments to further investigate the specificity of mRNA co-localization between BK and CaV1.3 channels. We acknowledge the importance of understanding ensemble trafficking dynamics and the functional role of pre-assembly at the plasma membrane, and we plan to explore these questions in future work. We look forward to submitting a revised manuscript that addresses the reviewers’ comments in detail.

    2. eLife Assessment

      This valuable manuscript provides convincing evidence that BK and CaV1.3 channels can co-localize as ensembles early in the biosynthetic pathway, including in the ER and Golgi. The findings, supported by a range of imaging and proximity assays, offer insights into channel organization in both heterologous and endogenous systems. However, mechanistic questions remain unresolved, particularly regarding the specificity of mRNA co-localization, the dynamics of ensemble trafficking, and the functional significance of pre-assembly at the plasma membrane. While the data broadly support the central claims, certain conclusions would benefit from more restrained interpretation and additional clarification to enhance the manuscript's impact and rigor.

    3. Joint Public Review:

      This study presents a valuable contribution to our understanding of ion channel complex assembly by investigating whether BK and CaV1.3 channels begin to form functional associations early in the biosynthetic pathway, prior to reaching the plasma membrane. Using a combination of proximity ligation assays, single-molecule RNA imaging, and super-resolution microscopy, the authors provide convincing evidence that these channels co-localize intracellularly within the ER and Golgi, in both overexpression systems and a relevant endogenous cell model. The study addresses an important and underexplored aspect of membrane protein trafficking and organization, with broader implications for how ion channel signaling complexes are assembled and regulated. The experimental approaches are generally appropriate and the imaging data are clearly presented, with a commendable number of control experiments included. However, several limitations temper the interpretation of the results. The mechanisms underlying mRNA co-localization, and the role of co-translation in complex formation, remain insufficiently defined. Similarly, while intracellular colocalization is convincingly demonstrated, the study does not establish whether such early assembly is the predominant pathway for generating functional complexes at the plasma membrane. More rigorous quantification of channel co-association across compartments, and clarification of key terminology and image analysis methods, would strengthen the overall conclusions. Some of the language in the manuscript would also benefit from a more measured tone to avoid overstating the novelty of the findings. Despite these limitations, the study offers meaningful insights into intracellular ion channel organization and will be of interest to researchers in cell biology, membrane trafficking, and neurophysiology. With focused revisions addressing the outlined points, the manuscript has the potential to make a solid contribution to the field.

    1. eLife Assessment

      This important study explores the role of SIRT2 in regulating Japanese encephalitis virus replication and disease progression in rodent models. The findings presented are novel as sirtuins are known for their roles in aging, metabolism, and cell survival, but have not been studied in the context of viral infections until recently. The evidence supporting the claims is solid, although additional experiments to further characterize the clinical outcomes and directly test the link between acetylated NF-kB and SIRT2 expression would have strengthened the study. The work will be of interest to biologists studying viruses, sirtuins, and inflammation.

    2. Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      Strengths:

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

    3. Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

      We thank the reviewer for the valuable recommendation. We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      Furthermore, we acknowledge reviewers' comments that SIRT2 regulates systemic inflammatory responses and provides potent protection against viral infection. Additionally, NF-κB is not the only mediator of SIRT2's suppression of viral infection; other possible molecular mechanisms are also involved in this process.

      Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

      We thank the reviewer for the valuable recommendation. We are willing to measure NF-kB acetylation in AdSIRT2 JEV-infected cells compared to WT-infected cells, to verify that the acetylation of NF-kB is truly linked to SIRT2 expression levels as per the reviewers' suggestion.

      We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      We accept the reviewer's suggestion that AGK2 can also inhibit other Sirtuins. Thus, to test the contribution of other Sirtuins, the experiment could be repeated using wild-type and Sirt2 KO mice. We are willing to conduct the AGK2 experiment using JEV-infected wild-type and Sirt2 knockout mice.

    1. eLife Assessment

      This valuable study tested whether several months of dolutegravir intensification alters the size of the HIV reservoir as well as immune activation in individuals already on suppressive ART. While the general study approach is appropriate and the paper is well written, the evidence supporting the claims of the authors is incomplete. The title of the paper is only partially supported by the data, based on specific issues with the study design and analysis plan highlighted by Reviewer 1. Specifically, the primary study outcomes were not clearly described a priori, the plausibility of a biologic effect is uncertain based on lack of a consistent effect across participants, and sample size is small. Given a possible observed partial effect and relevant hypothesis, this approach warrants study in a larger trial.

    2. Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A), however this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      (7) Figure 2: IPDA in tissue - was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result. The use of the US RNA / Total DNA ratio is not helpful and is difficult to interpret, since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

    3. Reviewer #2 (Public review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well-documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

    4. Reviewer #3 (Public review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication in the absence of evolution toward resistance have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events that results in differences in virus release for specific cell type (those responsible for "second phase" decay) by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.
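
      The point about reporting the full range can be illustrated with a minimal sketch. The values below are hypothetical (they are not trial data), and the helper name `summarize` is an assumption introduced only for this example. With 10 participants per arm, the interquartile range is determined by the middle few values, so extreme baseline values are visible only in the full range.

```python
import statistics

def summarize(values):
    """Return median, interquartile range, and full range for a small sample."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "median": median,
        "iqr": (q1, q3),
        "range": (ordered[0], ordered[-1]),
    }

# Hypothetical baseline values for one arm of 10 participants (arbitrary units).
arm = [120, 150, 180, 210, 240, 260, 300, 340, 900, 2500]
summary = summarize(arm)
print(summary)
```

      In this made-up arm, the two highest values lie far outside the interquartile range and would go unnoticed if only the median and IQR were tabulated.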

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group. Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C). The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size. The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C. The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups, where the results are less convincing.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline outcomes, measuring related parameters will move in the same direction.

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

    5. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84. If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial ever, in which an ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative, effect of intensification throughout the entire trial period, than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than confirm efficacy in a powered comparison. The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk; therefore, we do not consider it to represent a safety concern.

      We would like to clarify that CD4<sup>+</sup> T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4<sup>+</sup> T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; however, at the subsequent time points, this effect was masked by proliferation (clonal expansion) of infected cells with defective proviruses. This explains why the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, more powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups; rather, we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification, but a technical issue (an inadvertent PBMC contamination during plasma separation) affected the reliability of the results, and we therefore felt uncomfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI dose is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART related to adherence, drug interactions and possibly penetration of antivirals into sanctuary areas of replication, and, as the authors point out, proving it does not occur is likely not possible and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication in the absence of evolution toward resistance have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events that results in differences in virus release for specific cell type (those responsible for "second phase" decay) by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.
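
      The attenuation argument follows from simple arithmetic: the fold change of the US RNA/total DNA ratio equals the US RNA fold change divided by the total DNA fold change, so when both components fall, the ratio moves less than US RNA alone. The sketch below uses hypothetical numbers for a single participant, and the function names are assumptions introduced for illustration only.

```python
def fold_change(baseline, follow_up):
    """Follow-up value divided by baseline; values below 1 indicate a decrease."""
    return follow_up / baseline

def ratio_fold_change(rna_fc, dna_fc):
    """Fold change of the RNA/DNA ratio implied by the two component fold changes."""
    return rna_fc / dna_fc

# Hypothetical participant: US RNA falls to 40% of baseline, total DNA to 80%.
rna_fc = fold_change(baseline=500.0, follow_up=200.0)
dna_fc = fold_change(baseline=1000.0, follow_up=800.0)

# The ratio falls to about 50% of baseline, a smaller drop than US RNA alone.
print(ratio_fold_change(rna_fc, dna_fc))
```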

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we surely agree that in general, the limited sample size impacts statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges do differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close but the full interquartile range in the DTG group stays below 1, while in the control group, the interquartile range is wider and covers approximately equal distance above and below 1. This explains the difference in p values between the groups.
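
      A minimal sketch of this situation, using hypothetical fold changes rather than the trial data: two groups can have nearly identical medians while only one has an interquartile range lying entirely below 1. The helper `median_and_iqr` is an assumption introduced for this example.

```python
import statistics

def median_and_iqr(fold_changes):
    """Median and interquartile range of fold changes from baseline."""
    q1, median, q3 = statistics.quantiles(fold_changes, n=4, method="inclusive")
    return median, (q1, q3)

# Hypothetical fold changes from baseline (1.0 means no change).
intensified = [0.55, 0.60, 0.65, 0.70, 0.72, 0.75, 0.80, 0.85, 0.90, 1.10]
control = [0.30, 0.45, 0.60, 0.70, 0.72, 0.76, 1.20, 1.50, 1.80, 2.20]

for name, values in (("intensified", intensified), ("control", control)):
    median, (q1, q3) = median_and_iqr(values)
    # An IQR entirely below 1 means the middle half of participants all decreased.
    print(name, round(median, 3), (round(q1, 3), round(q3, 3)), q3 < 1.0)
```

      In this made-up example, the medians differ by less than 0.01, yet the control IQR straddles 1, which is the kind of pattern that can yield different within-group p values.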

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4<sup>+</sup> T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline outcomes, measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
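
      To make the trade-off concrete, the following sketch computes the probability of at least one false-positive finding across several tests. It assumes independent tests at a common significance level, which is unlikely to hold exactly for correlated virological and immunological markers, and the number of tests shown is illustrative rather than the number performed in the study.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of comparisons (assumed, not taken from the study).
for n_tests in (1, 5, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests at alpha = 0.05: {rate:.3f}")
```

      Correcting for this inflation lowers the chance of spurious findings but raises the chance of a type II error, which is the concern raised above for an exploratory study.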

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. eLife Assessment

      Tropical single-island endemic bird populations are particularly vulnerable to climate change. The authors investigate genetic evidence of how such species dealt with climate changes in the past as a possible predictor for how they will respond to change in the future, which could provide an important example for the fields of conservation genetics and island biogeography. The authors' integration of genomics and habitat modeling is commendable, but we find that the support for their conclusions is incomplete: at times, the results presented appear to contradict each other, the authors do not fully account for key variables, and the limited taxonomic scope may cause problematic biases for the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors combine PSMC and habitat modeling to try to connect habitat change during the Last Glacial Period to changes in Ne.

      Strengths:

      Observing how tropical single-island endemic bird species responded to habitat change in the past may help inform conservation interventions for these particularly vulnerable species. The combination of genomics and habitat modeling is a good idea - this sort of interdisciplinary thinking is what is needed to tackle these complex questions. Additionally, the use of PSMC makes it possible to perform this analysis on poorly-studied species with only a single genome available.

      Room for Improvement:

      Why coalescent Ne is a better predictor of extinction risk than current genomic diversity, or current Ne, isn't explicitly explained. PSMC in particular has many caveats, and some are not acknowledged or adequately addressed by the authors. For example, the authors note that population structure is a confounding factor with PSMC, but that it is not a problem in this instance. They do not provide compelling evidence for why this would be the case, they simply state that the species studied are all single-island endemics. However, single-island endemic species are not necessarily panmictic; this is even less likely to be true for species studied here that inhabit a large geographic area (ie, Australian species). Differing PSMC parameters may also impact results: the differences between passerines and non-passerines were one of their main results, but they do not provide any analysis to show that this difference was not driven by the different mutation rates used for the two groups.

      Parameters for many steps are not described, and choices that are described (such as the PSMC parameters) are not always fully explained. It is unclear why all data was mapped to the autosomes rather than removing reads that map to the sex chromosomes first. Using all the data, the reads belonging to the sex chromosomes could potentially map to other areas of the genome. It does not seem like a mapping quality filter was used, so these potential spurious alignments would not have been removed prior to analysis.

      There are points where the results are described in ways that appear to potentially differ from the supplementary figures. The authors state that even for species where PSMC results differed between models, "trends of Ne increase or decrease from the LIG to LGM were robust across all three PSMC models considered." The figures in the supplement for Pachycephala philippinensis, Rhynochetos jubatus, and Zosterops hypoxanthus appear to potentially contradict this statement, but it is difficult to tell, as the time period observed is not clearly marked on the graphs. How this robustness of trends was determined is not explained, leaving the precision of the analysis unclear.

      Table 1 also includes some information that contradicts what is in the Supplementary Tables, leading to a lack of clarity. Centropus unirufus, Chaetorhynchus papuensis, and Cnemophilus loriae are not included in Supplementary Table 4. Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species. Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 says "NA" as if it was missing one of these analyses.

      Additionally, some of the results appear to contradict each other. For example, they show that there is no impact of habitat change in larger-bodied species, but also that larger-bodied species saw a decrease in Ne during the LGP. In another example, they state that when a species saw an increase in habitat during the LGP, they also had an increase in Ne. However, they also state that this was not the case for non-passerines.

      Ecosystems are highly complex; there may also be other variables influencing past demographic change other than those explored here. Results should be interpreted with caution.

    3. Reviewer #2 (Public review):

      Summary and strengths:

      In this manuscript, Karjee and colleagues used coalescent-based effective population size reconstruction (PSMC) from single genomes to understand past population trends in island birds and related this to life history traits and glacial patterns. This concept is fairly new, as there are still relatively few multiple PSMC synthesis studies. I also thought that the focus on island endemics was unique and adds value to this paper. I enjoyed seeing a paper focused on South East Asia and think that this could help contribute to our knowledge of the important biodiversity within this region.

      Major weaknesses:

      My biggest concern with this paper is that the analyses are limited to 20-30 species, and significant taxonomic bias is present (there are multiple species of passerine but only 1-2 representatives of other groups). While this is not an issue alone, many of the life history traits or geographical traits are conflated with phylogenetic diversity (e.g., there are no large-bodied passerines). Thus, it is my opinion that the impact of these drivers of past population size is conflated and cannot be disentangled with the current data. The authors themselves state that the core hypothesis surrounding Ne and habitat availability is not supported by their entire dataset (only seen in Passerines). This was not clear enough in the abstract, and conclusions cannot be drawn here as the impact of taxonomy cannot be separated from data richness, traits, etc. The PSMC analysis was done according to the most recent recommendations, and this part of the manuscript is fairly robust. However, in several places, it is incorrectly stated that the PSMC measures or can infer genetic diversity; PSMC only infers past effective population size. It cannot measure genetic diversity in the past. I cannot review the habitat reconstruction modelling as I am a conservation genomics specialist.

      Appraisal:

      I am not convinced about the findings within the paper. I do not think that the results are sufficiently supported at this time, largely due to the conflation of taxonomy with other variables. As this type of comparison is new, I do think that there is a chance for reasonable impact on the field of genomics and island biogeography if the manuscript's constraints are addressed. I do not see scope for impact on conservation at this time and find the conclusions in the abstract regarding conservation relevance to be unfounded.

    4. Author response:

      We thank the editors and the reviewers for their positive comments regarding our manuscript and the methodological approach we have taken to understand the historical demographic response of endemic island birds to climate change. We acknowledge the issues of uneven sample sizes and plan to include additional species of island endemic birds for which genomic data is now available. As requested by reviewer 1, we will also address the issues related to the PSMC analysis in the revised version of the manuscript.

    1. eLife Assessment

      This study presents important findings that enhance our understanding of immune cell interactions in the context of chronic HIV-1 infection. The evidence supporting the conclusions is convincing. The authors have employed appropriate and validated methodologies, including detailed data reprocessing and batch correction to account for inter-donor variability. The inclusion of supplementary figures and analyses, such as cell communication inference, further substantiates the robustness of the findings. Overall, this work contributes to our understanding of HIV-1 immune evasion and highlights potential therapeutic targets for reservoir eradication.

    2. Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomics profiling with scRNA-seq and scATAC-seq datasets to conclude upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells.

      Comments on revisions:

      The authors justified my comments.

    3. Reviewer #3 (Public review):

      The revised manuscript demonstrates a marked improvement over the previous version. The authors have successfully incorporated feedback, and have moreover expanded their analyses.

      The Methods section is now more detailed and meets the requirements for reproducible research. Authors have reprocessed the data, creating an integrated dataset using a previously published single-cell RNA-Seq atlas, which includes both healthy donors and individuals with chronic HIV-1 infection. An additional batch correction step was included into the processing pipeline after the explicit analysis of inter-donor variability within immune subsets, as was suggested.

      Several supplementary figures were added, which both improve the understanding of data and address questions raised by the reviewers. The manuscript also provides additional analysis of cell communication inference, as suggested. The study of interactions between NK cells and infected CD4+ T cells, as well as between monocytes and infected CD4+ T cells, is valuable for understanding the influence of cell signaling on antiviral response and the production of HIV-1 transcripts in infected cells.

      The authors have addressed all the reviewers' suggestions, and the current version of the manuscript is both more comprehensive and more informative. Additional analysis has strengthened the narrative and the reproducibility of the research.

      The resulting manuscript is both more robust and more informative.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the molecular mechanisms underlying HIV-1 persistence and host immune dysfunction in CD4+ T cells during early infection (<6 months). Using single-cell multi-omics technologies-including scRNA-seq, scATAC-seq, and single-cell multiome analyses-they characterized the transcriptional and epigenomic landscapes of HIV-1-infected CD4+ T cells. They identified key transcription factors (TFs), signaling pathways, and T cell subtypes involved in HIV-1 persistence, particularly highlighting KLF2 and Th17 cells as critical regulators of immune suppression. The study provides new insights into immune dysregulation during early HIV-1 infection and reveals potential epigenetic regulatory mechanisms in HIV-1-infected T cells.

      Strengths:

      The study excels through its innovative integration of single-cell multi-omics technologies, enabling detailed analysis of gene regulatory networks in HIV-1-infected cells. Focusing on early infection stages, it fills a crucial knowledge gap in understanding initial immune responses and viral reservoir establishment. The identification of KLF2 as a key transcription factor and Th17 cells as major viral reservoirs, supported by comprehensive bioinformatics analyses, provides robust evidence for the study's conclusions. These findings have immediate clinical relevance by identifying potential therapeutic targets for HIV-1 reservoir eradication.

      We sincerely appreciate the reviewer’s positive evaluation of our work.

      Weaknesses:

      Despite its strengths, the study has several limitations. By focusing exclusively on CD4+ T cells, the study overlooks other relevant immune cells such as CD14+ monocytes, NK cells, and B cells. Additionally, while the authors generated their own single-cell datasets, they need to validate their findings using other publicly available single-cell data from HIV-1-infected PBMCs.

      Thank you to Reviewer #1 for your feedback on our work. In response to this feedback, we have examined cell-cell interactions between HIV-1-infected CD4+ T cells and other innate immune cells, including monocytes and NK cells. We identified altered interaction signaling patterns (e.g., MIF, ICAM2, CCL5, CLEC2B) that contribute to immune dysfunction and viral persistence (page 9, Supplementary Fig. 5). In addition, we validated the expression of KLF2 and its target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which includes both healthy donors and individuals with chronic HIV-1 infection. The upregulation of key KLF2 targets in HIV-1-infected CD4+ T cells from this dataset supports the reproducibility of our findings. We have incorporated these results into the revised Results, Discussion, and Supplementary Materials (page 8, page 12 and Supplementary Fig. 4A).

      Reviewer #2 (Public review):

      Summary:

      The authors observed gene ontologies associated with upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells using scRNA-seq and scATAC-seq datasets from the PBMCs of early HIV-1-infected patients, showing immune responses contributing to HIV pathogenesis and novel targets for viral elimination.

      Strengths:

      The authors carried out detailed transcriptomics profiling with scRNA-seq and scATAC-seq datasets to conclude upregulated KLF2 target genes in HIV-1 RNA+ CD4 T Cells.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      This key observation of upregulation of the KLF2-associated gene family might be important in the HIV field for early diagnosis and viral clearance. However, with the limited sample size and in vivo study model, it is hard to draw firm conclusions. I highly recommend increasing the sample size of early HIV-1-infected patients.

      Thank you to Reviewer #2 for this important comment. We acknowledge the limitations of our modest sample size, which reflects the challenges of recruiting well-characterized individuals in early HIV-1 infection (<6 months) and obtaining high-quality PBMCs for single-cell multi-omic profiling. To strengthen our findings, we validated the upregulation of KLF2 target genes using a publicly available scRNA-seq dataset from HIV-1-infected PBMCs [1], which showed similar expression patterns in HIV-1 RNA+ CD4+ T cells (page 8 and Supplementary Fig. 4A).

      Reviewer #3 (Public review):

      Summary:

      This manuscript studies intracellular changes and immune processes during early HIV-1 infection with an additional focus on the small CD4+ T cell subsets. The authors used single-cell omics to achieve high resolution of transcriptomic and epigenomic data on the infected cells which were verified by viral RNA expression. The results add to understanding of transcriptional regulation which may allow progression or HIV latency later in infected cells. The biosamples were derived from early HIV infection cases, providing particularly valuable data for the HIV research field.

      Strengths:

      The authors examined the heterogeneity of infected cells within CD4 T cell populations, identified a significant and unexpected difference between naive and effector CD4 T cells, and highlighted the differences in Th2 and Th17 cells. Multiple methods were used to show the role of the increased KLF2 factor in infected cells. This is a valuable finding of a new role for the major transcription factor in further disease progression and/or persistence.

      The methods employed by the authors are robust. Single-cell RNA-Seq from PBMC samples was followed by a comprehensive annotation of immune cell subsets, 16 in total. This manuscript presents to the scientific community a valuable multi-omics dataset of good quality, which could be further analyzed in the context of larger studies.

      We sincerely thank the reviewer for the insightful and concise summary of our work.

      Weaknesses:

      Methods and Supplementary materials

      Some technical aspects could be described in more detail. For example, it is unclear how the authors filtered out cells that did not pass quality control, such as doublets and cells with low transcript/UMI content. Next, in cell annotation, what is the variability in cell types between donors? This information is important to include in the supplementary materials, especially with such a small sample size. Without this, it is difficult to determine whether the differences between subsets at the transcriptomic level, viral RNA expression level, and chromatin assessment are due to cell type variations or individual patient-specific variations. For the DEG analysis, did the authors exclude the most variable genes?

      Thank you to Reviewer #3 for these detailed comments and observations. In the revised Methods section (page 16), we have added information on our quality control filtering process. Specifically, we excluded cells with fewer than 200 detected genes, high mitochondrial content (>30%), or low UMI counts. Doublets were identified and removed using DoubletFinder.
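
      The filtering logic described above can be sketched as follows. This is a minimal illustration on plain Python objects, not the Seurat and DoubletFinder pipeline used in the study. The `Cell` record, the `passes_qc` helper, and the `min_umis` value are hypothetical, since the UMI cutoff is not stated here; the 200-gene and 30% mitochondrial thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    n_genes: int      # number of detected genes
    n_umis: int       # total UMI count
    pct_mito: float   # percentage of counts from mitochondrial genes
    is_doublet: bool  # flag assigned by an upstream doublet caller


def passes_qc(cell, min_genes=200, max_pct_mito=30.0, min_umis=500):
    """Return True if the cell survives all filters (min_umis is an assumed value)."""
    return (
        cell.n_genes >= min_genes
        and cell.pct_mito <= max_pct_mito
        and cell.n_umis >= min_umis
        and not cell.is_doublet
    )


# Hypothetical cells, one failing each criterion in turn.
cells = [
    Cell("c1", n_genes=1500, n_umis=4000, pct_mito=5.0, is_doublet=False),
    Cell("c2", n_genes=150, n_umis=4000, pct_mito=5.0, is_doublet=False),
    Cell("c3", n_genes=1500, n_umis=4000, pct_mito=45.0, is_doublet=False),
    Cell("c4", n_genes=1500, n_umis=100, pct_mito=5.0, is_doublet=False),
    Cell("c5", n_genes=1500, n_umis=4000, pct_mito=5.0, is_doublet=True),
]

kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```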

      To address inter-donor variability, we included a new supplementary figure (Supplementary Fig. 1B) showing the distribution of major immune cell types across individual donors. While we observed some variation in cell-type composition between individuals, this likely reflects natural biological heterogeneity in early HIV-1 infection. Additionally, we applied fastMNN batch correction to mitigate donor-specific technical variation. After correction, the overall patterns of gene expression within each major CD4+ T cell subset were consistent across individuals (Supplementary Fig. 1C).

      Regarding the DEG analysis, we used the ‘FindMarkers’ function in Seurat (v.3.2.1), which does not exclude highly variable genes. These details have been clarified in the updated Methods section (page 18).

      The annotation of 16 cell types from PBMC samples is impressive and of good quality, however, not all cell types get attention for further analysis. It’s natural to focus primarily on the CD4 T cells according to the research objectives. The authors also study potential interactions between CD4 and CD8 T cells by cell communication inference. It would be interesting to ask additional questions for other underexplored immune cell subsets, such as: 1) Could viral RNA be detected in monocytes or macrophages during early infection? 2) What are the inferred interactions between NK cells and infected CD4 T cells, are interactions similar to CD4-CD8 results? 3) What are the inferred interactions between monocytes or macrophages and infected CD4 T cells?

      In line with our study objectives, we initially focused on CD4+ T cells as primary HIV-1 targets. However, in response to the reviewer’s comment, we examined the inferred communications between HIV-1-infected CD4+ T cells and other immune cells.

      (1) With regard to the presence of viral RNA in monocytes or macrophages, we observed negligible HIV-1 RNA signal in these cell types in our dataset, consistent with their low permissiveness in early-stage infection [2]. However, we acknowledge the limitations of detecting rare infected cells at the single-cell level.

      (2) We identified increased MIF and ICAM2 signaling between NK cells and HIV-1-infected CD4+ T cells, which are associated with KLF2-mediated immune modulation. These patterns are consistent with the CD4–CD8 interaction results observed in our dataset (Supplementary Fig. 5A).

      (3) Through the cell-cell interaction analysis with differential expression analysis, we inferred reduced CCL5 and CD55 signaling between monocytes and HIV-1-infected CD4+ T cells (Supplementary Fig. 5B). These reductions may potentially impair immune responses and antiviral defense.

      We appreciate the reviewer’s suggestions and believe that the analysis of underexplored immune subsets strengthens the relevance of our findings. These results have been incorporated into the revised Results (page 9).

      Discussion

      It would be interesting to see more discussion of the observation of how naïve T cells produce more viral RNA compared to effector T cells. It seems counterintuitive according to general levels of transcriptional and translational activity in subsets.

      Another discussion block could be added regarding the results and conclusion comparison with Ashokkumar et al. paper published earlier in 2024 (10.1093/gpbjnl/qzae003). This earlier publication used both a cell line-based HIV infection model and primary infected CD4 T cells and identified certain transcription factors correlated with viral RNA expression.

      Thank you to Reviewer #3 for the insightful suggestions. We observed that the proportion of HIV-1-infected naïve CD4 T cells is higher than that of effector T cells. Although effector CD4 T cells are generally more active, previous studies have suggested that naïve CD4 T cells are susceptible to HIV-1 infection during the early stage, which may be associated with initial expansion and rapid progression [3, 4]. This may be due to less restriction by antiviral signaling or more accessible chromatin states in resting cells. We have added this context and cited relevant papers to address this observation (page 11).

      In addition, we have incorporated a comparative discussion with the recent study [5], which identified FOXP1 and GATA3 as transcriptional regulators associated with HIV-1 RNA expression. While these TFs were not significantly differentially expressed in our dataset, we discuss potential reasons for this discrepancy—including differences in infection model (in vitro vs. ex vivo), infection stage (latency vs. acute), and T cell subset composition—and emphasize that both studies highlight the importance of transcriptional regulation in HIV-1 persistence (page 12 and Supplementary Fig. 4B).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study has several notable limitations.

      First, it was restricted to early-stage HIV-1 infection (<6 months) without longitudinal data, preventing the authors from capturing temporal changes in immune cell populations, gene expression profiles, and epigenetic landscapes throughout disease progression.

      Thank you to Reviewer #1 for raising this important limitation. As noted, our study focused exclusively on early-stage HIV-1 infection (<6 months) to capture the initial immune dysregulation and epigenetic alterations. We agree that longitudinal analysis would provide valuable insights into disease progression. However, due to the limited availability of early-infection patient samples suitable for multi-omics profiling, we prioritized capturing a detailed snapshot at this early stage. To address this limitation, future studies incorporating longitudinal sampling, including chronic infection and long-term non-progressors, will be essential to fully elucidate the temporal dynamics of HIV-1 pathogenesis.

      Second, while the bioinformatic analysis compared "Uninfected" and "HIV-1-infected" cells from patients, the authors could have strengthened their findings by incorporating publicly available single-cell data from healthy donors and chronically infected HIV-1 patients to validate their arguments across all figures.

      To support the robustness of our findings, we incorporated a publicly available single-cell RNA-seq dataset [1], which includes both healthy donors and individuals with chronic HIV-1 infection. In this dataset, we validated the upregulation of KLF2 and its target genes in HIV-1-infected CD4+ T cells and observed expression patterns generally consistent with those in our early-infection cohort (page 8; page 12 and Supplementary Fig. S4). Although not all gene-level trends were identical, reflecting differences in infection stage and immune activation status, this external comparison reinforces the reproducibility of key observations and highlights the unique transcriptional features associated with early HIV-1 infection.

      Third, although the study focused on CD4+ T cells as primary HIV-1 targets, it overlooked other important immune cells such as CD8+ T cells, monocytes, and NK cells, which may contribute to viral persistence and immune dysfunction through cell-cell interactions.

      In the revised manuscript, we expanded our analysis to include predicted ligand–receptor interactions between HIV-1-infected and uninfected CD4+ T cells with innate and cytotoxic immune cells using CellChat v.2.1.1. Specifically, we evaluated interactions with NK cells and monocytes and identified altered signaling pathways such as MIF, ICAM2, CCL5, and CLEC2B, which are associated with immune modulation (Supplementary Fig. 5A). We have added these results to the revised Results (page 9).

      Lastly, comparing these findings with other chronic viral infections (e.g., HBV, HCV) would have positioned this work more effectively within the broader field of viral immunology and enhanced its impact.

      We agree that broader comparisons with other chronic viral infections could enhance the impact of our findings. In the current discussion, we noted similarities in interferon signaling disruption with viruses such as HCV and HSV (page 11). Our observation that HIV-1-infected CD4+ T cells exhibit impaired interferon responses is consistent with immune evasion mechanisms reported in HCV and HSV infections. These results underscore both the shared and specific features of immune modulation and persistence during early HIV-1 infection.

      Reviewer #3 (Recommendations for the authors):

      Supplementary Table S1 should indicate which technique was used for sequencing. However, the current version of the table marks no protocol applied to the majority of the samples, which is confusing and needs to be corrected.

      Thank you to Reviewer #3 for pointing out this important oversight. We have revised Supplementary Table S1 to clearly indicate the sequencing method used for each sample. Separate columns for scRNA-seq, scATAC-seq, and sc-Multiome now specify whether each technique was applied (“Yes” or “No”) to improve clarity and transparency.

      (1) Wang, S., et al., An atlas of immune cell exhaustion in HIV-infected individuals revealed by single-cell transcriptomics. Emerg Microbes Infect, 2020. 9(1): p. 2333-2347.

      (2) Arfi, V., et al., Characterization of the early steps of infection of primary blood monocytes by human immunodeficiency virus type 1. J Virol, 2008. 82(13): p. 6557-65.

      (3) Douek, D.C., et al., HIV preferentially infects HIV-specific CD4+ T cells. Nature, 2002. 417(6884): p. 95-8.

      (4) Jiao, Y., et al., Higher HIV DNA in CD4+ naive T-cells during acute HIV-1 infection in rapid progressors. Viral Immunol, 2014. 27(6): p. 316-8.

      (5) Ashokkumar, M., et al., Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation. Genomics Proteomics Bioinformatics, 2024. 22(1).

    1. eLife Assessment

      This study presents valuable findings on the relationship between nutrient availability and NAD/NADH levels, which in turn regulate biomass production in cancer cells. The authors provide solid evidence to support their claims, offering insight into why it is difficult to predict which nutrients limit cancer cell growth: both cell type and nutrient availability together determine the oxidative capacity that constrains the synthesis of various metabolic intermediates. The manuscript will be of interest to researchers working in cancer and cell metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how cellular NAD/NADH ratios are controlled in cancer cell lines in vitro. The authors build on previous work, which shows that serine synthesis is sensitive to NAD/NADH ratios and PHGDH expression. Here, the authors demonstrate that serine synthesis is variable across a panel of cell lines, even when controlling for expression of serine synthesis enzymes such as PHGDH. The authors show that cellular NAD/NADH ratios correlate with the ability to synthesize serine and grow in serine-deprived environments when PHGDH levels remain constant. Investigating this variability in NAD/NADH ratios, the authors find that the cells that can positively respond to serine deprivation are able to increase oxygen consumption and cellular NAD/NADH ratios. Cells that do not increase oxygen consumption in response to serine deprivation do not increase NAD/NADH ratios and cannot grow well without serine. The authors go on to show that in cells with the ability to increase oxygen consumption upon serine deprivation, PHGDH expression alone is sufficient to fully restore growth without serine; in cells that cannot increase oxygen consumption, both PHGDH expression and interventions to increase NAD/NADH ratios are required to increase growth. Thus, cells need both PHGDH and NAD/NADH increases to maximize serine synthesis in response to serine deprivation. The authors previously showed that lipid synthesis likewise requires NAD regeneration. Interestingly, one cell line that does not increase oxygen consumption in response to serine limitation tends to increase oxygen consumption in response to lipid deprivation; accordingly, depriving this cell line of lipids increases the synthesis of serine.

      Together, these findings show that how cells respond to nutrient deprivation is highly variable and that the response to nutrient deprivation (for example, whether or not oxygen consumption is increased) will determine how well cells tolerate depletion of nutrients with related biosynthetic constraints. This work sheds light on the complexity of cancer cell metabolism and helps to explain why it is difficult to predict which nutrients will be limiting to any cancer cell type or environment.

      Strengths:

      (1) The authors use multiple interventions to manipulate NAD/NADH ratios in cells.

      (2) Experiments are well controlled and appropriately interpreted.

      Weaknesses:

      Overall, the data support the conclusions of the manuscript. I have only two minor comments and suggestions:

      (1) Figure 2B/C: data are presented as relative to +serine, which shows how some cells respond to -serine, but it may also be of interest to see how absolute (not relative) NAD/NADH levels correlate with serine synthesis and serine-independent proliferation. In other words, is it the dynamic increase in the ratio that is most important, or the absolute level of the ratio?

      (2) Line 177-178: the authors write, "We hypothesized that the elevated NAD+/NADH ratio represented a cellular response to make the NAD+/NADH ratio more oxidized to enable serine synthesis". I recommend modest edits to avoid anthropomorphizing. It is possible that the ratio responds for reasons yet to be determined and not necessarily because the cell is deliberately trying to enable serine synthesis.

    3. Reviewer #2 (Public review):

      In the manuscript "Cancer cells differentially modulate mitochondrial respiration to alter redox state and enable biomass synthesis in nutrient-limited environments", Chang et al investigate how cancer cells respond to the limitation of certain environmental nutrients by regulating the cellular NAD+/NADH ratio. They focus on serine and lipid metabolism, pathways known to be controlled by the NAD+/NADH ratio, and propose that changes in mitochondrial respiration in response to deprivation of these nutrients can influence the NAD+/NADH ratio, thereby impacting biomass synthesis.

      While the study is descriptive in nature and does not investigate specific molecular mechanisms that explain the crosstalk between nutrient availability and mitochondrial redox changes, the experimental component is robust, and the conclusions are well supported by the results. Some suggestions could further refine the conclusions and enhance the quality of the manuscript.

      Main critiques:

      (1) Throughout the manuscript, the authors utilise the number of cell doublings per day as an endpoint readout of cell proliferation. It would be advisable to include a quantification of the cell number and to display the proliferation rate over time. This would provide valuable insights into the timeline of cellular responses and avoid potential confounding effects associated with the use of Sulforhodamine B dye, an indirect measure of cell proliferation based on protein content, which may be influenced by some of the interventions. Furthermore, it would help determine whether specific treatments reduce cellular doublings as a result of cell death. This concern is particularly evident in treatments with rotenone, e.g., Fig. 1G, where the increase in doublings could be attributed to cell death.

      (2) The authors propose a model in which the deprivation of extracellular nutrients impacts mitochondrial respiration, which in turn increases the NAD+/NADH ratio and ultimately affects metabolic biosynthetic pathways that occur in the cytosol, such as serine biosynthesis. The mechanism by which nutrient availability is sensed and transmitted across different cellular compartments to regulate mitochondrial redox status remains unclear. This concern is particularly relevant for serine metabolism, as its synthesis occurs in the cytosol, but the authors connect it to mitochondrial respiration. Compartment-specific measurements of the NAD+/NADH ratio would help to understand to what extent the redox state is affected by nutrients in the mitochondria and in the cytoplasm (see also minor critiques point 2). Moreover, the genetic tool LbNox could be employed to manipulate the NAD+/NADH ratio in a compartment-specific manner, while also avoiding the toxicity of certain compounds, such as rotenone. This set of experiments would add depth to the investigation, which might otherwise appear too descriptive.
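
      To illustrate critique (1), the following is a minimal sketch of how doublings per day could be computed interval by interval from direct cell counts rather than from a single endpoint. It assumes exponential growth within each interval; the function names and the example counts are hypothetical and are not taken from the manuscript.

```python
import math

def doublings_per_day(n_start, n_end, days):
    """Population doublings per day between two direct cell counts.

    Assumes exponential growth (or decay) over the interval.
    """
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log2(n_end / n_start) / days

def doublings_over_time(counts, times_days):
    """Doubling rate for each consecutive interval of a counting time course."""
    rates = []
    for i in range(len(counts) - 1):
        interval = times_days[i + 1] - times_days[i]
        rates.append(doublings_per_day(counts[i], counts[i + 1], interval))
    return rates

# Hypothetical counts per well at days 0, 1, 2 and 3 (illustrative only).
counts = [10_000, 20_000, 40_000, 30_000]
rates = doublings_over_time(counts, [0, 1, 2, 3])
for day, rate in enumerate(rates, start=1):
    print(f"day {day - 1} to {day}: {rate:+.2f} doublings/day")
```

      A negative rate in any interval indicates net cell loss, which a single endpoint value averaged over the whole experiment would not reveal.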

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides new insights into how cancer cells adapt their metabolism under nutrient-deprived conditions. They find cells respond differentially to serine and lipid deprivation via oxidising the cell redox state, which enables biomass synthesis and cell proliferation. They identified mitochondrial respiration as the major mechanism that dictates the endogenous NAD+/NADH ratio. By incorporating a dual stress paradigm, serine and lipid deprivation, the study further suggests that the NAD+/NADH ratio can serve as a link to orchestrate the complex interplay between multiple nutrient changes in the tumour microenvironment.

      Strengths:

      A novel aspect of this study is the idea that cancer cells are not uniformly passive victims of nutrient limitation; some can actively invoke endogenous NAD+ regeneration to combat nutrient stress. The conclusion is well-supported by comparing multiple cell lines from different tissues and genetic backgrounds, which improves generalizability. While most of the smaller conclusions align with common reasoning and expectations, the step-by-step deduction that leads to a novel 'big picture' is commendable. Another notable strength is the integration of dual stress (lipid and serine deprivation), which better mimics the complex tumor microenvironment with multiple nutrient fluctuations, raising the translational potential of these findings. The observation that lipid-deprived cells can stimulate serine synthesis and support proliferation in a subset of cancer cell lines offers a novel perspective on metabolic plasticity under starvation conditions.

      Weaknesses:

      Although the authors derive a novel and valuable overarching concept, the presentation of this "big picture" is not clearly articulated, making it less accessible to readers outside the immediate field. It would greatly enhance the manuscript to include a clearer summary of the overarching model and its implications. Additionally, discussing the potential clinical significance and applications of the findings would increase the relevance and broader impact of the work. Finally, the manuscript's clarity and credibility are undermined by inconsistent figure labeling and the lack of statistical analysis, particularly for the Western blot data.

      While this study identifies changes in serine synthesis, mitochondrial respiration, PHGDH protein levels, and NAD+/NADH ratio in different cell lines, some of these relationships appear correlative rather than causally established (Figure 2; Figure 5; Figure 6). Some claims are thus overinterpreted. For example, the co-occurrence of increased NAD+/NADH ratio and citrate levels under lipid deprivation in A549 cells does not establish causality (Figure 5). Direct perturbation experiments that manipulate NAD+/NADH and assess downstream effects on citrate synthesis would substantially strengthen the conclusions.

      The study focuses predominantly on mitochondrial respiration as a source of NAD+ regeneration. However, it will also be interesting to check other significant pathways, such as NAD+ salvage, which have been implicated in supporting serine biosynthesis. In addition, the subcellular distribution of NAD+ may distinguish whether some cells are truly redox-unresponsive. Mitochondrial NAD+ regeneration might counteract the cytosolic NAD+ consumption, rendering a relatively stable intracellular NAD+/NADH ratio. The malate-aspartate shuttle can be an interesting aspect.

      The authors should acknowledge the limitations of short-term isotope tracing in their experimental design. Differences in metabolic rates across cell lines can affect the kinetics of metabolite labeling, limiting the direct comparability of metabolic fluxes between them. As a result, observed changes may reflect transient adaptations rather than stable metabolic reprogramming. It is important to clarify that the study primarily captures short-term responses, and the conclusions may not extrapolate to longer-term adaptations or protein-level changes under sustained nutrient stress.

    1. eLife Assessment

      Weiss et al. provide important new insights and convincing evidence to further our mechanistic understanding of how antigen presentation shapes skin persistence of CD8+ TRM. Using a mouse model for inducible genetic ablation of transforming growth factor beta receptor 3 (TGFBR3) in CD8+ T cells, they demonstrate TGFBR3's role in regulating CD8+ TRM persistence in skin. Furthermore, they show that the strength of T cell receptor (TCR) engagement upon initial CD8+ TRM skin seeding has a positive influence on subsequent TRM expansion following a secondary antigen re-encounter. Together, these mechanisms add to our understanding of how the skin CD8+ T cell repertoire is dynamically responsive to topical antigen.

    2. Reviewer #1 (Public review):

      Summary:

      Weiss et al. seek to delineate the mechanisms by which antigen-specific CD8+ T cells outcompete bystanders in the epidermis when active TGF-b is limiting, resulting in selective retention of these cells and more complete differentiation into the TRM phenotype.

      Strengths:

      They begin by demonstrating that at tissue sites where cognate antigen was expressed, CD8+ T cells adopt a more mature TRM transcriptome than cells at tissue sites where cognate antigen was never expressed. By integrating their scRNA-Seq data on TRM with the much more comprehensive ImmGenT atlas, the authors provide a very useful resource for future studies in the field. Furthermore, they conclusively show that these "local antigen-experienced" TRM have increased proliferative capacity and that TCR avidity during TRM formation positively correlates with their future fitness. Finally, using an elegant experimental strategy, they establish that TCR signaling in CD8+ T cells in epidermis induces TGFBRIII expression, which likely contributes to endowing them with a competitive advantage over antigen-inexperienced TRM.

      Weaknesses:

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
      In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulatory effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not solely rely on TRM precursors at the DNFB-treated skin site at the time of peptide treatment.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to dissect the mechanistic basis of their previously published finding that encountering cutaneous antigen augments the persistence of CD8+ memory T cells that enter skin (TRM) (Hirai et al., 2021, Immunity). Here they use the same murine model to study the fate of CD8+ T cells after antigen-priming in the lymph nodes, comparing (1) those that re-encounter antigen in the skin via vaccinia virus (VV) with (2) those that do not encounter antigen in skin but rather are recruited via topical dinitrofluorobenzene (DNFB) (so-called "bystander TRM"). The authors' previous publication establishes that this first group of CD8+ TRM has a persistence advantage over bystander TRM under TGFb-limiting conditions. The current paper advances this finding by elucidating the role of TGFBR3 in regulating CD8+ TRM skin persistence upon topical antigen exposure. The key novelty of the work lies in the generation and use of the CD8+ T cell-specific TGFBR3 knockout model, which allows them to demonstrate the role of TGFBR3 in fine-tuning the degree of CD8+ T cell skin persistence and that TGFBR3 expression is promoted by CD8+ TRM encountering their cognate antigen upon initial skin entry. Future work directly measuring active TGFb in the skin under different conditions would help identify physiologic scenarios that yield active TGFb-limiting conditions, thus establishing physiologic relevance.

      Strengths:

      Technical strengths of the paper include (1) complementary imaging and flow cytometry analyses, (2) integration of their scRNA-seq data with the existing CD8+ TRM literature via pathway analysis, and (3) use of orthogonal models where possible. Using a vaccinia virus (VV) model, with and without ovalbumin (OVA), the authors investigate how topical antigen exposure and TCR strength regulate CD8+ TRM skin recruitment and retention. The authors use both FTY720 and a Thy1.1 depleting antibody to demonstrate that skin CD8+ TRM expand locally following both a primary and secondary recall response to topical OVA application.

      A conceptual strength of the paper is the authors' observation that TCR signal strength upon initial TRM tissue entry helps regulate the extent of their local re-expansion on subsequent antigen re-exposure. They achieved this by applying peptides of varying affinity for the OT-I TCR on the DNFB-exposed flank in tandem with initial VV-OVA + DNFB treatment. They then measured TRM expansion after OVA peptide rechallenge, revealing that encountering a higher-affinity peptide upon skin entry leads to greater subsequent re-expansion. Additionally, by generating an OT-I Thy1.1+ E8i-creERT2 huNGFR Tgfbr3fl/fl (Tgfbr3∆CD8) mouse, the authors were able to elucidate a unique role for TGFBR3 in CD8+ TRM persistence when active TGFb in skin is limited.

      Weaknesses:

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high versus low avidity topical OVA-peptide exposure. The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could perform analysis of their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB versus VV-treated skin.

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data. Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only-treated mice, would help contextualize the results seen in dually-treated mice in Figure 1. In figure legends, we suggest clearly reporting unpaired t-tests comparing relevant metrics within VV or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F). Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model.

    1. eLife Assessment

      This study presents a useful framework to extract the individuality index to predict subjects' behavior in the target tasks. However, the evidence supporting such a framework is somewhat incomplete and would benefit from overall framing and clarity on its approaches. Overall, this study would be of interest to cognitive and AI researchers who work on cognitive models in general.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript presents EIDT, a framework that extracts an "individuality index" from a source task to predict a participant's behaviour in a related target task under different conditions. However, the evidence that it truly enables cross-task individuality transfer is not convincing.

      Strengths

      The EIDT framework is clearly explained, and the experimental design and results are generally well-described. The performance of the proposed method is tested on two distinct paradigms: a Markov Decision Process (MDP) task (comparing 2-step and 3-step versions) and a handwritten digit recognition (MNIST) task under various conditions of difficulty and speed pressure. The results indicate that the EIDT framework generally achieved lower prediction error compared to baseline models and that it was better at predicting a specific individual's behaviour when using their own individuality index compared to using indices from others.

      Furthermore, the individuality index appeared to form distinct clusters for different individuals.

      Weaknesses

      (1) Because the "source" and "target" tasks are merely parameter variations of the same paradigm, it is unclear whether EIDT achieves true cross-task transfer. The manuscript provides no measure of how consistent each participant's behaviour is across these variants (e.g., two- vs three-step MDP; easy vs difficult MNIST). Without this measure, the transfer results are hard to interpret. In fact, Figure 5 shows a notable drop in accuracy when transferring between the easy and difficult MNIST conditions, compared to transfers between accuracy-focused and speed-focused conditions. Does this discrepancy simply reflect larger within-participant behavioural differences between the easy and difficult settings? A direct analysis of intra-individual similarity for each task pair - and how that similarity is related to EIDT's transfer performance - is needed.

      (2) Related to the previous comment, the individuality index is central to the framework, yet remains hard to interpret. It shows much greater within-participant variability in the MNIST experiment (Figure S1) than in the MDP experiment (Figure 3). Is such a difference meaningful? It is hard to know whether it reflects noisier data, greater behavioural flexibility, or limitations of the model.

      (3) The authors suggest that the model's ability to generalize to new participants "likely relies on the fact that individuality indices form clusters and individuals similar to new participants exist in the training participant pool". It would be helpful to directly test this hypothesis by quantifying the similarity (or distance) of each test participant's individuality index to the individuals or identified clusters within the training set, and assessing whether greater similarity (or closer proximity) to the clusters in the training set is associated with higher prediction accuracy for those individuals in the test set.
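
      The analysis requested in point (3) could be sketched roughly as follows: for each test participant, take the distance from their individuality index to the nearest training participant's index, then rank-correlate that distance with prediction accuracy. Euclidean distance, the Spearman correlation and all values below are assumptions made for illustration; none of them come from the manuscript.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two index vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_training_distance(test_index, training_indices):
    """Distance from one test participant's index to the closest training index."""
    return min(euclidean(test_index, t) for t in training_indices)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical two-dimensional indices and accuracies (illustrative only).
training_indices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_indices = [(0.1, 0.0), (1.5, 0.5), (3.0, 3.0)]
test_accuracy = [0.9, 0.7, 0.5]

distances = [nearest_training_distance(t, training_indices) for t in test_indices]
rho = spearman(distances, test_accuracy)
print(f"Spearman rho between distance and accuracy: {rho:.2f}")
```

      Under the authors' hypothesis, the correlation would be negative: test participants far from every training participant should be predicted less well.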

    3. Reviewer #2 (Public review):

      This paper introduces a framework for modeling individual differences in decision-making by learning a low-dimensional representation (the "individuality index") from one task and using it to predict behaviour in a different task. The approach is evaluated on two types of tasks: a sequential value-based decision-making task and a perceptual decision task (MNIST). The model shows improved prediction accuracy when incorporating this learned representation compared to baseline models.

      The motivation is solid, and the modelling approach is interesting, especially the use of individual embeddings to enable cross-task generalization. That said, several aspects of the evaluation and analysis could be strengthened.

      (1) The MNIST SX baseline appears weak. RTNet isn't directly comparable in structure or training. A stronger baseline would involve training the GRU directly on the task without using the individuality index, e.g., by fixing the decoder head. This would provide a clearer picture of what the index contributes.

      (2) Although the focus is on prediction, the framework could offer more insight into how behaviour in one task generalizes to another. For example, simulating predicted behaviours while varying the individuality index might help reveal what behavioural traits it encodes.

      (3) It's not clear whether the model can reproduce human behaviour when acting on-policy. Simulating behaviour using the trained task solver and comparing it with actual participant data would help assess how well the model captures individual decision tendencies.

      (4) Figures 3 and S1 aim to show that individuality indices from the same participant are closer together than those from different participants. However, this isn't fully convincing from the visualizations alone. Including a quantitative presentation would help support the claim.

      (5) The transfer scenarios are often between very similar task conditions (e.g., different versions of MNIST or two-step vs three-step MDP). This limits the strength of the generalization claims. In particular, the effects in the MNIST experiment appear relatively modest, and the transfer is between experimental conditions within the same perceptual task. To better support the idea of generalizing behavioural traits across tasks, it would be valuable to include transfers across more structurally distinct tasks.

      (6) For both experiments, it would help to show basic summaries of participants' behavioural performance. For example, in the MDP task, first-stage choice proportions based on transition types are commonly reported. These kinds of benchmarks provide useful context.

      (7) For the MDP task, consider reporting the number or proportion of correct choices in addition to negative log-likelihood. This would make the results more interpretable.

      (8) In Figure 5, what is the difference between the "% correct" and "% match to behaviour"? If these are distinct measures, it would help to clarify the distinction in the text or figure captions.

      (9) For the cognitive model, it would be useful to report the fitted parameters (e.g., learning rate, inverse temperature) per individual. This can offer insight into what kinds of behavioural variability the individuality index might be capturing.

      (10) A few of the terms and labels in the paper could be made more intuitive. For example, the name "individuality index" might give the impression of a scalar value rather than a latent vector, and the labels "SX" and "SY" are somewhat arbitrary. You might consider whether clearer or more descriptive alternatives would help readers follow the paper more easily.

      (11) Please consider including training and validation curves for your models. These would help readers assess convergence, overfitting, and general training stability, especially given the complexity of the encoder-decoder architecture.
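
      As one possible form of the quantitative presentation asked for in point (4), the sketch below compares pairwise distances between individuality indices from the same participant with those from different participants. The data layout (several index vectors per participant), the Euclidean metric and the toy values are assumptions for illustration only.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two index vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def within_between_distances(indices_by_participant):
    """Split all pairwise distances into same-participant and different-participant sets.

    indices_by_participant maps a participant id to a list of index vectors,
    e.g. one vector per session or per data split (an assumed layout).
    """
    labelled = [
        (pid, vec)
        for pid, vectors in indices_by_participant.items()
        for vec in vectors
    ]
    within, between = [], []
    for (p1, v1), (p2, v2) in combinations(labelled, 2):
        target = within if p1 == p2 else between
        target.append(euclidean(v1, v2))
    return within, between

def mean(values):
    return sum(values) / len(values)

# Hypothetical indices for two participants (illustrative only).
toy_indices = {
    "P1": [(0.0, 0.0), (0.0, 1.0)],
    "P2": [(5.0, 5.0), (5.0, 6.0)],
}
within, between = within_between_distances(toy_indices)
ratio = mean(within) / mean(between)
print(f"mean within = {mean(within):.2f}, mean between = {mean(between):.2f}, ratio = {ratio:.2f}")
```

      A ratio well below 1 would support the clustering claim. A permutation test that shuffles participant labels could then attach a p-value to the observed ratio.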

    4. Reviewer #3 (Public review):

      Summary:

      This work presents a novel neural network-based framework for parameterizing individual differences in human behavior. Using two distinct decision-making experiments, the authors demonstrate the approach's potential and claim it can predict individual behavior (1) within the same task, (2) across different tasks, and (3) across individuals. While the goal of capturing individual variability is compelling and the potential applications are promising, the claims are weakly supported, and I find that the underlying problem is conceptually ill-defined.

      Strengths:

      The idea of using neural networks for parameterizing individual differences in human behavior is novel, and the potential applications can be impactful.

      Weaknesses:

      (1) To demonstrate the effectiveness of the approach, the authors compare a Q-learning cognitive model (for the MDP task) and RTNet (for the MNIST task) against the proposed framework. However, as I understand it, neither the cognitive model nor RTNet is designed to fit or account for individual variability. If that is the case, it is unclear why these models serve as appropriate baselines. Isn't it expected that a model explicitly fitted to individual data would outperform models that do not? If so, does the observed superiority of the proposed framework simply reflect the unsurprising benefit of fitting individual variability? I think the authors should either clarify why these models constitute fair control or validate the proposed approach against stronger and more appropriate baselines.

      (2) It's not very clear in the results section what is meant by within-individual distances being shorter than between-individual distances. Related to the comment above, is there any control analysis performed for this? Also, this analysis appears to have nothing to do with predicting individual behavior. Is this evidence toward successfully parameterizing individual differences? Could this be task-dependent, especially since the transfer is evaluated on exceedingly similar tasks in both experiments? I think a bit more discussion of the motivation and implications of these results will help the reader in making sense of this analysis.

      (3) The authors have to better define what exactly they mean by transferring across different "tasks" and testing the framework in "more distinctive tasks". All presented evidence, taken at face value, demonstrates transfer across different "conditions" of the same task within the same experiment. It is unclear to me how generalizable the framework will be when applied to different tasks.

      (4) Conceptually, it is also unclear to me how plausible it is that the framework could generalize across tasks spanning multiple cognitive domains (if that's what is meant by more distinctive). For instance, how can an individual's task performance on a Posner task predict task performance on the Cambridge face memory test? Which part of the framework could have enabled such a cross-domain prediction of task performance? I think these questions have to be at least discussed to some extent, since without that discussion the future direction is meaningless.

      (5) How is the negative log-likelihood, which seems to be the main metric for comparison, computed? Is this based on trial-by-trial response prediction or probability of responses, as is usually done in cognitive modelling?

      (6) None of the presented evidence is cross-validated. The authors should consider performing K-fold cross-validation on the train, test, and evaluation split of subjects to ensure robustness of the findings.

      (7) The authors excluded 25 subjects (20% of the data) for different reasons. This is a substantial proportion, especially by the standards of what is typically observed in behavioral experiments. The authors should provide a clear justification for these exclusion criteria and, if possible, cite relevant studies that support the use of such stringent thresholds.

      (8) The authors should do a better job of creating the figures and writing the figure captions. It is unclear which specific claim the authors are addressing with each figure. For example, what is the key message of Figure 2C regarding transfer within and across participants? Why is the presentation of statistics different between the Cognitive model and the EIDT framework plots? In Figure 3, it's unclear what these dots and clusters represent and how they support the authors' claim that the same individual forms clusters. Also, doesn't this experiment have 98 subjects after exclusion? The plot has far fewer than 98 dots, as far as I can tell. Furthermore, I find Figure 5 particularly confusing, as the underlying claim it is meant to illustrate is unclear. Clearer figures and more informative captions are needed to guide the reader effectively.

      (9) I also find the writing somewhat difficult to follow. The subheadings are confusing, and it's often unclear which specific claim the authors are addressing. The presentation of results feels disorganized, making it hard to trace the evidence supporting each claim. Also, the excessive use of acronyms (e.g., SX, SY, CG, EA, ES, DA, DS) makes the text harder to parse. I recommend restructuring the results section to be clearer and significantly reducing the use of unnecessary acronyms.
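
      For points (5) and (6), a minimal sketch of one common convention is given below: the negative log-likelihood is averaged over trials using the probability the model assigned to the response actually made, and cross-validation folds are formed over subjects so that no subject appears in both training and test sets. Whether the manuscript uses this convention is exactly what point (5) asks; the function names and values here are hypothetical.

```python
import math
import random

def mean_negative_log_likelihood(predicted_probs, choices, eps=1e-12):
    """Average trial-by-trial NLL of the observed choices.

    predicted_probs: one list of action probabilities per trial.
    choices: index of the action actually chosen on each trial.
    eps guards against log(0) when a model assigns zero probability.
    """
    total = 0.0
    for probs, choice in zip(predicted_probs, choices):
        total -= math.log(max(probs[choice], eps))
    return total / len(choices)

def subject_k_folds(subject_ids, k, seed=0):
    """Yield (train_subjects, test_subjects) pairs, splitting by subject."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]

# Hypothetical model predictions for two trials with two actions each.
nll = mean_negative_log_likelihood([[0.5, 0.5], [0.25, 0.75]], [0, 1])
print(f"mean NLL per trial: {nll:.4f}")

# Hypothetical pool of ten subjects split into five folds.
splits = list(subject_k_folds(range(10), k=5))
for train, test in splits:
    print("test subjects:", sorted(test))
```

      Reporting the metric for every fold, rather than for a single split, would show how sensitive the comparison between models is to which subjects are held out.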

    1. eLife Assessment

      This manuscript makes important contributions to the methodology commonly used to assess representational structures in human and animal brain activity recorded using various techniques (especially fMRI). The evidence in the form of mathematical analysis and simulations is solid. The impact of this contribution could be improved by extending the simulations to assess the effects of violations of explicit and implicit assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a formalism for the relationship between neural signals and pooled signals (e.g., voxel estimates in fMRI) and explores why correlation-based and mean-removed Euclidean RDMs perform well in practice. The key assumption is that the pooled estimates are weighted averages, with i.i.d. non-negative weights. Two sets of simulations are used to support the theoretical findings: one based on fully simulated neural data and another that reverse-engineers neural data from an RDM estimated from real macaque data. The authors also discuss limitations of their simulations, particularly concerning the i.i.d. assumption of the weights.
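
      The pooling model summarised above can be sketched in a few lines: each channel (e.g. a voxel) is a weighted average of neurons with i.i.d. non-negative weights, and representational dissimilarity matrices (RDMs) are computed with squared Euclidean distance, with or without removing each pattern's mean. This is a toy simulation under assumed sizes, distributions and reference RDM, not the authors' simulation code.

```python
import math
import random

def squared_euclidean_rdm(patterns, remove_mean):
    """Squared Euclidean RDM; optionally subtract each pattern's mean first."""
    if remove_mean:
        patterns = [[x - sum(p) / len(p) for x in p] for p in patterns]
    n = len(patterns)
    return [
        [sum((a - b) ** 2 for a, b in zip(patterns[i], patterns[j])) for j in range(n)]
        for i in range(n)
    ]

def pool(neural_patterns, weights):
    """Each channel is a weighted average over neurons (weights[c][n] >= 0)."""
    pooled = []
    for pattern in neural_patterns:
        row = []
        for w_row in weights:
            row.append(sum(w * x for w, x in zip(w_row, pattern)) / sum(w_row))
        pooled.append(row)
    return pooled

def upper_triangle(rdm):
    """Off-diagonal entries above the diagonal, as a flat list."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rng = random.Random(0)
n_conditions, n_neurons, n_channels = 8, 100, 50  # assumed sizes

# Assumed non-negative "firing rates" whose overall level varies by condition.
neural = [
    [rng.random() * (1.0 + 0.2 * c) for _ in range(n_neurons)]
    for c in range(n_conditions)
]
# i.i.d. non-negative pooling weights, as in the key assumption.
weights = [[rng.random() for _ in range(n_neurons)] for _ in range(n_channels)]
channels = pool(neural, weights)

# Reference: neural-level RDM with the pattern mean removed (an assumption here).
rdm_neural = squared_euclidean_rdm(neural, remove_mean=True)
rdm_channel_removed = squared_euclidean_rdm(channels, remove_mean=True)
rdm_channel_plain = squared_euclidean_rdm(channels, remove_mean=False)

r_removed = pearson(upper_triangle(rdm_neural), upper_triangle(rdm_channel_removed))
r_plain = pearson(upper_triangle(rdm_neural), upper_triangle(rdm_channel_plain))
print(f"mean-removed channel RDM vs neural RDM: r = {r_removed:.3f}")
print(f"plain channel RDM vs neural RDM:        r = {r_plain:.3f}")
```

      Replacing the i.i.d. weights with spatially structured or condition-dependent weights would be one way to probe the assumption violations that the assessment asks about.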

      Strengths:

      The strengths of this work include its mathematical rigor and the clear connection that is drawn between the derivations and empirical observations. The simulations were well-designed and easy to follow. One small suggestion: a brief explanation of what is meant by "sparse" in Figure 3 would help orient the reader without requiring them to jump ahead to the methods. Overall, I found the work engaging and insightful.

      Weaknesses:

      Although I appreciate the effort to explore *why* certain dissimilarity measures perform well, it wasn't clear how these findings would inform the practical choices of researchers conducting RDM-based analyses. Many researchers likely already use correlation-based or mean-removed Euclidean distance measures, given their popularity. In that case, how do these results provide additional value or guidance beyond current practice?

      Another aspect that could benefit from further clarification is the core assumption underlying the work - that channel-based activity reflects a non-negative weighted average of neural activity. Is this widely accepted as the most plausible model, or are there alternative relationships that researchers should consider? While this may seem intuitive, it's not something I would expect all readers to be familiar with, and only a single reference was provided to support it (which I unfortunately didn't have time to read). That said, I did appreciate the discussion of the i.i.d. assumption in the discussion section. Can more be said to educate researchers as to when the i.i.d. assumption might be violated?

      I didn't find the "Simulations based on neural data" section added much, and it risks being misinterpreted. The main difference here is that neural data were reverse-engineered from a macaque RDM and then used in simulations similar to those in the previous section. What is the added value of using a real RDM to generate simulated data? Were the earlier simulations lacking in some way? There's also a risk of readers mistakenly inferring that human dissimilarities have been reconstructed from macaque data, an assumption that goes beyond the paper's core message, which focuses on linking neural and channel-based signals from the *same* source. If this section is retained, the motivation should be clarified, and the implied parallel in Figure 6, between the human data and simulated data, should be reconsidered.

    3. Reviewer #2 (Public review):

      Summary:

      The paper is a methodological contribution to multivariate pattern analysis and, in particular, the analysis of representational geometry via pairwise representational distances, sometimes called representational dissimilarity analysis (RDA). The authors investigate through theoretical analysis and simulations how true representational distances (defined on the neural level) give rise to representational distances estimated from neurophysiological data, including fMRI and cell recordings. They demonstrate that, due to the way measurements sample neural activity, the activity common to all sampled neurons can be amplified in the representational geometry derived from these measurements, and therefore, an empirical representational geometry may deviate substantially from the true representational geometry. The authors propose to modify the obtained representational structure by removing the dimension corresponding to that common activity, and argue that such a removal of a single dimension does not relevantly affect the representational structure, again underpinned by mathematical analysis and simulation.

      Importance:

      The paper may at first sight be tackling a specific problem within a specific subfield of cognitive neuroscience methods. However, understanding the structure of representations is a fundamental goal of cognitive psychology and cognitive neuroscience, and the fact that methods of representational geometry are not yet routinely used by the wider community may at least partially be due to uncertainty regarding the reliability of these methods. This paper is an important step towards clarifying and improving reliability, and therefore towards more widespread adoption of representational geometry methods.

      Strengths:

      The paper makes its argument generally well, relying on previous work by the authors as well as others to support assumptions about neural sampling by neurophysiological measurements. Their main points are underpinned by both detailed mathematical analysis and simulations, and the latter also produces intuitively accessible illustrations of the authors' argument. The authors discuss in detail under which exact circumstances common neural activity distorts the representational geometry, and therefore, when exactly the removal of the common dimension is necessary to minimize that distortion.

      Weaknesses:

      (1) The argument around the Johnson-Lindenstrauss lemma on pages 5 & 6 is somewhat confused, and also not really convincing.

      First, the correct reference for the lemma seems to be not [20] = Johnson et al. (1986), but Johnson & Lindenstrauss (1984). Moreover, as far as I can tell, Johnson et al. (1986) do not discuss random projections, and while they play a role in Johnson & Lindenstrauss (1984), that is only as a proof device. The paper text suggests that the lemma itself is probabilistic, while actually it is a statement of existence.

      Second, the authors correctly state that the lemma implies that "the number of measurement channels required for a good approximation does not depend on the number of neurons and grows only logarithmically with the number of stimuli", but it is not clear what the relevance of this statement for this paper is, considering that distances between N points can be exactly preserved within an N − 1 dimensional subspace, irrespective of the number of dimensions of the original space, and since in cognitive neuroscience the number of measurement channels is usually (much) larger than the number of experimental stimuli.

      The actually centrally important statement is not the Johnson-Lindenstrauss lemma, but one about the metric-preserving properties of random projections with zero-mean weights. It is this statement that needs to be backed up by the correct references, which, as far as I can tell, are neither the cited Johnson et al. (1986) nor even Johnson & Lindenstrauss (1984) for the lemma.

      (2) The detailed mathematical analyses and simulations focus on the effect of non-zero-mean sampling weights, and that is justified by the result that such sampling leads to a distorted representational geometry. However, there is another assumption which seems to be used almost everywhere in both mathematical analyses and simulations, and which I suspect may have a relevant effect on the observed representational geometry: statistical independence between weights. In particular, in fMRI, the existence of a naturally limited spatial resolution (due to MRI technology or vasculature) makes it unlikely that the weights with which a given neuron affects different voxels are independent.
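
      To make the role of the weight mean in points (1) and (2) concrete, here is a short derivation sketch under assumed notation that is not taken from the manuscript: $d = x_i - x_j$ is the difference between two neural patterns over $P$ neurons, and $W$ is a hypothetical $M \times P$ matrix of channel weights whose entries are i.i.d. with mean $\mu$ and variance $\sigma^2$. It relies on exactly the independence assumption questioned in (2).

```latex
% Assumed: entries w_{mp} i.i.d., E[w_{mp}] = \mu, Var[w_{mp}] = \sigma^2,
% so that E[w_{mp} w_{mq}] = \mu^2 + \sigma^2 \delta_{pq}.
\begin{align}
\mathbb{E}\left[(w_m^\top d)^2\right]
  &= \sum_{p,q} \mathbb{E}[w_{mp} w_{mq}]\, d_p d_q
   = \sigma^2 \lVert d \rVert^2 + \mu^2 (\mathbf{1}^\top d)^2, \\
\mathbb{E}\left[\lVert W d \rVert^2\right]
  &= M \left( \sigma^2 \lVert d \rVert^2 + \mu^2 (\mathbf{1}^\top d)^2 \right).
\end{align}
```

      With $\mu = 0$ the expected squared distance is a uniformly scaled copy of $\lVert d \rVert^2$; with $\mu \neq 0$ the extra term depends only on the summed difference $\mathbf{1}^\top d$, that is, on the population-mean dimension. Dependence between the weights within a channel would change the cross terms, whereas dependence across channels would leave this expectation intact but could weaken how tightly $\lVert W d \rVert^2$ concentrates around it.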

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the conditions under which representational distances estimated from brain-activity measurements accurately mirror the true geometry of the underlying neural representations. Using a theoretical framework and simulations, the authors show that (i) random weighted sampling of individual neurons preserves representational distances; (ii) the non-negative pooling characteristic of fMRI stretches the geometry along the population-mean dimension; and (iii) subtracting the across-channel mean from each activity pattern removes this distortion, explaining the well-known success of correlation-based RSA. They further argue that a mean-centred, squared Euclidean (or Mahalanobis) distance retains this corrective benefit while avoiding some pitfalls of variance normalisation.
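
      Claims (ii) and (iii) can be illustrated with a small NumPy sketch. It is not the authors' simulation: the pattern generator, the uniform weight distribution, the sizes, and the helper names `pairwise_sq_dists` and `ratio_spread` are all assumptions chosen for illustration, and noise is ignored.

```python
import numpy as np

def pairwise_sq_dists(patterns):
    """Squared Euclidean distance for every pair of rows (upper triangle only)."""
    diff = patterns[:, None, :] - patterns[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    iu = np.triu_indices(patterns.shape[0], k=1)
    return d2[iu]

def ratio_spread(measured, true):
    """Coefficient of variation of measured/true distance ratios.

    A value near 0 means the measured geometry is a uniformly scaled
    copy of the true geometry; a large value means it is distorted.
    """
    ratio = measured / true
    return float(np.std(ratio) / np.mean(ratio))

rng = np.random.default_rng(0)
n_stimuli, n_neurons, n_channels = 20, 500, 2000  # hypothetical sizes

# Hypothetical neural patterns: an unstructured code plus a
# stimulus-specific shift of the population mean.
neural = rng.normal(size=(n_stimuli, n_neurons))
neural += rng.normal(scale=0.5, size=(n_stimuli, 1))

# Assumed measurement model: each channel is a weighted sum of all
# neurons with i.i.d. non-negative weights.
weights = rng.uniform(0.0, 1.0, size=(n_channels, n_neurons))
channels = neural @ weights.T

# Mean removal: subtract the across-channel mean from each pattern.
centred = channels - channels.mean(axis=1, keepdims=True)

true_d = pairwise_sq_dists(neural)
spread_raw = ratio_spread(pairwise_sq_dists(channels), true_d)
spread_centred = ratio_spread(pairwise_sq_dists(centred), true_d)
print(f"ratio spread, raw channels:     {spread_raw:.3f}")
print(f"ratio spread, centred channels: {spread_centred:.3f}")
```

      Under these assumptions the raw channel distances are dominated by differences in the population mean, while the centred distances track the true distances up to a common scale factor. The sketch says nothing about correlated weights or measurement noise.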

      Strengths:

      (1) Theoretical clarity and novelty: The paper offers an elegant and convincing proof of how linear measurement models affect representational geometry and pinpoints the specific condition (non-zero-mean sampling weights) under which voxel pooling introduces a systematic bias. This quantitative explanation of why mean removal is effective in RSA is new and valuable.

      (2) Simulations: Experiments on both synthetic high-dimensional fMRI data and macaque-IT-inspired embeddings corroborate the mathematics, providing practical insights into the theoretical reasoning outlined by the authors.

      (3) Actionable recommendations: The work summarises the results into clear guidelines: random single-unit sampling is "safe" (the estimated geometry is undistorted); fMRI voxel data with unstructured or single-scale codes should be mean-centred; and multi-scale cortical maps require explicit forward modelling. These guidelines are clear, and useful for future research.

      Weaknesses:

      (1) Simplistic assumptions: The assumption that measurement-channel weights are drawn independently and identically distributed (i.i.d.) from a univariate distribution is a significant idealisation for fMRI data. Voxels have spatially structured responses (and noise), meaning they do not sample neurons with i.i.d. weights. The extent to which the conclusions (especially the "exact recovery" with mean centring) hold when this assumption is violated needs more discussion. While the paper states that the non-negative IWLCS model is a best-case scenario, the implications of deviations from this best case could be elaborated.

      (2) Random-subpopulation model for electrophysiology: Similarly, the "random subpopulation model" is presented as an idealisation of single-cell recordings. In reality, electrophysiological sampling is often biased (e.g., towards larger, more active neurons or neurons in accessible locations). The paper acknowledges biased sampling as a challenge that requires separate modelling, but the gap between this idealised model and actual practice should be highlighted more strongly when interpreting the optimistic results.

      (3) Noise as an "orthogonal issue": The theoretical derivations largely ignore measurement noise, treating it as an orthogonal problem solvable by cross-validation. Although bias from noise is a well-known problem, interactions between noise and sampling-induced distortions (especially the down-scaling of orthogonal dimensions) could complicate the picture. For instance, if a dimension is already heavily down-scaled by averaging, it might become more susceptible to being obscured by noise. Addressing or highlighting these points more explicitly would make the limitations of this theoretical framework more transparent.

      (4) Simulation parameters and generalizability: The random ground-truth geometries were generated from a Gaussian mixture in 5-D and then embedded into 1,024-D, with ≈25 % of the variance coming from the mean dimension. The sensitivity of the findings to these specific parameters (initial dimensionality, geometry complexity, proportion of mean variance, and sample size) could be discussed. How would the results change if the true neural geometry had a much higher or lower intrinsic dimensionality, or if the population-mean component were substantially smaller or larger? If the authors' claims are to generalise, more scenarios should be considered.

      (5) Mean addition to the neural-data simulation: In simulations based on neural data from Kiani et al., a random mean was added to each pattern to introduce variation along the mean dimension. This was necessary because the original patterns had identical mean activation. However, the procedure might oversimplify how population means vary naturally and could influence the conclusions, particularly regarding the impact of the population-mean dimension. While precisely modelling how the mean varies across conditions is beyond the manuscript's scope, this point should be stated and discussed more clearly.

      (6) Effect of mean removal on representational geometry: As noted, the benefits of mean removal hold "under ideal conditions". Real data often violates these assumptions. A critical reader might ask: What if conditions differ in overall activation and in more complex ways (e.g., differing correlation structures across neurons)? Is it always desirable to remove population-mean differences? For example, if a stimulus truly causes a global increase in firing across the entire population (perhaps reflecting arousal or salience), subtracting the mean would treat this genuine effect as a nuisance and eliminate it from the geometry. Prior literature has cautioned that one should interpret RSA results after demeaning carefully. For instance, Ramírez (2017) dubbed this problem "representational confusion", showing that subtracting the mean pattern can change the relationships between conditions in non-intuitive ways. These potential issues and previous results should be discussed and properly referenced by the authors.

      Appraisal, Impact, and Utility:

      The authors set out to identify principled conditions under which measured representational distances faithfully reflect the underlying neural geometry and to provide practical guidance for RSA across modalities. Overall, I believe they achieved their goals. Theoretical derivations identify the bias-inducing factors in linear measurement models, and the simulations verify the analytic claims, demonstrating that mean-pattern subtraction can indeed correct some mean-related geometric distortions. These conclusions strongly rely on idealised assumptions (e.g., i.i.d. sampling weights and negligible noise), but the manuscript is explicit about them, and the reasoning from evidence to claim is sound. A deeper exploration of how robust each conclusion is to violations of these assumptions, particularly correlated voxel weights and realistic noise, would make the argument even stronger.

      Beyond their immediate aims, the authors offer contributions likely to shape future work, influencing both analysis decisions and the design of future studies exploring the geometry of brain representations. By clarifying why correlation-based RSA seems to work so robustly, they help demystify a practice that has so far been adopted heuristically. Their proposal to adopt mean-centred Euclidean or Mahalanobis distances promises a straightforward alternative that better aligns representational geometry with decoding-based interpretations.

      In sum, I see this manuscript as a significant and insightful contribution to the field. The theoretical work clarifying the impact of sampling schemes and the role of mean removal is highly valuable. However, the identified concerns, primarily regarding the idealized nature of the models (especially for fMRI), the treatment of noise, and the need for more nuanced claims, suggest that some revisions are necessary. Addressing these points would substantially strengthen the paper's conclusions and enhance its impact on the neuroscience community by ensuring the proposed methods are robustly understood and appropriately applied in real-world research settings.

    1. eLife Assessment

      This study makes an important contribution by showing that humans adapt learning rates rationally to environmental volatility yet systematically misattribute noise as volatility, demonstrating approximate rationality with simplified internal models. The evidence is compelling, encompassing a cleverly designed volatility-versus-noise paradigm, innovative lesion-based comparisons between reinforcement-learning and degraded Bayesian Observer Models, and convergent behavioural and pupillometric data. Expanding formal model comparisons (e.g., BIC/AIC) and directly contrasting RL and Bayesian fits to physiological markers would further enhance the work, but these are minor limitations that do not detract from the core findings.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      Weaknesses:

      The model space could be more extensive, although the authors have covered the most relevant models for the question at hand.

      Comments on revisions: I have no further recommendations for the authors; they have addressed my previous comments very well.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent: it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.
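
      The claim that an ideal learner should raise its learning rate under volatility and lower it under noise can be illustrated with a textbook Kalman filter. This is a minimal sketch under an assumed Gaussian random-walk environment; it is neither the authors' task nor their RL or BOM implementation, and the function name and parameter values are hypothetical.

```python
def steady_state_gain(volatility_var, noise_var, n_iter=500):
    """Steady-state Kalman gain (the effective learning rate) for a
    Gaussian random walk.

    volatility_var: variance of the step-to-step drift of the hidden mean.
    noise_var: variance of the observation noise around that mean.
    """
    posterior_var = 1.0  # arbitrary starting uncertainty; the limit does not depend on it
    gain = 0.0
    for _ in range(n_iter):
        prior_var = posterior_var + volatility_var   # uncertainty grows as the mean drifts
        gain = prior_var / (prior_var + noise_var)   # weight given to the new outcome
        posterior_var = (1.0 - gain) * prior_var     # uncertainty shrinks after the update
    return gain

# Hypothetical settings, chosen only to show the direction of the effects.
baseline = steady_state_gain(volatility_var=0.1, noise_var=1.0)
more_volatile = steady_state_gain(volatility_var=1.0, noise_var=1.0)
more_noisy = steady_state_gain(volatility_var=0.1, noise_var=4.0)
print(f"baseline learning rate:      {baseline:.3f}")
print(f"higher volatility:           {more_volatile:.3f}")
print(f"higher noise:                {more_noisy:.3f}")
```

      In this idealised setting the gain rises with volatility and falls with noise, so a learner that treats noise as volatility would move its learning rate in the wrong direction in noisy blocks.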

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent: specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit to the participants' behavior. This suggests that participants are not fully distinguishing between noise and volatility, leading to misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.
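
      For reference, the two criteria mentioned above can be computed from a fitted model's maximised log-likelihood as sketched below. The model names, log-likelihoods, parameter counts, and trial count are invented placeholders for illustration, not results from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: (maximised log-likelihood, number of free parameters).
fits = {
    "full BOM (placeholder)": (-520.0, 4),
    "degraded BOM (placeholder)": (-505.0, 3),
}
n_trials = 600  # hypothetical number of observations per participant

for name, (log_lik, k) in fits.items():
    print(f"{name}: AIC = {aic(log_lik, k):.1f}, BIC = {bic(log_lik, k, n_trials):.1f}")
```

      Both criteria penalise extra free parameters, with BIC penalising them more heavily than AIC once there are more than about eight observations, so a more flexible model is preferred only if its gain in likelihood outweighs its added complexity.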

      Comments on revisions:

      The authors have addressed all my questions. Congratulations on the impressive work accomplished by the authors!

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, PLoS Comput Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent: specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM, in which the agent has a coarser representation of noise compared to volatility, provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model solely to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be more directly compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response as suggested by the Reviewer.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      For clarity, the methods would benefit from further detail of task framing to participants. I.e. were there explicit instructions regarding volatility/task contingencies? Or were participants told nothing?

      We have added the following explanatory text to the methods section (page 20), clarifying the limited instructions provided to participants:

      “Participants were informed that the task would be split into 6 blocks, that they had to learn which was the best option to choose, and that this option may change over time. They were not informed about the different forms of uncertainty we were investigating or of the underlying structure of the task (that uncertainty varied between blocks).”

      In the results, it would be useful to report the general task behavior of participants to get a sense of how they performed across different parts of the task. Also, were participants excluded if they didn't show evidence of learning adaptation to volatility?

      We have added the following text reporting overall performance to the results (page 6):

      “Participants were able to learn the best option to choose in the task, selecting the most highly rewarded option on an average of 71% of trials (range 65% - 74%).”

      And the following text to the methods, confirming that participants were not excluded if they didn’t respond to volatility/noise (the failure in this adaptation is the focus of the current study) (page 19):

      “No exclusion criteria related to task performance were used.”

      The results would benefit from a more intuitive explanation of what the lesioning is trying to recapitulate; this can get quite technical and the objective is not necessarily clear, especially for the less computationally-minded reader.

      We have amended the relevant section of the results to clarify this point (page 9):

      “Having shown that an optimal learner adjusts its learning rate to changes in volatility and noise as expected, we next sought to understand the relative noise insensitivity of participants. In these analyses we “lesion” the BOM, to reduce its performance in some way, and then assess whether doing so recapitulates the pattern of learning rate adaptation observed for participants (Fig 3e). In other words, we damage the model so it performs less well and then assess whether this damage makes the behaviour of the BOM (shown in Fig 3f) more closely resemble that seen in participants (Fig 3e).”

      The modelling might be improved by the inclusion of another class of model. Specifically, models that adapt learning rates in response to the estimation of latent states underlying the current task outcomes would be very interesting to see. In a sense, these are also estimating volatility through changeability of latent states, and it would be interesting to explore whether the findings could also be explained by an incorrect assumption that the latent state has changed when outcomes are noisy.

      Thank you for this suggestion. We have added additional sections to the supplementary materials in which we use a general latent state model and a simple RL model to try to recapitulate the behaviour of participants (and to compare with the BOM). These additional sections are extensive, so are not reproduced here. We have also added in a section to the discussion in the main paper covering this interesting question in which we confirm that we were unable to reproduce participant behaviour (or the normative effect of the lesioned BOMs) using these models but suggest that alternative latent state formulations would be interesting to explore in future work (page 18):

      “A related question is whether other, non-Bayesian model formulations may be able to account for participants’ learning adaptation in response to volatility and noise. Of note, the reinforcement learning model used to measure learning rates in separate blocks does not achieve this goal, as this model is fitted separately to each block rather than adapting between blocks (NB the simple reinforcement learning model that is fitted across all blocks does not capture participant behaviour, see supplementary information). One candidate class of model that has potential here is latent-state models (Cochran & Cisler, 2019), in which the variance and unexpected changes in the process being learned (which have a degree of similarity with noise and volatility respectively) are estimated and used to alter the model’s rates of updating as well as the estimated number of states being considered. Using the model described by Cochran and Cisler, we were unable to replicate the learning rate adaptation demonstrated by participants in the current study (see supplementary information) although it remains possible that other latent state formulations may be more successful.”

      The discussion may benefit from a little more discussion of where this work leads us - what is the next step?

      As above, we have added in a suggestion about future modelling work. We have also added in a section about the outstanding interesting questions concerning the neural representation of these quantities, reproduced in response to the suggestion by reviewer #2 below.

      Reviewer #2 (Recommendations for the authors):

      The study presents an opportunity to explore potential neural coding models that could account for the cognitive processes underlying the task. In the field of neural coding, noise correlation is often measured to understand how a population of neurons responds to the same stimulus, which could be related to the noise signal in this task. Since the brain likely treats the stimulus as the same, with noise representing minor changes, this aspect could be linked to the participants' difficulty distinguishing noise from volatility. On the other hand, signal correlation is used to understand how neurons respond to different stimuli, which can be mapped to the volatility signal in the task. It would be highly beneficial if the authors could discuss how these established concepts from neural population coding might relate to the Bayesian behavior model used in the study. For instance, how might neurons encode the distinction between noise and volatility at a population level? Could noise correlation lead to the misattribution of noise as volatility at a neural level, mirroring the behavioral findings? Discussing possible neural models that could explain the observed behavior and relating it to the existing literature on neural population coding would significantly enrich the discussion. It would also open up avenues for future research, linking these behavioral findings to potential neural mechanisms.

      We thank the reviewer for this interesting suggestion. We have added the following paragraph to the discussion section, which we hope does justice to this interesting question (page 18):

      “Previous work examining the neural representations of uncertainty has tended to report correlations between brain activity and some task-based estimate of one form of uncertainty at a time (Behrens et al., 2007; Walker et al., 2020, 2023). We are not aware of work that has, for example, systematically varied volatility and noise and reported distinct correlations for each. An interesting possibility as to how different forms of uncertainty may be encoded is suggested by parallels with the neuronal decoding literature. One question addressed by this literature is how the brain decodes changes in the world from the distributed, noisy neural responses to those changes, with a particular focus on the influence of different forms of between-neuron correlation (Averbeck et al., 2006; Kohn et al., 2016). Specifically, signal-correlation, the degree to which different neurons represent similar external quantities (required to track volatility), is distinguished from, and often limited by, noise-correlation, the degree to which the activity of different neurons covaries independently of these external quantities. One possibility relevant to the current study, which resembles the underlying logic of the BOM, is that a population of neurons represents the estimated mean of the generative process that produces task outcomes. In this case, volatility would be tracked as the signal-correlation across this population, whereas noise would be analogous to the noise-correlation and, crucially, misestimation of noise as volatility might arise as misestimation of these two forms of correlation. While the current study clearly cannot adjudicate on the neural representation of these processes, our finding of distinct behavioural and physiological responses to the two forms of uncertainty does suggest that separable neural representations of uncertainty are maintained.”

    1. eLife Assessment

      The authors provide compelling evidence that a chloride ion stabilizes the protonated Schiff base chromophore linkage in the animal rhodopsin Antho2a. This important finding is novel and of major interest to a broad audience, including optogenetics researchers, protein engineers, spectroscopists, and environmental biologists. The study combines state-of-the-art research methods, such as spectroscopic and mutational analyses, which are complemented by QM/MM calculations, and was further improved based on the comments from the reviewers.

    2. Reviewer #1 (Public review):

      The chromophore molecule of animal and microbial rhodopsins is retinal, which forms a Schiff base linkage with a lysine in the seventh transmembrane helix. In most cases, the chromophore is positively charged by protonation of the Schiff base, which is stabilized by a negatively charged counterion. In animal opsins, three sites have been experimentally identified, Glu94 in helix 2, Glu113 in helix 3, and Glu181 in extracellular loop 2, where a glutamate acts as the counterion by deprotonation. In this paper, Sakai et al. investigated molecular properties of anthozoan-specific opsin II (ASO-II opsins), as they lack these glutamates. They found an alternative candidate, Glu292 in helix 7, from the sequences. Interestingly, the experimental data suggested that Glu292 is not the direct counterion in ASO-II opsins. Instead, they found that ASO-II opsins employ a chloride ion as the counterion. In microbial rhodopsins, a chloride ion serves as the counterion in light-driven chloride pumps. This paper reports the first observation of a chloride ion as the counterion in an animal rhodopsin. Theoretical calculation using a QM/MM method supports their experimental data. The authors also revealed the role of Glu292, which serves as the counterion in the photoproduct and is involved in G protein activation.

      The conclusions of this paper are well supported by data.

    3. Reviewer #2 (Public review):

      Summary:

      This work reports the discovery of a new rhodopsin from reef-building corals that is characterized experimentally, spectroscopically, and by simulation. This rhodopsin lacks a carboxylate-based counterion, which is typical for this family of proteins. Instead, the authors find that a chloride ion stabilizes the protonated Schiff base and thus serves as a counterion.

      Strengths:

      This work focuses on the rhodopsin Antho2a, which absorbs in the visible spectrum with a maximum at 503 nm. Spectroscopic studies under different pH conditions, including the mutant E292A and different chloride concentrations, indicate that chloride acts as a counterion in the dark. In the photoproduct, however, the counterion is identified as E292.

      These results lead to a computational model of Antho2a in which the chloride is modeled in addition to the Schiff base. This model is improved using the hybrid QM/MM simulations. As a validation, the absorption maximum is calculated using the QM/MM approach for the protonated and deprotonated E292 residue as well as the E292A mutant. The results are in good agreement with the experiment. However, there is a larger deviation for ADC(2) than for sTD-DFT. Nevertheless, the trend is robust since the wt and E292A mutant models have similar excitation energies. The calculations are performed at a high level of theory that includes a large QM region.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Saito et al. studies the properties of anthozoan-specific opsins (ASO-II) from organisms found in reef-building coral. Their goal was to test if ASO-II opsins can absorb visible light, and if so, what the key factors involved are.

      The most exciting aspect of this work is their discovery that ASO-II opsins do not have a counterion residue (Asp or Glu) located at any of the previously known sites found in other animal opsins.

      This is very surprising. Opsins are only able to absorb visible (long wavelength light) if the retinal Schiff base is protonated, and the latter requires (as the name implies) a "counter ion". However, the authors clearly show that some ASO-II opsins do absorb visible light.

      To address this conundrum, they tested if the counterion could be provided by exogenous chloride ions (Cl-). Their results provide compelling evidence supporting this idea, and their studies of ASO-II mutant E292A suggest E292 also plays a role in G protein activation and is a counterion for a protonated Schiff base in the light-activated form.

      Strengths:

      Overall, the methods are well described and carefully executed, and the results are very compelling.

      Their analysis of seven ASO-II opsin sequences undoubtedly shows they all lack a Glu or Asp residue at "normal" (previously established) counter-ion sites in mammalian opsins (typically found at positions 94, 113 or 181). The experimental studies clearly demonstrate the necessity of Cl- for visible light absorbance, as do their studies of the effect of altering the pH.

      Importantly, the authors also carried out careful QM/MM computational analysis (and corresponding calculation of the expected absorbance effects), thus providing compelling support for the Cl- acting directly as a counterion to the protonated retinal Schiff base, and thus limiting the possibility that the Cl- is simply altering the absorbance of ASO-II opsins through some indirect effect on the protein.

      Altogether, the authors clearly achieved their aims, and the results support their conclusions. The manuscript is carefully written, and refreshingly, the results and conclusions are not overstated.

      This study is impactful for several reasons. There is increasing interest in optogenetic tools, especially those that leverage G protein coupled receptor systems. Thus, the authors' demonstration that ASO-II opsins could be useful for such studies is of interest.

      Moreover, the finding that visible light absorbance by an opsin does not absolutely require a negatively charged amino acid be placed at one of the expected sites (94, 113 or 181) typically found in animal opsins is very intriguing and will help future protein engineering efforts. The argument that the Cl- counterion system they discover here might have been a preliminary step in the evolution of amino acid based counterions used in animal opsins is also interesting.

      Finally, given the ongoing degradation of coral reefs worldwide, the focus on these curious opsins is very timely, as is the authors' proposal that the lower Schiff base pKa they discovered here for ASO-II opsins may cause them to change their spectral sensitivity and G protein activation due to changes in their environmental pH.

    1. eLife Assessment

      This valuable study employs transition-metal FRET (tmFRET) and time-correlated single-photon counting to investigate allosteric conformational changes in both isolated cyclic nucleotide-binding domains (CNBDs) and full-length bacterial CNG channels, demonstrating that transmembrane domains stabilize CNBDs in their active state. By comparing isolated CNBD constructs with full-length channels, the authors reveal how allosteric networks couple domain movements to gating energetics, providing insights into ion channel regulation mechanisms. The rigorous methodology and compelling quantitative analysis establish a framework for applying tmFRET to study conformational dynamics in diverse protein systems.

    2. Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. In addition to the methodology being of general use for other proteins, the authors show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

    3. Reviewer #2 (Public review):

      The manuscript by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.

    4. Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel. The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      The data and quantitative analysis are exemplary, and they provide a roadmap for the use of this powerful approach in other proteins. In particular, the use of the fluorescence-lifetime decay histograms to learn not just the mean distance reported by the FRET, but also the distribution of states with different distances, allows better refinement of hypotheses for the gating motions.
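
      For readers less familiar with lifetime-based FRET, the link between donor lifetime and donor-acceptor distance can be sketched with the standard Förster relations. This is a minimal single-distance illustration and not the authors' analysis, which fits distributions of distances; the function names and the lifetimes and R0 value in the usage lines are assumptions chosen only for illustration.

```python
def fret_efficiency(tau_donor_only: float, tau_with_acceptor: float) -> float:
    """FRET efficiency from donor lifetimes: E = 1 - tau_DA / tau_D."""
    if tau_donor_only <= 0 or tau_with_acceptor < 0:
        raise ValueError("lifetimes must be positive")
    return 1.0 - tau_with_acceptor / tau_donor_only


def distance_from_efficiency(efficiency: float, r0: float) -> float:
    """Donor-acceptor distance from E = 1 / (1 + (r / R0)**6).

    r0 is the Förster distance of the pair, in the same units as the result.
    """
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical usage: a donor lifetime halved by the acceptor gives E = 0.5,
# which by definition places the pair at the Förster distance R0.
e = fret_efficiency(tau_donor_only=4.0, tau_with_acceptor=2.0)
r = distance_from_efficiency(e, r0=15.0)
```

      A shorter donor lifetime gives a higher efficiency and therefore a shorter inferred distance, which is the logic behind reading a reduced lifetime as a closer approach of donor and acceptor.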

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This useful work extends a prior study from the authors to observe distance changes within the CNBD domains of a full-length CNG channel based on changes in single photon lifetimes due to tmFRET between a metal at an introduced chelator site and a fluorescent non-canonical amino acid at another site. The data are excellent and convincingly support the authors' conclusions. The methodology is of general use for other proteins. The authors also show that coupling of the CNBDs to the rest of the channel stabilizes the CNBDs in their active state, relative to an isolated CNBD construct.

      Strengths:

      The manuscript is very well written and clear.

      Reviewer #2 (Public review):

      The manuscript "Domain Coupling in Allosteric Regulation of SthK Measured Using Time-Resolved Transition Metal Ion FRET" by Eggan et al. investigates the energetics of conformational transitions in the cyclic nucleotide-gated (CNG) channel SthK. This lab pioneered transition metal FRET (tmFRET), which has previously provided detailed insights into ion channel conformational changes. Here, the authors analyze tmFRET fluorescence lifetime measurements in the time domain, yielding detailed insights into conformational transitions within the cyclic nucleotide binding domains (CNBDs) of the channel. The integration of tmFRET with time-correlated single-photon counting (TCSPC) represents an advancement of this technique.

      The results summarize known conformational transitions of the C-helix and provide distance distributions that agree with predicted values based on available structures. The authors first validated their TCSPC approach using the isolated CNBD construct previously employed for similar experiments. They then study the more complex full-length SthK channel protein. The findings agree with earlier results from this group, demonstrating that the C-helix is more mobile in the closed state than static structures reflect. Upon adding the activating ligand cAMP, the C-helix moves closer to the bound ligand, as indicated by a reduced fluorescence lifetime, suggesting a shorter distance between the donor and acceptor. The observed effects depend on the cAMP concentration, with affinities comparable to functional measurements. Interestingly, a substantial amount of CNBDs appear to be in the activated state even in the absence of cAMP (Figure 6E and F, fA2 ~ 0.4).

      This may be attributed to cooperativity among the CNBDs, which the authors could elaborate on further. In this context, the major limitation of this study is that distance distributions are observed only in one domain. While inter-subunit FRET is detected and accounted for, the results focus exclusively on movements within one domain. Thus, the resulting energetic considerations must be assessed with caution. In the absence of the activator, the closed state is favored, while the presence of cAMP favors the open state. This quantifies the standard assumption; otherwise, an activator would not effectively activate the channel. However, the numerical values of approximately 3 kcal/mol are limited by the fact that only one domain is observed in the experiment, and only one distance (C-helix relative to the CNBD) is probed. Additional conformational changes leading to pore opening (including rotation and upward movement of the CNBD, and radial dilation of the tetrameric assembly) are not captured by the current experiments. These limitations should be taken into account when interpreting the results.

      We agree that these are important limitations to consider in interpreting our results. These limitations and future directions are now largely covered in our discussion. We believe measurements in individual domains provide unique insights into the contributions of different parts of the protein and future work will continue to address conformational energetics in other parts of the protein and subunit cooperativity. 
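
      As a rough guide to how a state occupancy maps onto an energy, the sketch below converts the fraction of a domain in its active state into a free energy difference under a simple two-state assumption. It is an illustration only, not the authors' energetic analysis; the function name, the two-state simplification, and the temperature of 298.15 K are assumptions.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def two_state_delta_g(fraction_active: float, temperature_k: float = 298.15) -> float:
    """Free energy of the active state relative to the resting state.

    Uses K = f / (1 - f) and delta_G = -R T ln(K); a negative value means
    the active state is favoured.
    """
    if not 0.0 < fraction_active < 1.0:
        raise ValueError("fraction_active must lie strictly between 0 and 1")
    equilibrium_constant = fraction_active / (1.0 - fraction_active)
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(equilibrium_constant)


# Hypothetical usage with the active fraction of about 0.4 quoted by the reviewer.
delta_g_apo = two_state_delta_g(0.4)
```

      Under these assumptions an occupancy near 0.4 corresponds to a free energy difference of well under 1 kcal/mol, so a shift of a few kcal/mol on ligand binding implies a large change in the equilibrium between the two states.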

      Reviewer #3 (Public review):

      Summary:

      This is a lucidly written manuscript describing the use of transition-metal FRET to assess distance changes during functional conformational changes in a CNG channel. The experiments were performed on an isolated C-terminal nucleotide binding domain (CNBD) and on a purified full-length channel, with FRET partners placed at two positions in the CNBD.

      Strengths:

      The data and quantitative analysis are exemplary, and they provide a roadmap for use of this powerful approach in other proteins.

      Weaknesses/Comments:

      A ~3x lower Kd for nucleotide is seen for the detergent-solubilized full-length channel, compared to electrophysiological experiments. This is worth a comment in the Discussion, particularly in the context of the effect of the pore domain on the CNBD energetics.

      We are cautious about interpreting our Kd values given the high affinity for cAMP and the challenges of accurately determining the total protein concentrations in our experiments. We now state this explicitly in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript is very well written and clear. Congrats to the authors.

      Minor comment: In "Measuring tmFRET in Full-Length SthK", 3rd paragraph: "... FRET model with both intersubunit and intersubunit FRET." Should read "intersubunit and intrasubunit".

      Thank you for the comment, this is now corrected.  

      Reviewer #2 (Recommendations for the authors):

      Overall, the manuscript is well-written and clearly explained. However, I recommend that the authors discuss the limitations more critically.

      The revised manuscript now largely addresses these limitations. Additional comments are addressed in short below:  

      A) Only one distance is measured.

      We believe validating a single distance is an important first step in determining the use of this technique and beginning to quantify the allosteric mechanism in SthK. Future studies aim to make additional measurements.

      B) Measurements are confined to a single domain in the cooperative tetrameric assembly.

      Isolating conformational changes in individual domains allows us to determine how different parts of the protein contribute to the activation upon ligand binding.

      C) The change in distance upon activation mirrors what is observed in the closed state, which casts doubt on whether these conformational changes actually lead to channel opening or merely reflect the upward swinging of the C-helix that contributes to coordinating cAMP in the binding pocket.

      Future studies aim to detect conformational changes in the pore and other parts of the protein.

      D) Rigid body movements, rotations, and dilations are not captured by the measurements. 

      Our measurements combine energetic information with some, although more limited, structural information.   

      E) Cooperativity is not considered in the interpretation of the results.

      It is currently unclear where in SthK cooperativity arises upon ligand activation (i.e., at the level of the CNBD, C-linker or pore). Our results do not provide evidence of cooperativity in the CNBD upon ligand binding.

      Additionally, the authors directly correlate their results with the functional states of SthK previously reported, but it remains open whether the modified protein for tmFRET behaves similarly to WT SthK. Functional experiments with the protein used for tmFRET, which demonstrate comparable open probabilities and cAMP potency, would considerably strengthen the manuscript.

      Further optimization is needed to express the full-length protein used in tmFRET experiments in spheroplasts to enable electrophysiological recordings from these constructs. 

      Reviewer #3 (Recommendations for the authors):

      In the final paragraph of the Discussion, the sentence "In our experiments, we assumed that deleting the pore and transmembrane domains eliminates the coupling of these regions to the CNBD" seems trivial. Perhaps it would help to add "simply" before eliminates?

      We have taken the advice and added ‘simply’ in this sentence.  

      Can a statement be made about the magnitude of the effect in the C-terminal deletion experiments in refs 27-29?

      Due to the different channels used in the C-terminal deletion experiments in refs 27-29 (HCN1 and spHCN), compared to the channel we used (SthK), it is challenging to compare the magnitude of energetic changes between these studies. Additionally, the HCN experiments measured changes in the pore domain, compared to the conformational changes in the CNBD domain measured here.

    1. eLife Assessment

      The authors provide a convincing summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. It is particularly valuable at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

    2. Reviewer #1 (Public review):

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded. The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals. In revision, the paper has been improved with respect to clarity and by bringing together two separate papers into one stronger piece.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      The authors have improved the presentation by integrating the weaker of the two manuscripts with the stronger, by clarifying terminology and by performing additional analyses.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.

      Strengths:

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals.

      Weaknesses:

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1.

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that Figure 1 aims to highlight the funding opportunities, investments and number of awards associated with small lab (exploratory) versus team (elaborated, mature) research rather than to describe publication metrics.

      Reviewer #2 (Public review):

      Summary:

      The authors review the history of the team projects within the Brain initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact.

      Strengths:

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented.

      Weaknesses:

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping". I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic, and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because these are a visual depiction of how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis which is only reported in the text and not visually shown for the remaining BCP programs.

      Recommendations for the authors:

      Editorial note:

      In the discussion, the reviewers agreed that the present manuscript does not make a sufficient independent contribution and so would be more profitably combined with the companion manuscript. Both reviewers noted that there was not much insight that relied on the single figure. Since neither manuscript is long, and they have overlapping authors (including the same first and last authors), this should not be a difficult merger to achieve.

      Thank you for the recommendation to merge. We have combined both manuscripts into one in this version.

      Reviewer #1 (Recommendations for the authors):

      The jargon of the grant programs could be described as a nightmare. Wellcome is spelled wrong.

      We have attempted to limit the use of jargon and to define acronyms in this version. We have corrected the spelling of Wellcome.

      Reviewer #2 (Recommendations for the authors):

      I suggest that the two manuscripts be combined into a single paper. Although the other manuscript could stand on its own, this one does not.

      The idea of culture change surrounding teams is useful but really forms more of a policy-focused opinion piece than a quantitative analysis of funding impact.

      If the authors insist on keeping these separate, it is critical to remove the team data from the other manuscript.

      We have combined both manuscripts and decided to retain the description of culture change but have edited and condensed this section and will use the supplemental report for qualitative assessments.

    1. Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, focusing on domestic rubbish piles, open sewers, and a central stream. It applies step selection functions to telemetry data to estimate how likely individuals are to move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, which could inform more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149.

      (6) Some figures are not clear (see Figure 4 A & B).

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

    2. eLife Assessment

      This study makes a novel and valuable contribution by adapting step selection functions, traditionally used in animal ecology, to explore human movement and environmental risk exposure in urban slums, offering a promising framework for spatial epidemiology, particularly regarding leptospirosis. The integration of GPS telemetry with environmental data and the stratification by gender and serostatus are notable strengths that enhance the study's relevance for public health applications. The strength of evidence is compelling.

    3. Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.
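      As a rough illustration of the step-selection idea described above: each observed step is compared with a set of alternative steps the person could have taken, and a conditional-logit model assigns each candidate a relative probability from its covariates. The sketch below is not the authors' code; the covariate (distance to the nearest open sewer), the candidate values, and the coefficient are all hypothetical.

```python
import math


def step_selection_probabilities(covariates, beta):
    """Conditional-logit probability of each candidate step in one stratum.

    covariates: one covariate vector per candidate step (the used step
                plus its matched available steps).
    beta:       selection coefficients, one per covariate.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical stratum: distance (m) from each step end point to the nearest open sewer.
candidate_steps = [[5.0], [20.0], [40.0]]
# Hypothetical coefficient: a positive value favours steps ending farther from sewers.
beta_avoid_sewers = [0.05]

probs = step_selection_probabilities(candidate_steps, beta_avoid_sewers)
```

      With a coefficient of zero every candidate step is equally likely, which is the "no selection" baseline against which avoidance or attraction is measured.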

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses:

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed.

    1. eLife Assessment

      This manuscript provides valuable insights into the heterogeneity of hematopoietic stem cells and age-associated myeloid-biased hematopoiesis. While several aspects of the study are intriguing and merit further investigation, the current results remain incomplete and additional data are necessary to substantiate the conclusions. Some of the methods and data analyses partially support the claims.

    2. Reviewer #1 (Public review):

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cells (LT-HSC) to short-term HSCs (ST-HSC) determines the lineage output of HSCs and that a reduced ratio of ST-HSC in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSC and ST-HSC and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in disease contexts as well. However, this study needs more definitive data.

      (1) The authors' experimental designs have some caveats that prevent them from definitively supporting their claims. The authors claimed that aged LT-HSCs show no myeloid-biased clone expansion in transplantation assays. In these experiments, the authors used 10 HSCs and young mice as recipients. Given the huge expansion of old HSCs by number and the known heterogeneity in immunophenotypically defined HSC populations, it is questionable how 10 out of so many old HSCs (an average of 300,000 up to 500,000 cells per mouse; Mitchell et al., Nature Cell Biology, 2023) can faithfully represent the old HSC population. The Hoxb5+ old HSC primary and secondary recipient mice data (Fig. 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of the inflammatory aged niche in myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice would complete the picture.

      In response to the above comments, the authors calculated the required sample size as approximately 384 cells to represent 500,000 HSCs per old mouse. Based on the total of 1260 cells used throughout the manuscript (Figures 2, 3, 5, 6, S3, and S6), the authors claimed that the data reflect old HSC behavior. However, 384 cells represent the HSCs from one old mouse. Following the authors' logic, the whole manuscript rests on the equivalent of only about 3.3 mice (1260/384). An N of 3 is not enough, especially for experiments with old mice, given the heterogeneity of aged mice. The authors also did not address the comment regarding inflammatory aged niche effects.
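      The review does not state which formula produced the figure of 384 cells. One common calculation that gives this value is Cochran's sample-size formula with a finite-population correction, at 95% confidence, a 5% margin of error, and a proportion of 0.5. The sketch below assumes that formula and those parameters purely for illustration, and then repeats the reviewer's division.

```python
import math
from statistics import NormalDist


def cochran_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's sample size with finite-population correction (assumed method)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Correct for the finite population size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# 500,000 HSCs per old mouse, as quoted in the review.
cells_per_mouse = cochran_sample_size(500_000)
# 1260 cells were used across the manuscript, as quoted in the review.
mouse_equivalents = 1260 / cells_per_mouse
```

      Under these assumptions the total of 1260 cells corresponds to a little over three mouse-equivalents, which is the basis of the reviewer's objection.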

      (2) Authors' molecular data analyses need more rigor with unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment but aged bulk HSCs, which are just a sum of LT-HSCs and ST-HSCs by their gating scheme (Fig. 4A), showed the "tendency" of enrichment of myeloid-related genes based on the selected gene set (Fig. 4D). Although the proportion of ST-HSCs is reduced in bulk HSCs upon aging, since ST-HSCs do not exhibit lymphoid gene set enrichment based on their data, it is hard to understand how aged bulk HSCs have more myeloid gene set enrichment compared to young bulk HSCs. This bulk HSC data rather suggest that there could be a trend toward certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. Authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      (3) Although the authors could not find any molecular evidence for myeloid-biased hematopoiesis from old HSCs (either LT or ST), they argued, based on young HSC experiments (Fig. 6), that the ratio between LT-HSC and ST-HSC causes myeloid-biased hematopoiesis upon aging. However, the old ST-HSC functional data showed that these cells barely contribute to blood production in the transplantation setting, unlike young Hoxb5- HSCs (ST-HSC) (Fig. 2). Is there any evidence that in unperturbed native old hematopoiesis, old Hoxb5- HSCs (ST-HSC) still contribute to blood production? To answer this question, the authors performed additional experiments with an increased cell number (Fig. S6). Although the Fig. S6D data reach statistical significance, it is questionable how biologically meaningful they are. The more fundamental question again concerns representativeness: can the cell number used in this experiment represent old HSC (either LT or ST) behavior?

    3. Reviewer #2 (Public review):

      Summary:

      Nishi et al. investigate the well-known and previously described phenomenon of age-associated myeloid-biased hematopoiesis. Using a previously established HoxB5mCherry mouse model, they used HoxB5+ and HoxB5- HSCs to discriminate cells with long-term (LT-HSCs) and short-term (ST-HSCs) reconstitution potential and compared these populations to immunophenotypically defined 'bulk HSCs' that consist of a mixture of LT-HSCs and ST-HSCs. They then isolated these HSC populations from young and aged mice to test their function and myeloid bias in non-competitive and competitive transplants into young and aged recipients. Based on quantification of hematopoietic cell frequencies in the bone marrow, peripheral blood, and in some experiments the spleen and thymus, the authors argue against the currently held belief that myeloid-biased HSCs expand with age.

      While aspects of their work are fascinating and might have merit, several issues weaken the overall strength of the arguments and interpretation. Multiple experiments were done with a very low number of recipient mice, showed very large standard deviations, and had no statistically detectable difference between experimental groups. While the authors conclude that these experimental groups are not different, the displayed results seem too variable to conclude anything with certainty. The sensitivity of the performed experiments (e.g. Fig 3; Fig 6C, D) is too low to detect even reasonably strong differences between experimental groups and is thus inadequate to support the authors' claims. This weakness of the study is not acknowledged in the text and is also not discussed. To support their conclusions the authors need to provide higher n-numbers and provide a detailed power analysis of the transplants in the methods section.
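      To give a sense of what such a power analysis involves, the sketch below uses the textbook normal approximation for a two-sided, two-group comparison with equal group sizes. It is an illustration only: the effect sizes and group sizes are hypothetical, not taken from the manuscript, and the normal approximation overstates power at very small n, so an exact t-based or simulation-based calculation would be needed in practice.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized mean difference (Cohen's d)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Non-centrality of the two-sample z statistic with equal group sizes.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # The negligible contribution of the opposite tail is ignored.
    return _STD_NORMAL.cdf(shift - z_crit)


def required_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate group size needed to reach the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


# Hypothetical scenario: a large effect (d = 1.0) with 4 recipient mice per group.
power_small_groups = approx_power(effect_size=1.0, n_per_group=4)
# Group size needed for 80% power at the same hypothetical effect size.
n_needed = required_n_per_group(effect_size=1.0)
```

      Even for a large standardized effect, this approximation puts the power of a four-versus-four comparison well below 50%, so a non-significant result from groups of that size says little about whether a difference exists.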

      As the authors attempt to challenge the current model of the age-associated expansion of myeloid-biased HSCs (which has been observed and reproduced by many different groups), ideally additional strong evidence in the form of single-cell transplants is provided.
      It is also unclear why the authors believe that the observed reduction of ST-HSCs relative to LT-HSCs explains the myeloid-biased phenotype observed in the peripheral blood. This point seems counterintuitive and requires further explanation.

      Based on my understanding of the presented data, the authors argue that myeloid-biased HSCs do not exist, as:
      a) they detect no difference between young/aged HSCs after transplant (mind low n-numbers and large std);
      b) myeloid progenitors downstream of HSCs only show minor or no changes in frequency; and
      c) aged LT-HSCs do not outperform young LT-HSCs in myeloid output in competitive transplants (mind low n-numbers and large std!).
      However, given the low n-numbers and high variance of the results, the argument seems weak and the presented data do not support the claims sufficiently. That the number of downstream progenitors does not change could be explained by other mechanisms, for instance, the frequently reported differentiation short-cuts of HSCs and/or changes in the microenvironment.

      Strengths:

      The authors present an interesting observation and offer an alternative explanation of the origins of age-associated myeloid-biased hematopoiesis. Their data regarding the role of the microenvironment in the spleen and thymus appear to be convincing.

      Weaknesses:

      "Then, we found that the myeloid lineage proportions from young and aged LT-HSCs were nearly comparable during the observation period after transplantation (Fig. 3, B and C)."<br /> [Comment to the authors]: Given the large standard deviation and low n-numbers, the power of the analysis to detect differences between experimental groups is very low. Experimental groups with too large standard deviations (as displayed here) are difficult to interpret and might be inconclusive. The absence of clearly detectable differences between young and aged transplanted HSCs could thus simply be a false-negative result. The shown experimental results hence do not provide strong evidence for the author's interpretation of the data. The authors should add additional transplants and include a detailed power analysis to be able to detect differences between experimental groups with reasonable sensitivity.

      Line 293: "Based on these findings, we concluded that myeloid-biased hematopoiesis observed following transplantation of aged HSCs was caused by a relative decrease in ST-HSC in the bulk-HSC compartment in aged mice rather than the selective expansion of myeloid-biased HSC clones."

      [Comment to the authors]: Couldn't that also be explained by an increase in myeloid-biased HSCs, as repeatedly reported and seen in the expansion of CD150+ HSCs? It is not intuitively clear why a reduction of ST-HSC clones would lead to a myeloid bias. The authors should try to explain more clearly where they believe the increased number of myeloid cells comes from. What is the source of myeloid cells if the authors believe they are not derived from the expanded population of myeloid-biased HSCs?

      New comment for the authors:

      While the authors provide new evidence, clarify the text, and adjust their interpretation, the presented data remain weak and do not convincingly challenge the current paradigm. As myeloid-biased HSC expansion with age has been observed and published by many different groups, the authors need to provide much stronger evidence to challenge the observations of others. Key experiments that might support their claims had been suggested, but as indicated, the authors plan to provide these much more rigorous experiments in future studies. As it stands, the overall conclusions of this manuscript thus remain weak and preliminary.

      In an attempt to quantify the absolute cell number of HSPC subpopulations, the authors use an unusual readout and quantify "Number of cells per minute of analysis time". This appears to be a quick and dirty reanalysis of already existing flow cytometry data. Unfortunately, this quantification cannot count the absolute number of cells reliably, as the number of cells per minute recorded is heavily influenced by the abundance of other cell populations. Instead, the authors should have counted the absolute number of HSCs, MPPs, GMPs, etc. per femur, which is typically done to address this question.

      At this point, as authors are seemingly not willing to provide additional hard evidence to support their claims in this study and are instead in the process of preparing additional data for a future manuscript, I believe this study, as it stands (although weak), suggests an interesting alternative model. Despite being highly controversial, this alternative model warrants future investigations and discussions in the field. As always, it will also be important to reproduce these findings independently in other labs. As my concerns and the concerns of the other reviewers are documented and available to read by others, I believe the manuscript should be published in its current form to stimulate critical discussion and future investigations of the current model.

    4. Reviewer #3 (Public review):

      In this manuscript, Nishi et al. propose a new model to explain the previously reported myeloid-biased hematopoiesis associated with aging. Traditionally, this phenotype has been explained by the expansion of myeloid-biased hematopoietic stem cell (HSC) clones during aging. Here, the authors question this idea, show how their Hoxb5 reporter model can discriminate long-term (LT) and short-term (ST) HSCs, and characterize their lineage output after transplant. From these analyses, the authors conclude that changes during aging in the LT/ST HSC proportion explain the myeloid bias observed.

      Comments on revisions:

      I appreciate the authors' reply to some of my comments. However, there are some key aspects that remain unresolved. Please see below.

      - The authors propose a critical change in the way we consider the mechanisms leading to lineage biased hematopoiesis during aging. As Reviewer 2 mentioned, such a strong claim needs to be supported by solid experimental data. Unfortunately, the level of variability in key in vivo experiments (Figure 2 and 3) diminishes the robustness of these results.

      The authors argue that even with the low number of mice used in some of these experiments and the high level of variability, differences still reach (or not) statistical significance according to their analysis. I am not an expert on statistics, but the only test mentioned in their methodology is a Welch's t test, which is only appropriate for data following a normal distribution. A more rigorous statistical analysis should be performed to sustain the claims included in the current manuscript.
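      One distribution-free option for very small groups is an exact permutation test on the difference in group means. The sketch below is offered only as an example of such an analysis, not as the method the reviewer had in mind; the chimerism values are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so ties with the observed difference count as extreme.
        if abs(mean(relabelled_a) - mean(relabelled_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Invented myeloid chimerism values (%) for three recipients per group.
young_recipients = [1, 2, 3]
aged_recipients = [4, 5, 6]
p_value = exact_permutation_p_value(young_recipients, aged_recipients)
```

      With three mice per group there are only 20 possible relabellings, so even perfectly separated groups cannot give a two-sided p-value below 0.1. This makes the cost of low n-numbers concrete.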

      - The chosen irradiation regimen might contribute to the uncertainty of the data and influence their interpretation. As the authors show in their response to my "comment to our #3-4 response", there is a considerable (and variable) amount of "radioresistant" CD45.1+CD45.2- cells in their primary recipients, which becomes concerningly high in the secondary transplant. This is not found in previous publications focused on this topic and, therefore, it makes it difficult to compare those studies with the present manuscript. The inclusion of this aspect in the text is appreciated but definitely reduces the impact of their claims.

      - The correction introduced in the main text as an answer to the original comment #3-6 is still misleading. There is an assumption for GMP, CMP and MEP to increase with age if myeloid-biased HSC clones increase with age ("in contrast to what we anticipated"). Again, the link between these two changes could be more complex than just a direct correlation.

    1. eLife Assessment

      In this valuable study, Taber et al used a battery of biophysical and structural approaches to characterize the impact of erythrocytosis-related mutations in prolyl hydroxylase domain protein 2 (PHD2). The authors show that PHD2 mutant proteins are destabilized, thus supporting the tenet that dysregulation of the PHD2/hypoxia-inducible factor (HIF) axis underpins erythrocytosis, while providing incomplete evidence that N-terminal ODD prolyl hydroxylation of HIF is indispensable for these phenotypes. Although this study was found to be of broad interest for a variety of fields focusing on oxygen sensing in homeostasis and pathological states, resolving inconsistencies in the biophysical analysis (e.g., NMR, SEC, and BLI/MST) was thought to be warranted to further corroborate the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also showed impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths:

      (1) Simple, easy-to-follow manuscript. Generally well-written.

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action.

      (3) Good, well-researched background section.

      Weaknesses:

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations on the structure and on the interactions between PHD2 and HIF1a peptides (as well as the analysis by Chowdhury et al, 2016) shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.
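      The arithmetic in point (4) can be made concrete with a simple 1:1 binding model in which the ODD substrate is in large excess over the enzyme, so that the fraction of enzyme bound is [S] / ([S] + Kd). This model is an assumption added here for illustration and ignores competition between the two sites. The Kd values, the 40-fold loss, and the 180 uM concentration are the approximate figures quoted by the reviewer.

```python
def fraction_bound(substrate_um, kd_um):
    """Fraction of enzyme occupied under simple 1:1 binding with excess substrate."""
    return substrate_um / (substrate_um + kd_um)


ODD_CONCENTRATION_UM = 180.0  # assumed final ODD concentration (0.18 mM)
FOLD_AFFINITY_LOSS = 40.0     # reviewer's working figure for P317R from the MST data

# Approximate wild-type affinities quoted from Flashman et al (JBC 2008).
kd_wild_type_um = {"NODD": 10.0, "CODD": 1.0}

# Kd of the P317R mutant for each site after the assumed 40-fold loss.
kd_p317r_um = {site: kd * FOLD_AFFINITY_LOSS for site, kd in kd_wild_type_um.items()}

# Predicted occupancy of the mutant at the assumed substrate concentration.
occupancy_p317r = {
    site: fraction_bound(ODD_CONCENTRATION_UM, kd) for site, kd in kd_p317r_um.items()
}
```

      Under these assumptions roughly 80% of the mutant enzyme would be occupied at the CODD site but only about 30% at the NODD site, consistent with the reviewer's point that the apparent site selectivity may partly reflect the concentrations used in the assay.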

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation.

    3. Reviewer #2 (Public review):

      Summary:

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, no justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, and no reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the descriptions of several experiments are missing controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths:

      (1) This manuscript is well-written and clear.

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims.

      (3) The identification of P317R as a mutant that loses activity specifically against NODD could provide a useful tool for further studies in cells.

      Weaknesses:

      Major:

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      (2) The NMR hydroxylation assay.

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.
      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec?
      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this?

      (3) Data validating the CRISPR KO HEK293A cells is missing.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data.

      Minor:

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation. However, no follow-up studies have been performed to test this in cells or to validate the mutant with another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway.

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought.

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases.

      Strengths:

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis.

      Weaknesses:

      There is a lack of in vivo/ex vivo validation. This is required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling.

      The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM).

      Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis.

      Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them on the crystal structure. They then focus on 7 mutations, which they clone in a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse chase turnover assays (except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD), as well as P564 (HIF1a CODD), but to a lesser extent. Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as analysis by Chowdhury et al, 2016) on the structure and interactions between PHD2 peptides of HIF1a shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, and it's not obvious how it would interfere with the stability of the protein. 

      Thank you for the comment.  We will further analyze the mutations on the available PHD2 crystal structures in complex with HIFa to discern how these substitution mutations may impact PHD2 structure and function.  

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed.  We will perform additional experiments as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph will be changed for a clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear.  The negative findings also raise the notion, perhaps considered controversial, that there may be other substrates of PHD2 that are impacted by certain mutations and that contribute to disease pathogenesis.  We will expand our discussion accordingly.

      (4) The interpretation of the NMR prolyl hydroxylation is tainted by the high concentrations used here. First of all, there is likely a typo in the method section; the final concentration of ODD is likely 0.18 mM, and not 0.18 uM (PNAS paper by the same group in 2024 reports using a final concentration of 230 uM). Here, I will assume the concentration is 180 uM. Flashman et al (JBC 2008) showed that the affinity of the NODD site (P402; around 10 uM) for PHD2 is 10-fold weaker than CODD (P564, around 1 uM). This likely explains the much faster kinetics of hydroxylation towards the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the affinity becomes 400 uM for NODD (above the protein concentration) and 40 uM for CODD (below the protein concentration). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM.  The lower affinity of PHD2 for NODD than for CODD shown by Flashman et al.[1] likely contributes to the differential hydroxylation rates via PHD2 WT.  We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1αCODD, which should have led to a severe enzymatic defect, even at the high concentrations used for NMR (180 uM).  However, we observed only a subtle reduction in hydroxylation efficiency in comparison to PHD2 WT.  Thus, we applied a second binding method, BLI, which showed a mild binding defect on CODD by PHD2 P317R, consistent with the NMR data.  The perplexing result is the WT-like binding to the NODD by PHD2 P317R, which appears inconsistent with the severe defect in NODD hydroxylation via PHD2 P317R as measured via NMR.  These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon PHD2 P317R mutation. We will perform additional binding experiments to further interrogate and validate the binding affinity of PHD2 P317R to NODD and CODD.
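The occupancy argument in point (4) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes simple 1:1 binding with the ODD substrate in large excess over enzyme (so free substrate is approximately the total 180 uM), takes the approximate wild-type Kd values the reviewer quotes from Flashman et al., and applies the reviewer's hypothetical 40-fold loss of affinity. The function and variable names are ours, not from the manuscript.

```python
# Illustrative occupancy estimate for the reviewer's argument in point (4).
# Assumptions (not from the manuscript): 1:1 binding, substrate in large
# excess over enzyme, so free substrate ~ total substrate.

def fraction_enzyme_bound(substrate_um: float, kd_um: float) -> float:
    """Fraction of enzyme in complex with substrate at equilibrium."""
    return substrate_um / (substrate_um + kd_um)

ODD_CONCENTRATION_UM = 180.0               # ODD concentration assumed by the reviewer
WT_KD_UM = {"NODD": 10.0, "CODD": 1.0}     # approximate values quoted by the reviewer
HYPOTHETICAL_FOLD_LOSS = 40.0              # reviewer's "let's say" for P317R

def occupancy_by_site() -> dict:
    """Return {site: (WT occupancy, hypothetical P317R occupancy)}."""
    result = {}
    for site, kd in WT_KD_UM.items():
        wt = fraction_enzyme_bound(ODD_CONCENTRATION_UM, kd)
        mutant = fraction_enzyme_bound(ODD_CONCENTRATION_UM, kd * HYPOTHETICAL_FOLD_LOSS)
        result[site] = (wt, mutant)
    return result

if __name__ == "__main__":
    for site, (wt, mutant) in occupancy_by_site().items():
        print(f"{site}: WT {wt:.2f}, P317R (hypothetical) {mutant:.2f}")
```

Under these assumptions the hypothetical mutant would still be mostly occupied by CODD (Kd 40 uM) but only about one-third occupied by NODD (Kd 400 uM), which is the asymmetry the reviewer describes. The same function applied to the authors' MST-derived Kd of 320 uM for CODD gives an occupancy near 0.36, which is why the authors expected a more severe CODD defect than they observed by NMR.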

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that the P317R mutation should cause a major binding defect, while consistent with our MST data, is incongruent with our NMR data and the data from Chowdhury et al.[2] that showed efficient hydroxylation of CODD via PHD2 P317R.  Moreover, we have attempted to model NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD while the mutated residue could clash with NODD, causing a shift in peptide positioning on the protein.  However, these modeling predictions, like any in silico projections, would need experimental validation.  As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and findings by Chowdhury et al.[2].  Because purified NODD peptides were not amenable to the solution-based MST assay, NODD binding was also measured with BLI, which showed similar Kd values for PHD2 WT and P317R.  Considering the absence of NODD hydroxylation via PHD2 P317R as measured by NMR and the modeling on apo PHD2 P317R, we posit that P317R causes NODD to deviate from its original orientation in a way that may not affect binding, owing to other interactions from the surrounding elements, but that prevents NODD turnover.  Further study would be required to validate such a notion, which we feel is beyond the scope of this manuscript.  However, we will perform additional binding experiments to further interrogate PHD2 P317R binding to NODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We will expand on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase, PHD2, cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays, and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD, but not the N-terminal ODD, which suggests the importance of N-terminal ODD for biology. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, the lack of a justification for using neighboring residues as a surrogate for looking at prolyl hydroxylation directly, and the absence of a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the experimental descriptions for several experiments are missing descriptions of controls or validation, which limits their strength in supporting the claims of the authors.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The identification of P317R as a mutant that loses activity specifically against NODD, which could be a useful tool for further studies in cells. 

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an excel sheet found in the Supplementary Information.  The case reports are cited in this excel file.  A reference to the supplementary data will be added to the Figure 1 legend and in the introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study is using 15N-HSQC-based experiments to track the CSP of neighboring residues to the target prolines, so not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues.  This assay is ideal when mutations near the prolines are studied, such as A403, Y565 (He et al[3]).  In this previous work, we observed that the shifting of the target proline cross-peaks due to change in electronegativity on the pyrrolidine ring of proline in turn impacted the neighboring residues[3], which meant that the neighboring residues can be used as reporter residues for certain purposes.  In this study, we focused on investigating the mutations on PHD2 while leaving the sequence of the HIF-1α unchanged by using solely 15N-HSQC-based experiments without the need for double-labeled samples.  Nonetheless, we thank the reviewer for pointing out the confusion in the text and we will correct and clarify our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      For previous studies, we performed intercalated 15N-HSQC and 13C-CON experiments for the kinetic measurements of wild-type HIF-1α and mutants.  We observed that the shifting pattern of A403 and I566 in the 15N-HSQC spectra aligned well with the ones of P402 and P564, respectively, in the 13C-CON spectra.  Representative data will be added to Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer.  We checked and confirmed that for all kinetic datasets, the peak intensities of the end point residue are always slightly lower than those of the starting residue.  This includes the cases for PHD2 A228S and P317R in 5B, although not as obvious as that of PHD2 WT.  We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired.  It is also likely that the rate of prolyl hydroxylation becomes extremely slow when only a low amount of substrate remains available in the system.  Therefore, the reaction would not be 100% complete, which was detected by the sensitive NMR experimentation.

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells will be added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We will perform additional experiments as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants will be added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section.  Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots.  Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals.  Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, while 13 cases were reported between residues 369 and 379, so G206C and R371H were chosen to represent potential hot spots.  To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected.  Finally, mutations located close to or on catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, additional follow-up studies have not been performed to test this in cells or to validate the mutant with another method. Is it the position of the proline within the catalytic core, or the identity of the mutation that accounts for the selectivity?

      This is the very question that we are currently addressing but as a part of a follow-up study.  Indeed, one thought is that the preferential defect observed could be the result of the loss of proline, an exceptionally rigid amino acid that makes contact with the backbone twice, or the addition of a specific amino acid, namely arginine, a flexible amino acid with an added charge at this site.  Although beyond the scope of this manuscript, we will investigate whether such and other characteristics in this region of PHD2/HIF1α interface contribute to the differential hydroxylation. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, as previously thought. 

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is actually required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      We thank the reviewer for this comment, and while we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defect caused by the various disease-associated mutations on PHD2. The subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, would lead to erythrocytosis in vivo.  However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed.  All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript also has limitations such as the continuous expression of exogenous PHD2 mutants driven via CMV promoter. Thus, we performed several additional biophysical methodologies to interrogate the disease-causing PHD2 mutants. The limitations of the luciferase assay will be expanded in the revised manuscript. 

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment.  While solving the structure of PHD2 P317R in complex with HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB:5LAT).  In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al.[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and that R317 is predicted to interact differently with this motif. While this analysis does not directly elucidate the reason for the preferential NODD defect, it supports the possibility that the P317R substitution may be more detrimental for enzymatic activity on NODD than CODD. We will discuss this notion in the revised manuscript.

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present manuscript. We observed that the PHD2 binding patterns to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in HIF1α ODD for NMR study[3]. Thus, we first focused on the elucidation of possible defects in disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that an identified deregulation of HIF1α could be extended to the HIF2α paralog.

      However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.  

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

    1. eLife Assessment

      Based on several lines of interesting data, the authors conclude that FMRP, though associated with stalled ribosomes, does not determine the position on the mRNAs at which ribosomes stall. Although this conclusion would be valuable if clearly established, the current set of data is incomplete and it is unclear if the methodologies applied in this paper are fully adequate to address this gap.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.

    3. Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1, the density of punctae in both WT and FMR1-KO actually increases after treatment of HHT or Anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figures S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis again, to improve their credibility.
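The equal-size resampling requested in point (5) could look roughly like the sketch below. This is one possible illustration, not the authors' analysis: the function names are hypothetical, the permutation test on the difference of means is our choice of statistic, and the appropriate test for puncta densities would depend on the actual data.

```python
# Hypothetical sketch: subsample groups to a common size, then compare two
# groups with a permutation test on the difference of means.
import random
from statistics import mean

def subsample_to_equal_size(groups, seed=0):
    """Randomly draw (without replacement) the smallest group's size from each group."""
    rng = random.Random(seed)
    n = min(len(values) for values in groups.values())
    return {name: rng.sample(list(values), n) for name, values in groups.items()}

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

In practice the subsampling would be repeated over many seeds and the distribution of p-values reported, since a single random draw of 23 neurites from several thousand can itself be unrepresentative.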

    4. Reviewer #3 (Public review):

      Summary: Li et al describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by p adjust or log2 Fold Change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by p adjust). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of FMRP in the formation and function of RNA granules in mouse brain/cultured hippocampal neurons. Most of their results indicate that FMRP does not have a role in the formation or function of RNA granules with specific mRNAs, but may have some role in distal RNA granules in neurons and their response to synaptic stimulation. This is an important work (though the results are mostly negative) in understanding the composition and function of neuronal RNA granules. The last part of the work in cultured neurons is disjointed from the rest of the manuscript, and the results are neither convincing nor provide any mechanistic insight.

      Strengths:

      (1) The study is quite thorough, the methods and analysis used are robust, and the conclusion and interpretation are diligent.

      (2) The comparative study of Rat and Mouse RNA granules is very helpful for future studies.

      (3) The conclusion that the absence of FMRP does not affect the RNA granule composition and many of its properties in the system the authors have chosen to study is well supported by the results.

      (4) The difference in the response to DHPG stimulation concerning RNA granules described here is very interesting and could provide a basis for further studies, though it has some serious technical issues.

      Weaknesses:

      (1) The system used for the study (P5 mouse brain or DIV 8-10 cultured neuron) is surprising, as the majority of defects in the absence of FMRP are reported in later stages (P30+ brain and DIV 14+ neurons). It is important to test if the conclusions drawn here hold good at different developmental stages.

      (2) The term 'distal granules' is very vague. Since there is no structural or biochemical characterization of these granules, it is difficult to understand how they are different from the proximal granules and why FMRP has an effect only on these granules.

      (3) Since the manuscript does not find any effect of FMRP on neuronal RNA granules, it does not provide any new molecular insight with respect to the function of FMRP.

      Thank you for your comments and for pointing out the strengths of the manuscript. Unfortunately, we will not be able to respond to point #1. The protocol for purification of the ribosomes from RNA granules does not work in older brains (see Khandjian et al, 2004 PNAS 101:13357), presumably due to the presence of large concentrations of myelin. While it would be possible to repeat our experiments later in culture, we have no expectation that the results would be different, since we do observe DHPG induction of elongation-dependent, initiation-independent mGluR-LTD in later cultures (Graber et al, 2017 J. Neuroscience 37:9116). We will strengthen the caveat in the discussion that our results capture only a snapshot of development and that it is certainly possible that different results may be seen at different times. We agree with point #2 that ‘distal granules’ is a vague term. We will remove the term and clarify that we only quantified granules more than 50 microns from the cell soma. We do not know if these granules are distinct. We would respectfully disagree with point #3 that the study does not provide molecular insight into the function of FMRP, as disproving that FMRP is important for stalling and determining the position of stalling removes a major hypothesis about the function of FMRP, and showing that something is not true is, at least to me, providing insight.

      Reviewer #2 (Public review):

      In the present manuscript, Li et al. use biochemical fractionation of "RNA granules" from P5 wildtype and FMR1 knock-out mouse brains to analyze their protein/RNA content, determine a single particle cryo-EM structure of contained ribosomes, and perform ribo-seq analysis of ribosome-protected RNA fragments (RPFs). The authors conclude from these that neither the composition of the ribosome granules, nor the state of their contained ribosomes, nor the mRNA positions with high ribosome occupancy change significantly. Besides minor changes in mRNA occupancy, the one change the authors identified is a decrease in puromycylated punctae in distal neurites of cultured primary neurons of the same mice, and their enhanced resistance to different pharmacological treatments. These results directly build on their earlier work (Anadolu et al., 2023) using analogous preparations of rat brains; the authors now perform a very similar study using WT and FMR1-KO mouse brains. This is an important topic, aiming to identify the molecular underpinnings of the FMRP protein, which is the basis of a major neurological disease. Unfortunately, several limitations of this study prevent it from being more convincing in its present form.

      In order to improve this study, our main suggestions are as follows:

      (1) The authors equate their biochemically purified "RG" fraction with their imaging-based detection of puromycin-positive punctae. They claim essentially no differences in RGs, but detect differences in the latter (mostly their abundance and sensitivity to DHPG/HHT/Aniso). In the discussion the authors acknowledge the inconsistency between these two modalities: "An inconsistency in our findings is the loss of distal RPM puncta coupled with an increase in the immunoreactivity for S6 in the RG." and "Thus, it may be that the RG is not simply made up of ribosomes from the large liquid-liquid phase RNA granules."

      How can the authors be sure that they are analysing the same entities in both modalities? A more parsimonious explanation of their results would be that, while there might be some overlap, two different entities are analyzed. Much of the main message rests on this equivalence, and I believe the authors should show its validity.

      (2) The authors show that increased nuclease digestion (and magnesium concentration) led to a reduction of their RPF sizes down to levels also seen by other researchers. Analyzing these now properly digested RPFs, the authors state that the CDS coverage and periodicity drastically improved, and that spurious enrichments of secretory mRNAs, which made up one of the major fractions in their previous work, are now reduced. In my opinion, this would be more appropriately communicated as a correction to their previous work, not as a main Figure in another manuscript.

      (3) The fold changes reported in Figure 7 (ranging between log2(-0.2) and log2(+0.25)) are all extremely small and in my opinion should not be used to derive claims such as "The loss of FMRP significantly affected the abundance and occupancy of FMRP-Clipped mRNAs in WT and FMR1-KO RG (Fig 7A, 7B), but not their enrichment between RG and RCs".

      (4) Figure 8 / S8-1 - The authors show that ~2/3 of their reads stem from PCR duplicates, but that even after removing those, the majority of peaks remain unaltered. At the same time, Figure S8-1 shows the total number of peaks to be 615 compared with 1392 before duplicate removal. Can the authors comment on this discrepancy? In addition, the dataset with properly removed artefacts should be used for their main display item instead of the current Figure 8.

      (5) Figure 9 / S9-1 - The density of punctae in both WT and FMR1-KO actually increases after treatment with HHT or Anisomycin (Figure S9-1 B-C). Even if a large fraction would now be "resistant to run-off", there should not be an increase. While this effect is deemed not significant, a much smaller effect in Figure 9C is deemed significant. Can the authors explain this? Given how vastly different the sample sizes are (ranging from 23 neurites in Figures S9-1 to 5,171 neurites in Figure 9), the authors should (randomly) sample to the same size and repeat their statistical analysis, to improve their credibility.

      Thank you for your comments. We agree with point #1 that the equivalence of RPM puncta with the RG fraction is an issue, and while we believe that we show in a number of ways that the two are related (anisomycin-resistant puromycylation, puromycylation only at high concentrations consistent with the hybrid state, etc), we would respectfully disagree that our main message results from the equivalence of the RPM-labeled RNA granules in neurites and the ribosomes isolated by sedimentation. We will make this point clearer in our revision. For point #2, we agree that the changes with increased nuclease are somewhat out of place in a narrative sense, but they are clearly relevant to this work. Whether or not one sees this as a ‘correction’ or an interesting point will depend on a better characterization of the structures of the stalled polysomes. My personal view is that the nuclease resistance of cleavage near the RNA entrance site is quite interesting. Since we reproduce our results with a similar nuclease treatment in mice, as reported in our previous publication, I believe the comparison could be of interest in the future and would like to retain it. We agree with point #3 and will temper these claims in our revised version. For point #4, we will determine more carefully why the number of peaks differs and switch the main and supplemental figures. For point #5, we apologize for the typo in the legend of Figure 9: the number of neurites is 171, not 5,171. The box plot line shows the median, not the average, and the data are clearly skewed such that the median and average differ (i.e. there is a two-fold decrease in the average density of distal puncta between WT and FMRP, but the average density is actually slightly decreased with HHT and anisomycin, although the median increases slightly). We will now report the results in distinct modalities to clarify this, and we will reexamine the statistics to better address the skewed distribution of values in the revised version.
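      The two statistical points in this exchange (sampling groups to the same size, and medians and averages diverging in skewed data) can be illustrated with a minimal Python sketch. All values and the names `subsample_to_equal_size`, `summarize`, `control`, and `treated` are hypothetical; none of them come from the manuscript, and this is not the authors' analysis.

```python
import random
import statistics


def subsample_to_equal_size(groups, seed=0):
    """Randomly downsample every group to the size of the smallest group."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    n = min(len(values) for values in groups.values())
    return {name: rng.sample(values, n) for name, values in groups.items()}


def summarize(values):
    """Report group size together with both the mean and the median."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical puncta densities (arbitrary units), invented for illustration.
# One extreme neurite in `control` pulls its mean far above its median.
control = [0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 4.0]
treated = [0.1, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.5]

balanced = subsample_to_equal_size({"control": control, "treated": treated})
for name, values in balanced.items():
    print(name, summarize(values))
```

      With these invented numbers, the full `treated` group has a lower mean but a higher median than `control`, which is the kind of divergence described in the response. Repeating the subsampling over many seeds, rather than relying on a single draw, would give a sense of how stable any test result is.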

      Summary:

      Li et al describe a set of experiments to probe the role of FMRP in ribosome stalling and RNA granule composition. The authors are able to recapitulate findings from a previous study performed in rats (this one is in mice).

      Strengths:

      (1) The work addresses an important and challenging issue, investigating mechanisms that regulate stalled ribosomes, focusing on the role of FMRP. This is a complicated problem, given the heterogeneity of the granules and the challenges related to their purification. This work is a solid attempt at addressing this issue, which is widely understudied.

      (2) The interpretation of the results could be interesting, if supported by solid data. The idea that FMRP could control the formation and release of RNA granules, rather than the elongation by stalled ribosomes is of high importance to the field, offering a fresh perspective into translational regulation by FMRP.

      (3) The authors focused on recapitulating previous findings, published elsewhere (Anadolu et al., 2023) by the same group, but using rat tissue, rather than mouse tissue. Overall, they succeeded in doing so, demonstrating, among other findings, that stalled ribosomes are enriched in consensus mRNA motifs that are linked to FMRP. These interesting findings reinforce the role of FMRP in formation and stabilization of RNA granules. It would be nice to see extensive characterization of the mouse granules as performed in Figure 1 of Anadolu and colleagues, 2023.

      (4) Some of the techniques incorporated aid in creating novel hypotheses, such as the ribopuromycylation assay and the cryo-EM of granule ribosomes.

      Weaknesses:

      (1) The RNA granule characterization needs to be more rigorous. Coomassie is not proper for this type of characterization, simply because protein weight says little about its nature. The enrichment of key proteins is not robust and seems to not reach significance in multiple instances, including S6 and UPF1. Furthermore, S6 is the only proxy used for ribosome quantification. Could the authors include at least 3 other ribosomal proteins (2 from small, 2 from large subunit)?

      (2) Page 12-13 - The Gene Ontology analysis is performed incorrectly. First, one should not rank genes by their RPKM levels. It is well known that housekeeping genes such as those related to actin dynamics, molecular transport and translation are highly enriched in sequencing datasets. It is usually more informative when significantly different genes are ranked by p adjust or log2 Fold Change, then compared against a background to verify enrichment of specific processes. However, the authors found no DEGs. I would suggest the removal of this analysis and the incorporation of a gene set enrichment analysis (ranked by p adjust). I further suggest that the authors incorporate a dimensionality reduction analysis to demonstrate that the lack of significance stems from biology and not experimental artifacts, such as poor reproducibility across biological replicates.

      Thank you for your comments on the strengths of the manuscript. We agree with point #1 that the mouse RNA granule characterization needs to be more rigorous and we plan to accomplish this in our revised version. Similarly, we will incorporate the additional statistical analysis suggested by the reviewer in a revised version.
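      As an illustration of the ranking step behind the suggested gene set enrichment analysis, here is a minimal Python sketch. It assumes a signed score, sign(log2 fold change) × −log10(adjusted p), which is one common choice and not something specified in the review. The gene names, values, and the `rank_genes` helper are hypothetical.

```python
import math


def rank_genes(results, floor=1e-300):
    """Order genes by sign(log2FC) * -log10(padj), most up-regulated first.

    `results` maps gene name -> (log2 fold change, adjusted p-value).
    `floor` guards against log10(0) when an adjusted p-value underflows.
    """
    scored = []
    for gene, (log2fc, padj) in results.items():
        magnitude = -math.log10(max(padj, floor))
        scored.append((gene, math.copysign(magnitude, log2fc)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical differential-expression results, invented for illustration.
example = {
    "GeneA": (1.2, 0.001),
    "GeneB": (-0.8, 0.01),
    "GeneC": (0.1, 0.9),
}

for gene, score in rank_genes(example):
    print(f"{gene}\t{score:.3f}")
```

      A ranked list of this kind is the usual input to a pre-ranked enrichment tool. Unlike ranking by RPKM, it orders genes by the evidence for a change between conditions rather than by abundance, so highly expressed housekeeping genes do not dominate the top of the list.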

    1. eLife Assessment

      In this study, the authors investigate the role of ZMAT3, a p53 target gene, in tumor suppression and RNA splicing regulation. Using quantitative proteomics, the authors uncover that ZMAT3 knockout leads to upregulation of HKDC1, a gene linked to mitochondrial respiration, and that ZMAT3 suppresses HKDC1 expression by inhibiting c-JUN-mediated transcription. This set of convincing evidence reveals a fundamental mechanism by which ZMAT3 contributes to p53-driven tumor suppression by regulating mitochondrial respiration.

    2. Reviewer #1 (Public review):

      Summary:

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.

      The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.

    3. Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.

    1. eLife Assessment

      In their study, Neiswender et al. provide important insights into how BicD2 variants linked to spinal muscular atrophy alter dynein activity and cargo specificity. While the findings suggest disease-relevant changes in BicD2's binding partners, the evidence connecting these changes to disease mechanisms remains incomplete and would benefit from further experimental validation. The work lays a strong foundation for future research, but could be strengthened by deeper functional analysis of key interactions, such as the BicD2/HOPS complex.

    2. Reviewer #1 (Public review):

      In this work, Neiswender and colleagues test the hypothesis that mutations in BicD2 that are associated with SMALED alter BicD2-cargo interactions. To do this, they first establish the WT BicD2 cargo interactome (using a proximity-dependent biotin ligase screen with Turbo-ID on the BicD2 C-terminus). In addition to known cargo interactors, they also identified many proteins in the HOPS complex. Interestingly, they find that the HOPS complex may interact with BicD2 in a different manner than other known cargos. The authors also show that while BicD2 is required for HOPS complex localization, on average, depletion of BicD2 from HeLa and Cos7 cells causes HOPS and lysosome mislocalization that is consistent with Kinesin-1 trafficking defects, rather than dynein. The authors also use proximity biotin ligase approaches to define the cargo interactome of three BicD2 variants associated with SMALED. One variant (R747C) has the most altered cargo interactome. The authors highlight one protein in particular, GRAMD1A, that is only found in the R747C dataset and mislocalizes specifically when R747C is expressed.

      The work in this manuscript is of a very high quality and contributes important findings to the field. I have a few questions that, if answered, could increase the impact of this work.

      (1) I was surprised at the effect of BicD2 knockdown on LAMP (and VPS41) localization, which really suggests that in HeLa and Cos7 cells, BicD2 regulation of Kinesin-1 (rather than dynein) is the primary driver of lysosome localization. The KIF5B-knockout rescue of the BicD2-overexpression phenotype was a very powerful result that supports this conclusion. Have the authors looked at other cargos, e.g., Golgi or centrosomes in G2? Can the authors include more discussion about what this result means or how they imagine dynein and kinesin-1's interaction with BicD2 is regulated?

      (2) Have the authors examined if the SMALED mutants show diminished or increased binding to KIF5B? While the authors are correct that the mutations could hyperactivate dynein because they reduce BicD2 autoinhibition, it is possible that the SMALED mutants hyperactivate dynein because they no longer bind kinesin. This would be particularly interesting, given the complex relationship between BicD2 regulation of dynein and kinesin that the authors show in Figure 3.

      (3) What is already known about the protein GRAMD1A? Did the authors choose to focus on GRAMD1A because it was the only novel interaction found in the SMALED mutant interactomes, or was this protein interesting for a different reason? Does the known function of GRAMD1A explain the potential dysfunction of cells expressing BICD2_R747C or patients who have this mutation? More discussion of this protein and why the authors focused on it would really strengthen the manuscript.

    3. Reviewer #2 (Public review):

      Neiswender et al. compared the interactomes of wild-type BICD2 and BICD2 mutants that are associated with Spinal Muscular Atrophy with Lower Extremity Predominance (SMALED2). Although BICD2 has previously been implicated in SMALED2, it is unclear how mutations in BICD2 may contribute to disease symptoms. In this study, the authors characterize the interactome of wild-type BICD2 and identify potential new cargos, including the HOPS complex. The authors then chose three SMALED2-associated BICD2 mutants and compared each mutant interactome to that of wild-type BICD2. Each mutant had a change in the interactome, with the most drastic being BICD2_R747C, a mutation in the cargo binding domain of BICD2. This mutant displayed less interaction with a potential new BICD2 cargo, the HOPS complex. Additionally, it displayed more interaction with an ER protein, GRAMD1A.

      The data in the paper is generally strong, but the major conclusions of this paper need more evidence to be better supported.

      (1) The authors use cells that have been engineered to express the different BICD2 constructs. As shown in Figure 4B, the authors see wide expression of BICD2_WT throughout the cell. However, WT BICD2 usually localizes to the TGN. This widespread localization introduces some uncertainty about the interactome data. The authors should either try to verify the interaction data (specifically with the HOPS complex and GRAMD1A) by immunoprecipitating endogenous BICD2 or by repeating their interactome experiment in Figure 1 using BICD2 knockout cells that express the BICD2_WT construct. This should also be done to verify the immunoprecipitation and microscopy data shown in Figure 7.

      (2) The authors conclude that cargo transport defects resulting from BICD2 mutations may contribute to SMALED2 symptoms. However, the authors are unable to determine if BICD2 directly binds to the potential new cargo, the HOPS complex. To address this, the authors could purify full-length WT BICD2 and perform in vitro experiments. Furthermore, the authors were unable to identify the minimal region of BICD2 needed for HOPS interaction. The authors could expand on the experiment attempted with the extended BICD2 C-terminal using a deltaCC1 construct, which could also be used for in vitro experiments.

      (3) Again, the authors conclude that BICD2 mutants cause cargo transport defects that are likely to lead to SMALED2 symptoms. This would be better supported if the authors are able to find a protein relevant to SMALED2 and examine if/how its localization is changed under expression of the BICD2 mutants. The authors currently use the HOPS complex and GRAMD1A as indicators of cargo transport defects, but it is unclear if these are relevant to SMALED2 symptoms.

    4. Reviewer #3 (Public review):

      Summary:

      BicD2 is a motor adapter protein that facilitates cellular transport pathways, which are impacted by human disease mutations of BicD2, causing spinal muscular atrophy with lower extremity predominance (SMALED2). The authors provide evidence that some of these mutations result in interactome changes, which may be the underlying cause of the disease. This is supported by proximity biotin ligation screens, immunoprecipitation, and cell biology assays. The authors identify several novel BicD2 interactions, such as the HOPS complex that participates in the fusion of late endosomes and autophagosomes with lysosomes, which could have important functions. Three BicD2 disease mutants studied had changes in the interactome, which could be an underlying cause for SMALED2. The study extends our understanding of the BicD2 interactome under physiological conditions, as well as of the changes in cellular transport pathways that result in SMALED2. It will be of great interest for the BicD2 and dynein fields.

      Strengths:

      Extensive interactomes are presented for both WT BicD2 and the disease mutants, which will be valuable for the community. The HOPS complex, which is important for fusion of late endosomes and lysosomes, was identified as a novel interactor of BicD2. This is of interest, since some of the BicD2 disease mutations result in Golgi-fragmentation phenotypes. The interaction with the HOPS complex is affected by the R747C mutation, which also results in a gain-of-function interaction with GRAMD1A.

      Weaknesses:

      The manuscript should be strengthened by further evidence of the BicD2/HOPS complex interaction and the functional implications for spinal muscular atrophy by changes in the interactome through mutations. Which functional implications does the loss of the BicD2/HOPS complex interaction and the gain of function interaction with GRAMD1A have in the context of the R747C mutant?

      Major points:

      (1) In the biotin proximity ligation assay, a large number of targets were identified, but it is not clear why only the HOPS complex was chosen for further verification. Immunoprecipitation was used for target verification, but due to the very high number of targets identified in the screen, and the fact that the HOPS complex is a membrane protein that could potentially be immunoprecipitated along with lysosomes or dynein, additional experiments to verify the interaction of BicD2 with the HOPS complex (reconstitution of a complex in vitro, GST-pull down of a complex from cell extracts or other approaches) are needed to strengthen the manuscript.

      (2) In the biotin proximity ligation assay, a large number of BicD2 interactions were identified that are distinct between the mutant and the WT, but it was not clear why GRAMD1A in particular was chosen as a gain-of-function interaction, and what the functional role of a BicD2/GRAMD1A interaction may be. A Western blot shows a strengthened interaction with the R747C mutant, but GRAMD1A also interacts with WT BicD2.

      (3) Furthermore, the functional implications of changed interactions with HOPS and GRAMD1A in the R747C mutant are unclear. Additional experiments are needed to establish the functional implication of the loss of the BicD2/HOPS interaction in the BicD2/R747C mutant. For the GRAMD1A gain of function interaction, according to the authors, a significant amount of the protein localized with BicD2/R747C at the centrosomal region. This changed localization is not very clear from the presented images (no centrosomal or other markers were used, and the changed localization could also be an effect of dynein hyperactivation in the mutant). Furthermore, the functional implication of a changed localization of GRAMD1A is unclear from the presented data.

    1. eLife Assessment

      This valuable study identifies asymmetric dimethylarginine (ADMA) histones as potential determinants of the initial genomic binding of Rhino, a Drosophila-specific chromatin protein essential for piRNA cluster specification. The authors provide correlative genomic and imaging data to support their model, although functional validation of the proposed mechanism remains incomplete. The authors could revise the manuscript to reflect that they have uncovered a small subset of piRNA clusters dependent on ADMA-histones, which may not be the general rule.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand how Rhino, a chromatin protein essential for small RNA production in fruit flies, is initially recruited to specific regions of the genome. They propose that asymmetric arginine methylation of histones, particularly mediated by the enzyme DART4, plays a key role in defining the first genomic sites of Rhino localization. Using a combination of inducible expression systems, chromatin immunoprecipitation, and genetic knockdowns, the authors identify a new class of Rhino-bound loci, termed DART4 clusters, that may represent nascent or transitional piRNA clusters.

      Strengths:

      One of the main strengths of this work lies in its comprehensive use of genomic data to reveal a correlation between ADMA histones and Rhino enrichment at the border of known piRNA clusters. The use of both cultured cells and ovaries adds robustness to this observation. The knockdown of DART4 supports a role for H3R17me2a in shaping Rhino binding at a subset of genomic regions.

      Weaknesses:

      However, Rhino binding at, and piRNA production from, canonical piRNA clusters appears largely unaffected by DART4 depletion, and spreading of Rhino from ADMA-rich boundaries was not directly demonstrated. Therefore, while the correlation is clearly documented, further investigation would be needed to determine the functional requirement of these histone marks in piRNA cluster specification.

      The study identifies piRNA cluster-like regions called DART4 clusters. While the model proposes that DART4 clusters represent evolutionary precursors of mature piRNA clusters, the functional output of these clusters remains limited. Additional experiments could help clarify whether low-level piRNA production from these loci is sufficient to guide Piwi-dependent silencing.

      In summary, the authors present a well-executed study that raises intriguing hypotheses about the early chromatin context of piRNA cluster formation. The work will be of interest to researchers studying genome regulation, small RNA pathways, and the chromatin mechanisms of transposon control. It provides useful resources and new candidate loci for follow-up studies, while also highlighting the need for further functional validation to fully support the proposed model.

    3. Reviewer #2 (Public review):

      This study seeks to understand how the Rhino factor knows how to localize to specific transposon loci and to specific piRNA clusters to direct the correct formation of specialized heterochromatin that promotes piRNA biogenesis in the fly germline. In particular, these dual-strand piRNA clusters with names like 42AB, 38C, 80F, and 102F generate the bulk of ovarian piRNAs in the nurse cells of the fly ovary, but the evolutionary significance of these dual-strand piRNA clusters remains mysterious since triple null mutants of these dual-strand piRNA clusters still allow fly ovaries to develop and remain fertile. Nevertheless, mutants of Rhino and its interactors Deadlock, Cutoff, Kipferl, Moonshiner, etc. cause more piRNA loss beyond these dual-strand clusters and exhibit the phenotype of major female infertility, so the impact of proper assembly of Rhino, the RDC, Kipferl, etc. onto proper piRNA chromatin is an important and interesting biological question that is not fully understood.

      This study first tests ectopic expression of Rhino by engineering a Dox-inducible Rhino transgene in the OSC line, which only expresses the primary Piwi pathway that reflects the natural single-pathway expression in the follicle cells and is quite distinct from the nurse cell germline piRNA pathway that is promoted by Rhino, Moonshiner, etc. The authors present some compelling evidence that this ectopic Rhino expression in OSCs may reveal how Rhino can initiate de novo binding via ADMA histone marks, a feat that would be much more challenging to demonstrate in the germline, where this epigenetically naïve state cannot be modeled since germ cell collapse would likely ensue. In the OSC, the authors have tested the knockdown of four of the 11 known Drosophila PRMTs (DARTs), and comparing to ectopic Rhino foci that they observe in HP1a knockdown (KD), they conclude DART1 and DART4 are the prime factors to study further in looking for disruption of ADMA histone marks. The authors also test KD of DART8 and CG17726 in OSCs, but in the fly, the authors test germline KD (GLKD) of DART4 only. They do not explain why the other DARTs are not tested by GLKD; the UAS-RNAi resources in Drosophila strain repositories should be very complete and should make reagents for these knockdowns accessible.

      The authors only characterize some particular ADMA marks of H3R17me2a as showing strong decrease after DART4 GLKD, and then they see some small subset of piRNA clusters go down in piRNA production as shown in Figure 6B and Figure 6F and Supplementary Figure 7. This small subset of DART4-dependent piRNA clusters does lose Rhino and Kipferl recruitment, which is an interesting result.

      However, the biggest issue with this study is the mystery that the set of the most prominent dual-strand piRNA clusters, 42AB, 38C, 80F, and 102F, which are the prime genomic loci subjected to Rhino regulation, do not show any change in piRNA production in the GLKD of DART4. The authors bury this surprising negative result in Supplementary Figure 5E, but this is also evident in no decrease (actually an n.s. increase) in Rhino association in Figure 5D. Since these main piRNA clusters involve the RDC, Kipferl, Moonshiner, etc., and they show no change in ADMA status and no piRNA loss after DART4 GLKD, this poses a problem with the model in Figure 7C. In this study, there is only a GLKD of DART4 and no GLKD of the other DARTs in fly ovaries.

      One way the authors rationalize this peculiar exception is the argument that DART4 is only acting on evolutionarily "young" piRNA clusters like the bx, CG14629, and CG31612, but the lack of any change on the majority of other piRNA clusters in Figure 6F leaves open the unsatisfying concern that there is much functional redundancy remaining with other DARTs not being tested by GLKD in the fly that would have a bigger impact on the other main dual-strand piRNA clusters being regulated by Rhino and ADMA-histone marks.

      Also, the current data do not provide convincing enough support for the model in Figure 7C and the paper title of ADMA-histones being the key determinant in the fly ovary for Rhino recognition of the dual-strand piRNA clusters. Although much of this study's data is well constructed and presented, there remains a large gap in that no other DARTs were tested in GLKD that would show a big loss of piRNAs from the main dual-strand piRNA clusters of 42AB, 38C, 80F, and 102F, where Rhino has prominent spreading in these regions.

      As the manuscript currently stands, I do not think the authors present enough data to conclude that "ADMA-histones [As a Major new histone mark class] does play a crucial role in the initial recognition of dual-strand piRNA cluster regions by Rhino" because the data here mainly just show that a small subset of evolutionarily young piRNA clusters have a strong effect from GLKD of DART4. The authors could extensively revise the study to be much more specific in the title and conclusion that they have uncovered this very unique niche of a small subset of DART4-dependent piRNA clusters, but this niche finding may dampen the impact and significance of this study since other major dual-strand piRNA clusters do not change during DART4 GLKD, and the authors do not show data for GLKD of any other DARTs. The niche finding of just a small subset of DART4-dependent piRNA clusters might make another specialized genetics forum a more appropriate venue.

    1. eLife Assessment

      This is a useful study on the role of CHI3L1 in Kupffer cells, the macrophages of the liver, showing that CHI3L1 alters glucose regulation in obesity. Specifically, Chi3l1 protects glucose-dependent Kupffer cells during metabolic dysfunction-associated steatotic liver disease (MASLD) by inhibiting glucose uptake, preventing metabolic stress and death. These data are compelling, yet require further validation.

    2. Reviewer #1 (Public review):

      The manuscript by Shan et al. seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L1 serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs, they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, there are some concerns about the current data that limit my enthusiasm for the study in its current form. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. For example, in Figure 2 the authors present data that knockout of Chil1 in KCs using Clec4f Cre produces worse liver steatosis and insulin resistance. However, in Supplemental Figure 4, they perform the same experiment in LysM-Cre mice and find a somewhat different phenotype. The authors appear to be under the impression that LysM-Cre does not cause recombination in KCs and therefore interpret these data to mean that Chil1 is relevant in KCs and not MdMs. However, LysM-Cre DOES lead to efficient recombination in KCs and therefore Chil1 expression will be decreased in both KCs and MdM (along with PMNs) in this line.

      Therefore, a phenotype observed with KC-KO should also be present in this model unless the authors argue that loss of Chil1 from the MdMs has the opposite phenotype of KCs and therefore attenuates the phenotype. The Cx3Cr1 CreER tamoxifen inducible system is currently the only macrophage Cre strategy that will avoid KC recombination. The authors need to rethink their results with the understanding that Chil1 is deleted from KCs in the LysM-Cre experiment. In addition, it appears that only one experiment was performed, with only 5 mice in each group for both the Clec4f and LysM-Cre data. This is generally not enough to make a firm conclusion for MASH diet experiments.

      (2) The mouse weight gain is missing from Figure 2 and Supplementary Figure 4. This data is critical to interpret the changes in liver pathology, especially since they have worse insulin resistance.

      (3) Figure 4 suggests that KC death is increased with KO of Chil1. However, this cannot be concluded from the plots shown. In Supplementary Figure 6 the authors provide a more appropriate gating scheme to quantify resident KCs that includes TIM4. The TIM4 data need to be shown and quantified in Figure 4. As shown in Supplementary Figure 6, the F4/80 hi population is predominantly KCs at baseline; however, this is not true with MASH diets. Most of the recruited MoMFs also reside in the F4/80 hi gate where they can be identified by their lower expression of TIM4. The MoMF gate shown in this figure is incorrect. The CD11b hi population is predominantly PMNs, monocytes, and cDC2, not MoMFs (PMID:33997821). In addition, the authors should stain the tissue for TIM4, which would also be expected to reveal a decrease in the number of resident KCs.

      (4) While the Clec4F Cre is specific to KCs, there is also less data about the impact of the Cre system on KC biology. Therefore, when looking at cell death, the authors need to include some mice that express Clec4F Cre without the floxed allele to rule out any effects of the Cre itself. In addition, if the cell death phenotype is real, it should also be present in the LysM Cre system for the reasons described above. Therefore, the authors should quantify the KC number and dying KCs in this mouse line as well.

      (5) I am somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. Looking at our own data and those from the Liver Atlas it appears that this gene is primarily expressed in neutrophils. At a minimum, the authors should address the expression of Chil1 in macrophage populations from other publicly available datasets in mouse MASH to validate their findings (several options include - PMID: 33440159, 32888418, 32362324). If expression of Chil1 is not present in these other data sets, perhaps an environmental/microbiome difference may account for the distinct expression pattern observed. Either way, it is important to address this issue.

    3. Reviewer #2 (Public review):

      The manuscript from Shan et al. sets out to investigate the role of Chi3l1 in different hepatic macrophage subsets (KCs and moMFs) in MASLD following their identification that KCs highly express this gene. To this end, they utilise Chi3l1KO, Clec4f-CrexChi3l1fl, and Lyz2-CrexChi3l1fl mice and WT controls fed an HFHC diet for different periods of time.

      Firstly, the authors perform scRNA-seq, which led to the identification of Chi3l1 (encoded by Chil1) in macrophages. However, this is on a limited number of cells (especially in the HFHC context), and hence it would also be important to validate this finding in other publicly available MASLD/Fibrosis scRNA-seq datasets. Similarly, it would be important to examine if cells other than monocytes/macrophages also express this gene, given the use of the full KO in the manuscript. Along these lines, utilisation of publicly available human MASLD scRNA-seq datasets would also be important to understand where the increased expression observed in patients comes from and the overall relevance of macrophages in this finding.

      Next, the authors use two different Cre lines (Clec4f-Cre and Lyz2-Cre) to target KCs and moMFs respectively. However, no evidence is provided to demonstrate that Chil1 is only deleted from the respective cells in the two CRE lines. Thus, KCs and moMFs should be sorted from both lines, and a qPCR performed to check the deletion of Chil1. This is especially important for the Lyz2-Cre, which has been routinely used in the literature to target KCs (as well as moMFs) and has (at least partial) penetrance in KCs (depending on the gene to be floxed). Also, while the Clec4f-Cre mice show an exacerbated MASLD phenotype, there is currently no baseline phenotype of these animals (or the Lyz2Cre) in steady state in relation to the same readouts provided in MASLD and the macrophage compartment. This is critical to understand if the phenotype is MASLD-specific or if loss of Chi3l1 already affects the macrophages under homeostatic conditions.

      Next, the authors suggest that loss of Chi3l1 promotes KC death. However, to examine this, they use Chi3l1 full KO mice instead of the Clec4f-Cre line. The reason for this is not clear, because in this regard, it is now not clear whether the effects are regulated by loss of Chi3l1 from KCs or from other hepatic cells (see point above). The authors mention that Chi3l1 is a secreted protein, so does this mean other cells are also secreting it, and are these needed for KC death? In that case, this would not explain the phenotype in the CLEC4F-Cre mice. Here, the authors do perform a basic immunophenotyping of the macrophage populations; however, the markers used are outdated, making it difficult to interpret the findings. Instead of F4/80 and CD11b, which do not allow a perfect discrimination of KCs and moMFs, especially in HFHC diet-fed mice, more robust and specific markers of KCs should be used, including CLEC4F, VSIG4, and TIM4.

      Additionally, while the authors report a reduction of KCs in terms of absolute numbers, there are no differences in proportions. This, coupled with a decrease also in moMF numbers at 16 weeks (when one would expect an increase if KCs are decreased, based on previous literature), suggests that the differences in KC numbers may be due to differences in total cell counts obtained from the obese livers compared with controls. To rule this out, total cell counts and total live CD45+ cell counts should be provided. Here, the authors also provide TUNEL staining in situ to demonstrate increased KC death, but as it is typically notoriously difficult to visualise dying KCs in MASLD models, here it would be important to provide more images. Similarly, there appear to be many more TUNEL+ cells in the KO that are not KCs; thus, it would be important to examine this in the CLEC4F-Cre line to ascertain direct versus indirect effects on cell survival.

      Finally, the authors suggest that Chi3l1 exerts its effects through binding glucose and preventing its uptake. They use ex vivo/in vitro models to assess this with rChi3l1; however, here I miss the key in vivo experiment using the CLEC4F-Cre mice to prove that this in KCs is sufficient for the phenotype. This is critical to confirm the take-home message of the manuscript.

    4. Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

    1. eLife Assessment

      This study provides valuable insights into a new toxin-antidote element in C. elegans, the first naturally occurring unlinked toxin-antidote system where endogenous small RNA pathways post-transcriptionally suppress the toxin. The strength of evidence is solid, using a combination of genomic and experimental methods. Enthusiasm, however, is tempered by its reliance on meta-analysis of existing data sets and limited experimental evaluation.

    2. Reviewer #1 (Public review):

      Summary:

      The article by Zdraljevic et al. reports the discovery of a third toxin-antidote (TA) element in C. elegans, composed of the genes mll-1 (toxin) and smll-1 (antidote). Unlike previously characterized TA systems in C. elegans, this element induces larval arrest rather than embryonic lethality. The study identifies three distinct haplotypes at the TA locus, including a hyper-divergent version in the standard laboratory strain N2, which retains a functional toxin but lacks a functional antidote. The authors propose that small RNA-mediated silencing mechanisms, dependent on MUT-16 and PRG-1, suppress the toxicity of the divergent toxin allele. This work provides insights into the evolutionary dynamics of TA elements and their regulation through RNA interference (RNAi).

      Overall, there are many things to like about this paper and only a few small quibbles, which will not require more than a little rewriting or relatively minor analyses.

      Strengths:

      (1) The discovery of a maternally deposited TA element with delayed toxicity due to delayed mRNA translation of the maternally deposited toxin mRNA is a significant addition to the literature on selfish genetic elements in metazoans.

      (2) Identifying three haplotypes at the TA locus provides a snapshot of potential evolutionary trajectories for these elements, which are often inferred but rarely demonstrated in naturally occurring strains. The genomic analysis of 550 wild isolates contextualizes the findings within natural populations, revealing geographic clustering and evolutionary pressures acting on the TA locus.

      (3) The study employs various techniques, including CRISPR/Cas9 knockouts, FISH, long-read RNA sequencing, and population genomics. The use of inducible systems to confirm toxicity and antidote functionality is particularly robust. This multifaceted approach strengthens the validity of the findings.

      (4) The authors provide compelling evidence that small RNA pathways suppress toxin activity in strains lacking a functional antidote. This highlights an alternative mechanism for neutralizing selfish genetic elements.

      Weaknesses:

      (1) The introduction focuses strongly (for good reason) on bacterial TA systems and then jumps to TA systems in C. elegans. It's unclear why TA systems in other eukaryotes are not discussed.

      (2) Similarly, there is a missed opportunity to discuss an analogy between the suppressor mechanism discovered here and the hairpin RNA suppressors of meiotic drive identified by Eric Lai and colleagues. Discussing these will provide a fuller context of the present study's findings and will not affect their novelty.

      (3) While the evidence for RNAi-mediated suppression is strong, the claim that positive selection drove diversification at piRNA binding sites requires further discussion and clarification. The elevated dN and dS are unusual (how unusual relative to other genes in vicinity? What is hyper-divergent statistically speaking?), but there is no a priori reason that there would be selection on piRNA binding sites within the mll-1 transcript to facilitate its recognition by endogenous RNAi machinery; what is the selective pressure for mll-1 to do so? Most TA systems would like to avoid being suppressed by the host. One cannot make the argument that this was motivated by the loss of the antidote because the loss of the antidote would be instantly suicidal, so the cadence of events described requiring hypermutation of the mll-1 transcript does not work.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript by Walter-McNeill, Kruglyak, and team, the authors provide solid evidence of another toxin-antidote (TA) system in C. elegans. Generally, TA systems involve selfish and linked genetic elements, one encoding a toxin that kills progeny inheriting it, unless an antidote (the second element) is also present. Currently, only two TA systems have been characterized in this species, pointing to the importance of identifying new instances of such systems to understand their transmission dynamics, prevalence, and functions in shaping worm populations.

      Strengths:

      This novel TA system (mll-1/smll-1) was identified on LGV in wild C. elegans isolates from the Hawaiian islands, by crossing divergent strains and observing allele frequency distortions by high-throughput genome sequencing after 10 generations. These allele frequency distortions were subsequently confirmed in another set of crosses with a separate divergent strain, and crosses of heterozygous males or hermaphrodites resulted in a pattern of L1 lethality in progeny (with a rod arrest phenotype) that suggested the maternal transmission of this TA system from the XZ1516 genetic background. By elegantly combining the use of near-isogenic lines, CRISPR editing to generate knock-outs, and a transgene rescue of the antidote gene, the authors identified the genes encoding the toxin and the antidote, which they refer to as mll-1 and smll-1. Moreover, the specific mll-1 isoform responsible for the production of the toxin was identified and mll-1 transcripts were observed by FISH in early and late embryos, as well as in larvae. Inducible expression of the toxin in various strains resulted in larval arrest and rod phenotypes. The authors then characterized the genetic variation of 550 wild isolates at the toxin/antidote region on LGV and distinguished three clades: (1) one with the conserved TA system, (2) one having lost the toxin and retaining a mostly functional antidote, and (3) one having lost the antidote and retaining a divergent yet coding toxin (this includes the reference strain Bristol N2, in which the homologous toxin gene has acquired mutations and is known as B0250.8). Further, the authors show that this region is under positive selection. These data are compelling and provide very strong evidence of a new TA system in this species.

      Weaknesses:

      The question remained as to how one clade, including N2, could retain the toxin gene but not possess a functional antidote. In the second part of the manuscript, the authors hypothesized that small RNA targeting (RNAi) of the toxin transcript could provide the necessary repression to allow worms to survive without the antidote. Through a meta-analysis of multiple small RNA datasets from the literature, the authors found evidence to support this idea, in which the toxin transcript is targeted by 22G siRNAs whose biogenesis is dependent on the Mutator foci protein, MUT-16. They note that from previous studies, mut-16 null mutants displayed a varied penetrance of larval arrest. In their own hands, mut-16 mutants displayed 15% varied larval arrest and 2% rod phenotypes. In an attempt to link B0250.8 to mut-16/siRNAs, they made a double mutant and examined body length as a proxy for developmental stage. Here, they observed a partial rescue of the mut-16 size defect by B0250.8 mutation. Finally, the authors also highlight data from further meta-analysis, which predicts the recognition of B0250.8 by several piRNAs. Also based on existing data from the literature, the authors link loss of Piwi (PRG-1), which binds piRNAs, to a depletion of 22G-RNAs targeting B0250.8 and an upregulation of B0250.8 expression in gonads, suggesting that piRNAs are the primary small RNAs that target B0250.8 for downregulation. The data in this portion of the manuscript are intriguing, but somewhat preliminary and incomplete, as they are based on little primary experimentation and a collection of different datasets (which have been acquired by slightly different methods in most cases). This portion of the study would require subsequent experimentation to firmly establish this mechanistic link. For example, to be able to claim that "the N2 toxin allele has acquired mutations that enable piRNA binding to initiate MUT-16-dependent 22G small RNA amplification that targets the transcript for degradation" the identified piRNA sites should be mutated and protein and transcript levels analysed in wild-type and in the strain with mutated piRNA sites. At a minimum, the protein levels in wild-type and mut-16, prg-1, and/or wago-1 mutants should be measured by western blot and/or by live imaging (introducing a GFP or some other tag to the endogenous protein via CRISPR editing) to show that the toxin is not accumulated as a protein in wt, but increases in levels in these mutants. mRNA levels in Figure S5A suggest there is still some expression of the B0250.8 transcript in a wild-type situation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of receptive-field surround of V1 and LGN neurons affects their firing rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms they find a long latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions and the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together they propose that the surround-response is related to the representation of the grey surface itself. They relate their findings to previous studies that have put forward the concept of an ‘inverse RF’ based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feedforward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The statistics are pooled across animals, which is less appropriate for hierarchical data. There is no histological confirmation of placement of the electrode in the LGN and there is no analysis of eye or face movements which may have contributed to the surround-induced responses. There are also some missing statistics and methods details which make interpretation more difficult.

      We thank the reviewer for their positive and constructive comments, and have addressed these specific issues in response to the minor comments. For the statistics across animals, we refer to “Reviewer 1 recommendations” point 1. For the histological analysis, we refer to “Reviewer 1 recommendations” point 2. For the eye and facial movements, we refer to “Reviewer 1 recommendations” point 5. Concerning missing statistics and methods details, we refer to various responses to “Reviewer 1 recommendations”. We thoroughly reviewed the manuscript and included all missing statistical and methodological details.

      Reviewer #2 (Public review):

      Cuevas et al. investigate the stimulus selectivity of surround-induced responses in the mouse primary visual cortex (V1). While classical experiments in non-human primates and cats have generally demonstrated that stimuli in the surround receptive field (RF) of V1 neurons only modulate activity to stimuli presented in the center RF, without eliciting responses when presented in isolation, recent studies in mouse V1 have indicated the presence of purely surround-induced responses. These have been linked to prediction error signals. In this study, the authors build on these previous findings by systematically examining the stimulus selectivity of surround-induced responses.

      Using Neuropixels recordings in V1 and the dorsal lateral geniculate nucleus (dLGN) of head-fixed, awake mice, the authors presented various stimulus types (gratings, noise, surfaces) to the center and surround, as well as to the surround only, while also varying the size of the stimuli. Their results confirm the existence of surround-induced responses in mouse V1 neurons, demonstrating that these responses do not require spatial or temporal coherence across the surround, as would be expected if they were linked to prediction error signals. Instead, they suggest that surround-induced responses primarily reflect the representation of the achromatic surface itself.

      The literature on center-surround effects in V1 is extensive and sometimes confusing, likely due to the use of different species, stimulus configurations, contrast levels, and stimulus sizes across different studies. It is plausible that surround modulation serves multiple functions depending on these parameters. Within this context, the study by Cuevas et al. makes a significant contribution by exploring the relationship between surround-induced responses in mouse V1 and stimulus statistics. The research is meticulously conducted and incorporates a wide range of experimental stimulus conditions, providing valuable new insights regarding center-surround interactions.

      However, the current manuscript presents challenges in readability for both non-experts and experts. Some conclusions are difficult to follow or not clearly justified.

      I recommend the following improvements to enhance clarity and comprehension:

      (1) Clearly state the hypotheses being tested at the beginning of the manuscript.

      (2) Always specify the species used in referenced studies to avoid confusion (esp. Introduction and Discussion).

      (3) Briefly summarize the main findings at the beginning of each section to provide context.

      (4) Clearly define important terms such as “surface stimulus” and “early vs. late stimulus period” to ensure understanding.

      (5) Provide a rationale for each result section, explaining the significance of the findings.

      (6) Offer a detailed explanation of why the results do not support the prediction error signal hypothesis but instead suggest an encoding of the achromatic surface.

      These adjustments will help make the manuscript more accessible and its conclusions more compelling.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      • We explicitly mentioned the species used in the referenced studies.

      • We provided a clearer rationale for each experiment in the Results section.

      We have also always clearly stated the species that previous studies used, both in the Introduction and Discussion section.

      Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually the opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      The paper is not particularly clear. I came out of it rather confused as to which hypotheses were still standing and which hypotheses were ruled out. There are numerous ways to make it clearer.

      We thank the reviewer for their constructive feedback and for highlighting the need for improved clarity regarding the hypotheses and their relation to the experimental findings.

      • We have strongly improved the Introduction and Discussion section, explaining the different hypotheses and their relation to the performed experiments.

      • In the Introduction, we have clearly outlined each hypothesis and its predictions, providing a structured framework for understanding the rationale behind our experimental design.

      • In the Discussion, we have been more explicit in explaining how the experimental findings inform these hypotheses.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) Given the data is hierarchical with neurons clustered within 6 mice (how many recording sessions per animal?) I would recommend the use of Linear Mixed Effects models. Simply pooling all neurons increases the risk of false alarms.

      To clarify: We used the standard method for analyzing single-unit recordings, by comparing the responses of a population of single neurons between two different conditions. This means that the responses of each single neuron were measured in the different conditions, and the statistics were therefore based on the pairwise differences computed for each neuron separately. This is a common and standard procedure in systems neuroscience, and was also used in the previous studies on this topic (Keller et al., 2020; Kirchberger et al., 2023). We were not concerned with comparing two groups of animals, for which hierarchical analyses are recommended. To address the reviewer’s concern, we did examine whether differences between baseline and the gray/drift condition, as well as the gray/drift compared to the grating condition, were consistent across sessions, which was indeed the case. These findings are presented in Supplementary Figure 6.
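
      The paired, per-neuron comparison described above, together with a simple per-session consistency check, can be sketched as follows. This is a minimal illustration on simulated data, not the authors' analysis code: the firing rates, the session labels, and the choice of the Wilcoxon signed-rank test from SciPy are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Simulated (hypothetical) firing rates in Hz for the same neurons in two conditions.
n_neurons = 200
baseline = rng.gamma(shape=2.0, scale=2.0, size=n_neurons)
gray_drift = baseline * rng.lognormal(mean=0.2, sigma=0.3, size=n_neurons)
session_id = rng.integers(0, 10, size=n_neurons)  # hypothetical session labels

# Paired test on the per-neuron differences, pooled across sessions.
diff = gray_drift - baseline
stat, p_pooled = wilcoxon(gray_drift, baseline)

# Consistency check: sign of the median paired difference within each session.
session_medians = {
    int(s): float(np.median(diff[session_id == s])) for s in np.unique(session_id)
}
n_positive = sum(m > 0 for m in session_medians.values())

print(f"pooled signed-rank p = {p_pooled:.3g}")
print(f"{n_positive}/{len(session_medians)} sessions show a positive median difference")
```

      A check of this kind does not replace a hierarchical model; it only shows whether the pooled effect is driven by a few sessions.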

      (2) Line 432: “The study utilized three to eight-month-old mice of both genders”. This is confusing, I assume they mean six mice in total, please restate. What about the LGN recordings, were these done in the same mice? Can the authors please clarify how many animals, how many total units, how many included units, how many recording sessions per animal, and whether the same units were recorded in all experiments?

      We have now clarified the information regarding the animals used in the Methods section.

      • We state that “We included female and male mice (C57BL/6), a total of six animals for V1 recordings between three and eight months old. In two of those animals, we recorded simultaneously from LGN and V1.”

      • We state that “For each animal, we recorded around 2-3 sessions from each hemisphere, and we recorded from both hemispheres.”

      • We noted that the number of neurons was not mentioned for each figure caption. We apologize for this omission. We have now added the number for all of the figures and protocols to the revised manuscript. We note that the same neurons were recorded for the different conditions within each protocol, however because a few sessions were short we recorded more units for the grating protocol. Note that we did not make statistical comparisons between protocols.

      (3) I see no histology for confirmation of placement of the electrode in the LGN, how can they be sure they were recording from the LGN? There is also little description of the LGN experiments in the methods.

      For better clarity, we have included a reconstruction of the electrode track from histological sections of one animal post-experiment (Figure S4). The LGN was targeted via stereotactical surgery, and the visual responses in this area are highly distinct. In addition, we used a flash protocol to identify the early-latency responses typical for the LGN, which is described in the Methods section: “A flash stimulus was employed to confirm the locations of LGN at the beginning of the recording sessions, similar to our previous work in which we recorded from LGN and V1 simultaneously (Schneider et al., 2023). This stimulus consisted of a 100 ms white screen and a 2 s gray screen as the inter-stimulus interval, designed to identify visually responsive areas. The responses of multi-unit activity (MUA) to the flash stimulus were extracted and a CSD analysis was then performed on the MUA, sampling every two channels. The resulting CSD profiles were plotted to identify channels corresponding to the LGN. During LGN recordings, simultaneous recordings were made from V1, revealing visually responsive areas interspersed with non-responsive channels.”
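
      For readers unfamiliar with CSD profiles, the step "a CSD analysis was then performed on the MUA, sampling every two channels" can be illustrated with the standard second-spatial-difference estimator. This is a sketch under assumptions: the function name, the channel spacing, and the omission of the tissue conductivity factor are ours, and the response does not state which CSD estimator was actually used.

```python
import numpy as np

def csd_second_difference(signal, spacing_um, step=2):
    """Estimate a CSD profile from a (n_channels, n_time) array.

    Channels are subsampled every `step` contacts, and the negative second
    spatial difference is taken along depth. The conductivity factor is
    omitted, so the result is in arbitrary units.
    """
    sub = signal[::step]              # keep every `step`-th channel
    h = spacing_um * step             # effective spacing after subsampling
    return -(sub[2:] - 2.0 * sub[1:-1] + sub[:-2]) / h**2

# Toy check: a potential that is quadratic in depth has a constant second
# derivative, so the estimated CSD is constant across channels and time.
depth_um = np.arange(32) * 20.0
toy_signal = (depth_um[:, None] ** 2) * np.ones((1, 5))
toy_csd = csd_second_difference(toy_signal, spacing_um=20.0, step=2)
print(toy_csd.shape)
```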

      (4) Many statements are not backed up by statistics, for example, each time the authors report that the response at 90° is higher than baseline (Line 121 amongst other places) there is no test to support this. Also Line 140 (negative correlation), Line 145, Line 180.

      For comparison purposes, we only presented statistical analyses across conditions. However, we have now added information to the figure captions stating that all conditions show values higher than the baseline.

      (5) As far as I can see there is no analysis of eye movements or facial movements. This could be an issue, for example, if the onset of the far surround stimuli induces movements this may lead to spurious activations in V1 that would be interpreted as surround-induced responses.

      To address this point, we have included a supplementary figure analyzing facial movements across different sessions and comparing them between conditions (Supplementary Figure 5). A detailed explanation of this analysis has been added to the Methods section. Overall, we observed no significant differences in face movements between trials with gratings, trials with the gray patch, and trials with the gray screen presented during baseline. Animals exhibited similar face movements across all three conditions, supporting the conclusion that the observed neural firing rate increases for the gray-patch condition are not related to face movements.

      (6) The experiments with the rectangular patch (Figure 3) seem to give a slightly different result as the responses for large sizes (75, 90) don’t appear to be above baseline. This condition is also perceptually the least consistent with a grey surface in the RF, the grey patch doesn’t appear to occlude the surface in this condition. I think this is largely consistent with their conclusions and it could merit some discussion in the results/discussion section.

      While the effect is maybe a bit weaker, the total surround stimulated also covers a smaller area because of the large rectangular gray patch. Furthermore, the early responses are clearly elevated above baseline, and the responses up to 70 degrees are still higher than baseline. Hence we think this data point for 90 degrees does not warrant a strong interpretation.

      Minor points:

      (1) Figure 1h: What is the statistical test reported in the panel (I guess a signed rank based on later figures)? Figure 4d doesn’t appear to be significantly different but is reported as so. Perhaps the median can be indicated on the distribution?

      We explained that we used a signed rank test for Figure 1h and now included the median of the distributions in Figure 4d.

      (2) What was the reason for having the gratings only extend to half the x-axis of the screen, rather than being full-screen? This creates a percept (in humans at least) that is more consistent with the grey patch being a hole in the grating as the grey patch has the same luminance as the background outside the grating.

      We explained in the Methods section that “We presented only half of the x-axis due to the large size of our monitor, in order to avoid over-stimulation of the animals with very large grating stimuli.”. Perceptually speaking, the gray patch appears as something occluding the grating, not as a “hole”.

      (3) Line 103: “and, importantly, had less than 10° (absolute) distance to the grating stimulus’ RF center.” Re-phrase, a stimulus doesn’t have an RF center.

      We corrected this to “We included only single units into the analysis that met several criteria in terms of visual responses (see Methods) and, importantly, the RF center had less than 10° (absolute) distance to the grating stimulus’ center.”

      (4) Line 143: “We recorded single neurons LGN” - should be “single LGN neurons”.

      We corrected this to “we recorded single LGN neurons”.

      (5) Line 200: They could spell out here that the latency is consistent with the latency observed for the grey patch conditions in the previous experiments.

      (6) Line 465: This is very brief. What criteria did they use for single-unit assignation? Were all units well-isolated or were multi-units included?

      We clarified in the Methods section that “We isolated single units with Kilosort 2.5 (Steinmetz et al., 2021) and manually curated them with Phy2 (Rossant et al., 2021). We included only single units with a maximum contamination of 10 percent.”

      (7) Line 469: “The experiment was run on a Windows 10”. Typo.

      We corrected this to “The experiment was run on Windows 10”.

      (9) Line 481: “We averaged the response over all trials and positions of the screen”. What do they mean by ’positions of the screen’?

      We changed this to “We computed the response for each position separately, by averaging the response across all the trials where a square was presented at a given position.”

      (9) Line 483: “We fitted an ellipse in the center of the response”. How?

      We additionally explain how we performed the detection of the RF using an ellipse fit: “A heatmap of the response was computed. This heatmap was then smoothed, and we calculated the location of the peak response. From the heatmap we calculated the centroid of the response using the function regionprops.m, which finds unique objects, and we then selected the biggest area detected. Using the centroids provided as output, we then fitted an ellipse centered on this peak response location to the smoothed heatmap using the MATLAB function ellipse.m.”
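
      The sequence described in this quote (smooth the heatmap, find connected regions, keep the largest one, take its centroid, fit an ellipse) can be sketched in Python. Note that the authors used the MATLAB functions regionprops.m and ellipse.m; the version below swaps in `scipy.ndimage` and a moment-based ellipse, and the smoothing width, the threshold fraction, and the function name are assumptions made for illustration only.

```python
import numpy as np
from scipy import ndimage

def rf_ellipse(heatmap, sigma=1.0, frac=0.5):
    """Return (centroid, semi_axes) of the largest response region.

    centroid is (row, col); semi_axes are about 2 SD along the major and
    minor axes of the response-weighted pixel distribution.
    """
    smooth = ndimage.gaussian_filter(heatmap.astype(float), sigma)
    mask = smooth >= frac * smooth.max()          # assumed threshold
    labels, n_regions = ndimage.label(mask)       # connected components
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_regions + 1))
    region = labels == (int(np.argmax(sizes)) + 1)  # largest region only

    rows, cols = np.nonzero(region)
    weights = smooth[rows, cols]
    centroid = np.array([
        np.average(rows, weights=weights),
        np.average(cols, weights=weights),
    ])
    # Weighted covariance of pixel coordinates gives the ellipse axes.
    cov = np.cov(np.stack([rows, cols]), aweights=weights)
    eigenvalues = np.linalg.eigvalsh(cov)         # ascending order
    semi_axes = 2.0 * np.sqrt(eigenvalues[::-1])  # major axis first
    return centroid, semi_axes

# Toy heatmap: a Gaussian response blob centered at row 12, column 20.
rr, cc = np.mgrid[0:30, 0:40]
toy_map = np.exp(-((rr - 12) ** 2 + (cc - 20) ** 2) / (2 * 3.0 ** 2))
toy_centroid, toy_axes = rf_ellipse(toy_map)
print(toy_centroid, toy_axes)
```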

      (10) Line 485 “...and positioned the stimulus at the response peak previously found”. Unclear wording, do you mean the center of the ellipse fit to the MUA response averaged across channels or something else?

      (11) Line 487: “We performed a permutation test of the responses inside the RF detected vs a circle from the same area where the screen was gray for the same trials.”. The wording is a bit unclear here, can they clarify what they mean by the ’same trials’, what is being compared to what here?

      We used a permutation test to compare the neuron’s responses to black and white squares inside the RF to the condition where there was no square in the RF (i.e. the RF was covered by the gray background).
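
      A permutation test of this kind can be sketched as follows. The example is a generic label-shuffling test on the difference of means; the test statistic, the number of permutations, and the simulated trial data are assumptions, since the response does not specify them.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of means of a and b."""
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)        # shuffle condition labels
        d = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        count += abs(d) >= abs(observed)
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical single-neuron data: responses (Hz) on trials with a square
# inside the RF versus trials where the RF was covered by the gray background.
rng = np.random.default_rng(1)
square_in_rf = rng.normal(5.0, 1.0, size=100)
gray_in_rf = rng.normal(2.0, 1.0, size=100)
effect, p_value = permutation_test(square_in_rf, gray_in_rf)
print(effect, p_value)
```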

      (12) Was the pink noise background regenerated on each trial or as the same noise pattern shown on each trial?

      We explain that “We randomly presented one of two different pink noise images”

      (13) Line 552: “...used a time window of the Gaussian smoothing kernel from-.05 to .05”. Missing units.

      We explained that “we used a time window of the Gaussian smoothing kernel from -.05 s to .05 s, with a standard deviation of 0.0125 s.”
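
      The kernel parameters quoted above (support from -0.05 s to 0.05 s, standard deviation 0.0125 s) translate into a smoothing step along these lines. The 1 ms bin width and the function names are assumptions for the sketch; only the window and standard deviation come from the response.

```python
import numpy as np

def gaussian_kernel(dt=0.001, half_width=0.05, sd=0.0125):
    """Gaussian kernel on [-half_width, half_width] seconds, normalized to sum 1."""
    n = int(round(half_width / dt))
    t = np.arange(-n, n + 1) * dt
    k = np.exp(-0.5 * (t / sd) ** 2)
    return t, k / k.sum()

def smooth_rate(spike_counts, dt=0.001):
    """Convert binned spike counts to a smoothed firing rate in Hz."""
    _, k = gaussian_kernel(dt)
    return np.convolve(spike_counts / dt, k, mode="same")

# Toy spike train: one spike in the middle of a 1 s trial (1 ms bins).
counts = np.zeros(1000)
counts[500] = 1.0
rate = smooth_rate(counts)
print(rate.max(), rate.sum() * 0.001)  # peak rate and integrated spike count
```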

      (14) Line 565: “Additionally, for the occluded stimulus, we included patch sizes of 70° and larger.”. Not sure what they’re referring to here.

      We changed this to: “For the population analyses, we analyzed the conditions in which the gray patch sizes were 70 degrees and 90 degrees”.

      (15) Line 569: What is perplexity, and how does changing it affect the t-SNE embeddings?

      Note that t-SNE is only used for visualization purposes. In the revised manuscript, we have expanded our explanation regarding the use of t-SNE and the choice of perplexity values. Specifically, we have clarified that we used a perplexity value of 20 for the Gratings with circular and rectangular occluders and 100 for the black-and-white condition. These values were empirically selected to ensure that the groups in the data were clearly separable while maintaining the balance between local and global relationships in the projected space. This choice allowed us to visually distinguish the different groups while preserving the meaningful structure encoded in the dissimilarity matrices. In particular, varying the perplexity values would not alter the conclusions drawn from the visualization, as t-SNE does not affect the underlying analytical steps of our study.
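
      How perplexity enters a t-SNE projection of a dissimilarity matrix can be sketched with scikit-learn. The simulated firing-rate vectors, the group sizes, and the perplexity values tried here are assumptions for illustration; the authors report using perplexities of 20 and 100 on their own data.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Simulated population vectors: 2 stimulus groups x 30 trials x 25 neurons.
n_trials_per_group, n_neurons = 30, 25
rates = np.vstack([
    rng.normal(0.0, 1.0, (n_trials_per_group, n_neurons)),
    rng.normal(1.5, 1.0, (n_trials_per_group, n_neurons)),
])

# Trial-by-trial Euclidean dissimilarity matrix.
dissimilarity = pairwise_distances(rates, metric="euclidean")

# Perplexity sets the effective neighborhood size; it must be smaller than
# the number of trials. A precomputed metric requires init="random".
embeddings = {}
for perplexity in (10, 20):
    tsne = TSNE(n_components=2, perplexity=perplexity,
                metric="precomputed", init="random", random_state=0)
    embeddings[perplexity] = tsne.fit_transform(dissimilarity)

print({p: e.shape for p, e in embeddings.items()})
```

      Because the embedding is used only for display, comparing the plots across a few perplexity values is a quick way to confirm that the visible grouping is not an artifact of one setting.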

      (16) Line 572: “We trained a C-Support Vector Classifier based on dissimilarity matrices”. This is overly brief, please describe the construction of the dissimilarity matrices and how the training was implemented. Was this binary, multi-class? What conditions were compared exactly?

      In the revised manuscript, we have expanded our explanation regarding the construction of the dissimilarity matrices and the implementation of the C-Support Vector Classification (C-SVC) model (See Methods section).

      The dissimilarity matrices were calculated using the Euclidean distance between firing rate vectors for all pairs of trials (as shown in Figure 6a-b). These matrices were used directly as input for the classifier. It is important to note that t-SNE was not used for classification but only for visualization purposes. The classifier was binary, distinguishing between two classes (e.g., Dr vs St). We trained the model using 60% of the data for training and used 40% for testing. The C-SVC was implemented using sklearn, and the classification score corresponds to the average accuracy across 20 repetitions.
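
      The decoding procedure summarized above (Euclidean dissimilarities as classifier input, binary C-SVC, 60/40 split, accuracy averaged over 20 repetitions) can be sketched with scikit-learn. Several details are assumptions not stated in the response: the simulated data, the linear kernel, the stratified split, and the choice to represent each trial by its distances to the training trials only.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Simulated firing-rate vectors for two stimulus classes (e.g. Dr vs St).
n_per_class, n_neurons = 50, 40
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, n_neurons)),
    rng.normal(1.0, 1.0, (n_per_class, n_neurons)),
])
y = np.repeat([0, 1], n_per_class)

scores = []
for rep in range(20):                               # 20 repetitions
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.6, stratify=y, random_state=rep)
    # Each trial is described by its distances to the training trials,
    # so no test-set information leaks into the features.
    D_tr = pairwise_distances(X_tr, X_tr)
    D_te = pairwise_distances(X_te, X_tr)
    clf = SVC(C=1.0, kernel="linear").fit(D_tr, y_tr)
    scores.append(clf.score(D_te, y_te))

mean_accuracy = float(np.mean(scores))
print(f"mean accuracy over {len(scores)} splits: {mean_accuracy:.2f}")
```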

      Reviewer #2 (Recommendations for the Authors):

      The relationship between the current paper and Keller et al. is challenging to understand. It seems like the study is critiquing the previous study but rather implicitly and not directly. I would suggest either directly stating the criticism or presenting the current study as a follow-up investigation that further explores the observed effect or provides an alternative function. Additionally, defining the inverse RF versus surround-induced responses earlier than in the discussion would be beneficial. Some suggestions:

      (1) The introduction is well-written, but it would be helpful to clearly define the hypotheses regarding the function of surround-induced responses and revisit these hypotheses one by one in the results section.

      Indeed, we have generally improved the Introduction of the manuscript, and stated the hypotheses and their relationships to the Experiments more clearly.

      (2) Explicitly mention how you compare classic grating stimuli of varying sizes with gray patch stimuli. Do the patch stimuli all come with a full-field grating? For the full-field grating, you have one size parameter, while for the patch stimuli, you have two (size of the patch and size of the grating).

      We now clearly describe how we compare grating stimuli of varying sizes with gray patch stimuli.

      (3) The third paragraph in the introduction reads more like a discussion and might be better placed there.

      We have moved content from the third paragraph of the Introduction to the Discussion, where it fits more naturally.

      (4) Include 1-2 sentences explaining how you center RFs and detail the resolution of your method.

      We have added an explanation to the Methods: “To center the visual stimuli during the recording session, we averaged the multiunit activity across the responsive channels and positioned the stimulus at the center of the ellipse fit to the MUA response averaged across channels.”.

      (5) Motivate the use of achromatic stimuli. This section is generally quite hard to understand, so try to simplify it.

      We explained better in the Introduction why we performed this particular experiment.

      (6) The decoding analysis is great, but it is somewhat difficult to understand the most important results. Consider summarizing the key findings at the beginning of this section.

      We now provide a clearer motivation at the start of the Decoding section.

      Reviewer #3 (Recommendations for the Authors):

      I have a few suggestions to improve the clarity of the presentation.

      Abstract: it lists a series of observations and it ends with a conclusion (“based on these findings...”). However, it provides little explanation for how this conclusion would arise from the observations. It would be more helpful to introduce the reasoning at the top and show what is consistent with it.

      We have improved the abstract of the paper incorporating this feedback.

      To some extent, this applies to Results too. Sometimes we are shown the results of some experiment just because others have done a similar experiment. Would it be better to tell us which hypotheses it tests and whether the results are consistent with all 3 hypotheses or might rule one or more out? I came out of the paper rather confused as to which hypotheses were still standing and which hypotheses were ruled out.

      We have strongly improved our explanation of the hypotheses and the relationships to the experiments in the Introduction.

      It would be best if the Results section focused on the results of the study, without much emphasis on what previous studies did or did not measure. Here, instead, in the middle of Results we are told multiple times what Keller et al. (2020) did or did not measure, and what they did or did not find. Please focus on the questions and on the results. Where they agree or disagree with previous papers, tell us briefly that this is the case.

      We have revised the Results section in the revised manuscript, and ensured that there is much less focus on what previous studies did in the Results. Differences to previous work are now discussed in the Discussion section.

      The notation is extremely awkward. For instance “Gc” stands for two words (Gray center) but “Gr” stands for a single word (Grating). The double meaning of G is one of many sources of confusion.

      This notation needs to be revised. Here is one way to make it simpler: choose one word for each type of stimulus (e.g. Gray, White, Black, Drift, Stat, Noise) and use it without abbreviations. To indicate the configuration, combine two of those words (e.g. Gray/Drift for Gray in the center and Drift in the surround).

      We have corrected the notation in the figures and text to enhance readability and improve the reader’s understanding.

      Figure 1e and many subsequent ones: it is not clear why the firing rate is shown in a logarithmic scale. Why not show it in a linear scale? Anyway, if the logarithmic scale is preferred for some reason, then please give us ticks at numbers that we can interpret, like 0.1,1,10,100... or 0.5,1,2,4... Also, please use the same y-scale across figures so we can compare.

      To clarify: it is necessary to normalize the firing rates relative to baseline in order to pool across neurons. However, such a divisive normalization is problematic by itself: a change from 1 to 2 (a doubling) and a change from 1 to 0.5 (a halving) are equal in relative terms but are not represented symmetrically on a linear scale. Furthermore, such a division is highly sensitive to outliers. For this reason, taking the logarithm (base 10) of the ratio is an appropriate transformation. We changed the tick labels to 1, 2, 4, as the reviewer suggested.
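
      The argument for the log transform can be made concrete with a small numerical example. The firing rates below are made up for illustration; the point is only that a doubling and a halving cancel on the log scale but not on the linear ratio scale, and that ticks labeled 1, 2, 4 sit at evenly spaced log positions.

```python
import numpy as np

# Hypothetical baseline and stimulus-evoked rates for three neurons.
baseline = np.array([1.0, 1.0, 4.0])
response = np.array([2.0, 0.5, 4.0])   # doubling, halving, no change

ratio = response / baseline
log_ratio = np.log10(ratio)

# On the linear ratio scale, a doubling and a halving average above 1.
mean_ratio = ratio[:2].mean()
# On the log scale they are symmetric about 0 and average to 0 (no change).
mean_log_ratio = log_ratio[:2].mean()

# Axis ticks labeled as ratios 1, 2, 4 are placed at their log10 positions.
tick_labels = np.array([1.0, 2.0, 4.0])
tick_positions = np.log10(tick_labels)

print(mean_ratio, mean_log_ratio, tick_positions)
```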

      Figure 3: it is not clear what “size” refers to in the stimuli where there is no gray center. Is it the horizontal size of the overall stimulus? Some cartoons might help. Or just some words to explain.

      Figure 3: if my understanding of “size” above is correct, the results are remarkable: there is no effect whatsoever of replacing the center stimulus with a gray rectangle. Shouldn’t this be remarked upon?

      We have added a paragraph under figure 3 and in the Methods section explaining that the sizes represent the varying horizontal dimensions of the rectangular patch. In this protocol, the classical condition (i.e. without gray patch) was shown only as full-field gratings, which is depicted in the plot as size 0, indicating no rectangular patch was present.

      DETAILS

      The word “achromatic” appears many times in the paper and is essentially uninformative (all stimuli in this study are achromatic, including the gratings). It could be removed in most places except a few, where it is actually used to mean “uniform”. In those cases, it should be replaced by “uniform”.

      Ditto for the word “luminous”, which appears twice and has no apparent meaning. Please replace it with “uniform”.

      We have replaced the words achromatic and luminous with “uniform” stimuli to improve the clarity when we refer to only black or white stimuli.

      Page 3, line 70: “We raise some important factors to consider when describing responses to only surround stimulation.” This sentence might belong in the Discussion but not in the middle of a paragraph of Results.

      We removed this sentence.

      Neuropixel - Neuropixels (plural)

      “area LGN” - LGN

      We corrected for misspellings.

      References

      Keller, A.J., Roth, M.M., Scanziani, M., 2020. Feedback generates a second receptive field in neurons of the visual cortex. Nature 582, 545–549. doi:10.1038/s41586-020-2319-4.

      Kirchberger, L., Mukherjee, S., Self, M.W., Roelfsema, P.R., 2023. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances 9, eadd2498. doi:10.1126/sciadv.add2498.

      Rossant, C., et al., 2021. phy: Interactive analysis of large-scale electrophysiological data. https://github.com/cortex-lab/phy.

      Schneider, M., Tzanou, A., Uran, C., Vinck, M., 2023. Cell-type-specific propagation of visual flicker. Cell Reports 42.

      Steinmetz, N.A., Aydin, C., Lebedeva, A., Okun, M., Pachitariu, M., Bauza, M., Beau, M., Bhagat, J., Böhm, C., Broux, M., Chen, S., Colonell, J., Gardner, R.J., Karsh, B., Kloosterman, F., Kostadinov, D., Mora-Lopez, C., O’Callaghan, J., Park, J., Putzeys, J., Sauerbrei, B., van Daal, R.J.J., Vollan, A.Z., Wang, S., Welkenhuysen, M., Ye, Z., Dudman, J.T., Dutta, B., Hantman, A.W., Harris, K.D., Lee, A.K., Moser, E.I., O’Keefe, J., Renart, A., Svoboda, K., Häusser, M., Haesler, S., Carandini, M., Harris, T.D., 2021. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588. doi:10.1126/science.abf4588.

    2. eLife Assessment

      This valuable study investigates the selectivity of neuronal responses in the neocortex and thalamus to visual stimuli presented far outside their receptive fields. The study shows convincing evidence for a long-latency surround-induced response in primary visual cortex that is absent in the dorsal lateral geniculate nucleus and does not depend strongly on the visual characteristics of the surround stimulus. The paper should be of interest to neurophysiologists interested in vision and contextual modulations.

    3. Reviewer #1 (Public review):

      Summary:

      The authors report a study on how stimulation of receptive-field surround of V1 and LGN neurons affects their firing-rates. Specifically, they examine stimuli in which a grey patch covers the classical RF of the cell and a stimulus appears in the surround. Using a number of different stimulus paradigms they find a long latency response in V1 (but not the LGN) which does not depend strongly on the characteristics of the surround grating (drifting vs static, continuous vs discontinuous, predictable grating vs unpredictable pink noise). They find that population responses to simple achromatic stimuli have a different structure that does not distinguish so clearly between the grey patch and other conditions and the latency of the response was similar regardless of whether the center or surround was stimulated by the achromatic surface. Taken together they propose that the surround-response is related to the representation of the grey surface itself. They relate their findings to previous studies which have put forward the concept of an 'inverse RF' based on strong responses to small grey patches on a full-screen grating. They also discuss their results in the context of studies that suggest that surround responses are related to predictions of the RF content or figure-ground segregation.

      Strengths:

      I find the study to be an interesting extension of the work on surround stimulation and the addition of the LGN data is useful showing that the surround-induced responses are not present in the feed-forward path. The conclusions appear solid, being based on large numbers of neurons obtained through Neuropixels recordings. The use of many different stimulus combinations provides a rich view of the nature of the surround-induced responses.

      Weaknesses:

      The LGN data comes from a small number of animals (n=2). Statistics are generally pooled across all recording sessions/animals without taking into account the higher covariance of neurons recorded in the same session. This is not a problem for paired comparisons, but for some statistics in the paper a hierarchical approach would have been more appropriate. The authors do present individual session data and the effects appear to be consistent across sessions.

    4. Reviewer #3 (Public review):

      Summary:

      This paper explores the phenomenon whereby some V1 neurons can respond to stimuli presented far outside their receptive field. It introduces three possible explanations for this phenomenon and it presents experiments that it argues favor the third explanation, which is based on figure/ground segregation.

      Strengths:

      I found it useful to see that there are three possible interpretations of this finding (prediction error, interpolation, and figure/ground). I also found it useful to see a comparison with LGN responses and to see that the effect there is not only absent but actually opposite: stimuli presented far outside the receptive field suppress rather than drive the neurons. Other experiments presented here may also be of interest to the field.

      Weaknesses:

      Though the paper has markedly improved, and now has a clearer statement of the hypotheses, it could be streamlined further, to tighten the relation between hypotheses and analyses, and to draw conclusions from those analyses in terms of the hypotheses.

    1. eLife Assessment

      This important study uses long-term behavioural observations to understand the factors that influence female-on-female aggression in gorilla social groups. The evidence supporting the claims is convincing, as it includes novel methods of assessing aggression and considers other potential factors. The work will be of interest to a broad range of biologists working on the social interactions of animals.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 
The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.
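
      For readers unfamiliar with Elo ratings, the basic update rule behind such dominance hierarchies, and the rank difference between aggressor and recipient, can be sketched as follows. This is the textbook Elo scheme, not necessarily the variant used in the study; the starting rating, the constant k, the individual names, and the interaction sequence are all hypothetical.

```python
def expected_win(r_a, r_b):
    """Probability that A wins against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(ratings, winner, loser, k=100.0):
    """Shift ratings after one decided interaction (zero-sum update)."""
    gain = k * (1.0 - expected_win(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain

# Hypothetical females, all starting from the same rating.
ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0}
for winner, loser in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]:
    update(ratings, winner, loser)

# Direction of a new aggressive act: aggressor rating minus recipient rating.
# A negative value means aggression directed up the hierarchy.
aggressor, recipient = "C", "A"
direction = ratings[aggressor] - ratings[recipient]
print(ratings, direction)
```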

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are:

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanatory variable in their model, even though they stated that overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of the numbers of males and/or females here and de-emphasizing the effect of group size in the Introduction.

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      In addition, all groups were studied long-term, and group composition seems to fluctuate, based on Table 1 in Reference 11. When an individual enters or leaves a group with a stable hierarchy, it takes time before the hierarchy becomes stable again. If the avoidance/displacement behaviors used to infer rank relationships were not common, this would take a few days or maybe longer. Also, were aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with this and take it into consideration in the analysis?

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

    4. Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation of why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to improve our understanding of the factors that influence female-on-female aggressive interactions in gorilla social hierarchies, using 25 years of behavioural data from five wild groups of two gorilla species. Researchers analysed aggressive interactions between 31 adult females, using behavioural observations and dominance hierarchies inferred through Elo-rating methods. Aggression intensity (mild, moderate, severe) and direction (measured as the rank difference between aggressor and recipient) were used as key variables. A linear mixed-effects model was applied to evaluate how aggression direction varied with reproductive state (cycling, trimester-specific pregnancy, or lactation) and sex composition of the group. This study highlights the direction of aggressive interactions between females, with most interactions being directed from higher- to lower-ranking adult females close in social rank. However, the results show that 42% of these interactions are directed from lower- to higher-ranking females. Particularly, lactating and pregnant females targeted higher-ranking individuals, which the authors suggest might be due to higher energetic needs, which increase risk-taking in lactating and pregnant females. Sex composition within the group also influenced which individuals were targeted. The authors suggest that male presence buffers female-on-female aggression, allowing females to target higher-ranking females than themselves. In contrast, females targeted lower-ranking females than themselves in groups with a larger ratio of females, which supposes a lower risk for the females since the pool of competitors is larger. 

      The findings provide an important insight into aggression heuristics in primate social systems and the social and individual factors that influence these interactions, providing a deeper understanding of the evolutionary pressures that shape risk-taking, dominance maintenance, and the flexibility of social strategies in group-living species.

      The authors achieved their aim by demonstrating that aggression direction in female gorillas is influenced by factors such as reproductive condition and social context, and their results support the broader claim that aggression heuristics are flexible. However, some specific interpretations require further support. Despite this, the study makes a valuable contribution to the field of behavioural ecology by reframing how we think about intra-sexual competition and social rank maintenance in primates.

      Strengths:

      One of the study's major strengths is the use of an extensive dataset that compiles 25 years of behavioural data and 6871 aggressive interactions between 31 adult females in five social groups, which allows for a robust statistical analysis. This study uses a novel approach to the study of aggression in social groups by including factors such as the direction and intensity of aggressive interactions, which offers a comprehensive understanding of these complex social dynamics. In addition, this study incorporates ecological and physiological factors such as the reproductive state of the females and the sex composition of the group, which allows an integrative perspective on aggression within the broader context of body condition and social environment. The authors successfully integrate their results into broader evolutionary and ecological frameworks, enriching discussions around social hierarchies and risk sensitivity in primates and other animals.

      Thank you for the positive assessment of our work and the nice summary of the manuscript!

      Weaknesses:

      Although the paper has a novel approach by studying the effect of reproductive state and social environment on female-female aggression, the use of observational data without experimental manipulation limits the ability to establish causation. The authors suggest that the difference observed in female aggression direction between groups with different sex composition might be indicative of male presence buffering aggression, which seems speculative, as no direct evidence of male intervention or support was reported. Similarly, the use of reproductive state as a proxy for energetic need is an indirect measure and does not account for actual energy expenditure or caloric intake, which weakens the authors' claims that female energetic need induces risk-taking. Overall, this paper would benefit from stronger justification and empirical support to strengthen the conclusions of the study about the mechanisms driving female aggression in gorillas.

      We agree that experimental manipulation would allow us to extend our work. Unfortunately, this is not possible with wild, endangered gorillas.

      We have now added more references (Watts 1994; Watts 1997) and enriched our arguments regarding male presence buffering aggression. Previous research suggests that male gorillas may support lower-ranking females and may intervene in female-female conflicts (Sicotte 2002). Unfortunately, our dataset did not allow us to test for male protection. We conduct proximity scans every 10 minutes, and these scans are not associated with individual interactions, meaning that we cannot reliably test whether proximity to a male influences the likelihood of receiving aggression.

      We have now clearly stated that reproductive state is an indirect proxy for energetic needs. We agree with your point about energy intake and expenditure, but unfortunately, we do not have data on energy expenditure or caloric intake to allow us to delve into more fine-grained analyses.

      Overall, we have tried to enrich the justification and empirical support to strengthen our conclusions by clarifying the text and adding more examples and references.

      Reviewer #2 (Public review):

      Summary:

      The authors' aim in this study is to assess the factors that can shift competitive incentives against higher- or lower-ranking groupmates in two gorilla species.

      Strengths:

      This is a relevant topic, where important insights could be gained. The authors brought together a substantial dataset: a long-term behavioral dataset representing two gorilla species from five social groups.

      Weaknesses:

      The authors have not fully shown the data used in the model and explored the potential of the model. Therefore, I remain cautious about the current results and conclusions.

      Some specific suggestions that require attention are:

      (1) The authors described how group size can affect aggression patterns in some species (line 54), using a whole paragraph, but did not include it as an explanatory variable in their model, even though they stated that overall group size can "conflate opposing effects of females and males" (line 85). I suggest underlining the effects of the numbers of males and/or females here and de-emphasizing the effect of group size in the Introduction.

      We did not use group size as a main predictor, as has been commonly done in other species, because of potentially conflating opposing effects of males and females. To further stress this point, we have specifically added in the introduction: “group size, the overall number of individuals in the group, might not be a good predictor of aggression heuristics, as it can conflate the effects of different kinds of individuals on aggression (see Smit & Robbins 2024 for an example of opposing effects of the number of females and number of males on female gorilla aggression).”

      We also “ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, [and] its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      (2) There should be more details given about how the authors calculated individual Elo-ratings (line 98). It seems that authors pooled all avoidance/displacement behaviors throughout the study period. But how often was the Elo-rating they included in the model calculated? By the day or by the month? I guess it was by the day, as they "estimate female reproductive state daily" (line 123). If so, it should be made clear in the text.

      We rephrased accordingly: “We used all avoidance and displacement interactions throughout the study period and we used the function elo.seq from R package EloRating to infer daily individual female Elo-scores”. We also clarified that “This method takes into account the temporal sequence of interactions and updates an individual’s Elo-scores each day the individual interacted with another...”
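
For readers unfamiliar with sequential Elo-rating, the following is a minimal Python sketch of the general idea described in the response above: ratings are updated one interaction at a time in temporal order, and a snapshot is stored for each day on which an individual interacted. It is not the `elo.seq` implementation from EloRating. The logistic expectation, the `k` and `start` values, the function name `sequential_elo`, and the toy individuals and dates are all assumptions made for illustration.

```python
from collections import defaultdict


def sequential_elo(interactions, k=100.0, start=1000.0):
    """Update Elo-scores interaction by interaction.

    interactions: iterable of (date, winner, loser), already sorted by date.
    Returns {date: {individual: score}} with one snapshot per date, taken
    after the last interaction recorded on that date.
    """
    ratings = defaultdict(lambda: start)  # unseen individuals begin at `start`
    daily = {}
    for date, winner, loser in interactions:
        # Expected probability that `winner` wins, given current scores
        # (textbook logistic form; an assumption, not EloRating's default).
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)  # unexpected wins move scores more
        ratings[winner] += delta
        ratings[loser] -= delta
        daily[date] = dict(ratings)  # overwrite so the last update of the day wins
    return daily


# Hypothetical toy data: "winner" is the female that displaces or is avoided.
toy = [
    ("1998-10-01", "A", "B"),
    ("1998-10-02", "B", "A"),
]
history = sequential_elo(toy)
for day, scores in history.items():
    print(day, {name: round(score, 1) for name, score in scores.items()})
```

Because each update adds to one score exactly what it subtracts from the other, the sum of scores within a dyad is conserved, and a win against a higher-rated opponent shifts both scores more than a win against a lower-rated one.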

      In addition, all groups were studied long-term, and group composition seems to fluctuate, based on Table 1 in Reference 11. When an individual enters or leaves a group with a stable hierarchy, it takes time before the hierarchy becomes stable again. If the avoidance/displacement behaviors used to infer rank relationships were not common, this would take a few days or maybe longer. Also, were aggressive behaviors more common during rank fluctuations? In other words, if avoidance/displacement behaviors and aggressive behaviors occur simultaneously during rank fluctuations, how did the authors deal with this and take it into consideration in the analysis?

      We have shown in Reference 25 (Smit & Robbins 2025) after Reference 11 (Smit & Robbins 2024) that females form highly stable hierarchies, and that dyadic dominance relationships are not influenced by dispersal or death of third individuals. Notably, new immigrant females usually start at, and remain at, a low rank, without large fluctuations in rank. Therefore, any fluctuation periods have limited influence on the aggressive interactions in our study system.

      The authors emphasized several times in the text that gorillas "form highly stable hierarchical relationships". Also, in Reference 25, they found very high stabilities of each group's hierarchy. However, the number of females involved in that analysis was different from that used here. They need to provide more basic info on each group's dominance hierarchy and verify their statement. I strongly suggest that the authors display Elo-rating trajectories and necessary relevant statistics for each group throughout the study period as part of the supplementary materials.

      In fact, the females involved in the present analysis and the analysis of Smit & Robbins 2025 are the same. Our present analysis is based on the hierarchies of Smit & Robbins 2025. Note that female gorillas disperse and occasionally immigrate to another study group. This is why some females may appear in the hierarchies of more than one group, giving the impression that there are more females involved in the analysis of Smit & Robbins 2025 (e.g. by counting the lines in the Elo-rating plots). We now specifically state that “We present these interactions and hierarchies in detail in Smit & Robbins 2025”, to clarify that the hierarchies are the same.

      (3) The authors stated why they differentiated the different stages based on female reproductive status. They also referred to the differences in energetic needs between stages of pregnancy and lactation (lines 127-128). However, in the mixed model, they only compared the interaction score between the female cycling stage and other stages. The model was not well explained, and the results could be expanded. I suggest conducting more pairwise comparisons in the model and presenting the statistics in the text, if there are significant results. If all three pregnancy stages differed significantly from cycling and lactating stages but not from each other, they may be merged as one pregnancy stage. More in-depth analysis would help provide better answers to the research questions.

      Thank you for pointing this out. First, when we considered a single pregnancy stage, pregnant females indeed showed a significantly greater interaction score than females in other reproductive stages. We have now included this in the manuscript. However, we still find it relevant to test for the different stages of pregnancy, given the differences in energetic needs across these stages. We have now included the pairwise comparisons in a new table (Table 2).

      Reviewer #3 (Public review):

      Smit and Robbins' manuscript investigates the dynamics of aggression among female groupmates across five gorilla groups. The authors utilize longitudinal data to examine how reproductive state, group size, presence of males, and resource availability influence patterns of aggression and overall dominance rankings as measured by Elo scores. The findings underscore the important role of group composition and reproductive status, particularly pregnancy, in shaping dominance relationships in wild gorillas. While the study addresses a compelling and understudied topic, I have several comments and suggestions that may enhance clarity and improve the reader's experience.

      (1) Clarification of longitudinal data - The manuscript states that 25 years of behavioral data were used, but this number appears unclear. Based on my calculations, the maximum duration of behavioral observation for any one group appears to be 18 years. Specifically:

      • ATA: 6 years

      • BIT: 8 years

      • KYA: 18 years

      • MUK: 6 years

      • ORU: 8 years

      I recommend that the authors clarify how the 25-year duration was derived.

      Indeed, none of the five study “groups” has been studied for 25 consecutive years. However, MUK emerged from a fission of group KYA in early 2016. So, from the start of group KYA in October 1998 to the end of group MUK in December 2023, there are 25 years and 2 months. We have now rephrased to “...starting in 1998 in one of the mountain gorilla groups” in the introduction, and to “We use a long-term behavioural dataset on five wild groups of the two gorilla species, starting in 1998” in the abstract.
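
The 25-year figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: the response gives months but not days, so the first of each month is used as a placeholder, and the helper name `span_years_months` is hypothetical.

```python
from datetime import date


def span_years_months(start, end):
    """Whole calendar months between two dates, split into (years, months)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return divmod(months, 12)


# Day-of-month values are placeholders; only year and month come from the text.
kya_start = date(1998, 10, 1)  # start of group KYA
muk_end = date(2023, 12, 1)    # end of group MUK, which split from KYA in 2016

years, months = span_years_months(kya_start, muk_end)
print(f"{years} years, {months} months")  # prints "25 years, 2 months"
```

The span only reaches 25 years because KYA and MUK are treated as one continuous lineage; taken separately, neither group approaches that duration, which is the point the reviewer raised.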

      (2) Consideration of group size - The authors mention that group size was excluded from analyses to avoid conflating the opposing effects of female and male group members. While this is understandable, it may still be beneficial to explore group size effects in supplementary analyses. I suggest reporting statistics related to group size and potentially including a supplementary figure. Additionally, given that the study includes both mountain and western gorillas, it would be helpful to examine whether any interspecies differences are apparent.

      We have now added the suggested extra test: “When we ran our analysis testing for group size (number of weaned individuals in the group), instead of the numbers of females and males, its influence on interaction score was not significant (estimate=-0.001, p-value=0.682).”

      Regarding species differences: In our analysis, we test for species (mountain vs western) and we find no significant differences between the two. This is stated in the results.

      (3) Behavioral measures clarification - Lines 112-116 describe the types of aggressive behaviors observed. It would be helpful to clarify how these behaviors differ from those used to calculate Elo scores, or whether they overlap. A brief explanation would improve transparency regarding the methodology.

      We have now added short explanations in brackets for behaviours that are not obvious. We have also added a sentence in the text to clarify the difference with the behaviours used to calculate Elo scores: “These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”.

      (4) Aggression rates versus Elo scores - The manuscript uses aggression rates rather than dominance rank (as measured by Elo scores) as the main outcome variable, but there is no explanation of why. How would the results differ if aggression rates were replaced or supplemented with Elo scores? The current justification for prioritizing aggression rates over dominance rank needs to be more clearly supported.

      The sentence we added above (“These two behaviours [avoidance and displacement] are ritualized, occurring in absence of aggression, they are considered a more reliable proxy of power relationships over aggression, and they are typically used to infer gorilla hierarchical relationships”) and the first paragraph of the results hopefully clarify that ritualized agonistic interactions are generally directionally consistent and more reliably capture the highly stable dominance relationships of female gorillas. This approach has been used to calculate dominance rank in gorillas in all studies that have considered it, dating back to the 1970s (namely in studies by Harcourt and Watts). On the other hand, aggression can be context dependent (we now clearly note that at the beginning of the Methods paragraph on aggressive interactions). Therefore, we use Elo-scores inferred from ritualized interactions as a base and a reliable proxy of power relationships; then we test whether the direction of aggression within these relationships is also driven by energetic needs or the social environment.
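
The two-step logic in this response (ranks from ritualized interactions first, then the direction of each aggressive interaction relative to those ranks) can be sketched in Python as follows. The sign convention (recipient rank minus aggressor rank, so that positive scores mean aggression directed up the hierarchy), the 0 to 1 rank scale, the function names, and the toy data are all assumptions for illustration and are not taken from the manuscript.

```python
def interaction_score(aggressor_rank, recipient_rank):
    """Rank difference for one aggressive interaction.

    Assumed convention: positive when the recipient outranks the aggressor.
    """
    return recipient_rank - aggressor_rank


def share_directed_upward(events, ranks):
    """Fraction of aggressive events aimed at a higher-ranking female."""
    scores = [interaction_score(ranks[a], ranks[r]) for a, r in events]
    upward = sum(1 for s in scores if s > 0)
    return upward / len(scores)


# Hypothetical standardized ranks (1.0 = top of the hierarchy), as would be
# derived from Elo-scores based on avoidance and displacement.
ranks = {"A": 1.0, "B": 0.5, "C": 0.0}

# Hypothetical aggressive events as (aggressor, recipient) pairs.
events = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "A"), ("A", "C")]

print(share_directed_upward(events, ranks))
```

Keeping the rank inference separate from the aggression events is what allows a share such as the 42% of interactions reported as directed up the hierarchy to be meaningful: the ranks are not themselves derived from the aggression being scored.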

    1. eLife Assessment

      This important work by Malita et al. describes a mechanism by which an intestinal infection causes an increase in daytime sleep through signaling from the gut to the blood-brain barrier. Their findings suggest that cytokines upd3 and upd2 produced by the intestine following infection act on glia of the blood-brain barrier to regulate sleep by modulating Allatostatin A signaling. The evidence is compelling and elegantly performed using the ample Drosophila genetic toolbox, making this work appealing for a broad group of neuroscience researchers interested in sleep and gut-brain interactions.

    2. Joint Public Review:

      Summary:

      Malita and colleagues investigated the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting the Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation. The evidence supporting most of their claims is compelling. Nevertheless, the activation of the sleep-promoting pathway by infection should be accomplished through bacterial infection of the gut.

      Strengths:

      The work is, in general, supported by well-designed and well-performed experiments, especially those that show that the endocrine cells from the gut are the sources of the Upd cytokines, the effects of these cytokines on daytime sleep, and that the glial cells of the BBB are the target cell for the Upds action. In addition, the evidence associating the downregulation of Alst receptors in the BBB by Upd and Jak/Stat pathways is compelling.

      Weaknesses:

      (1) The model of gut inflammation that is used is based on the increase in reactive oxygen species (ROS) that is caused by adding 1% H2O2 to the food. The use of the model is supported rather weakly by two papers (refs. 26 and 27). The paper by Jiang et al. (26) shows that the infection by Pseudomonas entomophila induces cytokine responses Upd2 and 3, which are also induced by the Jnk pathway; there is no mention of ROS. Buchon et al. (27) is a review that refers to results that indicate that as part of the immune response to pathogens in the gut, there is production of ROS by the NADPH oxidase DUOX. Thus, there is no strong support for the use of this model.

      (2) There is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. It is known that ROS causes damage in the gut epithelium, which in turn induces the expression of the cytokines studied, which might be independent of infection and confound the results.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokine receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would solidify these findings, making them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      We are pleased that the reviewers recognized the strength and significance of our findings describing a gut-to-brain cytokine signaling mechanism involving the blood-brain barrier (BBB) and its role in regulating sleep, and we thank them for their comments.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      We thank the reviewer for raising this point. Although several midgut cell types (including the absorptive enterocytes) may indeed produce Unpaired (Upd) cytokines, our study specifically focused on enteroendocrine cells (EECs), which are well-characterized as secretory endocrine cells capable of exerting systemic effects. As detailed in our response to Results point #2 (please see below), we show that EEC-specific manipulation of Upd signaling is both necessary and sufficient to regulate sleep in response to intestinal oxidative stress. These findings support the role of EECs as a primary source of gut-derived cytokine signaling to the brain. To acknowledge the possible involvement of other sources, we have also added a statement to the Discussion in the revised manuscript noting that other, non-endocrine gut cell types may contribute to systemic Unpaired signaling that modulates sleep.

      (2) Some of the main tools used in this manuscript to manipulate the gut while not influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80), are not directly shown to not affect gene expression in the brain. This is critical for a manuscript delving into intra-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      We agree with the reviewer that this is an important point. To address it, we performed additional validation experiments to assess whether the voilà-GAL4 driver in combination with R57C10-GAL80 (EEC>) influences upd2 or upd3 expression in the brain. Our results show that manipulation using EEC> alters upd2 and upd3 expression in the gut (Fig. 1a,b), with new data showing that this does not affect their expression levels in neuronal tissues (Fig. S1a), supporting the specificity of our approach. These new data are now included in the revised manuscript and described in the Results section. This additional validation strengthens our conclusion that the observed sleep phenotypes result from gut-specific cytokine signaling, rather than from effects on Unpaired cytokines produced in the brain.

      (3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The use of this model is supported by the authors rather weakly in two papers (refs. 26 and 27): The paper by Jiang et al. (ref. 26) shows that the infection by Pseudomonas entomophila induces cytokine responses upd2 and 3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      We thank the reviewer for raising this point. We agree that the references originally cited did not sufficiently justify the use of H<sub>2</sub>O<sub>2</sub> feeding as a model of gut inflammation. To address this, we have revised the Results section to clarify that we use H<sub>2</sub>O<sub>2</sub> feeding as a controlled method to elevate intestinal ROS levels, rather than as a general model of inflammation. This approach allows us to investigate the specific effects of ROS-induced cytokine signaling in the gut. We have also added additional citations to support the physiological relevance of this model. For instance, Tamamouna et al. (2021) demonstrated that H<sub>2</sub>O<sub>2</sub> feeding induces intestinal stem-cell proliferation – a response also observed during bacterial infection – and Jiang et al. (2009) showed that enteric infections increase upd2 and upd3 expression, which we similarly observe following H<sub>2</sub>O<sub>2</sub> feeding (Fig. 3a). These findings support the use of H<sub>2</sub>O<sub>2</sub> as a tool to mimic specific ROS-linked responses in the gut. We believe this targeted and tractable model is a strength of our study, enabling us to dissect how intestinal ROS modulates systemic physiology through cytokine signaling.

      Additionally, we have included a statement in the Discussion acknowledging that ROS generated during infection may activate signaling mechanisms distinct from those triggered by chemically induced oxidative stress, and that exploring these differences in future studies may yield important insights into gut–brain communication. These revisions provide a stronger justification for our model while more accurately conveying both its relevance and its limitations.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023). Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 2023 14(1):5908. doi: 10.1038/s41467-023-41540-y report that the feeding of adult flies with H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls that show that the feeding of H2O2 does not cause neuronal damage.

      We thank the reviewer for this thoughtful follow-up point. We would like to clarify that we do not claim that the effects observed in our study directly reflect the full response to enteric infection. As outlined in our revised response to comment 3, we have updated the manuscript to more precisely describe the H<sub>2</sub>O<sub>2</sub>-feeding paradigm as a model that induces local intestinal ROS responses comparable to, but not equivalent to, those observed during pathogenic challenges. This revised framing highlights both the potential similarities and differences between chemically induced oxidative stress and infection-induced responses. Indeed, in the revised Discussion, we now explicitly acknowledge that ROS generated during infection may engage distinct signaling mechanisms compared to exogenous H<sub>2</sub>O<sub>2</sub> and emphasize the value of future studies in delineating these pathways. We are currently pursuing this direction in an independent ongoing study investigating the effects of enteric infections. However, for the present work, we chose to focus on the effects of ROS-induced responses in isolation, as this provides a clean and well-controlled context to dissect the specific contribution of oxidative stress to cytokine signaling and sleep regulation.

      To further address the reviewer’s concern, we have also included new data (a TUNEL stain for apoptotic DNA fragmentation) in the revised manuscript showing that H<sub>2</sub>O<sub>2</sub> feeding does not damage neuronal tissues under our experimental conditions (Fig. S3f,g). This addresses the point raised regarding the potential neurotoxicity of H<sub>2</sub>O<sub>2</sub>, as described by Majcin Dorcikova et al. (2023), and supports the specificity of the sleep phenotypes observed in our study. We believe these revisions and clarifications strengthen the manuscript and make our interpretation more precise.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

      Our work highlights a distinct role for gut-derived AstA in sleep regulation compared to findings by Lin et al. (Cell Discovery, 2023)[1], who showed that gut AstA mediates energy wasting during sleep deprivation. Their study focused on the metabolic consequences of sleep loss, proposing that sleep deprivation increases ROS in the gut, which then promotes the release of the glucagon-like hormone adipokinetic hormone (AKH) through gut AstA signaling, thereby triggering energy expenditure.

      In contrast, our study addresses the inverse question – how ROS in the gut influences sleep. In our model, intestinal ROS promotes sleep, raising the intriguing possibility – cleverly pointed out by the reviewers – that ROS generated during sleep deprivation might promote sleep by inducing Unpaired cytokine signaling in the gut. According to our findings, this suppresses wake-promoting AstA signaling in the BBB, providing a mechanism to promote sleep as a restorative response to gut-derived oxidative stress and potentially limiting further ROS accumulation. Importantly, our findings support a wake-promoting role for EEC-derived AstA, demonstrated by several lines of evidence. First, EEC-specific knockdown of AstA increases sleep. Second, activation of AstA<sup>+</sup> EECs using the heat-sensitive cation channel Transient Receptor Potential A1 (TrpA1) reduces sleep, and this effect is abolished by simultaneous knockdown of AstA, indicating that the sleep-suppressing effect is mediated by AstA and not by other peptides or secreted factors released by these cells. Third, downregulation of AstA receptor expression in BBB glial cells increases sleep, further supporting the existence of a functional gut AstA–glia arousal pathway. We have now included new data in the revised manuscript showing that AstA release from EECs is downregulated during intestinal oxidative stress (Fig. 7k,l,m). This suggests that this wake-promoting signal is suppressed both at its source (the gut endocrine cells), by unknown means, and at its target, the BBB, via Unpaired cytokine signaling that downregulates AstA receptor expression. This coordinated downregulation may serve to efficiently silence this arousal-promoting pathway and facilitate sleep during intestinal stress. These new data, along with an expanded discussion, provide further mechanistic insight into gut-derived AstA signaling and strengthen our proposed model.

      This contrasts with the interpretation by Lin et al., who observed increased AstA peptide levels in EECs after antioxidant treatment and interpreted this as peptide retention. However, peptide accumulation may result from either increased production or decreased release, and peptide levels alone are insufficient to distinguish between these possibilities. To resolve this, we examined AstA transcript levels, which can serve as a proxy for production. Following oxidative stress (24 h of 1% H<sub>2</sub>O<sub>2</sub> feeding and the following day), when animals show increased sleep (Fig. 7e), we observed a decrease in AstA transcript levels followed by an increase in peptide levels (Fig. 7k,l,m), suggesting that oxidative stress leads to reduced gut AstA production and release. Furthermore, we recently found that a class of EECs that produce the hormone Tachykinin (Tk) and are distinct from the AstA<sup>+</sup> EECs express the ROS-sensitive cation channel TrpA1 (Ahrentløv et al., 2025, Nature Metabolism)[2]. In these Tk<sup>+</sup> EECs, TrpA1 mediates ROS-induced Tk hormone release. In contrast, single-cell RNA-seq data[3] do not support TrpA1 expression in AstA<sup>+</sup> EECs, consistent with our findings that ROS does not promote AstA release – an effect that would be expected if TrpA1 were functionally expressed in AstA<sup>+</sup> EECs. This contradicts the findings of Lin et al., who reported TrpA1 expression in AstA<sup>+</sup> EECs. We have now included relevant single-cell data in the revised manuscript (Fig. S6f) showing that TrpA1 is specifically expressed in Tk<sup>+</sup> EECs, but not in AstA<sup>+</sup> EECs, and we have expanded the discussion to address discrepancies in TrpA1 expression and AstA regulation.

      Taken together, our results reveal a dual-site regulatory mechanism in which Unpaired cytokines released from the gut act at the BBB to downregulate AstA receptor expression, while AstA release from EECs is simultaneously suppressed. We thank the reviewers for raising this important point. We have also included a discussion of the other point raised by the reviewers – the possibility that ROS generated during sleep deprivation may engage the same signaling pathways described here, providing a mechanistic link between sleep deprivation, intestinal stress, and sleep regulation.

      Recommendations for the authors:

      A- Material and Methods:

      (1) Feeding Assay: The cited publication (doi.org:10.1371/journal.pone.0006063) states: "For the amount of label in the fly to reflect feeding, measurements must therefore be confined to the time period before label egestion commences, about 40 minutes in Drosophila, a time period during which disturbance of the flies affects their feeding behavior. There is thus a requirement for a method of measuring feeding in undisturbed conditions." Was blue fecal matter already present on the tube when flies were homogenized at 1 hour? If so, the assay may reflect gut capacity rather than food passage (as a proxy for food intake). In addition, was the variability of food intake among flies in the same tube tested (to make sure that 1-2 flies are a good proxy for the whole population)?

      We agree that this is an important point for feeding experiments. We are aware of the methodological considerations highlighted in the cited study and have extensive experience using a range of feeding assays in Drosophila, including both short- and long-term consumption assays (e.g., dye-based and CAFE assays), as well as automated platforms such as FLIC and FlyPAD (Nature Communications, 2022; Nature Metabolism, 2022; and Nature Metabolism, 2025)[2,4,5].

      For the dye-based assay, we carefully selected a 1-hour feeding window based on prior optimization. Since animals were not starved prior to the assay, shorter time points (e.g., 30 minutes) typically result in insufficient ingestion for reliable quantification. A 1-hour period provides a robust readout while remaining within the timeframe before significant label excretion occurs under our experimental conditions. To support the robustness of our findings, we complemented the dye-based assay with data from FLIC, which enables automated, high-resolution monitoring of feeding behavior in undisturbed animals over extended periods. The FLIC results were consistent with the dye-based data, strengthening our confidence in the conclusions. To minimize variability and ensure consistency across experiments, all feeding assays were performed at the same circadian time – Zeitgeber Time 0 (ZT0), corresponding to 10:00 AM when lights are turned on in our incubators. This time point coincides with the animals' natural morning feeding peak, allowing for reproducible comparisons across conditions. Regarding variability among flies within tubes, each biological replicate in the dye assay consisted of 1–2 flies, and results were averaged across multiple replicates. We observed good consistency across samples, suggesting that these small groups reliably reflect group-level feeding behavior under our conditions.

      (2) Biological replicates: whereas the number of samples is clearly reported in each figure, the number of biological replicates is not indicated. Please include this information either in Material and methods or in the relevant figure legends. Please also include a description of what was considered a biological replicate.

      We have now clarified in the Materials and Methods section under Statistics that all replicates represent independent biological samples, as suggested by the reviewers.

      (3) Control Lines: please indicate which control lines were used instead of citing another publication. If preferred, this information could be supplied as a supplementary table.

      We now provide a clear description of the control lines used in the Materials and Methods section. Specifically, all GAL4 and GAL80 lines used in this study were backcrossed for several generations into a shared w<sup>1118</sup> background and then crossed to the same w<sup>1118</sup> strain used as the genetic background for the UAS-RNAi, CRISPR, or overexpression lines. This approach ensures, to a strong approximation, that the only difference between control and experimental animals is the presence or absence of the UAS transgene.

      (4) Statistical analyses: for some results (e.g., those shown in Figure 3d), it could be useful to test the interaction between genotype and treatment.

      We thank the reviewer for this helpful suggestion. In response, we have now performed two-way ANOVA analyses to assess genotype × treatment (diet) interaction effects for the relevant data, including those shown in Figure 3d as well as additional panels where animals were exposed to oxidative stress and sleep phenotypes were measured. We have added the corresponding interaction p-values in the updated figure legends for Figures 3d, 3k, 5a–c, 5f, 5h, 5i, 6c, 6e, and 7e. All of these tests revealed significant interaction effects, supporting the conclusion that the observed differences in sleep phenotypes are specifically dependent on the interaction between genetic manipulation (e.g., cytokine or receptor knockdown) and oxidative stress. These additions reinforce the interpretation that Unpaired cytokine signaling, glial JAK-STAT pathway activity, and AstA receptor regulation functionally interact with intestinal ROS exposure to modulate sleep. We thank the reviewer for suggesting this improvement.
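      As a minimal illustration of what a genotype × treatment interaction term measures, the sketch below computes the interaction F statistic for a balanced two-factor design in plain Python. It is not the analysis pipeline used for the manuscript (which was run in Prism); the function name `interaction_f` and the sleep values are hypothetical, chosen only to show the arithmetic, and converting F to a p-value would additionally require an F-distribution routine.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-way design.

    `cells` maps (genotype, diet) -> list of measurements, with the same
    number of replicates in every cell. Returns (F, df_interaction, df_error).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design required: equal replicates per cell")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Interaction: the part of each cell mean not explained by the two main effects.
    ss_int = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    # Error: replicate scatter around each cell mean.
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_int = (len(a_levels) - 1) * (len(b_levels) - 1)
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical daytime sleep (min): the control responds to H2O2, the knockdown does not.
example = {
    ("control", "standard"): [400.0, 420.0, 410.0],
    ("control", "h2o2"): [520.0, 540.0, 530.0],
    ("knockdown", "standard"): [480.0, 500.0, 490.0],
    ("knockdown", "h2o2"): [470.0, 490.0, 480.0],
}
f_stat, df1, df2 = interaction_f(example)
print(f"interaction F({df1}, {df2}) = {f_stat:.2f}")
```

      A large interaction F, as in this made-up example, indicates that the effect of diet differs between genotypes, which is the pattern the interaction p-values in the figure legends are meant to test.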

      (5) Reporting of p values. Some are reported as specific values whereas others are reported as less than a specific value. Please make this reporting consistent across different figures.

      All p-values reported in the manuscript are exact, except where values fall below 0.0001. In those instances, we report p < 0.0001 because the Prism software package (GraphPad, version 10), which was used for all statistical analyses, does not report more precise values. We believe this reporting approach reflects standard practice in the field.

      (6) Please include the color code used in each figure, either in the figure itself or in the legend.

      We have now clarified the color coding in all relevant figures. In particular, we acknowledge that the meaning of the half-colored circles used to indicate H<sub>2</sub>O<sub>2</sub> treatment was not previously explained. These have now been clearly labeled in each figure to indicate treatment conditions.

      (7) The scheme describing the experimental conditions and the associated chart is confusing. Please improve.

      We have improved the schematic by replacing “ROS” with “H<sub>2</sub>O<sub>2</sub>” to more clearly indicate the experimental condition used. Additionally, we have added the corresponding circle annotations so that they now also appear consistently above the relevant charts. This revised layout enhances clarity and helps readers more easily interpret the experimental conditions. We believe these changes address the reviewer’s concern and make the figure significantly more intuitive.

      (8) Please indicate which line was used for upd-Gal4 and the evidence that it faithfully reflects upd3 expression.

      We have now clarified in the Materials and Methods section that the upd3-GAL4 line used in our study is Bloomington stock #98420, which drives GAL4 expression under the control of approximately 2 kb of sequence upstream of the upd3 start codon. This line has previously been used as a transcriptional reporter for upd3 activity. The only use of this line was to illustrate reporter expression in the EECs. To support this aspect of Upd3 expression, we now include new data in the revised manuscript using fluorescent in situ hybridization (FISH) against upd3, which confirms the presence of upd3 transcripts in prospero-positive EECs of the adult midgut (Fig. S1b). Additionally, we show that upd3 transcript levels are significantly reduced in dissected midguts following EEC-specific knockdown using multiple independent RNAi lines driven by voilà-GAL4, both alone and in combination with R57C10-GAL80, consistent with endogenous expression in these cells (Fig. 1a,b).

      To further address the reviewer’s concern and provide additional support for the endogenous expression of upd3 in EECs, we performed targeted knockdown experiments focusing on molecularly defined EEC subpopulations. The adult Drosophila midgut contains two major EEC subtypes characterized by their expression of Allatostatin C (AstC) or Tachykinin (Tk), which together encompass the vast majority of EECs. To selectively manipulate these populations, we used AstC-GAL4 and Tk-GAL4 drivers – both knock-in lines in which GAL4 is inserted at the respective endogenous hormone loci. This design enables precise GAL4 expression in AstC- or Tk-expressing EECs based on their native transcriptional profile. To eliminate confounding neuronal expression, we combined these drivers with R57C10-GAL80, restricting GAL4 activity to the gut and generating AstC<sup>Gut</sup>> and Tk<sup>Gut</sup>> drivers. Using these tools, we knocked down upd2 and upd3 selectively in the AstC- or Tk-positive EECs. Knockdown of either cytokine in AstC-positive EECs significantly increased sleep under homeostatic conditions, recapitulating the phenotype observed with knockdown in all EECs (Fig. 1m-o). In contrast, knockdown of upd2 or upd3 in Tk-positive EECs had no effect on sleep (Fig. 1p-r). Furthermore, we show in the revised manuscript that selective knockdown of upd2 or upd3 in AstC-positive EECs abolishes the H<sub>2</sub>O<sub>2</sub>-induced increase in sleep (Fig. 3f–h). These findings demonstrate that Unpaired cytokine signaling from AstC-positive EECs is essential for mediating the sleep response to intestinal oxidative stress, highlighting this specific EEC subtype as a key source of cytokine-driven regulation in this context. These new results indicate that AstC-positive EECs are a primary source of the Unpaired cytokines that regulate sleep, while Tk-positive EECs do not appear to contribute to this function. Importantly, upd3 transcript levels were significantly reduced in dissected midguts following AstC<sup>Gut</sup>-driven knockdown (Fig. S1r), further confirming that upd3 is endogenously expressed in AstC-positive EECs. Thus, we have bolstered our confidence that upd3 is indeed expressed in EECs, as illustrated by the reporter line, through several means.

      (9) Please indicate which GFP line was used with upd-Gal4 (CD8, NLS, un-tagged, etc). The Material and Methods section states that it was "UAS-mCD8::GFP (#5137);", however, the stain does not seem to match a cell membrane pattern but rather a nuclear or cytoplasmic pattern. This information would help the interpretation of Figure 1C.

      We confirm that the GFP reporter line used with upd3-GAL4 was obtained from Bloomington stock #98420. As noted by the Bloomington Drosophila Stock Center, “the identity of the UAS-GFP transgene is a guess,” and the subcellular localization of the GFP fusion is therefore uncertain. We agree with the reviewer that the signal observed in Figure 1c does not display clear membrane localization and instead appears diffuse, consistent with cytoplasmic or partially nuclear localization. In any case, what we find most salient is the reporter’s labeling of Prospero-positive EECs in the adult midgut, consistent with upd3 expression in these cells. This conclusion is further supported by multiple lines of evidence presented in the revised manuscript, as mentioned above in response to question #8: (1) fluorescent in situ hybridization (FISH) for upd3 confirms expression in EECs (Fig. S1b), (2) EEC-specific RNAi knockdown of upd3 reduces transcript levels in dissected midguts, and (3) publicly available single-cell RNA sequencing datasets[3] also indicate that upd3 is expressed at low levels in a subset of adult midgut EECs under normal conditions. We have also clarified in the revised Materials and Methods section that GFP localization is undefined in the upd3-GAL4 line, to guide interpretation of the reporter signal.

      B- Results

      (1) Figure 1: According to previous work (10.1016/j.celrep.2015.06.009, http://flygutseq.buchonlab.com/data?gene=upd3), in basal conditions upd3 is expressed as follows: ISC (35 RPKM), EB (98 RPKM), EC (57 RPKM), and EEC (8 RPKM). Accordingly, even complete KO in EECs should eliminate only a small fraction of upd3 from whole guts, even less considering the greater abundance of other cell types such as ECs compared to EECs. It would be useful to understand where this discrepancy comes from, in case it is affecting the conclusion of the manuscript. While this point per se does not affect the main conclusions of the manuscript, it makes the interpretation of the results more difficult.

      We acknowledge the previously reported low expression of upd3 in EECs. However, the FlyGut-seq site appears to be no longer available, so we could not directly compare other related genes. Nonetheless, our data – based on in situ hybridization, reporter expression, and multiple RNAi knockdowns – consistently support upd3 expression in EECs. These complementary approaches strengthen the conclusion that EECs are an important source of systemic upd3 under the conditions tested.

      (2) Figure 1: The upd2-3 mutants show sleep defects very similar to those of EEC>RNAi and >Cas9. It would thus be helpful to try to KO upd3 with other midgut drivers (An EC driver like Myo1A or 5966GS and a progenitor driver like Esg or 5961GS) to validate these results. Such experiments might identify precisely which cells are involved in the gut-brain signaling reported here.

      We appreciate the reviewer’s suggestion and agree that exploring other potential sources of Upd3 in the gut is an interesting direction. In this study, we have focused on EECs, which are the primary hormone-secreting cells in the intestine and thus the most likely candidates for mediating systemic effects such as gut-to-brain signaling. While it is possible that other gut cell types – such as enterocytes (e.g., Myo1A<sup>+</sup>) or intestinal progenitors (e.g., Esg<sup>+</sup>) – also contribute to Upd3 production, these cells are not typically endocrine in nature. Demonstrating their involvement in gut-to-brain communication would therefore require additional, extensive validation beyond the scope of the current study. Importantly, our data show that manipulating Upd3 specifically in EECs is both necessary and sufficient to modulate sleep in response to intestinal ROS, strongly supporting the conclusion that EEC-derived cytokine signaling underlies the observed phenotype. In contrast, manipulating cytokines in other gut cells could produce indirect effects – such as altered proliferation, epithelial integrity, or immune responses – that complicate the interpretation of behavioral outcomes like sleep. For these reasons, we chose to focus on EECs as the source of endocrine signals mediating gut-to-brain communication. However, to address this point raised by the reviewer, we have now included a statement in the Discussion acknowledging that other non-endocrine gut cell types may also contribute to the systemic Unpaired signaling that modulates sleep in response to intestinal oxidative stress.

      (3) Figure 3: "This effect mirrored the upregulation observed with EEC-specific overexpression of upd3, indicating that it reflects physiologically relevant production of upd3 by the gut in response to oxidative stress." Please add (Figure 3a) at the end of this sentence.

      We have now added “(Figure 3a)” at the end of the sentence to clearly reference the relevant data.

      (4) For Figure 3b, do you have data showing that the increased amount of sleep was due to the addition of H2O2 per se, rather than the procedure of adding it?

      We have added new data to address this point. To ensure that the observed sleep increase was specifically due to the presence of H<sub>2</sub>O<sub>2</sub> and not an effect of the food replacement procedure, we performed a control experiment in which animals were fed standard food prepared using the same protocol and replaced daily, but without H<sub>2</sub>O<sub>2</sub>. These animals did not exhibit increased sleep, confirming that the sleep effect is attributable to intestinal ROS rather than the supplementation procedure itself (Fig. S3a). Thanks for the suggestion.

      (5) In the text it is stated that "Since 1% H2O2 feeding induced robust responses both in upd3 expression and in sleep behavior, we asked whether gut-derived Unpaired signaling might be essential for the observed ROS-induced sleep modulation. Indeed, EEC-specific RNAi targeting upd2 or upd3 abolished the sleep response to 1% H2O2 feeding." While it is indeed true that there is no additional increase in sleep time due to EEC>upd3 RNAi, it is also true that EEC>upd3 RNAi flies, without any treatment, have already increased their sleep in the first place. It is then possible that rather than unpaired signaling being essential, an upper threshold for maximum sleep allowed by manipulation of these processes was reached. It would be useful to discuss this point.

      Several findings argue against a ceiling effect and instead support a requirement for Unpaired signaling in mediating ROS-induced sleep. Animals with EEC-specific upd2 or upd3 knockdown or null mutation not only fail to increase sleep following H<sub>2</sub>O<sub>2</sub> treatment but actually exhibit reduced sleep during oxidative stress (Fig. 3e, k, l; Fig. 5e, f), suggesting that Unpaired signaling is required to sustain sleep under these conditions. Similarly, animals with glial dome knockdown also show reduced sleep under oxidative stress, closely mirroring the phenotype of EEC-specific upd3 RNAi animals (Fig. 5a–c, g–i). These results support the conclusion that gut-to-glia Unpaired cytokine signaling is necessary for maintaining elevated sleep during oxidative stress. In the absence of this signaling, animals exhibit increased wakefulness. We identify AstA as one such wake-promoting signal that is suppressed during intestinal stress. We present new data showing that this pathway is downregulated not only via Unpaired-JAK/STAT signaling in glial cells but also through reduced AstA release from the gut in the revised manuscript. This model, in which Unpaired cytokines promote sleep during intestinal stress by suppressing arousal pathways, is discussed throughout the manuscript to address the reviewer’s point.

      (6) In Figure 3k, the dots highlighting the experiment show an empty profile, a full one, and a half one. Please define what the half dots represent.

      We have now clarified the color coding in all relevant figures. Specifically, we acknowledge that the meaning of the half-colored circles indicating H<sub>2</sub>O<sub>2</sub> treatment was not previously defined – it indicates washout or recovery time. In the revised version, these symbols are now clearly labeled in each figure to indicate the treatment condition, ensuring consistent and intuitive interpretation across all panels.

      (7) The authors used appropriate GAL4 and RNAi lines to knock down dome, a upd2/3 JAK-STAT-linked receptor, specifically in neurons and glia, respectively, in order to identify the CNS targets of upd2/3 cytokines produced by enteroendocrine cells (EECs). Pan-neuronal dome knockdown did not alter daytime sleep in adult females, yet pan-glial dome knockdown phenocopied effects of upd2/3 knockdown in EECs. They also observed that EEC-specific knockdown of upd2 and upd3 led to a decrease in JAK-STAT reporter activity in repo-positive glial cells. This supports the authors' conclusion that glial cells, not neurons, are the targets by which unpaired cytokines regulate sleep via JAK-STAT signaling. However, they do not show nighttime sleep data of pan-neuronal and pan-glial dome knockdowns. It would strengthen their conclusion if the nighttime sleep of pan-glial dome knockdown phenocopied the upd2/3 knockdowns as well, provided the pan-neuronal dome knockdown did not alter nighttime sleep.

      We have now added nighttime sleep data for both pan-glial and pan-neuronal domeless knockdowns in the revised manuscript (Fig. 2a). Glial knockdown increased nighttime sleep, similar to EEC-specific upd2/3 knockdown, while neuronal knockdown had no effect. These results further support glial cells as the relevant target of gut-derived Unpaired signaling.

      (8) The authors only used one method to induce oxidative stress (hydrogen peroxide feeding). It would strengthen their argument to test multiple methods of inducing oxidative stress, such as lipopolysaccharide (LPS) feeding. In addition, it would be useful to use a direct bacterial infection to confirm that in flies, the infection promotes sleep. Additionally, flies deficient in Dome in the BBB and infected should not be affected in their sleep by the infection. These experiments would provide direct support for the mechanism proposed. Finally, the authors should add a primary reference for using ROS as a model of bacterial infection and justify their choice better.

      We agree that directly comparing different models of intestinal stress, such as bacterial infection or LPS feeding, would provide valuable insight into how gut-derived signals influence sleep in response to infection. As noted in our detailed responses above, we now include an expanded rationale for our use of H<sub>2</sub>O<sub>2</sub> feeding as a controlled and well-established method for inducing intestinal ROS – one of the key physiological responses to enteric infection and inflammation. In the revised Discussion, we explicitly acknowledge that pathogenic infections – which trigger both intestinal ROS and additional immune pathways – may engage distinct or complementary mechanisms compared to chemically induced oxidative stress. We emphasize the importance of future studies aimed at dissecting these differences. In fact, we are actively pursuing this direction in ongoing work examining sleep responses to enteric infection. For the purposes of the present study, however, we chose to focus on a tractable and specific model of ROS-induced stress to define the contribution of Unpaired cytokine signaling to gut-brain communication and sleep regulation. This approach allowed us to isolate the effect of oxidative stress from other confounding immune stimuli and identify a glia-mediated signaling mechanism linking gut epithelial stress to changes in sleep behavior.

      (9) To confirm that animals lacking EEC Unpaired signaling are not more susceptible to ROS-induced damage, the authors assessed the survival of upd2 and upd3 knockdowns on 1% H2O2 and concluded they display no additional sensitivity to oxidative stress compared to controls. It may be useful to include other tests of sensitivity to oxidative stress, in addition to survival.

      We appreciate the reviewer’s suggestion. In our view, survival is a highly informative and stringent readout, as it reflects the overall physiological capacity of the animal to withstand oxidative stress. Importantly, our data show that animals lacking EEC-derived Unpaired signaling do not exhibit reduced survival following H<sub>2</sub>O<sub>2</sub> exposure, indicating that their oxidative stress resistance is not compromised. Furthermore, we previously confirmed that feeding behavior is unaffected in these animals, suggesting that their ability to ingest food (and thus the stressor) is not impaired. As a molecular complement to these assays in response to this point and others, we have also performed an assessment of neuronal apoptosis (a TUNEL assay, Fig. S3f,g). This assay did not identify an increase in cell death in the brains of animals fed peroxide-containing medium. Thus, gross neurological health, behavior, and overall survival appear to be resilient to the environmental treatment regime we apply here, suggesting that the outcomes we observe arise from signaling per se.

      (10) The authors confirmed that animals lacking EEC-derived upd3 displayed sleep suppression similar to controls in response to starvation. These results led the authors to conclude that there is a specific requirement for EEC-derived Unpaired signaling in responding to intestinal oxidative stress. However, they previously showed that EEC-specific knockdown of upd3 and upd2 led to increased daytime sleep under normal feeding conditions. Their interpretations of their data are inconsistent.

      We appreciate the reviewer’s comment. While animals lacking EEC-derived Unpaired signaling show increased baseline sleep under normal feeding conditions, they still exhibit a robust reduction in sleep when subjected to starvation – comparable to that of control animals (Fig. S3h–j). This demonstrates that they retain the capacity to appropriately modulate sleep in response to metabolic stress. Thus, the sleep-promoting phenotype under normal conditions does not reflect a generalized inability to adjust sleep behavior. Rather, it highlights a specific role for Unpaired signaling in mediating sleep responses to intestinal oxidative stress, not in broadly regulating all sleep-modulating stimuli.

      (11) The authors report a significant increase in JAK-STAT activity in surface glial cells at ZT0 in animals fed 1% H2O2-containing food for 20 hours. This response was abolished in animals with EEC-specific knockdown of upd2 or upd3. The authors confirmed there were no unintended neuronal effects on upd2 or upd3 expression in the heads. They also observed an upregulation of dome transcript levels in the heads of animals with EEC-specific knockdown of upd3 fed 1% H2O2-containing food for 15 hours, which they interpret to be a compensatory mechanism in response to low levels of the ligand. This assay is inconsistent with previous experiments in which animals were fed hydrogen peroxide for 20 hours.

      We thank the reviewer for identifying this discrepancy. The inconsistency arose from a labeling error in the manuscript. Both the JAK-STAT reporter assays in glial cells and the dome expression measurements were performed following 15 hours of H<sub>2</sub>O<sub>2</sub> feeding, not 20 hours as previously stated. We have now corrected this in the revised manuscript.

      (12) The authors show that animals with glia-specific dome knockdown did not have decreased survival on H2O2-containing food, and displayed normal rebound sleep in the morning following sleep deprivation. These results potentially undermine the significance of the paper. If the normal sleep response to oxidative stress is an important protective mechanism, why would oxidative stress not decrease survival in dome knockdown flies (that don't have the normal sleep response to oxidative stress)? This suggests that the proposed mechanism is not important for survival. The authors conclude that Dome-mediated JAK-STAT signaling in the glial cells specifically regulates ROS-induced sleep responses, which their results support.

      We agree that our survival data show that glial dome knockdown does not reduce survival under continuous oxidative stress. However, we believe this does not undermine the importance of the sleep response as an adaptive mechanism. In our survival assay, animals were continuously exposed to 1% H<sub>2</sub>O<sub>2</sub> without the opportunity to recover. In contrast, under natural conditions, oxidative stress is likely to be intermittent, and the ability to mount a sleep response may be particularly important for promoting recovery and maintaining homeostasis during or after transient stress episodes. Thus, while the JAK-STAT-mediated sleep response may not directly enhance survival under constant oxidative challenge, it likely plays a critical role in adaptive recovery under natural conditions.

      (13) Altogether, the authors conclude that enteric oxidative stress induces the release of Unpaired cytokines which activate the JAK-STAT pathway in subperineurial glia of the BBB, which leads to the glial downregulation of receptors for AstA, which is a wake-promoting factor also released by EECs. This mechanism is supported by their results, however, this research raises some intriguing questions, such as the role of upd2 versus upd3, the role of AstA-R1 versus AstA-R2, the importance of this mechanism in terms of survival, the sex-specific nature of this mechanism, and the role that nutritional availability plays in the dual functionality of Unpaired cytokine signaling in regards to sleep.

      We thank the reviewer for highlighting these important questions. Our data suggest that Upd2 and Upd3, while often considered partially redundant, both contribute to sleep regulation, with stronger effects observed for Upd3. This is consistent with prior studies indicating overlapping but non-identical roles for these cytokines. Similarly, although AstA-R1 and AstA-R2 can both be activated by AstA, knockdown of AstA-R2 consistently produces more robust sleep phenotypes, suggesting a predominant role in mediating this effect. The possibility of sex-specific regulation is indeed compelling. While our study focused on females, many gut hormones show sex-dependent activity, and we recognize this as an important avenue for future research. Finally, we have included new data in the revised manuscript showing that gut-derived AstA is downregulated under oxidative stress, further supporting our model in which Unpaired signaling suppresses arousal pathways during intestinal stress.

      (14) Data Availability: It is indicated that: "Reasonable data requests will be fulfilled by the lead author". However, eLife's guidelines for data sharing require that all data associated with an article be made freely and widely available.

      We thank the reviewer for pointing this out. We have revised the Data Availability section of the manuscript to clarify that all data will be made freely available from the lead contact without restriction, in accordance with eLife’s open data policy.

      References

      (1) Li, Y., Zhou, X., Cheng, C., Ding, G., Zhao, P., Tan, K., Chen, L., Perrimon, N., Veenstra, J.A., Zhang, L., and Song, W. (2023). Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov 9, 49. 10.1038/s41421-023-00541-3.

      (2) Ahrentlov, N., Kubrak, O., Lassen, M., Malita, A., Koyama, T., Frederiksen, A.S., Sigvardsen, C.M., John, A., Madsen, P., Halberg, K.A., et al. (2025). Protein-responsive gut hormone Tachykinin directs food choice and impacts lifespan. Nature Metabolism. 10.1038/s42255-025-01267-0.

      (3) Li, H., Janssens, J., De Waegeneer, M., Kolluru, S.S., Davie, K., Gardeux, V., Saelens, W., David, F.P.A., Brbic, M., Spanier, K., et al. (2022). Fly Cell Atlas: A single-nucleus transcriptomic atlas of the adult fruit fly. Science 375, eabk2432. 10.1126/science.abk2432.

      (4) Kubrak, O., Koyama, T., Ahrentlov, N., Jensen, L., Malita, A., Naseem, M.T., Lassen, M., Nagy, S., Texada, M.J., Halberg, K.V., and Rewitz, K. (2022). The gut hormone Allatostatin C/Somatostatin regulates food intake and metabolic homeostasis under nutrient stress. Nature Communications 13, 692. 10.1038/s41467-022-28268-x.

      (5) Malita, A., Kubrak, O., Koyama, T., Ahrentlov, N., Texada, M.J., Nagy, S., Halberg, K.V., and Rewitz, K. (2022). A gut-derived hormone suppresses sugar appetite and regulates food choice in Drosophila. Nature Metabolism 4, 1532-1550. 10.1038/s42255-022-00672-z.

    1. eLife Assessment

      This important study addresses how wing morphology and kinematics change across hoverflies of different body sizes. The authors provide convincing evidence that there is no significant correlation between body size and wing kinematics across 28 species and instead argue that non-trivial changes in wing size and shape evolved to support flight across the size range. Overall, this paper illustrates the power and beauty of an integrative approach to animal biomechanics and will be of broad interest to biologists, physicists and engineers.

    2. Reviewer #1 (Public review):

      The paper is well written and the figures well laid out. The methods are easy to follow, and the rationale and logic for each experiment are easy to follow. The introduction sets the scene well, and the discussion is appropriate. The summary sentences throughout the text help the reader.

      The authors have done a lot of work addressing my previous concerns and those of the other Reviewers.

    3. Reviewer #2 (Public review):

      Summary

      Le Roy et al. quantify wing morphology and wing kinematics across twenty-eight and eight hoverfly species, respectively; the aim is to identify how weight support during hovering is ensured across body sizes. Wing shape and relative wing size vary non-trivially with body mass, but wing kinematics are reported to be size-invariant. On the basis of these results, it is concluded that weight support is achieved solely through size-specific variations in wing morphology, and that these changes enabled hoverflies to decrease in size. Adjusting wing morphology may be preferable compared to the alternative strategy of altering wing kinematics, because kinematics may be subject to stronger evolutionary and ecological constraints, dictated by the highly specialised flight and ecology of the hoverflies.

      Strengths

      The study deploys a vast array of challenging techniques, including flight experiments, morphometrics, phylogenetic analyses, and numerical simulations; it so illustrates both the power and beauty of an integrative approach to animal biomechanics. The question is well motivated, the methods appropriately designed, and the discussion elegantly places the results in broad biomechanical, ecological, and evolutionary context.

      Weaknesses

      (1) In assessing evolutionary allometry, it is key to pinpoint the variation expected from changes in size alone. The null hypothesis for wing morphology is well-defined (isometry), but the equivalent predictions for kinematic parameters, although specified, are insufficiently justified, and directly contradict classic scaling theory. A detailed justification of the "kinematic similarity" assumption, or a change in the null hypothesis, would substantially strengthen the paper, and clarify its evolutionary implications.

      (2) By relating the aerodynamic output force to wing morphology and kinematics, it is concluded that smaller hoverflies will find it more challenging to support their body mass--a scaling argument that provides the framework for this work. This hypothesis appears to stand in direct contrast to classic scaling theory, where the gravitational force is thought to present a bigger challenge for larger animals, due to their disadvantageous surface-to-volume ratios. The same problem ought to occur in hoverflies, for wing kinematics must ultimately be the result of the energy injected by the flight engine: muscle. Much like in terrestrial animals, equivalent weight support in flying animals thus requires a positive allometry of muscle force output. In other words, if a large hoverfly is able to generate the wing kinematics that suffice to support body weight, an isometrically smaller hoverfly should be, too (but not vice versa). Clarifying the relation between the scaling of muscle mechanical input, wing kinematics, and weight support would help resolve the conflict between these two contrasting hypotheses, and considerably strengthen the biomechanical motivation and evolutionary interpretation.

      (3) One main conclusion-- that miniaturization is enabled by changes in wing morphology--is insufficiently supported by the evidence. Is it miniaturization or "gigantism" that is enabled by (or drives) the non-trivial changes in wing morphology? To clarify this question, the isolated treatment of constraints on the musculoskeletal system vs the "flapping-wing based propulsion" system needs to be replaced by an integrated analysis: the propulsion of the wings, is, after all, due to muscle action. Revisiting the scaling predictions by assessing what the engine (muscle) can impart onto the system (wings) will clarify whether non-trivial adaptations in wing shape or kinematics are necessary for smaller or larger hovering insects (if at all!).

      In many ways, this work provides a blueprint for work in evolutionary biomechanics; the breadth of both the methods and the discussion reflects outstanding scholarship.

    4. Reviewer #3 (Public review):

      This paper addresses an important question about how changes in wing morphology vs. wing kinematics change with body size across an important group of high-performance insects, the hoverflies. The biomechanics and morphology convincingly support the conclusions that there is no significant correlation between wing kinematics and size across the eight specific species analyzed in depth and that instead wing morphology changes allometrically. The morphological analysis is enhanced with phylogenetically appropriate tests across a larger data set incorporating museum specimens.

      The authors have made very extensive revisions that have significantly improved the manuscript and brought the strength of conclusions in line with the excellent data. Most significantly, they have expanded their morphological analysis to include museum specimens and removed the conclusions about evolutionary drivers of miniaturization. As a result, the conclusion about morphological changes scaling with body size rather than kinematic properties is strongly supported and very nicely presented with a strong complementary set of data. I only have minor textual edits for them to consider.

    1. eLife Assessment

      This is an overall valuable set of findings on the role of centrally produced estrogens in the control of behaviors in male and female medaka. The significance of the findings rests on the revealed potential mechanism between brain derived estrogens modulating social behaviors in males as well as females. The results are supported by the analysis of multiple transgenic lines although the evidence is incomplete, and further validation would be necessary to fully validate the conclusions on the role of brain-derived estrogens. Nonetheless, the findings have led to helpful hypotheses on the hormonal control of behaviors in teleosts that can be tested further.

    2. Reviewer #1 (Public review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically-tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      Strengths:

      • Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones

      • The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation

      • Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'

      • Includes multiple follow-up experiments, which leads to tests of internal replication and an impactful mechanistic proposal

      • Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionary ancient

      Weaknesses:

      • As stated in the summary, the authors are attributing the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either

      • The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering

      • Lack of attribution of previous published work from other research groups that would provide the proper context of the present study

      • There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation"

      • The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used.

      • While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside

      • The statistics comparing "experimental to experimental" and "control to experimental" isn't appropriate

    3. Reviewer #3 (Public review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of brain-derived estrogens in the control of sexual and aggressive behavior in medaka. The constitutive deletion of Cyp19a1b markedly reduced brain estrogen content in males and to a lesser extent in females. These effects are accompanied by reduced sexual and aggressive behavior in males and reduced preference for males in females. These effects are reversed by adult treatment with estrogen, supporting a role for estrogens. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of estrogens in social behavior in the most abundant vertebrate taxon; however, the conclusion of brain-derived estrogens awaits definitive confirmation.

      Strengths:

      • Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      • Results obtained from multiple mutant lines converge to show that estrogen signaling, likely synthesized in the brain, drives aspects of male sexual behavior.

      • The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      • The authors have made important corrections to tone down some of the conclusions which are more in line with the results.

      Weaknesses:

      • No evaluation of the mRNA and protein products of Cyp19a1b and ESR2a is presented, such that there is no proper demonstration that the mutation indeed leads to aromatase reduction. The conclusion that these effects are dependent on brain-derived estrogens is therefore only supported by measures of E2 with an EIA kit that is not validated. No discussion of these shortcomings is provided in the discussion, thus further weakening the conclusions of the manuscript.

      • Most experiments are weakly powered (low sample size).

      • The variability of the mRNA content for a same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      Conclusions:

      Overall, the claims regarding the role of estrogens originating in the brain in male sexual behavior are supported by converging evidence from multiple mutant lines. The evidence for a role of brain-derived estrogens in gene expression in the brain is weaker, as are the results in females.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      We thank the reviewer for this very positive evaluation of our work and greatly appreciate their helpful comments and suggestions for improving the manuscript. We agree with the comment that the term “neuroestrogens” is misleading. Therefore, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript, including the title.

      In the following sections, “neuroestrogens” has been revised to align with the surrounding context.

      Line 21: “in the brain, also known as neuroestrogens,” → “in the brain.”

      Line 28: “neuroestrogens” → “these estrogens.”

      Line 30: “mechanism of action of neuroestrogens” → “mode of action of brain-derived estrogens.”

      Line 43: “brain-derived estrogens, also called neuroestrogens,” → “estrogens.”

      Line 74: “neuroestrogen synthesis is selectively impaired while gonadal estrogen synthesis remains intact” → “estrogen synthesis in the brain is selectively impaired while that in the gonads remains intact.”

      Line 77: “neuroestrogens” → “these estrogens.”

      Line 335: “levels of neuroestrogens” → “brain estrogen levels.”

      Line 338: “neuroestrogens” → “these estrogens.”

      Line 351: “neuroestrogens” → “these estrogens.”

      Line 357: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 359: “neuroestrogens” → “estrogen synthesis in the brain.”

      Line 390: “active synthesis of neuroestrogens” → “active estrogen synthesis in the brain.”

      Line 431: “neuroestrogens” → “estrogens in the brain.”

      Line 431: “neuroestrogen action” → “the action of brain-derived estrogens.”

      Line 433: “neuroestrogen action” → “their action.”

      Strengths:

      Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones.

      The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation.

      Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'.

      Includes multiple follow-up experiments, which lead to tests of internal replication and an impactful mechanistic proposal.

      Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) As stated in the summary, the authors attribute the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either.

      As noted in Response to reviewer #1’s summary comment, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” Following this addition, “This observation suggests” in the subsequent sentence has been replaced with “These observations suggest.”

      The following references (#18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (2) The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering.

      Line 282: As the reviewer correctly noted, circles were significantly reduced in mutant males of the Δ8 line, whereas no significant reduction was observed in those of the Δ4 line. However, a tendency toward reduction was evident in the Δ4 line (P = 0.1512), and both lines showed significant differences in fin displays. Based on these findings, we believe our conclusion that esr2a<sup>−/−</sup> males exhibit reduced aggression remains valid. To clarify this point and address potential reader concerns, we have revised the text as follows: “esr2a<sup>−/−</sup> males from both the Δ8 and Δ4 lines exhibited significantly fewer fin displays than their wildtype siblings (P = 0.0461 and 0.0293, respectively). Circles followed a similar pattern, with a significant reduction in the Δ8 line (P = 0.0446) and a comparable but non-significant decrease in the Δ4 line (P = 0.1512) (Fig. 5L; Fig. S8E), showing less aggression.”

      (3) Lack of attribution of previously published work from other research groups that would provide the proper context of the present study.

      In response to this and other comments from this reviewer, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction. This addition provides an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list, with other references renumbered accordingly:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      (4) There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation".

      Line 68: As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      The following revisions have also been made to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (5) The experimental design for studying aggression in males has flaws. A standard test like a resident intruder test should be used.

      We agree that the resident-intruder test is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (6) While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside.

      We agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021, Curr Biol, 1699–1710). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (7) The statistics comparing "experimental to experimental" and "control to experimental" aren't appropriate.

      The reviewer raises concerns about the statistical analysis used for Figures 4C and 4E, suggesting that Bonferroni’s test should be used instead of Dunnett’s test. However, Dunnett’s test is commonly used to compare treatment groups to a reference group that receives no treatment, as in our study. Since we do not compare the treated groups with each other, we believe Dunnett’s test is the most appropriate choice.

      Line 619: The reviewer’s concern may have arisen from the phrase “comparisons between control and experimental groups” in the Materials and Methods. We have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D” for clarity.

      Reviewer #2 (Public Review):

      Summary:

      The novelty of this study stems from the observations that neuro-estrogens appear to interact with brain androgen receptors to support male-typical behaviors. The study provides a step forward in clarifying the somewhat contradictory findings that, in teleosts and unlike other vertebrates, androgens regulate male-typical behaviors without requiring aromatization, but at the same time estrogens appear to also be involved in regulating male-typical behaviors. They manipulate the expression of one aromatase isoform, cyp19a1b, that is purported to be brain-specific in teleosts. Their findings are important in that brain estrogen content is sensitive to the brain-specific cyp19a1b deficiency, leading to alterations in both sexual behavior and aggressive behavior. Interestingly, these males have relatively intact fertility rates, despite the effects on the brain.

      We thank this reviewer for their positive evaluation of our work and constructive comments, which we found very helpful in improving the manuscript.

      That said, the framing of the study, the relevant context, and several aspects of the methods and results raise concerns. Two interpretations need to be addressed/tempered:

      (1) that the rescue of cyp19a1b deficiency by tank-applied estradiol is not necessarily a brain/neuroestrogen mode of action, and

      Line 155: cyp19a1b-deficient males exhibited a severe reduction in brain E2 levels, yet their peripheral E2 levels remained comparable to those in wild-type males. Given this hormonal milieu and the lack of behavioral change in wild-type males following E2 treatment, the observed recovery of mating behavior in cyp19a1b-deficient males following E2 treatment can be best explained by the restoration of brain E2 levels. However, as the reviewer pointed out, we cannot rule out the possibility that bath-immersed E2 influenced behavior through an indirect peripheral mechanism. To address this concern, we have modified the text as follows: “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (2) the large increases in peripheral and brain androgen levels in the cyp19a1b deficient animals imply some indirect/compensatory effects of lifelong cyp19a1b deficiency.

      As stated in line 151, androgen/AR signaling has a strong facilitative effect on male-typical behaviors in teleosts. If increased androgen levels in the periphery and brain affected behavior, the expected effect would be facilitative. However, cyp19a1b-deficient males exhibited impaired male-typical behaviors, suggesting that elevated androgen levels were unlikely to be responsible. Although chronic androgen elevation could cause androgen receptor desensitization, which could lead to behavioral suppression, our long-term androgen treatments have consistently promoted, rather than inhibited, male-typical behaviors (e.g., Nishiike et al., Proc Natl Acad Sci USA 121:e2316459121). Hence, this possibility is also highly unlikely.

      Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of neuro-estrogens in the control of sexual and aggressive behavior in teleost fish. The constitutive deletion of Cyp19a1b reduced brain estrogen content by 87% in males and about 50% in females. It led to reduced sexual and aggressive behavior in males and reduced sexual behavior in females. These effects are reversed by adult treatment with estradiol thus indicating that they are activational in nature. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara, and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of neuro-estrogens in social behavior in the most abundant vertebrate taxa. While estrogens are involved in the organization of the brain and behavior of some birds and rodents, neuro-estrogens appear to play an activational role in fish through a facilitatory action of androgen signaling.

      We thank this reviewer for their positive evaluation of our work and comments that have improved the manuscript.

      Strengths:

      Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      Results obtained from multiple mutant lines converge to show that estrogen signaling drives aspects of male sexual behavior.

      The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      We again thank the reviewer for their positive evaluation of our work.

      Weaknesses:

      (1) The new transgenic lines are under-characterized. There is no evaluation of the mRNA and protein products of Cyp19a1b and ESR2a.

      We did not directly assess the function of cyp19a1b and esr2a in our mutant fish. However, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly supports the loss of cyp19a1b function. This is stated in the Results section (line 97) as follows: “These results show that cyp19a1b-deficient fish have reduced estrogen levels coupled with increased androgen levels in the brain, confirming the loss of cyp19a1b function.”

      Line 473: A previous study reported that female medaka lacking esr2a fail to release eggs due to oviduct atresia (Kayo et al., 2019, Sci Rep 9:8868). Similarly, in this study, some esr2a-deficient females exhibited spawning behavior but were unable to release eggs, although the sample size was limited (Δ8 line: 2/3; Δ4 line: 1/1). In contrast, this was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function. To incorporate this information into the manuscript, the following text has been added to the Materials and Methods: “A previous study reported that esr2a-deficient female medaka cannot release eggs due to oviduct atresia (59). Likewise, some esr2a-deficient females generated in this study, despite the limited sample size, exhibited spawning behavior but were unable to release eggs (Δ8 line: 2/3; Δ4 line: 1/1), while such failure was not observed in wild-type females (Δ8 line: 0/12; Δ4 line: 0/11). These results support the effective loss of esr2a function.”

      The following reference (#59), cited in the newly added text above, has been included in the reference list:

      D. Kayo, B. Zempo, S. Tomihara, Y. Oka, S. Kanda, Gene knockout analysis reveals essentiality of estrogen receptor β1 (Esr2a) for female reproduction in medaka. Sci. Rep. 9, 8868 (2019).

      (2) The stereotypic sequence of sexual behavior is poorly described, in particular, the part played by the two sexual partners, such that the conclusions are not easily understandable, notably with regards to the distinction between motivation and performance.

      Line 103: To provide a more detailed description of medaka mating behavior, we have revised the text from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning).”

      (3) The behavior of females is only assessed from the perspective of the male, which raises questions about the interpretation of the reduced behavior of the males.

      In medaka, female mating behavior is largely passive, except for rejecting courtship attempts and releasing eggs. Therefore, its analysis relies on measuring the latency to receive following, courtship displays, or wrappings from the male and the frequency of courtship rejection or wrapping refusal. We understand the reviewer’s perspective that cyp19a1b-deficient females might not be less receptive but instead less attractive to males, potentially leading to reduced male mating efforts. However, since these females are approached and followed by males at levels comparable to wild-type females, this possibility appears unlikely. Moreover, cyp19a1b-deficient females tend to avoid males and exhibit a slightly female-oriented sexual preference. While these traits are closely associated with reduced sexual receptivity, they do not readily align with reduced sexual attractiveness. Therefore, it is more plausible to conclude that these females have decreased receptivity rather than being less attractive to males.

      (4) At no point do the authors seem to consider that a reduced behavior of one sex could result from a reduced sensory perception from this sex or a reduced attractivity or sensory communication from the other sex.

      Line 112: As noted above, the impaired mating behavior of cyp19a1b-deficient females is unlikely to be due to reduced attractiveness to males. Similarly, mating behavior tests using esr2b-deficient females as stimulus females suggest that the impaired mating behavior of cyp19a1b-deficient males cannot be attributed to reduced attractiveness to females. However, the possibility that their impaired mating behavior could be attributed to altered cognition or sexual preference cannot be ruled out. To reflect this in the manuscript, we have revised the text “, suggesting that they are less motivated to mate” to “. These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (5) Aspects of the methods are not detailed enough to allow proper evaluation of their quality or replication of the data.

      In response to this and other specific comments from this reviewer, we have revised the Materials and Methods section to include more detailed descriptions of the methods.

      Line 469: The following text has been added to describe the method for domain identification in medaka Esr2a: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list:

      S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      Line 540: The text “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to include additional details on the single ISH method as follows: “. The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
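The thresholding rule quoted above can be illustrated with a minimal sketch. It is not the Olyvia implementation: the function names, the nested-list image representation, and the `pixel_area` parameter are assumptions made for illustration, and the intensities are assumed to lie on a 1–256 scale to match the quoted range of 161 to 256.

```python
def signal_area(section, lower=161, upper=256, pixel_area=1.0):
    """Area of one section covered by pixels counted as signal.

    `section` is a 2D list of intensities on an assumed 1-256 scale.
    A pixel counts as signal when lower <= intensity <= upper.
    `pixel_area` (hypothetical) converts a pixel count into an area.
    """
    count = sum(1 for row in section for value in row if lower <= value <= upper)
    return count * pixel_area


def total_signal_area(sections, lower=161, upper=256, pixel_area=1.0):
    """Sum the signal area over all relevant sections of one brain nucleus."""
    return sum(signal_area(s, lower, upper, pixel_area) for s in sections)


# Toy data (hypothetical): two small "sections" of one nucleus.
sections = [
    [[1, 160], [161, 256]],  # two pixels fall in the signal range
    [[200, 100]],            # one pixel falls in the signal range
]
print(total_signal_area(sections))  # prints 3.0
```

Because the same fixed range is applied to every section without per-sample correction, comparisons between genotypes depend on all sections being processed in the same batch, as the revised text states.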

      Line 596: The following text has been added to include additional details on the double ISH method: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (6) It seems very dangerous to use the response to a mutant abnormal behavior (ESR2-KO females) as a test, given that it is not clear what is the cause of the disrupted behavior.

      esr2b-deficient females have fully developed ovaries, a normal sex steroid milieu, and sexual attractiveness to males comparable to wild-type females, yet they are completely unreceptive to male courtship (Nishiike et al., 2021, Curr Biol, 1699–1710). Although, as the reviewer noted, the detailed mechanisms underlying this phenotype remain unclear, it is evident that the loss of estrogen/Esr2b signaling in the brain severely impairs sexual receptivity. Therefore, using esr2b-deficient females as stimulus females in the mating behavior test eliminates the influence of female sexual receptivity and male attractiveness to females, enabling the exclusive assessment of male mating motivation. This rationale is already presented in the Results section (lines 116–120), and we believe this experimental design offers a robust framework for assessing male mating motivation.

      Additionally, the mating behavior test with esr2b-deficient females complemented the test with wild-type females, and its results were not the sole basis for our discussion of the male mating behavior phenotype. The results of both tests were largely concordant, and we believe that the conclusions drawn from them are highly reliable.

      Meanwhile, in the test with esr2b-deficient females, cyp19a1b-deficient males were courted more frequently by these females than wild-type males. As the reviewer noted, this may suggest an anomaly in the test. Accordingly, we have confined our discussion to the possibility that “Perhaps cyp19a1b<sup>−/−</sup> males are misidentified as females by esr2b-deficient females because they are reluctant to court or they exhibit some female-like behavior” (line 131).

      (7) Most experiments are weakly powered (low sample size) and analyzed by multiple T-tests while two-way ANOVA could have been used in several instances. No mention of T or F values, or degrees of freedom.

      Histological analysis was conducted with a relatively small sample size, as our previous experience suggested that interindividual variability in the results would not be substantial. As significant differences were detected in many analyses, further increasing the sample size is unnecessary.

      Although two-way ANOVA could be used instead of multiple T-tests for analyzing the data in Figures 4D, 4F, 6D, S4A, and S4B, we applied the Bonferroni–Dunn correction to control for multiple pairwise comparisons in multiple T-tests. As this comparison method is equivalent to the post hoc test following two-way ANOVA, the statistical results are identical regardless of whether T-tests or two-way ANOVA are used.
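As a rough illustration of the multiplicity correction mentioned above, the sketch below applies the simple Bonferroni adjustment, in which each raw p-value is multiplied by the number of comparisons and capped at 1. The p-values and construct names are hypothetical, and the software used in the study may implement the Bonferroni–Dunn procedure differently (for example, with a pooled variance estimate), so this shows only the general idea.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from pairwise t-tests (E2-treated vs untreated),
# one per construct; these numbers are not taken from the study.
raw = {"construct_A": 0.004, "construct_B": 0.03, "construct_C": 0.6}
adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))

for name, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: adjusted p = {p:.3f} ({verdict})")
```

With three comparisons, a raw p-value of 0.03 no longer passes the 0.05 threshold after adjustment, which is the behavior the correction is meant to provide.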

      For the data in Figures 4D, 4F, S4A, and S4B, the primary focus is on whether relative luciferase activity differs between E2-treated and untreated conditions for each mutant construct. Therefore, two-way ANOVA is not particularly relevant, as assessing the main effect of construct type or its interaction with E2 treatment does not provide meaningful insights. Similarly, in Figure 6D, the focus is solely on whether wild-type and mutant females differ in time spent at each distance. Given this, two-way ANOVA is unnecessary, as analyzing the main effect of distance is not meaningful.

      Accordingly, two-way ANOVA was not employed in this study, and therefore, its corresponding F values were not included. As the figure legends specify the sample sizes for all analyses, specifying degrees of freedom separately was deemed unnecessary.

      (8) The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      As the reviewer pointed out, the overall area of ara expression is larger in Figure 2J than in Figure 2F. However, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, this difference is unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear less pronounced in Figures 2J and 2K than in Figures 2F and 2H. This is likely attributable to the smaller sample size used in the experiments for Figures 2J and 2K, resulting in less distinct differences. However, as the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (9) The discussion confuses the effects of estrogens on sexual differentiation (developmental programming = permanent) and activation (= reversible activation of brain circuits in adulthood) of the brain and behavior. Whether sex differences in the circuits underlying social behaviors exist is not clear.

      We recognize that the effects of adult steroids are sometimes not considered to be sexual differentiation, as they do not differentiate the neural substrate, but rather transiently activate the already masculinized or feminized substrate. Arnold (2017, J Neurosci Res 95:291–300) contends that all factors that cause sex differences, including the transient effects of adult steroids, should be incorporated into a theory of sexual differentiation, and indeed, these effects may be the most potent proximate factors that make males and females different. We concur with this perspective and have adopted it as a foundation for our manuscript.

      In teleosts, early developmental exposure to steroids has minimal impact, and sexual differentiation relies primarily on steroid action in adulthood (Okubo et al., 2022, Spectrum of Sex, pp. 111–133). This is evidenced by the effective reversal of sex-typical behaviors through experimental hormonal manipulation in adult teleosts and the absence of transient early-life steroid surges observed in mammals and birds. Accordingly, our discussion on brain sexual differentiation, including the statement in line 347, “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species”, remains well-supported. Additionally, given these considerations, while sex differences in neural circuit activation are evident in teleosts, substantial structural differences in these circuits are unlikely.

      (10) Overall, the claims regarding the activational role of neuro-estrogens on male sexual behavior are supported by converging evidence from multiple mutant lines. The role of neuroestrogens on gene expression in the brain is mostly solid too. The data for females are comparatively weaker. Conclusions regarding sexual differentiation should be considered carefully.

      We agree that the data for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when integrated with our prior findings, the data on females in this study provide significant insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors set out to answer an intriguing question regarding the hormonal control of innate social behaviors in medaka. Specifically, they wanted to test the effects of cyp19a1b mutation on mating and aggression in males. They also test these effects in females. Their approach takes them down several distinct experimental pathways, including one investigating how cyp19a1a function is related to androgen receptor expression and how estrogens themselves may act on the androgen receptor to modulate its expression, as well as how different esr genes may be involved. The study and its results are valuable and a clear, general conclusion of a pathway from brain aromatase>estrogens>esr genes> androgen receptor can be made. This is important, novel, and impactful. However, there are issues with how the study logic is set up, the approach for assessing certain behaviors, the statistics used, the interpretation of findings, and placing the findings in the proper context based on previous work, which manifests as a general issue where previous work is not properly attributed to.

      Thank you for your thoughtful review. We have carefully addressed each specific comment, as detailed below.

      Major comments:

      (1) The background for the rationale of the current study is misleading and lacks proper context. The authors root the logic of their experiment in determining whether estrogens regulate male-typical behaviors because the current assumption is androgens are "solely responsible" for male-typical behaviors in teleosts. This is not the case. Previous studies have shown aromatase/estrogens are involved in male-typical aggression in teleosts. For example, to name a couple:

      Huffman, L. S., O'Connell, L. A., & Hofmann, H. A. (2013). Aromatase regulates aggression in the African cichlid fish Astatotilapia burtoni. Physiology & behavior, 112, 77-83.

      O'Connell, L. A., & Hofmann, H. A. (2012). Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology, 153(3), 1341-1351.

      And even a recent paper sheds light on a possible AR>aromatase>estradiol hypothesis of male-typical behaviors:

      Lopez, M. S., & Alward, B. A. (2024). Androgen receptor deficiency is associated with reduced aromatase expression in the ventromedial hypothalamus of male cichlids. Annals of the New York Academy of Sciences.

      Interestingly, the authors cite Huffman et al. in the discussion, so I don't understand why they make the claims they do about estrogens and male-typical behavior.

      Related to this, is an issue of proper attribution to published work. Indeed, missing are key references from lab groups using AR mutant teleosts. Here are a couple:

      Yong, L., Thet, Z., & Zhu, Y. (2017). Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. Journal of Experimental Biology, 220(17), 3017-3021.

      Alward, B. A., Laud, V. A., Skalnik, C. J., York, R. A., Juntti, S. A., & Fernald, R. D. (2020). Modular genetic control of social status in a cichlid fish. Proceedings of the National Academy of Sciences, 117(45), 28167-28174.

      Ogino, Y., Ansai, S., Watanabe, E., Yasugi, M., Katayama, Y., Sakamoto, H., ... & Iguchi, T. (2023). Evolutionary differentiation of androgen receptor is responsible for sexual characteristic development in a teleost fish. Nature communications, 14(1), 1428.

      As noted in Response to reviewer #1’s comment 3 on weaknesses, we have revised the Introduction and Discussion sections as follows.

      Line 56: “solely responsible” in the Introduction has been modified to “largely responsible”.

      Line 57: The text “This is consistent with the recent finding in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (10)” has been revised to “This is consistent with recent observations in a few teleost species that genetic ablation of AR severely impairs male-typical behaviors (13–16) and with findings in medaka fish (Oryzias latipes) that estrogens act through the ESR subtype Esr2b to prevent females from engaging in male-typical courtship (12)” to include previous studies on the behavior of AR mutant fish (Yong et al., 2017; Alward et al., 2020; Ogino et al., 2023; Nishiike and Okubo, 2024) in the Introduction.

      Line 65: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)” has been added to the Introduction, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015).

      Line 367: “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” has been edited to read “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      After the revisions described above, the following references (#13, 14, and 22) have been added to the reference list:

      L. Yong, Z. Thet, Y. Zhu, Genetic editing of the androgen receptor contributes to impaired male courtship behavior in zebrafish. J. Exp. Biol. 220, 3017–3021 (2017).

      B. A. Alward, V. A. Laud, C. J. Skalnik, R. A. York, S. A. Juntti, R. D. Fernald, Modular genetic control of social status in a cichlid fish. Proc. Natl. Acad. Sci. U.S.A. 117, 28167–28174 (2020).

      L. A. O’Connell, H. A. Hofmann, Social status predicts how sex steroid receptors regulate complex behavior across levels of biological organization. Endocrinology 153, 1341–1351 (2012).

      While Lopez and Alward (2024) provide valuable insights into the regulation of cyp19a1b expression by androgens, our study focuses specifically on the functional aspects of cyp19a1b. Expanding the discussion to include expression regulation would divert from the primary focus of our manuscript. For this reason, we have opted not to cite this reference.

      (2) As it is now, the authors are only citing a book chapter/review from their own group. This is a serious issue as it does not provide the proper context for the work. The authors need to fix their issues of attribution to previously published work and the proper interpretation of the work that they are aware of as it pertains to ideas proposed on the roles of androgens and estrogens in the control of male-typical behaviors. This is also important to get the citations right because the common use of "contrary to expectations" when describing their results is actually not correct. Many of the observations are expected to a degree. However, this doesn't take away from a generally stellar experimental design and mostly clear results. The authors do not need to rely on enhancing the impact of their paper by making false claims of unexpected findings. The depth and clarity of your findings are where the impact of your work is.

      As detailed in Response to reviewer #1’s comment 3 on weaknesses, we have cited previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015) in the Introduction.

      Additionally, as noted in Response to reviewer #1’s comment 4 on weaknesses, we have made the following revisions to avoid phrases such as “contrary to expectation” and “unexpected.”

      Line 76: “Contrary to our expectations” → “Remarkably.”

      Line 109: “Contrary to this expectation, however” → “Nevertheless.”

      Line 135: “Again, contrary to our expectation, cyp19a1b<sup>−/−</sup> males” → “cyp19a1b<sup>−/−</sup> males.”

      Line 333: “unexpected” → “noteworthy.”

      Line 337: “unexpected” → “notable.”

      (3) The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used. An assay in which only male mutants are housed together? I do not understand the logic there and the logic for the approach isn't even explained. Too many confounds that are not controlled for. It makes it seem like an aspect of the study that was thrown in as an aside.

      As noted in Response to reviewer #1’s comment 5 on weaknesses, medaka form shoals and lack strong territoriality. As a result, even slight differences in dominance between the resident and intruder can substantially impact the outcomes of the resident-intruder test. Therefore, we adopted an alternative approach in this study.

      (4) Hormonal differences in the mutants seem to vary based on sex, and the meaning of these differences, or how they affect interpreting the findings, wasn't discussed. There was no acknowledgment of the fact that female central E2 was still at 50%, meaning the "rescue" experiments using peripheral injections are not given the proper context. For example, this is different than giving a fish with only 16% of their normal central E2 an E2 injection. Missing as well is a clear hypothesis for why E2 injections did not rescue aggression deficits in cyp19a1b mutant males.

      Line 385: As the reviewer pointed out, the degree of brain estrogen reduction in cyp19a1b-deficient fish differs greatly between males and females. This is likely because females receive a large supply of estrogens from the ovaries. Given that estrogen levels in cyp19a1b-deficient females were 50% of those in wild-type females, it can be inferred that half of their brain estrogens are synthesized locally, while the other half originates from the ovaries. This is an important finding, and we have already noted in the Discussion that “females have higher brain levels of estrogens, half of which are synthesized locally in the brain (i.e., neuroestrogens).” However, as this explanation was not sufficiently clear, we have revised it to “females have higher brain levels of estrogens, with half being synthesized locally and the other half supplied by the ovaries.”

      The reviewer raised a concern that conducting the estrogen rescue experiment in females, where 50% of brain estrogens remain, might be inappropriate. However, as this experiment was conducted exclusively in males, this concern is not applicable.

      Line 377: As noted in the reviewer’s subsequent comment, the failure of aggression recovery in E2-treated cyp19a1b-deficient males could be due to insufficient induction of ara/arb expression in aggression-relevant brain regions. To address this concern, we have inserted the following statement into the Discussion after “the development of male behaviors may require moderate neuroestrogen levels that are sufficient to induce the expression of ara and arb, but not esr2b, in the underlying neural circuitry”: “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.”

      (5) In relation to that, the "null" results may have some of the most interesting implications, but they are barely discussed. For example, what does it mean that E2 didn't restore aggression in male cyp19 mutants? Is this a brain region factor? Could this relate to findings from Lopez et al NYAS, where male and female Ara mutants show different effects on brain-region-specific aromatase expression? And maybe this relates to the different impact of estrogens on ar expression. Were the different effects impacted in aggression areas? Maybe this is why E2 injection didn't restore aggression in males. You could make the argument that: (1) E2 doesn't restore ar expression in aggression regions and that's why there was no rescue. Or (2) that the circuits in adulthood that regulate aggression are NOT dependent on aggression but in early development they are. Another null finding not expanded on is why the two esr2a mutant lines showed differences. There is no reason to trust one line over the other, meaning we still don't know whether esr2a is required for latency to follow.

      As stated in our response to the previous comment, we have added the following text to the Discussion (line 377): “This may account for the lack of aggression recovery in E2-treated cyp19a1b-deficient males in this study.” Meanwhile, as discussed in lines 341–342, it is highly unlikely that the neural circuits regulating aggression are primarily influenced by early-life estrogen exposure, because androgen administration in adulthood alone is sufficient to induce high levels of aggression in both sexes. This notion is further supported by previous observations that cyp19a1b expression in the brain is minimal during embryonic development (Okubo et al., 2011, J Neuroendocrinol, 23:412–423).

      The findings of Lopez and Alward (2024) pertain to the regulation of cyp19a1b expression by androgen receptors. While this represents an important aspect of neuroendocrine regulation, it does not appear to be directly relevant to our discussion on cyp19a1b-mediated regulation of androgen receptor expression.

      To ensure the reliability of behavioral analyses in mutant fish, we consider a phenotype valid only when it is consistently observed in two independent mutant lines. In the mating behavior test examining esr2a-deficient males using esr2b-deficient females as stimulus females, Δ8 line males exhibited a shorter latency to initiate following than wild-type males, whereas Δ4 line males did not. This discrepancy led us to refrain from drawing conclusions about the role of esr2a in mating behavior, even though the mating behavior test using wild-type females as stimulus females yielded consistent results in the Δ8 and Δ4 lines. Therefore, we do not consider the reviewer’s concern to be a significant issue.

      (6) Not sure what's going on with the statistics, but it is not appropriate here to treat a "control" group as special. All groups are "experimental" groups. There is nothing special about the control group in this context. all should be Bonferroni post-hoc tests.

      Line 619: As detailed in Response to reviewer #1’s comment 7 on weaknesses, we consider Dunnett’s test the most appropriate choice for the experiments presented in Figures 4C and 4E. We acknowledge that the reviewer’s concern may stem from the phrase “comparisons between control and experimental groups” in the Materials and Methods section. To clarify this point, we have revised it to “comparisons between untreated and E2-treated groups in Fig. 4, C and D.”

      Minor comments:

      Line 47: then how can you say the aromatization hypothesis is "correct"? it only applies to a few species so far. Need to change the framing, not state so strongly such a vague thing as a hypothesis being "correct".

      Line 45: To address this concern, we have modified “widely accepted as correct” to “widely acknowledged”, ensuring a more precise characterization.

      Figure 1: looks like a dosage effect in males but not females. this should be discussed at some point, even if just to mention a dosage effect exists and put it in context.

      Line 91: We have revised the sentence “In males, brain E2 in heterozygotes (cyp19a1b+/−) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A)” by adding “, indicating a dosage effect of cyp19a1b mutation” to make this point explicit.

      Were male cyp19 KO aggressive towards females?

      We have not observed cyp19a1b-deficient males exhibiting aggressive behavior towards females in our experiments. Therefore, we do not consider them aggressive toward females.

      Please explain how infertility would lead to reduced mating.

      Line 142: As the reviewer has questioned, even if cyp19a1b-deficient males exhibit infertility due to efferent duct obstruction, it is difficult to imagine that this directly leads to reduced mating. However, the inability to release sperm could indirectly affect behavior. To address this, we have added “, possibly due to the perception of impaired sperm release” after “If this is also the case in medaka, the observed behavioral defects might be secondary to infertility.”

      Describe something about the timing of the treatment here. How can peripheral E2 injections restore it when peripheral levels are normal? Did these injections restore central levels? This needs to be shown experimentally.

      Line 517: As described in the Materials and Methods, E2 treatment was conducted by immersing fish in E2-containing water for 4 days. However, we had not explicitly stated that the water was changed daily to maintain the nominal concentration. To clarify this and address reviewer #2’s comment 9, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      Line 522: The treatment effectively restored mating activity and ara/arb expression in the brain, suggesting a sufficient increase in brain E2 levels. However, we did not measure the actual increase, and its extent remains uncertain. To reflect this in the manuscript, we have now added the following sentence: “Although the exact increase in brain E2 levels following E2 treatment was not quantified, the observed positive effects on behavior and gene expression suggest that it was sufficient.”

      I know the nomenclature differs among those who study teleosts, but it's ARa and the gene is ar1 (as an example; arb would be ar2). The following citation is recommended to remain consistent:

      Munley, K. M., Hoadley, A. P., & Alward, B. A. (2023). A phylogenetics-based nomenclature system for steroid receptors in teleost fishes. General and Comparative Endocrinology, 114436.

      Paralogous genes resulting from the third round of whole-genome duplication in teleosts are typically designated by adding the suffixes “a” and “b” to their gene symbols. This convention also applies to the two androgen receptor genes, commonly referred to as ara and arb. While the alternative names ar1 and ar2 may gain broader acceptance in the future, ara and arb remain more widely used at present. Therefore, we have chosen to retain ara and arb in this manuscript.

      Line 268: how is this "suggesting" less aggression? They literally showed fewer aggressive displays, so it doesn't suggest it - it literally shows it.

      Line 285: Following this thoughtful suggestion, we have changed “suggesting less aggression” to “showing less aggression.”

      Line 317: how can you still call it the primary driver?

      The stimulatory effects of aromatase/estrogens on male-typical behaviors are exerted through the potentiation of androgen/AR signaling. Thus, we still believe that androgens—specifically 11KT in teleosts—serve as the primary drivers of these behaviors.

      Line 318: not all deficits, like aggression, were rescued.

      Line 334: To address this comment, “These behavioral deficits were rescued by estrogen administration, indicating that reduced levels of neuroestrogens are the primary cause of the observed phenotypes: in other words, neuroestrogens are pivotal for male-typical behaviors in teleosts” has been modified and now reads “Deficits in mating were rescued by estrogen administration, indicating that reduced brain estrogen levels are the primary cause of the observed mating impairment; in other words, brain-derived estrogens are pivotal at least for male-typical mating behaviors in teleosts.”

      Line 324: what do you mean by "sufficient"? To show that, you'd have to castrate the male and only give estrogen back. the authors continue to overstate virtually every aspect of their study, seemingly in an unnecessary manner.

      Line 341: Our intention was to convey that brain-derived estrogens early in life are not essential for the expression of male-typical behaviors in teleosts. However, we recognize that the term “sufficient” could be misinterpreted as implying that estrogens alone are adequate, without contributions from other factors such as androgens. To clarify this, we have revised the text from “neuroestrogen activity in adulthood is sufficient for the execution of male-typical behaviors, while that in early in life is not requisite. Thus, while” to “brain-derived estrogens early in life is not essential for the execution of male-typical behaviors. While.”

      Line 329: so? in adult mice, amygdala aromatase neurons still regulate aggression. The amount in adulthood seems less important compared to site-specific functions.

      Line 346: We do not intend to suggest that brain aromatase activity in adulthood plays a negligible role in male behaviors in rodents, as we have already acknowledged its necessity in the Introduction (lines 42–43). To enhance clarity and prevent misinterpretation, we have added “, although it remains important for male behavior in adulthood” to the end of the sentence: “brain aromatase activity in rodents reaches its peak during the perinatal period and thereafter declines with age.”

      Line 351: This contradicts what you all have been saying.

      Line 65: As mentioned in Response to reviewer #1’s comment 3 on weaknesses, the following text has been added to the Introduction: “It is worth mentioning that systemic administration of estrogens and an aromatase inhibitor increased and decreased male aggression, respectively, in several teleost species, potentially reflecting the behavioral effects of brain-derived estrogens (21–24)”, providing an overview of previous studies on the effects of estrogens and aromatase on male fish aggression (Hallgren et al., 2006; O’Connell and Hofmann, 2012; Huffman et al., 2013; Jalabert et al., 2015). With this revision, we believe the inconsistency has been addressed.

      Line 367: Additionally, we have revised the sentence from “treatment of males with an aromatase inhibitor reduces their male-typical behaviors (31–33)” to “treatment of males with an aromatase inhibitor reduces their male-typical behaviors, while estrogens exert the opposite effect (21–24).”

      Line 360: change to "...possibility that is not mutually exclusive,"

      Line 378: We have revised the phrase as suggested from “Another possibility, not mutually exclusive,” to “Another possibility that is not mutually exclusive.”

      Line 363: but it didn't rescue aggression

      Line 381: In response, we have revised the sentence from “This possibility is supported by the present observation that estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings” to “This possibility is at least likely for mating behavior, as estrogen treatment facilitated mating behavior in cyp19a1b-deficient males but not in their wild-type siblings.”

      Line 367: on average

      To explain the sex differences in the role of aromatase, what about the downstream molecular or neural targets? In mammals, hodology is related to sex differences. there could be convergent sex differences in regulating the same type of behaviors as well.

      Our findings demonstrate that brain-derived estrogens promote the expression of ara, arb, and their downstream target genes vt and gal in males, while enhancing the expression of npba, a downstream target of Esr2b signaling, in females. The identity of additional target genes and their roles in specific neural circuits remain to be elucidated, and we aim to address these in future research.

      Lines 378-382: this doesn't logically follow. pgf2a could be the target of estrogens which in the intact animal do regulate female sexual receptivity. And how can you say this given that your lab has shown in esr2b mutants females don't mate?

      We agree that PGF2α signaling may be activated by estrogen signaling, as stated in lines 404–407: “the present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.” The observation that esr2b-deficient females do not accept male courtship is also stated in lines 401–403: “we recently challenged it by showing that female medaka deficient for esr2b are completely unreceptive to males, and thus estrogens play a critical role in female receptivity.”

      Line 396-397: or the remaining estrogens are enough to activate esr2b-dependent female-typical mating behaviors.

      We agree that cyp19a1b deficiency did not completely preclude female mating behavior, most likely because residual estrogens in the brains of cyp19a1b-deficient females enable weak activation of Esr2b signaling. However, the relevant section in the Discussion is not focused on examining why mating behavior persisted, but rather on considering the implications of this finding for the neural circuits regulating mating behavior. Therefore, incorporating the suggested explanation here would shift the focus and would not be appropriate.

      Line 420-421: this is a lot of variation. Was age controlled for?

      The time required for medaka to reach sexual maturity varies with rearing density and food availability. Due to space constraints, we adjust these parameters as needed, which led to variation in the ages of the experimental fish. However, since all experiments were conducted using sibling fish of the same age that had just reached sexual maturity, we believe this does not affect our conclusions.

      Line 457: have these kits been validated in medaka?

      Although we have not directly validated its applicability in medaka, its extensive use in this species suggests that it is unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      Line 589, re fish that spawned: how many times did this happen? Please note it is based on genotype and experiment. This could be important.

      Line 627: In response to this comment, we have added the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.”

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      (A1) The framing of neuroestrogens being important for male-typical rodents, and not for other vertebrate lineages, does not account for other groups (birds) in which this is true (the authors can consult their cited work by Balthazart (Reference 6) for extensive accounting of this). This makes the novelty clause in the abstract "indicating that neuro-estrogens are pivotal for male-typical behaviors even in nonrodents" less surprising and should be acknowledged by the authors by amending or omitting this novelty clause. The findings regarding androgen receptor transcription (next sentence) are more important and pertinent.

      Line 27: We recognize that the aromatization hypothesis applies to some birds, including zebra finches, as stated in the Introduction (lines 48–49) and Discussion (lines 432–433). However, this was not reflected in the Abstract. Following the reviewer’s suggestion, we have changed “in non-rodents” to “in teleosts.”

      (A2) The medaka line that has been engineered to have aromatase absent in the brain is presented briefly in the abstract, but can be misinterpreted as naturally occurring. This should be amended, by including something like "engineered" or "directed mutant" before 'male medaka fish'.

      Line 24: We have added “mutagenesis-derived” before “male medaka fish” in response to this comment.

      Introduction:

      (I1) The paragraph on teleost brain aromatase should acknowledge that while the capacity for estrogen synthesis in the brain is 100-1000 fold higher in teleosts as compared to rodents and other vertebrates, the majority of this derives from glial and not neural sources. This can be confusing for readers since the term 'neuroestrogens' often refers to the neuronal origin and signalling. And this observation includes the exclusive radial glial expression of cyp19a1b in medaka (Diotel et al., 2010), and first discovered in midshipman (Forlano et al., 2001), each of which should also be cited here. In addition, the authors expend much text comparing teleosts and rodents, but it is worth expanding these kinds of comparisons, especially by pointing out that parts of the primate brain are found to densely express aromatase (see work by Ei Terasawa and others).

      In response to this comment and a similar comment from reviewer #1, we have replaced “neuroestrogens” with “brain-derived estrogens” or “brain estrogens” throughout the manuscript.

      Line 63: We have also added the text “In teleost brains, including those of medaka, aromatase is exclusively localized in radial glial cells, in contrast to its neuronal localization in rodent brains (18–20).” As a result of this addition, we have changed “This observation suggests” to “These observations suggest” in the subsequent sentence.

      Line 51: Additionally, to include information on aromatase in the primate brain, we have added the following text: “In primates, the hypothalamic aromatization of androgens to estrogens plays a central role in female gametogenesis (10) but is not essential for male behaviors (7, 8).”

      The following references (#10 and 18–20), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      E. Terasawa, Neuroestradiol in regulation of GnRH release. Horm. Behav. 104, 138–145 (2018).

      P. M. Forlano, D. L. Deitcher, D. A. Myers, A. H. Bass, Anatomical distribution and cellular basis for high levels of aromatase activity in the brain of teleost fish: aromatase enzyme and mRNA expression identify glia as source. J. Neurosci. 21, 8943–8955 (2001).

      N. Diotel, Y. Le Page, K. Mouriec, S. K. Tong, E. Pellegrini, C. Vaillant, I. Anglade, F. Brion, F. Pakdel, B. C. Chung, O. Kah, Aromatase in the brain of teleost fish: expression, regulation and putative functions. Front. Neuroendocrinol. 31, 172–192 (2010).

      A. Takeuchi, K. Okubo, Post-proliferative immature radial glial cells female-specifically express aromatase in the medaka optic tectum. PLoS One 8, e73663 (2013).

      (I2) It is difficult to resolve from the introduction and work cited how restricted cyp19a1b is to the medaka brain. Important for the results of this study, it is not clear whether it is more of a bias in the brain vs other tissues, or if the cyp19a1b deficiency is restricted to the brain, and gonadal/peripheral cyp19 expression persists. The authors need to improve their consideration of the alternatives, i.e., that this manipulation is not somehow affecting: 1) peripheral aromatase expression (either cyp19a1a or cyp19a1b) in the gonad or elsewhere, 2) compensatory processes, such as other steroidogenic genes (are androgen synthesizing enzymes increasing?).

      Our previous study demonstrated that cyp19a1b is expressed in the gonads, but at levels tens to hundreds of times lower than those in the brain (Okubo et al., 2011, J Neuroendocrinol 23:412–423). Additionally, a separate study in medaka reported that cyp19a1b expression in the ovary is considerably lower than that of cyp19a1a (Nakamoto et al., 2018, Mol Cell Endocrinol 460:104–122). Given these observations, any potential effect of cyp19a1b knockout on peripheral estrogen synthesis is likely negligible. Indeed, Figures S1C and S1D confirm that cyp19a1b knockout does not alter peripheral E2 levels.

      Line 72: To incorporate this information into the Introduction and address the following comment, we have added the following text: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      The following references (#26 and 27), cited in the newly added text above, have been included in the reference list, with other references renumbered accordingly:

      K. Okubo, A. Takeuchi, R. Chaube, B. Paul-Prasanth, S. Kanda, Y. Oka, Y. Nagahama, Sex differences in aromatase gene expression in the medaka brain. J. Neuroendocrinol. 23, 412–423 (2011).

      M. Nakamoto, Y. Shibata, K. Ohno, T. Usami, Y. Kamei, Y. Taniguchi, T. Todo, T. Sakamoto, G. Young, P. Swanson, K. Naruse, Y. Nagahama, Ovarian aromatase loss-of-function mutant medaka undergo ovary degeneration and partial female-to-male sex reversal after puberty. Mol. Cell. Endocrinol. 460, 104–122 (2018).

      We have not assessed whether the expression of other steroidogenic enzymes is altered in cyp19a1b-deficient fish, and this may be investigated in future studies.

      (I3) Related, there are documented sex differences in the brain expression of cyp19a1b especially in adulthood (Okubo et al 2011) and this study should be cited here for context.

      Line 72: As stated in our previous response, we have cited Okubo et al. (2011) by adding the following sentence: “In medaka, cyp19a1b is also expressed in the gonads, but only at a level tens to hundreds of times lower than in the brain and substantially lower than that of cyp19a1a (26, 27).”

      Methods

      (M1) The rationale is unclear as presented for using mutagen screening for cyp19a1b while using CRISPR for esr2a. Are there methodological/biochemical reasons why the authors chose to not use the same method for both?

      At the time we generated the cyp19a1b knockouts, genome editing was not yet available, and the TILLING-based screening was the only method for obtaining mutants in medaka. In contrast, by the time we generated the esr2a knockouts, CRISPR/Cas9 had become available, enabling a more efficient and convenient generation of knockout lines. This is why the two knockout lines were generated using different methods.

      (M2) Measurement of steroids in biological matrices is not straightforward, and it is good that the authors use multiple extraction steps (organic followed by C18 columns) before loading samples on the ELISA plates, which are notoriously sensitive. Even though these methods have been published before by this group of authors previously, the quality control and ELISA performance values (recovery, parallelism, etc.) should be presented for readers to evaluate.

      Thank you for appreciating our sample purification method. Unfortunately, we have not evaluated the recovery rate or parallelism, but we recognize this as a subject for future studies.

      (M3) Mating behavior - E2 treated males were not co-housed with social partners for the full 24 hr before testing, but instead a few hours (?) prior to testing. The rationale for this should be spelled out explicitly.

      Line 494: In response to this comment, we have added “to ensure the efficacy of E2 treatment” to the end of the sentence “The set-up was modified for E2-treated males, which were kept on E2 treatment and not introduced to the test tanks until the day of testing.”

      (M4) The E2 treatment is listed as 1ng/ml vs. vehicle (ethanol). Is the E2 dissolved in 100% ethanol for administration to the tank water? Clarification is needed.

      Line 517: As the reviewer correctly assumed, E2 was first dissolved in 100% ethanol before being added to the tank water. To provide this information and address reviewer #1’s minor comment 5, we have revised “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan) or vehicle (ethanol) alone by immersion in water for 4 days” to “males were treated with 1 ng/ml of E2 (Fujifilm Wako Pure Chemical, Osaka, Japan), which was first dissolved in 100% ethanol (vehicle), or with the vehicle alone by immersion in water for 4 days, with daily water changes to maintain the nominal concentration.”

      (M5) The authors exclude fish from the analysis of courtship display behavior for those individuals that spawned immediately at the start of the testing (and therefore it was impossible to register courtship display behaviors). How often did fish in the various treatment groups exhibit this "fast spawning" behavior? Was the occurrence rate different by treatment group? It is unlikely that these omissions from the data set drove large-scale patterns, but an indication of how often this occurred would be reassuring.

      Line 627: In response to this comment, we have included the following details: “Specifically, 7/18 cyp19a1b<sup>+/+</sup>, 11/18 cyp19a1b<sup>+/−</sup>, and 6/18 cyp19a1b<sup>−/−</sup> males were excluded in Fig. 1D; 6/10 cyp19a1b<sup>+/+</sup>, 3/10 cyp19a1b<sup>+/−</sup>, and 6/10 cyp19a1b<sup>−/−</sup> females were excluded in Fig. 6B; 2/23 esr1<sup>+/+</sup> and 5/24 esr1<sup>−/−</sup> males were excluded in Fig. S7; 2/24 esr2a<sup>+/+</sup> and 3/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8A; 0/23 esr2a<sup>+/+</sup> and 0/23 esr2a<sup>−/−</sup> males were excluded in Fig. S8B.” These data indicate that the proportion of excluded males is nearly constant within each trial and is independent of the genotype of the focal fish.
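
      For readers who wish to compare the exclusion rates directly, the counts quoted above can be converted to proportions per genotype. The following is only an illustrative sketch: the counts are copied from the response, while the variable and function names (`exclusions`, `exclusion_rates`) are hypothetical and not part of the manuscript, and no statistical test is implied.

```python
# Exclusion counts as (excluded, total), copied from the authors' response.
# Genotype labels are written in plain ASCII for convenience.
exclusions = {
    "Fig. 1D": {"cyp19a1b+/+": (7, 18), "cyp19a1b+/-": (11, 18), "cyp19a1b-/-": (6, 18)},
    "Fig. 6B": {"cyp19a1b+/+": (6, 10), "cyp19a1b+/-": (3, 10), "cyp19a1b-/-": (6, 10)},
    "Fig. S7": {"esr1+/+": (2, 23), "esr1-/-": (5, 24)},
    "Fig. S8A": {"esr2a+/+": (2, 24), "esr2a-/-": (3, 23)},
    "Fig. S8B": {"esr2a+/+": (0, 23), "esr2a-/-": (0, 23)},
}


def exclusion_rates(counts):
    """Return the fraction of excluded fish for each genotype in one experiment."""
    return {genotype: excluded / total for genotype, (excluded, total) in counts.items()}


# Print one summary line per figure panel so the rates can be compared by eye.
for figure, counts in exclusions.items():
    rates = exclusion_rates(counts)
    summary = ", ".join(f"{genotype}: {rate:.0%}" for genotype, rate in rates.items())
    print(f"{figure}: {summary}")
```

      Tabulating the rates this way makes it easier to judge, panel by panel, how closely the excluded fractions agree across genotypes.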

      Results

      (R1) It is striking to see the genetic-'dose' dependent suppression of brain E2 content by heterozygous and homozygous cyp19a1b deficiency, indicating that, as the authors point out, the majority of E2 in the male medaka brain (and 1/2 in the female brain) have a brain-derived origin. It is important also for the interpretation that there are large compensatory increases in brain levels of androgens, when E2 levels drop in the cyp19a1b mutant homozygotes. This latter point should receive more attention.

      Also, there are large increases in peripheral androgen levels in the homozygote mutants for cyp19a1b in both males and females. This indicates a peripheral effect in addition to the clear brain knockdown of E2 synthesis. These nuances need to be addressed.

      In response to this comment, we have revised the Results section as follows:

      Line 91: “, indicating a dosage effect of cyp19a1b mutation” has been added to the end of the sentence “In males, brain E2 in heterozygotes (cyp19a1b<sup>+/−</sup>) was also reduced to 45% of the level in wild-type siblings (P = 0.0284) (Fig. 1A).”

      Line 94: To draw more attention to the increase in brain androgen levels caused by cyp19a1b deficiency, “Brain levels of testosterone” has been modified to “Strikingly, brain levels of testosterone.”

      Line 100: “Their peripheral 11KT levels also increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D)” has been modified and now reads “In addition, peripheral 11KT levels in cyp19a1b<sup>−/−</sup> males and females increased 3.7- and 1.8-fold, respectively (P = 0.0789, males; P = 0.0118, females) (Fig. S1, C and D), indicating peripheral influence in addition to central effects.”

      (R2) The interpretation on page 4 that cyp19a1b deficient males are 'less motivated' to mate is premature, given the behavioral measures used in this study. There are several competing explanations for these findings (e.g., alterations in motivation, sensory discrimination, preference, etc.) that could be followed up in future work, but the current results are not able to distinguish among these possibilities.

      Line 112: We agree that the possibility of altered cognition or sexual preference cannot be dismissed. To incorporate this perspective, we have revised the text “, suggesting that they are less motivated to mate” to “These results suggest that they are less motivated to mate, though an alternative interpretation that their cognition or sexual preference may be altered cannot be dismissed.”

      (R3) On page 5, the authors present that peripheral E2 manipulation (delivery to the fish tank) restores courtship behavior in males, and then go on to erroneously conclude that this demonstrates "that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior." Because this is a peripheral E2 treatment, there can be manifold effects on gonadal physiology or other endocrine events that can have indirect effects on the brain and behavior. Without manipulation of E2 directly to the brain to 'rescue' the cyp19a1b deficiency, the authors cannot conclude that these effects are directly on the central nervous system. Tellingly, the tank E2 treatment did not rescue aggressive behavior, suggestive of the potential for indirect effects.

      Line 155: As detailed in Response to reviewer #2’s specific comment 1, we have revised the text from “These results demonstrated that reduced E2 in the brain was the primary cause of the mating defects, indicating a pivotal role of neuroestrogens in male mating behavior. In contrast” to “These results suggest that reduced E2 in the brain is the primary cause of the mating defects, highlighting a pivotal role of brain-derived estrogens in male mating behavior. However, caution is warranted, as an indirect peripheral effect of bath-immersed E2 on behavior cannot be ruled out, although this is unlikely given the comparable peripheral E2 levels in cyp19a1b-deficient and wild-type males. In contrast to mating.”

      (R4) The downregulation of androgen-dependent gene expression (vasotocin in pNVT and galanin in pPMp) in the cyp19a1b deficient males (Figure 3) could be due to exceedingly high levels of brain androgens in the cyp19a1b deficient males. The best way to test the idea that estrogens can restore the expression to be more wild-type directly (like what is happening for ara and arb) is to look at these same markers (vasotocin and galanin) in these same brain areas in the brains of E2-treated males. The authors should have these brains from Figure 2. Unless I missed something, those experiments were not performed/reported here. It is clear that the ara and arb receptors have EREs and are 'rescued' by E2 treatment, but in principle, there could be indirect actions for reasons stated above for the behavior due to the peripheral E2 tank application.

      Thank you for your insightful comment. We agree that the current results cannot exclude the possibility that excessive androgen levels caused the downregulation of vt and gal. However, our previous studies showed that excessive 11KT administration to gonadectomized males and females increased the expression of these genes to levels comparable to wild-type males (Yamashita et al., 2020, eLife, 9:e59470; Kawabata-Sakata et al., 2024, Mol Cell Endocrinol 580:112101), making this scenario unlikely. That said, testing whether estrogen treatment restores vt and gal expression in cyp19a1b-deficient males would be informative, and we see this as an important direction for future research.

      Discussion

      (D1) The authors need to clarify whether EREs are found in other vertebrate AR introns, or is this unique to the teleost genome duplication?

      We have identified multiple ERE-like sequences within intron 1 of the mouse AR gene. However, sequence data alone do not provide sufficient evidence of their functionality, rendering this information of limited relevance. Therefore, we have chosen not to include this discussion in the current paper.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors are strongly encouraged to report information regarding the effect of Cyp19a1b deletion on the brain content of aromatase protein (ideally both isoforms investigated separately) as the two isoforms are mostly but not completely brain vs gonad specific. The analysis of other tissues would also strengthen the characterization of this model.

      We agree that measuring aromatase protein levels in the brain of our fish would be valuable for confirming the loss of cyp19a1b function. However, as no suitable method is currently available, this issue will need to be addressed in future studies. While this constitutes indirect evidence, the observed reduction in brain E2 levels, with no change in peripheral E2 levels, in cyp19a1b-deficient fish strongly suggests the loss of cyp19a1b function, as noted in Response to reviewer #3’s comment 1 on weaknesses.

      (2) As presented, this study reads as niche work. A better description of the behavior and reproductive significance of the different aspects of the behavioral sequence would allow a better understanding of the results and would thus allow the non-specialist to appreciate the significance of the observations.

      Line 103: In response to this comment and Reviewer #3’s comment 2 on weaknesses, we have revised the sentence from “The mating behavior of medaka follows a stereotypical pattern, wherein a series of followings, courtship displays, and wrappings by the male leads to spawning” to “The mating behavior of medaka follows a stereotypical sequence. It begins with the male approaching and closely following the female (following). The male then performs a courtship display, rapidly swimming in a circular pattern in front of the female. If the female is receptive, the male grasps her with his fins (wrapping), culminating in the simultaneous release of eggs and sperm (spawning)” in order to provide a more detailed description of medaka mating behavior.

      (3) The data regarding female behavior are limited and incomplete. It is suggested to keep this for another manuscript unless data on the behavior of the female herself is added. Indeed, analyzing female's behavior from the male's perspective complicates the interpretation of the results while a description of what the females do would provide valuable and interpretable information.

      We thank the reviewer for this thoughtful suggestion and agree that the data and discussion for females are less extensive than for males. However, we have previously elucidated the mechanism by which estrogen/Esr2b signaling promotes female mating behavior (Nishiike et al., 2021). Accordingly, it follows that the new insights into female behavior gained from the cyp19a1b knockout model are more limited than those for males. Nevertheless, when combined with our prior findings, the female data in this study offer valuable insights, and the overall mechanism through which estrogens promote female mating behavior is becoming clearer. Therefore, we do not consider the female data in this study to be incomplete or merely supplementary.

      (4) In Figure 2, the validity of running multiple T-tests rather than a two-way ANOVA comparing TRT and genotype is questionable. Moreover, why are the absolute values in CTL higher than in the initial experiment comparing genotypes for ara in PPa, pPPp, and NVT as well as for arb in aPPp? More importantly, these graphs do not seem to reproduce the genotype effects for ara in pPPp and NVT and for arb in aPPp.

      The data in Figures 2J and 2K were analyzed with an exclusive focus on the difference between vehicle-treated and E2-treated males, without considering genotype differences. Therefore, the use of T-tests for significance testing is appropriate.

      As the reviewer noted, the overall ara expression area is larger in Figure 2J than in Figure 2F. However, as detailed in Response to reviewer #3’s comment 8 on weaknesses, the relative area ratios of ara expression among brain nuclei are consistent between the two figures, indicating the reproducibility of the results. Thus, we consider this difference unlikely to affect the conclusions of this study.

      Additionally, the differences in ara expression in pPPp and arb expression in aPPp between wild-type and cyp19a1b-deficient males appear smaller in Figures 2J and 2K compared to Figures 2F and 2H. This is likely due to the smaller sample size used in the experiments for Figures 2J and 2K, which makes the differences less distinct. However, since the same genotype-dependent trends are observed in both sets of figures, the conclusion that ara and arb expression is reduced in cyp19a1b-deficient male brains remains valid.

      (5) More information is required regarding the analysis of single ISH - How was the positive signal selected from the background in the single ISH analyses? How was this measure standardized across animals? How many sections were imaged per region? Do the values represent unilateral or bilateral analysis?

      Line 540: Following this comment, we have provided additional details on the single ISH method in the manuscript. Specifically, “, and the total area of signal in each brain nucleus was calculated using Olyvia software (Olympus)” has been revised to “The total area of signal across all relevant sections, including both hemispheres, was calculated for each brain nucleus using Olyvia software (Olympus). Images were converted to a 256-level intensity scale, and pixels with intensities from 161 to 256 were considered signals. All sections used for comparison were processed in the same batch, without corrections between samples.”
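
      The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Olyvia implementation: it assumes an 8-bit grayscale image held as nested lists with values 0 to 255, so that levels 161 to 256 on a 1-based 256-level scale correspond to pixel values 160 to 255. The function names and the `pixel_area_um2` parameter are hypothetical.

```python
def signal_area(image, low_level=161, high_level=256, pixel_area_um2=1.0):
    """Area covered by pixels whose 1-based intensity level is in [low_level, high_level].

    `image` is a list of rows, each a list of 0-based pixel values (0..255).
    `pixel_area_um2` is a hypothetical calibration factor (area of one pixel).
    """
    # Convert 1-based levels (1..256) to 0-based pixel values (0..255).
    low, high = low_level - 1, high_level - 1
    n_signal = sum(1 for row in image for value in row if low <= value <= high)
    return n_signal * pixel_area_um2


def total_signal_area(sections, **kwargs):
    """Sum the signal area over all sections (both hemispheres) of one brain nucleus."""
    return sum(signal_area(image, **kwargs) for image in sections)
```

      Because the same fixed threshold is applied to every section, the comparison between animals relies on all sections being processed in the same batch, as the revised text states.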

      (6) More information should be provided in the methods regarding the image analysis of double ISH. In particular, the criteria for considering a cell as labeled are not clear, nor are they evident from the representative images.

      Line 596: To provide additional details on the double ISH method in the manuscript, we have added the following sentence: “Cells were identified as coexpressing the two genes when Alexa Fluor 555 and fluorescein signals were clearly observed in the cytoplasm surrounding DAPI-stained nuclei, with intensities markedly stronger than the background noise.”

      (7) There is no description of the in silico analyses run on ESR2a in the methods.

      The method for identifying estrogen-responsive element-like sequences in the esr2a locus is described in line 549: “Each nucleotide sequence of the 5′-flanking region of ara and arb was retrieved from the Ensembl medaka genome assembly and analyzed for potential canonical ERE-like sequences using Jaspar (version 5.0_alpha) and Match (public version 1.0) with default settings.”

      However, the method for domain identification in Esr2a was not described. Therefore, we have added the following text in line 469: “The DNA- and ligand-binding domains of medaka Esr2a were identified by sequence alignment with yellow perch (Perca flavescens) Esr2a, for which these domain locations have been reported (58).”

      The following reference (#58), cited in the newly added text above, has been included in the reference list: S. G. Lynn, W. J. Birge, B. S. Shepherd, Molecular characterization and sex-specific tissue expression of estrogen receptor α (esr1), estrogen receptor βa (esr2a) and ovarian aromatase (cyp19a1a) in yellow perch (Perca flavescens). Comp. Biochem. Physiol. B Biochem. Mol. Biol. 149, 126–147 (2008).

      (8) Information about the validation steps of the EIA that were carried out, as well as the specificity of the antibody for the steroids and the extraction efficacy, should be provided.

      We have not directly validated the applicability of the EIA kit, but its extensive use in medaka suggests that it is unlikely to pose any issues (e.g., Ussery et al., 2018, Aquat Toxicol, 205:58–65; Lee et al., 2019, Ecotoxicol Environ Saf, 173:174–181; Kayo et al., 2020, Gen Comp Endocrinol, 285:113272; Fischer et al., 2021, Aquat Toxicol, 236:105873; Royan et al., 2023, Endocrinology, 164:bqad030).

      The specificity (cross-reactivity) of the antibodies is detailed as follows.

      (1) Estradiol ELISA kits: estradiol, 100%; estrone, 1.38%; estriol, 1.0%; 5α-dihydrotestosterone, 0.04%; androstenediol, 0.03%; testosterone, 0.03%; aldosterone, <0.01%; cortisol, <0.01%; progesterone, <0.01%.

      (2) Testosterone ELISA kits: testosterone, 100%; 5α-dihydrotestosterone, 27.4%; androstenedione, 3.7%; 11-ketotestosterone, 2.2%; androstenediol, 0.51%; progesterone, 0.14%; androsterone, 0.05%; estradiol, <0.01%.

      (3) 11-Keto Testosterone ELISA kits: 11-ketotestosterone, 100%; adrenosterone, 2.9%; testosterone, <0.01%.

      As this information is publicly available on the manufacturer’s website, we deemed it unnecessary to include it in the manuscript.

      Unfortunately, we have not evaluated the extraction efficacy of the samples, but we recognize this as a subject for future studies.

      (9) I wonder whether the evaluation of the impact of the mutation by comparing the behavior of a group of wild-type males to a group of mutated males is the most appropriate. Justifying this approach against testing the behavior of one mutated male facing one or several wild-type males would be appreciated.

      We agree that the resident-intruder test, in which a single focal resident is confronted with one or more stimulus intruders, is the most commonly used method for assessing aggression. However, medaka form shoals and lack strong territoriality, and even slight dominance differences between the resident and the intruder can increase variability in the results, compromising data consistency. Therefore, in this study, we adopted an alternative approach: placing four unfamiliar males together in a tank and quantifying aggressive interactions in total. This method allows for the assessment of aggression regardless of territorial tendencies, making it more appropriate for our investigation.

      (10) Lines 329-331: this sentence should be rephrased as it contributes to the confusion between sexual differentiation and activation of circuits. The restoration of sexual behavior by adult estrogen treatment argues in favor of an activational role of neuro-estrogens on behavior rather than an organizational role. Therefore, referring to sexual differentiation is misleading, all the more so because the study never compares sexes.

      As detailed in Response to reviewer #3’s comment 9 on weaknesses, we consider that all factors that cause sex differences, including the transient effects of adult steroids, need to be incorporated into a theory of sexual differentiation. In teleosts, since steroids during early development have little effect and sexual differentiation primarily relies on steroid action in adulthood, our discussion on brain sexual differentiation remains valid, including the statement in line 347: “This variation among species may represent the activation of neuroestrogen synthesis at life stages critical for sexual differentiation of behavior that are unique to each species.”

      (11) Lines 384-386: I may have missed something but I do not see data supporting the notion that neuroestrogens may function upstream of PGF2a signaling to mediate female receptivity.

      Line 403: We acknowledge that our explanation was insufficient and apologize for any confusion. To clarify this point, “Given that estrogen/Esr2b signaling feminizes the neural substrates that mediate mating behavior, while PGF2α signaling triggers female sexual receptivity,” has been added before the sentence “The present finding provides a likely explanation for this apparent contradiction, namely, that neuroestrogens, rather than or in addition to ovarian-derived circulating estrogens, may function upstream of PGF2α signaling to mediate female receptivity.”

      Additional alteration

      Reference list (line 682): a preprint article has now been published in a peer-reviewed journal, and the information has been updated accordingly as follows: “bioRxiv doi: 10.1101/2024.01.10.574747 (2024)” to “Proc. Natl. Acad. Sci. U.S.A. 121, e2316459121 (2024).”

    1. eLife Assessment

      This important study combines imaginative experiments to demonstrate the relevance of poroelasticity in the mechanical properties of cells across physiologically relevant time and length scales. Through innovative experiments and a finite element model, the authors present solid evidence that cytosolic flows and pressure gradients can persist in cells with permeable membranes, generating spatially segregated influx and outflux zones. These findings will be of interest to the cell biology and biophysics communities. Nevertheless, a more in-depth discussion of why other possible explanations for the long time scales associated with mechanical propagation are less effective could further strengthen their message.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigated whether cytoplasmic poroelastic properties play an important role in cellular mechanical response over length scales and time scales relevant to cell physiology. Overall, the manuscript concludes that intracellular cytosolic flows and pressure gradients are important for cell physiology and that they act on time- and length-scales relevant to mechanotransduction and cell migration.

      Strengths:

      Their approach integrates both computational and experimental methods. The AFM deformation experiments combined with measuring the z-position of beads is a challenging yet compelling method to determine poroelastic contributions to mechanical relaxation.

      The work is quite interesting and will be of high value to the field of cell mechanics and mechanotransduction.

      Weaknesses:

      However, there are several issues related to the lack of description of theoretical equations, experimental details, and data transparency that should be addressed, including the following:

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogeneous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

    3. Reviewer #2 (Public review):

      Summary:

      Malboubi et al. present a novel experimental framework to investigate the rheological properties of the cell cytoplasm. Their findings support a model where the cytoplasm behaves as a poroelastic material governed by Darcy's law - a property overlooked in previous literature. They demonstrate that this poroelastic behavior delays the equilibration of hydrostatic pressure gradients within the cytoplasm over timescales of 1 to 10 seconds following a perturbation, likely due to fluid-solid friction within the cytoplasmic matrix. Furthermore, under sustained perturbations such as depressurization, they reveal that pressure gradients can persist for minutes, which they propose might potentially influence physiological processes like mechanotransduction or cell migration typically happening on these timescales.

      Strengths:

      This article holds significant value within the ongoing efforts of the cell biology and biophysics communities to quantitatively characterize the mechanical properties of cells. The experiments are innovative and thoughtfully contextualized with quantitative estimates and a finite element model that supports the authors' hypotheses.

      Comments & Questions:

      While the hypothesis of a poroelastic cytoplasm is insightful and supported by the results, certain parts of the paper (detailed below) rely on qualitative arguments. Given the experimental approaches and accompanying modeling, the study has the potential for more in-depth discussions and stronger quantitative evidence. Placing greater emphasis on quantifications and direct comparisons between the model and experimental data would enhance the work. Additionally, exploring the limitations of the proposed model would add valuable depth to the paper.

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions? Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature?

      Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      I am curious about the saturation of the time lag at 30 microns from the pipette in the model's prediction in Figure 4, Panel E, a saturation that is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems weird to me that the beads start to move together at some distance from the pipette or else I would expect that they just stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?
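
      The quadratic scaling mentioned above follows from treating pressure equilibration as a diffusive process, in which the lag at distance x is of order x²/D_p. A rough sketch, assuming a purely hypothetical poroelastic diffusion constant (the value below is not taken from the manuscript) and ignoring membrane permeability and finite cell size:

```python
def poroelastic_time_lag(distance_um, diffusion_um2_per_s):
    """Order-of-magnitude lag t ~ x^2 / D_p for a diffusively spreading pressure front."""
    if diffusion_um2_per_s <= 0:
        raise ValueError("diffusion constant must be positive")
    return distance_um ** 2 / diffusion_um2_per_s


# Hypothetical poroelastic diffusion constant in um^2/s, chosen only for illustration.
D_P = 40.0

for x_um in (5.0, 10.0, 20.0, 30.0):
    lag_s = poroelastic_time_lag(x_um, D_P)
    print(f"distance {x_um:5.1f} um -> lag ~ {lag_s:6.2f} s")
```

      Under this simple scaling, doubling the distance quadruples the lag, so no plateau is expected; a saturation would have to come from something outside this picture, such as boundary conditions or membrane permeability.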

    4. Reviewer #3 (Public review):

      Summary:

      In this delightful study, the authors use local indentation of the cell surface combined with out-of-focus microscopy to measure the rates of pressure spread in the cell and to argue that the results can be explained with the poroelastic model. Osmotic shock that decreases cytoskeletal mesh size supports this notion. Experiments with water injection and water suction further support it, and also, together with a mechanical model and elegant measurements of decreasing fluorescence in the cell 'flashed' by external flow, demonstrate that the membrane is permeable, and that steady flow and pressure gradient can exist in a cell with water source/sink in different locations. Use of blebs as indicators of the internal pressure further supports the notion of differential cytoplasmic pressure.

      Strengths:

      The study is very imaginative, interesting, novel and important.

      Weaknesses:

      I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It would be good to use low levels of chemical inhibitors of, for example, Arp2/3, PI3K, or myosin, and to see and try to interpret the effect. Another interesting question is how adhesive strength affects the results. A different interesting avenue would be to perturb aquaporins. At least one perturbation experiment would be good.

    1. eLife Assessment

      Alignment and sequencing errors are a major concern in molecular evolution, and this valuable study represents a welcome improvement for genome-wide scans of positive selection. This new method seems to perform well and is generally convincing, although the evidence could be made more direct and more complete through additional simulations to determine the extent to which alignment errors are being properly captured.

    2. Reviewer #1 (Public review):

      Summary:

      Selberg et al. present a small but apparently very relevant modification to the existing BUSTED model. The new model allows for a fraction of codons to be assigned to an error class characterized by a very high dN/dS value. This "omega_e" category is constrained to represent no more than 1% of the alignment. The analyses convincingly show that the method performs well and represents a real improvement for genome-wide scans of positive selection. Alignment and sequencing errors are a major concern in molecular evolution. This new method, which shows strong performance, is therefore a very welcome contribution.
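
      To make the structure of the error class concrete, here is a minimal sketch of a dN/dS rate-class mixture with a capped error component. It is not the HyPhy implementation: the class and function names, the ω values, and the weights are hypothetical, and only the 1% cap on the error class comes from the description above. The posterior is the standard Bayes-rule calculation for a mixture, shown here only to illustrate how a codon could be attributed to the error class.

```python
from dataclasses import dataclass

MAX_ERROR_WEIGHT = 0.01  # cap on the error class, as described in the review


@dataclass(frozen=True)
class RateClass:
    omega: float   # dN/dS value of this class
    weight: float  # fraction of the alignment assigned to this class


def validate_mixture(classes, error_class):
    """Check that the weights sum to 1 and that the error class respects the cap."""
    total = sum(c.weight for c in classes) + error_class.weight
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class weights must sum to 1")
    if error_class.weight > MAX_ERROR_WEIGHT:
        raise ValueError("error class exceeds the 1% cap")


def error_posterior(class_likelihoods, classes, error_likelihood, error_class):
    """Posterior probability that a codon belongs to the error class (Bayes' rule)."""
    if len(class_likelihoods) != len(classes):
        raise ValueError("one likelihood per regular class is required")
    validate_mixture(classes, error_class)
    regular = sum(c.weight * lik for c, lik in zip(classes, class_likelihoods))
    error = error_class.weight * error_likelihood
    return error / (regular + error)


# Hypothetical three-class omega distribution plus an error class.
classes = [RateClass(0.1, 0.70), RateClass(1.0, 0.25), RateClass(3.0, 0.045)]
omega_e = RateClass(100.0, 0.005)
```

      A codon whose data are far better explained by the error class than by any regular class receives a posterior close to 1 despite the small prior weight, which is also why such a class cannot by itself separate alignment errors from very strong positive selection.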

      Strengths:

      By thoroughly reanalyzing four datasets, the manuscript convincingly demonstrates that omega_e effectively identifies genuine alignment errors. Next, the authors evaluate the reduction in power to detect true selection through simulations. This new model is simple, efficient, and computationally fast. It is already implemented and available in HYPHY software.

      As a side note, I found it particularly interesting how the authors tested the statistical support for the new method compared to the simpler version without the error class. In many cases, the simpler model could not be statistically rejected in favor of the more complex model, despite producing biologically incorrect results in terms of parameter inference. This highlights a broader issue in molecular evolution and phylogenomics, where model selection often relies too heavily on statistical tests, potentially at the expense of biological realism. The analyses also reveal a trade-off between statistical power and the false positive rate. As with other methods, BUSTED-E cannot distinguish between alignment/sequencing errors and episodes of very strong positive selection. The authors are transparent about this limitation in the discussion.

      Weaknesses:

      Regarding the structure of the manuscript, the text could be clearer and more precise. Clear, practical recommendations for users could also be provided in the Results section. Additionally, the simulation analyses could be further developed to include scenarios with both alignment errors and positive selection, in order to better assess the method's performance. Finally, the model is evaluated only in the context of site models, whereas the widely used branch-site model is mentioned as possible but not assessed.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Selberg et al present an extension of their widely used BUSTED family of codon models for the detection of episodic ("site-branch") positive selection from coding gene sequences. The extension adds an "error component" to ω (dN/dS) to capture misaligned codons. This ω component is set to an arbitrarily high value to distinguish it from positive selection, which is characterised by ω > 1 but assumed not to be so high.

      The new method is tested on several datasets of comparative genomes, characterised by their size and the fact that the authors scanned for positive selection and/or provided filtering of alignment quality. It is also tested on simple simulations.

      Overall, the new method appears to capture relatively little of the ω variability in the alignments, although it is often significant. Given the complexity of codon evolution, adding a new parameter is more or less significant, and the question is whether it captures the signal that is intended, preferably in an unbiased manner.

      Strengths:

      This is an important issue, and I am enthusiastic to see it explicitly modeled within the codon modeling framework, rather than externalised to ad hoc filtering methods. The promise of quantifying the divergence signal from alignment error vs selection is exciting.

      The BUSTED family of models is widely used and very powerful for capturing many aspects of codon evolution, and it is thus an excellent choice for this extension.

      Weaknesses:

      (1) The definition of alignment error by a very large ω is not justified anywhere in the paper. There are known cases of bona fide positive selection with many non-synonymous and 0 synonymous substitutions over branches. How would they be classified here? E.g., lysozyme evolution, bacterial experimental evolution.

      Using the power of the model family that the authors develop, I would suggest characterising a more specific error model. E.g., radical amino-acid "changes" clustered close together in the sequence, proximity to gaps in the alignment, correlation of apparent ω with genome quality.

      Also concerning this high ω, how sensitive is its detection to computational convergence issues?

      (2) The authors should clarify the relation between the "primary filter for gross or large-scale errors" and the "secondary filter" (this method). Which sources of error are expected to be captured by the two scales of filters? What is their respective contribution to false positives of positive selection?

      Sources of error in the alignment of coding genes include:

      a) Errors in gene models, which may differ between species but also propagate among close species (i.e., when one species is used as a reference to annotate others).

      b) Inconsistent choice of alternative transcripts/isoforms.

      Both of these lead to asking an alignment algorithm to align non-homologous sequences, which violates the assumptions of the algorithms, yet both are common issues in phylogenomics.

      c) Sequencing errors, but I doubt they affect results much here.

      d) Low complexity regions of proteins.

      e) Approximations by alignment heuristics, sometimes non-deterministic or dependent on input order.

      f) Failure to capture aspects of protein or gene evolution in the optimality criteria used.

      For example, Figure 1 seems to correspond to a wrong or inconsistent definition of the final exon of the gene in one species, which I would expect to be classified as "gross or large-scale error".

      (3) The benchmarking of the method could be improved both for real and simulated data.

      For real data, the authors only analysed sequences from land vertebrates with relatively low Ne and thus relatively low true positive selection. I suggest comparing results with e.g. Drosophila genomes, where it has been reported that 50% of all substitutions are fixed by positive selection, or with viral evolution.

      For simulations, the authors should present simulations with or without alignment errors (e.g., introduce non-homologous sequences, or just disturb the alignments) and with or without positive selection, to measure how much the new method correctly captures alignment errors and incorrect positive selection.

      I also recommend simulating under more complex models, such as multinucleotide mutations or strong GC bias, and investigating whether these other features are captured by the alignment error component.

      Finally, I suggest taking true alignments and perturbing them (e.g., add non-homologous segments or random gaps which shift the alignment locally), to verify how the method catches this. It would be interesting to apply such perturbations to genes which have been reported as strong examples of positive selection, as well as to genes with no such evidence.

      (4) It would be interesting to compare to results from the widely used filtering tool GUIDANCE, as well as to the Selectome database pipeline (https://doi.org/10.1093/nar/gkt1065). Moreover, the inconsistency between BUSTED-E and HMMCleaner, and BMGE is worrying and should be better explained.

      (5) For a new method such as this, I would like to see p-value distributions and q-q plots, to verify how unbiased the method is, and how well the chi-2 distribution captures the statistical value.

      (6) I disagree with the motivation expressed at the beginning of the Discussion: "The imprimatur of "positive selection" has lost its luster. Researchers must further refine prolific candidate lists of selected genes to confirm that the findings are robust and meaningful." Our goal should not be to find a few impressive results, but to measure accurately natural selection, whether it is frequent or rare.

    4. Author response:

      eLife Assessment

      Alignment and sequencing errors are a major concern in molecular evolution, and this valuable study represents a welcome improvement for genome-wide scans of positive selection. This new method seems to perform well and is generally convincing, although the evidence could be made more direct and more complete through additional simulations to determine the extent to which alignment errors are being properly captured.

      We thank the editors for their positive assessment and for highlighting the core strength and a key area for improvement. The main request (also echoed by both reviewers) is for us to conduct additional simulation studies where true alignment errors are known and assess the performance of BUSTED-E. We plan to conduct several simulations (on the order of 100,000 individual alignments in total) in response to that request, with the caveat that we are not aware of any tools that simulate realistic alignment errors, so these simulations are likely only a pale reflection of biological reality.

      (1) Ad hoc small local edits of alignments similar to what was implemented in the HMMCleaner paper. These local edits would include operations like replacement of codons or small stretches of sequences with random data, local transposition, inversion.

      (a) Using parametrically simulated alignments (under BUSTED models).

      (b) Using empirical alignments.

      (2) Simulations under model misspecification, specifically to address the point of reviewer 2. For example, we would simulate under models that allow for multi-nucleotide substitutions, and then apply error filtering under models which do not.
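      The kind of local edit described in item (1) can be sketched as follows. This is a minimal illustration, not the authors' simulation code or the HMMCleaner procedure: the function names, the toy alignment, and the choice of window are hypothetical. Random replacement may introduce stop codons, which a real pipeline would need to handle.

```python
import random

NUCLEOTIDES = "ACGT"


def randomize_codons(seq, start_codon, n_codons, rng):
    """Replace n_codons codons, starting at start_codon, with random nucleotides."""
    begin, end = 3 * start_codon, 3 * (start_codon + n_codons)
    if end > len(seq):
        raise ValueError("window extends past the end of the sequence")
    noise = "".join(rng.choice(NUCLEOTIDES) for _ in range(end - begin))
    return seq[:begin] + noise + seq[end:]


def invert_codons(seq, start_codon, n_codons):
    """Reverse the order of codons in a window, keeping each codon intact."""
    begin, end = 3 * start_codon, 3 * (start_codon + n_codons)
    if end > len(seq):
        raise ValueError("window extends past the end of the sequence")
    codons = [seq[i:i + 3] for i in range(begin, end, 3)]
    return seq[:begin] + "".join(reversed(codons)) + seq[end:]


def perturb_alignment(alignment, taxon, start_codon, n_codons, seed=0):
    """Return a copy of the alignment with one taxon's window randomized."""
    rng = random.Random(seed)
    perturbed = dict(alignment)
    perturbed[taxon] = randomize_codons(alignment[taxon], start_codon, n_codons, rng)
    return perturbed


# Toy in-frame alignment (hypothetical sequences).
alignment = {
    "taxon_A": "ATGGCTAAAGGTCTGTAA",
    "taxon_B": "ATGGCTAAAGGTCTGTAA",
}
perturbed = perturb_alignment(alignment, "taxon_B", start_codon=2, n_codons=2)
inverted = invert_codons(alignment["taxon_A"], start_codon=1, n_codons=3)
```

      Because the perturbed window is known, the positions flagged by an error filter can be compared directly against the positions that were actually edited.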

      We will also run several new large-scale screens of existing alignments, to directly and indirectly address the reviewers' comments. These will include:

      (a) A Drosophila dataset (from https://academic.oup.com/mbe/article/42/4/msaf068/8092905)

      (b) Current Selectome data (https://selectome.org/), both filtered and unfiltered. Here the filtering procedure refers to what Selectome does to obtain what its authors think are high quality alignments.

      (c) Current OrthoMam data (https://orthomam.mbb.cnrs.fr/), both filtered and unfiltered. Here the filtering procedure refers to what OrthoMam does to obtain what its authors think are high quality alignments.

      Reviewer #1:

      We are grateful to Reviewer #1 for their positive and encouraging review. We are pleased they found our analyses convincing and recognized BUSTED-E as a "simple, efficient, and computationally fast" improvement for evolutionary scans.

      Strengths:

      As a side note, I found it particularly interesting how the authors tested the statistical support for the new method compared to the simpler version without the error class. In many cases, the simpler model could not be statistically rejected in favor of the more complex model, despite producing biologically incorrect results in terms of parameter inference. This highlights a broader issue in molecular evolution and phylogenomics, where model selection often relies too heavily on statistical tests, potentially at the expense of biological realism.

      We agree that this observation touches upon a critical issue in phylogenomics. A statistically "good" fit does not always equate to a biologically accurate model. We believe our work serves as a useful case study in this regard. We will add discussion of the importance of considering biological realism alongside statistical adequacy in model selection.

      Weaknesses:

      Regarding the structure of the manuscript, the text could be clearer and more precise.

      We appreciate this feedback. We will perform a thorough revision of the entire manuscript to improve its clarity, flow, and precision. We will focus on streamlining the language and ensuring that our methodological descriptions and results are as unambiguous as possible.

      Clear, practical recommendations for users could also be provided in the Results section.

      To make our method more accessible and its application more straightforward, we will add a new section that provides clear, practical recommendations for users. This includes guidance on when to apply BUSTED-E, how to interpret its output, and best practices for distinguishing potential errors from strong selection.

      Additionally, the simulation analyses could be further developed to include scenarios with both alignment errors and positive selection, in order to better assess the method's performance.

      Additional simulations will be conducted (see above).

      Finally, the model is evaluated only in the context of site models, whereas the widely used branch-site model is mentioned as possible but not assessed.

      BUSTED class models support branch-site variation in dN/dS, so technically all of our analyses are already branch-site. However, we interpret the reviewer’s comment as describing use cases when a method is used to test for selection on a subset of tree branches (as opposed to the entire tree). BUSTED-E already supports this ability, and we will add a section in the manuscript describing how this type of testing can be done, including examples. However, we do not plan to conduct additional extensive data analyses or simulations, as this would probably bloat the manuscript too much.

      Reviewer #2:

      We thank Reviewer #2 for their detailed and thought-provoking comments, and for their enthusiasm for modeling alignment issues directly within the codon modeling framework. The criticisms raised are challenging and we will work on improving the justification, testing, and contextualization of our method.

      Weaknesses:

      The definition of alignment error by a very large ω is not justified anywhere in the paper... I would suggest characterising a more specific error model. E.g., radical amino-acid "changes" clustered close together in the sequence, proximity to gaps in the alignment, correlation of apparent ω with genome quality... Also concerning this high ω, how sensitive is its detection to computational convergence issues?

      This is a fundamental point that we are grateful to have the opportunity to clarify. Our intention with the high ω category is not to provide a mechanistic or biological definition of an alignment error. Rather, its purpose is to serve as a statistical "sink" for codons exhibiting patterns of divergence so extreme that they are unlikely to have resulted from a typical selective process. It is phenomenological and ad hoc. The reviewer makes sensible suggestions for other ad hoc/empirical approaches to alignment quality filtering, but most of those have already been implemented in existing (excellent) alignment filtering tools. BUSTED-E was never meant to replace them, but rather to catch what is left over. Importantly, error detection is not even the primary goal of BUSTED-E; errors are treated as a statistical nuisance. With all due respect, all of the reviewer's suggestions are similarly ad hoc; there is no rigorous quantitative justification for any of them, but they are all sensible and plausible, and usually work in practice.

      Computational convergence issues can never be fully dismissed, but we do not consider this to be a major issue. Our approach already pays careful attention to proper initialization, performs convergence checks, and considers multiple starting points. We also don’t need to estimate large ω with any degree of precision; it just needs to be “large”.

      The authors should clarify the relation between the "primary filter for gross or large-scale errors" and the "secondary filter" (this method). Which sources of error are expected to be captured by the two scales of filters?

      We will add discussion and examples to explicitly define the distinct and complementary roles of these filtering stages.

      The benchmarking of the method could be improved both for real and simulated data... I suggest comparing results with e.g. Drosophila genomes... For simulations, the authors should present simulations with or without alignment errors... and with or without positive selection... I also recommend simulating under more complex models, such as multinucleotide mutations or strong GC bias...

      We will add more simulations as suggested (see above). We will also analyze a Drosophila gene alignment from previously published papers.

      It would be interesting to compare to results from the widely used filtering tool GUIDANCE, as well as to the Selectome database pipeline... Moreover, the inconsistency between BUSTED-E and HMMCleaner, and BMGE is worrying and should be better explained.

      Some of the alignments we have analyzed had already been filtered by GUIDANCE. We’ll also run the Selectome data through BUSTED-E: both filtered and unfiltered. We consider it beyond the scope of this manuscript to conduct detailed filtering pipeline instrumentation and side-by-side comparison.

      For a new method such as this, I would like to see p-value distributions and q-q plots, to verify how unbiased the method is, and how well the chi-2 distribution captures the statistical value.

      We will report these values for new null simulations.
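      A calibration check of the kind requested could look like the sketch below: likelihood-ratio statistics from null simulations are converted to p-values, and the sorted p-values are paired with their expected uniform quantiles to give Q-Q points. The chi-square reference with 2 degrees of freedom is an assumption made here only because its survival function has a closed form; the text does not state which reference distribution the test uses. The statistics listed are invented.

```python
import math


def chi2_sf_df2(stat):
    """Survival function of a chi-square with 2 degrees of freedom: exp(-x/2)."""
    return math.exp(-stat / 2.0) if stat > 0 else 1.0


def qq_points(p_values, floor=1e-300):
    """(expected, observed) pairs of -log10 p-values under a uniform null."""
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]
    return [
        (-math.log10(e), -math.log10(max(o, floor)))
        for e, o in zip(expected, observed)
    ]


def rejection_rate(p_values, alpha=0.05):
    """Fraction of null replicates rejected at level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


# Hypothetical LRT statistics from five null replicates.
lrt_stats = [0.0, 0.4, 1.2, 2.8, 5.99]
p_values = [chi2_sf_df2(s) for s in lrt_stats]
points = qq_points(p_values)
rate = rejection_rate(p_values)
```

      Points lying on the diagonal indicate well-calibrated p-values; points consistently above it indicate an excess of small p-values, that is, inflated false positives under the null.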

      I disagree with the motivation expressed at the beginning of the Discussion... Our goal should not be to find a few impressive results, but to measure accurately natural selection, whether it is frequent or rare.

      That’s a philosophical point; at some level, given enough time, every single gene likely experiences some positive selection at some point in the evolutionary past. The practically important question is how to improve the sensitivity of the methods while controlling for ubiquitous noise. We do agree with the sentiment that the ultimate goal is to “measure accurately natural selection, whether it is frequent or rare”. However, we also must be pragmatic about what is possible with dN/dS methods on available genomic data.

    1. eLife Assessment

      In this valuable study, the authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance in ovarian cancer using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors convincingly identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need. However, the extent to which these findings may extend to more complex models of ovarian cancer remains unclear.

    2. Reviewer #1 (Public review):

      Summary:

      The authors provide a simple yet elegant approach to identifying therapeutic targets that synergize to prevent therapeutic resistance using cell lines, data-independent acquisition proteomics, and bioinformatic analysis. The authors identify several combinations of pharmaceuticals that were able to overcome or prevent therapeutic resistance in culture models of ovarian cancer, a disease with an unmet diagnostic and therapeutic need.

      Strengths:

      The manuscript utilizes state-of-the-art proteomic analysis, entailing data-independent acquisition methods, an approach that maximizes the robustness of identified proteins across cell lines. The authors focus their analysis on several drugs under development for the treatment of ovarian cancer and utilize straightforward thresholds for identifying proteomic adaptations across several drugs on the OVSAHO cell line. The authors utilized three independent and complementary approaches to predicting drug synergy (NetBox, GSEA, and Manual Curation). The drug combination with the most robust synergy across multiple cell lines was the inhibition of MEK and CDK4/6 using PD-0325901+Palbociclib, respectively. Additional combinations were also identified, including PARPi (rucaparib) and the fatty acid synthase inhibitor (TVB-2640). Collectively, this study provides important insight and exemplifies a solid approach to identifying drug synergy without large drug library screens.

      Weaknesses:

      The manuscript supports their findings by describing the biological function(s) of targets using referenced literature. While this is valuable, the number of downstream targets for each initial target is extensive, thus, the current work does not attempt to elucidate the mechanism of their drug synergy. Responses to drugs are quantified 72 hours after treatment and exclusively focused on cell viability and protein expression levels. The discovery phase of experimentation was solely performed on the OVSAHO cell line. An additional cell line(s) would increase the impact of how the authors went about identifying synergistic targets using bioinformatics. Ovarian cancer is elusive to treatment as primary cancer will form spheroids within ascites/peritoneal fluids in a state of pseudo-senescence to overcome environmental stress. The current manuscript is executed in 2D culture, which has been demonstrated to deviate from 3D, PDX, and primary tumours in terms of therapeutic resistance (DOI: 10.3390/cancers13164208). Collectively, the manuscript is insufficient in providing additional mechanistic insight beyond the literature, and its interpretation of data is limited to 2D culture until further validated.

    3. Reviewer #2 (Public review):

      Summary:

      Franz and colleagues combined proteomics analysis of OVSAHO cell lines treated with 6 individual drugs. The quantitative proteomics data were then used for computational analysis to identify candidates/modules that could be used to predict combination treatments for specific drugs.

      Strengths:

      The authors present solid proteomics data and computational analysis to effectively repeat at the proteomics level analyses that have previously been done predominantly with transcriptional profiling. Since most drugs either target proteins and/or proteins are the functional units of cells, this makes intuitive sense.

      Weaknesses:

      Considering the available resources of the involved teams, performing the initial analysis in a single HGSC cell line is certainly a weakness/limitation.

      The data also shows how challenging it is to correctly predict drug combinations. In Table 2 (if I read it correctly), the majority of the drug combinations predicted for the initial cell line OVSAHO did not result in the predicted effect. It also shows how variable the response was in the different HGSC cell lines used for the combination treatment. The success rate will most likely continue to drop as more sophisticated models are being used (i.e., PDX). Human patients are even more challenging.

      It would most likely be useful to more directly mention/discuss these caveats in the manuscript.

    1. eLife Assessment

      This is a valuable study that suggests that HPV-human DNA junctions can be identified from cfDNA in women with cervical cancer and that detection of these junctions is indicative of recurrence. The evidence supporting the conclusions is incomplete, in part because the numbers of reads identifying breakpoints in tumor samples or in circulating cell-free serum samples are not provided. More quantitative analysis will be required to confirm that the breakpoints represented in cell-free DNA can be used as a surrogate to monitor the recurrence of cervical cancer cells, and additional patient studies would also be needed to strengthen the study. This work will be of interest to those who study and treat cervical cancer as well as other HPV-related malignancies.

    2. Reviewer #1 (Public review):

      Van Arsdale and colleagues evaluated whether human-HPV DNA junctions could be detected in serum cell-free DNA from 16 patients with cervical cancer by hybrid capture and Illumina sequencing. Junctions were identified in seven patients, and these junctions were concordant with junctions identified in tumor DNA except for one patient, suggesting that, in most cases, the cfDNA is originating from a clone of the primary tumor. Junction detection at 6 months was found to be a statistically significant prognostic indicator of recurrence. The study further validates that type-specific E7 DNA, which is essential for tumorigenesis, was detectable by PCR for most patient sera, but had no association with recurrence. Furthermore, the study provides additional evidence that tumors harboring non-alpha-9 clade HPVs had shorter recurrence-free survival and overall worse outcome from the study's patients, as well as reanalysis of TCGA data. However, these findings need to be more extensively discussed in the context of previous publications. One identified limitation of this approach is the detection of non-tumor HPVs, but this was only seen in one patient. The major shortcoming of this study is the limited number of patients that were evaluated, but for a retrospective study, this is a reasonable number of patients evaluated, and the findings are appropriately not overstated. The design, execution, and detailed analysis of the sequencing data are a major strength. This study provides important foundational evidence for further evaluating the clinical utility of HPV DNA detection from cfDNA and specifically assessing for integration junctions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to identify cell-free HPV breakpoint junctions and assess their utility in identifying cervical cancer recurrence as a surrogate, tumor-specific assay. They added unrelated findings about a potential relationship between various viral types and cancer recurrence frequencies, concluding that clade alpha 9 types recurred at a lower rate than did non-alpha 9 viral types.

      Strengths:

      The authors analyzed 16 cervical cancer samples and matched serum samples collected initially or upon clinical treatments. An association between virus types and cancer recurrence frequencies is a novel finding that will likely induce further insights into HPV pathogenic mechanisms.

      Weaknesses:

      The main claims of this manuscript are only partially supported by the data as presented, because the sequencing data are not quantified and were not analyzed in a statistically adequate way. First, only one or at most two breakpoints are presented per tumor (Table 1). This finding is discrepant from many extensive, published genomics studies of HPV-positive cancers, in which many unique breakpoints are found frequently in individual cancers, ranging from 1 or 2 up to more than 100. Second, no information is provided about likely correlations between genomic DNA copy number at rearranged loci and breakpoint-identifying sequencing read counts. Third, no direct comparison is presented between supporting read counts from cancer samples and read counts from circulating cell-free DNA samples. Fourth, many of the initial cancer samples harbored no insertional breakpoints, so no correlation with breakpoints in the serum samples would be possible. Fifth, no mention was made about tumor heterogeneity, where a given breakpoint may not be present in every cell of the tumor. Previous literature about the general topic of using cell-free DNA breakpoints as a surrogate for cancer cells is not cited adequately. Findings about potential correlations between various viral types and variable recurrence rates are not well-supported by the authors' own data, because of the limited sample numbers studied. This section of the paper is relatively unrelated to the main thrust, which is about breakpoint detection.

    1. eLife Assessment

      This study presents important findings on increased ground beetle diversity in strip cropping compared with crop monocultures. Solid methods are used to analyze data from multiple sites with heterogeneous systems of mixed crops, allowing broad conclusions, albeit at the expense of lacking taxonomic specificity. The work will be of interest to all those applying plant diversity treatments to improve the diversity of associated animals in agricultural fields.

    2. Reviewer #3 (Public review):

      Summary: In this paper the authors examined the effects of strip cropping, a relatively new agricultural technique of alternating crops in small strips of several meters wide, on ground beetle diversity. The results show an increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures.

      Strengths: The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought out well how to handle heterogeneous, unbalanced and taxonomically unspecific yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch. Moreover, after the first round of reviews, the authors have done a great job at rewriting the paper to make it less overstated, more relevant to the data at hand and more solid in the findings. Many of the weaknesses noted in the first review have been dealt with. The overall structure of the paper is good, with a clear introduction, hypotheses, results section and discussion.

      Weaknesses: The weaknesses that remain are mainly due to a difficult dataset and choices that could have stressed certain aspects more, like the relationship between strip cropping and intercropping. The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique which has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness?

      Unfortunately, the authors do not go into this in the introduction or otherwise and simply state that they consider strip cropping a form of intercropping.

      I also do not like the exclusive focus on percentages, as these are dimensionless. I think more could have been done to show underlying structure in the data, even after rarefaction.

      A further weakness is a limited embedding into the larger scientific discourses other than providing references. But this may be a matter of style and/or taste.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. It is clear from the reviews that we’ve had the privilege to have our work extensively and thoroughly checked by knowledgeable experts, for which we are very grateful. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have included detailed responses below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: Yield data has been collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases this resulted in variations in how data were collected or processed (e.g. taxonomic level of species identification). We have added more details to the sections on sampling design and data analysis to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we present an analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. This gives better insight in the variation of responses among ground beetle taxa.

      (4) Restrict findings to our system: We nuanced our findings further and focused more on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      Below we also respond to the editor and reviewers in more detail.

      Reviewing Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or cite now (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We revised the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details to our analysis were not described in sufficient detail. Especially reviewer 2 pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We revised the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and occasionally samples were missing, the number of year series that made up a datapoint were different among locations and years. However, we always made sure that for the comparison of a paired monoculture and strip cropping field, the number of year series was always made equal through rarefaction. That is, the number of ground beetle(s) (species) are always expressed as the number per 2 to 6 samples. Therefore, we prefer to stick to relative changes as we are convinced that this gives a fairer representation of our complex dataset.
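      The equalisation step described above can be sketched as sample-based rarefaction: the field with more samples is repeatedly subsampled to the size of the smaller one, and richness is averaged over the draws before a relative change is computed. This is only an illustration under assumed inputs, not the authors' procedure; the function names, the number of draws, and the toy samples are hypothetical.

```python
import random


def rarefied_richness(samples, n_samples, n_draws=200, seed=0):
    """Mean number of taxa found in random subsets of n_samples samples.

    samples is a list of sets, each holding the taxa caught in one sample.
    """
    if n_samples > len(samples):
        raise ValueError("cannot draw more samples than are available")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        subset = rng.sample(samples, n_samples)
        total += len(set().union(*subset))
    return total / n_draws


def relative_change(strip_value, mono_value):
    """Change in strip cropping relative to the paired monoculture."""
    return (strip_value - mono_value) / mono_value


# Toy data: three monoculture samples and five strip-cropping samples.
mono = [{"Pterostichus", "Harpalus"}, {"Pterostichus"}, {"Amara", "Pterostichus"}]
strip = [
    {"Pterostichus", "Harpalus"},
    {"Amara", "Bembidion"},
    {"Pterostichus", "Poecilus"},
    {"Harpalus"},
    {"Amara", "Pterostichus"},
]

n = min(len(mono), len(strip))  # compare both fields at equal sampling effort
mono_richness = rarefied_richness(mono, n)
strip_richness = rarefied_richness(strip, n)
change = relative_change(strip_richness, mono_richness)
```

      Reporting the rarefied values alongside the relative change would show the underlying numbers at a common sampling effort, which is what the editor's comment asks for.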

      We agree with the second point that both the editor and several reviewers pointed out. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we included a GLM analysis on the responses of abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) as we could then include all locations and years within the analyses, and in most cases a genus was dominated by a single species (but notable exceptions were Amara and Harpalus, which were often made up of several species). We still illustrate these analyses in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that the reviewers raised, we decided to nuance our reasoning about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) the fact that our findings are not as straightforwardly positive as our narrative suggested. We still believe that strip cropping contributes positively to carabid communities, and have carefully checked the text to avoid overstatements.

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight variations in how data were collected or processed (e.g. taxonomic level of species identification).

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia & van Apeldoorn, 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverting our focus towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed these data (line 203-207, 217-221).

      Reviewer #1 (Recommendations for the authors):

      This is a well-written study on the effects of strip cropping on ground-beetle diversity. As stated above the study is well analyzed, presented, and written but you should not pretend that you analyzed yield e.g. lines 25-27 "We show that strip cropping...enhance ground beetle biodiversity without incurring major yield loss."

      We understand the confusion caused by this sentence, and it was never our intention to give the impression that we analyzed yield losses. These findings were based on previous research by ourselves and colleagues, and we have now changed the sentence to reflect this (line 25-27).

      I think you assume that yield does not differ between strip cropping and monoculture. I am not sure this is correct as one crop might attract pests or predators spilling over to the other crop. I am also not sure if the sowing and harvest of the crop will come with the same costs. So if you assume this, you should only do it in the main manuscript and not the abstract, to justify this better.

      With three peer-reviewed papers on the same fields as we studied, we can convincingly state that strip cropping in organic agriculture generally does not result in major yield loss, although exceptions exist, which we refer to in the discussion.

      In the introduction lines 28-43, you refer to insect biomass decline. I wonder if you would like to add the study of Loboda et al. 2017 in Ecography. It may seem not to fit as it is from the Arctic, but the other studies you cite also do not come only from agricultural landscapes, and this study is from the same time as the Hallmann et al. 2017 study and shows a decline in flies of 80%.

      We have removed the sentence that this comment refers to, to streamline the introduction more.

      Lines 50-51. You only have one citation for biodiversity strategies in agricultural systems. I suggest citing Mupepele et al. 2021 in TREE. This study refers to management but also the policies and societal pressures behind it.

      We have added this citation and a recent paper by Cozim-Melges et al. (2024) here (line 49-52).

      In the methods, I am missing a section on species identifications. This would help to understand why you used "taxonomic richness".

      Thanks for pointing this out. We have now included a new section on ground beetle identification (line 304-309 in methods).

      Figure 1 is great and I like that you separated the field and crop-level data, although there is no statistical power for the crop-specific data. I personally would move k to the supplements. It is very detailed and small and therefore hard to read

      We chose to keep figure 1k, as in our view it gives a good impression of the scale of the experiment, the number of crops included and the absolute numbers of caught species.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      We did not intend to investigate the effect of strip cropping on crop yields, but rather place our work in the framework of earlier studies that already studied yield. All the monocultures and strip cropping fields were organic farms. Our findings thus compare crop diversity effects within the context of organic farming.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers employing strip cropping within the Netherlands, which greatly reduced the number of fields available for our study. Therefore, we worked in the sites that were available and studied as many crops as possible at these sites. Since there was variation in the crops grown at the sites, for some crops we have limited replication. In the revision we have explained this more clearly (line 297-300).

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This would indeed mean that there is a small difference between the 3 m and 6 m wide strips regarding distance from another strip, but this amounts to a difference of only 1.5 to 3 meters from the edge, which, based on our own extensive experience with ground beetle communities, will not have a large impact on the ground beetles found. The distance from field/plot edges was similar between monocultures and strip cropped fields. We present a more detailed description of the sampling design in the methods of the revised manuscript (line 294-297).

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      To be clear, this limitation relates to the comparison that we did for the community compositions of ground beetles in two crops either in strip cropping or monocultures. In this case, it was impossible to avoid potential autocorrelation due to our field design. We also acknowledge this limitation in the results section (line 130-133). However, for our other analyses we corrected for spatial autocorrelation by including variables per location, year and crop. This grouped samples that were spatially autocorrelated. Therefore, we do not see this as a shortcoming of our other analyses.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      We agree and acknowledge that crop type can influence carabid richness and density, which is why we have included variables to account for differences caused by crops. However, we did not observe consistent differences between crops in how strip cropping affected ground beetle richness and density. Therefore, we don’t think that crop types would have influenced our conclusions on the overall effect of strip cropping.

      A more basic problem is that the reader neither learns where traps were located, how missing traps were treated for analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials) nor why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We have clarified this further in the revised manuscript (line 294-301, 318-322). As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree and we dealt with this issue by using year series instead of using individual samples of different rounds. This approach allowed us to get a good impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we further clarified this point (line 365-366, 373-374).

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we nuanced our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields (line 33-35).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points stated under 'Weaknesses' above, I provide smaller comments and recommendations:

      Overall comments:

      (i) The carabid images used in the figures were created by Ortwin Bleich and are copyrighted. I could not find him accredited in the acknowledgements; the figure legends simply state that the images were taken from his webpage. Was his permission obtained? This should be stated.

      We have received written permission from Ortwin Bleich for using his pictures in our figures, and have accredited him for this in the acknowledgements (line 455-456).

      (ii) There is a great confusion in the field concerning terminology. The authors here use intercropping and strip cropping, a specific form of intercropping, interchangeably. I advise the authors to stick to strip cropping as it is more precise and avoids confusion with other forms of intercropping.

      We agree with the definitions given by reviewer 2 and had already used them as such in the text. We defined strip cropping in the first paragraph of the introduction and do not use the term “intercropping” after this definition to avoid confusion.

      Comments to specific lines:

      Line 19: While this is likely true, there is so far not enough compelling evidence for such a strong statement blaming agriculture. Please rephrase.

      Changed the sentence to indicate more clearly that it is one of the major drivers, but that the “blame” is not solely on agriculture (line 18-19).

      Line 22: Is this the case? I am aware of strip cropping being used in other countries, many of them in Europe. Why the focus on 'Dutch'?

      Indeed, strip cropping is now being pioneered by farmers throughout Europe. However, in the Netherlands some farmers have been pioneering strip cropping since 2014. We have added this information to indicate that our setting is in the Netherlands and because, in our opinion, it gives a bit more context to our manuscript.

      Line 24: I would argue that carabids are actually not good indicators for overall biodiversity in crop fields as they respond in a very specific way, contrasting with other taxa. It is commonly observed that carabids prefer more disturbed habitats and richness often increases with management intensity and in more agriculturally dominated landscapes - in stark contrast to other taxa like wild bees or butterflies.

      We have reworded this sentence to reflect that they are not necessarily indicators of wider agricultural biodiversity, but that they do hold keystone positions within food webs in agricultural systems (line 23-25).

      Line 31: This statement here is also too strong - carabids are not overall biodiversity and patterns found for carabids likely differ strongly from patterns that would be observed in other taxa. This study is on carabids and the conclusion should thus also refer to these in order to avoid such over-simplified generalizations.

      We agree and have nuanced this sentence to indicate that our findings are only on ground beetles (line 33-35). However, we would like to point out that the statement that “patterns found for carabids likely differ strongly from patterns that would be observed in other taxa” assumes a disassociation between carabids and other taxa.

      Line 41: I am sure the authors are aware of the various methodological shortcomings of the dataset used in Hallmann et al. (2017) which likely led to an overestimation of the actual decline. Analysing the same data, Müller et al. (2023) found that weather can explain fluctuations in biomass just as well as time. I thus advise not putting too much focus on these results here as they seem questionable.

      We have removed this sentence to streamline the introduction, thus no longer mentioning the percentages given by Hallmann et al. (2017).

      Line 46: Surely likely but to my knowledge this is actually remarkably hard to prove. Instead of using the IPBES report here that simply states this as a fact, it would be better to see some actual evidence referenced.

      We removed IPBES as a source and replaced it with Dirzo et al. (2014), a review that shows the consequences of biodiversity decline for a range of different ecosystem services and ecological functions (line 45-47).

      Line 52ff: I am not sure whether this old land-sparing vs. land-sharing debate is necessary here. The authors could simply skip it and directly refer to the need of agricultural areas, the dominating land-use in many regions, to become more biodiversity-friendly. It can be linked directly to Line 61 in my opinion which would result in a more concise and arguably stronger introduction.

      After reconsidering, we agree with reviewer 2 that this section was redundant and we have removed the lines on land-sparing vs land-sharing.

      Line 59: Just a note here: this argument is not meaningful when talking about strip cropping in the Netherlands as there is virtually no land left that could be converted (if anything, agricultural land is lost to construction). The debate on land-use change towards agriculture is nowadays mostly focused on the tropics and the Global South.

      We argue that strip cropping could play an important role as a measure that does not necessarily follow the trade-off between biodiversity and agriculture for a context beyond the Netherlands (line 52-58).

      Line 69: Does this statement really need 8 references?

      Line 71: ... and this one 5 additional ones?

      We have removed excess references in these two lines (line 62-66).

      Line 74: But also likely provides the necessary crop continuity for many crop pests - the authors should keep in mind that when practitioners read agricultural biodiversity, they predominantly think of weeds and insect pests.

      We agree with reviewer 2 that agricultural biodiversity is still a controversial topic. However, as the focus in this manuscript is more on biodiversity conservation, rather than pest management, we prefer to keep this sentence as is. In other published papers and future work we focus more on the role of strip cropping for pest management.

      Line 83: Consider replacing 'moments' maybe - phenological stages or development stages?

      Although we understand the point of reviewer 2, we prefer to keep it at moments, as we did not focus on phenological stages and we only wanted to say that we set pitfall traps at several moments throughout the year. However, by placing the pitfall traps at several moments throughout the year, we did capture several phenological stages.

      Line 86: Not only farming practices - there are also massive fluctuations between years in the same crop with the same management due to effects of the weather in the previous reproductive season. Interpreting carabid assemblage changes is therefore not straightforward.

      We absolutely agree that interpreting carabid assemblage is not straightforward, but as we did not study year or crop legacy effects we chose to keep this sentence to maintain focus on our research goals.

      Line 88: 'ecolocal'?

      Typo, should have been ecological. Changed (line 81).

      Line 90: 'As such, they are often used as indicator group for wider insect diversity in agroecosystems' - this is the third repetition of this statement and the second one in this paragraph - please remove. Having worked on carabids extensively myself, I also think that this is not the true reason - they are simply easy to collect passively.

      We agree with the reviewer and have removed this sentence.

      Line 141: I have doubts about the value of the ISA looking at the results. Anchomenus dorsalis is a species extremely common in cereal monoculture fields in large parts of Europe, especially in warmer and drier conditions (H. griseus was likely only returned as it is generally rare and likely only occurred in few plots that, by chance, were strip-cropped). It can hardly be considered an indicator for diverse cropping systems but it was returned as one here (which I do not doubt). This often happens with ISA in my experience as they are very sensitive to the specific context of the data they are run on. The returned species are, however, often not really useable as indicators in other contexts. I thus believe they actually have very limited value. Apart from this, we see here that both monocultures and strip cropping have their indicators, as would likely all crop types. I wonder what message we would draw from this ...

      On close reconsideration, we agree with the reviewer that the ISAs might have been too sensitive to rare species that by chance occur in one of two crop configurations. To still get an idea of what happens with specific ground beetle groups, we chose to replace the ISAs with analyses on the 12 most common ground beetle genera. For this purpose we have added new sections to the methods (line 368-374) and results (line 135-143), replaced figure 2 and table S5, and updated the discussion (line 182-200).

      Line 165: Carabid activity is high when carabids are more active. Carabids can be more active either when (i) there are simply more carabid individuals or /and (ii) when they are starved and need to search more for prey. More carabid activity does thus not necessarily indicate more individuals, it can indicate that there is less prey. This aspect is missing here and should be discussed. It is also not true that crop diversification always increases prey biomass - especially strip cropping has previously been shown to decrease pest densities (Alarcón-Segura et al., 2022). Of course, this is a chicken-egg problem (less pests => less carabids or more carabids => less pests ?) ... this should at least be discussed.

      We have rewritten this paragraph to further discuss activity density in relation to food availability (line 175-185).

      Line 178: These species are not exclusively granivorous - this speculation may be too strong here.

      Line 185: true for all but C. melanocephalus - this species is usually more associated with hedgerows, forests etc.

      After removing the ISAs, we also chose to remove this paragraph and replace it with a paragraph that is linked to the analyses on the 12 most common genera (line 182-200).

      Line 202: These statements are too strong for my taste - the authors should add an 'on average' here. The data show that they likely do not always enhance richness by 15 % and as the authors state, some monocultures still had higher richness and densities.

      “on average” added (line 211)

      Line 203: 'can lead' - the authors cannot tell based on their results if this is always true for all taxa.

      Changed to “can lead” (line 213)

      Line 205: What is 'diversification' here?

      This concerns measures like hedgerows or flower strips. We altered the sentence to make this clearer (line 215-216).

      Line 208: Does this statement need 5 references? (as in the introduction, the reader gets the impression the authors aimed to increase the citation count of other articles here).

      We have removed excess references (line 219-221).

      Line 222: How many are 'a few'? Maybe state a proportion.

      We found only two species; we have changed the sentence accordingly (line 232-233).

      Line 224: As stated above, I would not overstress the results of the ISAs - the authors stated themselves that the result for A. dorsalis is likely only based on one site ...

      We removed this sentence after removing the ISAs.

      Line 305: I think there is an additional nested random level missing - the transect or individual plot the traps were located in (or was there only one replicate for each crop/strip in each experiment)? Hard to tell as the authors provide no information on the actual sample sizes.

      Indeed, there was one field or plot per cropping system per crop per location per year from which all the samples were taken. Therefore the analysis does not miss a nested random level. We provided information on sample sizes in Table S7.

      Line 314ff: The authors describe that they basically followed a (slightly extended) Chao-Hill approach (species richness, Shannon entropy & inverse Simpson) without the sampling effort / sample completeness standardization implemented in this approach and as a reader I wonder why they did not simply just use the customary Chao-Hill approach.

      We were not aware of the Chao-Hill approach, and we see it as a compliment that we independently came up with an approach similar to a now accepted approach.
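
      For readers unfamiliar with the link the reviewer draws: species richness, the exponential of Shannon entropy, and inverse Simpson are the Hill numbers of order 0, 1 and 2. The sketch below shows only that relationship, on hypothetical counts; it does not include the sampling-effort or sample-completeness standardization that the reviewer mentions, and `hill_number` is an illustrative name, not a function from the manuscript.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of species) of order q for one sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    # q = 0 gives species richness, q = 2 gives inverse Simpson.
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


# Hypothetical samples: one perfectly even, one dominated by a single species.
even = [10, 10, 10, 10]
dominated = [90, 5, 5]
for q in (0, 1, 2):
    print(q, hill_number(even, q), hill_number(dominated, q))
```

      In the dominated sample, the higher orders drop well below the richness of 3 because they weight abundant species more heavily.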

      Line 329: Unclear what was nested in what here - location / year / crop or year / location / crop ?

      For the crop-level analyses, the nested structure was location > year > crop. This nested structure was chosen as every location was sampled across different years and (for some locations) the crops differed among years. However, as we pooled the samples from the same field in the field-level analyses, using the same random structure would have resulted in each individual sampling unit being distinguished as a group. Therefore, the random structure here was only location > year. We explain this now more clearly in lines 329 and 355-357.

      Line 334: I can see why the authors used these distributions but it is presented here without any justification. As a side note: Gamma (with log link) would likely be better for the Shannon model as well (I guess it cannot be 0 or negative ...).

      We explain this now better in lines 360-364.

      Line 341: Why Hellinger and not simply proportions?

      We used Hellinger transformation to give more weight to rarer species. Our pitfall traps were often dominated by large numbers of a few very abundant / active species. If we had used proportions, these species would have dominated the community analyses. We clarified this in the text (line 379-381).
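
      A small sketch of the point made above, using hypothetical counts: the Hellinger transformation takes the square root of each species' proportion within a sample, which shrinks the gap between a dominant species and the rest compared with plain proportions. The function name `hellinger` is illustrative only.

```python
import math


def hellinger(counts):
    """Hellinger-transform one sample: square root of relative abundances."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.sqrt(c / total) for c in counts]


# Hypothetical pitfall sample dominated by one very active species.
sample = [81, 9, 10]
proportions = [c / sum(sample) for c in sample]
transformed = hellinger(sample)

print("proportions:", proportions)
print("hellinger:  ", transformed)
# Ratio of the dominant species to the second species, before and after.
print(proportions[0] / proportions[1], transformed[0] / transformed[1])
```

      The ratio between the dominant and the second species is smaller after transformation, so less abundant species contribute relatively more to distance-based community analyses.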

      Line 348: An RDA is constrained by the assumptions / model the authors proposed and "forces" the data into a spatial ordination that resembles this model best. As the authors previously used an unconstrained PERMANOVA, it would be better to also use an NMDS that goes along with the PERMANOVA.

      The initial goal of the RDA was not to directly visualize the results of the PERMANOVA, but to show whether an overall crop configuration effect occurred, both for the whole dataset and per location. We have now added NMDS figures to link them to the PERMANOVA and added these to the supplementary figures (fig S6-S8). We also mention this approach in the methods section (line 387-390).

      Line 355f: This is also a clear indication of the strong annual fluctuations in carabid assemblages as mentioned above.

      Indeed.

      Line 361: 'pairwise'.

      Typo, we changed this.

      Line 362: reference missing.

      Reference added (line 405)

      References

      Alarcón-Segura, V., Grass, I., Breustedt, G., Rohlfs, M., Tscharntke, T., 2022. Strip intercropping of wheat and oilseed rape enhances biodiversity and biological pest control in a conventionally managed farm scenario. J. Appl. Ecol. 59, 1513-1523.

      Boetzl, F.A., Sponsler, D., Albrecht, M., Batáry, P., Birkhofer, K., Knapp, M., Krauss, J., Maas, B., Martin, E.A., Sirami, C., Sutter, L., Bertrand, C., Baillod, A.B., Bota, G., Bretagnolle, V., Brotons, L., Frank, T., Fusser, M., Giralt, D., González, E., Hof, A.R., Luka, H., Marrec, R., Nash, M.A., Ng, K., Plantegenest, M., Poulin, B., Siriwardena, G.M., Tscharntke, T., Tschumi, M., Vialatte, A., Van Vooren, L., Zubair-Anjum, M., Entling, M.H., Steffan-Dewenter, I., Schirmel, J., 2024. Distance functions of carabids in crop fields depend on functional traits, crop type and adjacent habitat: a synthesis. Proceedings of the Royal Society B: Biological Sciences 291, 20232383.

      Hallmann, C.A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H., Stenmans, W., Müller, A., Sumser, H., Hörren, T., Goulson, D., de Kroon, H., 2017. More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLoS One 12, e0185809.

      Knapp, M., Seidl, M., Knappová, J., Macek, M., Saska, P., 2019. Temporal changes in the spatial distribution of carabid beetles around arable field-woodlot boundaries. Scientific Reports 9, 8967.

      Müller, J., Hothorn, T., Yuan, Y., Seibold, S., Mitesser, O., Rothacher, J., Freund, J., Wild, C., Wolz, M., Menzel, A., 2023. Weather explains the decline and rise of insect biomass over 34 years. Nature.

      Toivonen, M., Huusela, E., Hyvönen, T., Marjamäki, P., Järvinen, A., Kuussaari, M., 2022. Effects of crop type and production method on arable biodiversity in boreal farmland. Agriculture, Ecosystems & Environment 337, 108061.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have carefully thought out how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle and we nuanced the text in the revised version to reflect this. See below for more details on how we revised the manuscript to reflect this point.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem mostly to occur because crops with different communities are now on the same field, rather than that within the strips you get mixtures of communities related to different crops. We discussed this in the first paragraph of the discussion in the original submission (line 161-164).

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We revised the methods section to better explain this (line 318-322). Therefore, the statement “with enough data points things tend to get significant” is not applicable here.
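
      The pooling step described in this response can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `pool_by_field`, the field identifiers and the species labels are all hypothetical, and the counts are made up.

```python
from collections import defaultdict


def pool_by_field(trap_samples):
    """Sum species counts over all pitfall samples taken in the same field.

    trap_samples: iterable of (field_id, {species: count}) pairs.
    Returns {field_id: {species: pooled_count}}, i.e. one pooled
    sample per field instead of several pseudo-replicates.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for field_id, counts in trap_samples:
        for species, n in counts.items():
            pooled[field_id][species] += n
    return {field: dict(counts) for field, counts in pooled.items()}


# Hypothetical pitfall samples: two from field_A, one from field_B.
samples = [
    ("field_A", {"species_a": 5, "species_b": 2}),
    ("field_A", {"species_a": 3}),
    ("field_B", {"species_b": 4}),
]
pooled = pool_by_field(samples)
```

      After pooling, the unit of analysis is the field rather than the individual trap, so the number of data points equals the number of fields.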

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had a perfectly balanced dataset. However, it is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge that percentages are limited in how they can be interpreted, they do allow us to report relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. This approach allowed us to make a better estimation whenever more samples were available, as we did not always have an equal number of samples from both cropping systems. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction to 2 samples (which simply gives a better estimation for the monoculture). In other cases, we had 4 samples in both strip cropped and monoculture fields, and we chose to use rarefaction to 4 samples to get a better estimation altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (it would represent the number of ground beetle species per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either.
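
      To make the rarefaction logic concrete, here is a minimal sketch for species richness. It assumes sample-based rarefaction computed as the mean richness over all subsets of k samples, where k is the smaller of the two sample counts; the response does not state the exact procedure used, so this is an assumption. The function names and the toy species sets are hypothetical.

```python
from itertools import combinations
from statistics import mean


def rarefied_richness(samples, k):
    """Mean number of species found in a subset of k samples.

    samples: list of sets, each holding the species caught in one sample.
    The mean is taken over every possible subset of size k.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    return mean(len(set().union(*subset)) for subset in combinations(samples, k))


def relative_change(strip_samples, mono_samples):
    """Percent change in rarefied richness, strip cropping vs. monoculture."""
    k = min(len(strip_samples), len(mono_samples))  # equalise sampling effort
    strip = rarefied_richness(strip_samples, k)
    mono = rarefied_richness(mono_samples, k)
    if mono == 0:
        raise ValueError("monoculture richness is zero; change is undefined")
    return 100.0 * (strip - mono) / mono


# Hypothetical example: 2 strip cropping samples versus 3 monoculture samples.
strip_samples = [{"a", "b", "c"}, {"a", "b", "c"}]
mono_samples = [{"a"}, {"b"}, {"a", "b"}]
change = relative_change(strip_samples, mono_samples)
```

      With unequal sample counts, the larger group is rarefied down to the smaller one, so the percentage always compares equal sampling effort within one combination of location, year and crop.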

      (3) The authors appear not to have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the result of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we have replaced the indicator species analyses with a GLM analysis for the 12 most common genera of ground beetles in the revised manuscript. This allows us to go more in depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we also discuss these common genera more in depth, rather than focusing on rarer species (line 135-143, 182-200 in discussion). Furthermore, we have added information on rarity and habitat preference to the table that shows species abundances per location (Table S2), and mention these aspects briefly in the results (line 145-153).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effects of location and/or year appear to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion (line 160-162).

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We revised the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and clarified this research direction in the introduction of the revised manuscript (line 66-72).

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and therefore now replaced this with GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we further explored the species/genera that respond to cropping systems and discuss these findings in more detail (line 182-200 in discussion).

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no, we did not find more rare species in strip cropping. In the revised manuscript, we added a column for rarity (according to waarneming.nl) in the table showing abundances of species per location (table S2). We only found two rare species: for one we found only a single individual, and the other was more related to the open habitat created by a failed wheat field. We discuss this more in depth in the revised results (line 145-153).

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We nuanced the implications of our study in the revised manuscript (line 30-35, 232-237).

      Reviewer #3 (Recommendations for the authors):

      General comments:

      (1) I am missing the R script and data files in the manuscript. This is a serious drawback in assessing the quality of the work.

      Datasets and R scripts will be made available upon completion of the manuscript.

      (2) I have doubts about the clarity of the title. It more or less states that strip cropping is designed in order to maintain productivity. However, the main objective of strip cropping is to achieve ecological goals without losing productivity. I suggest a rethink of the title and what it is the authors want to convey.

      As the title led to false expectations for multiple reviewers regarding analyses on yield, we chose to alter the title and removed any mention of yield from it.

      (3) Line 22: I would add something along the lines of: "As an alternative to intercropping, strip cropping is pioneered by Dutch farmers... " This makes the distinction and the connection between the two clearer.

      In our opinion, strip cropping is a form of intercropping. We have changed this sentence to reflect this point better. (line 21-22)

      (4) Line 24: "these" should read "they"

      After changing this sentence, this typo is no longer there (line 24).

      (5) Line 34-48. I think this introduction is too long. The paper is not directly about insect decline, so the authors could consider starting with line 43 and summarising 34-42 in one or two sentences.

      Removed a sentence on insect declines here to make the introduction more streamlined.

      (6) Line 51-59. I am not convinced the land sparing - land sharing idea adds anything to the paper. It is not used in the discussion and solicits much discussion in and of itself unnecessary in this paper. The point the authors want to make is not arable fields compared to natural biodiversity, but with increases in biodiversity in an already heavily degraded ecosystem; intensive agriculture. I think the introduction should focus on that narrative, instead of the land sparing-sharing dichotomy, especially because too little attention is spent on this narrative.

      We removed the section on land-sparing vs land-sharing as it was indeed off-topic.

      (7) Line 85. Dynamics is not correctly used here. It should read Ground beetle communities are sensitive.

      Changed accordingly (line 78-79).

      (8) Line 90-91. Here, it should be added that ground beetles are used as indicators for ground-dwelling insect diversity, not wider insect diversity in agricultural systems. In fact, Gerlach et al., the reference included, clearly warn against using indicator groups in a context that is too wide for a single indicator group to cover and Van Klink (2022) has recently shown in a meta-analysis that the correlation between trends in insect groups is often rather poor.

      We removed the sentence that claimed ground beetles to be indicators of general biodiversity, and have focused the text in general more on ground beetle biodiversity, rather than general biodiversity.

      (9) Line 178: Was a high weed abundance measured in the strip cropping fields? Or have there been reports of higher weed abundance in general? The references provided do not appear to support this claim.

      To our knowledge, there is only one paper on the effect of strip cropping on weeds (Ditzler et al., 2023). This paper shows that strip cropping (and more diverse cropping systems) reduces weed cover, but increases weed richness and diversity. We mistakenly mentioned that crop diversification increases weed seed biomass, and have changed this to weed seed richness. The paper from Carbonne et al. (2022) indeed doesn’t show an effect of crop diversification on weeds. However, it does show a positive relation between weed seed richness and ground beetle activity density. We have moved this citation to the right place in the sentence (line 172-175).

      (10) Line 279-288. The description of sampling with pitfalls is inadequate. Please follow the guidelines for properly incorporating sufficient detail on pitfall sampling protocols as described in Brown & Matthews 2016.

      We were sadly not aware of this paper prior to the experiments, but have at least added information on all characteristics of the pitfall traps as mentioned in the paper (line 290-294).

      (11) Lines 307-310. What reasoning lies behind the choice to focus on the most beetle-rich monocultures? Do the authors have references for this way of comparing treatments? Is there much variation in the monocultures that solicits this approach? It would be preferable if the authors could elaborate on why this method is used, provide references that it is a generally accepted statistical technique and provide additional assessments of the variation in the data so it can be properly related to more familiar exploratory data analysis techniques.

      We ran two analyses for the field-level richness and abundance. First, we used all combinations of monocultures and strip cropping. However, as strip cropping is made up of (at least) 2 crops, we had 2 constituent monocultures. Because including both monocultures would count the comparison with the same strip cropped field twice, we also ran the analyses again with only those monocultures that had the highest richness and abundance. This choice was made to obtain a conservative estimate of ground beetle richness increases through strip cropping. We explained this methodology further in the statistical analysis section (line 329-335).
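
      The conservative comparison described here can be illustrated with a short sketch. It assumes that each strip cropped field is compared against whichever constituent monoculture has the highest richness; the function names, crop labels and richness values below are hypothetical and serve only to show the selection step.

```python
def most_species_rich(monoculture_richness):
    """Return (crop, richness) for the richest constituent monoculture.

    monoculture_richness: {crop: richness} for the monocultures of the
    crops that make up one strip cropped field.
    """
    if not monoculture_richness:
        raise ValueError("at least one monoculture is required")
    return max(monoculture_richness.items(), key=lambda item: item[1])


def percent_change(strip_value, reference_value):
    """Percent change of the strip cropping value relative to the reference."""
    return 100.0 * (strip_value - reference_value) / reference_value


# Hypothetical richness values for one strip cropped field and its
# two constituent monocultures.
mono = {"crop_1": 12, "crop_2": 9}
reference_crop, reference_richness = most_species_rich(mono)
gain = percent_change(15, reference_richness)
```

      Using the richest monoculture as the reference yields the smallest possible gain for strip cropping, which is why the estimate is conservative, and each strip cropped field enters the comparison only once.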

      In Figure S6 the order of crop combinations is altered between 2021 on the left and 2022 on the right. This is not helpful to discover any possible patterns.

      We originally chose this order as it represented also the crop rotations, but it is indeed not helpful without that context. Therefore, we chose to change the order to have the same crop combinations within the rows.

    1. eLife Assessment

      This important study investigates how hummingbird hawkmoths integrate stimuli from across their visual field to guide flight behavior. Cue conflict experiments provide solid evidence for an integration hierarchy within the visual field: hawkmoths prioritize the avoidance of dorsal visual stimuli, potentially to avoid crashing into foliage, while they use ventrolateral optic flow to guide flight control. These findings will be of broad interest to enthusiasts of visual neuroscience and flight behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Recent work has demonstrated that the hummingbird hawkmoth, Macroglossum stellatarum, like many other flying insects, uses ventrolateral optic flow cues for flight control. However, unlike in other flying insects, the same stimulus presented in the dorsal visual field elicits a directional response. Bigge et al. use behavioral flight experiments to set these two pathways in conflict in order to understand whether these two pathways (ventrolateral and dorsal) work together to direct flight and if so, how. The authors characterize the visual environment (the amount of contrast and translational optic flow) of the hawkmoth and find that different regions of the visual field are matched to relevant visual cues in their natural environment and that the integration of the two pathways reflects a prioritization for generating behavior that supports hawkmoth safety rather than the prevalence of a particular visual cue in the environment.

      Strengths:

      This study creatively utilizes previous findings that the hawkmoth partitions its visual field as a way to examine parallel processing. The behavioral assay is well-established and the authors take the extra steps to characterize the visual ecology of the hawkmoth habitat to draw exciting conclusions about the hierarchy of each pathway as it contributes to flight control.

    3. Reviewer #2 (Public review):

      Summary

      Bigge and colleagues use a sophisticated free-flight setup to study visuo-motor responses elicited in different parts of the visual field in the hummingbird hawkmoth. Hawkmoths have been previously shown to rely on translational optic flow information for flight control exclusively in the ventral and lateral parts of their visual field. Dorsally presented patterns elicit a formerly completely unknown response: instead of using dorsal patterns to maintain straight flight paths, hawkmoths fly, more often, in a direction aligned with the main axis of the pattern presented (Bigge et al, 2021). Here, the authors go further and put ventral/lateral and dorsal visual cues into conflict. They found that the different visuomotor pathways act in parallel, and they identified a 'hierarchy': the avoidance of dorsal patterns had the strongest weight and optic flow-based speed regulation the lowest weight. The authors linked their behavioral results to visual scene statistics in the hawkmoths' natural environment. The partition of ventral and dorsal visuomotor pathways is well in line with differences in visual cue frequencies. The response hierarchy, however, seems to be dominated by dorsal features that are less frequent, but presumably highly relevant for the animals' flight safety.

      Strengths

      The data are very interesting and unique. The manuscript provides a thorough analysis of free-flight behavior in a non-model organism that is extremely interesting for comparative reasons (and on its own). These data are both difficult to obtain and very valuable to the field.

      Weaknesses

      While the present manuscript clearly goes beyond Bigge et al, 2021, the advance could have perhaps been even stronger with a more fine-grained investigation of the visual responses in the dorsal visual field. Do hawkmoths, for example, show optomotor responses to rotational optic flow in the dorsal visual field?

      I find the majority of the data, which are also the data supporting the main claims of the paper, compelling. However, the measurements of flight height are less solid than the rest and I think these data should be interpreted more carefully.

    4. Reviewer #3 (Public review):

      The authors have significantly improved the paper in revising to make its contributions distinct from their prior paper. They have also responded to my concerns about quantification and parameter dependency of the integration conclusion. While I think there is still more that could be done in this capacity, especially in terms of the temporal statistics and quantification of the conflict responses, they have made a case for the conclusions as stated. The paper still stands as an important paper with solid evidence a bit limited by these concerns.

      [Editors' note: Due to very minor revisions, the paper was not sent to reviewers for an additional round of review.]

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The edits have significantly improved the clarity of the manuscript. A few small notes:

      Figure 2B legend - describe what the orange dashed line represents

      We added a description.

      Figure 2B legend - references Table 1 but I believe this should reference Table S1. There are other places in the manuscript where Table 1 is referenced and it should reference S1

      We changed this for all instances in the main paper and supplement, where the reference was wrong.

      Figure S1 legend - some figure panel letters are in parentheses while others are not

      We unified the notation to not use parentheses for any of the panel letters.

      Reviewer #2 (Recommendations for the authors):

      I couldn't find the l, r, d, v indications in Fig. 1a. This was just a suggestion, but since you wrote you added them, I was wondering if this is the old figure version.

      We added them to what is now Fig. 2, which was originally part of Fig. 1. After restructuring, we did indeed not add an additional set to Fig. 1, which we have now adjusted.

      Fig. 2: Adding 'optic flow' and 'edges' to the y-axis in panels E and F, would make it faster for me to parse the figure. Maybe also add the units for the magnitudes? Same for Figure 6B

      We added 'optic flow' and 'edges' to the panels E and F in Fig. 2 and Fig. 6.

      Fig. 2: Very minor - could you use the same pictograms in D and E&F (i.e. all circles for example, instead of switching to "tunnels" in EF)?

      We used the tunnel pictograms, because we associated those with the short notations for the different conditions summarised in Table S1. Because we wanted to keep this consistent across the paper, we used the “tunnel” pictograms here too.

      In the manuscript, you still draw lots of conclusions based on these area measurements (L132-142, L204-209 etc). This does not fully reflect what you wrote in your reply to the reviewers. If you think of these measurements as qualitative rather than quantitative, I would say so in the manuscript and not use quantitative statistics etc. My suggestion would be to be more specific about potential issues that can influence the measurement (you mentioned body size, image contrast, motion blur, pitch across conditions etc) and give that data not the same weight as the rest of the measurements.

      We do express explicit caution with this measure in the methods section (l. 657-659) and the results section (l. 135-137). Nevertheless, as the trends in the data are consistent with optic flow responses in the other planes, and with responses reported in the literature, we felt that it is valuable to report the data, as well as the statistics, for all readers, who can, given our cautionary statement, assess the data accordingly.

      The area measurements suggest that moths fly lower with unilateral vertical gratings (Fig. S1, G1 and G2 versus the rest). If you leave the data in can you speculate why that would be? (Sorry if I missed that)

      We agree, this seems quite consistent, but we do not have a good explanation for this observation. It would certainly require some additional experiments and variable conditions to understand what causes this phenomenon.

      Fig.4 - is panel B somehow flipped? Shouldn't the flight paths start out further away from the grating and then be moved closer to midline (as in A). That plot shows the opposite.

      Absolutely right, thank you for spotting this, it was indeed an intermediate and not the final figure which was uploaded to the manuscript. It also had outdated letter-number identifiers, which we now updated.

      L198 - should be "they avoided"

      Corrected.

    1. eLife Assessment

      By combining the 'pinging' technique with fMRI-based multivariate pattern analysis, this important study provides convincing evidence for a dual-format attentional representation during the preparatory period. The result reconciles the competing views between the sensory-like versus non-sensory accounts of the attentional template and advances our understanding of how the brain flexibly utilizes different versions of the template to guide attention. This work will be of interest to researchers in psychology, vision science, and cognitive science.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of the experiment reported in this paper is to examine the nature of the representation of a template of an upcoming target. To this end, participants were presented with compound gratings (consisting of tilted to the right and tilted to the left lines) and were cued to a particular orientation - red left tilt or blue right tilt (counterbalanced across participants). There are two directly compared conditions: (i) no ping: where there was a cue, that was followed by a 5.5-7.5s delay, then followed by a target grating in which the cued orientation deviated from the standard 45 degrees; and (ii) ping condition in which all aspects were the same with the only difference that a ping (visual impulse presented for 100ms) was presented after the 2.5 seconds following the cue. There was also a perception task in which only the 45 degrees to the right or to the left lines were presented. It was observed that during the delay, only in the ping condition, were the authors able to decode the orientation of the to-be-reported target using the cross-task generalization. Attention decoding, on the other hand, was decoded in both ping and non-ping conditions. It is concluded that the visual system has two different functional states associated with a template during preparation: a predominantly non-sensory representation for guidance and a latent sensory-like for prospective stimulus processing.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative - the cross-task decoding, the use of Mahalanobis distance as a function of representational similarity, the fact that the question is theoretically interesting, and the excellent figures.

      Weaknesses:

      While I think that this is an interesting study that addresses an important theoretical question, I have several concerns about the experimental paradigm and the subsequent conclusions that can be drawn.

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that the only difference where the two states differ is V1.

      (5) My primary concern about the interpretation of the finding is that the result, differences in cross-task decoding within V1 between the ping and no-ping condition might simply be explained by the fact that the ping condition refocuses attention during the long delay thus "resharpening" the template. In the no-ping condition during the 5.5 to 7.5 seconds long delay, attention for orientation might start getting less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages, rather it is the difference between a decaying template representation and a representation during the refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      Comments on revisions:

      I am impressed with the thoroughness with which the authors addressed my concerns. I don't have any further concerns and think that this paper makes an interesting and significant contribution to our understanding of VWM. I would only suggest adding citations to the newly added paragraph where the authors state "It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance." They could cite work by Bettencourt and Xu, 2016; and Sheremata, Somers, and Shomstein (2018).

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, the authors investigated the nature of attentional templates during the preparatory period of goal-directed attention. By combining the use of 'pinging' the neural activity with a visual impulse and fMRI-based multivariate decoding, the authors found that the nature of the neural representations of the prospective feature target during the preparatory period was contingent on the presence of the 'pinging' impulse. While the preparatory representations contained highly similar information content as the perceptual representations when the pinging impulse was introduced, they fundamentally differed from perceptual representations in the absence of the pinging impulse. Based on these findings, the authors proposed a dual-format mechanism in which both a "non-sensory" template and a latent "sensory" template coexisted during attentional preparation. The former actively guides activity in the preparatory state, and the latter is utilized for future stimulus processing.

      Strengths:

      Overall, I think that the authors' revision has addressed most, if not all, of my major concerns noted in my previous comments.

      Weaknesses:

      The results appear convincing and I do not have additional comments.

    4. Reviewer #3 (Public review):

      This paper discusses how non-sensory and latent, sensory-like attentional templates are represented during attentional preparation. Using multivariate pattern analysis, they found that visual impulses can enhance the decoding generalization from perception to attention tasks in the preparatory stage in the visual cortex. Furthermore, the emergence of the sensory-like template coincided with enhanced information connectivity between V1 and frontoparietal areas and was associated with improved behavioral performance. It is an interesting paper with supporting evidence for the latent, sensory-like attentional template.

      (1) The authors addressed most of my previous concerns and provided additional data analysis. They conducted further analyses to demonstrate that the observed changes in network communication are associated with behavioral RTs, supporting the idea that the impulse-driven sensory-like template enhances informational connectivity between sensory and frontoparietal areas, and relates to behavior.

      (2) I would like to further clarify my previous points regarding the definition of the two types of templates and the evidence for their coexistence. The authors stated that the sensory-like template likely existed in a latent state and was reactivated by visual pings, proposing that sensory and non-sensory templates coexist. However, it remains unclear whether this reflects a dynamic switch between formats or true coexistence. If the templates are non-sensory in nature, what exactly do they represent? Are they meant to be abstract or conceptual representations, or, put simply, just "top-down attentional information"? If so, why did the generalization analyses (training classifiers on activity during the stimulus selection period and testing on preparatory activity) fail to yield significant results? While the stimulus selection period necessarily encodes both target and distractor information, it should still contain attentional information. I would appreciate more discussion from this perspective.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Why was V1 separated from the rest of the visual cortex, and why the rest of the areas were simply lumped into an EVC ROI? It would be helpful to understand the separation into ROIs.

      We thank the reviewer for raising the concerns regarding the definition of ROI. Our approach to analyze V1 separately was based on two key considerations. First, previous studies consistently identify V1 as the main locus of sensory-like templates during feature-specific preparatory attention (Kok et al., 2014; Aitken et al., 2020). Second, V1 shows the strongest orientation selectivity within the visual hierarchy (Priebe, 2016). In contrast, the extrastriate visual cortex (EVC; comprising V2, V3, V3AB and V4) demonstrates broader selectivity, such as for complex features like contour and texture (Grill-Spector & Malach, 2004). Thus, we think it would be particularly informative to analyze V1 data separately as our experiment examines orientation-based attention. We should also note that we conducted MVPA separately for each visual ROI (V2, V3, V3AB and V4). After observing similar patterns of results across these regions, we averaged the decoding accuracies into a single value and labeled it as EVC. This approach allowed us to simplify data presentation while preserving the overall data pattern in decoding performance. We have now added the related explanations on the ROI definition in the revised texts (Page 26; Line 576-581).

      (2) It would have been helpful to have a behavioral measure of the "attended" orientation to show that participants in fact attended to a particular orientation and were faster in the cued condition. The cue here was 100% valid, so no such behavioral measure of attention is available here.

      We thank the reviewer for the comments. We agree that including valid and neutral cue trials would have provided valuable behavioral measures of attention; yet, our current design was aimed at maximizing the number of trials for decoding analysis due to fMRI time constraints. Thus, we could not fit additional conditions to measure the behavioral effects of attention. However, we note that in our previous studies using a similar feature cueing paradigm, we observed benefits of attentional cueing on behavioral performance when comparing valid and neutral conditions (Liu et al., 2007; Jigo et al., 2018). Furthermore, our neural data indeed demonstrated attention-related modulation (as indicated by MVPA results, Fig. 2 in the main texts), so we are confident that on average participants followed the instruction and deployed their attention accordingly. We have now added the related explanations on this point in the revised texts (Page 23; Line 492-498).

      (3) As I was reading the manuscript I kept thinking that the word attention in this manuscript can be easily replaced with visual working memory. Have the authors considered what it is about their task or cognitive demand that makes this investigation about attention or working memory?

      We thank the reviewer for this comment. We added the following extensive discussion on this point in the revised texts (Page 18; Line 363-381).

      “It could be argued that preparatory attention relies on the same mechanisms as working memory maintenance. While these functions are intuitively similar and likely overlap, there is also evidence indicating that they can be dissociated (Battistoni et al., 2017). In particular, we note that in our task, attention is guided by symbolic cues (color-orientation associations), while working memory tasks typically present the actual visual stimulus as the memorandum. A central finding in working memory studies is that neural signals during WM maintenance are sensory in nature, as demonstrated by generalizable neural activity patterns from stimulus encoding to maintenance in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019). However, in our task, neural signals during preparation were nonsensory, as demonstrated by a lack of such generalization in the No-Ping session (see also Gong et al., 2022). We believe that the differences in cue format and task demand in these studies may account for such differences. In addition to the difference in the sensory nature of the preparatory versus delay-period activity, our ping-related results also exhibited divergence from working memory studies (Wolff et al., 2017; 2020). While these studies used the visual impulse to differentiate active and latent representations of different items (e.g., attended vs. unattended memory item), our study demonstrated the active and latent representations of a single item in different formats (i.e., non-sensory vs. sensory-like). Moreover, unlike our study, the impulse did not evoke sensory-like neural patterns during memory retention (Wolff et al., 2017). These observations suggest that the cognitive and neural processes underlying preparatory attention and working memory maintenance could very well diverge. Future studies are necessary to delineate the relationship between these functions both at the behavioral and neural level.”

      (4) If I understand correctly, the only ROI that showed a significant difference for the cross-task generalization is V1. Was it predicted that only V1 would have two functional states? It should also be made clear that the only difference where the two states differ is V1.

      We thank the reviewer for this comment. We would like to clarify that our analyses revealed similar patterns of preparatory attentional representations in V1 and EVC. During the Ping session, the cross-task generalization analyses revealed decodable information in both V1 and EVC (ps < 0.001), significantly higher than that in the No-Ping session for V1 (independent t-test: t(38) = 3.145, p = 0.003, Cohen’s d = 0.995) and EVC (independent t-test: t(38) = 2.153, p = 0.038, Cohen’s d = 0.681) (Page 10; Line 194-196). While both areas maintained similar representations, additional measures (Mahalanobis distance, neural-behavior relationship and connectivity changes) showed more robust ping-evoked changes in V1 compared to EVC. This differential pattern likely reflects the primary role of V1 in orientation processing, with EVC showing a similar but weaker response profile. We have revised the text to clarify this point (Page 16; Line 327-329).
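
      As a reader's consistency check on the effect sizes quoted above, the sketch below recomputes Cohen's d from each independent-samples t statistic using the standard pooled-variance identity d = t * sqrt(1/n1 + 1/n2). The group size of 20 per session is an assumption inferred from the reported degrees of freedom (df = 38); it is not stated in this passage, and the function name is purely illustrative.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by an independent-samples t statistic (pooled SD)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Assumption: 20 participants per session, inferred from df = n1 + n2 - 2 = 38.
N_PER_SESSION = 20

d_v1 = cohens_d_from_t(3.145, N_PER_SESSION, N_PER_SESSION)
d_evc = cohens_d_from_t(2.153, N_PER_SESSION, N_PER_SESSION)

print(f"V1:  d = {d_v1:.3f}")   # reported value: 0.995
print(f"EVC: d = {d_evc:.3f}")  # reported value: 0.681
```

      Under that assumption, both recomputed values agree with the reported effect sizes to three decimals.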

      (5) My primary concern about the interpretation of the finding is that the result, differences in cross-task decoding within V1 between the ping and no-ping condition might simply be explained by the fact that the ping condition refocuses attention during the long delay thus "resharpening" the template. In the no-ping condition during the 5.5 to 7.5 seconds long delay, attention for orientation might start getting less "crisp." In the ping condition, however, the ping itself might simply serve to refocus attention. So, the result is not showing the difference between the latent and non-latent stages, rather it is the difference between a decaying template representation and a representation during the refocused attentional state. It is important to address this point. Would a simple tone during the delay do the same? If so, the interpretation of the results will be different.

      We thank the reviewer for this comment. The reviewer proposed an alternative account suggesting that visual pings may function to refocus attention, rather than reactivate latent information during the preparatory period. If this account holds (i.e., attention became weaker in the no-ping condition and it was strengthened by the ping due to re-focusing), we would expect to observe a general enhancement of attentional decoding during the preparatory period. However, our data reveal no significant differences in overall attention decoding between the two conditions during this period (ps > 0.519; BF<sub>excl</sub> > 3.247), arguing against such a possibility.

      The reviewer also raised an interesting question about whether an auditory tone during preparation could produce effects similar to those observed with visual pings. Although our study did not directly test this possibility, existing literature provides some relevant evidence. In particular, prior studies have shown that latent visual working memory contents are selectively reactivated by visual impulses, but not by auditory stimuli (Wolff et al., 2020). This finding supports the modality-specificity for visually encoded contents, suggesting that sensory impulses must match the representational domain to effectively access latent visual information, which also argues against the refocusing hypothesis above. However, we do think that this is an important question that merits direct investigation in future studies. We have now added the related discussion on this point in the revised texts (Page 10, Line 202-203; Page 19, Line 392-395).

      (6) The neural pattern distances measured using Mahalanobis values are really great! Have the authors tried to use all of the data, rather than the high AMI and low AMI to possibly show a linear relationship between response times and AMI?

      We thank the reviewer for this comment. We took the reviewer’s suggestion to explore the relationship between attentional modulation index (AMI) and RTs across participants for each session (see Figure 3). In the No-Ping session, we observed no significant correlation between AMI and RT (r = -0.366, p = 0.113). By contrast, the same analysis in the Ping condition revealed a significantly negative correlation (r = -0.518, p = 0.019). These results indicate that the attentional modulations evoked by the visual impulse were associated with faster RTs, supporting the functional relevance of activating sensory-like representations during preparation. We have now included these inter-subject correlations in the main texts (Page 13, Line 258-264; Fig 3D and 3E) along with within-subject correlations in the Supplementary Information (Page 6, Line 85-98; S3 Fig).
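
      For readers who want to see how the two correlations relate to their significance levels, the sketch below converts each Pearson r into the equivalent t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with the two-tailed critical value for df = 18. The sample size of 20 per session is an assumption (inferred from the degrees of freedom reported elsewhere in this response), and the helper name is hypothetical.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) equivalent to a Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Assumption: 20 participants per session (not stated in this passage).
N = 20
# Two-tailed critical t at alpha = 0.05 for df = 18.
T_CRIT_DF18 = 2.101

t_no_ping = t_from_r(-0.366, N)
t_ping = t_from_r(-0.518, N)

for label, t in (("No-Ping", t_no_ping), ("Ping", t_ping)):
    verdict = "significant" if abs(t) > T_CRIT_DF18 else "not significant"
    print(f"{label}: t(18) = {t:.2f} ({verdict} at alpha = 0.05)")
```

      With this assumed sample size, only the Ping-session correlation exceeds the critical value, in line with the reported p values.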

      (7) After reading the whole manuscript I still don't understand what the authors think the ping is actually doing, mechanistically. I would have liked a more thorough discussion, rather than referencing previous papers (all by the co-author).

      We thank the reviewer for this comment regarding the mechanistic basis of visual pings. We agree that this warrants deeper discussion. One possibility, as informed by theoretical studies of working memory, is that the sensory-like template could be maintained via an “activity-silent” mechanism through short-term changes in synaptic weights (Mongillo et al., 2008). In this framework, a visual impulse may function as a nonspecific input that momentarily converts latent traces into detectable activity patterns (Rademaker & Serences, 2017). Related to our findings, it is unlikely that the orientation-specific templates observed during the Ping session emerged from purely non-sensory representations and were entirely induced by an exogenous ping, which was devoid of any orientation signal. Instead, the more parsimonious explanation is that the visual impulse reactivated pre-existing latent sensory signals. To our knowledge, the detailed circuit-level mechanism of such reactivation is still unclear; existing evidence only suggests a relationship between ping-evoked inputs and the neural output (Wolff et al., 2017; Fan et al., 2021; Duncan et al., 2023). We have now included the discussion on this point in the main texts (Page 19, Line 383-401).

      Reviewer #2 (Public review):

      (1) The origin of the latent sensory-like representation. By 'pinging' the neural activity with a high-contrast, task-irrelevant visual stimulus during the preparation period, the authors identified the representation of the attentional feature target that contains the same information as perceptual representations. The authors interpreted this finding as a 'sensory-like' template is inherently hosted in a latent form in the visual system, which is revealed by the pinging impulse. However, I am not sure whether such a sensory-like template is essentially created, rather than revealed, by the pinging impulses. First, unlike the classical employment of the pinging technique in working memory studies, the (latent) representation of the memoranda during the maintenance period is undisputed because participants could not have performed well in the subsequent memory test otherwise. However, this appears not to be the case in the present study. As shown in Figure 1C, there was no significant difference in behavioral performance between the ping and the no-ping sessions (see also lines 110-125, pg. 5-6). In other words, it seems to me that the subsequent attentional task performance does not necessarily rely on the generation of such sensory-like representations in the preparatory period and that the emergence of such sensory-like representations does not facilitate subsequent attentional performance either. In such a case, one might wonder whether such sensory-like templates are really created, hosted, and eventually utilized during the attentional process. Second, because the reference orientations (i.e. 45 degrees and 135 degrees) have remained unchanged throughout the experiment, it is highly possible that participants implicitly memorized these two orientations as they completed more and more trials. In such a case, one might wonder whether the 'sensory-like' templates are essentially latent working memory representations activated by the pinging as was reported in Wolff et al. (2017), rather than a functional signature of the attentional process.

      We thank the reviewer for this comment. We agree that the question of whether the sensory-like template is created or merely revealed by visual pinging is crucial for understanding our findings. First, we acknowledge that our task may not be optimized for detecting changes in accuracy, as the task difficulty was controlled using individually adjusted thresholds (i.e., angular difference). Nevertheless, we observed some evidence supporting the neural-behavioral relationships. In particular, the impulse-driven sensory-like template in V1 was associated with faster RTs during stimulus selection (Page 12, Fig. 3D and 3E in the main texts; also see our response to R1, Point 6).

      Second, the reviewer raised an important concern about whether the attended feature might be stored in the memory system due to the trial-by-trial repetition of attention conditions (attend 45º or attend 135º). Although this is plausible, we don’t think it is likely. We note that neuroimaging evidence shows that attended working memory contents maintain sensory-like representations in visual cortex (Harrison & Tong, 2009; Serences et al., 2009; Rademaker et al., 2019), with generalizable neural activity patterns from perception to working memory delay-period, whereas unattended items in multi-item working memory tasks are stored in a latent state for prospective use (Wolff et al., 2017). Importantly, our task only required maintaining a single attentional template at a time. Thus, there was no need to store it via latent representations, if participants simply used a working memory mechanism for preparatory attention. Had they done so, we should expect to find evidence for a sensory template, i.e., generalizable neural pattern between perception and preparation in the No-Ping condition, which was not what we found. We have mentioned this point in the main texts (Page 18, Line 367-372).

      (2) The coexistence of the two types of attentional templates. The authors interpreted their findings as the outcome of a dual-format mechanism in which 'a non-sensory template' and a latent 'sensory-like' template coexist (e.g. lines 103-106, pg. 5). While I find this interpretation interesting and conceptually elegant, I am not sure whether it is appropriate to term it 'coexistence'. First, it is theoretically possible that there is only one representation in either session (i.e. a non-sensory template in the no-ping session and a sensory-like template in the ping session) in any of the brain regions considered. Second, it seems that there is no direct evidence concerning the temporal relationship between these two types of templates, provided that they commonly emerge in both sessions. Besides, due to the sluggish nature of fMRI data, it is difficult to tell whether the two types of templates temporally overlap.

      We thank the reviewer for the comment regarding our interpretation of the ‘coexistence’ of non-sensory and sensory-like attentional templates. While we acknowledge the limitations of fMRI in resolving temporal relationships between these two types of templates, several aspects of our data support a dual-format interpretation.

      First, our key findings remained consistent for the subset of participants (N=14) who completed both No-Ping and Ping sessions in counterbalanced order. It thus seems improbable that participants systematically switched cognitive strategies (e.g., using non-sensory templates in the No-Ping session versus sensory-like templates in the Ping session) in response to the task-irrelevant, uninformative visual impulse. Second, while we agree with the reviewer that the temporal dynamics between these two templates remain unclear, it is difficult to imagine that orientation-specific templates observed during the Ping session emerged de novo from a purely non-sensory template and an exogenous ping. In other words, if there is no orientation information at all to begin with, how does it come into being from an orientation-less external ping? It seems to us that the more parsimonious explanation is that there was already some orientation signal in a latent format, and it was activated by the ping, in line with the models of “activity-silent” working memory. To address these concerns, we have added the related discussion of these alternative interpretations in the main texts (Page 19, Line 387-391).

      (3) The representational distance. The authors used Mahalanobis distance to quantify the similarity of neural representation between different conditions. According to the authors' hypothesis, one would expect greater pattern similarity between 'attend leftward' and 'perceived leftward' in the ping session in comparison to the no-ping session. However, this appears not to be the case. As shown in Figures 3B and C, there was no major difference in Mahalanobis distance between the two sessions in either ROI and the authors did not report a significant main effect of the session in any of the ANOVAs. Besides, in all the ANOVAs, the authors reported only the statistic term corresponding to the interaction effect without showing the descriptive statistics related to the interaction effect. It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective and intuitive understanding of their data.

      We thank the reviewer for this comment. We expected greater pattern similarity between 'attend leftward' and 'perceived leftward' in the Ping session in comparison to the No-Ping session. This prediction was supported by a significant three-way interaction effect between session × attended orientation × perceived orientation (F(1,38) = 5.00, p = 0.031, η<sub>p</sub><sup>2</sup> = 0.116). In particular, there was a significant interaction between attended orientation × perceived orientation (F(1,19) = 9.335, p = 0.007, η<sub>p</sub><sup>2</sup> = 0.329) in the Ping session, but not in the No-Ping session (F(1,19) = 0.017, p = 0.898, η<sub>p</sub><sup>2</sup> = 0.001). The above-mentioned statistical results were reported in the original texts. In addition, this three-way mixed ANOVA (session × attended orientation × perceived orientation) on Mahalanobis distance in V1 revealed no significant main effects (session: F(1,38) = 0.009, p = 0.923, η<sub>p</sub><sup>2</sup> < 0.001; attended orientation: F(1,38) = 0.116, p = 0.735, η<sub>p</sub><sup>2</sup> = 0.003; perceived orientation: F(1,38) = 1.106, p = 0.300, η<sub>p</sub><sup>2</sup> = 0.028). We agree with the reviewer that a complete reporting of analyses enhances understanding of the data. Therefore, we have now included the main effects in the main texts (Page 11, Line 233).

      We thank the reviewer for the suggestion regarding the inclusion of descriptive statistics for interaction effects. However, since the data were already visualized in Fig. 3B and 3C in the main texts, to maintain conciseness and consistency with the reporting style of other analyses in the texts, we have opted to include these statistics in the Supplementary Information (Page 5, Table 1).
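
      The partial eta squared values quoted in this response can be recovered from the F statistics and their degrees of freedom through the identity η<sub>p</sub><sup>2</sup> = (F × df_effect) / (F × df_effect + df_error). The short sketch below is offered only as a reader's check of that arithmetic; the function name is illustrative and not taken from the authors' analysis code.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (label, F, df_effect, df_error) as quoted in the response above.
reported = [
    ("three-way interaction", 5.00, 1, 38),
    ("Ping: attended x perceived", 9.335, 1, 19),
    ("No-Ping: attended x perceived", 0.017, 1, 19),
    ("main effect of perceived orientation", 1.106, 1, 38),
]

effect_sizes = {
    label: partial_eta_squared(f, df1, df2) for label, f, df1, df2 in reported
}

for label, eta in effect_sizes.items():
    print(f"{label}: partial eta squared = {eta:.3f}")
```

      Each recomputed value rounds to the figure given in the text (0.116, 0.329, 0.001 and 0.028, respectively).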

      Reviewer #3 (Public review):

      (1) The title is "Dual-format Attentional Template," yet the supporting evidence for the nonsensory format and its guiding function is quite weak. The author could consider conducting further generalization analysis from stimulus selection to preparation stages to explore whether additional information emerges.

      We thank the reviewer for this comment. Our approach to investigating whether preparatory attention is encoded in a sensory or non-sensory format, namely training classifiers on separate runs of a perception task, closely followed methods from previous studies (Stokes et al., 2009; Peelen & Kastner, 2011; Kok et al., 2017). Following the reviewer’s suggestion, we performed generalization analyses by training classifiers on activity during the stimulus selection period and testing them on preparatory activity. However, we observed no significant generalization effects in either the No-Ping or the Ping session (ps > 0.780). This null result may stem from a key difference in the neural representations: classifiers trained on neural activity from the stimulus selection period necessarily encode both target and distractor information, and thus rely on somewhat different information than classifiers trained exclusively on isolated target information in the perception task.

      (2) In Figure 2, the author did not find any decodable sensory-like coding in IPS and PFC, even during the impulse-driven session, indicating that these regions do not represent sensory-like information. However, in the final section, the author claimed that the impulse-driven sensory-like template strengthens informational connectivity between sensory and frontoparietal areas. This raises a question: how can we reconcile the lack of decodable coding in these frontoparietal regions with the reported enhancement in network communication? It would be helpful if the author provided a clearer explanation or additional evidence to bridge this gap.

      We thank the reviewer for this comment. We would like to clarify that although we did not observe sensory-like coding during preparation in frontoparietal areas, we did observe attentional signals in these regions, as evidenced by the above-chance within-task attention decoding performance (Fig. 2 in the main text). This could reflect different neural codes in different areas, and suggests that inter-regional communication does not necessarily require identical representational formats. It seems plausible that the representation of a non-sensory attentional template in frontoparietal areas supports top-down attentional control, consistent with theories suggesting increasing abstraction as the cortical hierarchy ascends (Badre, 2008; Brincat et al., 2018), and that its interaction with the sensory representation in the visual areas is enhanced by the visual impulse.

      (3) Given that the impulse-driven sensory-like template facilitated behavior, the author proposed that it might also enhance network communication. Indeed, they observed changes in informational connectivity. However, it remains unclear whether these changes in network communication have a direct and robust relationship with behavioral improvements.

      We thank the reviewer for the suggestion. To examine how network communication relates to behavior, we performed a correlation analysis between informational connectivity (IC) and RTs across participants (see Figure S5). We observed a trend toward a correlation between V1-PFC connectivity and RTs in the Ping session (r = -0.394, p = 0.086), but not in the No-Ping session (r = -0.046, p = 0.846). No significant correlations were found between V1-IPS connectivity and RTs (ps > 0.400) or between ICs and accuracy (ps > 0.399). These results suggest that ping-enhanced connectivity might have contributed to facilitated responses. Although we may not have sufficient statistical power to warrant a strong conclusion, we think this result is still highly suggestive, so we have now added this analysis to the Supplementary Information (Page 8, Line 116-121; S5 Fig) and mentioned the result in the main text (Page 14, Line 292-293).
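
      The reported correlations can be sanity-checked by converting each r to a t statistic with n - 2 degrees of freedom and reading off a two-tailed p-value. The sketch below does this with the standard library only. It assumes n = 20 participants per session, which is inferred from the within-session F(1,19) tests quoted earlier in this response and is not stated alongside the correlations; the function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * central_half

def r_to_t(r, n):
    """t statistic for a Pearson correlation r computed from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 20  # assumption: participants per session, inferred from F(1,19)
p_ping = two_tailed_p(r_to_t(-0.394, n), n - 2)
p_noping = two_tailed_p(r_to_t(-0.046, n), n - 2)
print(round(p_ping, 3), round(p_noping, 3))
```

      Under that assumption the recomputed p-values land close to the reported 0.086 and 0.846, which suggests the correlations were computed across roughly 20 participants in each session.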

      (4) I'm uncertain about the definition of the sensory-like template in this paper. Is it referring to the Ping impulse-driven condition or the decodable performance in the early visual cortex? If it is the former, even in working memory, whether pinging identifies an activity-silent mechanism is currently debated. If it's the latter, the authors should consider whether a causal relationship - such as "activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas" - is reasonable.

      We apologize for the confusion. The sensory-like template by itself does not directly refer to representations in the Ping session or to attentional decoding in early visual cortex. Instead, it pertains to the representational format of attentional signals during preparation. Specifically, its existence is inferred from cross-task generalization, where neural patterns from a perception task (perceive 45º or perceive 135º) generalize to an attention task (attend 45º or attend 135º). We think this is a reasonable and accepted operational definition of the representational format. Our findings suggest that the sensory-like template likely existed in a latent state and was reactivated by visual pings, aligning more closely with the first account raised by the reviewer.

      We agree with the reviewer that whether pinging identifies an activity-silent mechanism is currently debated (Schneegans & Bays, 2017; Barbosa et al., 2021). It is possible that the visual impulse amplified a subtle but active representation of the sensory template during attentional preparation, resulting in decodable performance in visual cortex. Distinguishing between these two accounts likely requires neurophysiological measurements, which are beyond the scope of the current study. We have explicitly addressed this limitation in our Discussion (Page 19, Line 395-399).

      Nevertheless, the latent sensory-like template account remains plausible for three reasons. First, our interpretation aligns with theoretical frameworks proposing that the brain maintains more veridical, detailed target templates than those typically utilized for guiding attention (Wolfe, 2021; Yu et al., 2023). Second, this explanation is consistent with the proposed utility of latent working memory for prospective use, as maintaining a latent sensory-like template during preparation would be useful for subsequent stimulus selection. The latter point was further supported by the analysis prompted by the reviewer’s question of whether “activating the sensory-like template strengthens the informational connectivity between sensory and frontoparietal areas” is reasonable. Our additional analyses (also refer to our response to Reviewer 3, Point 3) suggested that impulse-enhanced V1-PFC connectivity was associated with a trend toward faster behavioral responses (r = -0.394, p = 0.086; see Supplementary Information, Page 8, Line 116-121; S5 Fig). Considering these findings in totality, we think it is reasonable to suggest that the visual impulse may strengthen information flow among areas to enhance attentional control.

      Recommendation for the Authors:

      Reviewer #1 (Recommendation for the authors):

      I hate to suggest another fMRI experiment, but in order to make strong claims about two states, I would want to see the methodological and interpretation confounds addressed. Ping condition - would a tone lead to the same result of sharpening the template? If so, then why? Can a ping be manipulated in its effectiveness? That would be an excellent manipulation condition.

      We thank the reviewer for the comments. Please refer to our reply to Reviewer 1, Point 5 for detailed explanation.

      Reviewer #2 (Recommendation for the authors):

      It is strongly advised that these descriptive statistics related to the interaction effect should be included to facilitate a more effective understanding of their data.

      We thank the reviewer for the comments. We have now included the relevant descriptive statistics in the Supplementary Information, Table 1.

      Reviewer #3 (Recommendation for the authors):

      In addition to p-values, I see many instances of 'ps'. Does this indicate the plural form of p?

      We used ‘ps’ to denote the minimal p-value across multiple statistical analyses, such as when applying identical tests to different region groups.

      References

      Aitken, F., Menelaou, G., Warrington, O., Koolschijn, R. S., Corbin, N., Callaghan, M. F., & Kok, P. (2020). Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortex. PLoS Biology, 18(12), e3001023.

      Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193-200.

      Barbosa, J., Lozano-Soldevilla, D., & Compte, A. (2021). Pinging the brain with visual impulses reveals electrically active, not activity-silent, working memories. PLoS Biology, 19(10), e3001436.

      Battistoni, E., Stein, T., & Peelen, M. V. (2017). Preparatory attention in visual cortex. Annals of the New York Academy of Sciences, 1396(1), 92-107.

      Brincat, S. L., Siegel, M., von Nicolai, C., & Miller, E. K. (2018). Gradual progression from sensory to task-related processing in cerebral cortex. Proceedings of the National Academy of Sciences, 115(30), E7202-E7211.

      Duncan, D. H., van Moorselaar, D., & Theeuwes, J. (2023). Pinging the brain to reveal the hidden attentional priority map using encephalography. Nature Communications, 14(1), 4749.

      Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27(1), 649-677.

      Gong, M., Chen, Y., & Liu, T. (2022). Preparatory attention to visual features primarily relies on nonsensory representation. Scientific Reports, 12(1), 21726.

      Fan, Y., Han, Q., Guo, S., & Luo, H. (2021). Distinct Neural Representations of Content and Ordinal Structure in Auditory Sequence Memory. Journal of Neuroscience, 41(29), 6290–6303.

      Harrison, S. A., & Tong, F. (2009). Decoding reveals the contents of visual working memory in early visual areas. Nature, 458(7238), 632-635.

      Jigo, M., Gong, M., & Liu, T. (2018). Neural determinants of task performance during feature-based attention in human cortex. eNeuro, 5(1).

      Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546-1554.

      Kok, P., Mostert, P., & De Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences, 114(39), 10473-10478.

      Liu, T., Stevens, S. T., & Carrasco, M. (2007). Comparing the time course and efficacy of spatial and feature-based attention. Vision Research, 47(1), 108-113.

      Mongillo, G., Barak, O., & Tsodyks, M. (2008). Synaptic theory of working memory. Science, 319(5869), 1543-1546.

      Peelen, M. V., & Kastner, S. (2011). A neural basis for real-world visual search in human occipitotemporal cortex. Proceedings of the National Academy of Sciences, 108(29), 12125-12130.

      Priebe, N. J. (2016). Mechanisms of orientation selectivity in the primary visual cortex. Annual Review of Vision Science, 2(1), 85-107.

      Rademaker, R. L., & Serences, J. T. (2017). Pinging the brain to reveal hidden memories. Nature Neuroscience, 20(6), 767-769.

      Rademaker, R. L., Chunharas, C., & Serences, J. T. (2019). Coexisting representations of sensory and mnemonic information in human visual cortex. Nature Neuroscience, 22(8), 1336-1344.

      Serences, J. T., Ester, E. F., Vogel, E. K., & Awh, E. (2009). Stimulus-specific delay activity in human primary visual cortex. Psychological Science, 20(2), 207-214.

      Schneegans, S., & Bays, P. M. (2017). Restoration of fMRI decodability does not imply latent working memory states. Journal of Cognitive Neuroscience, 29(12), 1977-1994.

      Stokes, M., Thompson, R., Nobre, A. C., & Duncan, J. (2009). Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proceedings of the National Academy of Sciences, 106(46), 19569-19574.

      Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060-1092.

      Wolff, M. J., Jochim, J., Akyürek, E. G., & Stokes, M. G. (2017). Dynamic hidden states underlying working-memory-guided behavior. Nature Neuroscience, 20(6), 864-871.

      Wolff, M. J., Kandemir, G., Stokes, M. G., & Akyürek, E. G. (2020). Unimodal and bimodal access to sensory working memories by auditory and visual impulses. Journal of Neuroscience, 40(3), 671-681.

      Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E., & Geng, J. J. (2023). Good-enough attentional guidance. Trends in Cognitive Sciences, 27(4), 391-403.

    1. eLife Assessment

      This study aims to clarify the effects of cochlear neural degeneration on auditory processing in listeners with normal audiograms (sometimes referred to as 'hidden hearing loss'). The authors provide important new data demonstrating associations between cochlear neural degeneration, non-invasive assays of auditory processing, and speech perception. Based on a cross-species comparison, the findings provide compelling evidence that cochlear synaptopathy is associated with a significant part of hearing difficulties in complex environments.

    2. Reviewer #1 (Public review):

      This study is part of an ongoing effort to clarify the effects of cochlear neural degeneration (CND) on auditory processing in listeners with normal audiograms. This effort is important because ~10% of people who seek help for hearing difficulties have normal audiograms and current hearing healthcare has nothing to offer them.

      The authors identify two shortcomings in previous work that they intend to fix. The first is a lack of cross-species studies that make direct comparisons between animal models in which CND can be confirmed and humans for which CND must be inferred indirectly. The second is the low sensitivity of purely perceptual measures to subtle changes in auditory processing. To fix these shortcomings, the authors measure envelope following responses (EFRs) in gerbils and humans using the same sounds, while also performing histological analysis of the gerbil cochleae, and testing speech perception while measuring pupil size in the humans.

      The study begins with a comprehensive assessment of the hearing status of the human listeners. The only differences found between the young adult (YA) and middle aged (MA) groups are in thresholds at frequencies > 10 kHz and DPOAE amplitudes at frequencies > 5 kHz. The authors then present the EFR results, first for the humans and then for the gerbils, showing that amplitudes decrease more rapidly with increasing envelope frequency for MA than for YA in both species. The histological analysis of the gerbil cochleae shows that there were, on average, 20% fewer IHC-AN synapses at the 3 kHz place in MA relative to YA, and the number of synapses per IHC was correlated with the EFR amplitude at 1024 Hz.

      The study then returns to the humans to report the results of the speech perception tests and pupillometry. The correct understanding of keywords decreased more rapidly with decreasing SNR in MA than in YA, with a noticeable difference at 0 dB, while pupillary slope (a proxy for listening effort) increased more rapidly with decreasing SNR for MA than for YA, with the largest differences at SNRs between 5 and 15 dB. Finally, the authors report that a linear combination of audiometric threshold, EFR amplitude at 1024 Hz, and a few measures of pupillary slope is predictive of speech perception at 0 dB SNR.

      I only have two questions/concerns about the specific methodologies used:

      (1) Synapse counts were made only at the 3 kHz place on the cochlea. But the EFR sounds were presented at 85 dB SPL, which means that a rather large section of the cochlea will actually be excited. Do we know how much of the EFR actually reflects AN fibers coming from the 3 kHz place? And are we sure that this is the same for gerbils and humans given the differences in cochlear geometry, head size, etc.?

      [Note added after revision: the authors have added new data, references, and discussion that have answered my initial questions].

      (2) Unless I misunderstood, the predictive power of the final model was not tested on held-out data. The standard way to fit and test such a model would be to split the data into two segments, one for training and hyperparameter optimization, and one for testing. But it seems that the only split was for training and hyperparameter optimization.

      [Note added after revision: the authors now make it clear in their response that the modeling tells us how much of the current data can be explained but not necessarily about generalization to other datasets.]
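
      The held-out evaluation described in this point can be sketched as follows. This is an illustration of the general procedure only, not the authors' model: the data are synthetic, a single predictor stands in for the combination of audiometric, EFR, and pupillometry measures, and every name and number in the block is hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Proportion of variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(0)  # fixed seed so the split is reproducible
xs = [float(i) for i in range(40)]            # synthetic predictor
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]  # synthetic outcome with noise

# Shuffle indices once, then hold out 25% of cases that the fit never sees.
idx = list(range(len(xs)))
rng.shuffle(idx)
train_idx, test_idx = idx[:30], idx[30:]

slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
r2_train = r_squared([xs[i] for i in train_idx], [ys[i] for i in train_idx], slope, intercept)
r2_test = r_squared([xs[i] for i in test_idx], [ys[i] for i in test_idx], slope, intercept)
print(r2_train, r2_test)
```

      The training score describes how much of the current data the model explains, which is what the revised manuscript claims; only the held-out score speaks to generalization.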

      While I find the study to be generally well executed, I am left wondering what to make of it all. The purpose of the study with respect to fixing previous methodological shortcomings was clear, but exactly how fixing these shortcomings has allowed us to advance is not. I think we can be more confident than before that EFR amplitude is sensitive to CND, and we now know that measures of listening effort may also be sensitive to CND. But where is this leading us?

      I think what this line of work is eventually aiming for is to develop a clinical tool that can be used to infer someone's CND profile. That seems like a worthwhile goal but getting there will require going beyond exploratory association studies. I think we're ready to start being explicit about what properties a CND inference tool would need to be practically useful. I have no idea whether the associations reported in this study are encouraging or not because I have no idea what level of inferential power is ultimately required.

      [Note added after revision: the authors have added to the Discussion to put their work into a broader perspective.]

      That brings me to my final comment: there is an inappropriate emphasis on statistical significance. The sample size was chosen arbitrarily. What if the sample had been half the size? Then few, if any, of the observed effects would have been significant. What if the sample had been twice the size? Then many more of the observed effects would have been significant (particularly for the pupillometry). I hope that future studies will follow a more principled approach in which relevant effect sizes are pre-specified (ideally as the strength of association that would be practically useful) and sample sizes are determined accordingly.

      [Note added after revision: my intention with this comment was not to make a philosophical or nitty-gritty point about statistics. It was more of a follow on to the previous point. Because I don't know what sort of effect size is big enough to matter (for whatever purpose), I don't find the statistical significance (or lack thereof) of the effect size observed to be informative. But I don't think there is anything more that the authors can or should do in this regard.]

      So, in summary, I think this study is a valuable but limited advance. The results increase my confidence that non-invasive measures can be used to infer underlying CND, but I am unsure how much closer we are to anything that is practically useful.

    3. Reviewer #2 (Public review):

      Summary:

      This paper addresses the bottom-up and top-down causes of hearing difficulties in middle-aged adults with clinically normal audiograms using a cross-species approach (humans vs. gerbils, each with two age groups) mixing behavioral tests and electrophysiology. The study is not only a follow-up of Parthasarathy et al. (eLife 2020), since there are several important differences. Parthasarathy et al. (2020) only considered a group of young normal-hearing individuals with normal audiograms yet with high complaints for hearing in noisy situations. Here, this issue is considered specifically regarding aging, using a between-subject design comparing young NH and older NH individuals recruited from the general population, without additional criteria (i.e. no specifically high problems of hearing in noise). In addition, this is a cross-species approach, with the same physiological EFR measurements with the same stimuli deployed on gerbils.

      This article is of very high quality. It is extremely clear, and the results clearly show a decrease of neural phase-locking to high modulation frequencies in both middle-aged humans and gerbils, compared to younger groups/cohorts. In addition, pupillometry measurements conducted during the QuickSIN task suggest increased listening effort in middle-aged participants, and a statistical model including both EFR and pupillometry features suggests that both factors contribute to the reduced speech-in-noise intelligibility seen in middle-aged individuals, beyond their slight differences in audiometric thresholds (although these were clinically normal in both groups).

      These results provide strong support for the view that normal aging in humans leads to auditory nerve synaptic loss (cochlear neural degeneration, CND, or, put differently, cochlear synaptopathy) as well as increased listening effort, before any clearly visible audiometric deficits as defined in current clinical standards. This result is very important for the community, since we are still missing direct evidence that cochlear synaptopathy might underlie a significant part of hearing difficulties in complex environments for listeners with normal thresholds, such as middle-aged and senior listeners. This paper shows that these difficulties can be reasonably well accounted for by this sensory disorder (CND), but also that listening effort, i.e. a top-down factor, further contributes to this problem. The methods are sound and well described, and I would like to emphasize that they are presented concisely yet in a very precise manner, so that they can be understood very easily, even by a reader who is not familiar with the employed techniques. I believe this study will be of interest to a broad readership. I have some comments and questions which I think would make the paper even stronger once addressed.

      Main comments:

      (1) Presentation of EFR analyses / Interpretation of EFR differences found in both gerbils and humans

      a) Could you comment further on why you think you found a significant difference only at the highest mod. frequency of 1024 Hz in your study? Indeed, previous studies employing SAM or RAM tones very similar to the ones employed here were able to show age effects already at lower modulation freqs. of ~100 Hz; e.g. there are clear age effects reported in human studies of Vasilikov et al. (2021) or Mepani et al. (2021), and also in animals (see Garrett et al. bioRxiv: https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf)

      Furthermore, some previous EEG experiments in humans that used SAM tones with modulation freqs. of ~100 Hz showed that EFRs do not exhibit a single peak, i.e. there are peaks not only at fm but also for the first harmonics (e.g. 2fm or 3fm); see e.g. Garrett et al. bioRxiv https://www.biorxiv.org/content/biorxiv/early/2024/04/30/2020.06.09.142950.full.pdf

      Did you try to extract EFR strength by looking at the summed amplitude of multiple peaks (Vasilikov Hear Res. 2021), in particular for the lower modulation frequencies? (Indeed, there will be no harmonics for the higher mod. freqs).

      b) How do the present EFR results relate to FFR results, where effects of age are already seen at low carrier freqs? (e.g. Märcher-Rørsted et al., Hear. Res., 2022 for pure tones with freq < 500 Hz) Do you think it could be explained by the fact that this is not the same cochlear region, and that synapses die earlier in higher compared to lower CFs? This should be discussed. Beyond the main group effect of age, were there no negative correlations of EFRs with age in your data?

      (2) Size of the effects / comparing age effects between two species: Although the size of the age effect on EFRs cannot be directly compared between humans and gerbils (the comparison remains qualitative), could you at least provide references regarding the rate of synaptic loss with aging in both humans and gerbils, so that we understand that the yNH/MA difference can be compared between the two age groups used for gerbils; it would have been critical in case of a non-significant age effect in one species.

      Equalization / control of stimuli differences across the two species: For measuring EFRs, SAM stimuli were presented at 85 dB SPL for humans vs. 30 dB above detection threshold (inferred from ABRs) for gerbils - I do not think the results strongly depend on this choice, but it would be good to comment on why you did not choose also to present stimuli 30 dB above thresholds in humans.

      Simulations of EFRs using functional models could have been used to understand (at least in humans) how the differences in EFRs obtained between the two groups are quantitatively compatible with the differences in % of remaining synaptic connections known from histopathological studies for their age range (see the approach in Märcher-Rørsted et al., Hear. Res., 2022)

      (3) Synergetic effects of CND and listening effort: Could you test whether there is an interaction between CND and listening effort? (e.g. one could hypothesize that MA subjects with the largest CND also have the highest listening effort)

      Comments on revised version:

      The authors did well to address all the points raised in my review. This paper will make an important contribution to our assessment of the sources of age-related auditory processing deficits beyond the cochlea that impair speech intelligibility.

    1. eLife Assessment

      This study provides valuable findings on the effects of mating experience on sweet taste perception. The data as presented provide convincing evidence that the dopaminergic signaling-mediated reward system underlies this mating state-dependent behavioral modulation. The work will interest neuroscientists and particularly biologists working on neuromodulation and the effects of internal states on sensory perception.

    2. Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the subesophageal zone that interact with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet taste sensitivity and sugar feeding behavior in the male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the subesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. report that DopR1 and Dop2R, but not DopEcR, are involved in the mating failure-induced suppression of sugar sensitivity in these neurons. Further investigation is needed to clarify how dopamine selectively engages different receptor types depending on internal state.

      The data in this revised version are largely convincing and support the authors' conclusions. However, I remain concerned about the results shown in Figure 6E. The authors show that knocking down DopR1 or Dop2R in Gr5a+ neurons restores sucrose-evoked activity in Failed flies to levels seen in Naive and Satisfied animals. This appears to contradict the proposed model, in which these receptors positively modulate Gr5a+ activity through dopaminergic input. If dopamine signaling is reduced in Failed flies, further receptor knockdown should have no effect or further reduce activity, not restore it. I encourage the authors to clarify this apparent inconsistency and, if possible, provide a mechanistic explanation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP, to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      The authors updated missing literature in the manuscript and performed additional experiments regarding behavior, but also to further prove the functional connectivity between TH neurons and GR5a neurons.

      I have no further recommendations.

    4. Reviewer #3 (Public review):

      Summary

      This study by Wang et al. explores a compelling link between two fundamental innate behaviors in Drosophila melanogaster, mating and feeding, demonstrating that repeated sexual failure in male flies leads to a transient yet reversible decrease in sweet taste perception. The authors show that this modulation is mediated by dopamine signaling from a specific subset of dopaminergic neurons in the subesophageal zone (SEZ) that directly influence Gr5a⁺ sweet-sensing neurons.

      Aims of the Study

      The authors aimed to understand whether unsuccessful mating attempts could affect sensory processing of sweet stimuli and thus feeding behavior in male fruit flies. They further sought to dissect the neural circuitry and molecular pathways underlying this behavioral plasticity, with a particular focus on dopaminergic modulation.

      Major Strengths and Weaknesses

      Strengths:

      • Novelty: The idea that reproductive experience modulates gustatory perception adds a new dimension to our understanding of cross-modal behavioral integration.

      • Experimental approach: The study uses a broad array of genetic, pharmacological, imaging, and behavioral assays to demonstrate a causal relationship between sexual failure and reduced sweet perception, mediated by specific dopaminergic pathways.

      • Methodological design: The authors link behavioral outcomes (reduced proboscis extension reflex) with neural activity (calcium imaging of Gr5a⁺ neurons) and molecular specificity (dopamine receptor subtype roles), providing a robust multi-level framework.

      Weaknesses:

      • Ecological relevance: While the laboratory conditions are well controlled, the adaptive value or natural context of this taste modulation following mating failure remains speculative.

      Achievement of Aims and Support for Conclusions

      The authors have convincingly achieved their central aim. The results support the conclusion that sexual failure reduces sweet taste sensitivity through dopamine signaling. The reduced activity of Gr5a⁺ neurons after courtship rejection, its rescue by dopamine or successful copulation, and the requirement of specific dopamine receptors support the proposed model.

      Impact and Utility

      This work advances the field's understanding of how motivational states shaped by social experiences can directly influence sensory perception and behavior. It underscores the role of the dopaminergic system not only in reward but in integrating internal states across distinct behavioral responses. The experimental approach, including courtship conditioning paradigms and in vivo imaging methods, provides a valuable foundation for related studies in sensory modulation and behavioral plasticity.

      Additional Context

      This study supports a growing body of literature suggesting that insects possess emotion-like internal states that influence their behavior across contexts. The findings resonate with prior work on how stressors like social isolation or courtship failure lead to compensatory changes in other reward-seeking behaviors (e.g., ethanol consumption). Moreover, the concept that neural systems underlying basic drives like hunger and mating are dynamically interconnected may be conserved across phyla, suggesting broader relevance to understanding internal state-dependent modulation of behavior.

      The authors addressed all the comments from the previous reviews. The changes improve the clarity of the manuscript and the interpretation of the results, and they reinforce the conclusions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, Dop1R1, Dop2R, and DopEcR, were expressed in Gr5a<sup>+</sup> neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al 2012). As the reviewer pointed out, we found that Dop1R1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g. different subsets of dopaminergic neurons, different dopamine release mechanisms, and different dopamine receptors). We have discussed these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      This is an important consideration. To rule out a potential confound from food intake during courtship conditioning, we have now also conducted courtship conditioning in vials without food. In the absence of any feeding opportunity over the 4.5-hour courtship conditioning period, sexually rejected males still exhibited a robust decrease in sweet taste sensitivity compared with Naïve and Satisfied controls (Figure 1-supplement 1C). These data confirm that the suppression of PER is driven by courtship failure per se, rather than by differences in feeding during the conditioning phase.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Our initial description of the experimental setup may have been confusing. We briefly clarify the experimental design here and have added further details to the revised manuscript, which should resolve the reviewer’s concerns:

      After the behavioral conditioning, male flies were divided between two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the dose-response curve). On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005).

      In the initial submission, we used 400 mM sucrose for the MAFE assay. When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding, a natural consequence of decreased sugar sensitivity (Figure 1B). We quantified the volume of food consumed by the flies that showed PER responses to 400 mM sucrose and observed no change (Figure 1-supplement 1A, left). To avoid potential confusion, we have now repeated the MAFE assay with 800 mM sucrose, which elicited feeding in ~100% of flies in all three groups, as shown in Figure 1C. Again, we observed no change in food intake (Figure 1-supplement 1A, right).

      Together, these experiments suggest that sexual failure suppresses sweet sensitivity in Failed males. Meanwhile, as long as the flies still responded to a given food stimulus and initiated feeding, the volume of food consumed remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of the present study.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We have now quantified the effect of TH<sup>+</sup> neuron activation on Gr5a<sup>+</sup> neuron calcium responses. In Naïve males, dTRPA1-mediated activation of TH<sup>+</sup> cells significantly enhanced sucrose-induced calcium responses (Figure 3-supplement 1C). In Failed males, the baseline activity of Gr5a<sup>+</sup> neurons was lower (Figure 3C), yet the same activation also produced a significant (even slightly larger) effect on the calcium responses of Gr5a<sup>+</sup> neurons (Figure 3-supplement 1D).

      Taken together, we would argue that these experiments using both Naïve and Failed males are adequate to show a functional link between TH<sup>+</sup> neurons and Gr5a<sup>+</sup> neurons. Combined with the results that these neurons form active synapses (Figure 3-supplement 1B) and that the activity of TH<sup>+</sup> neurons was dampened in sexually failed males (Figure 3G-I), our data support the notion that sexual failure suppresses sweet sensitivity via TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We have added detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. For specificity, we have also tested the role of Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons both eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a<sup>+</sup> neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some discrepancies. We have now addressed this issue by repeating these calcium imaging experiments with a head-to-head comparison against the controls (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi).

      In these new experiments, Dop1R1 or Dop2R knockdown completely prevented the suppression of Gr5a<sup>+</sup> neuron responsiveness by courtship failure (Figure 6E), whereas the activities of Gr5a<sup>+</sup> neurons in Naïve/Satisfied groups were not altered. These results demonstrate that Dop1R1 and Dop2R are specifically required to mediate the decrease in sweet sensitivity following courtship failure.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We have changed the wording in the revised manuscript. In brief, we think that these manipulations have two consequences: suppressing the overall sweet sensitivity, and eliminating the effect of sexual failure on sweet sensitivity.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP, to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine the exact mechanism of how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity upon courtship failure (not limited to sucrose but extending to other sweet compounds; Figure 1-supplement 1D-E) suggests a generalized and sustained consequence for reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We have further discussed this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Please refer to our responses to the Public Review (Reviewer 1, Point 2) for details.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The label for the y-axis in Figure 1B should be "fraction", not "percentage".

      We have revised the figure as suggested.

      (2) I suggest that the authors indicate the ROIs they used to quantify the signal intensity in Figure 3E and G.

      We have revised the figures as suggested.

      (3) There is a typo in Figure 4A: it should be "Wild type", not "Wide type".

      We have revised the figure as suggested.

      (4) The elav-GAL4/+ data in Figure 4-S1B, C, and D appears to be reused across these panels. However, the number of asterisks indicating significance in the MAT plots differs between them (three in panels B and C, and four in panel D). Is this a typo?

      It is indeed a typo, and we have revised the figure accordingly.

      Reviewer #2 (Recommendations for the authors):

      Additional comments:

      The authors should add this missing literature about dopamine and neuromodulation in courtship:

      Boehm et al., 2022 (eLife) - this study shows that mating affects olfactory behavior in females.

      Cazalé-Debat et al., 2024 (Nature) - Mating proximity blinds threat perception.

      Gautham et al., 2024 (Nature) - A dopamine-gated learning circuit underpins reproductive state-dependent odor preference in Drosophila females.

      We have added these references in the introduction section.

      Has the mating behavior been quantified? How often did males copulate with mated and virgin females?

      We examined copulation behavior based on our video recordings. In the “Failed” group (males paired with mated females), we observed virtually no successful copulation events, confirming that nearly 100% of those males experienced sexual failure. In contrast, males in the “Satisfied” group (paired with virgin females) mated on average 2-3 times during the 4.5-hour conditioning period. We have added this explanation to the manuscript.

      Do the rejected males live shorter? Is the effect also visible when they are fed with normal fly food, or is it only working with sugar?

      We did not directly measure the lifespan of these males. However, we conducted a related assay (starvation resistance), in which “Failed” males died significantly faster than both Naïve and Satisfied controls, indicating a clear reduction in their ability to endure food deprivation (Figure 1-supplement 1B). Since sweet taste is a primary cue for food detection in Drosophila, and sugar makes up a large portion of their standard diet, the drop in sugar sensitivity we observed in Failed males could likewise impair their perception and consumption of regular fly food, and hence their resistance to starvation.

      Also, the authors mention that the reward pathway is affected, this is probably the case as sugar sensation is impaired. One interesting experiment would be (and maybe has been done?) to test rejected males in normal odor-fructose conditioning. The data would suggest that they would do worse.

      We have already measured how courtship failure affected fructose sensitivity (Figure 1-supplement 1D), and we found that the reduction in fructose perception was even more profound than that for sucrose. We have not yet tested whether Failed males show deficits in odor-fructose associative conditioning. This is indeed a very interesting direction to explore. However, olfactory reward learning relies on molecular and circuit mechanisms distinct from those governing taste. We therefore argue that such experiments would be more suitable for a separate, follow-up study.

      The authors could have added another group where males are exposed to other males. It would be interesting if this is also a "stressful" context and if it would also reduce sugar preference - probably beyond the scope of this paper.

      In our experiments, all flies, including those in the Naïve, Failed, and Satisfied groups, were housed in groups of 25 males per vial before the conditioning period (and the Naïve group remained in the same group housing until PER testing). This means every cohort experienced the same level of “social stress” from male-male interactions. While it would indeed be interesting to compare that to solitary housing or other male-only exposures, isolation itself imposes a different kind of stress, and disentangling these effects on sugar preference would require a separate, dedicated study beyond the scope of the present work.

      Would the behavior effect also show up with experienced males? Maybe this has been tested before. Does mating rejection in formerly successful males have the same impact?

      As suggested by the reviewer, we performed an additional experiment in which males that had previously mated successfully were subsequently subjected to courtship rejection. As shown in Figure 1-supplement 1F, prior successful mating did not prevent the decline in sweet sensitivity induced by subsequent mating failure, indicating that even experienced males exhibit the reduction in sugar sensitivity after rejection.

      Is the same circuit present and functioning in females? Does manipulating dopamine receptors in GR5a neurons in females lead to the same phenotype? This would suggest that different internal states in males and females could lead to the same phenotype and circuit modulations.

      This is indeed a very interesting suggestion. In male flies, Gr5a-specific knockdown of dopamine receptors did not alter baseline sweet sensitivity, but it selectively prevented the reduction in sugar perception that followed mating failure (Figure 6C-D), indicating that this dopaminergic pathway is engaged only in the context of courtship rejection. By extension, knocking down the same receptors in female GR5a neurons would likewise be expected to leave their basal sugar sensitivity unchanged. Moreover, because there is currently no established paradigm for inducing mating failure in female flies, we cannot yet test whether sexual rejection similarly modulates sweet taste in females, or whether it operates via the same circuit.

      Reviewer #3 (Recommendations for the authors):

      Suggestions to the authors:

      Introduction, line 61. I suggest the authors add references in fruit flies concerning the rewarding nature of mating. For example, the paper from Zhang et al, 2016 "Dopaminergic Circuitry Underlying Mating Drive" demonstrates the role of the dopamine rewarding system in mating drive. There is a large body of literature showing the link between dopamine and mating.

      We have added this literature in the introduction section.

      Figure 1B and Figure Supplement 1: If I understood correctly, Figure Supplement 1A shows that the total food consumption across all tested flies remains unchanged. However, fewer flies that failed to mate consumed sucrose. I would be curious to see the results for sucrose consumption per individual fly that did eat. According to their results, individual flies that failed to mate should consume more sucrose. This would change the conclusion. The authors currently show that a group of flies that failed to mate consumed less sucrose overall, but since fewer males actually ate, those that failed to mate and did eat consumed more sucrose. The authors should distinguish between failed and satisfied flies in two groups: those that ate and those that did not.

      Please see our responses to the Public Review for details (Reviewer 1, Point 2).

      Figure 1C, right: For a better understanding of all the "MAT" figures, I suggest the authors start the Y axis with the unit 25 and increase it to 400. This would match better the text (line 114) saying that it was significantly elevated in the failed group. As it is, we have the impression of a decrease in the graph.

      We have revised the figures accordingly.

      Line 103: When suggesting a reduced likelihood of meal initiation in these males, do these males take longer to start eating when they do eat? In other words, is the latency to eat increased in failed males? That would be a good measure of motivational state.

      We tried to analyze feeding latency in the MAFE assay by measuring the time from sucrose presentation to the first proboscis extension, but it was too short to be measured accurately. Nevertheless, when conducting the experiments, we did not observe any notable difference in feeding latency between Failed males and Naïve or Satisfied controls.

      Line 117. I don't understand which results the authors refer to when writing "an overall elevation in the threshold to initiate feeding upon appetitive cues". Please specify.

      This phrase refers to the fact that for every sweet tastant we tested, including sucrose (Figure 1C), fructose and glucose (Figure 1-supplement 1D-E), the concentration-response curve in Failed males shifted to the right, and the Mean Acceptance Threshold (MAT) was significantly higher. In other words, for these different appetitive cues, mating failure raised the concentration of sugar required to trigger a proboscis extension, indicating a general elevation in the threshold to initiate feeding upon an appetitive cue.

      Figure 1D. Please specify the time for the satisfied group.

      For clarity, the Naïve and Satisfied groups in Figure 1D each represent pooled data from 0 to 72 hours post-treatment, as their sweet sensitivity remained stable throughout this period. Only the Failed group was shown with time-resolved data, since it was the only group exhibiting a dynamic change in sugar sensitivity over time. We have now specified this in the figure legend.

      Figure 1F. The phenotype was not totally reversed in failed-re-copulated males. Could it be due to the timing between failure and re-copulation? I suggest the authors mention in the figure or in the text, the time interval between failure and re-copulation.

      We’d like to clarify that the interval between the initial treatment (“Failed”) and the opportunity for re-copulation was within 30 minutes. The incomplete reversal in the Failed-re-copulated group indeed raises interesting questions. One possible explanation is that mating failure reduces synaptic transmission between the SEZ dopaminergic neurons and Gr5a<sup>+</sup> sweet sensory neurons (Figure 3), and that the recovery of this transmission takes longer. We have added this information to the figure legend and the Methods section.

      Line 227-228 and Figure 3E. The authors showed that the synaptic connections between dopaminergic neurons and Gr5a+ GRNs were significantly weakened. I am wondering about the delay between mating failure and the GFP observation. It would be informative to know this timing to interpret this decrease in synaptic connections. If the timing is relatively long, it is possible that we can observe a neuronal plasticity. However, if this timing is very short, I would not expect such synaptic plasticity.

      The interval between the behavioral treatment and the GRASP-GFP experiment was approximately 20 hours. We chose this time window because it was sufficient for both GFP expression and accumulation. Therefore, the observed reduction in synaptic connections between dopaminergic neurons and Gr5a<sup>+</sup> GRNs likely reflects a genuine, experience-induced structural and functional change rather than an immediate, transient effect. We have added this information to the Methods section of the revised manuscript for clarity.

      Line 240-243: The authors demonstrated that there is a reduction of CaLexA-mediated GFP signals in dopaminergic neurons in the SEZ after mating failure, but not a reduction in Gr5a+ GRNs. I suggest replacing "indicate" with "suggest" in line 240.

      We have made the change accordingly. Meanwhile, we would like to clarify that while we observed a reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G), we did not directly test NFAT signal in Gr5a<sup>+</sup> neurons. Notably, the results that the synaptic transmissions from SEZ dopaminergic neurons to Gr5a<sup>+</sup> neurons were weakened (Figure 3E-F), and the reduction of NFAT signal in SEZ dopaminergic neurons (Figure 3G-I), were in line with a reduction in sweet sensitivity of Gr5a<sup>+</sup> neurons upon courtship failure (Figure 3B-D).

      Line 243: replace "consecutive" with "constitutive".

      We have revised it accordingly.

      Figure 5: I have trouble understanding the results obtained in Figure 5. Both constitutive activation and inhibition of Dop1R1 and Dop2R neurons lead to the same results, knowing that males who failed mating no longer exhibit decreased sweet sensitivity. I would have expected contrary results for both experimental conditions. I suggest the authors discuss their results.

      Both activation and inhibition of Dop1R1 and Dop2R neurons eliminated the effect of courtship failure on sweet sensitivity (Figure 5). These results are in line with our hypothesis that courtship failure leads to changes in dopamine signaling and hence sweet sensitivity. If dopamine signaling via Dop1R1 and Dop2R was locked, either to a silenced or a constitutively activated state, the effect of courtship failure on sweet sensitivity was eliminated.

      Nevertheless, as the reviewer pointed out, constitutive activation/inhibition should in principle lead to the opposite effect on Naïve flies. In fact, when Dop1R1<sup>+</sup>/Dop2R<sup>+</sup> neurons were silenced in Naïve flies, PER to sucrose was significantly reduced (Figure 5C-D), confirming that these neurons normally facilitate sweet sensation. Meanwhile, while neuronal activation by NaChBac did show a trend towards enhanced PER compared to the GAL4/+ controls, it did not exhibit a difference compared to +>UAS-NaChBac controls that showed a high PER level, likely due to a potential ceiling effect. We have added the discussions to the manuscript.

      Figure 7: I suggest the authors modify their figure a bit. It is not clear why in failed mating, the red arrow in "behavioral modulation" goes to the fly. The authors should find another way to show that mating failure decreased the percentage of flies that are motivated to eat sugar.

      We have modified the figure as suggested.

      Overall, I would suggest the authors be cautious with their conclusions. For example, line 337: "sexual failure suppressed feeding behavior". This is not what is shown by this study. Here, the study shows that mating failure decreases the fraction of flies that eat sucrose. Unless the authors demonstrate that this decrease is generalizable to other metabolites, I suggest the authors modify their conclusion.

      While we primarily used sucrose as the stimulant in our experiments, we also tested responses to two other sugars: fructose and glucose (Figure 1 supplement 1D-E). In all three cases, mating failure led to a significant reduction in sweet perception, suggesting that the effect of courtship failure is not limited to a single metabolite but rather reflects a general decrease in sweet sensitivity. Meanwhile, reduced sweet sensitivity indeed led to a reduction of feeding initiation (Figure 1).

    1. eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

    2. Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer (HGT) is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell-free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), and ChIP-seq (to validate chromatin state).

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

    1. eLife Assessment

      Fallah et al carefully dissect projections from substantia nigra pars reticulata (SNr) and the globus pallidus externa (GPe) - two key basal ganglia nuclei - to the pedunculopontine nucleus (PPN), a brainstem nucleus that has a central role in motor control. They consider inputs from these two areas onto three types of downstream PPN neurons - GABAergic, glutamatergic, and cholinergic neurons - and carefully map connectivity along the rostrocaudal axis of the PPN. Overall, this important study provides convincing data on PPN connectivity with two key input structures that will provide a basis for further understanding PPN function.

    2. Reviewer #1 (Public review):

      Summary:

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. The authors build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      Limitations:

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for the locomotion and valence effects demonstrated here.

    3. Reviewer #2 (Public review):

      Strengths:

      Fallah et al carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN. They provide important and convincing data on PPN connectivity with two important input structures, which will provide a foundation for many future studies. They also consider the behavioral relevance of these different PPN inputs for controlling movement and reinforcement, showing convincing evidence that SNr and GPe inputs have opposing effects on behavior.

      Weaknesses:

      The optogenetics and behavioral studies are intriguing, although more work will be required to fit these data together into a specific model of circuit function and to distinguish the locomotor and reinforcement effects. Interestingly, stimulation of SNr axons in the rostral vs caudal PPN likely differs (as predicted by slice experiments), indicating an area for future investigation and dissection of pathways.

    4. Reviewer #3 (Public review):

      The study by Fallah et al. provides a thorough characterization of the effects of two basal ganglia output pathways, the SNr and the GPe, on cholinergic, glutamatergic, and GABAergic neurons of the PPN. Using a combination of optogenetics-assisted electrophysiology and behavioral assays in genetically defined mouse lines, the authors show that SNr projections broadly inhibit all PPN subtypes along the rostrocaudal axis, whereas GPe projections are mostly restricted to the caudal PPN and predominantly target glutamatergic neurons, with a lesser effect on GABAergic neurons. Activation of these inputs in vivo revealed opposing behavioral effects: SNr stimulation increased locomotion and caused avoidance in the real-time place preference (RTPP) task, while GPe stimulation reduced locomotion and increased time spent in the stimulation zone.

      Strengths:

      The evidence for functional connectivity between SNr and GPe inputs and specific PPN cell types is solid and highlights a prominent influence of SNr across the PPN. The identification of a GPe projection that selectively targets caudal glutamatergic PPN neurons is unexpected and highly relevant to understanding basal ganglia-brainstem interactions. The study stands out for its systematic cell-type-specific approach and the combination of electrophysiological and behavioral data. Importantly, the authors addressed key concerns from the initial review by performing new analyses and adding important controls:

      Motor activity was re-analyzed at higher temporal resolution, revealing more nuanced effects of stimulation (Fig. S2).

      The concern that motor effects might confound RTPP performance was mitigated by analyzing unstimulated test sessions, which showed that place preference or aversion persisted in the absence of stimulation (Fig. 7G).

      The potential recruitment of SNc dopaminergic projections was directly tested using DAT-Cre mice, confirming that dopaminergic axon stimulation drives locomotion and reward but does not explain the aversive effect seen with broader SNr activation (Fig. S3).

      Weaknesses:

      While the revised analyses and added data strengthen the conclusions, the interpretation of the behavioral effects remains somewhat limited by the use of RTPP, which can be influenced by motor changes, even with unilateral stimulation. Nonetheless, the additional controls and thorough discussion now acknowledge and address these caveats appropriately.

      Some minor clarifying edits would enhance the manuscript's precision and readability, including improvements to terminology, data presentation, figure referencing, and the organization of behavioral and statistical reporting.

      Conclusion:

      This is a strong and compelling study that provides a detailed and novel characterization of basal ganglia inputs to the PPN and their behavioral relevance. The authors were responsive to reviewer feedback, and the revised manuscript is significantly improved. The findings advance our understanding of how basal ganglia output pathways engage brainstem circuits to modulate locomotion and valence.

    1. eLife Assessment

      This useful modeling study shows how spatial representations similar to experiment emerge in a recurrent neural network trained on a navigation task by requiring path integration and decodability, but without relying on grid cells. The network modeling results are solid, although the link to experimental data may benefit from further development.

    2. Reviewer #1 (Public review):

      Summary:

      This work studies representations in a network with one recurrent layer and one output layer that needs to path-integrate so that its position can be accurately decoded from its output. To formalise this problem, the authors define a cost function consisting of the decoding error and a regularisation term. They specify a decoding procedure that, at a given time, averages the output unit center locations, weighted by the activity of the unit at that time. The network is initialised without position information, and only receives a velocity signal (and a context signal to index the environment) at each timestep, so to achieve low decoding error it needs to infer its position and keep it updated with respect to its velocity by path integration.
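      The decoding step described above can be sketched as follows. This is only a minimal illustration, assuming non-negative unit activities and known two-dimensional unit centers; the names `decode_position`, `centers`, and `activities` are hypothetical and are not taken from the manuscript.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def decode_position(centers: Sequence[Point], activities: Sequence[float]) -> Point:
    """Estimate position as the activity-weighted average of unit centers.

    Illustrative sketch only: assumes one (x, y) center per output unit and
    non-negative activities at a single timestep.
    """
    if len(centers) != len(activities):
        raise ValueError("need exactly one activity per unit center")
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one unit must be active")
    # Weight each center coordinate by its unit's activity, then normalise.
    x = sum(a * cx for a, (cx, _) in zip(activities, centers)) / total
    y = sum(a * cy for a, (_, cy) in zip(activities, centers)) / total
    return (x, y)
```

      For example, two equally active units centered at (0, 0) and (2, 0) would decode to the midpoint between them; the more active a unit is, the more the estimate is pulled towards its center.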

      The authors take the trained network and let it explore a series of environments with different geometries while collecting unit activities to probe learned representations. They find localised responses in the output units (resembling place fields) and border responses in the recurrent units. Across environments, the output units show global remapping and the recurrent units show rate remapping. Stretching the environment generally produces stretched responses in output and recurrent units. Ratemaps remain stable within environments and stabilise after noise injection. Low-dimensional projections of the recurrent population activity form environment-specific clusters that reflect the environment's geometry, which suggests independent rather than generalised representations. Finally, the authors discover that the centers of the output unit ratemaps cluster together on a triangular lattice (like the receptive fields of a single grid cell), and find significant clustering of place cell centers in empirical data as well.

      The model setup and simulations are clearly described, and are an interesting exploration of the consequences of a particular set of training requirements - here: path integration and decodability. But it is not obvious to what extent the modelling choices are a realistic reflection of how the brain solves navigation. Therefore, it is not clear whether the results generalize beyond the specifics of the setup here.

      Strengths:

      The authors introduce a very minimal set of model requirements, assumptions, and constraints. In that sense, the model can function as a useful 'baseline', that shows how spatial representations and remapping properties can emerge from the requirement of path integration and decodability alone. Moreover, the authors use the same formalism to relate their setup to existing spatial navigation models, which is informative.

      The global remapping that the authors show is convincing and well-supported by their analyses. The geometric manipulations and the resulting stretching of place responses, without additional training, are interesting. They seem to suggest that the recurrent network may scale the velocity input by the environment dimensions so that the exact same path integrator-output mappings remain valid (but maybe there are other mechanisms too that achieve the same).

      The simulations and analyses in the appendices serve as insightful controls for the main results.

      The clustering of place cell peaks on a triangular lattice is intriguing, given there is no grid cell input. It could have something to do with the fact that a triangular lattice provides optimal coverage of 2d space? The included comparison with empirical data is valuable as a first exploration, showing a promising example, but doesn't robustly support the modelling results.

    3. Reviewer #2 (Public review):

      Summary:

      The authors proposed a neural network model to explore the spatial representations of the hippocampal CA1 and entorhinal cortex (EC) and the remapping of these representations when multiple environments are learned. The model consists of a recurrent network and output units (a decoder) mimicking the EC and CA1, respectively. The major results of this study are: the EC network generates cells with their receptive fields tuned to a border of the arena; the decoder develops neuron clusters arranged in a hexagonal lattice. Thus, the model accounts for entorhinal border cells and CA1 place cells. It suggests that the remapping of place cells occurs between different environments through state transitions corresponding to unstable dynamical modes in the recurrent network.

      Strengths:

      The authors found a spatial arrangement of receptive fields similar to their model's prediction in experimental data recorded from CA1. Thus, the model proposes plausible mechanisms to generate hippocampal spatial representations without relying on grid cells. The model also suggests an interesting possibility that path integration is not the speciality of grid cells.

      Weaknesses:

      The role of grid cells in the proposed view, i.e., the boundary-to-place-to-grid model, remains elusive. The model can generate place cells without generating entorhinal grid cells. Moreover, the model can generate hexagonal grid patterns of place cells in a large arena. Whether and how the proposed model is integrated into the entire picture of the hippocampal-entorhinal memory processing remains elusive.

    4. Reviewer #3 (Public review):

      Summary:

      The authors used recurrent neural network modelling of spatial navigation tasks to investigate border and place cell behaviour during remapping phenomena.

      Strengths:

      The neural network training seemed for the most part (see comments later) well-performed, and the analyses used to make the points were thorough.

      The paper and ideas were well-explained.

      Figure 4 contained some interesting and strong evidence for map-like generalisation as environmental geometry was warped.

      Figure 7 was striking and potentially very interesting.

      It was impressive that the RNN path-integration error stayed low for so long (Fig A1), given that normally networks that only work with dead-reckoning have errors that compound. I would have loved to know how the network was doing this, given that borders did not provide sensory input to the network. I could not think of many other plausible explanations... It would be even more impressive if it was preserved when the network was slightly noisy.

      Update:

      The analysis of how the RNN remapped, using a context signal to switch between largely independent maps, and the examination of the border-like tuning in the recurrent units of the RNN, were both thorough and interesting. Further, in the updated response I appreciated the additional Appendix E, which helped substantiate the claim that the RNN neurons were border cells.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      In the future, could you please include the exact changes made to the manuscript in the relevant section of the rebuttal, so it's clear which changes addressed the comment? That would make it easier to see what you refer to exactly - currently I have to guess which manuscript changes implement e.g. "We have tried to make these points more evident".

      Yes, we apologize for the inconvenience.

      On possible navigation solutions:

      I'm not sure if I follow this argument. If the network uses a shifted allocentric representation centred on its initial state, it couldn't consistently decode the position from different starting positions within the same environment (I don't think egocentric is the right term here - egocentric generally refers to representations relative to the animal's own direction like "to the left" rather than "to the west" but these would not work in the allocentric decoding scheme here). In other words: If I path integrate my location relative to my starting location s1 in environment 1 and learn how to decode that representation to an environment location, I cannot use the same representation when I start from s2 in environment 1, because everything will have shifted. I still believe using boundaries is the only solution to infer the absolute location for the agent here (because that's the only information that it gets), and that's the reason for finding boundary representations (and not grid cells). Imagine doing this task on a perfect torus where there are no boundaries: it would be impossible to ever find out at what 'absolute' location you are in the environment. I have therefore not updated this part of my review, but do let me know if I misunderstood.

      Thank you for addressing this point, which is a somewhat unusual feature of our network: We believe the point you raise applies if the decoding were fixed. However, in our case, the decoding is dynamic and depends on the firing pattern, as place unit centers are decoded on a per-trajectory basis. Thus, a new place-like basis may be formed for each trajectory (and in each environment). Hence, the model is not constrained to reuse its representation across trajectories or environments, as place centers are inferred based on unit firing. However, we do observe that the network learns to use a fixed place field placement in each geometry, which likely reflects some optimal solution to the decoding problem. This might also help to explain the hexagonal arrangement of learned field centers. Finally, we agree that egocentric may not be entirely accurate, but we found it to be the best word to distinguish from the allocentric-type navigation adopted by the network.
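      One way to picture the per-trajectory inference of place centers mentioned above is sketched below. The response does not spell out the exact procedure, so this is an assumption: each unit's center is taken to be the firing-rate-weighted mean of the positions visited along one trajectory. The names `infer_center`, `positions`, and `rates` are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def infer_center(positions: Sequence[Point], rates: Sequence[float]) -> Point:
    """Infer one unit's center from a single trajectory (illustrative only).

    Assumption: the center is the firing-rate-weighted mean of the visited
    positions, with one non-negative rate per timestep.
    """
    if len(positions) != len(rates):
        raise ValueError("need exactly one rate per visited position")
    total = sum(rates)
    if total <= 0:
        raise ValueError("unit was silent along this trajectory")
    # Positions where the unit fired more strongly contribute more.
    x = sum(r * px for r, (px, _) in zip(rates, positions)) / total
    y = sum(r * py for r, (_, py) in zip(rates, positions)) / total
    return (x, y)
```

      Because the centers are recomputed from the firing along each trajectory, nothing in such a scheme forces the same centers to be reused across trajectories or environments, which is the point the response makes about the decoding being dynamic rather than fixed.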

      Regarding noise injection:

      Beyond that noise level, the network might return to high correlations, but that must be due to the boundary interactions - very much like what happens at the very beginning of entering an environment: the network has learned to use the boundary to figure out where it is from an uninformative initial hidden state. But I don't think this is currently reflected well in the main text. That still reads "Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor." I think your new (very useful!) velocity ablations show that only small noise is compensated for by attractor dynamics, and larger noise injections are error corrected through boundary interactions. I've added this to the new review.

      Thank you for your kind feedback. We have changed the phrasing in the text to say “robust even to moderate perturbations.” We hold that, while numerically small, the amount of injected noise is rather large compared to the magnitude of activities in the network (see Fig. A5d); the largest maximal rate is around 0.1, which is similar to the noise level at which output representations fail to re-converge. However, we agree that some moderation is appropriate.

      On contexts being attractive:

      In the new bit of text, I'm not sure why "each environment appears to correspond to distinct attractive states (as evidenced by the global-type remapping behavior)", i.e. why global-type remapping is evidence for attractive states. Again, to me global-type remapping is evidence that contexts occupy different parts of activity space, but not that they are attractive. I like the new analysis in Appendix F, as it demonstrates that the context signal determines which region of activity space is selected (as opposed to the boundary information!). If I'm not mistaken, we know three things: 1. Different contexts exist in different parts of representation space, 2. Representations are attractive for small amounts of noise, 3. The context signal determines which point in representation space is selected (thanks to the new analysis in Appendix F). That seems to be in line with what the paper claims (I think "contexts are attractive" has been removed?) so I've updated the review.

      It seems to us that we are in agreement on this point. Our aim is simply to point out that a particular context signal appears to correspond to a particular (discrete) attractor state (i.e., occupying a distinct part of representation space, as you state). It seems we simply use slightly different language, but to avoid confusion, we changed this to say that “representations are attractive”.

      Thanks again for engaging with us, this discussion has been very helpful in improving the paper.

      Reviewer #2:

      However, I still struggle to understand the entire picture of the boundary-to-place-to-grid model. After all, what is the role of grid cells in the proposed view? Are they just redundant representations of the space? I encourage the authors to clarify these points in the last two paragraphs on pages 17-18 of the discussion.

      Thank you for your feedback. While we have discussed the possible role of a grid code to some extent, we agree that this point requires clarification. We have therefore added to the discussion on the role of grid cells, which now reads “While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry \cite{xu_conformal_2022, schaeffer_self-supervised_2023, schoyen_hexagons_2024}, capacity, spatial separation and path invariance \cite{schaeffer_self-supervised_2023}. Another possibility is that grid cells are geared more towards other cognitive tasks, such as providing a neural metric for space \cite{ginosar_are_2023, pettersen_self-supervised_2024}, or supporting memory and inference-making \cite{whittington_tolman-eichenbaum_2020}. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.”

      Thank you again for your time and input.

    1. eLife Assessment

      This important work by Diallo et al. substantially advances our understanding of the chemosensory system of a non-hymenopteran eusocial insect by identifying the first olfactory receptor for the trail pheromone in termites. The evidence supporting the conclusions that the receptor PsimOR14 is very narrowly tuned for the pheromone neocembrene is compelling. The work will be of broad interest to entomologists, chemical ecologists, neuroscientists, and molecular biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis, Diallo et al. deorphanise the first olfactory receptor of a non-hymenopteran eusocial insect, a termite, and identify the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants, the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. In this way they are able to identify a sensillum which houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors not only identify the first pheromone receptor in a termite but can even localise its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in worker than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well written and a pleasure to read.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of this manuscript, trail following by the termite), one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR-Cas9 in eusocial insects usually turns out to be a nightmare). I also do not know whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      Comments on revisions:

      I appreciate how the authors have replied to my comments and I have the feeling that also the other reviewers' comments have been dealt with carefully. I therefore support the acceptance of this very nice and interesting manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of the trail-following pheromone. The authors performed single-sensillum recording (SSR) using transgenic Drosophila flies expressing a candidate pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. The authors also found that one sensillum type (S I) detects neocembrene and performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is more highly expressed in workers than in soldiers using RNA-seq of heads of workers and soldiers of P. simplex, and that the EAG response to neocembrene is higher in workers than in soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      Weakness:

      This revised manuscript appears to me to have no major weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size. (I cannot comment on protein modeling and docking due to a lack of expertise in this area)

      Weaknesses:

      None.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis Diallo et al. deorphanise the first olfactory receptor of a nonhymenopteran eusocial insect - a termite and identified the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors do not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR Cas-9 in eusocial insects usually turns out to be a nightmare). I also do not know whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

      We agree that a functional proof of the PsimOR14 function using reverse genetics would be a valuable addition to the study to firmly establish its role in trail pheromone sensing. Nevertheless, such a functional proof is difficult to obtain. Due to the very slow ontogenetic development inherent to termites (several months from an egg to the worker stage), CRISPR Cas-9 is not a useful technique for this taxon. By contrast, termites are quite responsive to RNAi-mediated silencing and RNAi has previously been used for the silencing of the ORCo co-receptor in termites resulting in impairment of the trail-following behavior (DOI: 10.1093/jee/toaa248). Likewise, our previous experiments showed a decreased ORCo transcript abundance, lower sensitivity to neocembrene and reduced neocembrene trail following upon dsPsimORCo administration to P. simplex workers, while we did not succeed in reducing the transcript abundance of PsimOR14 upon dsPsimOR14 injection. We do not report these negative results in the present manuscript so as not to dilute the main message. In parallel, we are currently developing an alternative way of dsRNA delivery using nanoparticle coating, which may improve the RNAi experiments with ORs in termites.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is more highly expressed in workers than in soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that EAG response to neocembrene is higher in workers than in soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      We thank the reviewer for the overall positive evaluation of the manuscript.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

      We address point by point the specific comments listed in the Recommendation to the authors chapter below.

      Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      We understand the comment about the lack of an intelligible cue to highlight the motivation and importance of the present study. In the current version of the manuscript the introduction has been reworked. As suggested by Reviewer 3 in the Recommendations section below, the introduction now integrates some parts of the original discussion, especially the part discussing the OR evolution and emergence of eusociality in hymenopteran social insects and in termites, while underscoring the need of data from termites to compare the commonalities and idiosyncrasies in neurophysiological (pre)adaptations potentially linked with the independent eusociality evolution in the two main social insect clades.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      Indeed, we were extremely lucky. Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. The selection criteria for the first set of four receptors were (i) to have a full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) to be represented on different branches (subbranches) of the phylogenetic tree. Then it was a matter of good luck to hit the PsimOR14 selectively responding to the genuine P. simplex trail-following pheromone main component. In the revised version, we state these selection criteria in the results section (Phylogenetic reconstruction and candidate OR selection).

      The deorphanization attempts of additional P. simplex ORs are currently running.
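
      The two selection criteria stated above can be read as a simple filter. The following is only an illustrative sketch: the candidate names, branch labels, and the rule of keeping the first qualifying receptor per branch are hypothetical and are not taken from the manuscript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical receptor label
    full_length_orf: bool  # criterion (i), first part
    tm_regions: int        # number of predicted transmembrane regions
    branch: str            # hypothetical label of the phylogenetic (sub)branch


def select_candidates(candidates, min_tm=6):
    """Keep receptors with a full-length ORF and at least `min_tm` predicted
    transmembrane regions, taking at most one receptor per branch."""
    chosen = {}
    for c in candidates:
        passes_quality = c.full_length_orf and c.tm_regions >= min_tm
        if passes_quality and c.branch not in chosen:
            chosen[c.branch] = c  # first qualifying receptor on this branch
    return list(chosen.values())
```

      Which receptor represents a branch is left to input order here; any other tie-breaking rule would be equally compatible with the criteria as stated.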

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      We agree that the classification of multiporous sensilla into five categories lacks robust discrimination cues. The identification of the neocembrene-responding sensillum was initially carried out by SSR measurements on individual olfactory sensilla of P. simplex workers one-by-one and the topology of each tested sensillum was recorded on optical microscope photographs taken during the SSR experiment. Subsequently, the SEM and HR-SEM were performed in which we localized the neocembrene sensillum and tried to find distinguishing characters. We admit that these are not robust. Therefore, in the revised version of the manuscript we decided to abandon the attempt of sensilla classification and only report the observations about the specific sensillum in which we consistently recorded the response to neocembrene (and geranylgeraniol). The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.

      The reviewer points at an important methodological error we made while designing the experiments. Indeed, the inclusion of long-chain hydrocarbons into Panel 1 without additional heat applied to the odor cartridges was inappropriate, even though the experiments were performed at 25–26 °C. We carefully considered the best solution to correct the mistake and finally decided to remove all tested ligands beyond C22 from Panel 1, i.e. altogether five compounds. These changes did not affect the remaining Panels 2-4 (containing compounds with sufficient volatility), nor did they affect the message of the manuscript on highly selective response of PsimOR14 to neocembrene (and geranylgeraniol). In consequence, Figures 2, 3 and 5 were updated, along with the supplementary tables containing the raw data on SSR measurements. In addition, the tuning curve for PsimOR14 was re-built and receptor lifetime sparseness value re-calculated (without any important change). We also exchanged squalene for limonene in the docking and molecular dynamics analysis and made new calculations.
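
      For readers unfamiliar with the metric, lifetime sparseness summarizes how narrowly a receptor is tuned across an odorant panel. The sketch below assumes the widely used definition S = (1 - (Σ r_j / N)² / Σ (r_j² / N)) / (1 - 1/N), with negative responses set to zero; whether the authors used exactly this variant is an assumption, and the function name is illustrative.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness of a receptor over N odorant responses.

    Returns 0.0 for a uniform response to all odorants and 1.0 when a
    single odorant elicits the whole response. Negative responses
    (inhibition) are clipped to zero, which is an assumption here.
    """
    r = [max(0.0, x) for x in responses]
    n = len(r)
    if n < 2:
        raise ValueError("at least two odorants are needed")
    mean_sq = sum(x * x for x in r) / n
    if mean_sq == 0:
        return 0.0  # no response at all: treated as non-selective (assumption)
    mean = sum(r) / n
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)
```

      Because the value depends on N and on the spread of responses, removing a few non-responding ligands from a panel changes it only slightly when one ligand dominates.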

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) L 208: "than" instead of "that"

      Corrected.

      (2) L 527+527 strange squares (•) before dimensions

      Apparently an error upon file conversion, corrected.

      (3) L553 "reconstructing" instead of "reconstruct"

      Corrected.

      (4) Two references (Chahda et al. and Chang et al.) appear too late in the alphabet.

      Corrected. Thank you for spotting this mistake. Due to our mistake the author list was ordered according to the alphabet in Czech language, which ranks CH after H.

      Reviewer #2 (Recommendations for the authors):

      (1) L148: Why did the authors select only four ORs (PsimOR9, 14, 30, and 31) though there are 50 ORs in P. simplex? I would like you to explain why you chose them.

      Our strategy was to first select a modest set of ORs to confirm the feasibility of the Empty Neuron Drosophila system and newly established SSR setup, while taking advantage of having a set of termite pheromones, including those previously identified in the P. simplex model, some of them de novo synthesized for this project. Then, it was a matter of good luck to hit the PsimOR14 selectively responding to the genuine P. simplex trail-following pheromone main component, while the deorphanization attempts of a set of additional P. simplex ORs are currently running. In the revised version of the manuscript, we state the selection criteria for the four ORs studied in the Results section (Phylogenetic reconstruction and candidate OR selection).

      (2) L149: Where is Figure 1A? Does this mean Figure 1?

      Thank you for spotting this mistake. Fig. 1 is now properly labelled as Fig. 1A and 1B in the figure itself and in the legend. The text now also refers to either 1A or 1B.

      (3) Figure 1: The authors also showed the transcription abundance of all 50 ORs of P. simplex in the right bottom of Figure 1, but there is no explanation about it in the main text.

      The heatmap reporting the transcript abundances is now labelled as Fig. 1B and is referred to in the discussion section (in the original manuscript it was referred to on the same place as Fig. 1).

      (4) L260-265: The authors confirmed higher expression of PsimOR14 in workers than soldiers by using RNA-seq data and stronger EAG responses of PsimOR14 to neocembrene in workers than soldiers, but I think that confirming the expression levels of PsimOR14 in workers and soldiers by RT-qPCR would strengthen the authors' argument (it is optional).

      qPCR validation is a suitable complement to read count comparison of RNA Seq data, especially when the data comes from one-sample transcriptomes and/or low coverage sequencing. Yet, our RNA Seq analysis is based on sequencing of three independent biological replicates per phenotype (worker heads vs. soldier heads) with ~20 million reads per sample. Thus, the resulting differential gene expression analysis is a sufficient and powerful technique in terms of detection limit and dynamic range.

      We admit that the replicate numbers and origin of the RNA seq data should be better specified since the Methods section only referred to the GenBank accession numbers in the original manuscript. Therefore, we added more information in the Methods section (Bioinformatics) and made clear in the Methods that this data comes from our previous research and related bioproject.

      (5) L491: I think that "The synthetic processes of these fatty alcohols are ..." is better.

      We replaced the sentence with “The de novo organic synthesis of these fatty alcohols is described …”

      (6) L525 and 527: There are white squares between the number and the unit. Perhaps some characters have been garbled.

      Apparently an error upon file conversion, corrected.

      (7) L795: ORCo?

      Corrected.

      (8) L829-830 & Figure 4: Where is Figure 4D?

      Thank you for spotting this mistake from the older version of Figure 4. The SSR traces referred to in the legend are in fact a part of Figure 5. Moreover, Figure 4 is now reworked based on the comments by Reviewer 3.

      (9) L860-864: Why did the authors select the result of edgeR for the volcano plot in Figure 7 although the authors use both DESeq2 and edgeR? An explanation would be needed.

      Both algorithms, DESeq2 and EdgeR, are routinely used for differential gene expression analysis. Since they differ in read count normalization method and statistical testing we decided to use both of them independently in order to reduce false positives. Because the resulting fold changes were practically identical in both algorithms (results for both analyses are listed in Supplementary table S15), we only reported in Fig. 7 the outputs for edgeR to avoid redundancies. We added in the Results section the information that both techniques listed PsimOR14 among the most upregulated in workers.
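
      The idea of running two differential expression methods independently and trusting only the genes on which they agree can be sketched as a set intersection. This is a minimal illustration, not the authors' pipeline: the input format (gene mapped to log2 fold change and adjusted p-value), the thresholds, and the function name are all assumptions.

```python
def consensus_hits(results_a, results_b, alpha=0.05, min_abs_log2fc=1.0):
    """Genes called significant by both methods with the same direction of change.

    Each `results_*` maps gene -> (log2_fold_change, adjusted_p_value),
    e.g. one exported from each of two tools. Thresholds are assumptions.
    """
    def significant(results):
        return {
            gene
            for gene, (lfc, padj) in results.items()
            if padj < alpha and abs(lfc) >= min_abs_log2fc
        }

    shared = significant(results_a) & significant(results_b)
    # Require agreement on the sign of the fold change as well.
    return {g for g in shared if (results_a[g][0] > 0) == (results_b[g][0] > 0)}
```

      Requiring agreement trades some sensitivity for fewer false positives, which matches the rationale given in the response above.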

      Reviewer #3 (Recommendations for the authors):

      The discussion contains many descriptions that would fit better into the introduction, where they could be used to hint at the study's importance (e.g., 292-311, 381-412). The remaining parts often lack a detailed discussion of the results that integrates details from other insect studies. Although references were provided, no details were usually outlined. It would be helpful to see a stronger emphasis on what we learn from this study.

      Along with rewriting the introduction, we also modified the discussion. As suggested, the lines 292-311 were rewritten and placed in the introduction. By contrast, we preferred to keep the two paragraphs 381-412 in the discussion, since both of them outline the potential future interesting targets of research on termite ORs.

      As suggested, the discussion has been enriched and now includes comparative examples and relevant references about the broad/narrow selectivity of insect ORs, about the expected breadth of tuning of pheromone receptors vs. ORs detecting environmental cues, about the potential role of additional neurons housed in the neocembrene-detecting sensillum of P. simplex workers, etc. From both introduction and discussion the redundant details on the chemistry of termite communication have been removed.

      This includes explanations of the advantages of the specific methodologies the authors used and how they helped solve the manuscript's problem. What does the phylogeny solve? Was it used to select the ORs tested? It would be helpful to discuss what the phylogeny shows in comparison to other well-studied OR phylogenies, like those from the social Hymenoptera.

      We understand the comment. In fact, our motivation to include the phylogenetic tree of termite ORs was essentially to demonstrate (i) the orthologous nature of OR diversity with few expansions on low taxonomic levels, and (ii) to demonstrate graphically the relationship among the four selected sequences. We do not attempt a comprehensive phylogenetic analysis here, because it would be redundant given that we recently published a large OR phylogeny which includes all sequences used in the present manuscript and analysed them in the proper context of related (cockroaches) and unrelated insect taxa (Johny et al., 2023). This paper also compares the termite phylogenetic pattern with those observed in other Insecta. This paper is repeatedly cited in appropriate places of the present manuscript and its main observations are provided in the Introduction section. Therefore, we feel that a thorough discussion of termite phylogeny would be redundant in the present paper.

      The authors categorized the sensilla types. Potential problems in the categorization aside, it would be helpful to know if it is expected that you have sensilla specialized in perceiving one specific pheromone. What is known about sensilla in other insects?

      We understand. In the discussion of the revised version, we develop more about the features typical/expected for a pheromone receptor and the sensillum housing this receptor together with two other olfactory sensory neurons, including examples from other insects.

      As the manuscript currently stands, specialist readers with their respective background knowledge would find this study very interesting. In contrast, the general reader would probably fail to appreciate the importance of the results.

      We hope that the re-organized and simplified introduction may now be more intelligible even for non-specialist readers.

      (1) L35: Should "workers" be replaced with "worker antennae"?

      Corrected.

      (2) L62: Should "conservativeness" be replaced by "conservation"?

      Replaced with “parsimony”.

      (3) L129: How and why did the authors choose four candidate ORs? I could not find any information about this in the manuscript. I wondered why they did not pick the more highly expressed PsimOr20 and 26 (Figure 7).

      As already replied above in the Weaknesses section, we selected for the first deorphanization attempts only a modest set of four ORs, while an additional set is currently being tested. We also explained above the inclusion criteria, i.e. (i) full-length ORF and at least 6 unambiguously predicted transmembrane regions, and (ii) presence on different branches (subbranches) of the OR phylogeny. For these reasons, we did not primarily consider the expression patterns of different ORs. As for Fig. 7, it shows differential expression between soldiers and workers, which was not the primary guideline either and the data was obtained only after having the ORs tested by SSR. Yet, even though we had data on P. simplex ORs expression (Fig. 1B), we did not presume that pheromone receptors should be among the most expressed ORs, given the richness of chemical cues detected by worker termites and unlike, e.g., male moths, where ORs for sex pheromones are intuitively highly expressed.

      The strategy of OR selection is specified in the results section of the revised manuscript under “Phylogenetic reconstruction and candidate OR selection”.

      (4) 198 to 200: SI, II, and III look very similar. Additional measurements rather than qualitative descriptions are required to consider them distinct sensilla. The bending of SIII could be an artifact of preparation. I do not see how the authors could distinguish between SI and SII under the optical microscope for recordings. A detailed explanation is required.

      As we responded above in the “Weaknesses” chapter, we admit that the sensilla classification is not intelligible. Therefore, we decided in the revised version to abandon the classification of sensilla types and only focus on the observations made on the neocembrene-responding sensillum. To recognize the specific sensillum, we used its topology on the last antennal segment. Because termite antennae are not densely populated with sensilla, it is relatively easy to distinguish individual sensilla based on their topology on the antenna, both in optical microscope and SEM photographs. The modifications affect Fig. 4, its legend and the corresponding part of the results section (Identification of P. simplex olfactory sensillum responding to neocembrene).

      (5) 208: "Than" instead of "that"

      Corrected.

      (6) 280: I suggest replacing "demand" with "capabilities"

      Corrected.

      (7) 312: Why "nevertheless"? It sounds as if the authors suggest that there is evidence that ORs are not important for communication. This should be reworded.

      We removed “Nevertheless” from the beginning of the sentence.

      (8) 321 to 323: This sentence sounds as if something is missing. I suggest rewriting it.

      This sentence simply says that empty neuron Drosophila is a good tool for termite OR deorphanization and that termite ORs work well with Drosophila ORCo. We reworded the sentence.

      (9) 323: I suggest starting a new paragraph.

      Corrected.

      (10) 421: How many colonies were used for each of the analyses?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (11) 430: Did the termites originate from one or multiple colonies and did the authors sample from the Florida and Cuba population?

      The data for this manuscript were collected from three different colonies collected in Cuba. We now describe in the Materials and Methods section which analyses were conducted with each of the colonies.

      (12) 501: How was the termite antenna fixated? The authors refer to the Drosophila methods, but given the large antennal differences between these species, more specific information would be helpful.

      Understood. We added the following information into the Methods section under “Electrophysiology”: “The grounding electrode was carefully inserted into the clypeus and the antenna was fixed on a microscope slide using a glass electrode. To avoid the antennal movement, the microscope slide was covered with double-sided tape and the three distal antennal segments were attached to the slide.”

      (13) 509: I want to confirm that the authors indicate that the outlet of the glass tube with the airstream and odorant is 4 cm away from the Drosophila or termite antenna. The distance seems to be very large.

      Thank you for spotting this obvious mistake. The 4 cm distance applies for the distance between the opening for Pasteur pipette insertion into the delivery tube, the outlet itself is situated approx. 1 cm from the antenna. This information is now corrected.

      (14) 510/527: It looks like all odor panels were equally applied onto the filter paper despite the difference in solvent (hexane and paraffin oil). How was the solvent difference addressed?

      In our study we combine two types of odorant panels. First, we test on all four studied receptors a panel containing several compounds relevant for termite chemical communication including the C12 unsaturated alcohols, the diterpene neocembrene, the sesquiterpene (3R,6E)-nerolidol and other compounds. These compounds are stored in the laboratory as hexane solutions to prevent the oxidation/polymerization and it is not advisable to transfer them to another solvent. In the second step we used three additional panels of frequently occurring insect semiochemicals, which are stored as paraffin oil solutions, so as to address the breadth of PsimOR14 tuning. We are aware that the evaporation dynamics differ between the two solvents but we did not have any suitable option how to solve this problem. We believe that the use of the two solvents does not compromise the general message on the receptor specificity. For each panel, the corresponding solvent is used as a control. Similarly, the use of two different solvents for SSR can be encountered in other studies, e.g. 10.1016/j.celrep.2015.07.031.

      (15) 518: delta spikes/sec works for all tables except for the wild type in Table S5. I could not figure out how the authors get to delta spikes/sec in that table.

      Thank you for your sharp eye. Due to our mistake, the values of Δ spikes per second reported in Table S5 for W1118 were erroneously calculated using the formula for 0.5 sec stimulation instead of 1 sec. We corrected this mistake which does not impact the results interpretation in Table S5 and Fig. 2.
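
      To illustrate why the stimulation window matters, here is a minimal sketch of a Δ spikes/sec calculation. It assumes the common convention of subtracting the firing rate in a pre-stimulus window from the rate during stimulation; the exact formula used in the manuscript is not given here, and the function and parameter names are hypothetical.

```python
def delta_spikes_per_sec(spike_times, stim_onset, stim_duration, baseline_duration=None):
    """Firing rate during stimulation minus the pre-stimulus firing rate (spikes/s).

    spike_times: spike timestamps in seconds.
    stim_onset: stimulus onset in seconds.
    stim_duration: length of the stimulation window in seconds.
    baseline_duration: length of the pre-stimulus window; defaults to
        stim_duration (an assumption of this sketch).
    """
    if baseline_duration is None:
        baseline_duration = stim_duration
    during = sum(1 for t in spike_times if stim_onset <= t < stim_onset + stim_duration)
    before = sum(1 for t in spike_times if stim_onset - baseline_duration <= t < stim_onset)
    return during / stim_duration - before / baseline_duration
```

      Applying a 0.5 s window to a recording made with 1 s stimulation counts spikes over the wrong interval, so the resulting values differ even though the underlying spike train is the same.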

      522: Did the workers and soldiers originate from different colonies or different populations?

      We now clearly describe in the Material and Methods section the origin of termites for different experiments. EAG measurements were made using individuals (workers, soldiers) from one Cuban colony.

      (16) Figure 6C/D: I suggest matching colors between the two figures. For example, instead of using an orange circle in C and a green coloration of the intracellular flap in D, I recommend using blue, which is not used for something else. In addition, the binding pocket could be separated better from anything else in a different color.

      We agree that the color match for the intracellular flap was missing. This figure is now reworked and the colors should have a better match and the binding region is better delineated.

      (17) Figure 7/Table S15: It is unclear where the transcriptome data originate and what they are based on. Are these antennal transcriptomes or head transcriptomes? Do these data come from previous data sets or data generated in this study? Figure 7 refers to heads, Table S15 to workers and soldiers, and the methods only refer to antennal extractions. This should be clarified in the text, the figure, and the table.

      We admit that the replicate numbers and origin of the RNA seq data should be better specified and that the information that the RNASeq originated from samples of heads+antennae of workers and soldiers should be provided at appropriate places. Therefore, we added more information on replicates and origin of the data in the Methods section (Bioinformatics), made clear that this data comes from our previous research, and refer to the corresponding bioproject. Likewise, the Figure 7 legend and Table S15 heading have been updated.

    1. eLife Assessment

      This manuscript reports effects of a single dose of methamphetamine vs placebo on a probabilistic reversal learning task with different levels of noise, in a large group of young healthy volunteers. The paper is well written and the methods are rigorous. The findings are important and have theoretical or practical implications beyond a single subfield. The strength of the evidence is convincing, with the methods, data, and analyses broadly supporting the claims in the paper, which are sufficiently qualified given the lack of a significant effect of the binary baseline performance variable, and the nonlinear effect of individual differences in baseline performance.

    2. Reviewer #1 (Public review):

      The authors examine how probabilistic reversal learning is affected by dopamine by studying the effects of methamphetamine (MA) administration. Based on prior evidence that the effects of pharmacological manipulation depend on baseline neurotransmitter levels, they hypothesized that MA would improve learning in people with low baseline performance. They found this effect, and specifically found that MA administration improved learning in noisy blocks, by reducing learning from misleading performance, in participants with lower baseline performance. The authors then fit participants' behavior to a computational learning model and found that an eta parameter, responsible for scaling learning rate based on previously surprising outcomes, differed in participants with low baseline performance on and off MA.

      Questions:

      (1) It would be helpful to confirm that the observed effect of MA on the eta parameter is responsible for better performance in low baseline performers. If performance on the task is simulated for parameters estimated for high and low baseline performers on and off MA, does the simulated behavior capture the main behavioral differences shown in Figure 3?

      (2) In Figure 4C, it appears that the main parameter difference between low and high baseline performance is inverse temperature, not eta. If MA is effective in people with lower baseline DA, why is the effect of MA on eta and not IT?

      Also, this parameter is noted as temperature but appears to be inverse temperature as higher values are related to better performance. The exact model for the choice function is not described in the methods.
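
      The simulation check requested in (1) and the temperature versus inverse-temperature distinction raised in (2) can be illustrated with a minimal sketch. It is not the authors' model: the two-option task without reversals, the additive learning-rate rule in which a hypothetical `eta` scales the previous absolute prediction error, and all names and default values are assumptions made purely for illustration.

```python
import math
import random


def p_choose_first(q_first, q_second, inverse_temperature):
    """Softmax (logistic) probability of choosing the first of two options.

    A higher inverse temperature makes choices more deterministic, which is
    why higher values tend to go with better performance.
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_first - q_second)))


def simulate_accuracy(eta, inverse_temperature, n_trials=200,
                      p_reward_best=0.8, base_learning_rate=0.2, seed=0):
    """Simulate a toy learner and return the fraction of 'best option' choices.

    Assumed (hypothetical) rule: the learning rate on each trial is a base
    rate plus eta times the absolute prediction error of the previous trial.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]            # value estimates; option 0 is the better option
    previous_surprise = 0.0   # |prediction error| on the previous trial
    n_best = 0
    for _ in range(n_trials):
        p0 = p_choose_first(q[0], q[1], inverse_temperature)
        choice = 0 if rng.random() < p0 else 1
        n_best += int(choice == 0)
        p_reward = p_reward_best if choice == 0 else 1.0 - p_reward_best
        reward = 1.0 if rng.random() < p_reward else 0.0
        prediction_error = reward - q[choice]
        learning_rate = min(1.0, base_learning_rate + eta * previous_surprise)
        q[choice] += learning_rate * prediction_error
        previous_surprise = abs(prediction_error)
    return n_best / n_trials
```

      Running `simulate_accuracy` with the parameter estimates of each group and drug condition, and comparing the simulated accuracies with the observed ones, would be one way to test whether the fitted parameters reproduce the behavioral pattern.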

      Comments on revisions:

      Thanks to the authors for their thorough responses and revisions. One typo to note: in the Methods, the "drug effects" paragraph is repeated.

    3. Reviewer #2 (Public review):

      Summary:

      Kirschner and colleagues test whether methamphetamine (MA) alters learning rate dynamics in a validated reversal learning task. They find evidence that MA can enhance performance for low-performers, and that the enhancement reflects a reduction in the degree to which these low-performers dynamically up-regulate their learning rates when they encounter unexpected outcomes. The net effect is that poor performers show more volatile learning rates (e.g. jumping up when they receive misleading feedback), when the environment is actually stable, undermining their performance over trials.

      Strengths:

      The study has multiple strengths, including a large sample size, placebo control, double-blind randomized design, and rigorous computational modeling of a validated task. Additionally, the analytic methods are rigorous and offer new types of analyses for people interested in exploring learning as a function of dynamically changing volatility.

      Weaknesses:

      The limitations, which are acknowledged, include that the drug they use, methamphetamine, can influence multiple neuromodulatory systems including catecholamines and acetylcholine, all of which have been implicated in learning rate dynamics. They also do not have any independent measures of any of these systems, so it is impossible to know which is having an effect.

      Another limitation that they should acknowledge is that, because participants were aware of having different experiences in the drug sessions, the blinding was effectively single-blind (to the experimenters) and not double-blind. That said, the authors do provide some evidence that subjective effects of drugs (e.g. arousal, mood, etc.) did not drive differences in performance.

      Comments on revisions:

      The authors have done an outstanding job responding to, and allaying, my prior concerns about their analyses.

    1. eLife Assessment

      Bonnifet et al. present data on the expression and interacting partners of the transposable element L1 in the mammalian brain. The work includes important findings addressing the potential role of L1 in aging and neurodegenerative disease. The reviewers conclude that several aspects of the study are well done. However, the experimental evidence presented supporting the L1 increase with aging is not fully conclusive and this finding remains incomplete in its current form.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Bonnifet et al. profile the presence of L1 ORF1p in the mouse and human brain and report that ORF1p is expressed in the human and mouse brain specifically in neurons at steady state and that there is an age-dependent increase in expression. This is a timely report as two recent papers have extensively documented the presence of full-length L1 transcripts in the mouse and human brain (PMID: 38773348 & PMID: 37910626). Thus, the finding that L1 ORF1p is consistently expressed in the brain is important to document and will be of value to the field.

      Strengths:

      Several parts of this manuscript appear to be well done and include the necessary controls. In particular, the documentation of neuron-specific expression of ORF1p in the mouse brain is an interesting finding with nice documentation. This will be very useful information for the field.

      Weaknesses:

      Several parts of the manuscript appear to be more preliminary and need further experiments to validate their claims. In particular, the data suggesting expression of L1 ORF1p in the human brain and the data suggesting increased expression in the aged brain need further validation. Detailed comments:

      (1) The expression of ORF1p in the human brain shown in Fig. 1j is puzzling. Why are there two strong bands in the WB? How can the authors be sure that this signal represents ORF1p expression and not non-specific labelling? While the authors discuss that others have found double bands when examining human ORF1p, there are also several labs that report only one band. This discrepancy in the field should at least be discussed and the uncertainties with their findings should be acknowledged.

      (2) The data showing an increase in ORF1p expression in the aged mouse brain is an interesting observation, but the magnitude of the effect is very limited and somewhat difficult to interpret. This finding should be supported by orthogonal methods to strengthen this conclusion. For example, by WB and by RNA-seq (to verify that the increase in protein is due to an increase in transcription).

      (3) The transcriptomic data using human postmortem tissue presented in Figure 4 and Figure 5 are not convincing. Quantification of transposon expression on short read sequencing has important limitations. Longer reads and complementary approaches are needed to study the expression of evolutionarily young L1s (see PMID: 38773348 & PMID: 37910626 for examples of the current state of the art). As presented, the human RNA data is inconclusive due to the short read length and small sample size. The value of including an inconclusive analysis in the manuscript is difficult to understand. With this data set, the authors cannot investigate age-related changes in L1 expression in human neurons.

      (4) In line with these comments, the title should be changed to better reflect the findings in the manuscript. A title that does not mention "L1 increase with aging" would be better.

    3. Reviewer #2 (Public review):

      Summary:

      Bonnifet et al. sought to characterize the expression pattern of L1 ORF1p across the entire mouse brain, in young and aged animals, and to corroborate their characterization with Western blotting for L1 ORF1p and L1 RNA expression data from human samples. They also queried L1 ORF1p interacting partners in the mouse brain by IP-MS.

      Strengths:

      A major strength of the study is the use of two approaches: a deep-learning detection method to distinguish neuronal vs. non-neuronal cells and ORF1p+ cells vs. ORF1p- cells across large-scale images encompassing multiple brain regions mapped by comparison to the Allen Brain Atlas, and confocal imaging to give higher resolution on specific brain regions. These results are also corroborated by Western blotting on six mouse brain regions. Extension of their analysis to post-mortem human samples, to the extent possible, is another strength of the paper. The identification of novel ORF1p interactors in brain is also a strength in that it provides a novel dataset for future studies.

      Weaknesses:

      The main weakness of the study is that cell type specificity of ORF1p expression was not examined beyond neuron (NeuN+) vs non-neuron (NeuN-). Indeed, a recent study (Bodea et al. 2024, Nature Neuroscience) found that ORF1p expression is characteristic of parvalbumin-positive interneurons, and it would be very interesting to query whether other neuronal subtypes in different brain regions are distinguished by ORF1p expression. The data suggesting that ORF1p expression is increased in aged mouse brains is intriguing, although it seems to be based upon modestly (up to 27%, dependent on brain region) higher intensity of ORF1p staining rather than a higher proportion of ORF1p+ neurons. Indeed, the proportion of NeuN+/Orf1p+ cells actually decreased in aged animals. It is difficult to interpret the significance and validity of the increase in intensity, as Hoechst staining of DNA, rather than immunostaining for a protein known to be stably expressed in young and aged neurons, was used as a control for staining intensity. The main weakness of the IP-MS portion of the study is that none of the interactors were individually validated or subjected to follow-up analyses. The list of interactors was compared to previously published datasets, but not to ORF1p interactors in any other mouse tissue.

      The authors achieved the goals of broadly characterizing ORF1p expression across different regions of the mouse brain, and identifying putative ORF1p interactors in the mouse brain. However, findings from both parts of the study are somewhat superficial in depth.

      This provides a useful dataset to the field, which likely will be used to justify and support numerous future studies into L1 activity in the aging mammalian brain and in neurodegenerative disease. Similarly, the list of ORF1p interacting proteins in the brain will likely be taken up and studied in greater depth.

      Comments on revisions:

      The co-staining of Orf1p with Parvalbumin (PV) presented in Supplemental Figure S5 is a welcome addition exploring the cell type-specificity of Orf1p staining, and broadly corroborates the work of Bodea et al. while revealing that Orf1p also is expressed in non-PV+ cells, consistent with L1 activity across a range of neuronal subtypes. The authors also have strengthened their findings regarding the increased intensity of ORF1p staining in aged compared to young animals, and the newly presented results are indeed more convincing. The prospect of increased neuronal L1 activity with age is exciting, and the results in this paper have provided the groundwork for ongoing discoveries in this area. While it is disappointing that no Orf1p interactors were followed up, this is understandable and the data are nonetheless valuable and will likely prove useful to future studies.

    1. eLife Assessment

      This fundamental study describes patterns of anatomical connectivity between the cortex and the thalamus using magnetic resonance imaging data in humans and non-human primates. The measures are related to numerous other modalities to develop a robust understanding of the organisation of the system. The authors provide convincing evidence that there is a difference between sensory and association cortices in terms of their connectivity with the thalamus, which may have downstream effects on brain function. This work will be of interest to neuroscientists interested in the organization and dynamics of cortico-thalamic circuits.

    2. Reviewer #1 (Public review):

      Summary:

      The thalamus is a central subcortical structure that receives anatomical connections from various cortical areas, each displaying a unique pattern. Previous studies have suggested that certain cortical areas may establish more extensive connections within the thalamus, influencing distributed information flow. Despite these suggestions, a quantitative understanding of cortical areas' anatomical connectivity patterns within the thalamus is lacking. In this study, the researchers addressed this gap by employing diffusion magnetic resonance imaging (dMRI) on a large cohort of healthy adults from the Human Connectome Project. Using brain-wide probabilistic tractography, a framework was developed to measure the spatial extent of anatomical connections within the thalamus for each cortical area. Additionally, the researchers integrated resting-state functional MRI, cortical myelin, and human neural gene expression data to investigate potential variations in anatomical connections along the cortical hierarchy. The results unveiled two distinct cortico-thalamic tractography motifs: 1) a sensorimotor cortical motif featuring focused thalamic connections to the posterolateral thalamus, facilitating fast, feed-forward information flow; and 2) an associative cortical motif characterized by diffuse thalamic connections targeting the anteromedial thalamus, associated with slower, feed-back information flow. These motifs exhibited consistency across human subjects and were corroborated in macaques, underscoring cross-species generalizability. In summary, the study illuminates differences in the spatial extent of anatomical connections within the thalamus for sensorimotor and association cortical areas, potentially contributing to functionally distinct cortico-thalamic information flow.

      Strengths:

      Quantitative Approach: The study employs diffusion magnetic resonance imaging (dMRI) and probabilistic tractography on a substantial sample size of 828 healthy adults, providing a robust quantitative analysis of anatomical connectivity patterns within the thalamus.

      Multi-Modal Integration: By incorporating resting-state functional MRI, cortical myelin, and human neural gene expression data, the study offers a comprehensive approach to understanding anatomical connections, enriching the interpretation of findings and enhancing the study's overall validity.

      Cross-Species Generalizability: The identification of consistent cortico-thalamic tractography motifs in both human subjects and macaques demonstrates the robustness and cross-species generalizability of the findings, strengthening the significance and broader applicability of the study.

      Supplementary Analyses: There are numerous, excellent examples of clear surrogates used to test the major claims of the paper. This is exemplary work.

      Weaknesses:

      Indirect Estimates of White Matter Connections: While dMRI is a valuable tool, it inherently provides indirect and inferred information about neural pathways. The accuracy and specificity of tractography can be influenced by various factors, including fiber crossing, partial volume effects, and algorithmic assumptions. A potential limitation in the accuracy of indirect estimates might affect the precision of spatial extent measurements, introducing uncertainty in the interpretation of cortico-thalamic connectivity patterns. Addressing the methodological limitations associated with indirect estimates and considering complementary approaches could strengthen the overall robustness of the findings.

      Comments on revised version:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Howell and colleagues focuses on describing macro patterns of anatomical connections between cortical areas and the thalamus in the human brain. This research topic poses significant challenges due to the inability to apply the gold standard of mapping anatomical connections, viral tracing, to humans. Moreover, when applied to animal models, viral tracing often has limited scope and resolution. As a result, the field has thus far lacked a comprehensive and validated description of thalamocortical anatomical connectivity in humans.

      The paper focuses on an intriguing question: whether anatomical connections from the cortex to the thalamus exhibit a diffuse pattern, targeting multiple thalamic sub-regions, or a more focal pattern, selectively targeting specific thalamic subregions. This novel and significant question holds substantial implications for our understanding of thalamocortical information processing. The authors have developed a sophisticated and innovative quantitative metric to address this question. The study revealed two main patterns: a focal pattern originating from sensorimotor cortical regions to the posterior thalamus and a more diffuse pattern from associative cortical regions to the anterior-medial thalamus. These findings are then framed within the context of thalamocortical motifs involved in feedforward versus feedback processing.

      While this paper has several strengths, including its significance and methodological sophistication, its extension to non-human primates and other forms of data for testing hierarchy, there are important limitations. These limitations, discussed in more detail below, primarily concern tracking accuracy and the known limitations of using diffusion data to track thalamocortical connections in humans. These limitations may potentially introduce systematic biases into the results, weakening their support. Addressing these limitations through better validation is crucial, though some may remain unresolved due to the fundamental constraints of diffusion imaging.

      Strengths:

      This research holds significant basic, clinical, and translational importance as it contributes to our understanding of how thalamocortical anatomical connectivity is organized. Its relevance spans across cognitive, systems, and clinical neuroscientists in various subfields.

      The central question addressed in this study, concerning whether cortico-thalamic projections are focal or diffuse, is both novel and previously unexplored to the best of my knowledge. It offers valuable insights into the potential capabilities of the thalamocortical system in terms of parallel or integrative processing.

      The development of quantitative metrics to analyze anatomical connectivity is highly innovative and suitable for addressing the research questions at hand.

      The findings are not only interesting but also robust, aligning with data from other sources that suggest a hierarchical organization in the brain.

      Using PCA to integrate results across a range of thresholds is innovative.

      The study's sophisticated integration of a diverse range of data and methods provides strong, converging support for its main findings, enhancing the overall credibility of the research.

      Weaknesses:

      Structural thalamocortical connectivity was estimated from diffusion imaging data obtained from the HCP dataset. Consequently, the robustness and accuracy of the results depend on the suitability of this data for such a purpose. Conducting tractography on the cortical-thalamic system is recognized as a challenging endeavour for several reasons. First, diffusion directions lose their clearly defined principal orientations once they reach the deep thalamic nuclei, rendering the tracking of structures on the medial side, such as the medial dorsal (MD) and pulvinar nuclei, difficult. Somewhat concerning is that those are the regions the authors found to show diffuse connectivity patterns. Second, the thalamic radiata diverges into several directions, and routes to the lateral surface often lack the clarity necessary for successful tracking. It is unclear if all cortical regions have similar levels of accuracy, and some of the lateral associative regions might have less accurate tracking, making them appear to be more diffuse, biasing the results.

      While the methodology employed by the authors appears to be state-of-the-art, there exists uncertainty regarding its appropriateness for validation, given the well-documented issues of false positives and false negatives in probabilistic diffusion tractography, as discussed by Thomas et al. 2014 PNAS. Although replicating the results in both humans and non-human primates strengthens the study, a more compelling validation approach would involve demonstrating the method's ability to accurately trace known tracts from established tracing studies or, even better, employing phantom track data. Many of the control analyses the authors presented, such as track density, do not speak to accuracy.

      Because the authors included data from all thresholds, it seems likely that false-positive tracks were included in the results. The methodology described seems to unavoidably include anatomically implausible pathways in the spatial extent analyses.

      If tracking the medial thalamus is indeed less accurate, characterized by higher false positives and false negatives, it could potentially lead to increased variability among individual subjects. In cases where results are averaged across subjects, as the authors have apparently done, this could inadvertently contribute to the emergence of the "diffuse" motif, as described in the context of the associative cortex. This presents a critical issue that requires a more thorough control analysis and validation process to ensure that the main results are not artifacts resulting from limitations in tractography.

      The thresholding approach taken in the manuscript aimed to control for inter-areal differences in anatomical connection strength that could confound the ED estimates. I am not quite clear why inter-areal differences in anatomical connection strength have to be controlled. A global threshold applied to all thalamic voxels might eliminate some connections that are weak but do exist. Those weak pathways are less likely to survive at high thresholds. Meanwhile, the mean ED is weighted, with more conservative thresholds having higher weights. That being said, isn't it possible that more robust pathways might contribute more to the mean ED than weaker pathways?
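
      The concern about weighting can be made concrete with a minimal sketch. It assumes that ED is the mean pairwise Euclidean distance between the thalamic voxels surviving a threshold and that thresholds are expressed as the fraction of strongest voxels retained; the function names, the toy coordinates, and the weights are hypothetical and do not reproduce the authors' pipeline.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of voxel coordinates."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def weighted_mean_ed(voxels, keep_fractions, weights):
    """Weighted mean ED across thresholds.

    voxels: list of (coordinate, connection_strength) tuples.
    keep_fractions: fraction of the strongest voxels kept at each threshold
        (a smaller fraction is a more conservative threshold).
    weights: one hypothetical weight per threshold.
    """
    ranked = sorted(voxels, key=lambda v: v[1], reverse=True)
    eds = []
    for fraction in keep_fractions:
        n_keep = max(1, round(fraction * len(ranked)))
        eds.append(mean_pairwise_distance([c for c, _ in ranked[:n_keep]]))
    return sum(w * ed for w, ed in zip(weights, eds)) / sum(weights)


# Toy example: two strong, adjacent voxels and two weak, distant ones.
toy_voxels = [((0, 0, 0), 10.0), ((1, 0, 0), 9.0),
              ((10, 0, 0), 1.0), ((0, 10, 0), 0.5)]
equal = weighted_mean_ed(toy_voxels, [0.5, 1.0], [1.0, 1.0])
conservative = weighted_mean_ed(toy_voxels, [0.5, 1.0], [2.0, 1.0])
```

      In this toy case, up-weighting the conservative threshold pulls the mean ED toward the value set by the two strongest voxels, which is the sense in which robust pathways could contribute more to the mean ED than weak ones.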

      Comments on revised version:

      I appreciate the additional supplementary figures and responses from the authors. I think this is an important study, and the review I wrote should provide important context for readers to digest their responses.

    4. Reviewer #3 (Public review):

      Summary:

      In the current work, Howell et al studied the connectivity between cortex and thalamus using DTI tractography per parcel to all voxels in the thalamus. They then applied various dimensionality reduction techniques to uncover how connectivity to the thalamus varies across cortical parcels. Next, they explored the spatial correlation of these variations with cortical myelin and functional organization, thalamic nuclei, and gene expression-derived core-matrix cell differentiation, and extended the model to macaques. Overall, the authors find a differentiation between sensory and association areas in terms of the association with the thalamus, which reflects differences in cortical microstructure and function, links to core-matrix differences, and can be replicated in macaques.

      Strengths:

      A clear strength of the current work is the combination of different models and approaches to study the link between the cortex and the thalamus. This approach nicely bridges different approaches to describe the role of the thalamus in cortical organisation using a diffusion-based approach. Especially the extension of the model to the macaque is quite nice.

      Appraisal:

      The aim of the study: 'to investigate the spatial extent of anatomical connectivity patterns within the thalamus in both humans and non-human primates and determine if such patterns differ between sensorimotor and association cortical areas' has been met. Further work may continue to investigate other implications of this finding.

      Discussion:

      Overall, I think the study is an intriguing addition to a growing literature studying the anatomical connectivity between thalamus and cortex and its functional implications.

      Comments on revised version:

      Thank you for the responses.