  1. Oct 2025
    1. Reviewer #1 (Public review):

      Summary:

      This manuscript by the Yin group presents interesting findings that organelle-tethered intrinsically disordered "MEMCA" scaffolds, as exemplified by ZDHHC18 at the Golgi and MARCH8 at endosomes, enhance the engagement of cGAS with organelle-proximal condensates, thereby sequestering cGAS from cytosolic DNA sensing and negatively regulating innate immunity.

      Strengths:

      These findings suggest a previously unrecognized mechanism by which Golgi/endosomal IDR scaffolds modulate cGAS activity, with implications for antiviral defense and tumor immunology. The study is conceptually intriguing and potentially impactful.

      Weaknesses:

      While the manuscript addresses a novel aspect of cGAS regulation, additional mechanistic insights and targeted validations are needed to ensure robustness:

      (1) How do ZDHHC18/MARCH8 enhance cGAS engagement? Do they act as bridges to form a ternary, membrane-tethered cGAS-DNA-MEMCA complex, or alter cGAS condensate properties allosterically?

      (2) Is organelle cGAS capture selective? For instance, can other palmitoyltransferases/E3 ligases be substituted for ZDHHC18/MARCH8?

      (3) Why does membrane association suppress cGAS enzymatic activity, given that dsDNA still resides within cGAS condensates?

    2. Reviewer #2 (Public review):

      Summary:

      The authors found that cGAS, a DNA sensor, relocalizes to organelle membranes (ER, Golgi, endosomes) upon DNA stimulation, revealing spatial regulation of its activity. ZDHHC18 and MARCH8 recruit cGAS to Golgi/endosomes via intrinsically disordered regions (IDRs), driving phase-separated condensates. This sequestration of cGAS-dsDNA complexes suppresses innate immune signaling, uncovering a novel regulatory mechanism.

      Strengths:

      The work overall is very interesting. The authors provided molecular and biochemical evidence.

      Weaknesses:

      Overall, the work is very interesting. However, the quality of some of the data does need to be improved, and more experiments need to be performed.

      The following points need to be addressed:

      (1) In Figure S7, no direct binding between cGAS and MARCH8 or ZD18 IDR is observed, and the interaction only occurs after DNA stimulation. However, Figure 5 shows cGAS recruitment to ZD18 or MARCH8 IDR droplets, suggesting direct interactions. This apparent discrepancy should be clarified.

      (2) The authors propose that recruiting cGAS to organelle membranes reduces its activity, as demonstrated by the FKBP experiment. However, ZD18 and MARCH8 also post-translationally modify cGAS. Do both mechanisms contribute to this effect, and can the authors test this?

      (3) To demonstrate the functional importance of MEMCA, the authors should test IFN production or STING activation in cells.

      (4) Does the IDR of MARCH8 or ZD18 influence the interaction between cGAS and DNA?

      (5) Which region of cGAS does the IDR of MARCH8 or ZD18 interact with: the cGAS-CD or the cGAS-N-terminus?

      (6) The in vitro LLPS experiments with cGAS, DNA, and ZD18/MARCH8 should be conducted under physiological conditions.

    3. Reviewer #3 (Public review):

      Summary:

      In this study by Shi et al., the authors evaluate if cGAS is recruited to the membranes of intracellular organelles. Using a combination of biochemical fractionation and imaging techniques, the authors propose that upon recognition of DNA, cGAS translocates to various subcellular locations, including the Golgi, endoplasmic reticulum, and endosomes. Mechanistically, the authors propose that upon localizing to the Golgi or endosome, cGAS binding to MARCH8 and ZDHHC18 prevents cGAS activity by incorporating cGAS and dsDNA into biomolecular condensates. However, in its current form, the study does not directly address this question.

      Strengths:

      The question of evaluating cGAS sub-cellular localization as a mechanism for controlling activity is interesting, and there is some evidence that cGAS is localized to sub-cellular organelle membranes.

      Weaknesses:

      (1) The well-established nuclear localization of cGAS is not adequately addressed in the cell lines used and is inconsistent with the findings.

      (2) Previous studies have shown that ZDHHC18 and MARCH8 control cGAS activity, which detracts somewhat from the novelty.

      (3) There is considerable inconsistency in the cell lines and artificial expression systems used across the study.

      (4) A key element missing is showing that in the absence of ZDHHC18 or MARCH8, the loss of endogenous cGAS localization to the various sub-cellular organelles increases cGAMP synthesis and downstream STING activation in primary cells. There is an over-reliance on artificial expression systems. An important experiment to validate the hypothesis would be to evaluate endogenous cGAS localization in MARCH8- and ZDHHC18-deficient primary cells. Further, there should be evaluation of endogenous STING responses in MARCH8- and ZDHHC18-deficient primary cells in tandem with the localization studies.

      (5) There are a large number of grammatical errors throughout the manuscript which should be addressed.

    4. Author response:

      Below we outline our provisional responses to the major points raised in the public reviews, and our planned revisions:

      (1) Mechanistic model of how ZDHHC18/MARCH8 engage the cGAS–DNA condensate (Reviewers #1 and #2)

      We will add a dedicated subsection and a working-model figure describing our current view: IDRs of ZDHHC18 (Golgi) and MARCH8 (endosomes) engage pre-formed cGAS–DNA condensates at organelle membranes, and thereby tune cGAS activity through PTMs. We will explicitly discuss bridge-like versus allosteric modes by performing additional LLPS experiments (e.g., FRAP assays) to detect any IDR-driven changes in condensate properties, and explain how these scenarios fit our data.

      (2) Selectivity beyond ZDHHC18/MARCH8 (Reviewer #1)

      We will expand the text to explain existing evidence indicating that, in addition to ZDHHC18 or MARCH8, other post-translational modification (PTM) enzymes and/or membrane-associated scaffolds may also modulate cGAS. We will summarize our current datasets that support this possibility and outline how this selectivity relates to organelle identity.

      (3) Why membrane association suppresses cGAS activity (Reviewer #1)

      We will provide a concise mechanistic rationale—integrating our published work—to explain how membrane-proximal sequestration can limit cGAS catalysis despite cGAS–DNA coexistence within condensates. Specifically, we will discuss (i) IDR-dependent changes in condensate properties, and (ii) PTMs by ZDHHC18/MARCH8 that allosterically reduce catalytic efficiency; we will clearly cross-reference our prior publications that bear on these points.

      (4) Reconciling Fig. S7 (DNA-dependent binding) with Fig. 5 (recruitment to IDR droplets) (Reviewer #2)

      We will add text to clarify experimental context and readouts to prove that there is no real contradiction between Fig. S7 and Fig. 5. In the experiment shown in Fig. 5, PEG (a macromolecular crowding agent) was added to the system, which facilitates the formation of IDR phase-separated droplets. Under these conditions, cGAS partitions into the IDR condensates, leading to the observed recruitment. In contrast, Fig. S7 examines the direct physical interaction between cGAS and the IDRs using biochemical pull-down assays and shows that no direct interaction occurs in the absence of DNA. These two results reflect different experimental contexts and are therefore not mutually exclusive.

      (5) Planned additional tests to address specificity and mechanism (Reviewer #2)

      DNA pull-down: to test whether IDRs alter cGAS–DNA affinity, we will compare cGAS binding to DNA with/without MEMCA IDRs (and with charged-residue mutants).

      Domain mapping: to determine which region of cGAS engages MEMCA IDRs, we will map binding using cGAS N-terminus/core-domain truncations and key surface mutants.

      Physiological in vitro LLPS: we will repeat cGAS–DNA–IDR LLPS assays under physiological buffer conditions and report partition coefficients, FRAP, and phase diagrams to ensure physiological relevance.

      (6) Image clarity and data presentation (Reviewer #2):

      We will improve image resolution, add zoomed-in insets with organelle markers, and provide images with a clearer Cy5-ISD signal.

      (7) Nuclear localization of cGAS and system considerations (Reviewer #3)

      We will explicitly document the nuclear signal of cGAS observed in our confocal experiments and detail the cell lines and expression systems used. We will also clarify cGAS nuclear localization in the cell lines used.

      (8) Endogenous validation and cell line consistency (Reviewer #3):

      We will perform experiments in primary cells (knockout macrophages) to address the concern of relying on overexpression.

      (9) Language and grammar (Reviewer #3):

      We will thoroughly revise the manuscript for grammar and clarity.

      Together, these planned revisions will strengthen the mechanistic basis of our findings and provide direct evidence for the physiological role of organelle-tethered IDRs in regulating cGAS activity.

    1. eLife Assessment

      Ruppert et al. investigated how activation of thermogenesis by cold exposure (CE) and methionine restriction (MetR) impacts health and leads to weight loss in mice. The authors provided valuable datasets showing that the responses to MetR and CE are tissue-specific, while MetR and CE affect beige adipose similarly. Although the study is descriptive, the data analyses are solid, with well-supported conclusions drawn from the findings.

    2. Reviewer #1 (Public review):

      Summary:

      Activation of thermogenesis by cold exposure and dietary protein restriction are two lifestyle changes that impact health in humans and lead to weight loss in model organisms - here, in mice. How these affect liver and adipose tissues has not been thoroughly investigated side by side. In mice, the authors show that the responses to methionine restriction and cold exposure are tissue-specific, while the effects on beige adipose are somewhat similar.

      Strengths:

      The strength of the work is the comparative approach, using transcriptomics and bioinformatic analyses to investigate the tissue-specific impact. The work was performed in mouse models and is state-of-the-art. This represents an important resource for researchers in the field of protein restriction and thermogenesis.

      Weaknesses:

      The findings are descriptive, and the conclusions remain associative. The work is limited to mouse physiology, and the human implications have not been investigated yet.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a library of RNA sequencing analysis from brown fat, liver, and white fat of mice treated with two stressors - cold challenge and methionine restriction - alone and in combination (interaction between diet and temperature). They characterize the physiologic response of the mice to the stressors, including effects on weight, food intake, and metabolism. This paper provides evidence that while both stressors increase energy expenditure, there are complex tissue-specific responses in gene expression, with additive, synergistic, and antagonistic responses seen in different tissues.

      Strengths:

      The study design and implementation are solid and well-controlled. Their writing is clear and concise. The authors do an admirable job of distilling the complex transcriptome data into digestible information for presentation in the paper. Most importantly, they do not overreach in their interpretation of their genomic data, keeping their conclusions appropriately tied to the data presented. The discussion is well thought out and addresses some interesting points raised by their results.

      Weaknesses:

      The major weakness of the paper is the almost complete reliance on RNA sequencing data, but it is presented as a transcriptomic resource.

    4. Reviewer #3 (Public review):

      Summary:

      Ruppert et al. present a well-designed 2×2 factorial study directly comparing methionine restriction (MetR) and cold exposure (CE) across liver, iBAT, iWAT, and eWAT, integrating physiology with tissue-resolved RNA-seq. This approach allows a rigorous assessment of where dietary and environmental stimuli act additively, synergistically, or antagonistically. Physiologically, MetR progressively increases energy expenditure (EE) at 22 °C and lowers RER, indicating a lipid utilization bias. By contrast, a 24-hour 4 °C challenge elevates EE across all groups and eliminates MetR-Ctrl differences. Notably, changes in food intake and activity do not explain the MetR effect at room temperature.

      Strengths:

      The data convincingly support the central claim: MetR enhances EE and shifts fuel preference to lipids at thermoneutrality, while CE drives robust EE increases regardless of diet and attenuates MetR-driven differences. Transcriptomic analysis reveals tissue-specific responses, with additive signatures in iWAT and CE-dominant effects in iBAT. The inclusion of explicit diet×temperature interaction modeling and GSEA provides a valuable transcriptomic resource for the field.
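      As a reading aid for the diet×temperature interaction mentioned above, the sketch below computes the standard 2×2 interaction contrast from four group means. It is only an illustration of the general idea, not the authors' pipeline: the function name `interaction_contrast` and all expression values are hypothetical, and a real analysis would fit the interaction per gene with a statistical model rather than compare raw means.

```python
def interaction_contrast(ctrl_rt, metr_rt, ctrl_ce, metr_ce):
    """Difference-of-differences for a 2x2 diet x temperature design.

    Arguments are group means (for example log2 expression of one gene).
    A value of 0 means the two treatments combine additively on this scale;
    a non-zero value means the diet effect depends on temperature.
    """
    diet_effect_rt = metr_rt - ctrl_rt  # MetR effect at room temperature
    diet_effect_ce = metr_ce - ctrl_ce  # MetR effect under cold exposure
    return diet_effect_ce - diet_effect_rt


# Hypothetical group means, chosen only to illustrate the arithmetic.
additive = interaction_contrast(ctrl_rt=5.0, metr_rt=6.0, ctrl_ce=7.0, metr_ce=8.0)
non_additive = interaction_contrast(ctrl_rt=5.0, metr_rt=6.0, ctrl_ce=7.0, metr_ce=9.5)
print(additive, non_additive)
```

      In the first call the MetR effect is the same at both temperatures, so the contrast is zero; in the second, the MetR effect is larger under cold exposure, so the contrast is positive.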

      Weaknesses:

      Limitations include the short intervention windows (7 d MetR, 24 h CE), use of male-only cohorts, and reliance on transcriptomics without complementary proteomic, metabolomic, or functional validation. Greater mechanistic depth, especially at the level of WAT thermogenic function, would strengthen the conclusions.

    1. eLife Assessment

      This interesting study adapts machine learning tools to analyze movements of a chromatin locus in living cells in response to serum starvation. The machine learning approach developed is useful, the experiments are well controlled, and the data are solid. The study would be greatly strengthened by testing key predictions made using perturbation experiments. This work will be of interest to those studying chromosome biology and gene expression patterns.

    2. Reviewer #1 (Public review):

      Summary:

      Redchuk et al. explore the dynamic properties of chromatin upon serum starvation using machine learning approaches. They use CRISPR-tagging to visualize a region on chromosome 1 in human cells and show that in their system, chromosome 1, but not the previously reported chromosomes 10, 13, and X, undergoes a change in radial position upon serum starvation. Live cell imaging showed a position change towards the periphery after serum starvation. They then apply a machine learning algorithm for the analysis of the imaging data, which reveals changes in nuclear area during serum starvation and longer displacements of the chromosome 1 locus near the nuclear periphery. Differential behavior of homologues is also reported.

      Strengths:

      (1) The study of chromatin dynamics is an interesting and important area of research.

      (2) The use of machine learning approaches to analyze live cell imaging data is timely.

      (3) With serum starvation, the authors use a simple, well-controllable model system.

      Weaknesses:

      (1) This study only provides limited new insight into chromatin dynamics.

      (2) It was not immediately evident what the use of machine learning approaches added to this study. It appears that the main conclusions could have been reached by conventional analysis.

      (3) There are several specific technical points:

      a) It was not clear what the CRISPR-Sirius probes actually labelled. The chromosome 1 sgRNA sequence is provided, but I could not find information as to which region(s) of the chromosome are actually labelled (size, location, etc.).

      b) The authors visualize a relatively small region of chromosome 1 but make conclusions regarding the entire chromosome. Additional probes on the same chromosome should be used.

      Related to this point, the discussion of why the authors are unable to reproduce the prior findings of relocation of chromosomes 10, 13, and X is not satisfying. It would be worth comparing the FISH-based painting of entire chromosomes, which generated the results suggesting relocation of these chromosomes, with the point-labelling method used here.

      c) The study lacks controls. Since in their hands chromosomes 10, 13, and X do not change position, they should be used as a negative control in all experiments demonstrating a shift in the location of chromosome 1.

      d) I did not find information about the spatial or temporal resolution of the imaging modality. This is important to assess whether the observed changes in position, relative to time, are meaningful.

      e) The authors analyze surprisingly early timepoints (up to 40 minutes) of serum starvation. Would these results look different if longer serum starvation timepoints of several hours were analyzed?

      f) The authors could better explain the biological meaning of the various parameters they measure (DistR, TDist, etc.).

      g) I did not understand the reasoning for the authors' conclusion of differential behavior of homologues. Please explain this better, or ideally use more direct labeling methods that identify the individual homologues.

      h) In many figures, statistical analysis of the data is missing, including, but not limited to, Figures 1B, C, G, Figures 4, 5, 6.

      i) No information is provided throughout the manuscript as to how many cells were analyzed in each experiment. This should be indicated in every figure legend.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that CRISPR-Sirius provides a powerful approach to investigating chromosome dynamics in living cells during environmental stress. By focusing on serum starvation, the authors show that this process induces global nuclear changes, including a reduction in nuclear area and increased morphological dynamism, while at the same time driving specific reorganization of chromosome 1. Chromosome 1 relocates toward the nuclear periphery and displays distinctive patterns of motion, maintaining overall motility but punctuated by occasional long-distance displacements, particularly near the nuclear envelope. Importantly, the analysis reveals that homologous copies of chromosome 1 do not behave uniformly: peripheral loci become more mobile and responsive to starvation, whereas central homologs remain comparatively stable, often associated with nucleolar subcompartments. By integrating live imaging with machine learning and explainable AI analysis, the study highlights the complexity of nuclear organization and provides valuable insights into how chromosome-specific and locus-specific responses to stress are orchestrated within the three-dimensional nuclear landscape.

      Strengths:

      The study uses live-cell imaging to investigate the dynamics of loci during starvation. Live-cell tracking and data interpretation are carried out using machine learning and AI models, which is a major strength.

      Weaknesses:

      The manuscript is at times difficult to follow, partly because the methodological descriptions are highly specialized, especially for non-expert biologists. In addition, the observations are not tested for a mechanistic basis. Experiments that could provide deeper insights are missing, for example, why chromosome 1 moves, why the peripheral homologue dislocates, or why a "long jump" is observed at the periphery even though the speed of the loci does not change. It is also unclear whether a displacement of 0.5 μm is functionally meaningful.

    1. eLife Assessment

      This study characterises motor and somatosensory cortex neural activity during naturalistic eating and drinking tongue movement in nonhuman primates. The data, which include electrophysiology, three-dimensional tracking of tongue movements, and nerve block manipulations, are valuable to neuroscientists and neural engineers interested in tongue use. Although the current analyses provide a solid description of single neuron activity in these areas, both the population level analyses and the characterisation of activity changes following nerve block could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar video-radiography of markers implanted in the tongue. Their findings indicate that many units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      A substantial portion of the paper is dedicated to establishing directional tuning in individual neurons, followed by an analysis of how this tuning changes when sensory feedback is blocked. While such characterizations are valuable, particularly in less-studied motor cortical areas and behaviors, the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative. At the population level, both decoding analyses and state space trajectories from factor analysis indicate that movement direction (or spout location) is robustly represented. However, as with the single-cell findings, the nuanced differences in neural trajectories across reach directions and between baseline and sensory-block conditions remain largely descriptive. To move beyond this, model-based or hypothesis-driven approaches are needed to uncover mechanistic links between neural state space dynamics and behavior.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in the tongue direction that neurons preferred, consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during feeding. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in some of the results between monkeys, including the tuning characteristics of primary somatosensory cortex neurons during drinking, and the effect of nerve block on tongue movements and the associated changes in single neuron tuning.

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first-order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75. A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.
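      For readers unfamiliar with the PD / cosine tuning model referred to here, a minimal sketch follows. It assumes the textbook linear form, in which firing rate is a baseline plus the projection of the unit movement direction onto a preferred-direction vector, and fits it by ordinary least squares. The function name `fit_cosine_tuning` and the synthetic neuron are hypothetical; this is not necessarily the fitting procedure used in the manuscript under review.

```python
import numpy as np


def fit_cosine_tuning(directions, rates):
    """Least-squares fit of rate = b0 + b . d for unit movement directions d.

    Returns the baseline rate b0, the unit preferred direction b / |b|,
    and the modulation depth |b|.
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # normalize to unit vectors
    design = np.column_stack([np.ones(len(d)), d])    # columns: [1, dx, dy, dz]
    coef, *_ = np.linalg.lstsq(design, np.asarray(rates, dtype=float), rcond=None)
    baseline, b = coef[0], coef[1:]
    depth = np.linalg.norm(b)
    preferred = b / depth if depth > 0 else np.zeros_like(b)
    return baseline, preferred, depth


# Synthetic, noise-free neuron with made-up parameters:
# baseline 10 spikes/s, depth 5 spikes/s, preferred direction along +x.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200, 3))
unit_dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
rates = 10.0 + 5.0 * unit_dirs @ np.array([1.0, 0.0, 0.0])

baseline, preferred, depth = fit_cosine_tuning(dirs, rates)
print(baseline, preferred, depth)
```

      Because the model is static and fitted to trial-level or trial-averaged rates, it summarizes each neuron with a single vector and ignores temporal structure, which is the limitation the reviewer raises.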

      The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results.

      I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw (e.g., the reference to Laurence-Chasen et al. 2023 just shows that there is tongue information independent of jaw kinematics, not that jaw movements don't affect these neurons' activities). I also find the nerve block results inconsistent (more tuning in one monkey, less in the other?) and difficult to really learn something fundamental from, besides that neural activity and behavior both change - in various ways - after nerve block (not at all surprising but still good to see measurements of).

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". An alternative explanation could be more statistical/technical in nature: that during feeding, there will be more variability in exactly which somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity). This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feeding is very counter-intuitive to this reviewer. In the revised manuscript the authors note these potential confounds and other limitations in the Discussion.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. They perform both single-unit and population level analyses during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties and population trajectories. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data. In the revision, the single-unit analyses of tuning direction are robustly characterized. The differences in neural correlations across behaviors, regions and perturbations are robust. In addition to the substantial amount of largely descriptive analyses, this paper makes two convincing arguments: (1) the single-neuron correlates for feeding and licking in OSMCx are different and can't be simply explained by different kinematics, and (2) blocking sensory input alters the neural processing during orofacial behaviors. The evidence for these claims is solid.

      Weaknesses:

      The main weakness of this paper is in providing an account for these differences to get some insight into neural mechanisms. For example, while the authors show changes in neural tuning and different 'neural trajectory' shapes during feeding and drinking, their analyses of these differences are descriptive and provide limited insight into the underlying neural computations.

    1. eLife Assessment

      The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The important findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is convincing, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.

    2. Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? What are the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. What is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

    1. eLife Assessment

      In this valuable study, through carefully executed and rigorously controlled experiments, the authors challenged a previously reported role of the Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). Using two DR6 knockout mouse lines and multiple WD assays, both in vitro and in vivo, the authors provided convincing evidence that loss of DR6 in mice does not protect peripheral axons from WD after injury. Questions remain about whether this conclusion is generalizable to CNS axonal degeneration in disease models such as ALS, AD, and prion diseases. In addition, the authors need to provide information about the sex, age, and genetic background of their animal studies to allow readers to better assess the basis for inconsistencies from previous reports on the protective effects of DR6.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that genetic deletion of the orphan tumor necrosis factor receptor DR6 in mice does not protect peripheral axons against degeneration after axotomy. Similarly, Schwann cells in DR6 mutant mice react to axotomy similarly to wild-type controls. These negative results are important because previous work has indicated that loss or inhibition of DR6 is protective in disease models and also against Wallerian degeneration of axons following injury. This carefully executed counterexample is important for the field to consider.

      Strengths:

      A strength of the paper is the use of two independent mouse strains that knock out DR6 in slightly different ways. The authors confirm that DR6 mRNA is absent in these models (western blots for DR6 protein are less convincingly null, but given the absence of mRNA, this is likely an issue of antibody specificity). One of the DR6 knockout strains used is the same strain used in a previous paper examining the effects of DR6 on Wallerian degeneration.

      The authors use a series of established assays to evaluate axon degeneration, including light and electron microscopy on nerve histological samples and cultured dorsal root ganglion neurons in which axons are mechanically severed and degeneration is scored in time-lapse microscopy. These assays consistently show a lack of effect of loss of DR6 on Wallerian degeneration in both mouse strains examined.

      Therefore, in the specific context of these experiments, the authors' data support their conclusion that loss of DR6 does not protect against Wallerian degeneration.

      Weaknesses:

      The major weaknesses of this paper include the tone of correcting previously erroneous results and the lack of reporting on important details around animal experiments that would help determine whether the results here really are discordant with previous studies, and if so, why.

      The authors do not report the genetic strain background of the mice used, the sex distributions of their experimental cohorts, or the age of the mice at the time the experiments were performed. All of these are important variables.

      The DR6 knockout strain reported in Gamage et al. (2017) was on a C57BL/6.129S segregating background. Gamage et al. reported that loss of DR6 protected axons from Wallerian degeneration for up to 4 weeks, but importantly, only in 38.5% (5 out of 13) of the mice they examined. In the present paper, the authors speculate on possible causes for differences between the lack of effect seen here and the effects reported in Gamage et al., including possible spontaneous background mutations, epigenetic changes, genetic modifiers, neuroinflammation, and environmental differences. A likely explanation of the incomplete penetrance reported by Gamage et al. is the segregating genetic background and the presence of modifier loci between C57BL/6 and 129S. The authors do not report the genetic background of the mice used in this study, other than to note that the knockout strain was provided by the group in Gamage et al. However, if, for example, that mutation has been made congenic on C57BL/6 in the intervening years, this would be important to know. One could also argue that the results presented here are consistent with 8 out of 13 mice presented in Gamage et al.
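      To give a sense of how loosely a sample of 13 mice constrains the underlying penetrance, the following minimal sketch computes the reported fraction (5 of 13) together with a 95% Wilson score interval. The choice of the Wilson interval and the function name `wilson_interval` are assumptions made for illustration; they are not taken from Gamage et al. or from the manuscript under review.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts as quoted in the review: 5 protected mice out of 13 examined.
protected, examined = 5, 13
fraction = protected / examined
low, high = wilson_interval(protected, examined)

print(f"Observed penetrance: {fraction:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

      With so few animals the interval is wide, which is one reason the strain background, sex, and age of each cohort matter when comparing the two studies.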

      Age is also an important variable. The protective effects of the spontaneous WldS mutation decrease with age, for example. It is unclear whether the possible protective effects of DR6 also change with age; perhaps this could explain the variable response seen in Gamage et al. and the lack of response seen here.

      It is unclear if sex is a factor, but this is part of why it should be reported.

      The authors also state that they do not see differences in the Schwann cell response to injury in the absence of DR6 that were reported in Gamage et al., but this is not an accurate comparison. In Gamage et al., they examined Schwann cells around axons that were protected from degeneration 2 and 4 weeks post-injury. Those axons had much thinner myelin, in contrast to axons protected by WldS or loss of Sarm1, where the myelin thickness remained relatively normal. Thus, Gamage et al. concluded that the protection of axons from degeneration and the preservation of Schwann cell myelin thickness are separate processes. Here, since no axon protection was seen, the same analysis cannot be done, and we can only say that when axons degenerate, the Schwann cells respond the same whether DR6 is expressed or not.

      The authors also take issue with Colombo et al. (2018), where it was reported that there is an increase in axon diameter and a change in the g-ratio (axon diameter to fiber diameter - the axon + myelin) in peripheral nerves in DR6 knockout mice. This change resulted in a small population of abnormally large axons that had thinner myelin than one would expect for their size. The change in g-ratio was specific to these axons and driven by the increased axon diameter, not decreased myelin thickness, although those two factors are normally loosely correlated. Here, the authors report no changes in axon size or g-ratio, but this could also be due to how the distribution of axon sizes was binned for analysis, and looking at individual data points in supplemental figure 3A, there are axons in the DR6 knockout mice that are larger than any axons in wild type. Thus, this discrepancy may be down to specifics and how statistics were performed or how histograms were binned, but it is unclear if the results presented here are dramatically at odds with the results in Colombo et al. (2018).
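      The binning concern can be illustrated with a small sketch. The g-ratio is computed as defined above (axon diameter divided by fiber diameter, where the fiber is the axon plus its myelin). All measurements below are hypothetical values chosen for illustration, not data from either study, and the helper names are assumptions.

```python
def g_ratio(axon_diameter_um, fiber_diameter_um):
    """g-ratio = axon diameter / fiber diameter (axon + myelin)."""
    if not 0 < axon_diameter_um <= fiber_diameter_um:
        raise ValueError("need 0 < axon diameter <= fiber diameter")
    return axon_diameter_um / fiber_diameter_um


def coarse_bin_counts(axon_diameters_um, threshold_um):
    """Count axons below and at/above a single size threshold."""
    small = sum(1 for d in axon_diameters_um if d < threshold_um)
    return small, len(axon_diameters_um) - small


# Hypothetical (axon, fiber) diameters in micrometres.
wild_type = [(2.0, 3.0), (3.0, 4.5), (4.0, 6.0), (5.0, 7.5)]
knockout = [(2.0, 3.0), (3.0, 4.5), (4.0, 6.0), (9.0, 11.0)]

wt_axons = [axon for axon, _ in wild_type]
ko_axons = [axon for axon, _ in knockout]

# With one coarse threshold, the two size histograms look identical...
print(coarse_bin_counts(wt_axons, 4.0), coarse_bin_counts(ko_axons, 4.0))

# ...even though the knockout set contains one unusually large axon
# whose myelin is thin for its size (higher g-ratio).
print(max(wt_axons), max(ko_axons))
print([round(g_ratio(a, f), 3) for a, f in knockout])
```

      In this toy case the binned counts match while the per-axon values differ, which is why inspecting individual data points can reveal a small outlier population that a histogram hides.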

      Finally, it is important to note that previously reported effects of DR6 inhibition, such as protection of cultured cortical neurons from beta-amyloid toxicity, are not necessarily the same as Wallerian degeneration of axons distal to an injury studied here. The negative results presented here, showing that loss of DR6 is not protective against Wallerian degeneration induced by injury, are important given the interest in DR6 as a therapeutic target, but they are specific to these mice and this mechanism of induced axon degeneration. The extent to which these findings contradict previous work is difficult to assess due to the lack of detail in describing the mouse experiments, and care should be taken in attempting to extrapolate these results to other disease contexts, such as ALS or Alzheimer's disease.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Beirowski, Huang, and Babetto revisits the proposed role of Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). A prior study (Gamage et al., 2017) suggested that DR6 deletion delays axon degeneration and alters Schwann cell responses following peripheral nerve injury. Here, the authors comprehensively test this claim using two DR6 knockout mouse models (the line used in the earlier report plus a CMV-Cre derived floxed ko line) and multiple WD assays in vivo and in vitro, aligned with three positive controls, Sarm1 WldS and Phr1/Mycbp2 mutants. Contrary to the prior findings, they find no evidence that DR6 deletion affects axon degeneration kinetics or Schwann cell dynamics (assessed by cJun expression or [intact+degenerating] myelin abundance after injury) during WD. Importantly, in DRG explant assays, neurites from DR6-deficient mice degenerated at rates indistinguishable from controls. The authors conclude that DR6 is dispensable for WD, and that previously reported protective effects may have been due to confounding factors such as genetic background or spontaneous mutations.

      Strengths:

      The authors employ two independently generated DR6 knockout models, one overlapping with the previously published study, and confirm loss of DR6 expression by qPCR and Western blotting.

      Multiple complementary readouts of WD are applied (structural, ultrastructural, molecular, and functional), providing a robust test of the hypothesis.

      Comparisons are drawn with established positive controls (WldS, SARM1, Phr1/Mycbp2 mutants), reinforcing the validity of the assays.

      By directly addressing an influential but inconsistent prior report, the manuscript clarifies the role of DR6 and prevents potential misdirection of therapeutic strategies aimed at modulating WD in the PNS. The discussion thoughtfully considers possible explanations for the earlier results, including colony-specific second-site mutations that could explain the incomplete penetrance of the earlier reported phenotype of only 36%.

      Weaknesses:

      (1) The study focuses on peripheral nerves. The manuscript frequently refers to CNS studies to argue for consistency with their findings. It would be more accurate to frame PNS/CNS similarities as reminiscences rather than as consistencies (e.g., line 205ff in the Discussion).

      (2) The DRG explant assays are convincing, though the slight acceleration of degeneration in the DR6 floxed/Cre condition is intriguing (Figure 4E). Could the authors clarify whether this is statistically robust or biologically meaningful?

      (3) In the summary (line 43), the authors refer to Hu et al. (2013) (reference 5) as the study that previously reported AxD delay and SC response alteration after injury. However, this study did not investigate the PNS, and I believe the authors intended to reference Gamage et al. (2017) (reference 10) at this point.

      (4) In line 74ff of the results section, the authors claim that developmental myelination is not altered in DR6 mutants at postnatal day 1. However, the variability in Figure S2 appears substantial, and the group size seems underpowered to support this claim. Colombo et al. (2018) (reference 11) reported accelerated myelination at P1, but this study likewise appears underpowered. Possible reasons for these discrepancies and the large variability could be that only a defined cross-sectional area was quantified, rather than the entire nerve cross-section.

      (5) The authors stress the data of Gamage et al. (2017) on altered SC responses in DR6 mutants after injury. They employed cJun quantification to show that SC reprogramming after injury is not altered in DR6 mutants. This approach is valid and the conclusion trustworthy. Here, the addition of data showing the combined abundance of intact and degenerated myelin does not add much insight. However, Gamage et al. (2017) reported altered myelin thickness in a subset of axons at 14 days after injury, which is considerably later than the time points analyzed in the present study. While, in the Reviewer's view, the thin myelin observed by Gamage et al. in fact resembles remyelination, the authors may wish to highlight the difference in the time points analyzed.

    4. Reviewer #3 (Public review):

      Summary:

      The authors revisit the role of DR6 in axon degeneration following physical injury (Wallerian degeneration), examining both its effects on axons and its role in regulating the Schwann cell response to injury. Surprisingly, and in contrast to previous studies, they find that DR6 deletion does not delay the rate of axon degeneration after injury, suggesting that DR6 is not a mediator of this process.

      Overall, this is a valuable study. As the authors note, the current literature on DR6 is inconsistent, and these results provide useful new data and clarification. This work will help other researchers interpret their own data and re-evaluate studies related to DR6 and axon degeneration.

      Strengths:

      (1) The use of two independent DR6 knockout mouse models strengthens the conclusions, particularly when reporting the absence of a phenotype.

      (2) The focus on early time points after injury addresses a key limitation of previous studies. This approach reduces the risk of missing subtle protective phenotypes and avoids confounding results with regenerating axons at later time points after axotomy.

      Weaknesses:

      (1) The study would benefit from including an additional experimental paradigm in which DR6 deficiency is expected to have a protective effect, to increase confidence in the experimental models, and to better contextualize the findings within different pathways of axon degeneration. For example, DR6 deletion has been shown in more than one study to be partially axon protective in the NGF deprivation model in DRGs in vitro. Incorporating such an experiment could be straightforward and would strengthen the paper, especially if some of the neuroprotective effects previously reported are confirmed.

      (2) The quality of some figures could be improved, particularly the EM images in Figure 2. As presented, they make it difficult to discern subtle differences.

    1. eLife Assessment

      In their study, Brown et al. provide an important advance in understanding the architecture of the mycobacterial outer membrane. Using all-atom simulations of model mycomembranes, the work reports compelling structural insights into how α-mycolic acids and outer leaflet lipids (PDIM and PAT) shape membrane organisation. The work revealed membrane heterogeneity with ordered inner leaflets and disordered outer leaflets that provide a molecular explanation for the resilience of the mycobacterial envelope.

    2. Reviewer #1 (Public review):

      Disclaimer:

      This reviewer is not an expert on MD simulations but has a basic understanding of the findings reported and is well-versed with mycobacterial lipids.

      Summary:

      In this manuscript titled "Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations", Brown et al. describe outcomes of all-atom simulation of a model outer membrane of mycobacteria. This compelling study provided three key insights:

      (1) The likely conformation of the unusually long chain alpha-branched beta-methoxy fatty acids, mycolic acids in the mycomembrane, to be the extended U or Z type rather than the compacted W-type.

      (2) Outer leaflet lipids such as PDIM and PAT provide regional vertical heterogeneity and disorder in the mycomembrane that is otherwise prevented in a mycolic acid-only bilayer.

      (3) Removal of specific lipid classes from the symmetric membrane systems leads to significant changes in membrane thickness and resilience to high temperatures.

      Strengths:

      The authors take a step-wise approach in building the complexity of the membrane and highlight the limitations of each of the approaches. A case in point is the use of supraphysiological temperature of 333 K or even higher temperatures for some of the simulations. Overall, this is a very important piece of work for the mycobacterial field, and will help in the development of membrane-disrupting small molecules and provide important insights for lipid-lipid interactions in the mycomembrane.

      Weaknesses:

      (1) The authors used alpha-mycolic acids only for their models. The ratios of alpha, keto, and methoxy-mycolic acids are known in the literature, and it may be worth including these in their model. Future studies can be aimed at addressing changes in the dynamic behavior of the MOM by altering this ratio, but the inclusion of all three forms in the current model will be important and may alter the other major findings of the current study.

      (2) The findings from the 14 different symmetric membrane systems developed with the removal of one complex lipid at a time are very interesting but have not been analysed/discussed at length in the current manuscript. I find many interesting insights from Figures S3 and S5, which I find missing in the manuscript. These are as follows:

      a) Loss of PDIM resulted in reduced membrane thickness. This is a very important finding given that loss of PDIM can be a spontaneous phenomenon in Mtb cultures in vitro and that this is driven by increased nutrient uptake by PDIM-deficient bacilli (Domenech and Reed, 2009 Microbiology). While the latter is explained by the enhanced solute uptake by several PE/PPE transporter systems in the absence of PDIM (Wang et al, Science 2020), the findings presented by Brown et al could be very important in this context. A discussion on these aspects would be beneficial for the mycobacterial community.

      b) I find it interesting that loss of PAT or DAT does not change membrane thickness (Figure S3). While both PAT and PDIM can migrate to the interleaflet space, loss of PDIM and PAT has a different impact on membrane thickness. It is worth explaining what the likely interactions are that shape membrane thickness in the case of the modelled MOM.

      c) Figure S5: Is the presence of SGL driving PDIM and PAT to migrate to the inter-leaflet space? Again, a discussion on major lipid-lipid interactions driving these lipid migrations across the membrane thickness would be useful.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.

      Strengths:

      The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound.

      The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also help other groups work on all-atom models of the mycobacterial outer membrane for drug transport, etc.

      Weaknesses:

      Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.

      Major Points:

      (1) The simulation by Basu et al (Phys Chem Chem Phys 2024) has studied drug transport through mycolic acid monolayers. Since the authors of the current study have all-atom models of the MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, it is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?

      (2) In line 277, the authors mention 6 simulations that mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.

      (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.

      (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position Figures for TMM and TDM to understand the difference. Is it really dependent on the chemical structure of the lipid moiety or the initial position of the lipid in the bilayer at the beginning of the simulation?

      Minor Point:

      In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.

    1. eLife Assessment

      This important study uses innovative microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels in aging yeast cells. The evidence for the proposed role of Ssd1 and reduced nutrients for lifespan through limiting iron uptake is convincing, even though some mechanistic details remain unclear. This work will be of interest to cell biologists working on aging and iron metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      Overexpression of the mRNA-binding protein Ssd1 was shown before to expand the replicative lifespan of yeast cells, whereas ssd1 deletion had the opposite effect. Here, the authors provide evidence that Ssd1 acts via sequestration of mRNAs of the Aft1/2-dependent iron regulon. This restricts activation of the regulon and limits accumulation of Fe2+ inside cells, thereby likely lowering oxidative damage. The effects of Ssd1 overexpression and calorie restriction on lifespan are epistatic, suggesting that they might act through the same pathway.

      Strengths:

      The study is well-designed and involves analysis of single yeast cells during replicative aging. The findings are well displayed and largely support the derived model, which also has implications for the lifespan of other organisms, including humans.

      Weaknesses:

      The model is largely supported by the findings; however, they remain largely correlative. Whether the knockout of ssd1 shortens lifespan by increased intracellular Fe2+ levels has not been tested. The finding that increased Ssd1 levels form condensates in a cell-cycle-dependent manner is interesting, yet the role of the condensates in lifespan expansion remains untested and unlinked.

    3. Reviewer #2 (Public review):

      This manuscript describes the use of a powerful technique called microfluidics to elucidate the mechanisms explaining how overexpression (OE) of Ssd1 and caloric restriction (CR) in yeast extend replicative lifespan (RLS). Microfluidics measures RLS by trapping cells in chambers mounted to a slide. The chambers hold the mother cell but allow daughters to escape. The slide, with many chambers, is recorded during the entire process, roughly 72 hours, with the video monitored afterwards to count how many daughters each of the trapped mothers produces. The power of the method is what can be done with it. For example, the entire process can be viewed by fluorescence so that GFP and mCherry-tagged proteins can be followed as cells age. The budding yeast is the only model where bona fide replicative aging can be measured, and microfluidics is the only system that allows protein localization and levels to be measured in a single cell while aging. The authors do a wonderful job of showing what this combination of tools can do.

      The authors had previously shown that Ssd1, an mRNA-binding protein, extends RLS when overexpressed. This was attributed to Ssd1 sequestering away specific mRNAs under stress, likely leading to reduced ribosomal function. It remained completely unknown how Ssd1 OE extended RLS. The authors observed that overexpressed, but not normally expressed, Ssd1 formed cytoplasmic condensates during mitosis that are resolved by cytokinesis. When the condensates fail to be resolved at the end of mitosis, this signals death.

      It has become clear in the literature that iron accumulation increases with age within the cell. The transcriptional programs that activate the iron regulon also become elevated in aging cells. This is thought to be due to impaired mitochondrial function in aging cells, with increased iron accumulation as an attempt at restoring mitochondrial activity. The authors show that Ssd1 OE and CR both reduce the expression of the iron regulon. The data presented indicate that iron accumulation shortens RLS: deletion of iron regulon components extends RLS, and adding iron to WT cells decreases RLS, but not when Ssd1 is overexpressed or when cells are calorically restricted. Interestingly, iron chelation using BPS has no impact on WT RLS, but decreases the elevated RLS in CR cells and cells overexpressing Ssd1. It was not initially clear why iron chelation would inhibit the extended lifespan seen with CR and Ssd1 OE. This was addressed by an experiment where it was shown that the iron regulon is induced (FIT2 induction) when iron is chelated. Thus, the detrimental effects of induction of the iron regulon by BPS and iron accumulation on RLS cannot be tempered by Ssd1 OE and CR once turned on.

      I did not find any weaknesses to be addressed in this paper. The draft was well-written, and the extensive experimentation was well-designed, performed, and controlled. However, I did make minor comments that I recommend the authors address:

      (1) Why would BPS not reduce RLS in WT cells? The authors could test whether OE of FIT2 reduces RLS in WT cells.

      (2) The authors should add a brief explanation for why the GDP1 promoter was chosen for Ssd1 OE.

      (3) On page 12, growth to saturation was described as glucose starvation. This is more accurately described as nutrient deprivation. Referring to it as glucose starvation is akin to CR, which growing to saturation is not. Ssd1 OE formed condensates upon saturation but not in CR. Why do the authors think Ssd1 OE did not form condensates upon CR? Too mild a stress?

      (4) The authors conclude that the main mechanism for RLS extension in CR and Ssd1 OE is the inhibition of the iron regulon in aging cells. The data certainly supports this. However, this may be an overstatement as other mutations block CR, such as mutations that impair respiration. The authors do note that induction of the iron regulon in aging cells could be a response to impaired mitochondrial function. Thus, it seems that the main goal of CR and Ssd1 OE may be to restore mitochondrial function in aging cells, one way being inactivation of the iron regulon. A discussion of how other mutations impact CR would be of benefit.

      (5) The cell cycle regulation of Ssd1 OE condensates is very interesting. There does not appear to be literature linking Ssd1 with proteasome-dependent protein turnover. Many proteins involved in cell cycle regulation and genome stability are regulated through ubiquitination. It is not necessary to do anything here about it, but it would be interesting to address how Ssd1 condensates may be regulated with such precision.

      (6) While reading the draft, I kept asking myself what the relevance to human biology was. I was very impressed with the extensive literature review at the end of the discussion, going over how well conserved this strategy is in yeast with humans. I suggest referring to this earlier, perhaps even in the abstract. This would nail down how relevant this model is for understanding human longevity regulation.

      In conclusion, I enjoyed reading this manuscript, describing how Ssd1 OE and CR lead to RLS increases, using different mechanisms. However, since the 2 strategies appear to be using redundant mechanisms, I was surprised that synergism was not observed.

    4. Reviewer #3 (Public review):

      In this paper, the authors investigate how the RNA-binding protein Ssd1 and calorie restriction (CR) influence yeast replicative lifespan, with a particular focus on age-dependent iron uptake and activation of the iron regulon. For this, they use microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels across aging cells. They show that both Ssd1 overexpression and CR act through a shared pathway to prevent the nuclear translocation of the iron-regulon regulator Aft1 and the subsequent induction of high-affinity iron transporters. As a result, these interventions block the age-related accumulation of intracellular free iron, which otherwise shortens lifespan. Genetic and chemical epistasis experiments further demonstrate that suppression of iron regulon activation is the key mechanism by which Ssd1 and CR promote replicative longevity.

      Overall, the paper is technically rigorous, and the main conclusions are supported by a substantial body of experimental data. The microfluidics-based assays in particular provide compelling single-cell evidence for the dynamics of Ssd1 condensates and iron homeostasis.

      My main concern, however, is that the central reasoning of the paper, namely that Ssd1 overexpression and CR prevent the activation of the iron regulon, appears to be contradicted by previous findings, and the authors may actually be misrepresenting these studies, unless I am mistaken. In the manuscript, the authors state on two occasions:

      "Intriguingly, transcripts that had altered abundance in CR vs control media and in SSD1 vs ssd1∆ yeast included the FIT1, FIT2, FIT3, and ARN1 genes of the iron regulon (8)"

      "Ssd1 and CR both reduce the levels of mRNAs of genes within the iron regulon: FIT1, FIT2, FIT3 and ARN1 (8)"

      However, reference (8) by Kaeberlein et al. actually says the opposite:

      "Using RNA derived from three independent experiments, a total of 97 genes were observed to undergo a change in expression >1.5-fold in SSD1-V cells relative to ssd1-d cells (supplemental Table 1 at http://www.genetics.org/supplemental/). Of these 97 genes, only 6 underwent similar transcriptional changes in calorically restricted cells (Table 2). This is only slightly greater than the number of genes expected to overlap between the SSD1-V and CR datasets by chance and is in contrast to the highly significant overlap in transcriptional changes observed between CR and HAP4 overexpression (Lin et al. 2002) or between CR and high external osmolarity (Kaeberlein et al. 2002). Intriguingly, of the 6 genes that show similar transcriptional changes in calorically restricted cells and SSD1-V cells, 4 are involved in iron-siderochrome transport: FIT1, FIT2, FIT3, and ARN1 (supplemental Table 1 at http://www.genetics.org/supplemental/)."

      Although the phrasing might be ambiguous at first reading, this interpretation is confirmed upon reviewing Matt Kaeberlein's PhD thesis: https://dspace.mit.edu/handle/1721.1/8318 (page 264 and so on).

      Moreover, consistent with this, activation of the iron regulon during calorie restriction (or the diauxic shift) has also been observed in two other articles:

      https://doi.org/10.1016/S1016-8478(23)13999-9

      https://doi.org/10.1074/jbc.M307447200

      Taken together, these contradictory data might blur the proposed model and make it unclear how to reconcile the results.

    1. eLife Assessment

      With the goal of investigating the assembly and fragmentation of cellular aggregates, this manuscript investigates cyanobacterial aggregates in a laboratory setting. This investigation of the conditions and mechanisms behind aggregation is an important contribution as it yields basic understanding of natural processes and offers potential strategies for control. The combination of computational and experimental investigations in this manuscript provides solid support for the role of shear on aggregation and fragmentation. However, the role of extracellular matrix, with possibly a strong effect on aggregation, is not adequately studied.

    2. Reviewer #1 (Public review):

      Sinzato et al. investigated how shear flow in a rheological chamber affects the assembly and fragmentation of cyanobacterial aggregates, with the goal of understanding how such aggregates might form naturally, and/or be destroyed industrially. The authors used a combination of experiments and models to show that cyanobacterial colonies can be difficult to fragment with fluid flows. Additionally, they provide biophysical support for the idea that such aggregates likely form primarily when cells stay together after cell division, rather than coming together from disparate paths.

      This work has significant relevance to the field, both practically and naturally. Combatting or preventing toxic cyanobacterial blooms is an active area of environmental research that offers a practical backbone for this manuscript's ideas. Additionally, the formation and behavior of cellular aggregates in general is of widespread interest in many fields, including marine and freshwater ecology, healthcare and antibiotic resistance research, biophysics, and microbial evolution. In this field, there are still outstanding questions regarding how microbial aggregates form into communities, including if and how they come together from separate places. Therefore, I believe that researchers from many distinct fields would find interest in the topic of this paper, and particularly Figure 5, in which a phase space that is meant to represent the different modes of aggregate formation and destruction is suggested, dependent on properties of the fluid flow and particle concentration.

      Altogether, the authors were successful in their investigation, and I find their claims to be justified. In particular, the authors achieve strong results from their experiments. Below, I outline key claims of the paper and indicate the level to which they were supported by their data.

      • Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.

      • The authors then claim that the fragmentation of aggregates due to fluid flows occurs primarily through erosion of small pieces from larger aggregates. Because their experimental setup does not allow them to directly observe this process (for example, by watching one aggregate break into pieces), they rely on indirect methods to support the claim. Overall, the experimental evidence is generally supportive, but the models leave some gaps. I describe this conclusion in more detail below.

      • The strongest evidence for the erosion-dominated process comes from the authors' measurements of transfer of biomass between large and small size classes, as in Figure 2E and Figure 2D. The authors claim that only the erosion model can reproduce this kind of biomass transfer. However, it also seems that the idealized erosion model alone is not fully sufficient to capture the observed behavior. In Figure 2D, there remains a gap between their experiment and the prediction of the erosion model, which grows larger over time (Supplemental Figure S9). While the authors suggest that the erosion model is better than the equal-fragmentation model, it is also true that tracking the mean size (Figure 2B) or small size distribution (Figure S6) cannot distinguish between these models.

      • Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal fragments model at capturing the biomass size distributions? Finally, why does the idealized erosion fail to capture the size distribution at late stages in Supplemental Figure S9 - would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.

      • Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by-and-large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result, and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested. However, they refer to other literature that does test these regions of the phase map.

      • The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful, and represent a significant conceptual advance. By combining their experiments with discussions of other experimental investigations of scum formation in cyanobacterial blooms, the authors have investigated the two most relevant zones of this map for the present study (Zones II and III), and have made a strong contribution to the literature in regards to artificial mixing to disrupt cyanobacterial blooms.

      Other notes:

      The authors rely heavily on size distributions to make the claims of their paper. I was pleased to find the calibration histograms in Supplemental Figure S8, which provide information as to how and why they made corrections to the histograms they observed. From these calibration histograms, it seems that larger colonies are more accurately measured in the cone-and-plate shear setup, while smaller colonies can be missed, presumably due to resolution issues.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigate the role of fluid flow in shaping the colony size of a freshwater cyanobacterium Microcystis. To do so, they have created a novel assay by combining a rheometer with a bright field microscope. This allows them to exert precise shear forces on cyanobacterial cultures and field samples, and then quantify the effect of these shear forces on the colony size distribution. Shear force can affect the colony size in two ways: reducing size by fragmentation and increasing size by aggregation. They find limited aggregation at low shear rates, but high shear forces can create erosion-type fragmentation: colonies do not break in large pieces, but many small colonies are sheared off the large colonies. Overall, bacterial colonies from field samples seem to be more inert to shear than laboratory cultures, which the authors explain in terms of enhanced intercellular adhesion mediated by secreted polysaccharides.

      Strengths:

      • This study is timely, as cyanobacterial blooms are an increasing problem in freshwater lakes. They are expected to increase in frequency and severity because of rising temperatures, and it is worthwhile learning how these blooms are formed. More generally, how physical aspects such as flow and shear influence colony formation is often overlooked, at least in part because of experimental challenges. Therefore, the method developed by the authors is useful and innovative, and I expect applications beyond the system presented here.

      • A strong feature of this paper is the highly quantitative approach, combining theory with experiments, and the combination of laboratory experiments and field samples.

      Weaknesses:

      • The introduction in particular seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results, this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems to be that bacterial adhesion, not shear, is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. That said, the same method can be used to investigate systems where shear forces are biologically more relevant.
    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.

      We thank the reviewer for this positive comment. We fully agree, as our fragmentation experiments on division-formed colonies clearly demonstrate their strong mechanical resistance in naturally occurring flows.

      (2) The authors then claim that the fragmentation of aggregates due to fluid flows occurs through erosion of small pieces. Because their experimental setup does not allow them to explicitly observe this process (for example, by watching one aggregate break into pieces), they implement an idealized model to show that the nature of the changes to the size histogram agrees with an erosion process. However, in Figure 2C there is a noticeable gap between their experiment and the prediction of their model. Additionally, in a similar experiment shown in Figure S6, the experiment cannot distinguish between an idealized erosion model and an alternative, an idealized binary fission model where aggregates split into equal halves. For these reasons, this claim is weakened.

      The two idealized models of colony fragmentation, namely erosion of single cells and fragmentation into equal sizes (or binary fission), lead to distinguishable final size distributions. We believe that our experiments for division-formed colonies support the hypothesis of the erosion mechanism. Specifically, Figure 2E shows that colony fragmentation resulted in a decrease of large colonies and a strong increase of single cells and dimers (two cells). In our view, the strong increase of single cells and dimers provides quite convincing (but indirect) evidence supporting the erosion mechanism. This is described on lines 112-121. To further address the reviewer’s concern, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between these two fragmentation models for large division-formed colonies fragmented at a high dissipation rate of ε = 5.8 m^2/s^3. Furthermore, we have included the new Supplementary Figure S9, which details the model predictions for the colony size distribution at various time points.

      The ideal equal fragments model (i.e., where every fracture event produces two identical fragments with half the original biovolume) does not capture the biovolume transfer from large colonies to single cells, as observed for the experimental results in panel D of Figure 2 and panel E of Figure S9. In contrast, the erosion model, in panel D of Figure 2 and panel D of Figure S9, provides a good prediction of the experimental results within the experimental uncertainty. The different fragmentation models are discussed in lines 226-228 of the revised manuscript and lines 865-873 of the SI.
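
      The qualitative difference between the two idealized models can be illustrated with a toy sketch. This is not the authors' population model: the update rules, the starting colony of 64 cells, and the two-cell cutoff for "small colonies" are all assumptions chosen for illustration. In this sketch, erosion moves biovolume directly into single cells from the first step, whereas equal fragmentation first passes biovolume through intermediate size classes.

```python
# Toy comparison of two idealized fragmentation rules (illustrative only;
# not the model used in the manuscript). Colonies are lists of cell counts.

def erosion_step(colonies):
    """Assumed rule: every colony larger than one cell sheds one single cell."""
    out = []
    for n in colonies:
        if n > 1:
            out.extend([n - 1, 1])  # parent shrinks by one, one free cell appears
        else:
            out.append(n)
    return out

def equal_step(colonies):
    """Assumed rule: every colony larger than one cell splits into two halves."""
    out = []
    for n in colonies:
        if n > 1:
            out.extend([n // 2, n - n // 2])
        else:
            out.append(n)
    return out

def small_fraction(colonies, max_small=2):
    """Fraction of total cells held in colonies of at most `max_small` cells."""
    return sum(n for n in colonies if n <= max_small) / sum(colonies)

def run(step, colonies, n_steps):
    """Apply one fragmentation rule repeatedly."""
    for _ in range(n_steps):
        colonies = step(colonies)
    return colonies

if __name__ == "__main__":
    start = [64]  # hypothetical single large colony
    for name, step in [("erosion", erosion_step), ("equal", equal_step)]:
        final = run(step, start, 3)
        print(name, sorted(final, reverse=True), small_fraction(final))
```

      Under these assumed rules, the equal-fragments rule leaves no biovolume in the smallest class until the colonies have been halved several times, which is the signature the comparison in Figure 2D relies on.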

      (3) Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by and large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested.

      We thank the reviewer for this positive comment. We agree that our experimental results show clear evidence that aggregated colonies have a weaker structure in comparison to division-formed colonies, thus supporting the hypothesis that clonal expansion is the main mechanism for colony formation under most natural settings. The range of energy dissipation rates of our experimental setup covers almost entirely the region for which aggregated and division-formed colonies differ in their fragmentation behavior (Zone III of Figure 5). Within this zone, aggregated colonies are fragmented and only the division-formed colonies are able to withstand the hydrodynamic stresses. Furthermore, we show that this fragmentation behavior has a low sensitivity to the total biovolume fraction, as displayed in the Supplementary Figures S2 and S4 and discussed in lines 151-154 and 160-163. We agree that our cone-and-plate setup covers a limited parameter range, and we have added a detailed discussion of these limitations in the revised manuscript, under section Materials and Methods in lines 462-473.

      (4) The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful and represent a significant conceptual advance. However, there is a region of this phase map that the authors have left untested experimentally. The lowest energy dissipation rate that the authors tested in their experiment seemed to be ε ~ 1e-2 m^2/s^3, and the highest particle concentration they tested was 5e-4, which means that the authors never tested Zone II of their phase map. Since this seems to be an important zone for toxic blooms (i.e. the "scum formation" zone), it seems the authors have missed an important opportunity to investigate this regime of high particle concentrations and relatively weak turbulent mixing.

      We agree with the reviewer that Zone (II) of Figure 5 is of great importance to dense bloom formation under wind mixing and that this parameter range was not covered by our experiments using a cone-and-plate shear flow. The measuring range of our device was motivated by engineering applications such as artificial mixing of eutrophic lakes using bubble plumes, as well as preliminary experiments which demonstrated that high levels of dissipation rate were required to achieve fragmentation. The range of dissipation rates that can be achieved by the cone-and-plate setup is limited at the lower end by the accumulation of colonies near the stagnation point at the conical tip and at the upper end by the spillage of fluid out of the chamber. We now discuss this measuring range in lines 462-473 of the revised manuscript.

      Although our setup does not cover Zone (II), we now refer to recent results in the literature for evidence of aggregation dominance in Zone (II). The experimental study of Wu et al. (2024) (reference number 64 of the revised manuscript) investigated the formation of Microcystis surface scum layers in wind-mixed mesocosms. Their study identified aggregation of colonies in the scum layer, resulting in increases of colony size at rates faster than cell division. These results agree with our model, and the parameter range investigated falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion elucidating the parameter range covered in our experiments and the findings of Wu et al. (2024).

      Other items that could use more clarity:

      (5) The authors rely heavily on size distributions to make the claims of their paper. Yet, how they generated those size distributions is not clearly shown in the text. Of primary concern, the authors used a correction function (Equation S1) to estimate the counts of different size classes in their image analysis pipeline. Yet, it is unclear how well this correction function actually performs, what kinds of errors it might produce, and how well it mapped to the calibration dataset the authors used to find the fit parameters.

      We agree with the reviewer that more details of the correction function should be included. We have included in the revised version of the Supporting Information, in lines 785-796, a more detailed explanation of the correction function. Furthermore, a direct comparison of raw and corrected histograms of the size distribution and its associated uncertainty is presented in the new Supplementary Figure S8.

      (6) Second, in their models they use a fractal dimension to estimate the number of cells in the group from the group radius, but the agreement between this fractal dimension fit and the data is not shown, so it is not clear how good an approximation this fractal dimension provides. This is especially important for their later derivation of the "aggregation-to-cell division" ratio (Equation 8)

      We agree with the reviewer that more details on the estimation of fractal dimension are needed. The revised version, under Materials and Methods in lines 508-515, now includes the detailed estimation procedure, the number of colonies analysed, and the associated uncertainty.
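
      For readers unfamiliar with this kind of estimate, a common way to check a fractal-dimension fit is a log-log regression of cell number against colony radius, assuming a scaling of the form N = k (R/a)^D_f, with a the single-cell radius. The sketch below is only an illustration under that assumption; the function name, the prefactor k, and the synthetic data are hypothetical and are not taken from the manuscript. It returns the fitted exponent together with R^2, which is the kind of agreement measure the reviewer asks for.

```python
import math

def fit_fractal_dimension(radii, cell_counts, cell_radius):
    """Least-squares fit of log N = log k + D_f * log(R / a).

    Returns (D_f, k, r_squared). Assumes the power-law form above holds.
    """
    xs = [math.log(r / cell_radius) for r in radii]
    ys = [math.log(n) for n in cell_counts]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # estimated fractal dimension D_f
    intercept = mean_y - slope * mean_x     # log of the prefactor k
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot       # goodness of fit in log space
    return slope, math.exp(intercept), r_squared

if __name__ == "__main__":
    # Synthetic, noise-free data generated with D_f = 2 and k = 1 (hypothetical).
    radii = [2.0, 4.0, 8.0, 16.0]
    counts = [4.0, 16.0, 64.0, 256.0]
    print(fit_fractal_dimension(radii, counts, cell_radius=1.0))
```

      With real measurements, the scatter of points around the fitted line and the uncertainty of the slope would indicate how well a single fractal dimension describes the colonies.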

      Reviewer #1 (Recommendations For The Authors):

      In light of the weak evidence for claim #2 outlined above, I believe the paper would benefit from a more explicit comparison in Figure 2C of the two models - idealized erosion, and idealized binary fission. With such a comparison, the authors would have stronger footing to claim that one process is more important than the other.

      As mentioned in our answer above to comment #2 of the public review, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between the erosion and equal fragments (binary fission) models for large division-formed colonies fragmented under ε = 5.8 m^2/s^3. The comparison is further detailed in the new Supplementary Figure S9 for representative time points. Only the erosion model can recover the biovolume transfer from large colonies to single cells, as observed for the experimental results in Figure 2D and further detailed in Figure S9D. We believe that the revised version of Figure 2 and the new Supplementary Figure S9 provide strong evidence in support of the erosion fragmentation model.

      Would the authors comment on their chosen range of experimental dissipation rates? For instance, was their goal more to investigate industrial/engineering applications where the goal is to disrupt the cyanobacteria, but not really typical natural conditions under which the groups might form?

      The choice of experimental dissipation rates in our experiment was such that it covers engineering applications such as artificial mixing of eutrophic lakes using bubble plumes. We have now clarified in the Introduction, on lines 37-39, that artificial mixing has been successfully applied in several lakes to suppress cyanobacterial blooms. Furthermore, we have now clarified in the caption of Figure 5 that the bars on the right side indicate typical values of dissipation rates induced by natural wind-mixing, bubble plumes in artificially mixed lakes, and laboratory-scale experiments such as cone-and-plate systems and stirred tanks. The dissipation rates induced by the bubble plumes in artificially mixed lakes could potentially fragment aggregated cyanobacterial colonies and thus disrupt bloom formation. However, our preliminary experiments demonstrated that high levels of dissipation rate were required to achieve fragmentation, therefore we’ve focused on the upper range of values (0.01 to 10 m^2/s^3).

      The dissipation rates generated by the cone-and-plate approach are indeed higher than the dissipation rates under typical natural conditions in lakes. We have now added a detailed discussion of the range of dissipation rates generated by the cone-and-plate approach in the revised manuscript, under section Materials and Methods in lines 462-473, where we also explain that these values are higher than the natural dissipation rates generated by wind action in lakes. However, the more generic insights obtained by our study, shown in Figure 5, are relevant for dissipation rates of natural lakes (e.g., Zone II). Therefore, in our discussion of Figure 5 we have now included the recent findings of Wu et al. (2024) (reference number [64] of the revised manuscript), who studied bloom formation of Microcystis in mesocosm experiments at dissipation rates representative of natural conditions; see also our reply to the next comment.

      The authors should consider testing the space of Zone II on their phase map, for instance at very high particle concentrations and even lower rotational speeds, in order to show that their derivations match experiments.

      Good point. As mentioned in our answer above to comment #4 of the public review, Zone II lies beyond the measuring range of our experimental setup. Instead, we refer to the recent study of Wu et al. (2024) (reference number [64] of the revised manuscript) which demonstrated that dense scum layers of Microcystis colonies are aggregation-dominated. These mesocosm experiments agree with our model predictions and their parameter range falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion where we elucidate the parameter range covered in our experiments and compare our predictions for Zone II with the recent findings of Wu et al. (2024).

      The authors should show their calibration data and fit for the correction function of equation S1. Additionally, you may consider showing "raw" and "corrected" histograms of the size distribution, to demonstrate exactly what corrections are made.

      As mentioned in our answer above to comment #5 of the public review, we have included in the revised version of the Supporting Information the new Supplementary Figure S8, which shows the raw and adjusted histograms of the size distribution, including the associated uncertainties. Furthermore, the correction function is now explained in detail in the new Supporting Information Text in lines 785-796.

      The authors might consider commenting on Figure S3 a bit more in the main text. Even at very high dissipation rates, the cyanobacterial groups don't plummet to size 1, but stay in an equilibrium around 10-20x the diameter of a single cell. What might this mean for industrial applications trying to break up the groups?

      We agree with the reviewer that further discussion of Figure S3, panels E and F, is warranted. In the revised version of the manuscript, under section Fragmentation of Microcystis colonies occurs through erosion in lines 133-137, we have now included a discussion of this figure. Figure S3F shows that more than 90% of the total biovolume ends up in the category “small colonies” (mostly single cells and dimers); hence, most of the initially large colonies do fragment to single cells or dimers. Only about 5-10% of the biovolume remains as “large colonies” of 10-20 cells. Although it is challenging to draw definitive conclusions about the behavior of these remaining large colonies, as they account for only a minor fraction of the suspension, one hypothesis is that variability in mechanical properties between colonies results in a subset of colonies exhibiting exceptional resistance even to very high dissipation rates (see lines 133-137).

      Minor comments:

      Typo Caption of Figure 2: Should read [m^2/s^3] for units

      Thanks for catching this typo. The units in the caption of Figure 2 have been corrected to [m^2/s^3].

      There is no Equation 10 in Materials and Methods as indicated in the rheology section.

      We thank the reviewer for pointing out the lack of clarity in this algebraic manipulation. In fact, the yield stress has to be substituted in the current Equation 11 (previously Eq.10), from which the critical dissipation rate must be substituted in Equation 3. The result is the critical colony size (l* = 2.8) mentioned in line 243 of the revised manuscript. The correct equation numbers and algebraic substitutions are now indicated in lines 241-243 of the revised version of the manuscript.

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. As the importance of adhesion had been described elsewhere, it is not clear what this study reveals about cyanobacterial colonies that was not known before.

      We would like to emphasize several key findings that our study reveals about the impacts of fluid flow on cyanobacterial colonies:

      (I) Quantification of mechanical strength in cyanobacterial colonies: Our results demonstrate the high mechanical strength of cyanobacterial colonies, as evidenced by the high shear rates required to achieve fragmentation. This is new knowledge for cyanobacterial colonies. Our study thereby highlights the resilience of these colonies against naturally occurring flows and bridges the gap between theoretical assumptions about colony strength and experimentally measured mechanical properties.

      (II) The discovery that the mechanical strength of colonies differs between colonies formed by cell division and colonies formed by aggregation. This, too, is new knowledge for cyanobacterial colonies.

      (III) Validation of a hypothesis regarding colony formation: Using a fluid-mechanical approach, we confirm the findings of recent genetic studies (references 25 and 67 of the revised version of the manuscript) which indicated that colony formation occurs predominantly via cell division rather than cell aggregation under natural conditions (except in very dense blooms).

      (IV) Practical guidelines for cyanobacterial bloom control: Our findings provide valuable insights into the design of artificial mixing systems applied in several lakes. Artificial mixing of lakes is based on fundamentals of fluid flow, aiming at preventing aggregation of buoyant cyanobacteria in scum layers at the water surface. Our results show that the dissipation rates generated by bubble plumes in artificially mixed lakes can fragment cyanobacterial colonies formed by aggregation, but are not intense enough to cause fragmentation of division-formed colonies (see Figure 5 and lines 348-360).

      The agreement between model and experiments is impressive, but the role of the fit parameters in achieving this agreement needs to be further clarified.

      The influence of the fit parameters (namely the stickiness α1 and the pairs of colony strength parameters S1,q1,S2,q2) is discussed in the sections Dynamical changes in colony size modelled by a two-category distribution in lines 247-253 and Materials and Methods in lines 559-565. We kept the discussion concise to maintain readability. However, we agree with the reviewer that additional details about the importance of the fit parameters and the sensitivity of the results to these parameters could be beneficial. In the revised version of the section Materials and Methods in lines 560-563, we have included a detailed discussion of the fit parameters.

      The article may not be very accessible for readers with a biology background. Overall, the presentation of the material can be improved by better describing their new method.

      We apologize for the limited readability of the description of the experimental setup and model used. In the revised version of the manuscript and the SI, we have detailed further the new methods presented here. The modifications include a detailed description of the operating range of the cone-and-plate shear setup (subsection Cone-and-plate shear of the section Materials and Methods, in lines 462-473). Furthermore, we think that incorporation of the recent experimental results of Wu et al. (2024), on lines 331-337 of the manuscript, will appeal to readers with a biology background. Their mesocosm experiments support our model prediction that aggregation is the dominant mechanism for colony formation in region (II) of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors seem too modest in claiming technological advance. They should describe the technological advance of combining microscopy with rheometry, in such a way that this invites others to apply this or similar approaches on biological samples. Even though I feel that the advancement of knowledge of this system by their method is relatively modest, there may be more advances in other systems.

      We appreciate the positive view of the reviewer towards the importance of this technology and we agree that its advantages should be advertised to researchers investigating similar systems. We have now given more attention to the technological advance of combining microscopic imaging with rheometry in the final paragraph of the Conclusions (lines 386-400), where we now also briefly discuss an interesting recent study of marine snow (Song et al. 2023, Song and Rau 2022, reference numbers 70 and 71 of the revised manuscript), which used a similar combination of microscopy and rheometry as in our study. Furthermore, in the Methods section, we now briefly explain how the rheometry can be adjusted to investigate other systems (lines 474-480).

      (2) It seems reasonable -also based on what we already know about these aggregates - to assume that the main difference in shear sensitivity between field samples and cultures lies in the production of extracellular polysaccharide substance (EPS). To go beyond what is already known, the study could try to provide more direct and quantitative evidence for EPS involvement. For example, using a chemical quantification of EPS levels, or perturbing EPS levels using digestive enzymes.

      We agree with the reviewer that further characterization of the EPS is highly relevant to understand the mechanical strength of colonies. However, we believe that chemical quantification and/or degradation of EPS lies beyond the scope of our article and should be addressed by future studies.

      (3) Assuming EPS is indeed the reason for the differences in shear resistance: the authors speculate the reason why the field samples have more EPS lies in chemical composition (Calcium/nitrogen levels). In addition, there could be grazing that is known to promote aggregation (possibly increasing EPS), or just inherent genetic differences between strains. I am not necessarily expecting the authors to explore this direction experimentally, but it seems certainly feasible and would make the final result less speculative.

      We agree with the reviewer that there are more biotic and abiotic factors that can influence EPS amount and composition. The influence of grazing and other relevant factors on cell adhesion is discussed in references [26-29], cited in our introduction in lines 50-53. As discussed in our answer to recommendation #2, we believe that a quantitative investigation of these various factors is beyond the scope of this work and should be addressed in future studies.

      (4) A cool finding seems to be the critical relative diameter (Fig 2E), a colony size that seems invariant under shear. I was slightly surprised that the authors seem to take little effort to understand this critical diameter mechanistically (for example by predicting it, or experimentally perturbing it). Again, not a necessary requirement, but this is where the study could harness its technological advantage to provide a more quantitative understanding of something that goes beyond the existing knowledge of the system.

      We apologize to the reviewer if our descriptions and discussions of Figure 2 were unclear. One of the key conclusions from our experiments is that the critical relative diameter depends on the dissipation rate, as shown in Figure 2F. This dependence is also incorporated into the model through the constitutive equation (2). Furthermore, we expect the mechanical resistance of colonies, quantified by the critical relative diameter, to be affected by other biotic and abiotic factors that influence EPS amount and composition.

      (5) The jump from 0.019 to 1.1 m²/s³ seems large. What was the reason for not exploring intermediate values? The authors should also define low, modest and intense dissipation rates more clearly. Currently, they seem somewhat arbitrarily defined, i.e. 0.019 m²/s³ is described as low (methods) and moderate (results). In Fig 2, the authors further talk about low dissipation rates without a quantitative description.

      We thank the reviewer for pointing out the lack of clarity in the choice of parameter range and the nomenclature. Regarding the former, the suspension of division-formed colonies of Microcystis strain V163 displayed negligible fragmentation for dissipation rates between 0.019 and 1.1 m²/s³, as seen in Figures S2A and S3A. Because the fragmentation results are so insensitive to the dissipation rate in this range, we do not expect a change in behavior at intermediate values. Regarding the nomenclature, we have corrected the inconsistencies throughout the text. We now name the dissipation rate values as follows: low for values typical of wind mixing, moderate for values typical of the core of bubble plumes, and intense for values typical of propellers. Whenever a dissipation rate is mentioned in the text, its numerical value is also included to avoid doubt.
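
For readers who think in shear rates rather than dissipation rates, the two quantities can be related with a short sketch. It assumes an idealized laminar simple shear flow, for which the dissipation rate per unit mass equals the kinematic viscosity times the shear rate squared. The viscosity value and the function names are illustrative assumptions, not taken from the manuscript, and the flow in the actual cone-and-plate setup may deviate from this idealization.

```python
import math

# Assumed kinematic viscosity of water near 20 °C, in m^2/s (illustrative value,
# not taken from the manuscript).
NU_WATER = 1.0e-6


def shear_rate_from_dissipation(epsilon, nu=NU_WATER):
    """Shear rate (1/s) of an idealized simple shear flow with dissipation rate epsilon (m^2/s^3)."""
    if epsilon < 0 or nu <= 0:
        raise ValueError("epsilon must be non-negative and nu must be positive")
    return math.sqrt(epsilon / nu)


def dissipation_from_shear_rate(shear_rate, nu=NU_WATER):
    """Dissipation rate (m^2/s^3) of an idealized simple shear flow with the given shear rate (1/s)."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    return nu * shear_rate ** 2


if __name__ == "__main__":
    # The two dissipation rates discussed in the reviewer's comment.
    for eps in (0.019, 1.1):
        print(f"epsilon = {eps} m^2/s^3 -> shear rate ~ {shear_rate_from_dissipation(eps):.0f} 1/s")
```

Under these assumptions, the two dissipation rates named in the comment correspond to shear rates of order 10² and 10³ per second, respectively.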

      (6.) The structure and narrative of the paper can be improved. The article first describes all lab culture experiments and then the model, while the first figure already shows model fits. Perhaps it would be better to first describe the aggregation experiments, to constrain the appropriate terms of the model, and then move to fragmentation.

      We appreciate the recommendation of the reviewer regarding the structure. We have chosen to describe first the fragmentation experiments (Fig. 2), as these can be understood without introducing the aggregation effects. In contrast, the steady state results in the aggregation experiments (Fig. 3) come from the balance between aggregation and fragmentation. Therefore, we judged the current order to be more appropriate. The model fits are combined with the experimental results in Figures 2 and 3 to have a concise display. We have ensured that all the concepts required to understand each figure panel are explained prior to their discussion.

      (7) The number of data points that go into the histogram needs to be indicated. The main reason is that the authors report the distribution in terms of the biovolume fraction, suggesting the numerical counts are converted into volume. This to me seems like the most sensible parameter, but I could not find how this conversion is calculated (my apologies if I missed it). This seems especially relevant because a single large colony can impact this histogram quite considerably.

      We apologize for the lack of clarity in the calibration and conversion steps of the size distribution. As discussed above in the answer to comment #5 of reviewer #1, more details of the calibration process have been added to the revised version of the Supporting Information Text in lines 785-796. Furthermore, the new Supplementary Figure S8 presents examples of the raw and adjusted size distribution, including the total number of counted colonies per histogram and the associated uncertainties in the concentration and biovolume distributions.
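
The reviewer's point that a single large colony can dominate a biovolume-weighted histogram can be illustrated with a minimal sketch. It assumes spherical colonies, so that each colony's volume is π d³/6, and uses made-up bin diameters and counts. It is not the calibration procedure described in the Supporting Information, and the function name is hypothetical.

```python
import math


def biovolume_fractions(diameters, counts):
    """Convert colony counts per size bin into biovolume fractions.

    Assumes spherical colonies (volume = pi * d^3 / 6); diameters can be in any
    consistent length unit because only the ratios matter.
    """
    if len(diameters) != len(counts):
        raise ValueError("diameters and counts must have the same length")
    volumes = [n * math.pi * d ** 3 / 6.0 for d, n in zip(diameters, counts)]
    total = sum(volumes)
    if total == 0:
        raise ValueError("total biovolume is zero")
    return [v / total for v in volumes]


# Hypothetical histogram: many small colonies and one large colony.
diameters_um = [5.0, 10.0, 100.0]
counts = [1000, 200, 1]
fractions = biovolume_fractions(diameters_um, counts)

if __name__ == "__main__":
    for d, n, f in zip(diameters_um, counts, fractions):
        print(f"d = {d:6.1f} um, count = {n:5d}, biovolume fraction = {f:.3f}")
```

In this invented example the single 100 µm colony accounts for roughly three quarters of the total biovolume, which is why reporting the number of counted colonies per histogram matters.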

      (8) Over the timescales measured here, colonies could start sinking (or floating), possibly in a size-dependent manner, that could lead to a bias due to boundary effects. Did the authors consider this potential artifact?

      The sinking or floating of colonies is a relevant process which was taken into account in the choice of our parameter range for the dissipation rate. The minimum dissipation rate used in our experiments ensures that the upward inertial velocity near stagnation is sufficient to counteract the sedimentation of colonies. A detailed discussion of the choice of the parameter range is now included in the revised version of the Materials and Methods in lines 462-473.

      (9) "On the one hand, sequencing of the genetic diversity within Microcystis colonies supports the hypothesis that colony formation under natural conditions is primarily driven by cell division [25]. On the other hand, cell aggregation can occur on a shorter time scale and may offer improved protection against high grazing pressure [26]." This appears somewhat constructed, as what is described as "on the other hand" is not evidence against the genetic diversity.

      We agree that the suggested dichotomy in this text appeared somewhat constructed, and we have now removed the wording “on the one hand” and “on the other hand”. The studies from reference [25] demonstrated that the genetic diversity between independent Microcystis colonies is much greater than the diversity within colonies. If cell aggregation were the dominant mechanism, a similar genetic diversity would be observed between and within colonies, which contrasts with the findings from reference [25]. We have adjusted the text in the revised manuscript, in lines 46-54, to clarify this point.

      (10) The phase diagram seems largely based on extrapolations that are made outside of the measurement regime (e.g. dark red bars indicating the dissipation rate, Fig 5 - by the way 1 this color scheme could use some better contrast, by the way 2 Fig S7 suggests a wider dissipation rate range as indicated in Fig 5, why?). Hence there seems to be the need to more clearly lineate experimental results, simulations, and extrapolations in the phase diagram.

      We agree with the reviewer that further clarifications should be given about the parameter range covered in our experiments and apologize for the lack of readability in the color scheme of Fig 5. In lines 329-337, 346-347, 353-355, we have highlighted the parameter range covered by our experiments as well as the range covered by previous studies of wind-mixed mesocosms (namely reference [64] of the revised manuscript). Regarding the color scheme of Figure 5, we have modified the legend of the figure to improve readability. The color contrast was increased and leader lines were added to connect the colored bars with the respective label.

      (11) Unfortunately, the manuscript did not contain line numbers.

      We apologize to the reviewer for the lack of line numbers in our initial version. The revised version of the manuscript now contains line numbers, both in the main text and the supporting information.

      (12) Fig 2D. Caption is too minimal. Y-axis could better be named "Fraction of colonies" as both small and large colonies are plotted.

      The caption for Figure 2D was extended to better describe the plot. We have kept the y-axis label as “Fraction of small colonies”, since this is the quantity displayed by the three curves in the plot.

      (13) An inset should have axis labels.

      All the insets in our plots display the same variables as their respective plots. In order to keep the plots light and preserve readability, we therefore prefer to present the axis labels only along the x-axis and y-axis of the main plots, which implies by convention that the same axis labels also apply to the insets. To the best of our knowledge, this is a common approach.

      (14) Page 5, first words. Likely Fig 3A, not 2A was meant.

      We thank the reviewer for pointing out this readability issue. We intend to compare both Figures 2A and 3A. The text of the revised manuscript, in lines 146-148, has been adjusted with the correct figure numbers.

      (15) Introduction, second last paragraph, third last line. "suspension leaded to a broad distribution" I assume you meant "... led to a ..."

      We thank the reviewer for pointing out this typo. It has been corrected (line 122).

    1. eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau - via its resultant monsoon system rather than solely its high elevation - has shifted avian migratory directions from a latitudinal to a longitudinal orientation. The authors have expanded and clarified their lines of evidence (including an enlarged tracking set and explicit caveats on species-level eBird inference), such that the central claims are now solid. The conclusions - that monsoon dynamics, rather than elevation per se, are most consistent with observed longitudinal reorientation - illustrate how large, community-sourced and climate-model datasets can inform continent-scale shifts in migratory behavior over time that complement traditional approaches.

    2. Joint Public Review:

      The study assesses how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites.

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well-described, and written in such a fashion that they are transparent and repeatable.

      Editorial note: These latest revisions are minor in the sense that they expand on the dataset but do not change the primary results.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for positive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable. 

      We understand your concern about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. A similar approach has been widely used to estimate how biogeographic barriers facilitated the divergent vertebrate communities across the world (e.g., Williams et al. 2024). We agree that such an approach must be used carefully. In the revision, we have explicitly clarified why this counterfactual comparison is useful: namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths (Gilbert and Lambert 2010, Sanmartín 2012, Bull et al. 2021). We acknowledge that the counterfactual results are theoretical and have explicitly emphasised the assumptions involved (i.e., species–environment relationships hold between pre- and post-uplift environments) in the main text (Lines 91-98). Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route).

      References:

      Bull, J. W., N. Strange, R. J. Smith, and A. Gordon. 2021. Reconciling multiple counterfactuals when evaluating biodiversity conservation impact in social-ecological systems. Conservation Biology 35:510-521.

      Gilbert, D., and D. Lambert. 2010. Counterfactual geographies: worlds that might have been. Journal of Historical Geography 36:245-252.

      Sanmartín, I. 2012. Historical Biogeography: Evolution in Time and Space. Evolution: Education and Outreach 5:555-568.

      Williams, P. J., E. F. Zipkin, and J. F. Brodie. 2024. Deep biogeographic barriers explain divergent global vertebrate communities. Nature Communications 15:2457.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      We appreciate the reviewer’s comment here. We would like to clarify that our conclusions regarding longitudinal shifts in migratory distributions are based on distribution models derived from eBird data of 50 species, not merely on migration tracks from seven species. These species-level spatiotemporal models allow us to infer large-scale biogeographic patterns across the Qinghai-Tibet Plateau (QTP).

      The original seven tracking species were used specifically for analysing the relationship between migration directions (azimuths) and environmental variables, offering independent support for the patterns revealed in the eBird-based distribution models. Recognising the reviewer’s concern about sample size and coverage, we have now expanded this part by incorporating migration tracks from 12 additional species, derived through georeferenced digitisation of published migratory maps. Importantly, this expansion did not change our conclusions, i.e., the monsoons, rather than the high elevations, play a prominent role in shaping the current migration direction of birds in the QTP. While the overall conclusion remains unchanged, the expanded dataset led to slight changes in the difference between spring and autumn migration. We have updated Figure 2 and the corresponding results and conclusions throughout the manuscript. We have also clarified in the Discussion that regions of the QTP with relatively less data might lead to underestimation of some migration routes, to make sure readers are aware of these data limitations (Lines 211-218).
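
To make the notion of a migration direction (azimuth) concrete, the following minimal sketch computes the initial great-circle bearing between two consecutive track points using the standard forward-azimuth formula. The response does not state how the authors derived azimuths from the tracks, so this should not be read as their method; the function name and the coordinates in the usage example are hypothetical.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise from
    north, normalised to the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    # Hypothetical consecutive track points (latitude, longitude), for illustration only.
    track = [(30.0, 90.0), (32.0, 90.0), (32.0, 95.0)]
    for (lat_a, lon_a), (lat_b, lon_b) in zip(track, track[1:]):
        print(f"({lat_a}, {lon_a}) -> ({lat_b}, {lon_b}): "
              f"{initial_bearing_deg(lat_a, lon_a, lat_b, lon_b):.1f} deg")
```

Bearings near 0° or 180° indicate predominantly latitudinal (north-south) movement, while bearings near 90° or 270° indicate predominantly longitudinal (east-west) movement.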

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thank you. As suggested, we have now clearly stated the assumption of niche conservatism in the Introduction (Lines 91-98).

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study. 

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. Our objective is not to determine one-to-one migratory links between specific populations, but to identify general broad-scale directional shifts when birds cross the QTP during their migration. We regret any confusion caused by our earlier wording. To make this clearer, we have now emphasised that our interest lies in migratory directions and their environmental correlates, rather than population assignments. We have also rephrased the relevant text to explicitly clarify that our study operates at the species level and at large spatial scales (Lines 253–257). We illustrate how the distributions of eBird observations and GPS tracking data for four species can differ from each other whilst showing similar migration patterns (Figure S10). We have also explicitly stated in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis could only suggest plausible routes and region-to-region linkages (Lines 200-202).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      We thank the reviewer for the honest assessment and understand the concern regarding the scope of our contribution. Our intention was not to provide an exhaustive account of all aspects of the QTP as a migratory barrier, but to address a specific and underexplored question: how the uplift of the plateau and the resulting monsoon system may have influenced the orientation of avian migration routes. By integrating both satellite tracking and community-contributed data, we have explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, yielding the most comprehensive picture to date of how the QTP uplift has shaped migratory patterns of birds. We have also discussed the study’s limitations, including the small number of tracking species (Lines 205-218), the use of occurrence data as a proxy for breeding and wintering regions (Lines 200-202), the uneven sampling coverage in the QTP (Lines 202-205) and the assumptions behind the counterfactual scenario (Lines 91-98). This ensures that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We thank the reviewer for this suggestion. We agree that radar holds promise for understanding certain aspects of bird migration, particularly for detecting flight intensity, altitudes, and timing. However, radar systems currently struggle to resolve migration at the level of species, populations, or individuals, which is central to questions of migratory connectivity and route selection. Most radar signals cannot distinguish between species in mixed flocks, nor can they link breeding and wintering sites for tracked individuals. In addition, the spatial coverage of radar installations remains limited, especially across remote and high-elevation regions like the Qinghai-Tibet Plateau, where infrastructure and continuous power supply are still logistically prohibitive.

      The eBird dataset used in our study is itself a form of field-based observation, contributed by tens of thousands of birdwatchers across continents, including the QTP region (Figure S11). While eBird cannot provide individual-level tracking, it captures spatiotemporal patterns of occurrence at broad scales, making it a valuable complement to satellite tracking data. We would also emphasise that our team has extensive field experience in the Qinghai-Tibet Plateau (about twenty years), including multi-year expeditions to deploy satellite tags and observe migration at stopover sites.

      We agree that more direct tracking (e.g. GPS tagging) would be an ideal way to validate migration pathways and population connectivity. Using the satellite-tracking data, we have shown that most tracking species shifted their migration direction when facing the QTP (Figure S6). In this revision, as stated, we managed to add 12 more species with satellite tracking routes. We have also noted that future studies should build on our findings by using dedicated tracking of more individual birds and monitoring of migration over the QTP. We have cited recent advances in these techniques and suggested that incorporating more tracking data could further test the hypotheses generated by our work (Lines 205-218).

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We used this sentence to introduce migration. We have removed “important” to reduce ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We stated that this claim is invoked for long-distance migrants before this sentence and have rewritten the sentence to highlight that this interpretation is for long-distance migrants. 

      L 158 what is a migration circle? I do not know such a term.

      We have amended it to “annual migration cycle”, which is a more common way to describe the yearly round-trip journey of birds between breeding and wintering grounds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We thank the reviewer for raising this important concern. We have presented this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on endogenous reserved energy gained prior to reproduction — rather than an ‘income’ strategy where birds ingest nutrients mainly collected during the period of reproductive activity. This collaborates with studies on breeding strategies of migratory birds in Asian flyways. However, we note that this interpretation would require further study.” By adding this caution, we made it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We have also doublechecked that the rest of the discussion around this point is framed appropriately. Moreover, to help illustrate why we raised this ecological interpretation, we would also draw attention to examples of satellite tracking points from several species (e.g., Beijing Swift, Demoiselle Crane) in the following, which show obvious shifts in migratory direction near the QTP region. These turning points suggest potential behavioral responses to environmental constraints, such as climatic corridors or energy availability, which could help motivate our discussion of possible capital breeding strategies in these species.

    1. Reviewer #1 (Public review):

      Summary:

      In this article, Mirza et al. developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative. It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination. The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns. Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim. Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase. Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

    2. Reviewer #2 (Public review):

      Summary:

      The article by Waleed et al. discusses the self-organization of the actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and that self-organized structures can emerge.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      Comments on revised version:

      The authors have satisfactorily responded to the comments.

    3. Reviewer #3 (Public review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the most crucial assumptions underlying continuum simulations.

      The paper is well written, figures are mostly clear, and the theoretical analysis presented in both, main text and supplement, is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both, isotropic and anisotropic active stresses, are present within such effectively compressible structures. Even though not explicitly stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones.

      The diversity of patterning processes experimentally observed and theoretically described is nicely elaborated on in the introduction of the paper. The theory development and discussion of the continuum model itself is also well-embedded in a review of the relevant broad literature on active liquid crystals and active nematics, which includes plenty of previous results by the authors themselves. Interestingly, several of the patterns identified in the present work, such as 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019) have been observed previously in different, but related, active isotropic fluid models. In light of this crowded literature, the authors do a good job in delineating key results obtained in the present manuscript from existing work.

      The results of numerical simulations are well-presented. The discussion of numerical observations is comprehensive, but also at many times qualitative. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system, which is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (Nejad et al, Nat Comm 2024). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      The authors must be complimented for trying to gain further mechanistic insights into their conclusions using microscopic filament simulations that were diligently performed. It is rightfully stated that these simulations only provide plausibility tests about key assumptions underlying the hydrodynamic theory. Within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said microscopically about the regime 𝜅 > 0, in which the continuum theory also predicts the formation of stripe patterns? How does the spatially inhomogeneous organization that the continuum theory predicts fit into the presented microscopic picture, and vice versa? The authors clearly explain the scope and limitations of the microscopic model, which suggests that questions like these will be interesting directions of future investigations.

      Overall, the paper represents a valuable contribution to the field of active matter that should provide a fruitful basis to develop new hypotheses about the dynamic self-organisation and mechanics of dense filamentous bundles in biological systems.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we have expanded the motivation and description of the theoretical model, with particular emphasis on the experimental evidence supporting its rationale and assumptions. These changes in the revised manuscript are implemented in the first two paragraphs of Section “Theoretical model” and in a more detailed description and justification of the different mathematical terms that appear in that section. We have made an effort in our narrative to map the different terms to mechanistic processes in the actomyosin network. Even if the nature of the manuscript is inevitably theoretical, we think that the revised manuscript will be more accessible to a broader spectrum of readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al. developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      (A) This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We understand the point of the referee. While it is unavoidable to present the continuum hydrodynamic theory behind our results, we have made an effort in the revised manuscript to (1) motivate the essential features required from a theoretical model of the actomyosin cytoskeleton capable of describing its nematic self-organization (first two paragraphs of Section “Theoretical model”), and to (2) explicitly explain the physical meaning of each of the mathematical terms in the theory, and when appropriate, relate them to molecular mechanisms in the cytoskeleton. We hope that the revised manuscript addresses the concern of the referee.

      Regarding the comparison with experiments, it is indeed qualitative because the main point of the paper is to establish a physical basis for the self-organization of dense nematic structures in actomyosin gels. Somewhat surprisingly, we argue that a compelling mechanism explaining the tendency of actomyosin gels to form patterns of dense nematic bundles has been lacking. As we review in the introduction, these patterns are qualitatively diverse across cell types and organisms in terms of geometry and dynamics, and for this reason, our goal is to show that the same material in different parameter regimes can exhibit such qualitative diversity. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured and are expected to vary widely between cell types. In fact, estimates in the literature often rely on comparison with hydrodynamic models such as ours. For this reason, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. Second, the patterns of nematic bundles found across cell types depend on the interaction between (1) the intrinsic tendency of actomyosin gels to form such structures studied here and (2) other elements of the cellular context. For instance, polymerization and retrograde flow from the lamellipodium, the physical barrier of the nucleus, and the interaction with the focal adhesion machinery are essential to understand the emergence of stress fibers in adherent cells. Cell shape and curvature anisotropy control the orientation of actin bundles in parallel patterns in the wings and trachea of insects. Nuclear positions guide the actin bundles organizing the cellularization of Sphaeroforma arctica [11]. Here, we focus on establishing that actomyosin gels have an intrinsic ability to self-organize into dense nematic bundles, and leave how this property enables the morphogenesis of specific structures for future work. We have emphasized this point in the revised conclusions section.

      (B) It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin gels originating from living cells. To our knowledge, the ability of reconstituted actomyosin gels from purified proteins to sustain the kind of contractile dynamical steady-states observed in living cells is very limited. In the revised manuscript, we cite a very recent preprint presenting very exciting but partial results in this direction [49]. Instead, reconstituted in vitro systems encapsulating actomyosin cell extracts robustly recapitulate contractile steady-states. This point has been clarified in the first paragraph of Section “Theoretical model”.

      (C) The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree with the referee and in the revised manuscript we have avoided the term “sarcomeric” because it refers to very specific organizations in cells. What we previously called “sarcomeric patterns”, where bands of high density exhibit nematic order perpendicular to the axis of the bands, is not a structure observed to our knowledge in cells. It is introduced to delimit the relevant region in parameter space. In the revised manuscript, we refer to this pattern as “banded pattern with perpendicular nematic organization” or “banded pattern” in short.

      (D) Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We thank the referee for raising this point, which was not sufficiently clarified in the original manuscript. We first note that in incompressible active nematic models, active tension is deviatoric (traceless and anisotropic) because an isotropic component would simply get absorbed by the pressure field enforcing incompressibility. Being compressible, our model admits an active tension tensor with deviatoric and isotropic components. We always consider a contractile (positive) isotropic component of active tension, but the deviatoric component can be either contractile (𝜅 > 0) or extensile (𝜅 < 0), where we follow the common terminology according to which in contractile/extensile active nematics the active stress is proportional to q with a positive/negative proportionality constant [see e.g. https://doi.org/10.1038/s41467-018-05666-8]. Furthermore, as clarified in the revised manuscript, total active stresses accounting for the deviatoric and isotropic components are always contractile (positive) in all directions, as enforced by the condition |𝜅| < 1.

      For fibrillar patterns, we need 𝜅 < 0, and therefore active stresses are larger perpendicular to the nematic direction. This means that the anisotropic component of the active tension is extensile, although, accounting for the isotropic component, total active tension is contractile (see Fig. 1c). This is now clarified in the text following Eq. 7 and in Fig. 1.
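
      The sign convention above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the active tension takes the form σ_act = λρ(I + 𝜅q), with q normalized so that its eigenvalues are +S along the nematic direction and −S perpendicular to it (0 ≤ S ≤ 1). This form, the normalization, and the names `lam`, `rho`, and `active_tension_components` are assumptions and are not taken from Eq. 7 of the manuscript.

```python
def active_tension_components(rho, lam, kappa, order):
    """Principal active tensions (parallel, perpendicular) to the nematic direction.

    Assumed form (illustrative only): sigma_act = lam * rho * (I + kappa * q),
    where q has eigenvalue +order along the director and -order across it.
    """
    if not 0.0 <= order <= 1.0:
        raise ValueError("nematic order must lie in [0, 1]")
    sigma_par = lam * rho * (1.0 + kappa * order)   # along the nematic direction
    sigma_perp = lam * rho * (1.0 - kappa * order)  # perpendicular to it
    return sigma_par, sigma_perp


# Extensile anisotropy (kappa < 0): larger tension perpendicular to the director,
# yet both components stay positive because |kappa| < 1.
par, perp = active_tension_components(rho=1.0, lam=1.0, kappa=-0.5, order=0.8)
print(par, perp)
```

      Under these assumptions, both components remain positive whenever |𝜅| < 1, and 𝜅 < 0 makes the perpendicular component the larger one, which matches the statement above.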

      However, following fibrillar pattern formation and as a result of the interplay between active and viscous stresses, the total stress can be larger along the emergent dense nematic structures (“contractile structures”) or perpendicular to them (“extensile structures”). To clarify this point, in the revised Fig. 4 and the text referring to it, we have expanded our explanation and plotted the difference between the total stress component parallel to the nematic direction (𝜎∥) and the component perpendicular to the nematic direction (𝜎⊥), with contractile structures satisfying 𝜎∥ − 𝜎⊥ > 0 and extensile structures satisfying 𝜎∥ − 𝜎⊥ < 0. See lines 280 to 303. This is consistent with the common notion of contractile/extensile systems in incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8].
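
      The criterion 𝜎∥ − 𝜎⊥ can be evaluated pointwise from a 2D total stress tensor and the local nematic angle. The following is a minimal sketch of that projection, not the finite element post-processing used in the paper; the function name, the tuple layout of the stress, and the tolerance `tol` are hypothetical choices made for illustration.

```python
import math


def classify_structure(stress, theta, tol=1e-12):
    """Classify a point as contractile or extensile from sigma_par - sigma_perp.

    stress: symmetric 2x2 total stress as ((sxx, sxy), (sxy, syy)).
    theta:  angle of the nematic director with the x axis, in radians.
    """
    (sxx, sxy), (_, syy) = stress
    c, s = math.cos(theta), math.sin(theta)
    # Projections n.sigma.n and m.sigma.m with n = (c, s) and m = (-s, c).
    sigma_par = c * c * sxx + 2.0 * c * s * sxy + s * s * syy
    sigma_perp = s * s * sxx - 2.0 * c * s * sxy + c * c * syy
    difference = sigma_par - sigma_perp
    if difference > tol:
        return "contractile"
    if difference < -tol:
        return "extensile"
    return "isotropic"


# The same stress is contractile or extensile depending on the director orientation.
stress = ((2.0, 0.0), (0.0, 1.0))
print(classify_structure(stress, 0.0))
print(classify_structure(stress, math.pi / 2))
```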

      (E) Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, as discussed in our response to comment (A) by this referee. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, possibly interacting with the adhesion machinery, and we find dynamical interactions such as those suggested by the referee. As an example, we show a video of a simulation where there is an actin influx at the edge of the circular domain, modeling the lamellipodium, and where friction is higher in four small regions, simulating focal adhesions. Under these boundary conditions, the model presented in the paper exhibits the kind of dynamical reorganizations alluded to by the referee.

      Author response video 1.

      We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish.

      (F) Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.
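
      As an illustration of how nematic order can be quantified for a small homogeneous sample of filaments, the sketch below computes the standard 2D nematic order parameter from a list of filament angles. It is a generic calculation and not the Cytosim customization used in the paper; the function name and the input format (angles in radians) are assumptions.

```python
import math


def nematic_order_2d(angles):
    """Return (S, director_angle) for filament orientations given in radians.

    Uses the 2D nematic tensor Q = <2 n n^T - I>, whose independent entries are
    Qxx = <cos 2a> and Qxy = <sin 2a>. S = sqrt(Qxx^2 + Qxy^2) lies in [0, 1],
    and the result is unchanged if any filament is flipped by pi.
    """
    if not angles:
        raise ValueError("need at least one filament angle")
    qxx = sum(math.cos(2.0 * a) for a in angles) / len(angles)
    qxy = sum(math.sin(2.0 * a) for a in angles) / len(angles)
    order = math.hypot(qxx, qxy)
    director = 0.5 * math.atan2(qxy, qxx)
    return order, director


# Perfectly aligned filaments (including one flipped by pi) versus two crossed filaments.
print(nematic_order_2d([0.3, 0.3 + math.pi, 0.3]))
print(nematic_order_2d([0.0, math.pi / 2]))
```

      Averaging a measured stress over samples prepared at different values of S and density would then give the kind of stress versus order relation discussed above, although the details of how the authors do this are not specified here.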

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. To our knowledge, these models have not been able to reproduce the heterogeneous nonequilibrium contractile states involving sustained self-reinforcing flows underlying the pattern formation mechanism studied in our work. The scope of the discrete network simulations has been clarified in lines 340 to 349 in the revised manuscript.

      While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate some aspects of our continuum modeling with discrete simulations. We have emphasized the complementarity of the two approaches in the conclusions.

      Reviewer #1 (Recommendations For The Authors):

      Questions on the theory:

      Does rho describe the density of actin or myosin? The authors say that they are modeling actomyosin material as a whole, but the actin and myosin should be modeled separately. Along similar lines, does Q define the ordering of actin or myosin?

      Active gel models of the actomyosin cytoskeleton have been formulated with independent densities for actin and for myosin or using a single density field, implicitly assuming a fixed stoichiometry. Super-resolution imaging of the actomyosin cytoskeleton also suggests that in principle it makes sense to consider different nematic fields for actin and for myosin filaments. In the revised manuscript, we now explicitly mention that our density and nematic field are effective descriptions of the entire actomyosin gel (lines 82-84).

      A more detailed model would entail additional material parameters, not available experimentally, which may help reproduce specific experiments but that would make the systematic study of the different behaviors much more difficult. Our approach has been to keep the model minimal meeting the fundamental requirements outlined in the first paragraphs of Section “Theoretical model”.

      Should the active stress depend on material density? It seems strange (from Eq. 3) that active stress could be non-zero even where density is zero, since sigma_act does not depend on rho.

      Yes, active stress is assumed to be proportional to density. Eq. 3 in the original manuscript was misleading (it was multiplied by rho in Eq. 2). In the revised manuscript, we have explained the theoretical model in a bit more detail, clarifying this point.

      The authors should clearly explain their rationale for retaining certain types of nonlinear terms while ignoring others in theory. For instance, the nonlinearities in the equations of motion are sometimes quadratic in the fields, while there are also some cubic terms. Please remark up to what order in the fields the various interactions are modeled.

      We thank the referee for raising this point. The nonlinearities in the theory are easily explained on the basis of a small number of choices. We have added a new paragraph towards the end of Section “Theoretical model” (lines 145 to 152) providing a rationale for the origin and underlying assumptions leading to different nonlinearities.

      To connect with experiments and the biological context, please explain the biological origin of various terms in the model: (1) L-dependent terms in Eqs. 2 and 4, (2) Flow-alignment of nematic order and experimental evidence in support of it, (3) density-dependent susceptibility terms in Eq. 4

      (1) Unfortunately, the L-dependent terms are very bulky, but they are standard in nematic theories. The best way to understand their physical significance is through the expression of the nematic free energy, which is now given and explained in the revised manuscript (Eq. 3). The resulting complicated expressions for the molecular field and the nematic stress (Eqs. 4 and 5) are mathematical consequences of the choice of nematic free energy. In the revised manuscript, we also attempt to provide a basis for these terms in the context of the actin cytoskeleton. (2) To our knowledge, the best reference supporting this term from experiments is Reymann et al, eLife (2016). In the revised manuscript, we have provided a physical interpretation. (3) We have expanded the motivation and plausible microscopic justification of this term.

      There are different 'activity' terms in the model. Their biophysical origin is not made clear. For example, the authors should make clear if these activities arise from filament or motor activity. Relatedly, the authors should provide a comprehensive discussion of the signs of the different active parameters and their physical interpretations.

      In an active gel model, activity parameters are phenomenological and how they map to molecular mechanisms is not precisely known, although conventionally contractile active tension is ascribed to the mechanical transduction of chemical power by myosin motors. The fact is that, besides myosin activity, there are many nonequilibrium processes in the actomyosin cytoskeleton that may lead to active stresses including (de)polymerization of filaments or (un)binding of crosslinkers. In the revised manuscript, we have added sentences illustrating how different terms may result from microscopic mechanisms, but providing a precise mapping between our model and nonequilibrium dynamics of proteins is beyond the scope of our work, although our discrete network simulations address this issue to a certain degree.

      Following the suggestion of the referee, our description of the theory now discusses much more extensively the signs of activity parameters and their physical interpretations, e.g. the text following Eq. 7.

      Throughout the paper, various activity terms are varied independently of each other. Is that a reasonable assumption given that activities should depend on ATP and are thus not independent of one another?

      We agree that, ultimately, all active processes depend on the conversion of chemical energy into mechanical energy. However, recent work has highlighted how active tension also depends on the microscopic architecture of the network controlled by multiple regulators of the actomyosin cytoskeleton (e.g. Chugh et al, Nat Cell Biol, 2017). It is reasonable to expect that, for a given rate of ATP consumption, chemical power will be converted into mechanical power in different ways depending on the micro-architecture of the cytoskeleton, e.g. the stoichiometry of filaments, crosslinkers, myosins, or the length distribution of filaments (very long filaments crosslinked by myosins may be difficult to reorient but may contract efficiently).

      We have added a paragraph in Section “Theoretical model” with a discussion, lines 153 to 156.

      Sarcomeres are muscle fibers that exhibit an alternating polarity pattern. Such patterning is not evident in what the authors call 'sarcomeres' in Fig. 2. I believe the authors should revise their terminology and not loosely interpret existing classifications in the field.

      We thank the referee for raising this point. We have changed the terminology.

      Fig 2a: Is the cartoon for filament alignment incorrect for kappa>0?

      The cartoon is correct. In the revised manuscript we have explained more clearly the physical meaning of kappa in the text following Eq. 7. In the caption of Fig. 1 and of Fig. 2a, we have also clarified that when the absolute value of kappa is <1, then active tension is positive in all directions.

      Within the section "Requirements for fibrillar and banded patterns", it will be useful to show the figures for varying the different active parameters in the main figures.

      We have followed the referee’s suggestion and moved Supp. Fig. 1 of the original manuscript to the main figures.

      How do the authors decide if bundles are contractile or extensile? Why are contractile bundles under tension while extensile bundles are under compression? I would expect the opposite.

      We agree that this point deserves a more detailed explanation. In the revised manuscript and in the new Figure 4, we further develop this point. The fibrillar pattern forms when kappa<0. We further assume that -1<kappa<0, so that active tension is positive in all directions. In this regime, the deviatoric (anisotropic) part of active tension is extensile. However, following pattern formation and because of the interplay between active and viscous stresses, the total stress in the emerging bundles may become extensile or contractile, depending on whether the largest component of stress is perpendicular or along the bundle axis. This is now presented in the updated figure, with new panels presenting maps of the total tension. The text discussing this point has been rewritten and we hope that the new version is much clearer (lines 280 to 303).

      A contractile bundle tends to shorten, but it cannot do so because of boundary conditions or the interaction with other bundles. As a result, it is under tension. Conversely, an extensile bundle tries to elongate but, being constrained, becomes compressed. As an analogy, consider the cortex of a suspended cell. The cortex is contractile, but it cannot contract because of volume regulation in the cell, which is typically pressurized. As a result, tension in the cortex is positive, as shown by Laplace’s law [10.1016/j.tcb.2020.03.005]. We have tried to clarify this point in the revised manuscript.
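
      For reference, the cortex analogy can be written out for an idealized spherical cell. The symbols are assumptions introduced here for illustration: R is the cell radius, T the cortical tension, and ΔP the excess of internal over external pressure. Laplace's law then gives a positive tension whenever the cell is pressurized:

```latex
\Delta P = \frac{2T}{R}
\quad\Longrightarrow\quad
T = \frac{\Delta P \, R}{2} > 0
\quad \text{for } \Delta P > 0 .
```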

      Can the authors reproduce alternating density patterns using the cytosim simulations? This is an important step in establishing the correspondence between the continuum theory and the agent-based model.

      We have addressed this point in our response to public comment (F) of this referee.

      The authors do not provide code or data.

      The finite element code, together with an input file required to run a representative simulation in the paper, is now made available, see Ref. [74].

      The customizations of Cytosim needed to account for nematic order in our discrete network simulations are available, see Ref. [98].

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al. discusses the self-organization of the actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and that self-organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we have highlighted the novelty, particularly in the last paragraph of the introduction, the first two paragraphs of Section “Theoretical model”, and in the conclusions. Despite a very large literature on theoretical models of stress fibers, actin rings, and active nematics, we argue that the active self-organization of dense nematic structures from an isotropic and low-density gel has not been compellingly explained so far. Many models assume from the outset the presence of actin bundles, or explain their formation using localized activity gradients. The literature on active nematics has extensively studied symmetry breaking and self-organization. However, most of these works assume initial orientational order. Only a few works study the emergence of nematic order from a uniform isotropic state, but they consider dry systems lacking hydrodynamic interactions or incompressible and density-independent systems [37,38]. Yet, pattern formation in actomyosin gels is characterized by large density variations, and by highly compressible flows, which coordinate in a mechanism relying on an advective instability and self-reinforcing flows.

      Our theoretical model is not particularly novel, and as we mention in the manuscript, it can be particularized to different models used in the literature. However, we argue that it has the right minimal features to capture nematic self-organization in actomyosin gels. To our knowledge, no previous study explains the emergence of dense and nematic structures from a low-density isotropic gel as a result of activity and involving the advective instability typical of symmetry-breaking and patterning in the actomyosin cytoskeleton. These are important qualitative features of our results that resonate with a large experimental record, and as such, we believe that our work provides a new and compelling mechanism relying on self-organization to explain the prominence and diversity of patterns involving dense nematic bundles in the actomyosin cytoskeleton across species.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context of the literature on active nematics.

      We agree with the referee that this was a weakness of the original manuscript. In the revised manuscript, within reasonable space constraints given the size and dynamism of the field of active nematics, we have placed our work in the context of this field (end of introduction and first two paragraphs of Section “Theoretical model”). The published version of our companion manuscript [45] also contributes to providing a clear context to our theoretical model within the field.

      Reviewer #2 (Recommendations For The Authors):

      The article by Waleed et al discusses the self-organization of the actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and that self-organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article. I explain my questions and comments below.

      We have responded to this comment above.

      (i) Active nematics including density variations have been dealt with quite extensively in the literature. For example, the works of Sriram Ramaswamy have dealt with this system, including linear stability analysis, simulations, etc. In what way is the present work different from the system that they have considered?

      (ii) Active flows leading to self-organization have been a topic of discussion in many works. For example: (a) Annual Review of Fluid Mechanics, Vol. 43:637-659, 2010, https://doi.org/10.1146/annurev-fluid-121108-145434; (b) S Santhosh, MR Nejad, A Doostmohammadi, JM Yeomans, SP Thampi, Journal of Statistical Physics 180, 699-709; (c) M. G. Giordano, F. Bonelli, L. N. Carenza, G. Gonnella and G. Negro, Europhysics Letters, Volume 133, Number 5. In what way is this work different from any of these?

      (iii) I am confused about the models used in the paper. There is significant literature from Prof. Mike Cates group, Prof. Julia Yeomans group, Prof. Marchetti's group who all use similar governing equations. In the present paper, I find it hard to understand whether the model used is similar to the existing ones in literature or are there significant differences. It should be clarified.

      Response to (i), (ii) and (iii).

      We completely agree with this referee (and also the previous referee) that the contextualization of our work in the field of active nematics was very insufficient. In the revised manuscript, the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model” now address this point. In short, previous active nematic models predicting patterns with density variations have been either for dry active matter (disregarding hydrodynamic interactions) or for suspensions of active particles moving in an incompressible flow. None of these previous works predict nematic pattern formation as a result of activity relying on the advective instability and self-reinforcing compressible flows, leading to high-density and high-order bundles surrounded by an isotropic low-density phase. Yet, these are fundamental features observed in actomyosin gels. Many works deal with symmetry breaking of a system with pre-existing order, but very few address how order emerges actively from an isotropic state. We thank the referee for pointing us to the paper by Santhosh et al, who nicely make this argument and whom we now cite. Our mechanism is fundamentally different from that in Santhosh et al, whose model is incompressible and ignores density variations.

      We hope that the revised manuscript addresses this important concern.

      (iv) Below Eqn 6, it starts by saying that the “...origin..is clear...” It's not. I don't understand the physical origin of the instability, and this should be clarified, maybe with some illustrations.

      We apologize for this unfortunate sentence, which we have rewritten in the revised manuscript (lines 181 to 185).

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripelike patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We have emphasized this point in several places in the revised manuscript. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. The companion publication has now been accepted in the New Journal of Physics, with significant changes to better connect the work to the field of active nematics. A preprint reflecting those changes is available in Ref. [64], but we hope to reference the published paper that will come out soon.

      In the revised manuscript, we have significantly rewritten the Section “Theoretical model” to frame the continuum model in the context of the field of active nematics. While our model and results have commonalities with previous work, there are also important differences. We have highlighted the novelty of the present work along with the relation with previous studies and theoretical models in the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model”. Furthermore, as suggested by the referee, we have made an effort to connect our results with previous work by Kumar, Mietke, Doostmohammadi and others.

      Regarding the last point alluded to by the referee (“extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis”), the picture raised by the referee needs to be nuanced for our compressible system as compared to the incompressible systems discussed in that reference. As we have elaborated in our response to point (D) of Referee #1, our systems are overall contractile (with positive active tension in all directions), but the deviatoric component of the active tension can be either extensile or contractile. In our “extensile” models (left in Fig. 2c), material is drawn in laterally to the nematic axis but it is not expelled along this axis. Instead, it is “expelled” by turnover. In the revised manuscript, we have added a comment about this.
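
      The distinction drawn above, between a system that is contractile in every direction and one whose deviatoric tension is extensile along the nematic axis, can be illustrated with a minimal numerical sketch. This is not the authors' model: the function name `split_tension` and the numerical values are hypothetical, and the tensor is simply an arbitrary symmetric 2x2 tension with the director along x.

```python
import numpy as np

def split_tension(sigma, n):
    """Split a symmetric 2x2 tension tensor into isotropic and deviatoric
    parts and report the tension along and perpendicular to director n."""
    sigma = np.asarray(sigma, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)               # normalise the director
    m = np.array([-n[1], n[0]])             # unit vector perpendicular to n
    iso = 0.5 * np.trace(sigma)             # isotropic (mean) tension in 2D
    dev = sigma - iso * np.eye(2)           # traceless deviatoric part
    return {
        "isotropic": float(iso),
        "along": float(n @ sigma @ n),
        "perpendicular": float(m @ sigma @ m),
        "deviatoric_along": float(n @ dev @ n),
    }

# Hypothetical values (assumption, not taken from the manuscript):
# tension 1.0 along the director and 2.0 perpendicular to it.
result = split_tension([[1.0, 0.0], [0.0, 2.0]], n=[1.0, 0.0])

# Both principal tensions are positive (contractile in all directions),
# yet the deviatoric tension along the director is negative ("extensile").
print(result)
```

      In this toy case the tension is positive both along and across the director, while the deviatoric component along the director is negative, which is the sense in which an overall contractile system can still be called "extensile".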

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. In the revised manuscript, we have expanded and clarified the characterization of emergent contractile/extensile networks by reporting the relative magnitude of stress along and perpendicular to the nematic direction. Our revised manuscript clearly shows that even though all of our simulations describe locally contractile systems with extensile anisotropic active tension, the emergent meso-structures can be either extensile or contractile, with the extensile ones exhibiting the usual bend-type instability (a secondary instability in our system) described classically for extensile active nematic systems. We have rewritten the text discussing this (lines 280 to 303), where we have placed these results in the context of recent work reporting the nontrivial relation between the contractility/extensibility of the local units vs the nematic pattern.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said microscopically about the regime \kappa>0 (which is dropped ad hoc from Fig. 3 onward), in which the continuum theory also predicts the formation of stripe patterns - besides the short comment at the very end? How does the spatially inhomogeneous organization that the continuum theory predicts fit into the presented microscopic picture, and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa may not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction whereas there is little room for additional lateral contraction by additional bundling. As discussed in our response to referee #1, we believe that studying the formation of patterns using the discrete network simulations is far beyond the scope of our work. We discuss in lines 332 to 341, as well as in the last paragraph of the conclusions, the scope and limitations of our discrete network simulations.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypotheses about the dynamic self-organisation of dense filamentous bundles in biological systems.

      Reviewer #3 (Recommendations For The Authors):

      • The statement "the porous actin cytoskeleton is not a nematic liquid-crystal because it can adopt extended isotropic/low-order phases" is difficult to understand and should be clarified, as the next paragraph starts formulating a nematic active liquid crystal theory. Do the authors mean a crystal that "Tends to be in a disordered phase?", according to its equilibrium properties? It would still be a "nematic liquid crystal", only its ground state is not a nematic phase.

      We agree with the referee, and we hope that changes in the introduction and in Section “Theoretical model” address this comment.

      • I could not find what Frank energy is precisely used, that would be helpful information.

      In the revised manuscript, we have provided the expression for the nematic free energy in Eq. 3.

      • The significance of the green/purple arrows in the Fig. 2a sketch is unclear. Green arrows also appear in b,c; do they represent the same quantity? From the simulation images it is overall very difficult to see how the flows are oriented near the high-density regions (i.e. whether they are towards or away from the stripe).

      We thank the referee for bringing this up. The color coding of the sketches was confusing. The modified figures (Fig. 1(c) and Fig. 2(a)) now present a clearer and unified representation of anisotropic tension. The green arrows in Fig. 2(c) represent the out-of-equilibrium flows in the steady state. We agree that the zoom is insufficient to resolve the flow structure. For this reason, in the revised Fig. 2, we have added panels showing the flow at higher resolution.

      • It is currently unclear how the linear stability results - beyond identification of the parameter \delta - inform any of the remaining manuscript. Quantitative comparisons of the various length scales seen in simulated patterns (e.g. Fig. 2b, 3c etc) with linear predictions and known characteristic length scales would be instructive mechanistically, would make the overall presentation more compelling and probes limitations of linear results.

      In the revised manuscript, we have provided further information so that the readers can appreciate the predictions and limitations of the linear stability results. We have added a sentence and a figure to show that, in addition to the critical activity, the linear theory provides a good prediction of the wavelength of the pattern. See lines 199 to 201.

      • It is not clear what is meant by "[bundle-formation] requires that active tension perpendicular to nematic orientation is larger than along this direction", and therefore also not why that would be "counter-intuitive". If interpreted naively, I would say that a large tension brings in more filaments into the bundle, so that may well be an obviously helpful feature for bundle formation and maintenance. In any case, it would be helpful if clarity is improved throughout when arguments about "directions of tensions" are made.

      We have significantly rewritten the first paragraphs of section “Microscopic origin…” to clarify this point (lines 330 to 339). These paragraphs, along with other changes in the manuscript such as the explanation of Eq. 7 or the discussion about the stress anisotropy in the new version of Fig. 4 (see lines 280 to 303), provide a better explanation of this important point.

      • All density color bars: Shouldn't they rather be labelled \rho/\rho_0?

      Yes! We have corrected this typo.

      • Scalar product missing in caption definition of order parameter Fig. 2

      We have corrected this typo.

      • Fig. 3a: I suggest to put the expression for q0 in the caption

      We have replaced q_0 with S_0 and clarified its meaning in the caption of what is now Fig. 4.

      • The paragraph at the bottom right of page 6 should probably refer, in several places, to Fig. 3c(...) instead of Fig. 3b

      We have corrected this typo.

    1. eLife Assessment

      This important work has the potential to expand the repertoire of transgenic animals for systems neuroscience investigations across multiple fields. The generation of new reagents has the potential to open new directions in experimental design, and the Cas9-based approach for generating mice may provide additional benefits compared to existing BAC transgenic mouse lines. However, whereas some of the imaging data are compelling, quantitative analysis of transgene fidelity is incomplete, as it relies on a qualitative description of reporter XFP expression at low magnification, with some electrophysiological characterization.

    2. Reviewer #1 (Public review):

      Summary:

      I read with much attention the manuscript titled "Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons" in which the authors reveal five transgenic lines to target diverse neuronal populations of the basal ganglia. In addition, the authors also provide some assessments of the functionality of the lines.

      Strengths:

      Knockin lines made readily available through Jackson. Lines show specific expression.

      Weaknesses:

      Although I have no doubt these knock-in lines will be broadly used by researchers in the field, I find the scientific advances of the study and the breadth of the resource provided quite limited. This is partly because 4 of these lines have been generated by other laboratories. For instance, there are already two other Dat-FlpO lines generated (JAX#: 033673 and 035436), with one of them already characterized (PMID: 33979604). Similarly, Drd1-Cre and Adora2a-Cre have been used abundantly since they were generated over a decade ago, and a novel Drd1-FlpO line has recently been characterized thoroughly (PMID: 38965445). Indeed, some of these lines were BAC transgenic, and I agree with the authors that there is a sound rationale for generating knock-in mice; however, the authors should then demonstrate if/how their new drivers are superior. Overall, the valuable resource generated by the authors would benefit from additional quantification and validation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors report the generation and validation of new knock-in mouse lines enabling precise targeting of basal ganglia projection neurons and midbrain dopamine neurons. By inserting recombinase sequences at endogenous loci, they provide tools that improve on older BAC-based models, with the additional benefit that all lines are openly available through Jackson Laboratories. This work is timely, fills a longstanding gap for the community, and will support both basic circuit mapping and disease-related research.

      Strengths:

      The major strength of this study is the provision of new genetic resources that will be widely used by the basal ganglia and dopamine research communities. Anatomical and electrophysiological data indicate appropriate expression and preserved intrinsic properties. The Flp lines, in particular, show labeling largely confined to basal ganglia circuits, making them especially attractive for circuit-based studies. A further strength is the use of a T2A-recombinase insertion at the native gene stop codon, which preserves endogenous regulation and maintains near-physiological expression of Adora2a, Drd1a, and DAT. The availability of both Cre and Flp versions enables powerful intersectional strategies, and open distribution through Jackson Laboratories ensures broad accessibility and long-term value.

      Weaknesses:

      The major limitation is the discrepancy between Cre and Flp lines, with Cre generally driving broader expression than Flp. This raises concerns about anatomical fidelity that require validation at the cellular level. For the DAT-FlpO line, efficiency remains insufficiently quantified, and higher-resolution co-labeling with TH immunostaining is needed. Electrophysiological comparisons between Cre and Flp versions are also incomplete; current data suggest potential physiological differences, which warrant additional statistical testing and, at a minimum, explicit discussion in the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using the latest knock-in technology, the authors generated a set of five mouse lines with expression of recombinases in striatal projection neurons and dopaminergic neurons for public use. They rigorously characterize the expression of the recombinases by intersectional crossing with reporter lines to demonstrate that these lines are faithful, and they perform electrophysiological experiments in slices to provide evidence that the respective neurons show the expected features in these assays.

      Strengths:

      The characterization of the new mouse lines is exceptional, and these will be widely used by the community. The mouse lines are openly available for the community to use.

      Weaknesses:

      No weaknesses were identified by this Reviewer.

    5. Author response:

      We thank all three reviewers for their thoughtful and constructive evaluations of our manuscript, “Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons.” We are encouraged that the reviewers recognize the value, specificity, and utility of these new lines for the basal ganglia and dopamine research communities. Below, we summarize our planned revisions and clarifications in response to the reviewers’ comments.

      (1) Novelty and comparison with existing lines

      We appreciate Reviewer 1’s point regarding the existence of previously generated Cre and Flp lines targeting similar neuronal populations. Our project was initiated six years ago, and during the course of generating and characterizing all five lines, we became aware that similar individual lines have since been developed by other groups. Nevertheless, our study provides a coordinated and independently validated set of lines created using a standardized knock-in (KI) strategy and distributed through Jackson Laboratories for unrestricted community use. Importantly, whereas previous BAC transgenic approaches rely on random insertion, which can lead to position effects and ectopic expression, our design places the recombinase coding sequence immediately downstream of the endogenous stop codon using a self-cleaving T2A peptide. This ensures expression under native promoter and regulatory control, preserving physiological gene regulation.

      To address the Reviewers’ points, we will (i) expand the Introduction and Discussion to clarify the rationale and advantages of endogenous promoter–driven recombinase expression over BAC-based systems, emphasizing that our lines provide a uniform, promoter-controlled, and publicly accessible toolkit for the community, and (ii) explore including a comparative table summarizing differences in construct design, expression fidelity, and recombination efficiency across published lines (e.g., PMID 33979604, 38965445).

      (2) Quantification, validation, and comparison of Cre vs FlpO

      We agree with Reviewers 1 and 2 that further quantification and discussion of Cre versus FlpO fidelity will strengthen the manuscript. The observed difference in expression breadth between Cre and FlpO lines likely reflects a fundamental property of the recombinases themselves rather than a discrepancy in targeting. Cre recombinase is significantly more enzymatically efficient than FlpO, meaning that even very low endogenous levels of gene expression (e.g., Drd1a or Adora2a) can drive Cre-dependent recombination, whereas FlpO requires higher expression thresholds. Consequently, reporter-based readouts will inherently appear broader for Cre lines, despite both being driven by the same endogenous promoters.

      To address these points, we will (i) provide quantitative co-labeling analyses for the DAT-FlpO line with TH immunostaining to assess efficiency and specificity, (ii) clarify in the Results and Discussion that differences between Cre and FlpO expression patterns largely stem from differences in recombinase kinetics and sensitivity, not mismatched promoter activity, and (iii) include representative high-resolution images and relevant statistics in the revised figures. Importantly, we would like to note that RNAscope may not be an ideal validation approach in this context, as in situ transcript detection cannot capture the enzymatic threshold differences that determine reporter recombination and thus will not help address observed differences between Cre and FlpO lines. Finally, we are actively performing electrophysiological comparisons between Cre and FlpO lines to rigorously quantify potential physiological differences between them. Updated analyses will be incorporated as available or described as ongoing future work.

      (3) Discussion of scope and interpretation

      We appreciate the reviewers’ suggestions to better contextualize the scope of this resource. We will revise the Discussion to (i) highlight that the Cre–FlpO pairings enable powerful intersectional and cross-line strategies for dissecting basal ganglia and midbrain circuitry, and (ii) clarify that our goal was to generate a rigorously validated foundational resource, with detailed functional comparisons and manipulation studies to be explored in subsequent work.

      In summary, we thank the reviewers for their insightful feedback. The planned revisions and clarifications will underscore the unique strengths of our knock-in design, explore potential Cre–FlpO differences, and highlight the value of this standardized and accessible toolkit for the neuroscience community.

    1. eLife Assessment

      This important study reports on the redundant roles of the decapping activators Edc3 and Scd6 in orchestrating post-transcriptional programs to modulate metabolic responses to nutrients in yeast. The authors employed mutagenesis studies in conjunction with a battery of transcriptome-wide analyses to provide convincing evidence supporting their conclusions. Considering the broad implications of post-transcriptional regulation of gene expression, this study will be of interest across a variety of biomedical disciplines ranging from biochemistry and molecular and cellular biology to those specializing in studying various pathologies.

    2. Reviewer #1 (Public review):

      Summary:

      mRNA decapping and decay factors play critical roles in post-transcriptionally regulating gene expression. Here, Kumar and colleagues investigate how deleting two yeast decapping enhancer proteins (Edc3 and Scd6), either alone or in tandem, affects the transcriptome. Using RNA-Seq, CAGE-Seq and ribosome profiling, they conclude that these factors generally act in a redundant fashion, with a mutant lacking both proteins showing an increased abundance of select mRNAs. As these upregulated transcripts are also upregulated in mutants lacking the decapping enzyme, Dcp2, and show no increases in transcription of their cognate genes, the authors conclude that this is at the level of mRNA decapping and decay. This was further supported by CAGE-Seq analyses carried out in WT cells and the scd6∆edc3∆ double mutant. Their ribosome profiling data also lead them to conclude that Scd6 and Edc3 display functional redundancy and cooperativity with Dhh1/Pat1 in repressing the translation of specific transcripts. Finally, as their data suggest that Scd6 and Edc3 repress mRNAs coding for proteins involved in cellular respiration, as well as proteins involved in the catabolism of alternative carbon sources, they go on to show that these decapping activators play a role in repressing oxidative phosphorylation.

      Strengths:

      Overall, this manuscript is well-written and contains a large amount of compelling high-quality data and analyses. At its core, it helps to shed light on the overlapping roles Edc3 and Scd6 have in sculpting the yeast transcriptome.

      Weaknesses:

      While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Kumar and Zhang presents compelling evidence that the decapping activators Edc3 and Scd6 show a high degree of redundancy, which is revealed only when both are deleted in a double mutant. In addition, the authors provide strong evidence for their role in regulating starvation-induced pathways, as evidenced by measurements of mitochondrial membrane potential, metabolomics and analysis of the flux of Krebs cycle intermediates.

      Strengths:

      Kumar, Zhang et al provide multiple sources of evidence for the direct mechanism of Edc3 and Scd6 by using and comparing different approaches such as mRNA-seq, ribosome occupancies and translational efficiencies. Through extensive analysis, the authors show that this complex can also regulate genes outside the Environmental Stress Response (non-iESR) that are significantly up-regulated in all three mutants. Remarkably, the gene ontology analysis of these non-iESR genes identifies enrichment for mitochondrial proteins that are implicated in the Krebs cycle. Overall, this study adds novel mechanistic insight into how nutrients control gene expression by modulating decapping and translational repression.

      Weaknesses:

      The authors show very nicely that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). Future work could make use of these rescue strategies, for example as a platform to further characterise protein-protein interactions between Edc3, Scd6 and Dhh1.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Kumar et al investigated the role of two decapping activators, Edc3 and Scd6, in regulating mRNA decay and translation in yeast. Using a variety of approaches including RNA-seq, ribosome profiling, proteomics, polysome analysis, and metabolomics, the authors demonstrate that whereas single deletions of Edc3 or Scd6 have modest effects, the double mutant leads to increased abundance of mRNAs, many of which overlap with those targeted by the decapping activators Dhh1 and Pat1. The data suggest that Edc3 and Scd6 function redundantly to recruit Dhh1 to the Dcp2 decapping complex, thereby promoting mRNA turnover and translational repression. The authors show that these factors cooperate with Dhh1/Pat1 to repress transcripts involved in respiration, mitochondrial function, and alternative carbon source utilization, linking post-transcriptional regulation to nutrient responses. The study establishes Edc3 and Scd6 as important but redundant regulators that fine-tune gene expression and metabolic adaptation in response to nutrient availability.

      Strengths:

      The paper has several strengths, including the comprehensive approach taken by the authors using multiple experimental techniques (RNA-seq, ribosome profiling, Western blotting, TMT-MS, polysome profiling, and metabolomics) to provide multiple lines of evidence to support their conclusions. The authors demonstrate clear redundancy of the factors by using single and double mutants for Edc3 and Scd6 and their global approach enables an understanding of these factors' roles across the yeast transcriptome. The work connects post-transcriptional processes to nutrient-dependent gene regulation, providing insights into how cells adapt to changes in their environment. The authors demonstrate the redundant roles of Edc3 and Scd6 in mRNA decapping and translation repression. Their RNA-seq and ribosome profiling results convincingly show that many mRNAs are derepressed only in the double mutants, confirming their hypothesis of redundancy. Furthermore, the functional cooperation between Edc3/Scd6 and Dhh1/Pat1 in regulating specific metabolic pathways, including mitochondrial function and carbon source utilization, is supported by the metabolomic data.

      Weaknesses:

      The study uses indirect evidence to support claims about the effect on mRNA stability rather than directly measuring mRNA stability. However, the combination of Pol II occupancy and RNA abundance measurements is consistent with the claims regarding mRNA stability. The addition of new experiments in the revision co-IPing Dhh1 and Dcp2 strengthens the argument that Edc3 and Scd6 recruit these factors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Strengths: 

      Overall, this manuscript is well-written and contains a large amount of high-quality data and analyses. At its core, it helps to shed light on the overlapping roles of Edc3 and Scd6 in sculpting the yeast transcriptome. 

      Weaknesses: 

      (1) While the data presented makes conclusions about mRNA stability based on corresponding ChIP-Seq analyses and analyzing other mutants (e.g. Dcp2 knockout), at no point is mRNA stability actually ever directly assessed. This direct assessment, even for select transcripts, would further strengthen their conclusions. 

      We appreciate the reviewer’s concern but wish to emphasize that we conducted ChIP-Seq analysis of RNA Polymerase II occupancies in the CDSs of all genes, known to be a reliable indicator of transcription rate, and found only small increases in Pol II occupancies that cannot account for the increased transcript levels of the cohort of mRNAs up-regulated in the scd6∆edc3∆ double mutant (Fig. 3E). This provides strong evidence that increased transcription is not the main driver of increased mRNA abundance in this mutant. Bolstering this conclusion, we showed that the Hap2/Hap3/Hap4/Hap5 complex of transcription factors responsible for induction of Ox. Phos. genes was not activated in scd6Δedc3Δ cells in glucose medium (Fig. 6F(ii)); nor was the Adr1 activator of CCR genes activated (Fig. S9C(i)), ruling out transcriptional induction of their target genes in glucose-replete scd6Δedc3Δ cells and instead favoring reduced degradation as the mechanism underlying derepression of Ox. Phos. and CCR gene transcripts in this mutant. In Fig. 3B, we further showed that the majority of mRNAs up-regulated in the scd6Δedc3Δ double mutant are also derepressed by dcp2Δ, and in Fig. 3D that the mRNAs up-regulated in scd6∆edc3∆ cells exhibit a higher than average codon protection index (CPI), indicating a heightened involvement of decapping and co-translational degradation by Xrn1 in their decay. To provide additional support for our conclusion, we have conducted new experiments to measure the abundance of capped mRNAs genome-wide by CAGE sequencing of total mRNA in both WT and scd6∆edc3∆ cells. As established previously, normalizing CAGE TPMs to total mRNA TPMs determined by RNA-Seq, dubbed the C/T ratio, provides a reliable measure of the capped proportion of each transcript. The new data presented in Fig. 3C indicate that the mRNAs up-regulated in the scd6∆edc3∆ mutant have significantly lower than average C/T ratios in WT cells, whereas the C/T ratios for the down-regulated transcripts are higher than average, and that these differences between the two groups and all expressed mRNAs are diminished in the scd6∆edc3∆ double mutant. These are the results expected if the up-regulated mRNAs are selectively targeted for decapping in WT cells dependent on Edc3/Scd6, whereas the down-regulated mRNAs are targeted by Edc3/Scd6 less than the average transcript. In the original version of the paper, we came to the same conclusion by analyzing our previous CAGE data for the dhh1∆ mutant for the same transcripts dysregulated in scd6∆edc3∆ cells, now presented as supportive data in Fig. S3F. Finally, we added the fact that among all four Dhh1 target mRNAs examined in the previous study of He et al. (2022) and found here to be up-regulated selectively in the scd6∆edc3∆ double mutant (Fig. S10), two of them (SDS23 and HXT6) were shown directly to have longer half-lives in dhh1∆ vs. WT cells by He et al. (2018). Hence, the combined evidence is compelling that selective up-regulation of particular mRNAs in the scd6∆edc3∆ mutant results from diminished decapping/decay rather than enhanced transcription; and we feel that the additional supporting evidence that would be provided by measuring half-lives of a small group of up-regulated transcripts would not justify the considerable effort required to do so. Moreover, the standard approach for such experiments of impairing transcription with an inhibitor of Pol II or a Pol II Ts⁻ mutation has been criticized because of the known buffering (suppression) of mRNA decay rates in response to impaired transcription.
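
      The C/T ratio described in this response is a per-transcript quotient of CAGE TPM over RNA-Seq TPM. A minimal Python sketch of that calculation follows; the transcript names and TPM values are hypothetical, and the `min_tpm` filter is an assumption added for illustration, not a step reported by the authors.

```python
def ct_ratio(cage_tpm, total_tpm, min_tpm=0.0):
    """Return {transcript: CAGE TPM / RNA-Seq TPM}.

    Only transcripts present in both inputs, with RNA-Seq TPM above
    min_tpm, are kept (min_tpm is an illustrative assumption).
    """
    return {
        name: cage_tpm[name] / total_tpm[name]
        for name in cage_tpm
        if name in total_tpm and total_tpm[name] > min_tpm
    }


# Hypothetical values, not data from the study.
cage = {"geneA": 40.0, "geneB": 90.0, "geneC": 5.0}
total = {"geneA": 100.0, "geneB": 100.0, "geneC": 0.0}

ratios = ct_ratio(cage, total)
```

      Under the authors' reasoning, a transcript with a lower-than-average ratio in WT cells (such as the hypothetical `geneA`) would be read as having a smaller capped fraction, consistent with preferential decapping.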

      (2) Scd6 and Edc3 show a high level of functional redundancy, as demonstrated by the double mutant. As these proteins form complexes with other decapping factors/activators, I'm curious if depleting both proteins in the double mutant destabilizes any of these other factors. Have the authors ever assessed the levels of other key decapping factors in the double mutants (i.e. Dhh1, Pat1, Dcp2...etc)? I wonder if depleting both proteins leads to a general destabilization of key complexes. It would also be interesting to see if depleting Edc3 or Scd6 leads to a concomitant increase in the other protein as a compensatory mechanism. 

      We thank the reviewer for this insight. Examining our Ribo-Seq and TMT-MS data revealed that Dhh1 expression and steady-state abundance are increased ~2-fold in the scd6∆edc3∆ strain, indicating that the up-regulation of many of the same mRNAs by scd6∆edc3∆ and dhh1∆ does not result indirectly from reduced levels of Dhh1 in the scd6∆edc3∆ mutant. The predicted increase in Dhh1 expression might signify a compensatory response to the absence of Scd6/Edc3. We also observed an ~40% reduction in Dcp2 translation (RPFs) and mRNA abundance in the scd6∆edc3∆ strain, which might contribute to the up-regulation of mRNAs dysregulated in this mutant. However, our new immunoblot analyses revealed no significant reduction in steady-state Dcp2 levels in scd6∆edc3∆ cells (Input lanes in Figs. 3F and S4C(i)-(ii)). Moreover, our previous finding that the majority of mRNAs subject to NMD, up-regulated by both upf1∆ and dcp2∆, are not up-regulated by scd6∆edc3∆ implies that Dcp2 abundance in scd6∆edc3∆ cells is adequate for normal levels of NMD and favors a direct role for Scd6/Edc3 in accelerating degradation of most transcripts up-regulated in this mutant. We have added these points to the DISCUSSION.

      (3) While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments. 

      We agree with the reviewer that our scd6∆edc3∆ strain provides an opportunity to dissect the Scd6 and Edc3 proteins to determine which domains and motifs of each protein are most critically required for their functions in activating mRNA decay. However, if conducted thoroughly, this would entail an extensive analysis requiring a combination of genetics, biochemistry and genomics.  Considering the large amount of data already presented in 43 and 34 panels of main and supplementary figures, respectively, we feel that these additional experiments would be conducted more appropriately as a stand-alone follow-up study.

      Reviewer #2 (Public review): 

      Weaknesses: 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies). Also, these rescue assays could provide a good platform to further characterise the protein-protein interactions between Edc3, Scd6, and Dhh1. 

      We addressed this point immediately above, in our response to Reviewer #1.

      Reviewer #3 (Public review): 

      Weaknesses: 

      The limitations of the study include the use of indirect evidence to support claims that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, which is inferred from correlations in mRNA abundance and ribosome profiling data rather than direct biochemical evidence. 

      While the reviewer makes a valid point, it is important to note that the greater correlations between effects of scd6∆edc3∆ with those conferred by dhh1∆ vs. pat1∆ also extended to changes in metabolites (Fig. 7A-C). To provide more direct evidence that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, we have now conducted co-immunoprecipitation experiments (presented in new Figs. 3F and S5) demonstrating that association of Dhh1 with Dcp2 is diminished in the scd6∆edc3∆ double mutant but not in either scd6∆ or edc3∆ single mutant, thus providing biochemical support for our proposal.

      Also, there is limited exploration of other signals as the study is focused on glucose availability, and it is unclear whether the findings would apply broadly across different environmental stresses or metabolic pathways. Nonetheless, the study provides new insights into how mRNA decapping and degradation are tightly linked to metabolic regulation and nutrient responses in yeast. The RNA-seq and ribosome profiling datasets are valuable resources for the scientific community, providing quantitative information on the role of decapping activators in mRNA stability and translation control. 

      While not disputing the facts of this comment, we think it is unjustified to label as a weakness that our study focused on glucose-grown cells considering the large amount of new data and insights made possible by our multi-omics approach, presented in >70 separate figure panels and nine supplementary datafiles, which the reviewer has characterized as being valuable to the scientific community.  Parallel studies in non-preferred carbon or nitrogen sources are underway and represent large-scale investigations in their own right, for which the current dataset in glucose-replete cells provides the critical reference condition.

      Reviewer #1 (Recommendations for the authors): 

      The authors made a note that a set of 37 mRNAs is repressed exclusively by Edc3 with little contribution by Scd6, a list that includes the RPS28B mRNA. Edc3 has been previously reported to promote the decay of this mRNA in a deadenylation-independent fashion by binding to an element in its 3'UTR (PMIDs 15225544, 24492965). Can the authors comment on whether Edc3 may be binding to similar elements in the 3'UTRs of these transcripts in their shortlist? This could be an interesting topic matter for discussion as well. 

      While an interesting idea, this seems unlikely because the 3’UTR sequence in RPS28B mRNA was shown to bind Rps28 protein itself to confer heightened decapping and decay dependent on Edc3 in a negative autoregulatory loop that exerts tight control over Rps28 protein levels. It would be surprising if Edc3-mediated repression of the other 36 mRNAs would involve Rps28, as none of them encode cytoplasmic ribosomal proteins. Nevertheless, we searched for a conserved motif among the 3’UTRs of the 37 mRNAs using the MEME suite and found enrichment for motifs identified for RNA binding proteins Hrp1 and Nab2 and two novel motifs, but none of these motifs could be recognized within the Rps28 autoregulatory loop. We have chosen not to comment on these findings in the revised manuscript to avoid lengthening it unnecessarily with inconclusive observations.

      Reviewer #2 (Recommendations for the authors): 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by the transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analyses performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies), or from expressing truncated mutants of EDC3 (pLfz614-7) or SCD6 (pLfz615-5) to show that they can act as dominant negative competitors for binding to either Dhh1 or Dcp2. 

      We addressed this comment above in our response to this Reviewer.

      Reviewer #3 (Recommendations for the authors): 

      (1) Labels such as "mRNA_up_s6,e3" are not defined in figures or the text. I suggest clearer sample labeling throughout. 

      The labels had been defined at first mention in the RESULTS but are now indicated there more explicitly, as well as in the legend to Fig. 1.

      (2) In Figure 1D it is surprising that the mRNA profile has a peak in the 5' UTR. I would expect to see such a peak in ribosome footprinting data. Is it possible these are incorrectly labeled?

      The figure is correctly labeled. Generally, one does not expect to see RPFs in the 5’UTR region unless there is an efficiently translated uORF, which appears not to be the case for MDH2.

      In general, the information in this panel and C is inadequate. None of the numbers are clearly explained in the figure legend or in the figure. 

      We had cited the legend to Fig. S3C for details of all such gene browser images but have now inserted this information into the Fig. 1D legend, at the first occurrence of such data in the regular figures. 

      (3) Figures 1C and 1D are in the wrong order.

      Corrected.

      (4) Figure 2D is a very complicated Venn Diagram. I suggest using UpSet plots as an alternative to Venn diagrams to more clearly convey overlaps between sets.  

      We provided additional explanatory text in the Fig. 2D legend to facilitate understanding.

      (5) The use of the same color scheme to represent different sets in panels of the same figure is a source of confusion. E.g. the cyan in Figures 2A, 2D, and 2E indicates unrelated categories, but one would think they are related.

      The use of the same cyan color in these three figure panels actually does designate results for the same set of 591 mRNAs up-regulated in the three mutants.  The application of the color schemes is now mentioned explicitly in Figs. 1, 2, and S3.

      (6) Reporting of p-values = 0 in figures is not useful.

      Corrected.

      (7) The whole manuscript is extremely long which reduces the overall impact. For example, the introduction is six pages long. I suggest reducing redundant text and being more concise to enhance readability. 

      We tried to streamline the text wherever possible, in particular shortening the Introduction by two pages.

      (8) Many abbreviations are used throughout the text that are not introduced the first time they are used. 

      Corrected throughout.

      (9) The ERCC normalization is unclear. Were the spike-ins added before cell lysis to allow estimation of per-cell RNA counts or to the extracted RNA? If added to extracted RNA rather than cells it is not clear to me how the claim can be made regarding increased mRNA abundance in the mutants. 

      We thank the reviewer for this comment. As we explained in the Methods, 2.4 µl of 1:100 diluted ERCC RNA Spike-In Control Mix 1 was added to 1.2 µg of each total RNA sample prior to cDNA library preparation. Because the majority of total RNA consists of rRNA, this normalization yields the abundance of each mRNA relative to rRNA. Owing to repression of rESR mRNAs encoding ribosomal proteins and biogenesis factors in the scd6∆edc3∆ strain (Fig. S3D), the ribosome content per cell is expected to be reduced in this mutant vs. WT. We showed previously that the isogenic dcp2∆ mutant, which elicits an ESR of similar magnitude, showed a 30% reduction in bulk ribosomal subunits per cell compared to the same WT strain examined here (Vijjamarri et al., 2023). Assuming a similar reduction in ribosome abundance in the scd6∆edc3∆ mutant, the changes in mRNA per cell conferred by the scd6∆edc3∆ mutation are expected to be 0.7-fold of the ERCC-normalized values given in Fig. 3E, yielding fold-changes of 2.00 and 0.62 for the mRNA_up and mRNA_dn groups, respectively, which still differ substantially from the corresponding changes in normalized Rpb1 occupancies of 1.2 and 0.93, respectively. We have added this new analysis to the text of RESULTS.
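
      The per-cell correction in this response is a single rescaling: the ERCC-normalized fold change (mRNA relative to rRNA) is multiplied by the assumed mutant/WT ratio of ribosome content per cell. A minimal Python sketch, assuming the 0.7 ratio the authors borrow from the dcp2∆ mutant; the example inputs are back-calculated from the corrected values quoted above (2.00/0.7 and 0.62/0.7), not read from Fig. 3E.

```python
def per_cell_fold_change(ercc_normalized_fc, ribosome_ratio=0.7):
    """Rescale an rRNA-relative fold change to an estimated per-cell fold change.

    ribosome_ratio is the assumed mutant/WT ratio of ribosome content per cell
    (0.7 corresponds to the 30% reduction cited for dcp2-delta cells).
    """
    if ribosome_ratio <= 0:
        raise ValueError("ribosome_ratio must be positive")
    return ercc_normalized_fc * ribosome_ratio


# Illustrative inputs back-calculated from the corrected values in the text.
up_input = 2.00 / 0.7   # about 2.86
dn_input = 0.62 / 0.7   # about 0.89

up_per_cell = per_cell_fold_change(up_input)
dn_per_cell = per_cell_fold_change(dn_input)
```

      The rescaling shifts both groups by the same factor, so it cannot by itself close the gap between the mRNA fold changes and the Rpb1 occupancy changes of 1.2 and 0.93.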

      (10) The use of the terms "up-regulated" and "derepressed" throughout is confusing. Both refer to observed increased abundance of mRNAs, but they imply different causes which are never clearly defined. 

      We changed all occurrences of “derepressed” to “up-regulated”.

    1. eLife Assessment

      This manuscript revisits the well-studied KdpFABC potassium transport system from bacteria with a convincing set of new higher resolution structures, a protein expression strategy that permits purification of the active wildtype protein, and solid insight obtained from mutagenesis and activity assays. The thorough and thoughtful mechanistic analyses make this a valuable contribution to the membrane transport field.

    2. Reviewer #2 (Public review):

      Summary:

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology.

    3. Reviewer #3 (Public review):

      Summary:

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wildtype protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.

      Strengths:

      The high resolution (2.1 Å) of the current structure is impressive, and allows many new densities in the potassium transport pathway to be resolved. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. The SSME experiments are generally rigorous.

      Weaknesses:

      The present SSME experiments do not support quantitative comparisons of different mutants, as in Figures 4D and 5E. Only qualitative inferences can be drawn among different mutant constructs.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study on potassium ion transport by the protein complex KdpFABC from E. coli reveals a 2.1 Å cryo-EM structure of the nanodisc-embedded transporter under turnover conditions. The results confirm that K+ ions pass through a previously identified tunnel that connects the channel-like subunit with the P-type ATPase-type subunit. 

      Strengths: 

      The excellent resolution of the structure and the thorough analysis of mutants using ATPase and ion transport measurements help to strengthen new and previous interpretations. The evidence supporting the conclusions is solid, including biochemical assays and analysis of mutants. The work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis. 

      Weaknesses: 

      There is insufficient credit and citation of previous work. 

      The manuscript has been thoroughly revised with special attention to acknowledging all past work relevant to the study.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm. 

      Strengths: 

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology. 

      Weaknesses: 

      While the results are overall compelling, several aspects of the work raised questions. First, the authors determined the structure of the pump in nanodiscs under turnover conditions and observed several structural classes, including E1-P, which is detailed in the paper. Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story that is focused on the conduction pathway for K⁺ ions. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We cannot therefore derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2.Pi state (7BGY and 7BH2, which were stabilized by MgF₄ and BeFₓ, respectively) show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.

      The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? 

      Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in an expanded "Activity Assays" section of Methods.

      In addition, we have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH₄ in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      The authors propose that the tunnel connecting the subunits is filled with water and lacks potassium ions. This is an important mechanistic point that has been debated in the field. It would be interesting to calculate the volume of the tunnel and estimate the number of ions that might be expected in it, given their concentration in bulk. It may also be helpful to provide additional discussion on whether some of the observed densities correspond to bound ions with low occupancy.  

      As suggested, we calculated the internal volume of the tunnel within KdpA (from the S4 K⁺ site to the KdpA/KdpB subunit interface) based on the profile derived from Caver. Based on this volume (4.9 × 10⁻²⁵ L), a single K⁺ ion within this cavity would correspond to 3.4 M, which is near saturation for a solution of KCl. We added this information together with an acknowledgment of low-occupancy K⁺ to the fourth paragraph of the Discussion:

      "Fourth, based on the volume of the cavity in KdpA, a single K⁺ ion would correspond to a concentration of 3.4 M, suggesting that multiple ions would exceed the solubility limit especially in the absence of counterions. Finally, map densities within the tunnel were either of comparable strength or weaker than surrounding side chain atoms, unlike at S3 and canonical binding sites. Although it is possible that weaker density could represent low occupancy K⁺ ions, we favor a mechanism whereby individual K⁺ ions occupy the tunnel transiently as they transit between the selectivity filter and the canonical binding site."

      To make this analysis, we developed a Python script to calculate the volume of the tunnel as defined by the Caver software (this software is available via github.com/dls4n/tunnel). In turn, this enabled us to distinguish water molecules that were actually in the tunnel from those bound more deeply within the structure of KdpA. As a result, we updated the water distribution plot in Fig. 4b. Notably, the 17 water molecules within this cavity would correspond to 57.8 M, which is reasonably near the expected 55 M for an aqueous solution.
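
      The concentrations quoted in this response follow from converting a particle count in a fixed volume to molarity. The Python sketch below shows only that unit conversion, starting from the 4.9 × 10⁻²⁵ L tunnel volume stated above; it is not the authors' script and does not reproduce the Caver-based volume calculation. The function and variable names are illustrative.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def molar_concentration(n_particles, volume_liters):
    """Molarity (mol/L) of n_particles confined to volume_liters."""
    if volume_liters <= 0:
        raise ValueError("volume_liters must be positive")
    return n_particles / (AVOGADRO * volume_liters)


TUNNEL_VOLUME_L = 4.9e-25  # tunnel volume quoted in the response

k_conc = molar_concentration(1, TUNNEL_VOLUME_L)       # one K+ ion
water_conc = molar_concentration(17, TUNNEL_VOLUME_L)  # 17 water molecules
```

      With these inputs a single ion comes to about 3.4 M. The 17 waters come to about 57.6 M before rounding; multiplying the rounded 3.4 M by 17 gives the quoted 57.8 M. Either figure is close to the roughly 55 M of bulk water.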

      Reviewer #3 (Public review): 

      Summary: 

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wild-type protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway. 

      Strengths:  

      Although this structure is not so different from previous structures, its high resolution (2.1 Å) is impressive and allows the resolution of many new densities in the potassium transport pathway. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. 

      Weaknesses: 

      The structures are supported by solid membrane electrophysiology. These data exhibit some weaknesses, including a lack of information to assess the rigor and reproducibility (i.e., the number of replicates, the number of sensors used, controls to assess proteoliposome reconstitution efficiency, and the stability of proteoliposome absorption to the sensor). 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in the "Activity Assays" section of Methods.

      Reviewing Editor Comments

      After discussing the evaluations, the Reviewers and Reviewing Editor have identified the following essential revisions that would need to be addressed to improve the eLife assessment:

      (1) Work from others in the field should be adequately described and acknowledged: 

      (a) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      Explicit reference to this work has been added as follows:

      “A series of X-ray and cryo-EM structures of KdpFABC from E. coli (Huang et al., 2017; Silberberg et al., 2022, 2021; Stock et al., 2018; Sweet et al., 2021) indicate a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex. As first proposed by Stock et al. (Stock et al., 2018), there is now a consensus that K<sup>+</sup> enters the complex from the extracellular side of the membrane through the selectivity filter of KdpA, but is blocked from crossing the membrane.”

      (b) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      The Q116R mutant was used by Huang et al., but indeed not used for the Stock et al paper. We have replaced the sentence in the manuscript with the following:

      “Use of the KdpD knockout strain allowed us to produce WT and mutant protein free from Ser162 phosphorylation.”

      (c) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      Reference to the Silberberg 2022 paper, where E1~P was the most highly populated state, has been added. The percentage of particles was removed as we are still processing data from the other states, which we hope will be described in a future manuscript.

      (d) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      To correct this oversight, we made the following changes to the text: 

      On pg. 7 in the Results section, we refer to the 2005 paper from Bramkamp & Altendorf:

      “Consistent with earlier work on this mutant (Bramkamp and Altendorf, 2005), the D583A mutant displayed substantial ATPase activity (30% of WT) but no transport, indicating that this particular mutation interfered with energy coupling.”

      At the end of pg. 10 in the Discussion, we revised the paragraph discussing D583 and Lys586 to explicitly refer to the mechanism of transport described in the 2021 paper from Silberberg et al, including proximal and distal binding sites as well as uncoupling due to the D583A mutation.

      “Similar to the Glu370/Arg493 charge pair in KdpA, Asp583 and Lys586 are the only charged residues in the membrane core of KdpB. Although they are not seen to interact directly in our structure, they coordinate accessory waters associated with the canonical binding site. Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions. Based on these simulations, these authors proposed a mechanism whereby neutralization of this site either by ion binding or by D583A substitution served to stimulate ATPase activity. Indeed, earlier work on D583A (Bramkamp and Altendorf, 2005) as well as current data demonstrate uncoupling, in which K<sup>+</sup> independent ATPase activity was observed even though transport was abolished. A plausible explanation for this stimulation is seen in the behavior of Lys586 in previous structures of the E2·Pi state (7BGY and 7BH2) (Sweet et al., 2021). In these structures, M5 undergoes a conformational change that pushes the side chain of Lys586 into the CBS. As a consequence of the D583A mutation, this Lys could be freed to act as a built-in counter ion as in related P-type ATPases ZntA (Wang et al., 2014) and AHA2 (Pedersen et al., 2007). In regard to the proximal binding site and the partnering “distal binding site” on the KdpA-side of the subunit interface, our structure does not show densities at either site and thus does not provide any support for the related mechanism. In any case, in the WT complex it seems likely that Asp583 exerts allosteric control over Lys586 and ensures that its movement into the binding site is coordinated with the transition from E1~P to E2·Pi, thus leading to displacement of K<sup>+</sup> from the CBS and release to the cytoplasm.”

      (e) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      On this point, we beg to differ. Although this 2021 paper shows densities in experimental cryo-EM maps and effects of mutations to residues at the KdpA and KdpB interface, the intra-tunnel transport mechanism is based on computational analysis (MD simulations) and not experimental evidence. We softened the statement to read as follows:

      “Although it has been postulated to conduct K<sup>+</sup>, direct experimental evidence has been hard to come by.”

      (f) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      Phe232 is shown as a point of reference for the KdpA/KdpB subunit interface. We added a reference to Phe232 in the Results section labeled “Intersubunit tunnel” as well as the paragraph in the Discussion addressed in point d) above.

      " These densities, which we have modeled as water, are most prevalent near the vestibule, which is the wider part of the tunnel, but then disappear completely at the subunit interface near Phe232, which is the narrowest part of the tunnel and also distinctly hydrophobic (Fig. 4)."

      " Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions."

      (g) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      Thanks. This is correct, although the SKT superfamily is believed to have evolved from KcsA. KcsA has been removed from the sentence and a reference added to a review of the SKT superfamily:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (2) Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We cannot therefore derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state - 7BGY and 7BH2, which were stabilized by MgF<sub>4</sub> and BeF<sub>x</sub>, respectively – show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal for Lys586 acting as a built-in counter ion.

      (3) The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. 

      To address this concern, we have generated supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also added a detailed description of replicates, sensor stability and the experimental protocols in the "Activity Assays" section of Methods. In addition, we have highlighted observations of pre-steady state binding currents that were seen for some mutants (e.g., Q116R assayed with Rb<sup>+</sup>, NH<sub>4</sub><sup>+</sup> and Na<sup>+</sup>), in which an initial, transient current response was observed without an ensuing transport current. The depiction of this raw data has allowed us to explain our use of the current response at 1.25 s, after decay of this binding current, as a measure of transport rate. This approach is consistent with recommendations by the manufacturer, as documented in their 2023 publication (Bazzone et al. https://doi.org/10.3389/fphys.2023.1058583).

      (4) Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      (5) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (6) How was the 44 % population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)?

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate limiting step of the Post-Albers cycle. Reference is made to the Silberberg 2022 paper, which made a similar observation in a detergent-solubilized sample.

      (7) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e. 

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (8) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fitting generated respectable confidence limits and are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
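
For readers unfamiliar with this kind of fit, a minimal least-squares sketch of fitting the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]), is given below. It is not the fitting procedure described in the Methods: the grid search over Km, the function names, and the data points are all assumptions made for illustration, and the data are synthetic (generated from Km = 90 µM and Vmax = 1.0), not measurements from the manuscript.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_michaelis_menten(conc, rate, km_min=1e-1, km_max=1e5, steps=6000):
    """Least-squares fit of (Vmax, Km) by a log-spaced grid search over Km.

    For each trial Km the model is linear in Vmax, so the best Vmax has a
    closed form; the Km with the smallest sum of squared errors is returned.
    """
    best = None
    for i in range(steps + 1):
        km = km_min * (km_max / km_min) ** (i / steps)
        x = [s / (km + s) for s in conc]
        vmax = sum(r * xi for r, xi in zip(rate, x)) / sum(xi * xi for xi in x)
        sse = sum((r - vmax * xi) ** 2 for r, xi in zip(rate, x))
        if best is None or sse < best[0]:
            best = (sse, vmax, km)
    sse, vmax, km = best
    return vmax, km, sse


# Synthetic, noise-free example data (concentrations in µM, arbitrary rate units).
conc = [10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
rate = [mm_rate(s, 1.0, 90.0) for s in conc]

vmax_fit, km_fit, sse_fit = fit_michaelis_menten(conc, rate)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.1f} µM, SSE = {sse_fit:.2e}")
```

The grid resolution limits the precision of Km to roughly 0.2%; a dedicated nonlinear least-squares routine would normally be used in practice and would also report confidence limits.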

      (9) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thanks for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (10) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated using one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q and between R493Q and R493M are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”

      (11) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations have historical significance, having been generated by random mutagenesis during early characterization of the Kdp system by Epstein and colleagues. A sentence containing relevant references has been added to this paragraph to provide this context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      Below are the recommendations from each of the reviewers, some of which were not included as essential revisions, but that can also be helpful to further strengthen the manuscript. 

      Reviewer #1 (Recommendations for the authors): 

      It is essential that the authors correct their selective, incomplete, and in places inappropriate references to work from others in the field. 

      Specific points: 

      (1) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      (2) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      (3) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      (4) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      (5) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      (6) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      References have been added to address all of these points. See item 1) under Reviewing Editor’s Comments above.

      Other points: 

      (7) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      KcsA has been removed from the sentence and a reference added to a review of the SKT family:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (8) Page 9 " Our demonstration of coupled transport of NH4+ and Rb+ G232D not only confirms that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous."  Check grammar. 

      This sentence has been updated as follows:

      “Our observation that G232D is capable of coupled transport for NH<sub>4</sub><sup>+</sup> and Rb<sup>+</sup> confirms not only that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous.”

      Reviewer #2 (Recommendations for the authors): 

      (1) From an editorial point of view, I suggest a few changes to enhance readability and clarity for non-specialists. A description of the overall transport cycle at the start of the paper (perhaps as a supplementary figure) could help put the work into perspective for general readers who may not be familiar with P-type ATPase mechanisms. It is unclear what "single" and "double" occupancy refer to in the structural classes description. Why is only one structural class described in detail? I would suggest moving the discussion of what is going on with the N-terminus of KdpB to the Results section, where it is described, and shortening the corresponding paragraph in the Discussion. I would furthermore suggest adding a figure that illustrates the proposed regulatory role of the terminus and how phosphorylation might affect it. Otherwise, this section of the results reads very hollow.

      A diagram showing the Post-Albers cycle is shown as part of Fig. 1 and is described at the end of the second paragraph. This sentence only mentioned KdpB, which may have caused confusion. We therefore changed the sentence to read as follows:

      “Like other P-type ATPases, KdpFABC employs the Post-Albers reaction cycle (Fig. 1) involving two main conformations (E1 and E2) and their phosphorylated states (E1~P and E2-P) to drive transport (Albers, 1967; Post et al., 1969).”

      Single and double occupancy was meant to refer to the number of KdpFABC complexes residing in a nanodisc. This can be seen in the class averages in Fig. 1 - figure supplement 2. The legends to Fig. 1 - figure supplements 1 and 2 have been revised to explain this observation more explicitly:

      "Slight asymmetry of the main peak is consistent with a subpopulation of nanodiscs containing two KdpFABC complexes (Fig. 1 - figure supplement 2)."

      and

      "A subset of these particles were further classified to generate four main classes representing nanodiscs with a single copy of KdpFABC in either E1 or E2 conformations, nanodiscs with two copies of KdpFABC which were mainly E1 conformation, and junk."

      As stated above, the class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We continue to analyze the cryo-EM data and aim to produce a second manuscript that will include descriptions of other conformations together with the additional biophysical analysis related to their function.

      With regard to the N-terminus, we have gone on to generate a truncation of residues 2-9 in KdpB. After expression and purification, this construct remained coupled with ATPase and transport activities similar to WT, which makes proposals of a regulatory effect less compelling. Because of the novelty of observing the N-terminus and the possibility that it plays a subtle role in the kinetics of the cycle not revealed under the current assay conditions, we have retained a brief discussion of this structural observation, but moved it into the Results section as suggested.

      "Given the regulatory roles played by N- and C-termini of a variety of other P-type ATPases (Bitter et al., 2022; Cali et al., 2017; Lev et al., 2023; Timcenko et al., 2019; Zhao et al., 2021), we generated a construct in which residues 2-9 of the N-terminus of KdpB were truncated. However, ATPase and transport activities remained coupled at levels similar to WT, indicating that any functional role of the N-terminus is relatively subtle and not manifested under current assay conditions."

      (2) The wording "exceedingly strong densities" seems ambiguous. 

      We have changed this to “strong” in the Abstract and "exceptionally strong" in the Discussion. The precise values for these densities are shown in density histograms in Fig. 2 – figure supplement 1 and Fig. 5 – figure supplement 2. In the text, the densities are described as follows:

      Results sections describing the selectivity filter:

      "In fact, this S3 site contains the strongest densities in the entire map, measuring 7.9x higher than the threshold used for Fig. 2a (Fig. 2 – figure suppl. 1a)."

      Results section describing the CBS:

      "Given that this is the strongest density in KdpB, measuring 5.6x higher than the map densities shown in Fig. 5 (Fig. 5 – figure suppl 2b), we have modeled it as K<sup>+</sup>."

      (3) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (4) How was the 44 % population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)?

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. We will consider citing an updated value in a future publication once this analysis is complete. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate limiting step of the Post-Albers cycle. Reference was made to the Silberberg 2022 paper, where a similar observation was made.

      (5) Panel 1d is called out of order after panel 1e. Please label Ser 162 in the panel. 

      The order of these panels has been switched and Ser162 has been labelled as suggested.

      (6) Several panels in Figure 1- Supplement 1 are neither referenced nor described. 

      This figure supplement is referred to multiple times in the Results and the Methods sections of the text as well as in the figure legends. Although each panel is not individually referenced, all of this information is relevant at different points in the manuscript and is explained in the legend.

      (7) Is the coordinating geometry for the S3 site consistent with what was previously observed for KcsA and relatives? 

      The general arrangement of carbonyl atoms in the S3 site is the same in KcsA and KdpA, described by the MacKinnon group as a square antiprism. However, KcsA has strict four-fold symmetry and KdpA does not. As a result, there are small discrepancies between the coordinating geometries in the two structures. This point was made graphically in our original report on the X-ray structure of KdpFABC (Huang et al. 2017, Extended Data Fig. 3), though the positions of the carbonyls are more accurately determined in the current structure due to increased resolution. We added a sentence to the Selectivity Filter section of the Results stating the following:

      "This coordination geometry is also consistent with that seen in the K<sup>+</sup> channel KcsA, though the strict four-fold symmetry of that homo-tetramer produces a more regular structure, as indicated by the smaller variance in liganding distance (2.77 Å with s.d. 0.075 Å in 1K4C) and as depicted by Huang et al. in Extended Data Fig. 3 (Huang et al., 2017)."

      (8) Label G232D in Figure 2a. 

      G232 is out of the plane shown in Fig. 2a. However, we have added a label for Cys344 to help identify the selectivity filter strands that are shown. Note, however, that G232 is visible and labeled in Fig. 2 - figure suppl. 1. This has now been noted in the legend for Fig. 2.

      (9) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e.

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (10) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fitting generated respectable confidence limits and are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.

      (11) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thank you for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (12) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated using one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q and between R493Q and R493M are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”

      Reviewer #3 (Recommendations for the authors): 

      Overall, the text was very clear, experiments were rationalized well, and conclusions were justified. A few small comments: 

      (1) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations are of historical importance, having been generated by random mutagenesis during early characterization of the Kdp system. A sentence containing relevant references has been added to this paragraph to provide this information as context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      (2) Typo: page 14, "diluted" 

      This typo has been corrected.

      (3) The Methods section for SSM electrophysiology could use some additional description of how the data/statistics were collected. How many replicates? Were all replicates from a single sensor, or were multiple sensors examined? Were controls done to test whether the same number of liposomes remain adsorbed to the sensor over the length of the experiment? 

      We have extended our description of experimental protocols in the "Activity Assays" section of Methods. This includes the number and type of replicates as well as a discussion of binding currents that were seen for some mutants. Furthermore, a new series of supplementary figures for Figs. 2, 4, 5, and 6 show all of the raw traces for the SSME measurements (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2).

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

    1. eLife Assessment

      This is a methodologically rich manuscript that is important for revealing the center-surround inhibition profile of expectation in orientation space. The analyses are compelling in validating the critical role of predictive coding feedback. The findings provide novel insights into how expectation optimizes perception via enhancement and suppression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tested two competing mechanisms of expectation: (1) a sharpening model that suppresses unexpected information via center-surround inhibition; (2) a cancellation model that predicts a monotonic gradient response profile. Using two psychophysical experiments manipulating feature space distance between expected and unexpected stimuli, the results consistently supported the sharpening model. Computational modeling further showed that expectation effects were explained by either sharpened tuning curves or tuning shifts. Finally, convolutional neural network simulations revealed that feedback connections critically mediate the observed center-surround inhibition.

      Strengths:

      The manuscript provides compelling and convergent evidence from both psychophysical experiments and computational modeling to robustly support the sharpening model of expectation, demonstrating clear center-surround inhibition of unexpected information.

      Comments on revisions:

      I appreciate the authors' thoughtful revisions. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This is a compelling and methodologically rich manuscript. The authors used a variety of methods, including psychophysics, computational modeling, and artificial neural networks, to reveal a non-monotonic, center-surround "Mexican-hat" profile of expectation in orientation space. Their data convincingly extend analogous findings in attention and working memory, and the modeling nicely teases apart sharpening vs. shift mechanisms.

      Strengths:

      The findings are novel and important in elucidating the potential neural mechanisms by which expectation shapes perception. The authors conducted a series of well-designed psychophysical experiments to carefully examine the profile of expectation's modulation. Computational modeling also provides further insights, linking the neural mechanisms of expectation to behavioral results.

      Comments on revisions:

      I think the authors did a great job in addressing my previous comments. I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      (1)  The sharpening model of expectation can predict surround suppression. The authors could further clarify how the cancellation model predicts a monotonic profile of expectation (Figure 1C) with the highest response at the expected orientation, while the cancellation model suggests a suppression of neurons tuned toward the expected stimulus.

      We thank the reviewer for the comment. We would like to emphasize that as the expected signal is suppressed, the relative weight or salience of unexpected inputs increases. We have clarified this interpretation in the manuscript as follows:

      “Here, given these two mechanisms making opposite predictions about how expectation changes the neural responses of unexpected stimuli, thereby displaying different profiles of expectation, we speculated that if expectation operates by the sharpening model with suppressing unexpected information, we should observe an inhibitory zone surrounding the focus of expectation, and its profile then should display as a center-surround inhibition (Fig. 1c, left). If, however, expectation operates as suggested by the cancelation model with highlighting unexpected information, the inhibitory zone surrounding the focus of expectation should be eliminated, and the profile should instead display a monotonic gradient (Fig. 1c, right).”

      (2) I'm a bit concerned about whether the profile solely arises from modulation of expectation. The two auditory cues are each associated with a fixed orientation, which may be confounded by other cognitive processes like visual working memory or attention (which I think the authors also discussed). Although the authors tried to use SFD task to render orientation task-irrelevant, luminance edges (i.e., orientation) and spatial frequency in gratings are highly intertwined and orientation of the gratings may help recall the first grating's SF (fixed at 0.9 c/°), especially given the first and second grating's orientations are not very different (4.8°).

      We agree that dissociating expectation from attention and other top-down processes remains a key challenge in visual expectation research (see Summerfield & Egner, 2009; Summerfield & de Lange, 2014; de Lange et al., 2018). As is generally acknowledged, expectation reflects the probability of a sensory event, while selective attention relates to its behavioral relevance. To minimize attentional influences, our task design ensured that grating orientation was not task-relevant: on each trial, participants discriminated either orientation or spatial frequency difference, such that orientation itself did not require attentional allocation, a point already discussed in the manuscript.

      Regarding visual working memory, we argue that even if participants recalled the first grating’s spatial frequency in the SFD task, they were not required to retain its precise spatial frequency (or orientation), as their task was simply to judge whether the second grating appeared denser or sparser. In other words, orientation (or spatial frequency) itself was not task-relevant. Moreover, although not included in the manuscript, we conducted a post-experiment debriefing in which participants were asked whether they noticed any association between the auditory tone and the grating orientation. None of the participants reported this relationship correctly, suggesting that the tone-orientation mapping remained implicit and was unlikely to be driven by strategic attention or memory.

      However, we acknowledge that certain confounding processes such as statistical learning or implicit mapping acquisition cannot be fully ruled out given the current paradigm. Future studies using methods with higher temporal resolution (e.g., EEG/MEG) may help to dissociate these mechanisms more precisely.

      (3) For each of the expected orientations (20° or 70°), the unexpected ones are linearly separable (i.e., all unexpected ones lie on one side of the expected angle). This might further encourage people to shift their attended or expected orientation, according to the optimal tuning hypothesis. Would this provide an alternative explanation to the tuning shift that the authors found?

      We thank the reviewer for pointing out the relevance of the optimal tuning hypothesis. We acknowledge that the optimal tuning theory (Navalpakkam & Itti, 2007) is an important framework, particularly in visual search paradigms, where attentional templates may shift away from non-target features to enhance discriminability.

      In our task, this hypothesis would predict a shift of expectation toward <20° in E20° trials and >70° in E70° trials, given that all unexpected orientations lie on one side of the expected angle. Importantly, the optimal tuning hypothesis predicts such shifts not only in Δ20°, Δ25°, and Δ30° trials but also in the Δ0° trials. In this regard, the observed shift in Δ20° and Δ30° (Experiment 2) and Δ25° (Experiment 3) trials is broadly consistent with the predictions of the optimal tuning account. However, we did not observe a corresponding shift away from nontarget features in the Δ0° condition, suggesting limited behavioral evidence for optimal tuning effects under our current task settings.

      It is important to note that most previous studies supporting optimal tuning (e.g., Navalpakkam & Itti, 2007; Scolari & Serences, 2009; Geng, DiQuattro, & Helm, 2017; Yu & Geng, 2019) have used visual search paradigms that differ from our design in several critical ways, including the number of stimuli presented, their spatial arrangement (eccentricity), task demands, and so on. Therefore, it is difficult to determine whether the optimal tuning hypothesis could serve as an alternative explanation within the context of our current study. We agree that future studies could further examine how such task parameters influence the presence or absence of optimal tuning.

      (4) It is great that the authors conducted computational modeling to elucidate the potential neuronal mechanisms of expectation. But I think the sharpening hypothesis (e.g., reviewed in de Lange, Heilbron & Kok, 2018) focuses on the neural population level, i.e., narrowing of population tuning profile, while the authors conducted the sharpening at the neuronal tuning level. However, the sharpening of population does not necessarily rely on the sharpening of individual neuronal tuning. For example, neuronal gain modulation can also account for such population sharpening. I think similar logic applies to the orientation adjustment experiment. The behavioral level shift does not necessarily suggest a similar shift at the neuronal level. I would recommend that the authors comment on this.

      We thank the reviewer for this to-the-point comment. As de Lange et al. (2018) noted, “there is not always a direct correspondence between neural-level and voxel-level selectivity patterns.” That is, neuronal tuning, population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying mechanisms and do not necessarily align in a one-to-one fashion. We fully acknowledge that population-level tuning effects may also result from various neuronal mechanisms such as gain modulation (for review, see Salinas & Thier, 2000), shifts in preferred orientation (Ringach et al., 1997; Jeyabalaratnam et al., 2013), asymmetric broadening of tuning curves (Schumacher et al., 2022), or tuning curve sharpening (Ringach et al., 1997; Schoups et al., 2001).

      In our modeling, we implemented sharpening and shifts of neuronal tuning curves as a conceptual model simplification, intended to explore potential mechanisms underlying expectation-related center-surround suppression effects. While sharpening-based accounts (e.g., Kok et al. 2012) have often been emphasized, we stress that other mechanisms, such as gain modulation or tuning shifts, may also contribute. Our goal is not to provide a definitive account, but to highlight such plausible mechanisms and encourage future investigation. We have revised the Discussion to emphasize that multiple mechanisms may underlie the observed effects.

      “We note that our implementation of sharpening and shifts at the neuronal level serves as a conceptual model simplification, as population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying neuronal mechanisms and do not necessarily align in a one-to-one fashion. Here, we stress that other potential mechanisms beyond sharpening, such as tuning shifts, may also contribute to visual expectation.” 
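      The reviewer's point that population-level sharpening need not come from sharpening of individual tuning curves can be sketched numerically. The toy model below is an assumption-laden illustration, not the authors' model: every channel keeps the same Gaussian tuning width, and only a multiplicative gain centered on the expected orientation is applied. The channel grid, `tuning_width`, `gain_amp`, and `gain_width` are hypothetical values, and orientation is treated as a linear rather than circular variable for simplicity.

```python
import math


def gaussian(x, width):
    """Unnormalized Gaussian with standard deviation `width`."""
    return math.exp(-x * x / (2.0 * width * width))


def population_profile(prefs, stimulus, tuning_width,
                       gain_amp=0.0, gain_width=15.0, expected=0.0):
    """Response of each channel to `stimulus`.

    All channels share the same tuning width; expectation only scales
    the response of channels whose preference is near `expected`.
    """
    profile = []
    for pref in prefs:
        gain = 1.0 + gain_amp * gaussian(pref - expected, gain_width)
        profile.append(gain * gaussian(stimulus - pref, tuning_width))
    return profile


def fwhm(prefs, profile):
    """Full width at half maximum of the population profile on the grid."""
    half = max(profile) / 2.0
    above = [p for p, r in zip(prefs, profile) if r >= half]
    return above[-1] - above[0]


# Channel preferences from -90 to 90 degrees in 0.5 degree steps (assumed grid).
prefs = [i * 0.5 for i in range(-180, 181)]

baseline = population_profile(prefs, stimulus=0.0, tuning_width=20.0)
with_gain = population_profile(prefs, stimulus=0.0, tuning_width=20.0,
                               gain_amp=1.0)

print("population FWHM without gain:", fwhm(prefs, baseline))
print("population FWHM with gain:   ", fwhm(prefs, with_gain))
```

      In this sketch the population profile becomes narrower under gain modulation even though no single channel's tuning width changes, which is the dissociation between neuronal-level and population-level sharpening discussed above.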

      (5) If the orientation adjustment experiment suggests that both sharpening and shifting are present at the same time, have the authors tried combining both in their computational model?

      We agree with the reviewer that it is necessary to consider the combined model. Accordingly, we implemented a computational model incorporating sharpening of the expected orientation channel together with shifting of the unexpected orientation channels. This model successfully captured the sharpening of the expected-orientation channel and the shift of the unexpected-orientation channels (Supplementary Fig. 3). For the expected orientation (Δ0°), results showed that the amplitude change was significantly higher than zero on both OD (t(23) = 2.582, p = 0.017, Cohen’s d = 0.527) and SFD (t(23) = 2.078, p = 0.049, Cohen’s d = 0.424) tasks (Supplementary Fig. 3e, vertical stripes); the width change was significantly lower than zero on both OD (t(23) = -2.438, p = 0.023, Cohen’s d = 0.498) and SFD (t(23) = -2.578, p = 0.017, Cohen’s d = 0.526) tasks (Supplementary Fig. 3e, diagonal stripes). For unexpected orientations (Δ10°-Δ40°), however, the amplitude and width changes were not significantly different from zero on either OD (amplitude change: t(23) = 0.443, p = 0.662, Cohen’s d = 0.091; width change: t(23) = -1.819, p = 0.082, Cohen’s d = 0.371) or SFD (amplitude change: t(23) = 1.130, p = 0.270, Cohen’s d = 0.231; width change: t(23) = -1.710, p = 0.101, Cohen’s d = 0.349) tasks (Supplementary Fig. 3f). In the meantime, the location shift was significantly different from zero for unexpected orientations (Δ10°-Δ40°; OD task: t(23) = 3.611, p = 0.001, Cohen’s d = 0.737; SFD task: t(23) = 2.418, p = 0.024, Cohen’s d = 0.493; Supplementary Fig. 3g). These results provided further evidence that tuning sharpening and tuning shift jointly contribute to center-surround inhibition in expectation.
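      The effect sizes reported above can be cross-checked against the t statistics. Assuming these are one-sample tests against zero with n = 24 (implied by the 23 degrees of freedom, but an assumption nonetheless), Cohen's d equals |t| divided by the square root of n. The short sketch below recomputes d for several of the reported values; the function name is illustrative.

```python
import math


def cohens_d_one_sample(t_value, n):
    """Cohen's d implied by a one-sample t statistic: d = |t| / sqrt(n)."""
    return abs(t_value) / math.sqrt(n)


N = 24  # assumed sample size, since df = n - 1 = 23

# (reported t, reported Cohen's d) pairs copied from the response text.
reported = [
    (2.582, 0.527),
    (2.078, 0.424),
    (-2.438, 0.498),
    (-2.578, 0.526),
    (3.611, 0.737),
    (2.418, 0.493),
]

for t_value, d_reported in reported:
    d_implied = cohens_d_one_sample(t_value, N)
    print(f"t = {t_value:>6}: reported d = {d_reported}, implied d = {d_implied:.4f}")
```

      Under this assumption the implied and reported values agree to within rounding of the published t statistics.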

      Reviewer #1 (Recommendations for the authors):

      (1) A direct comparison between tasks (baseline vs. expectation conditions) would have strengthened the findings. Specifically, contrasting performance in the orientation discrimination task with the spatial frequency discrimination task could have provided clearer evidence that participants actually used the auditory cues to attend to the expected orientation. This comparison would be particularly important for validating cue manipulation in the orientation discrimination task.

      We agree that a direct comparison between the orientation discrimination (OD) and spatial frequency discrimination (SFD) tasks could further clarify how expectation (auditory cues) differentially modulates orientation relevance. However, the primary goal of the current study was to examine expectation effects within each task separately and to demonstrate that such effects are independent of attentional modulation driven by the task-relevance of orientation.

      In addition, the OD and SFD tasks differ not only in the relevant task features (orientation vs. spatial frequency discrimination), but also in stimulus properties and difficulty (for example, the arbitrary use of 20–70° as the orientation range and ~0.9 cycles/° as the spatial frequency setting), so a direct comparison could introduce confounding factors unrelated to expectation.

      Importantly, previous studies (e.g., Kok et al., 2012, 2017; Aitken et al., 2020) and our current results show that participants performed significantly better when the auditory cue matched the expected orientation, supporting the validity of our expectation manipulation.

      (2) An interesting consideration is why the center-surround inhibition profile of expectation was independent of the task-relevance of orientation. Previous studies (e.g., Kok et al., 2012) have found that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant. This could be useful to discuss the possible discrepancies.

      We thank the reviewer for this inspiring comment. Kok et al. (2012) showed that both orientation and contrast tasks elicited similar fMRI decoding results, regardless of task relevance, suggesting neural mechanisms of expectation operate independently of whether orientation is task relevant. Behaviorally, they reported better performance for expected versus unexpected trials in the orientation task (3.4° vs. 3.8°, t(17) = 2.8, p = 0.013), and a marginal trend (although not significant) in the contrast task (4.3% vs. 5.0%, t(17) = 1.9, p = 0.075). If any differences between the two tasks exist, they may lie in the correlation between behavioral and fMRI effects, a question that goes beyond the scope of the current study. Therefore, it is hard to strongly conclude that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant in their paper.

      Our study differs from theirs in at least two important ways, which may account for the clearer expectation facilitatory effect we observed in the expectation (Δ0°) condition. First, in our study, the orientation-irrelevant task involved spatial frequency discrimination (SFD) rather than contrast discrimination. Compared to contrast, spatial frequency has been shown to exhibit a clear cueing effect, as reported in Fang & Liu (2019). Second, our design included a baseline condition, which was absent in their study. We computed discrimination sensitivity (DS) to quantify how much the discrimination threshold (DT) changed relative to baseline. By using this baseline-referenced approach, we observed a significant facilitatory expectation effect in the Δ0° condition, an effect that shifted from marginal significance in their orientation-irrelevant task to clear significance in our study.

      (3) The authors might consider briefly explaining how the orientation adjustment paradigm used in this study is particularly effective for examining the potential co-existence of tuning sharpening and tuning shift computations, and how this approach complements traditional orientation discrimination tasks in characterizing expectation-related mechanisms.

      We thank the reviewer for this valuable suggestion. We agree that further clarification is needed to better connect the two experiments. To explain this, we have elaborated further in the manuscript.

      “To further explore the co-existence of both Tuning sharpening and Tuning shift computations in center-surround inhibition profile of expectation, participants were asked to perform a classic orientation adjustment experiment. Unlike profile experiment (discrimination tasks), the adjustment experiment provides a direct, trial-by-trial measure of participants’ perceived orientation, capturing the full distribution of responses. This enables the construction of orientation-specific tuning curves, allowing us to detect both tuning sharpening and tuning shifts, thereby offering a more nuanced understanding of the computational mechanisms underlying expectation.”

      (4) These interesting findings raise important questions about their relationship to existing hybrid models of attentional modulation. Could the authors discuss how their results might align with or extend previous work demonstrating combined feature-similarity gain and surround suppression effects for orientation (e.g., Fang & Liu, 2019)? Could a hybrid model potentially provide a better account of these data than the pure surround suppression model?

      We thank the reviewer for this valuable comment. We agree that the hybrid model should be mentioned in the manuscript and we have elaborated further in the Discussion.

      “For example, within the orientation space, the inhibitory zone was about 20°, 45°, and 54° for expectation evident here, feature-based attention[21], and visual perceptual learning[35], respectively; within the feature-based attention, it was about 30° and 45° in color [77] and motion direction [53] spaces, respectively. These variations hint at the exciting possibility that the width of the inhibitory surround may flexibly adapt to stimulus context and task demands, ultimately facilitating our perception and behavior in a changing environment. This principle is consistent with the hybrid model of feature-based attention [53,54,75], where attention is deployed adaptively to prioritize task-relevant information through feature-similarity gain which filters out the most distinctive distractors, and surround suppression which inhibits similar and confusable ones, thereby jointly shaping the attentional tuning profile.”

      (5) On page 19, there appears to be a missing symbol in the description of the Tuning Sharpening model. The text states: 'the tuning width of each channel's tuning function is parameterized by ??', where the question marks seem to indicate a missing parameter symbol.

      We appreciate the reviewer’s careful attention. Yes, the "σ" is missing, which was likely caused by a formatting issue. We have corrected it.

    1. eLife Assessment

      This important study reports the results of efforts to replicate two phenomena of significant interest to early-career scientists and scientific policymakers: the Matthew effect and the early-career setback effect. Several previous studies of these effects have focused on early-career researchers with grant proposals that fell just below or just above a funding threshold. Those just above the threshold were more likely to be successful when they applied for funding later in the career (an example of the well-known Matthew effect), while those just below were more likely to go on to have stronger publication records (the early-career setback effect). In this study the Matthew effect was found to be robust across funders, and to generalize from those close to the funding threshold to the whole population. The early-career setback effect was not robust across funders and did not generalize to the whole population. The evidence reported is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed a multi-funder study to determine if the Matthew effect and early-career setback effect were reproducible across funding programs and processes. The authors extended the analysis of these effects to all applicants and compared the results to the prior studies that only looked at near-hit/near-miss applicants to determine if the effects were generalizable to the whole applicant pool. Further, the authors included new models that also account for researcher behavior and their overall likelihood to reapply for later funding and how this behavior may resolve what appears to be a paradox between the Matthew effect and the early-career setback effect.

      Strengths:

      Figure 4 shows that the "Post (late) MFCR" is the same for the funded and unfunded groups, indicating that the impact of early career funding (at least, in terms of citation metrics) is transient in researchers' overall careers. This finding should encourage researchers to persevere when needed and that long-term success is attainable.

      The inclusion of the collider bias in the models to account for researcher behavioral responses is a key strength of the paper and enhances the analysis and nuanced discussion of the results.

      Weaknesses:

      The discussion of limitations is thorough and points to the need for additional studies. One limitation that is acknowledged is that the authors only looked at applicants who reapplied for funding at the same funder. Given that the authors had the names and affiliations of the applicants from all of the funders, it would be helpful to understand why they were not able to look at applicants across their full data set. Was the limitation technical or a result of the study design? What would have to change to enable this broader analysis?

      In Section 4.1, the authors make a statement that the "between MFCR" difference was seen at 5 years, but not at 10 years, and so the authors chose to use the 5-year period for the presentation of their results. It would be helpful to also see the 10-year analysis and have further justification from the authors on why they selected to look at the 5-year period and how their conclusions might or might not change if they consider the longer time period.

      The discussion could also include that many funders require novel research directions as a condition of receiving an early-career award. For those who receive these awards, they must establish the new research program, begin publishing, and they may initially see a lower citation rate until the impact of the research is more broadly recognized. Are there ways to explore how these time lags impact the "Between MFCR" on those who were funded more so than those who were not funded?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript evaluates the generalizability of two phenomena of great interest to early-career scientists and scientific policymakers. These phenomena describe how early funding success can promote future funding success (the Matthew Effect) and how initially unsuccessful applicants may later succeed (the early-career setback effect). Given the often-normative aspirations of science-of-science studies, the manuscript represents a much-needed and highly significant effort, as it allows a broader audience to assess whether they should reconsider their behavior or policies.

      Strengths:

      The evidence provided by the authors for the generalizability of the Matthew Effect is very strong and convincing. The manuscript addresses an important topic of practical concern to early-career scientists and scientific policymakers.

      Weaknesses:

      If I am correctly interpreting S11 and S12, the statements on the early-career setback effect could be stronger and more direct. The argument in the main text relies on assumptions and simulations to suggest that observations of the early-career setback effect may depend on reapplications. In contrast, S11 and S12 appear to provide more direct evidence against its generalizability, showing that the effect seems to exist in, and be driven by, only one of the six funding agencies considered (FWF). This narrow replication may not be obvious to readers ("the early-career setback effect also replicates, but is not robust across funders").

      I would also suggest that the authors provide a more nuanced discussion of the limitations of their Bayesian model. While the model seems appropriate for accounting for major factors, it appears to exclude others, such as the emergence of new scientific fields or the strategic reorientation of funders toward such fields.

    4. Reviewer #3 (Public review):

      Summary:

      This paper investigates the Matthew effect, where early success in funding peer review can translate into potentially unwarranted later success. It also investigated the previously found "setback" effect for those who narrowly miss out on funding.

      Strengths:

      The study used data from six funding agencies, which increases the generalisability, and was able to link bibliographic data for around 95% of applicants. The authors nicely illustrate how the previously found "setback" effect for near-miss applicants could be a collider bias due to those who chose to apply sometime later. This is a good explanation for the counter-intuitive effect and is nicely shown in Figure 5.
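      The collider-bias explanation can be illustrated with a toy simulation. This is a hedged sketch under stated assumptions, not the authors' Bayesian model: later output depends only on a latent quality (funding has no causal effect), funding is decided by a noisy review score, and unfunded applicants are assumed to reapply more often when their quality is high. The threshold window of 0.3, the reapplication probabilities, and all variable names are hypothetical.

```python
import math
import random


def simulate(n=50000, seed=0):
    """Return (funded, reapplied, later_output) for near-threshold applicants."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)
        score = quality + rng.gauss(0.0, 1.0)   # noisy review score
        if abs(score) >= 0.3:
            continue                            # keep near-hits and near-misses only
        is_funded = score > 0.0
        if is_funded:
            p_reapply = 0.8                     # assumed: independent of quality
        else:
            # assumed: unfunded applicants persist mainly when quality is high
            p_reapply = 1.0 / (1.0 + math.exp(-2.0 * quality))
        reapplied = rng.random() < p_reapply
        later_output = quality + rng.gauss(0.0, 1.0)  # no causal funding effect
        rows.append((is_funded, reapplied, later_output))
    return rows


def mean(values):
    return sum(values) / len(values)


def gap(rows, reapplicants_only):
    """Mean later output of unfunded minus funded applicants."""
    kept = [r for r in rows if r[1] or not reapplicants_only]
    unfunded_out = [out for is_funded, _, out in kept if not is_funded]
    funded_out = [out for is_funded, _, out in kept if is_funded]
    return mean(unfunded_out) - mean(funded_out)


rows = simulate()
print("unfunded minus funded, all near-threshold applicants:", round(gap(rows, False), 3))
print("unfunded minus funded, reapplicants only:            ", round(gap(rows, True), 3))
```

      Because reapplication is influenced by both the funding decision and latent quality, restricting the comparison to reapplicants makes the unfunded group look stronger in this sketch even though funding has no causal effect on later output.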

      Weaknesses:

      Most of the methods were clearly presented, but I have a few questions and comments, as outlined below.

      In Figure 4(a) why are the "post" means much lower than the "pre"? This contradicts the expected research trajectory of researchers. Or is this simply due to less follow-up time? But doesn't the field citation ratio control for follow-up time?

      The choice of the log-normal distribution for latent quality was not entirely clear to me. This would create some skew, rather than a symmetric distribution, which may be reasonable but log-normal distributions can have a very long tail which might not mimic reality, as I would not expect a small number of researchers to be extremely above the crowd. However, then the skew was potentially dampened by using percentile scores. Some further reasoning and plots of the priors would help.

      Can the authors confirm the results of Figure S9 which show no visible effect of altering the standard deviation for the review parameter or the mean citations? Is this just because the prior for quality is dominated by the data? Could it be that the width of the distribution for quality does not matter, as it's the relative difference/ranking that counts? So the beta in equation 6 changes to adjust to the different quality scale?
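      The conjecture that only the relative ranking matters can be checked in isolation from the authors' model. The sketch below, which assumes nothing about their implementation, draws log-normal "quality" values with two different widths from the same underlying normal draws and converts each set to percentile scores. The widths 0.5 and 1.5 and the helper `percentile_ranks` are hypothetical.

```python
import math
import random
import statistics


def percentile_ranks(values):
    """Percentile score (0-100) of each value within its own sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = 100.0 * (position + 0.5) / len(values)
    return ranks


rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Same latent draws, two assumed log-scale standard deviations.
narrow = [math.exp(0.5 * value) for value in z]
wide = [math.exp(1.5 * value) for value in z]

# The raw distributions differ strongly in skew ...
print("mean/median, narrow:", round(statistics.mean(narrow) / statistics.median(narrow), 2))
print("mean/median, wide:  ", round(statistics.mean(wide) / statistics.median(wide), 2))

# ... but the percentile scores are identical, because exp() is monotonic.
print("identical percentile scores:", percentile_ranks(narrow) == percentile_ranks(wide))
```

      If a model only ever sees percentile scores, changing the width of the log-normal prior would leave its inputs unchanged in this simplified setting, which is one way the insensitivity in Figure S9 could arise.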

      The contrary result for the FWF is not explained (Table S3). Does this funder have different rules around re-applicants or many other competing funders?

      The outlined qualitative research sounds worthwhile. Another potential mechanism (based on anecdote) is that some researchers react irrationally to rejection or acceptance, tending to think that the whole agency likes or hates their work based on one experience. Many researchers do not appreciate that it was a somewhat random selection of reviewers who viewed their work, and it will unlikely be the same reviewers next time.

      "A key implication is the importance of encouraging promising, but initially unsuccessful applicants to reapply." Yes, a policy implication is to give people multiple chances to be lucky, perhaps by giving fewer grants to more people, which could be achieved by shortening the funding period (e.g., 4 year fellowships instead of 5 years). However, this will have some costs, as applicants would need to spend more time on applications and suffer the increased stress of shorter-term contracts. Bridge grants are potentially an ideal half-way house between many short-term and few long-term awards. Giving more grants to fewer people is supported by this analysis showing diminishing returns in research outputs with more funding, DOI: 10.1371/journal.pone.0065263.

      Making more room for re-applicants also made me wonder if there should be an upper cap on funding, potentially for people who have been incredibly successful. Of course, funders generally want to award successful researchers, but people who've won over some limit, for example $50 million, could likely be expected to win funding from other sources such as philanthropy and business. Graded caps could occur by career stage.

    1. eLife Assessment

      This important research addresses the effects of subjective control and task difficulty on experienced stress using a novel behavioral task administered on the same day in two large online samples. Convincing evidence is provided, establishing the internal and external validity of the task, as well as a relationship of the sense of control and task difficulty with individual differences in relevant mental health constructs. Evidence for the specificity of the link between control and stress would be more substantial if the design had not conflated control and reward rate. This work will be of interest to psychologists and clinicians studying the concepts of controllability, stress, and psychopathology.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first demonstrate that their behavioral task exhibits good internal consistency and external validity, indicating that perceived control during the task is linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task resulted in reduced subjective stress, demonstrating a direct impact of control on stress perception. However, this work has some minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies. Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and has particular relevance for mental health interventions.

      Strengths:

      The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher task difficulty led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the influence of these factors on incidental stress, making this work both novel and important for the field.

      Weaknesses:

      One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases in perceived stress between relevant time points; however, future work could improve upon this design by including more comprehensive stress manipulations and by measuring implicit physiological signs of stress.

      Comments on revisions:

      I appreciate the authors' responses to my comments and concerns. I have decided not to make changes to my public review, as I believe it remains relevant and fair after revisions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by participants by varying the level of difficulty in a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online, while simultaneously recording subjective ratings of perceived control, anxiety, and stress. In a second study, the authors employed the wheel stopping task to induce a high sense of controllability and investigate whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, such as watching pleasant videos.

      Strengths:

      (1) The authors validate a method to manipulate stress.

      (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.

      (3) The studies involved big sample sizes.

      Weaknesses:

      (1) The study was not preregistered.

      (2) The control manipulation is conflated with task difficulty and, therefore, the reward rate. In the revised version of the manuscript, the authors perform statistical analysis to demonstrate that the relationship between perceived level of control and subjective stress remains robust after the inclusion of win rate in the model. This analysis strengthens the authors' claims, but the evidence would be more substantial if the design did not conflate reward rate and control. The authors properly discuss this issue in the revised manuscript.

      This study will be of interest to psychologists and cognitive scientists who are interested in understanding how controllability and its subjective perception influence how people respond to stress exposure. The demonstration that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consequent to the performance of the WS task on the same day, and its generalizability beyond this setting has not been established.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess the subjective sense of control, the authors introduce a novel wheel stopping (WS) task where control is manipulated via size and speed to induce conditions of low and high control. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further demonstrate that an increased sense of control buffers subjective stress induced by a Trier Social Stress manipulation, more so than a typical stress-buffering mechanism of watching neutral/calming videos.

      Strengths:

      Several strengths of the manuscript can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test a significant and important question. Additionally, it is a well-powered investigation that allows for confidence in replicability and demonstrates both high internal consistency and high external validity, along with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature, while also making a significant contribution to the field in understanding the benefits of perceiving control.

      Weaknesses:

      The authors have addressed all my queries, and I believe the revised paper has been improved and will make an important contribution to the literature.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first show that their behavioral task has good internal consistency and external validity, showing that perceived control during the task was linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task led to reduced subjective stress, showing a direct impact of control on stress perception. However, this work has minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies.

      Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and is particularly relevant to mental health interventions.

      We thank the reviewer for their clear and accurate summary of the findings. 

      Strengths:

      The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher difficulty in the task led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the influence of these factors on incidental stress, making this work both novel and important for the field.

      We thank the reviewer for their positive comments.

      Weaknesses:

      One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and also after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases of perceived stress between relevant time points, but future work could improve upon this design by including more complete stress manipulations and measuring implicit physiological signs of stress.

      We thank the reviewer for urging us to expand on this point. The reviewer is right that stress was merely anticipatory and is in that sense different to the canonical TSST. However, there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). Importantly, and as the reviewer notes in relation to our study specifically, the anticipatory TSST had a significant impact on subjective stress in the expected direction demonstrating that it was effective at eliciting subjective stress. We have elaborated further on this in our manuscript (pages 8 and 9) as follows: 

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by the participants by changing the level of difficulty of a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online while simultaneously recording subjective ratings on perceived control, anxiety, and stress. In the second study, the authors used the wheel-stopping task to induce a high sense of controllability and test whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, like looking at pleasant videos.

      We thank the reviewer for their accurate summary.

      Strengths:

      (1) The authors validate a method to manipulate stress.

      (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.

      (3) The studies involved big sample sizes.

      We thank the reviewer for noting these positive aspects of our study. 

      Weaknesses:

      (1) The study was not preregistered.

      This is correct.

      (2) The control manipulation is conflated with task difficulty and, therefore, the reward rate. Although the authors acknowledge this limitation at the end of the discussion, it is a very important limitation, and its implications are not properly discussed. The discussion states that this is a common limitation with previous studies of control but omits that many studies have controlled for it using yoking.

      We agree that these are very important issues to consider in the interpretation of our findings. It is important to note that while our task design does not separate these constructs, we are able to do so in our statistical analyses. For example, our measure of perceived difficulty was included in analyses assessing the fluctuations in stress and control, in which subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. Similarly, we have also included additional analyses in which we include the win rate (i.e. percentage of trials won) as a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress, in which subjective control and perceived difficulty still uniquely predict subjective stress when controlling for win rate. This suggests that there is unique variance in subjective control, separate from perceived task difficulty and win rate, that is relevant to stress. We have included these analyses (page 16 of manuscript) as follows:

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”

      As well as expanding on this in the Discussion (pages 27 and 28) as follows:

      “While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses. Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      Further, in terms of the wider literature on these issues, we have added more to this point in our discussion, especially in relation to previous literature that also varies control by reward rate (e.g. Dorfman & Gershman, 2019, who use a reward rate of 80% in high control conditions and 50% in low control conditions). This can be found in the manuscript on page 27 as follows: 

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition).”

      (3) The methods are not always clear enough, and it is difficult to know whether all the manipulations are done within-subjects or some key manipulations are done between subjects.

      We have added more information in the methods section (page 8) clarifying within-subject manipulations (WS task parameters) and between-subject manipulations (stressor intensity task, WS task version in Study 1, and WS task/video task in Study 2). Additionally, as recommended by Reviewer 1, we have provided more information in the methods section and Table S3 regarding the details of on-screen written feedback provided to participants after each trial of the WS Task.

      (4) The analysis of internal consistency is based on splitting the data into odd/even sliders. This choice of data parcellation may cause missed drifts in task performance due to learning, practice effects, or tiredness, thus potentially inflating internal consistency.

      We agree that this can indeed be an issue, though drift is likely to be present in any task, and even in mood at rest (Jangraw et al., 2023). To respond to this specific point, we parcellated the timepoints into a first/second half split and report the ICC in the supplementary information. While the values are lower, likely due to systematic drifts in task performance as participants learn to perform the task (especially for Study 2, since the order of parameters was designed to get easier throughout the experiment), the ICC values are still high. Control sliders: Study 1 = 0.82, Study 2 = 0.68; Difficulty sliders: Study 1 = 0.84, Study 2 = 0.57; Stress sliders: Study 1 = 0.45, Study 2 = 0.71. As seen, the lowest ICC is for stress sliders in Study 1. This may be because the first 3 sliders (included in the first half split) were all related to the stress task (initial, post-stress task, post-debrief) and the final 4 sliders (in the second half split) were the three sliders during the WS task and the one shortly afterwards.
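
      The effect of the split choice can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses synthetic data, hypothetical parameter values, and a plain Pearson correlation between half-means as a simplified stand-in for the ICC. It assumes each simulated participant has a stable trait level plus an individual linear drift across eight sliders, and compares an odd/even split with a first/second half split.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_ratings(n_participants=2000, n_sliders=8, seed=1):
    """Synthetic slider ratings: stable trait + participant-specific drift + noise.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_participants):
        trait = rng.gauss(0.0, 1.0)   # stable individual difference
        drift = rng.gauss(0.0, 0.3)   # individual change per timepoint
        row = [trait + drift * t + rng.gauss(0.0, 0.5) for t in range(n_sliders)]
        data.append(row)
    return data


def split_half_r(data, idx_a, idx_b):
    """Correlate each participant's mean over idx_a with their mean over idx_b."""
    def half_mean(row, idx):
        return sum(row[i] for i in idx) / len(idx)

    a = [half_mean(row, idx_a) for row in data]
    b = [half_mean(row, idx_b) for row in data]
    return pearson(a, b)


ratings = simulate_ratings()
r_odd_even = split_half_r(ratings, [0, 2, 4, 6], [1, 3, 5, 7])
r_first_second = split_half_r(ratings, [0, 1, 2, 3], [4, 5, 6, 7])
print(f"odd/even split:          r = {r_odd_even:.2f}")
print(f"first/second half split: r = {r_first_second:.2f}")
```

      Under these assumptions the interleaved (odd/even) halves sample nearly the same stretch of the drift, so their correlation comes out higher than that of the first/second halves, which is the direction of the concern raised by the reviewer.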

      (5) Study 2 manipulates the effect of domain (win versus loss WS task), but the interaction of this factor with stressor intensity is not included in the analysis.

      We agree that this would be a valuable analysis to include. We have run additional analyses (section Sensitivity and Exploratory Analyses, pages 24 and 25), testing the interaction of Domain (win or loss) with stressor intensity (and time) when predicting the stress buffering and stress relief effects. This revealed no significant main effects of domain or interactions including domain, suggesting that domain did not impact the stress induction or relief differently depending on whether it was followed by the high or low stressor intensity condition. While the control by time interaction (our main effect of interest) still held for stress induction in this more complex model, the control by time interaction did not hold for the stress relief. However, this more complex model did not provide a better fit for the data, motivating us to continue to draw conclusions from the original model specification with domain as a covariate (rather than an interaction).

      We outline these analyses on page 24 of the manuscript, as follows:

      “Third, we included the interaction of domain with stressor intensity and with time, to test whether the win or loss domain in the WS task significantly impacted stress induction or stress relief differently depending on stressor intensity. There were no significant effects or interactions of domain (Table S14) for stress induction or stress relief, and the main effect of interest (the interaction between time and control) still held for the stress induction (β= 10.20, SE=4.99, p=.041, Table S14), though was no longer significant for the stress relief (β= 6.72, SE=4.28, p=.117, Table S14). This more complex model did not significantly improve model fit (χ²(3)= 1.46, p=.691) compared to our original specification (with domain as a covariate rather than an interaction) and had slightly worse fit (higher AIC and BIC) than the original model (AIC = 5477.2 versus 5472.7, BIC = 5538.5 versus 5520.8).”
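
      The reported model comparison can be checked for internal consistency with a few lines of arithmetic. The sketch below is an illustration rather than the authors' code. It assumes that both models were fitted by maximum likelihood to the same data and that the χ² value is the usual likelihood-ratio statistic, in which case the AIC of the larger model should exceed that of the smaller one by 2 × (extra parameters) minus χ². The helper `chi2_sf` is a hypothetical name; it computes the chi-square upper-tail probability for integer degrees of freedom from the standard library alone.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-z)          # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, erfc(sqrt(z))    # Q(1/2, z) = erfc(sqrt(z))
    while a < df / 2.0:
        q += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return q


# Values quoted in the response above.
lrt_stat = 1.46        # likelihood-ratio chi-square
extra_params = 3       # degrees of freedom of the test
reported_aic_gap = 5477.2 - 5472.7

p_value = chi2_sf(lrt_stat, extra_params)
expected_aic_gap = 2 * extra_params - lrt_stat

print(f"p-value for chi2({extra_params}) = {lrt_stat}: {p_value:.3f}")
print(f"AIC gap implied by the test: {expected_aic_gap:.2f}")
print(f"AIC gap as reported:         {reported_aic_gap:.2f}")
```

      Run as written, the p-value should agree with the reported p = .691 to rounding, and the implied AIC gap (4.54) matches the reported difference of 4.5, so the quoted statistics are mutually consistent under the stated assumptions.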

      This study will be of interest to psychologists and cognitive scientists interested in understanding how controllability and its subjective perception impact how people respond to stress exposure. Demonstrating that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consecutive to the performance of the WS task on the same day, and its generalizability beyond this setting has not been established.

      We thank the reviewer for this assessment and agree that we cannot assume these findings would generalise to more prolonged effects on stress responses.

      Reviewer #3 (Public review):

      Summary:

      This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess a subjective sense of control, the authors introduce a novel wheel-stopping (WS) task where control is manipulated via size and speed to induce low and high control conditions. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further show that an increased sense of control buffers subjective stress induced by a Trier Social Stress manipulation, more so than a more typical stress buffering mechanism of watching neutral/calming videos.

      We agree with this accurate summary of our study. 

      Strengths:

      There are several strengths to the manuscript that can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test an important and significant question. Additionally, it is a well-powered investigation that allows for confidence in replicability and the ability to show both high internal consistency and high external validity with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature while also providing a significant contribution to the field with respect to understanding the benefits of perceiving control.

      We thank the reviewer for this positive assessment. 

      Weaknesses:

      There are also some questions that, if addressed, could help our readership.

      (1) A key manipulation was the high-intensity stressor (Anticipatory TSST signal), which was measured via subjective ratings recorded on a sliding scale at different intervals during testing. Typically, the TSST conducted in the lab is associated with increases in cortisol assessments and physiological responses (e.g., skin conductance and heart rate). The current study is limited to subjective measures of stress, given the online nature of the study. Since TSST online may also yield psychologically different results than in the lab (i.e., presumably in a comfortable environment, not facing a panel of judges), it would be helpful for the authors to briefly discuss how the subjective results compare with other examples from the literature (either online or in the lab). The question is whether the experienced stress was sufficiently stressful given that it was online and measured via subjective reports. The control condition (low intensity via reading recipes) is helpful, but the low-intensity stress does not seem to differ from baseline readings at the beginning of the experiment.

      We agree that it would be helpful to expand on this further. Similar to the comment made by Reviewer 1, we wish to point out that there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). We have elaborated further on this in our manuscript on pages 8 and 9 as follows:

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      (2) The neutral videos represent an important condition to contrast with WS, but this raises two questions. First, the conditions are quite different in terms of experience, and it is interesting to consider what another more active (but not controlled per se) condition would be in comparison to the WS performance. That is, there is no instrumental action during the neutral video viewing (even passive ratings about the video), and the active demands could be an important component of the ability to mitigate stress. Second, the subjective ratings of the stress of the neutral video appear equivalent to the win condition. Would it have been useful to have a high arousal video (akin to the loss condition) to test the idea that experience of control will buffer against stress? That way, the subjective experience of stress would start at equivalent points after WS3.

      We agree with the reviewer that this is an important issue to clarify. In our deliberations when designing this study, we considered that any task with action-outcome contingencies would have a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself. Importantly, however, there was an active demand and concentration was still required in order to perform the attention checks regarding the content of the videos and ratings of the videos.

      Thank you for the suggestion of having a high arousal video condition. This would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief, and we have included this suggestion for future research in the discussion section (page 28) as below:

      “Another avenue for future research would be to test how control buffers against stress when compared to a neutral control scenario of higher stress levels, akin to the loss domain in the WS Task, given that participants found the video condition generally relaxing. However, given that we found no differences dependent on domain for the stress induction in the WS Task conditions, it is possible that different versions of a neutral control condition would not impact the stress induction.”

      (3) For the stress relief analysis, the authors included time points 2 and 3 (after the stressor and debrief) but not a baseline reading before stress. Given the potential baseline differences across conditions, can this decision be justified in the manuscript?

      We thank the reviewer for raising this. Regarding the stress relief analyses (timepoints 2 and 3) and not including timepoint 1 (after the WS/video task) stress in the model, we have added to the manuscript that there was no significant difference in stress ratings between the high control and neutral control (collapsed across stress and domain) at timepoint 1 (hence why we do not think it’s necessary to include in the stress relief model). Nevertheless, we have now included a sensitivity analysis to test the Timepoint*Control interaction of stress relief when including timepoint 1 stress as a covariate. The timepoint by control interaction still holds, suggesting that the initial stress level prior to the stress induction does not impact our results of interest. The details of this analysis are included in the Sensitivity and Exploratory Analyses section on page 24:

      “Although there were no significant differences between control groups in subjective stress immediately after the WS/video task (t(175.6)=1.17, p=.244), we included participants’ stress level after the WS/video task as a covariate in the stress relief analyses (Table S12). The results revealed a main effect of initial stress (β= 0.643, SE=0.040, p<.001, Table S12) on the stress relief after the stressor debrief. Compared to excluding initial stress as in the original analyses (Table 4), there was now no longer a main effect of domain (β= 0.236, SE=2.60, p=.093, Table S12), but the inference of all other effects remained the same. Importantly, there was still a significant time by control interaction (β= 9.65, SE=3.74, p=.010, Table S12) showing that the decrease in stress after the debrief was greater in the highly controllable WS condition than the neutral control video condition, even when accounting for the initial stress level.”

      (4) Is the increased control experience during the losses condition more valuable in mitigating experienced stress than the win condition?

      We agree that this would be helpful to clarify. To test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) to test for a Domain*Time interaction. This revealed no significant Domain*Time interaction, suggesting that the stress buffering or stress relief effect was not dependent on domain in the high control conditions. These analyses are outlined in the Sensitivity and Exploratory Analyses section on page 25:

      “Finally, to test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) for the stress induction and stress relief to test for an interaction of domain and time. For the stress induction, there was no significant two-way interaction of domain and time (β= -1.45, SE=4.80, p=.763), nor a significant three-way interaction of domain by time by stressor intensity (β= -3.96, SE=6.74, p=.557, Table S15), suggesting that there were no differences in the stress induction dependent on domain. Similarly for the stress relief, there was no significant two-way interaction of domain and time (β= -5.92, SE=4.42, p=.182), nor a significant three-way interaction of domain by time by stressor intensity (β= 8.86, SE=6.21, p=.154, Table S15), suggesting that there were no differences in the stress relief dependent on the WS Task domain.”

      (5) The subjective measure of control ("how in control do you feel right now") tends to follow a successful or failed attempt at the WS task. How much is the experience of control mediated by the degree of experienced success/schedule of reinforcement? Is it an assessment of control or, an evaluation of how well they are doing and/or resolution of uncertainty? An interesting paper by Cockburn et al. 2014 highlights the potential for positive prediction errors to enhance the desire for control.

      We thank the reviewer for this comment. Similar to comments regarding reward rate, our task does not allow us to fully separate control from success/reinforcement because of the manipulation of difficulty. However, we did undertake sensitivity analyses and the inclusion of overall win rate accounted for limited variance when predicting stress over and above subjective control and difficulty (page 16). 

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both  win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.” 

      (6) While the authors do a very good job in their inclusion and synthesis of the relevant literature, they could also amplify some discussion in specific areas. For example, operationalizing task controllability via task difficulty is an interesting approach. It would be useful to discuss their approach (along with any others in the literature that have used it) and compare it to other typically used paradigms measuring control via presence or absence of choice, as mentioned by the authors briefly in the introduction.

      We are delighted to expand on this particular point and have done so in the Discussion on page 27:

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition). While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses.”

      (7) The paper is well-written. However, it would be useful to expand on Figure 1 to include a) separate figures for study 1 (currently not included) and 2, and b) a timeline that includes the measurements of subjective stress (incorporated in Figure 1). It would also be helpful to include Figure S4 in the manuscript.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (now top panel within Figure 4). 

      Reviewer #1 (Recommendations for the authors):

      (1) Study 2 shows a greater decrease in subjective stress after the high-control task manipulation than after the pleasant video. One possible confound is whether the amount of time to complete the WS task and the video differ. It could be helpful to look at the average completion time for the WS task and compare that to the length of the videos. Alternatively, in future studies, control for this by dynamically adjusting the video play length to each participant based on how long they took to complete the WS task.

      This is an interesting suggestion. As a result, we have included the time taken as a covariate in the stress induction and stress relief analyses to ensure that any differences in time between the WS task and video task were not accounting for any of the stress induction or relief analyses. Controlling for the total time taken did not impact the stress induction or relief results. This is included in the Sensitivity and Exploratory Analyses section on page 24:

      “Our second sensitivity analysis was conducted because the experiment took longer to complete for the video condition (mean = 54.3 minutes, SD = 12.4 minutes) than the WS task condition (mean = 39.7 minutes, SD = 12.8 minutes, t(186.19)=-9.32, p<.001). We therefore included the total time (in ms) as a covariate in the stress induction and stress relief analyses for Study 2. This showed that accounting for total time did not change the results of interest (Table S13), further highlighting that the time by control interactions were robust.”

      (2) Because participants received feedback about their success/failure in the WS task, a confounding factor could be that they received positive feedback on highly controllable trials and negative feedback on low control trials (and/or highly difficult trials). This would suggest that it is not controllability per se that contributes to stress perception but rather feedback valence. The authors show that this is a likely factor in their results in Study 2, which shows significant effects of the loss domain on perceived control and stress. Was a similar analysis done in Study 1? Do participants receive feedback in Study 1? It would be helpful to include this information somewhere in the manuscript. I would be curious to know whether *any* feedback at all influences controllability/stress perceptions.

      We thank the reviewer for this interesting suggestion. It is an interesting question as to whether feedback valence is related to stress in Study 1, and we have added this point to the Discussion on pages 27 and 28. To speak to this point, when we include the overall win rate (which captures the subsequent feedback received) when predicting subjective stress, win rate is not a significant predictor of stress over and above perceived difficulty and subjective control, suggesting that overall feedback valence may not be related to stress in Study 1. We take this as evidence that feedback may not be as important in terms of accounting for the relationship between stress and control. However, we unfortunately do not have any data in which there was no feedback provided to speak to this conclusively. This would be an interesting future study. The excerpt below is added to pages 27 and 28 of the discussion section:

      “Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      To respond specifically to the reviewer’s question about the feedback given to participants, written feedback was provided on screen to participants on a trial-by-trial basis in Study 1 as well (i.e. in both studies), and we have provided more clarity about this in the manuscript on page 8 as well as providing additional details in Table S3:

      “After each trial, participants were shown written feedback on screen as to whether the segment had successfully stopped on the red zone (or not), and the associated reward (or lack of). See Table S3 for details.”

      (3) I'm not sure how to interpret the fact that in Figure S1, the BICs are all essentially the same. Does this mean that you don't really need all of these varying aspects of the task to achieve the same effects? Could the task be made simpler?

      The similarity of the BIC values suggests that a simpler WS task would have produced a worse account of the data, roughly in proportion to how much simpler the corresponding model is. Here, the BIC scores for the models are similar, suggesting that each added parameter contributes about as much explanatory power as would be expected from adding a parameter, but not more. We do note that the BIC is a relatively strict and conservative criterion. The fact that the most complex model narrowly improves parsimony overall, combined with the interpretable parameter values and the prior expectations given the task setup, led us to focus on this most complex model.
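      The trade-off described in this response can be illustrated with a minimal sketch, assuming the standard definition BIC = k·ln(n) − 2·ln(L̂). The log-likelihoods, parameter counts, and number of observations below are hypothetical values chosen for illustration; they are not taken from Figure S1 or from the authors' models.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L-hat). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical numbers for illustration only (not values from the study).
n_obs = 1000
simple = bic(log_likelihood=-520.0, n_params=3, n_obs=n_obs)
complex_ = bic(log_likelihood=-516.4, n_params=4, n_obs=n_obs)

# Each extra parameter costs ln(n_obs) BIC points ...
penalty_per_param = math.log(n_obs)
# ... and the improvement in fit contributes 2 * (gain in log-likelihood).
gain = 2.0 * (-516.4 - (-520.0))

# The more complex model wins only by the small margin gain - penalty_per_param.
print(f"simple={simple:.2f} complex={complex_:.2f} margin={gain - penalty_per_param:.2f}")
```

      When the margin is close to zero, as in this toy case, the extra parameter improves the fit by about what its penalty costs, which is the situation of near-identical BIC values that the reviewer asks about.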

      (4) A minor point, but the authors refer to their sample as "neurotypical." Were they assessed for prior/current psychopathology/medications? If not, I might use a different term here (perhaps "non-clinical sample"), since some prior work has shown that online samples actually have higher instances of psychopathology compared to community samples.

      We have changed the phrasing of ‘neurotypical’ to a ‘non-clinical sample’ as recommended.

      Reviewer #2 (Recommendations for the authors):

      Figure S4 is very informative and could be presented in the main text.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (top panel of Figure 4). 

      References:

      Dorfman, H. M., & Gershman, S. J. (2019). Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications, 10(1), 5826. https://doi.org/10.1038/s41467-019-13737-7

      DuPont, C. M., Pressman, S. D., Reed, R. G., Manuck, S. B., Marsland, A. L., & Gianaros, P. J. (2022). An online Trier social stress paradigm to evoke affective and cardiovascular responses. Psychophysiology, 59(10), e14067. https://doi.org/10.1111/psyp.14067

      Jangraw, D. C., Keren, H., Sun, H., Bedder, R. L., Rutledge, R. B., Pereira, F., Thomas, A. G., Pine, D. S., Zheng, C., Nielson, D. M., & Stringaris, A. (2023). A highly replicable decline in mood during rest and simple tasks. Nature Human Behaviour, 7(4), 596–610. https://doi.org/10.1038/s41562-023-01519-7

      Meier, M., Haub, K., Schramm, M.-L., Hamma, M., Bentele, U. U., Dimitroff, S. J., Gärtner, R., Denk, B. F., Benz, A. B. E., Unternaehrer, E., & Pruessner, J. C. (2022). Validation of an online version of the trier social stress test in adult men and women. Psychoneuroendocrinology, 142, 105818. https://doi.org/10.1016/j.psyneuen.2022.105818

      Nasso, S., Vanderhasselt, M.-A., Demeyer, I., & De Raedt, R. (2019). Autonomic regulation in response to stress: The influence of anticipatory emotion regulation strategies and trait rumination. Emotion, 19(3), 443–454. https://doi.org/10.1037/emo0000448

      Schlatter, S., Schmidt, L., Lilot, M., Guillot, A., & Debarnot, U. (2021). Implementing biofeedback as a proactive coping strategy: Psychological and physiological effects on anticipatory stress. Behaviour Research and Therapy, 140, 103834. https://doi.org/10.1016/j.brat.2021.103834

      Steinbeis, N., Engert, V., Linz, R., & Singer, T. (2015). The effects of stress and affiliation on social decision-making: Investigating the tend-and-befriend pattern. Psychoneuroendocrinology, 62, 138–148. https://doi.org/10.1016/j.psyneuen.2015.08.003

    1. eLife Assessment

      This important study addresses the timely and interesting question of how itaconate generation emerged in evolution, using taxonomic analysis of the gene and enzyme cis-aconitate decarboxylase (CAD). The authors provide solid evidence identifying three CAD branches in metazoans and showing that the early metazoan paleo-form indeed generates aconitate and is already linked to innate immunity. They further provide limited evidence suggesting that taxonomic differences in subcellular localisation of this enzyme may allow for innate immune signalling without compromising cellular energetics. The implications of the study will be of high interest to the field of innate host defence and immunometabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation.

      Strengths:

      The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology.

      Weaknesses:

      The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.

      Strengths:

      The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.

      Weaknesses:

      The work concerning sub-mitochondrial localisation is confusing and needs better analysis.

    4. Reviewer #3 (Public review):

      Summary:

      IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts immunoregulatory by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes.

      The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1.

      Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. Limited proteolysis of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.

      Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus, where they found high mRNA levels in oyster hemocytes which was further increased by poly(I:C), which was also the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity.

      Strengths

      (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field.

      (2) Manuscript clearly written and understandable across disciplines.

      (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function.

      (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.

      Weaknesses:

      (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.

      (2) The submitochondrial localization assay lacks a native human IRG1 control.

      (3) CAD activity shown for IRG1L but not paleo-IRG1.

      (4) Itaconate production by early metazoans after PAMP stimulation?

      (5) No measurement of energy metabolism (trade-offs?).

      I acknowledge that some of these limitations are inevitable because the range of detailed experimental analysis is necessarily limited. However, some of these data would be important to support central claims of the manuscript (further discussed below).

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation. 

      Strengths: 

      The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology. 

      We thank the reviewer for appreciating the novelty of our study in exploring IRG1 evolution.  

      Weaknesses: 

      The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. 

      Regarding submitochondrial localization of IRG1, we want to draw attention to the published data that a protease protection assay for wild-type mammalian IRG1 has been performed by Lian et al. 2023 (Extended Data Fig. 4), which convincingly demonstrated an outer-mitochondrial membrane localization of endogenous mouse IRG1 in mouse DC2.4 cells upon LPS stimulation that induces IRG1 expression. 

      Regarding complementary microscopy evidence, the same paper performed two-color DNA-PAINT super-resolution imaging to demonstrate an enrichment of IRG1 at mitochondria and a lack of co-localization with the inner membrane/matrix marker Cox IV.

      Given this direct visualization of sub-mitochondrial localization, we are considering applying super-resolution microscopy to revisit the sub-mitochondrial localization of the different IRG1 constructs in the study.

      Reference:

      Lian H, Park D, Chen M, Schueder F, Lara-Tejero M, Liu J, Galán JE. Parkinson's disease kinase LRRK2 coordinates a cell-intrinsic itaconate-dependent defence pathway against intracellular Salmonella. Nat Microbiol. 2023 Oct;8(10):1880-1895. doi: 10.1038/s41564-023-01459-y. Epub 2023 Aug 28. PMID: 37640963; PMCID: PMC10962312.

      Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.

      We thank the reviewer for raising this excellent point. Indeed, itaconate has been proposed to inhibit matrix SDH, thereby exerting an anti-inflammatory function (Lampropoulou, Cell Metab 2016). While the mitochondrial transport of itaconate has not been fully characterized in vivo or in cells, a specific itaconate transport activity has been shown for the mitochondrial 2-oxoglutarate transporter OGC using an in vitro proteoliposome system (Mills et al. Nature 2018).

      We plan to discuss this important point on mitochondrial itaconate transport in the revision. 

      Reference: 

      Lampropoulou V, Sergushichev A, Bambouskova M, Nair S, Vincent EE, Loginicheva E, Cervantes-Barragan L, Ma X, Huang SC, Griss T, Weinheimer CJ, Khader S, Randolph GJ, Pearce EJ, Jones RG, Diwan A, Diamond MS, Artyomov MN. Itaconate Links Inhibition of Succinate Dehydrogenase with Macrophage Metabolic Remodeling and Regulation of Inflammation. Cell Metab. 2016 Jul 12;24(1):158-66. doi: 10.1016/j.cmet.2016.06.004. Epub 2016 Jun 30. PMID: 27374498; PMCID: PMC5108454.  

      Mills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa ASH, Higgins M, Hams E, Szpyt J, Runtsch MC, King MS, McGouran JF, Fischer R, Kessler BM, McGettrick AF, Hughes MM, Carroll RG, Booty LM, Knatko EV, Meakin PJ, Ashford MLJ, Modis LK, Brunori G, Sévin DC, Fallon PG, Caldwell ST, Kunji ERS, Chouchani ET, Frezza C, Dinkova-Kostova AT, Hartley RC, Murphy MP, O'Neill LA. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature. 2018 Apr 5;556(7699):113-117. doi: 10.1038/nature25986. Epub 2018 Mar 28. PMID: 29590092; PMCID: PMC6047741.

      Reviewer #2 (Public review): 

      Summary: 

      The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.

      Strengths: 

      The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.

      Weaknesses: 

      The work concerning sub-mitochondrial localisation is confusing and needs better analysis. 

      We thank the reviewer for the constructive feedback. As in our response to reviewer 1, we want to draw attention to the published data in which the outer mitochondrial membrane localization of IRG1 has been demonstrated by protease protection assay and explored using super-resolution imaging by Lian et al. 2023 (Extended Data Fig. 4). Given the direct visualization of sub-mitochondrial localization by super-resolution imaging, we plan to revisit this question by applying the method to the different IRG1 constructs used in the paper.

      Reviewer #3 (Public review): 

      Summary: 

      IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts immunoregulatory by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes. 

      The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1. 

      Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. Limited proteolysis of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.

      Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus, where they found high mRNA levels in oyster hemocytes which was further increased by poly(I:C), which was also the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity. 

      Strengths 

      (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field. 

      (2) Manuscript clearly written and understandable across disciplines. 

      (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function. 

      (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.

      We thank the reviewer for the positive comments on the strengths of our study.

      Weaknesses: 

      (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.

      We plan to explore two types of experiments:

      First, we plan to purify different recombinant CAD proteins and, if successful, we will test their in vitro enzymatic activity in synthesizing itaconate. Positive data will also answer question (3) below.

      Second, we plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1. The data will also address question (4) below. 

      (2) The submitochondrial localization assay lacks a native human IRG1 control. 

      As in our response to reviewer 1, we believe Lian et al. 2023 provided strong evidence supporting an outer mitochondrial membrane localization of wild-type, endogenous mouse IRG1. Given the direct visualization using super-resolution imaging, we plan to revisit the submitochondrial localization of the different IRG1 constructs using super-resolution imaging.

      (3) CAD activity shown for IRG1L but not paleo-IRG1. 

      We plan to purify different recombinant CAD proteins and, if successful, we will test their in vitro enzymatic activity in producing itaconate.

      (4) Itaconate production by early metazoans after PAMP stimulation? 

      We plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1.

      (5) No measurement of energy metabolism (trade-offs?).

      Because PAMP signaling might trigger other downstream effects that also impair mitochondrial function, for instance nitric oxide that inhibits complex IV, we plan to avoid the PAMP condition and directly test the effect of itaconate production. We plan to compare the impact on mitochondrial bioenergetics, provided that the same CAD enzymes (thus with the same activity) can be expressed at the same level intra-mitochondrially and extra-mitochondrially, for instance in the case of MTS-hACOD1 and hACOD1.

    1. eLife Assessment

      This work provides a valuable comparison of sentence structure representations in the human brain and state-of-the-art Large Language Models (LLMs). Based on solid analysis of 7T fMRI data, it systematically identifies sentences in which LLMs underperform relative to models that explicitly code for syntactic structure. The study will be of significant interest to both cognitive neuroscientists and artificial intelligence researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates whether transformer-based models can represent sentence-level semantics in a human-like way. The authors designed a set of 108 sentences specifically to dissociate lexical semantics from sentence-level information and collected 7T fMRI data from 30 participants reading these sentences. They conducted representational similarity analysis (RSA) comparing brain data and model representations, as well as the human behavioral ratings. It is found that transformer-based models match brain representation better than a static word embedding baseline, which ignores word order, but fall short of models that encode the structural relations between words. The main contributions of this paper are:

      (1) The construction of a sentence set that disentangles sentence structure from word meaning.

      (2) A comprehensive comparison of neural sentence representations (via fMRI), human behavior, and multiple computational models at the sentence level.

      Strengths:

      (1) The paper evaluates a wide variety of models, including layer-wise analysis for transformers and region-wise analysis in the human brain.

      (2) The stimulus design allows precise dissociation between lexical and sentence-level semantics. The RSA-based approach is empirically sound and intuitive.

      (3) The constructed sentences, along with the fMRI and behavioral data, represent a valuable resource for studying sentence representation.

      Weaknesses:

      (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.

      (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.

      (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.

      (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.

      (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.

      (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.

      (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.

    3. Reviewer #2 (Public review):

      Summary:

      The paper uses fMRI data collected while participants read a set of sentences designed to disentangle syntax from meaning. RSA was performed using voxel activations and a variety of language models. The results show that transformers are inferior to models with explicit syntactic representation in terms of matching brain representations.

      Strengths:

      (1) The study controls for some variables that allow for an investigation of sentence structure in the brain. This controlled setting has an advantage over naturalistic stimuli in targeting more specific linguistic phenomena.

      (2) The study combines fMRI data with behavioral similarity ratings and a variety of language models (static, transformers, graph-based models).

      Weaknesses:

      (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.

      (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.

      (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).

    4. Reviewer #3 (Public review):

      Summary:

      Large Language Models have revolutionized Artificial Intelligence and can now match or surpass human language abilities on many tasks. This has fueled interest in cognitive neuroscience in exposing representational similarities between Language Models and brain recordings of language comprehension. The current study breaks from this mold by: (1) Systematically identifying sentence structures for which brain and Large Language Model representations diverge. (2) Demonstrating that brain representations for these sentences can be better accounted for by a model structured by the semantic roles of words in the sentence. As such, the study may now fuel interest in characterizing how Large Language Models and brain representations differ, which may prompt new, more brain-like language models.

      Strengths:

      (1) This study presents a bold and solid challenge to a literature trend that has touted similarities between Transformer models and human cognition based on representational correlations with brain activity. This challenge is substantiated by identifying sentences for which brain and model representations of sentences diverge and explaining those divergences using models structured by semantic roles/syntax.

      (2) This study conducts a rigorous pre-registered analysis of a comprehensive selection of state-of-the-art Large Language Models on a controlled sentence comprehension fMRI dataset. The analysis is conducted within a Representational Similarity Analysis framework to support similarity comparisons between graph structures and brain activity without needing to vectorize graphs. Transformer models are predicted and shown to diverge from brain representations on subsets of sentences with similar word-level content but different sentence structures.

      (3) The study introduces a 7T fMRI sentence comprehension dataset and accompanying human sentence similarity ratings, which may be a fruitful resource for developing more human-like language models. Unlike other model-based sentence datasets, the relation between grammatical structure and word-level content is controlled, and subsets of sentences for which models and brains diverge are identified.

      Weaknesses:

      (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.

      (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, residual effects could remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.

      (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.

    5. Author response:

      We thank the reviewers for their insightful comments on our manuscript. Here we briefly highlight our responses to several issues raised by reviewers, and also provide a summary of planned changes to be made with the next draft.

      Reviewer 1:

      (1) The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We also report model correlations for each model separately in Fig S2. We will clarify this in our revised manuscript.

      (2) We agree with the reviewer that including a context-free grammar model as a comparison would be informative. We will incorporate this in the revised manuscript.

      (3) The reviewer raises questions about the low correlation between behavioural and brain similarities. While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We will provide additional discussion of the relationship between behavioural judgements and brain data in the revised manuscript.

      (4) The reviewer suggests contrasting our models with a ‘semantic ground truth’, as in our design matrix shown in Fig 1. While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. In particular, sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.

      (5) In the revised draft we will modify the location of Fig. 5 so that it flows better with the text.

      (6) We agree that the discussion of the differences between brain regions could be expanded. We will include this in the revised version of our manuscript. The reviewer questions our inclusion of the simple-average and group-average RSA analysis as they show similar results. We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.

      (7) We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.
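
      Response (1) above distinguishes averaging embeddings across models from averaging per-model correlations. The following is a minimal sketch of the second procedure, not the authors' code: the toy embeddings, the brain dissimilarities, the use of 1 - Pearson r as the dissimilarity measure, and all function names are assumptions made for illustration. Note that the two hypothetical models have different dimensionality, so their embeddings could not be averaged directly, whereas their correlations with the brain data can.

```python
from itertools import combinations
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_dissimilarity(embeddings):
    """Upper triangle of a dissimilarity matrix: 1 - r for each sentence pair."""
    pairs = combinations(range(len(embeddings)), 2)
    return [1.0 - pearson(embeddings[i], embeddings[j]) for i, j in pairs]


def mean_of_model_correlations(models, brain_dissimilarity):
    """Correlate each model with the brain separately, then average the scores."""
    scores = [pearson(pairwise_dissimilarity(m), brain_dissimilarity) for m in models]
    return sum(scores) / len(scores), scores


# Hypothetical toy data: 4 sentences, two models with 3 and 4 features.
model_a = [[1.0, 0.0, 2.0], [0.9, 0.2, 1.8], [0.0, 1.0, 0.5], [0.2, 1.1, 0.3]]
model_b = [[0.5, 1.5, 0.0, 2.0], [0.4, 1.4, 0.3, 1.7],
           [2.0, 0.1, 1.0, 0.2], [1.8, 0.0, 1.2, 0.5]]
# Hypothetical brain dissimilarities for pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
brain = [0.1, 0.9, 0.8, 0.85, 0.95, 0.2]

mean_score, per_model_scores = mean_of_model_correlations([model_a, model_b], brain)
print(per_model_scores, mean_score)
```

      Because each model is compared with the brain data in its own representational space, no alignment between the models' semantic spaces is required.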

      Reviewer 2:

      (1) The reviewer argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs; however, this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest. Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. Most of these comparisons will not be between minimal pairs, but we selected sentences so as to provide a range of semantic similarities (low to high), while also providing for semantic contrasts of theoretical interest (such as the ‘swapped’ and ‘substituted’ sentence pairs). We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies. In the revised manuscript we will explain in more detail how stimuli were chosen and contrast this with minimal pair approaches.

      (2) The reviewer notes that low RSA correlations do not imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning. The reviewer also notes that transformer embeddings are highly anisotropic; however, we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli.

      (3) The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.
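
      Response (3) above refers to controlling for sentence length when computing correlations. The manuscript's exact procedure is not described here, so the following is only a sketch of one common option, a first-order partial correlation, under the assumption that sentence length enters as a single covariate (the absolute length difference for each sentence pair). All values and names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for four sentence pairs (arbitrary units).
length_diff = [3, 1, 3, 1]    # absolute difference in word count per pair
model_dissim = [5, 3, 3, 1]   # model dissimilarity per pair
brain_dissim = [5, 1, 3, 3]   # brain dissimilarity per pair

raw = pearson(model_dissim, brain_dissim)
adjusted = partial_correlation(model_dissim, brain_dissim, length_diff)
print(raw, adjusted)
```

      In this constructed example the raw model-brain correlation is about 0.5, while the length-adjusted value is approximately zero, because the only thing the two toy dissimilarity vectors share is their dependence on length. This is the kind of confound that such a control is meant to rule out.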

      Reviewer 3:

      (1) The reviewer emphasises the need for nuance in our conclusions, given that some of the transformers achieve higher correlations when assessed over the full set of sentences. We agree with this comment, and will modify the discussion section in the revised manuscript to address this point. Having said that, we would like to note one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We will highlight this issue more clearly in our revised manuscript.

      (2) The reviewer notes that despite our existing controls, residual confounds of sentence length may remain. We agree that this is a potential issue, and will add discussion to the revised manuscript. We also will present further supplementary analyses which we believe indicate that sentence length effects do not drive our main results. At the same time, we believe the fact that our results are robust to simultaneously controlling for sentence length and the ‘minimum length effect’ (Fig. S5) indicates they are not primarily driven by sentence length effects.

      (3) The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. An alternative approach of first embedding a graph into a vector and then training an encoding model on the graph embeddings has a similar limitation of being dependent not just on the graph representation, but also on the way it was embedded into a vector and the way the encoding model was trained. Arguably this process is more opaque than similarity methods, since it is unclear to what extent the graph embeddings preserve the logic and properties of a graph-based representation. Further, it is not clear whether there is any single method which can overcome the difficulty of comparing distinct formalisms for representing semantics. The reviewer also highlights how the correlations measured for the syntax model differ greatly depending on whether the Smatch or WWLK similarity metrics are used. We believe this highlights the need for careful examination of commonly used graph similarity metrics, as has been noted in previous research. We will include additional discussion of this issue in our revised manuscript.

    1. eLife Assessment

      This useful study introduces a computational pipeline for designing RNA in situ fluorescence hybridization probes that could improve the sensitivity and specificity of RNA detection in cells. While the approach is novel and the preliminary data suggestive, the evidence supporting a clear advantage over existing probe design strategies is incomplete. The work will be of interest to researchers developing or using molecular tools for imaging RNA in cells.

    2. Reviewer #1 (Public review):

      The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.

      Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.

      Major Comments:

      (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.

      (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.

      (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.

      (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.

      (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.

    3. Reviewer #2 (Public review):

      Summary:

      Hughes et al. present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.

      Strengths:

      (1) The problem is well-articulated in the abstract and the introduction.

      (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.

      (3) The authors compared multiple probe design software packages both computationally and experimentally.

      (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (PaintSHOP and MERFISH panel designer).

      (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.

      Weaknesses:

      (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.

      (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.

      (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.

      (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.

      (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.

      (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.

      (7) All thermodynamic equations are evaluated at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion of the potential impacts of non-steady-state dynamics?
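
      To make the steady-state concern in Weakness (7) concrete, the sketch below compares the equilibrium fraction of target sites bound with the fraction bound after a finite hybridization time. It is not the TrueProbes model: it assumes a single probe and a single site, probe in large excess, a well-mixed solution, and simple reversible binding, and the rate constants are hypothetical placeholders rather than values from the manuscript.

```python
import math


def equilibrium_fraction_bound(probe_conc, kd):
    """Steady-state occupancy of a single site: [P] / ([P] + Kd)."""
    return probe_conc / (probe_conc + kd)


def fraction_bound_at_time(probe_conc, k_on, k_off, t):
    """Occupancy after time t for reversible binding with probe in excess.

    The approach to equilibrium is exponential with rate k_on * [P] + k_off.
    """
    kd = k_off / k_on
    rate = k_on * probe_conc + k_off
    return equilibrium_fraction_bound(probe_conc, kd) * (1.0 - math.exp(-rate * t))


# Hypothetical parameters: k_on in 1/(M*s), k_off in 1/s, concentration in M.
K_ON, K_OFF, PROBE = 1e5, 1e-3, 1e-8   # implies Kd = 10 nM

steady_state = equilibrium_fraction_bound(PROBE, K_OFF / K_ON)
after_one_minute = fraction_bound_at_time(PROBE, K_ON, K_OFF, 60.0)
after_long_time = fraction_bound_at_time(PROBE, K_ON, K_OFF, 1e6)
print(steady_state, after_one_minute, after_long_time)
```

      With these placeholder values the occupancy after one minute is far below the steady-state value, whereas after a very long incubation the two agree. Whether a steady-state model is adequate therefore depends on how the hybridization time compares with 1 / (k_on [P] + k_off) for the probes in question.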

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.

      Strengths:

      A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.

      Weaknesses:

      However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.

    5. Author response:

      Reviewer #1 (Public Review):

      The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.

      Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.

      Major Comments:

      (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.

      This is an important observation. We have already addressed this issue in Figures 3E-G and Supplementary Figure 4E-G, where we plotted the number of OFF-targets for each ON-target probe. If we select longer genes to ensure an equal number of designed probes with strong signals, we will still end up with the same number of ON-target probes. Consequently, Figures 3B-D and 3E-G would show similar trends, albeit with different values on the y-axis. Additionally, we will conduct an analysis using Stellaris at its highest probe design stringency setting to compare the software under its strictest design conditions. Additional experiments are outside the scope of the current manuscript.

      (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.

      Three biological replicates were utilized for the ARF4 experiments. As stated in the original submission, the average data from all three replicates is presented in Figure 4, while the data for each individual replicate can be found in Figure S5. Statistical analyses were conducted for both the pooled data in Figure 4 and the individual data in Figure S5. The results of all statistical calculations are detailed in Supplemental Table 1. We will update the text to clearly indicate the number of biological replicates and the outcomes of the statistical analysis.

      (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.

      Thank you for your suggestion. Currently, we lack the expertise in our lab to conduct such experiments, so they are beyond the scope of this manuscript. However, we will create additional supplementary figures to demonstrate that the likelihood of false positives is low, based on the assumption that current publicly available BLAST algorithms, genome annotations, and reference transcription expression data are accurate.

      We will include a comparison in our supplementary materials showing the off-target RNA that can bind the highest number of probes simultaneously for each software. Additionally, we will perform a correlation analysis to illustrate the relationship between spot intensity for different software and the number of probes they design. This will help us estimate how the number of probes bound to RNA correlates with expected spot intensity ranges.

      Using this information, along with autofluorescence background intensity measurements from no-probe controls, we will estimate the minimum number of probes that need to bind to targets to be detected as single spots. If this minimum is higher than the maximum number of simultaneous off-target probe bindings, we anticipate that the detected spot signal will primarily reflect ARF4 rather than other transcripts.
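
      The detectability argument above can be sketched numerically. This is only an illustration, assuming a linear relation between the number of bound probes and spot intensity and a detection threshold of k standard deviations above the no-probe background. The function names and all numbers are hypothetical and are not taken from the manuscript or from TrueProbes.

```python
import math


def min_probes_for_detection(bg_sd, intensity_per_probe, k=3.0):
    """Smallest number of bound probes whose added signal exceeds k * bg_sd.

    Hypothetical linear model: every bound probe adds the same intensity on top
    of the mean autofluorescence background measured in no-probe controls.
    """
    if bg_sd < 0 or intensity_per_probe <= 0:
        raise ValueError("bg_sd must be >= 0 and intensity_per_probe must be > 0")
    # A spot is called when n * intensity_per_probe > k * bg_sd (strict inequality).
    return math.floor(k * bg_sd / intensity_per_probe) + 1


def off_target_spots_unlikely(n_min, max_simultaneous_off_target):
    """True when no off-target RNA can bind enough probes to reach n_min."""
    return max_simultaneous_off_target < n_min


# Placeholder values for illustration only.
n_min = min_probes_for_detection(bg_sd=20.0, intensity_per_probe=15.0)
print(n_min, off_target_spots_unlikely(n_min, max_simultaneous_off_target=3))
```

      If the relation between probe number and intensity turns out to be non-linear, the threshold calculation would need to be replaced by the fitted relation from the correlation analysis.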

      (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.

      Thank you for pointing this out. We will revise the manuscript to clarify: "We did not include RNA secondary and tertiary structures in the model because the use of formamide in RNA-FISH experiments denatures these structures, allowing oligonucleotides to access the RNA."

      (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.

      Thank you for highlighting the literature that supports this limitation. We will include Buxbaum et al. (2014, Science) and additional studies that discuss how RNA-protein interactions can affect RNA-FISH experiments.

      Reviewer #2 (Public review):

      Summary:

      Hughes et al present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.

      We appreciate the reviewer’s thoughtful feedback. We will revise the text to ensure that all claims are backed by computational or experimental evidence. For claims that do not have supporting results, we will relocate them to the discussion section as potential future extensions. Since our probe design is open access, both we and the community can further develop our codes as needed.

      Strengths:

      (1) The problem is well-articulated in the abstract and the introduction.

      (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.

      (3) The authors compared multiple probe design software packages both computationally and experimentally.

      (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (Paintshop and MERFISH panel designer).

      (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.

      We would like to thank the reviewer for pointing out the strengths of the manuscript.

      Weaknesses:

      (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.

      Thank you for acknowledging the clarity of the abstract and introduction. We will revise the abstract to provide more specific details on how TrueProbes outperforms other software. Additionally, we will include specific computational and experimental metrics that demonstrate TrueProbes' improved performance compared to other software.

      (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.

      In Figure 3, we aim to convey two main points. 

      The first point is to compare the number of ON-target probes designed by each software using their most stringent design criteria (Figure 3A). Currently, we are using a medium strict design criterion for Stellaris (level 3). As shown in the new supplementary figure XX, when we apply the most stringent design criteria for Stellaris (level 5), the number of ON-target probes decreases to XX probes. This clearly indicates that, based on theoretical calculations, TrueProbes can design more probes than any of its competitors.

      The second point is to compare the number of OFF-targets produced by each probe design. To illustrate this, we used two different metrics. In Figures 3B-D, we compare the total number of probes bound to OFF-target RNA. However, since each software generates a different number of ON-target probes, the number of OFF-targets may vary simply due to the differences in ON-target probe counts. Therefore, we introduced a second metric to compare OFF-targets. In Figures 3E-G, we present the number of OFF-targets normalized by the number of ON-targets. Using this metric, TrueProbes shows the lowest number of OFF-targets. We will update the manuscript to clarify this point.
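
      The difference between the two metrics can be illustrated with a short sketch. The design labels and counts below are hypothetical placeholders, not values from Figure 3, and the function is an assumption about how such a normalization could be computed rather than the TrueProbes implementation.

```python
def off_targets_per_on_target_probe(total_off_target_bindings, n_on_target_probes):
    """Normalize the total OFF-target count by the number of ON-target probes."""
    if n_on_target_probes <= 0:
        raise ValueError("a design needs at least one ON-target probe")
    return total_off_target_bindings / n_on_target_probes


# Hypothetical designs: (total OFF-target bindings, number of ON-target probes).
designs = {"design_A": (120, 40), "design_B": (90, 20)}

normalized = {
    name: off_targets_per_on_target_probe(off, on)
    for name, (off, on) in designs.items()
}
print(normalized)
```

      In this made-up case, design_B has fewer OFF-target bindings in total (90 versus 120) but more per ON-target probe (4.5 versus 3.0), which is why the ranking of designs can differ between the two metrics.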

      Regarding the experiments and their comparison to theoretical calculations: The theoretical calculations consider only the reference DNA and RNA genomes along with the oligonucleotide sequences for the probes. We then use a thermodynamic model to identify ON- and OFF-targets. Thus, these theoretical calculations represent an upper bound on the maximum possible number of ON-targets and the minimum number of OFF-targets. All other design software evaluated in this manuscript relies on the same or less reference data and makes certain assumptions. None of these methods quantitatively compare their computational designs with experimental results; they simply design probes based on unverified assumptions, conduct experiments, and present spot data to conclude that their probe designs are effective.

      We will update the manuscript to clarify the goals of the theoretical model and its relationship to the experiments. Future work will be necessary to enhance our theoretical model to fully account for additional aspects of RNA-FISH experiments (e.g., formaldehyde crosslinking, hybridization conditions, washing steps) to better predict the experimental data shown in Figure 4. We will also adjust our claims to accurately reflect the current capabilities of our theoretical framework and its relation to experimental outcomes.

      (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.

      The predictions in Figure 3 regarding the number of probe off-target binding events, based on reference gene expression data, do not necessarily encompass all the information required to predict RNA-FISH signal intensity. Therefore, these predictions should not be expected to translate directly into the experimental results shown in Figure 4, particularly concerning the background signal.

      While our software aims to minimize off-target probe binding, this does not automatically lead to a reduction in off-target background signal. Numerous other factors influence the spot background and overall signal-to-noise ratio (SNR) performance, beyond just probe-target binding interactions. Although we strive to minimize off-target background through probe binding, this approach is not designed to directly predict the SNR. Extending the computational analysis of probe binding dynamics to RNA-FISH signal intensity dynamics is beyond the scope of this study.

      We have revised our text to clearly separate computational results from experimental results into two distinct sections. We will use different terminology to describe the outcomes of computational performance versus experimental performance, reducing potential confusion between these two aspects. Additionally, we will clarify our conceptual overview in Figure 1 regarding traditional probe design limitations related to sensitivity and specificity. We will specify how the signal from the number of probes bound to ON-target RNA, relative to those bound to OFF-targets and cellular autofluorescence, translates—either linearly or non-linearly—into the signal-to-noise ratio.

      (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.

      Thank you for highlighting these valuable experiments. Currently, our lab lacks the expertise to generate tissue samples beyond culturing cells. Additionally, implementing a two-color probe design in tissues containing different cell types with unknown expression levels presents further challenges. Due to these limitations, designing and conducting two-color experiments in tissue samples is beyond the scope of the current manuscript, but we plan to pursue this in the future.

      (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.

      Thank you for bringing this to our attention. TrueProbes is currently designed and tested specifically for primary smRNA-FISH probes. Our focus is on demonstrating a new approach to designing these probes without the added complexities of secondary probes and multiplexing. Future work will expand on this foundation to incorporate secondary probe detection and transcript multiplexing.

      (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.

      The supplemental information of the article will be updated to include figures that illustrate predictions for each capability currently offered by TrueProbes, along with the scripts used to generate these predictions. Any capabilities that do not have corresponding scripts will be removed from this section and instead referred to as potential improvements or future additions to the TrueProbes framework in the discussion section.

      (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?

      Thermodynamic equations are calculated at steady state because RNA-FISH hybridization reactions typically last from eight to twenty hours. This duration allows probes adequate time to localize to their targets and reach binding equilibrium, based on current estimates of DNA oligonucleotide association and dissociation rate constants. We will address the potential violation of the well-mixed assumption in the assumptions and limitations section, specifically discussing how RNA localization can affect the spatial distribution of both on-target and off-target probes within cells, which may disrupt the well-mixed condition.

      Low molecule numbers are not a significant concern, as probe DNA oligonucleotide concentrations in RNA-FISH protocols are much higher than the number of transcripts present in cells, by several orders of magnitude.

      The assumptions and limitations section will be revised to clearly state: “Probe hybridization reactions were computed at steady state because most RNA-FISH protocols utilize probe hybridization incubation steps lasting over eight hours, which should provide sufficient time to reach equilibrium based on current estimates of forward and reverse reaction rate constants. Predictions from the equilibrium model may be less accurate for RNA-FISH experiments with shorter hybridization times, where non-steady state dynamics can result in different transient outcomes depending on the duration of hybridization.”
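
      The time-to-equilibrium argument can be illustrated with a textbook pseudo-first-order binding model, which applies when probe is in large excess over target. This is a sketch under that assumption only; the rate constants and probe concentration below are placeholders chosen for illustration, not measured values and not parameters from the TrueProbes model.

```python
import math


def fraction_bound(t_s, k_on, k_off, probe_conc):
    """Fraction of target sites bound after t_s seconds.

    Pseudo-first-order kinetics: probe_conc (M) is treated as constant because
    probe is assumed to be in large excess. k_on is in 1/(M*s), k_off in 1/s.
    """
    k_obs = k_on * probe_conc + k_off      # observed relaxation rate (1/s)
    f_eq = k_on * probe_conc / k_obs       # equilibrium (steady-state) fraction
    return f_eq * (1.0 - math.exp(-k_obs * t_s))


# Placeholder parameters for illustration only.
K_ON, K_OFF, PROBE = 1e4, 1e-5, 1e-8
f_eq = K_ON * PROBE / (K_ON * PROBE + K_OFF)

for hours in (1, 8, 20):
    f = fraction_bound(hours * 3600, K_ON, K_OFF, PROBE)
    print(f"{hours:>2} h: {f / f_eq:.2f} of the equilibrium value")
```

      With slower association or lower probe concentration than assumed here, the observed rate drops and a short incubation would leave the system further from equilibrium, which is the situation the revised limitations text cautions about.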

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.

      Strengths:

      A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.

      Weaknesses:

      However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.

      We will revise our text to make our claims more specific and clearer, avoiding overgeneralizations and ensuring that all claims are adequately supported by the data we present.

    1. eLife Assessment

      This set of experiments provides a valuable finding regarding the need for prior inhibitory training to recruit the infralimbic cortex in extinction learning. The multiple clever behavioral designs supply converging lines of evidence in a compelling manner, but several issues, such as the group sizes and appropriate analysis of data, render the overall strength of support incomplete. With these issues resolved, this manuscript will be of interest to behavioral neuroscientists, especially those interested in learning & memory and/or cortical function.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      (2) Extinction retrieval test conflates processes.

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      (4) Incomplete presentation of conditioning data.

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      Impact:

      The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      (2) Very clear representation of groups and experimental design for each figure.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

    4. Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, this manuscript aimed to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABA-A receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. This rationale will be incorporated into the revised manuscript, which will also make reference to the need to identify the relative contributions of the various neuronal populations within the IL.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript will provide the trial-by-trial performance during the post-extinction retrieval tests and discuss this issue.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. However, we acknowledge that the unexpected interactions deserve further discussion, and this will be incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript.  

      (4) Incomplete presentation of conditioning data.

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data set in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error will be corrected in the revised manuscript.

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      As noted above, the revised manuscript will show that the interpretations of the main findings stand whether or not the test data confound retrieval with additional extinction learning. The revised manuscript will also clarify the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this will also be incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that, as we will acknowledge, is likely to engage several neuronal populations within the IL. Adequate statements on these matters will be included in the revised manuscript.

      Impact:

      The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.

      As noted in our responses, the interpretations of the data presented remain identical whether the test data conflate extinction retrieval with additional extinction learning or not. Although we agree that it is important to establish the role of specific IL neuronal populations in extinction learning, this was beyond the scope of the manuscript and the findings reported remain valuable to our understanding of inhibitory encoding within the IL.
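      The analytic point at issue in Weakness (2), averaging test freezing over all 8 tones versus over only the early tones, can be illustrated with a minimal sketch. The function name `block_means`, the `early_n` parameter, and the freezing values are hypothetical and chosen for illustration only; they are not data or code from the manuscript.

```python
from statistics import mean

def block_means(freezing_by_tone, early_n=2):
    """Return (mean over the first early_n tones, mean over all tones) for one animal."""
    if not 1 <= early_n <= len(freezing_by_tone):
        raise ValueError("early_n must be between 1 and the number of tones")
    early = mean(freezing_by_tone[:early_n])
    overall = mean(freezing_by_tone)
    return early, overall

# Hypothetical percent-freezing values across 8 test tones (NOT data from the study).
animal_a = [80, 70, 50, 40, 30, 20, 20, 10]  # high at first, declines within the session
animal_b = [40, 40, 40, 40, 40, 40, 40, 40]  # flat across the session

a_early, a_all = block_means(animal_a, early_n=2)
b_early, b_all = block_means(animal_b, early_n=2)
print(f"A: early={a_early}, all tones={a_all}")
print(f"B: early={b_early}, all tones={b_all}")
```

      In this toy case the two animals have the same 8-tone average but different early-tone averages, which is why reporting trial-by-trial performance, as the authors propose, lets readers evaluate both summaries.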

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      As noted (see response to Reviewer 1), efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following post-mortem analyses. This will be made explicit in the revised manuscript. Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations will be addressed in the revised manuscript.

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize; as noted above, we incorrectly labeled the X axis for the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This will be clarified in the methods section of the revised manuscript. The labeling errors on the Figures will be corrected.

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This will be provided in the revised manuscript. Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.  

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone-alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This issue will be clarified and explained in the revised version of the manuscript.

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the revised manuscript will include a discussion of the broader implications of the findings regarding inhibitory brain circuitry and will acknowledge the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation is required to justify the need for multiple days of backward training. This will be provided in the revised manuscript.

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The revised manuscript will make an effort to condense the discussion and focus on broader implications for the literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. eLife Assessment

      This important study presents convincing findings on creating an exhaustive library of new enhancer-AAVs targeting astrocytes and oligodendrocytes with high potential for both basic and translational work, which will be of value to a large and growing community. However, the outdated description of glial biology in the Introduction, the overstated claims of utility in the Conclusion, and the loose stringency in the criteria used to assemble the library diminish the strengths of the claims. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      The goal of this study was to generate a library of new enhancer-driven AAVs in order to selectively and efficiently target astrocytes and oligodendrocytes in rodents. The implied criteria are that such viral vectors should have high specificity for the intended cell type and effectively express in all astrocytes/oligos in the brain or, alternatively, be specific for defined brain regions, layers, or subtypes of astrocytes/oligos. In addition, they should be compatible with intravenous retro-orbital delivery to facilitate experimentation and brain-wide targeting (i.e., show organ specificity and high efficiency in the brain). Ideally, these new AAVs would also maintain their characteristics across disease contexts and show applicability in non-human primates. Tools with such characteristics are generally lacking in studying glial cells and would be extremely useful to scale up and accelerate glial research, allowing targeting of astrocytes/oligos with distinct molecular identity and intersectional strategies.

      At present, however, none of the enhancer-AAVs presented in the study seems to meet this combination of criteria, at least not with the level of stringency typically expected in the field. The main reason is that, in its current form, the study does not present one candidate AAV iteratively improved to meet all these criteria; instead, it presents a catalogue of new AAVs with various degrees of specificity, completeness, and mixed characteristics. Therefore, their utility should be interpreted cautiously. Moreover, the way specificity and completeness are intermixed in the analysis makes it difficult to evaluate the actual utility of any given AAV. The study might have been strengthened by focusing on a small set of the most promising candidates (i.e., AiE0890m_3x2C) and validating them thoroughly for expression specificity, completeness, effective cargo expression, ability to allow specific pan-astrocyte or astrocyte-subtype targeting in vivo, and preserved properties in NHPs and in disease, as this would encourage their adoption by the community. Currently, too many AAVs are assessed inconsistently against the desired criteria, with none being evaluated through and through.

      The impact of the catalogue is also greatly diminished by the fact that a suite of AAVs with outstanding specificity and efficiency is already available for the study of astrocytes (e.g., 4x6T AAVs) and was not utilized as a standard to benchmark the new library, making it difficult to appreciate the relative benefits of the new AAVs. The inclusion of expression data in NHPs is very significant, but benchmarking against established AAVs would also be needed to fully appreciate their value.

      Importantly, readers should also be aware that the study seems noticeably limited in its literacy with glial biology. The introduction and discussion frame the field in a way that seems outdated, creating the impression that the diverse roles of glia in health and disease have not yet been studied, which may inadvertently be perceived as dismissive and stigmatizing.

      In summary, the paper introduces potentially useful viral tools and lays the foundations for future multiplexed targeting of distinct glial cell subpopulations in rodents and in non-human primates, which are extremely important directions. Some of the regionally restricted or even sparsely expressed AAVs may prove valuable in enabling subpopulation-specific targeting or molecular profiling strategies, but currently lack full benchmarking. At present, the promises over the utility of the new tools seem overstated, and the library may not yet represent an actionable resource for targeting astrocytes and oligodendrocytes.

    3. Reviewer #2 (Public review):

      Enhancer elements are regulatory DNA sequences that are capable of driving specific expression patterns. As these elements are generally short and context-independent, enhancers can be used in expression vectors (e.g., packaged in an adeno-associated virus, AAV) to limit expression to target cell populations. This approach was identified as a major strategy for cell-type-specific manipulation in the brain and has been pursued by both standard research studies as well as large-scale efforts led by the BRAIN Initiative. This manuscript describes a major effort to generate enhancer-AAVs targeting astrocytes and oligodendrocytes orchestrated by a large research team led by the Allen Institute for Brain Science. This manuscript parallels other recent publications describing sets of enhancer-AAVs, following rigorous, similar methods with relatively broad testing and application.

      To identify and screen candidate enhancers, the scientists prioritized candidates via analysis of single-nucleus accessibility and methylation datasets (i.e., snATAC-seq) and tested them in mice. The scientists prioritized candidate enhancers that exhibited specificity of accessibility in the target cell type. Following selection, the scientists cloned the candidate sequences into AAV vectors with a minimal promoter and reporter gene, packaged the virus, delivered it to the mouse brain, and screened for activity based on reporter expression. Candidates that passed initial screening were further characterized via imaging and sorting, followed by single-cell RNA-seq. This process had around a 50% success rate and yielded 25 astrocyte and 21 oligodendrocyte enhancer-AAVs with the targeted cell-type-specific expression patterns.

      The scientists went on to test for subtype-specific activity patterns, finding wide diversity in astrocyte activities across sub-populations and conversely, homogenous oligodendrocyte activation. They optimized a few of these via concatenating the enhancer core sequence to increase expression levels of the reporter gene and showed strong specificity and completeness of cell targeting for a set of these enhancer-AAVs. Following characterization and validation, they then deployed these enhancer-AAVs in a number of demonstration applications to show the utility for basic and translational science. All the constructs developed here are available for public use via Addgene, ensuring that these new tools can be used by other researchers.

      There really are no obvious weaknesses in the work presented here, from the generation of the enhancer-AAVs to use in sophisticated validation studies. The enhancer-AAV testing is rigorous and provides critical information necessary for other scientists to select and use these constructs. The applications demonstrate the power of enhancer-AAV approaches. The toolbox presented here may not enable specific targeting of all relevant cellular subtypes or activity states for astrocytes and oligodendrocytes, and future work will be needed to fully understand the activity of the enhancers, identity of the target cell types, and context-dependent utility of these constructs. However, the set of enhancer-AAVs developed here should be transformative for researchers working on accessing and manipulating these cell types and have a major impact on the field.

    1. eLife Assessment

      This paper presents a collection of analyses relating structure and function in the whole-brain Drosophila EM connectome and whole-brain calcium imaging data. The linkage of detailed anatomical structure with population activity is of broad interest in circuit neuroscience in light of increasingly detailed brain maps, but the analysis methods used made the evidence incomplete. The conclusions are useful for specific network observations, but a more thorough analysis of the anatomical and functional data is needed to support the overall claims.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is a nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of sc (the logarithm of its edge weights) and fc matrices with one another. The logarithmic transformation yields a value of negative infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically versus uniformly random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.
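      The concern in points (3) and (4) above can be made concrete with a small sketch. It assumes NumPy and uses entirely synthetic matrices; the function names (`scfc_corr_imputed`, `scfc_corr_connected`, `edge_density`) are hypothetical and do not reproduce the authors' pipeline. The log of a zero weight is negative infinity, and replacing it with 0 also makes an absent connection indistinguishable from a connection of weight 1, since log(1) = 0.

```python
import numpy as np

def offdiag(m):
    """Return the off-diagonal entries of a square matrix as a 1-D array."""
    mask = ~np.eye(m.shape[0], dtype=bool)
    return m[mask]

def scfc_corr_imputed(sc, fc):
    """Pearson r between log(SC) and FC over all pairs, with log(0) replaced by 0."""
    s, f = offdiag(sc).astype(float), offdiag(fc)
    log_s = np.zeros_like(s)
    np.log(s, out=log_s, where=s > 0)  # entries with s == 0 keep the imputed value 0
    return np.corrcoef(log_s, f)[0, 1]

def scfc_corr_connected(sc, fc):
    """Pearson r between log(SC) and FC restricted to structurally connected pairs."""
    s, f = offdiag(sc).astype(float), offdiag(fc)
    keep = s > 0
    return np.corrcoef(np.log(s[keep]), f[keep])[0, 1]

def edge_density(sc):
    """Fraction of off-diagonal entries that are non-zero."""
    return float(np.mean(offdiag(sc) > 0))

# Synthetic example (NOT real connectome data): sparse SC, FC loosely tied to log weight.
rng = np.random.default_rng(0)
n = 40
sc = rng.integers(1, 1000, size=(n, n)) * (rng.random((n, n)) < 0.1)
fc = 0.1 * np.log1p(sc) + rng.normal(0.0, 0.2, size=(n, n))

print("density:", edge_density(sc))
print("r, zeros imputed as 0:", scfc_corr_imputed(sc, fc))
print("r, connected pairs only:", scfc_corr_connected(sc, fc))
```

      Reporting both coefficients, together with the edge density of each matrix being compared, would show how much of a reported sc/fc correlation depends on the handling of zero entries.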

    3. Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      (3) More concise terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.
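
      The normalisation mentioned in point (7), expressing a connection as a fraction of the target's inputs rather than as a raw synapse count, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `input_fraction` and the toy matrices are hypothetical, and the matrix convention (rows are sources, columns are targets) is an assumption.

```python
import numpy as np

def input_fraction(counts):
    """Convert a synapse-count matrix into input fractions.

    counts[i, j] is the number of synapses from source i onto target j
    (assumed convention). Each column is divided by that target's total
    input count; targets with no inputs stay as all-zero columns.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Hypothetical toy data: the same circuit at two reconstruction completeness levels.
dataset_a = np.array([[0, 40, 10],
                      [20, 0, 30],
                      [0, 0, 0]])
dataset_b = dataset_a // 2  # half as many synapses recovered everywhere

frac_a = input_fraction(dataset_a)
frac_b = input_fraction(dataset_b)
```

      In this toy case the raw counts differ by a factor of two while the input fractions are identical, which is why the fractional weight is less sensitive to uniform differences in reconstruction status. It does not correct for region-specific differences in proofreading.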

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods: the fact that similar processing steps can be used on time-series data does not make them surprisingly similar and does not, in my view, constitute evidence of "similar design concepts."

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec et al., PNAS 2024).

    1. eLife Assessment

      This study by Lapao et al. uncovers a novel role for the Rab27A effector SYTL5 in regulating mitochondrial function and mitophagy under hypoxic conditions. Using a range of imaging and functional assays, the authors demonstrate that SYTL5 localizes to mitochondria in a Rab27A-dependent manner and impacts mitochondrial respiration and metabolic reprogramming. While the findings are solid and valuable in the area of cancer biology, further mechanistic clarity and improved imaging would strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Ana Lapao et al. investigated the roles of the Rab27 effector SYTL5 in cellular membrane trafficking pathways. The authors found that SYTL5 localizes to mitochondria in a Rab27A-dependent manner. They demonstrated that SYTL5-Rab27A positive vesicles containing mitochondrial material are formed under hypoxic conditions, and thus they speculate that SYTL5 and Rab27A play roles in mitophagy. They also found that both SYTL5 and Rab27A are important for normal mitochondrial respiration. Cells lacking SYTL5 undergo a shift from mitochondrial oxygen consumption to glycolysis, a common process in cancer cells known as the Warburg effect. Based on a cancer patient database, the authors noticed that low SYTL5 expression is related to reduced survival for adrenocortical carcinoma patients, indicating that SYTL5 could be a negative regulator of the Warburg effect and potentially of tumorigenesis.

      Strengths:

      The authors take advantage of multiple techniques and novel methods to perform the experiments.

      (1) Live-cell imaging revealed that stably inducible expression of SYTL5 co-localized with filamentous structures positive for mitochondria. This result was further confirmed by using correlative light and EM (CLEM) analysis and western blotting from purified mitochondrial fraction.

      (2) In order to investigate whether SYTL5 and RAB27A are required for mitophagy in hypoxic conditions, two established mitophagy reporter U2OS cell lines were used to analyze the autophagic flux.

      Weaknesses:

      This study revealed a potential function of SYTL5 in mitophagy and mitochondrial metabolism. However, the mechanistic evidence that establishes the relationship between SYTL5/Rab27A and mitophagy is insufficient. The involvement of SYTL5 in ACC needs more investigation. Furthermore, images and results supporting the major conclusions need to be improved.

      Comments on revisions: The authors did not revise the paper as suggested.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide convincing evidence that Rab27 and SYTL5 work together to regulate mitochondrial activity and homeostasis.

      Strengths:

      The development of models which allow the function to be dissected, and the rigorous approach and testing of mitochondrial activity.

      This work is carefully done, and supports the importance of the roles of Rab27A and SYTL5.

    4. Reviewer #3 (Public review):

      In the manuscript by Lapao et al., the authors uncover a role for the RAB27A effector protein SYTL5 in regulating mitochondrial function and apparent selective turnover of mitochondrial components. The authors find that SYTL5 localizes to mitochondria in a RAB27A dependent way and that loss of SYTL5 (or RAB27A) impairs lysosomal turnover of MTCO1 (but not a matrix-based reporter/other mitochondrial proteins). The authors go on to show that loss of SYTL5 impacts mitochondrial respiration and ECAR and as such may influence the Warburg effect and tumorigenesis. Of relevance here, the authors go on to show that SYTL5 expression is reduced in adrenocortical carcinomas and this correlates with reduced survival rates.

      As previously reviewed, this is a very intriguing body of work and reveals a new role for SYTL5/RAB27A at the mitochondria. Unfortunately, it appears that SYTL5 is a challenging protein to detect endogenously and the authors' cell lines "comprise a heterogenous pool with high variability", which means that a lot of my original concerns remain. It is also still not clear if the conventional autophagy machinery is required for this pathway, especially if SYTL5/RAB27A mitochondrial recruitment is upstream of this. Hopefully, in future work, the authors (and/or others) will be able to address this and build on the mechanisms of this interesting and potentially important pathway.

    1. eLife Assessment

      This work provides one of the first important attempts to look at Drosophila immune responses against bacterial, viral, and fungal pathogens in a way that combines the roles of four major arms in immunity (Imd signaling, Toll signaling, phagocytosis, and melanization) rather than studying them separately. The findings are compelling and the tools provided can be used as they are, or built upon, in various contexts.

    2. Reviewer #1 (Public review):

      Summary:

      The innate immune system serves as the first line of defense against invading pathogens. Four major immune-specific modules (the Toll pathway, the Imd pathway, melanization, and phagocytosis) play critical roles in orchestrating the immune response. Traditionally, most studies have focused on the function of individual modules in isolation. However, in recent years, it has become increasingly evident that effective immune defense requires intricate interactions among these pathways.

      Despite this growing recognition, the precise roles, timing, and interconnections of these immune modules remain poorly understood. Moreover, addressing these questions represents a major scientific undertaking.

      Strengths:

      In this manuscript, Ryckebusch et al. systematically evaluate both the individual and combined contributions of these four immune modules to host defense against a range of pathogens. Their findings significantly enhance our understanding of the layered architecture of innate immunity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors take a holistic view of Drosophila immunity by selecting four major components of fly immunity often studied separately (Toll signaling, Imd signaling, phagocytosis and melanization), and studying their combinatory effects on the efficiency of the immune response. They achieve this by using fly lines mutant for one of these components, or modules, as well as for a combination of them, and testing the survival of these flies upon infection with a plethora of pathogens (bacterial, viral and fungal).

      Strengths:

      It is clear that this manuscript has required a large amount of hands-on work, considering the number of pathogens, mutations and timepoints tested. In my opinion, this work is a very welcome addition to the literature on fly immune responses, which obviously do not occur one type of response at a time, but in parallel or in sequence, and are often interconnected. I find that the major strength of this work is the overall concept, which is made possible by the mutations designed to target the specific immune function of each module, without effects on other functions. I believe that the combinatory mutants will be of use for the fly community and enable further studies of the interplay of these components of the immune response in various settings.

      To control for the effects arising from the genetic variation other than the intended mutations, the mutants have been backcrossed into a widely used, isogenized Drosophila strain called w1118. Therefore, the differences accounted for by the genotype are controlled.

      I also appreciate that the authors have investigated the two possible ways of dealing with an infection: tolerance and resistance, and how the modules play into those.

      Weaknesses:

      While controlling for the background effects is vital, the w1118 background is problematic (an issue not limited to this manuscript) because of the wide effects of the white mutation on several phenotypes (also other than eye color/eyesight). It is a possibility that the mutation influences the functionality of the immune response components. I acknowledge that it is not reasonable to ask for data in different backgrounds better representing a "wild type" fly, but I think this matter should be brought up and discussed.

      The whole study has been conducted on male flies. Immune responses show quite extensive sex-specific variation across a variety of species studied, also in the fly. But the reasons for this variation are not fully understood. Therefore, I suggest that the authors would conduct a subset of experiments on female flies to see if the findings apply to both sexes, especially the infection-specificity of the module combinations.

      Comments on the revised manuscript:

      I appreciate the authors' responses to the points I raised and the additional work they have conducted. The authors have now discussed the possible background effect and added an experiment on female flies showing that the module function is applicable to both sexes.

    1. eLife Assessment

      This potentially valuable study presents claims of evidence for coordinated membrane potential oscillations in E. coli biofilms that can be linked to a putative K+ channel and that may serve to enhance photo-protection. The finding of waves of membrane potential would be of interest to a wide audience from molecular biology to microbiology and physical biology. Unfortunately, a major issue is that it is unclear whether the dye used can act as a Nernstian membrane potential dye in E. coli. The arguments of the authors, who largely ignore previously published contradictory evidence, are not adequate in that they do not engage with the fact that the dye behaves in their hands differently than in the hands of others. In addition, the lack of proper validation of the experimental method including key control experiments leaves the evidence incomplete.

    2. Reviewer #1 (Public Review):

      (1) Significance of the findings:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      (2) Strengths of the manuscript:

      - The authors report original data.
      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.
      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.
      - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electric signal.
      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative voltage-gated K+ ion channel (Kch channel): enhancing survival under photo-toxic conditions.

      (3) Weakness:

      - Contrary to what is stated in the abstract, the group of B. Maier has already reported collective electrical oscillations in the Gram-negative bacterium Neisseria gonorrhoeae (Hennes et al., PLoS Biol, 2023).
      - The data presented in the manuscript are not sufficient to conclude on the photo-protective role of the Kch channel. The authors should perform the appropriate control experiments related to Fig 4D,E, i.e. reproduce these experiments without ThT to rule out possible photo-conversion effects on ThT that would modify its toxicity. In addition, it looks like the data reported in Fig 4E are extracted from Fig 4D. If this is indeed the case, it would be more conclusive to report the percentage of PI-positive cells in the population for each condition. This percentage should be calculated independently for each replicate. The authors should then report the average value and standard deviation of the percentage of dead cells for each condition.
      - Although Fig 4A clearly shows that light stimulation has an influence on the dynamics of the ThT signal in the biofilm, it is important to rule out possible contributions of other environmental variations that occur when the flow is stopped at the onset of light stimulation. I understand that for technical reasons, the flow of fresh medium must be stopped for the sake of imaging. Therefore, I suggest performing control experiments in which the flow is stopped at different time intervals before image acquisition (30 min or 1 h before). If there is no significant contribution from environmental variations due to medium perfusion arrest, the dynamics of the ThT signal must be unchanged regardless of the delay between flow stop and the start of light stimulation.
      - To clarify the role of K+ in the habituation response, I suggest using the ionophore valinomycin at sub-inhibitory concentrations (5 or 10µM). It should abolish the habituation response. In addition, the Kch complementation experiment exhibits a sharp drop after the first peak but on a single point. It would be more convincing to increase the temporal resolution (from 1 min to 10 s) to show that there are indeed a first and a second peak. Finally, the high concentration (100µM) of CCCP used in this study completely inhibits cell activity. Therefore, it is not surprising that no ThT dynamics were observed upon light stimulation at such a concentration of CCCP.
      - Since the TMRM signal exhibits a linear increase after the first response peak (Supp Fig 1D), I recommend tempering the statement at line 78.
      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. At a minimum, I recommend plotting the spatio-temporal diagram of the ThT intensity profile averaged along the azimuthal direction in the biofilm. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel: I have plotted the spatio-temporal diagram for Video S3 and no electrical propagation is evident at the second peak. In addition, the authors should provide technical details of how R^2(t) is measured in the first regime (Fig 7E).
      - In the series of images presented in supplementary Figure 4A, no wavefront is apparent. Although the microscopy technique used in this figure differs from that of other images (as in Fig 2), the wavefront should still be present. In addition, there is no second peak in the confocal images either (Supp Fig 4B).
      - Many important technical details are missing (e.g. biofilm size, R^2, curvature and 445nm irradiance measurements). The description of how these quantities are measured should be detailed in the Material & Methods section.
      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. Since the model is made for single cells, the curve obtained by the model should be compared with the average curve presented in Fig 1B (i.e. single cell experiments).
      - For clarity, I suggest indicating on the panels whether the experiments concern single cells or biofilms. Finally, please provide bright-field images associated with the ThT images to locate bacteria.
      - In Fig 7B, the plateau is higher in the simulations than in the biofilm experiments. The authors should add a comment in the paper to explain this discrepancy.
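
      The per-replicate reporting requested above for the PI-positive cells (percentage computed independently for each replicate, then mean and standard deviation per condition) can be sketched as follows. This is only an illustration: the function names and the cell counts are hypothetical and do not come from the manuscript.

```python
from statistics import mean, stdev

def percent_positive(positive, total):
    """Percentage of PI-positive (dead) cells in one replicate."""
    if total <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * positive / total

def summarize(replicates):
    """Mean and sample standard deviation of the per-replicate percentages.

    replicates: list of (PI-positive count, total count) pairs,
    one pair per independent replicate (at least two are needed).
    """
    percentages = [percent_positive(p, t) for p, t in replicates]
    return mean(percentages), stdev(percentages)

# Hypothetical counts for one condition, three independent replicates.
example = [(12, 200), (18, 240), (9, 180)]
avg, sd = summarize(example)
```

      Computing the percentage within each replicate before averaging keeps replicates with many cells from dominating the estimate, which pooling all cells into a single ratio would not.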

    3. Reviewer #2 (Public Review):

      The authors use ThT dye as a Nernstian potential dye in E. coli. Quantitative measurements of membrane potential using any cationic indicator dye are based on the equilibration of the dye across the membrane according to Boltzmann's law.
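
      For a dye that equilibrates passively, the Boltzmann relation referred to here ties the inside/outside concentration ratio to the membrane potential. A minimal sketch, assuming a monovalent cationic dye, 25 °C, and ideal passive equilibration (the function names are hypothetical):

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_ratio(vm_volts, z=1, temp_k=298.15):
    """Equilibrium c_in / c_out of an ion with charge z at membrane potential vm."""
    return math.exp(-z * F * vm_volts / (R * temp_k))

def vm_from_ratio(ratio, z=1, temp_k=298.15):
    """Membrane potential (V) implied by an equilibrium c_in / c_out ratio."""
    return -(R * temp_k) / (z * F) * math.log(ratio)

# A more negative interior accumulates more of a cationic dye.
ratio_at_minus_150mV = nernst_ratio(-0.150)
```

      The inversion in `vm_from_ratio` is only meaningful if the dye has actually reached passive equilibrium, which is the assumption under dispute in this review.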

      Ideally, the dye should have high membrane permeability to ensure rapid equilibration. Others have demonstrated that E. coli cells in the presence of ThT do not load unless blue light is present, and that the loading profile does not look as expected for a cationic Nernstian dye. They also show that the loading profile of the dye is different for E. coli cells deleted for the TolC pump. I, therefore, objected to interpreting the signal from ThT as a Vm signal when used in E. coli. Nothing the authors have said has suggested that I should change this assessment.

      Specifically, the authors responded to my concerns as follows:

      (1) 'We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.' This seems to go against ethical practices when it comes to scientific literature citations. If the authors identified work that handles the same topic they do, which they believe is scientifically flawed, the discussion to reflect that should be included.

      (2) 'The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.'

      It seems the authors object to the basic principle behind the usage of Nernstian dyes. If the authors wish to use ThT according to some other model, and not as a Nernstian indicator, they need to explain and develop that model. Instead, they state 'ThT is a Nernstian voltage indicator' in their manuscript and expect the dye to behave like a passive voltage indicator throughout it.

      (3) 'We think the proton effect is a million times weaker than that due to potassium i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.'

      I agree with this statement by the authors. At near-neutral extracellular pH, E. coli keeps near-neutral intracellular pH, and the contribution from the chemical concentration gradient to the electrochemical potential of protons is negligible. The main contribution is from the membrane potential. However, this has nothing to do with the criticism to which the authors are responding. The criticism is that ThT has been observed not to permeate the cell without blue light. The blue light has been observed to influence the electrochemical potential of protons (and given that at near-neutral intracellular and extracellular pH this is mostly the membrane potential, as the authors note themselves, we are effectively talking about Vm). Thus, two things are happening when one is loading the ThT: not just the expected equilibration but also a lowering of the membrane potential. The electrochemical potential of protons is coupled via the membrane potential to all the other electrochemical potentials of ions, including the mentioned K+.

      (4) 'The vast majority of cells continue to be viable. We do not think membrane damage is dominating.' In response to the question on how the authors demonstrated TMRM loading and in which conditions (and while reminding them that TMRM loading profile in E.coli has been demonstrated in Potassium Phosphate buffer). The request was to demonstrate TMRM loading profile in their condition as well as to show that it does not depend on light. Cells could still be viable, as membrane permeabilisation with light is gradual, but the loading of ThT dye is no longer based on simple electrochemical potential (of the dye) equilibration.

      (5) On the comment on the action of CCCP with references included, authors include a comment that consists of phrases like 'our understanding of the literature' with no citations of such literature. Difficult to comment further without references.

      (6) 'Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee's comments thus seem tenable.'

      The authors have misunderstood my comment. I am not advocating shielding (I agree that this is not it) but stating that this is not the only other explanation for what they see (apart from electrical signaling). The other I proposed is that the membrane has changed in composition and/or the effective light power the cells can tolerate. The authors comment only on the light power (not convincingly though, giving the number for that power would be more appropriate), not on the possible changes in the membrane permeability.

      (7) 'The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner i.e. driven by the membrane voltage.' I am not sure what the authors mean by another mechanism. The mechanism of action of a Nernstian dye is passive equilibration according to the electrochemical potential (i.e. until the electrochemical potential of the dye is 0).

      (8) 'In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.'

      I gave a very concrete comment on the fact that in the HH model conductivity and leakage are as they are because this was explicitly measured. The authors state that they have carefully adapted their model based on what is currently understood for E. coli electrophysiology. It is not clear how. HH uses gKn^4 based on Figure 2 here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1392413/pdf/jphysiol01442-0106.pdf, i.e. the measured rise and fall of potassium conductance on msec time scales. I looked at the citation the authors have given and found the resistance of an entire biofilm of a given strain at 3 applied voltages. So why n^4 based on that? Why does the unknown current have the form gQz^4? Sodium conductance in HH is described by m^3hgNa (again based on detailed conductance measurements), so why is the unknown current in E. coli described by gQz^4? Why is the leakage in the form that it is, and based on what measurement?
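
      For reference, the conductance terms mentioned above come from the standard form of the original Hodgkin-Huxley equations for the squid giant axon, written out below. The symbols follow the usual textbook convention and are not taken from the manuscript under review; the manuscript's additional current (the gQz^4 term) is not reproduced here because its definition is not given in this review.

```latex
\begin{align}
C_m \frac{dV}{dt} &= I_{\mathrm{ext}}
  - \bar{g}_{K}\, n^{4}\,(V - E_{K})
  - \bar{g}_{Na}\, m^{3} h\,(V - E_{Na})
  - \bar{g}_{L}\,(V - E_{L}), \\
\frac{dx}{dt} &= \alpha_{x}(V)\,(1 - x) - \beta_{x}(V)\,x,
  \qquad x \in \{n, m, h\}.
\end{align}
```

      In the original work the exponents on n, m and h and the voltage dependence of the rate functions were fitted to measured conductance time courses, which is the point of the questions above.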

      Throughout their responses, the authors seem to think that collapsing the electrochemical gradient of protons is all about protons, and this is not the case. At near neutral inside and outside pH, the electrochemical potential of protons is simply membrane voltage. And membrane voltage acts on all ions in the cell.

      Authors have started their response to concrete comments on the usage of ThT dye with comments on papers from my group that are not all directly relevant to this publication. I understand that their intention is to discredit a reviewer but given that my role here is to review this manuscript, I will only address their comments to the publications/part of publications that are relevant to this manuscript and mention what is not relevant.

      Publications in the order these were commented on.

      (1) In a comment on the paper that describes the usage of ThT dye as a Nernstian dye, the authors seem to talk about a model of an entire active cell.

      'Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model.' The two have nothing to do with each other. A Nernstian dye equilibrates according to its electrochemical potential. Once that happens it can measure the potential (under the assumption that not too much dye has entered and thereby lowered the membrane potential under measurement too much). The time scale of that is important, and the dye can only measure processes that are slower than that equilibration. If one wants to use a dye that acts under a different model, first that model needs to be developed, and then coupled to any other active cell model.

      (2) The part of this paper that is relevant is simply the usage of TMRM dye. It is used as Nernstian dye, so all the above said applies. The rest is a study of flagellar motor.

      (3) The authors seem to not understand that the electrochemical potential of protons is coupled to the electrochemical potentials of all other ions, via the membrane potential. In the manuscript authors talk about, PMF~Vm, as DeltapH~0. Other than that this publication is not relevant to their current manuscript.

      (4) The manuscript in fact states precisely that PMF cannot be generated by protons only and some other ions need to be moved out for the purpose. In near neutral environment it stated that these need to be cations (K+ e.g.). The model used in this manuscript is a pump-leak model. Neither is relevant for the usage of ThT dye.

      Further comments include, along the lines of:

      'The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.'

      The only assumption made when using a cationic Nernstian dye is that it equilibrates passively across the membrane according to its electrochemical potential. As it does that, it does lower the membrane potential, which is why as little as possible is added so that this is negligible. The equilibration should be as fast as possible, but at the very least it should be known, as no change in membrane potential can be measured that is faster than that.

      This behaviour should be orthogonal to what the cell is doing, it is a probe after all. If the cell is excitable, a Nernstian dye can be used, as long as it's still passively equilibrating and doing so faster than any changes in membrane potential due to excitations of the cells. There are absolutely no assumptions made on the active system that is about to be measured by this expected behaviour of a Nernstian dye. And there shouldn't be, it is a probe. If one wants to use a dye that is not purely Nernstian that behaviour needs to be described and a model proposed. As far as I can find, authors do no such thing.
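
      To make the probe assumption above concrete, here is a minimal sketch of the only relation an ideal cationic Nernstian dye relies on: at equilibrium, the inside/outside concentration ratio of the dye is set by the membrane potential. The function names and the default temperature are illustrative assumptions, not taken from the manuscript or from any cited paper, and the sketch says nothing about how fast equilibration occurs.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_potential_mV(c_in, c_out, temperature_K=298.15, z=1):
    """Membrane potential (mV, inside relative to outside) implied by the
    equilibrium distribution of a passively permeant ion of charge z.

    Assumes ideal Nernstian behaviour: passive equilibration only, and a
    dye load small enough not to perturb the potential being measured.
    """
    if c_in <= 0 or c_out <= 0:
        raise ValueError("concentrations must be positive")
    return -1000.0 * (R * temperature_K) / (z * F) * math.log(c_in / c_out)


def accumulation_ratio(vm_mV, temperature_K=298.15, z=1):
    """Inverse relation: expected c_in / c_out at equilibrium for a given
    membrane potential in mV."""
    return math.exp(-z * F * (vm_mV / 1000.0) / (R * temperature_K))
```

      Under these assumptions, a tenfold intracellular accumulation of a monovalent cationic dye corresponds to roughly -59 mV at 25 °C. Nothing in the relation refers to the active processes of the cell, which is the sense in which the probe is orthogonal to the system being measured.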

      There is a comment on the use of a flagellar motor as a readout of PMF, stating that the motor can be stopped by YcgR, citing work from 2023. Indeed, there is a range of references such as https://doi.org/10.1016/j.molcel.2010.03.001 that demonstrate this (from around 2000-2010 as far as I am aware). The timescale of such slowdown is hours (see Figure 5 here: https://www.cell.com/cell/pdf/S0092-8674(10)00019-X.pdf). Needless to say, the flagellar motor, when used as a probe, needs to remain a valid probe in the conditions used. Thus one should always be on the lookout for any other such proteins that we are not yet aware of and that could slow it down or make the speed no longer proportional to the PMF. In the papers where my group uses the motor, the changes are fast, often reversible, and within an observation window of 30 min. They are also the same in a DeltaYcgR strain, which we have not included because, given the time scales, it seemed obvious, but we certainly can in the future (as well as stay vigilant about any conditions that would render the motor no longer a suitable probe for PMF).

    4. Reviewer #3 (Public Review):

      This manuscript by Akabuogu et al. investigates membrane potential dynamics in E. coli. Membrane potential fluctuations have been observed in bacteria by several research groups in recent years, including in the context of bacterial biofilms where they have been proposed to play a role in cellular communication. Here, these authors investigate membrane potential in E. coli, in both single cells and biofilms. I have reviewed the revised manuscript provided by the authors, as well as their responses to the initial reviews; my opinion about the manuscript is largely unchanged. I have focused my public review on those issues that I believe to be most pressing, with additional comments included in the review to authors. Although these authors are working in an exciting research area, the evidence they provide for their claims is inadequate, and several key control experiments are still missing. In some cases, the authors allude to potentially relevant data in their responses to the initial reviews, but unfortunately these data are not shown. Furthermore, I cannot identify any traveling wavefronts in the data included in this manuscript. In addition to the challenges associated with the use of Thioflavin-T (ThT) raised by the second reviewer, these caveats make the work presented in this manuscript difficult to interpret.

      First, some of the key experiments presented in the paper lack required controls:

      (1) This paper asserts that the observed ThT fluorescence dynamics are induced by blue light. This is a fundamental claim in the paper, since the authors go on to argue that these dynamics are part of a blue light response. This claim must be supported by the appropriate negative control experiment measuring ThT fluorescence dynamics in the absence of blue light- if this idea is correct, these dynamics should not be observed in the absence of blue light exposure. If this experiment cannot be performed with ThT since blue light is used for its excitation, TMRM can be used instead.

      In response to this, the authors wrote that "the fluorescent baseline is too weak to measure cleanly in this experiment." If they observe no ThT signal above noise in their time lapse data in the absence of blue light, this should be reported in the manuscript- this would be a satisfactory negative control. They then wrote that "It appears the collective response of all the bacteria hyperpolarization at the same time appears to dominate the signal." I am not sure what they mean by this- perhaps that ThT fluorescence changes strongly only in response to blue light? This is a fundamental control for this experiment that ought to be presented to the reader.

      (2) The authors claim that a ∆kch mutant is more susceptible to blue light stress, as evidenced by PI staining. The premise that the cells are mounting a protective response to blue light via these channels rests on this claim. However, they do not perform the negative control experiment, conducting PI staining for WT and the ∆kch mutant in the absence of blue light. In the absence of this control it is not possible to rule out effects of the ∆kch mutation on overall viability and/or PI uptake. The authors do include a growth curve for comparison, but planktonic growth is a very different context than surface-attached biofilm growth. Additionally, the ∆kch mutation may have impacts on PI permeability specifically that are not addressed by a growth curve. The negative control experiment is of key importance here.

      Second, the ideas presented in this manuscript rely entirely on analysis of ThT fluorescence data, specifically a time course of cellular fluorescence following blue light treatment. However, alternate explanations for and potential confounders of the observed dynamics are not sufficiently addressed:

      (1) Bacterial cells are autofluorescent, and this fluorescence can change significantly in response to stress (e.g. blue light exposure). To characterize and/or rule out autofluorescence contributions to the measurement, the authors should present time lapse fluorescence traces of unstained cells for comparison, acquired under the same imaging conditions in both wild type and ∆kch mutant cells. In their response to reviewers the authors suggested that they have conducted this experiment and found that the autofluorescence contribution is negligible, which is good, but these data should be included in the manuscript along with a description of how these controls were conducted.

      (2) Similarly, in my initial review I raised a concern about the possible contributions of photobleaching to the observed fluorescence dynamics. This is particularly relevant for the interpretation of the experiment in which catalase appears to attenuate the decay of the ThT signal; this attenuation could alternatively be due to catalase decreasing ThT photobleaching. In their response, the authors indicated that photobleaching is negligible, which would be good, but they do not share any evidence to support this claim. Photobleaching can be assessed in this experiment by varying the light dosage (illumination power, frequency, and/or duration) and confirming that the observed fluorescence dynamics are unaffected.

      Third, the paper claims in two instances that there are propagating waves of ThT fluorescence that move through biofilms, but I do not observe these waves in any case:

      (1) The first wavefront claim relates to small cell clusters, in Fig. 2A and Video S2 and S3 (with Fig. 2A and Video S2 showing the same biofilm.) I simply do not see any evidence of propagation in either case- rather, all cells get brighter and dimmer in tandem. I downloaded and analyzed Video S3 in several ways (plotting intensity profiles for different regions at different distances from the cluster center, drawing a kymograph across the cluster, etc.) and in no case did I see any evidence of a propagating wavefront. (I attempted this same analysis on the biofilm shown in Fig. 2A and Video S2 with similar results, but the images shown in the figure panels and especially the video are still both so saturated that the quantification is difficult to interpret.) If there is evidence for wavefronts, it should be demonstrated explicitly by analysis of several clusters. For example, a figure of time-to-peak vs. position in the cluster demonstrating a propagating wave would satisfy this. Currently, I do not see any wavefronts in this data.
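
      As an illustration of the time-to-peak vs. position analysis suggested above, here is a minimal sketch. It assumes per-cell fluorescence traces sampled at a fixed frame interval and a known distance of each cell from a chosen reference point (for example the cluster center); all names are hypothetical, and this is not the analysis I ran on the videos.

```python
import math


def time_to_peak(trace, frame_interval_s):
    """Time (s) at which a single-cell fluorescence trace reaches its maximum."""
    peak_index = max(range(len(trace)), key=lambda i: trace[i])
    return peak_index * frame_interval_s


def fit_line(distances_um, peak_times_s):
    """Least-squares fit peak_time = slope * distance + intercept.

    Returns (slope in s/um, intercept in s, Pearson r). A propagating
    wavefront should give a clearly non-zero slope (propagation speed is
    1/slope) with r close to +1 or -1; cells that brighten in tandem give a
    slope near zero.
    """
    n = len(distances_um)
    if n < 2 or n != len(peak_times_s):
        raise ValueError("need two or more matched distance/time pairs")
    mean_x = sum(distances_um) / n
    mean_y = sum(peak_times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_um)
    syy = sum((y - mean_y) ** 2 for y in peak_times_s)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_um, peak_times_s))
    if sxx == 0:
        raise ValueError("all cells are at the same distance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If every cell peaks in the same frame, syy is 0 and r is undefined;
    # report 0.0 to signal "no relationship with distance".
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r
```

      On synthetic traces whose peaks are delayed by 2 s per µm of distance, fit_line recovers a slope of about 2 s/µm with r close to 1, whereas traces that all peak in the same frame give a slope and r of 0. With real data the result is limited by the frame interval, so the fit is only informative if the delay across the cluster spans several frames.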

      (2) The other wavefront claim relates to biofilms, and the relevant data is presented in Fig. S4 (and I believe also in what is now Video S8, but no supplemental video legends are provided, and this video is not cited in text.) As before, I cannot discern any wavefronts in the image and video provided; Reviewer 1 was also not able to detect wave propagation in this video by kymograph. Some mean squared displacements are shown in Fig. 7. As before, the methods for how these were obtained are not clearly documented either in this manuscript or in the BioRXiv preprint linked in the initial response to reviewers, and since wavefronts are not evident in the video it is hard to understand what is being measured here- radial distance from where? (The methods section mentions radial distance from the substrate, this should mean Z position above the imaging surface, and no wavefronts are evident in Z in the figure panels or movie.) Thus, clear demonstration of these wavefronts is still missing here as well.

      Fourth, I have some specific questions about the study of blue light stress and the use of PI as a cell viability indicator:

      (1) The logic of this paper includes the premise that blue light exposure is a stressor under the experimental conditions employed in the paper. Although it is of course generally true that blue light can be damaging to bacteria, this is dependent on light power and dosage. The control I recommended above, staining cells with PI in the presence and absence of blue light, will also allow the authors to confirm that this blue light treatment is indeed a stressor- the PI staining would be expected to increase in the presence of blue light if this is so.

      (2) The presence of ThT may complicate the study of the blue light stress response, since ThT enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). The authors could investigate ThT toxicity under these conditions by staining cells with PI after exposing them to blue light with or without ThT staining.

      (3) In my initial review, I wrote the following: "In Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3[BC]), this complicates the interpretation of this experiment." In their response, the authors suggested that these results are not relevant in this case because "In our experiment methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia." However, the logic of the paper is that the cells are in fact dying due to an imposed external stressor, which presumably also confers an increased burden as the cells try to deal with the stress. Instead, the authors should simply use a parallel method to confirm the results of PI staining. For example, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      The CFU assay suggested above has the additional advantage that it can also be performed on planktonic cells in liquid culture that are exposed to blue light. If, as the paper suggests, a protective response to blue light is being coordinated at the biofilm level by these membrane potential fluctuations, the WT strain might be expected to lose its survival advantage vs. the ∆kch mutant in the absence of a biofilm.

      Fifth, in several cases the data are presented in a way that is difficult to interpret, or the paper makes claims that are difficult to observe in the data:

      (1) The authors suggest that the ThT and TMRM traces presented in Fig. S1D have similar shapes, but this is not obvious to me- the TMRM curve has very little decrease after the initial peak and only a modest, gradual rise thereafter. The authors suggest that this is due to increased TMRM photobleaching, but I would expect that photobleaching should exacerbate the signal decrease after the initial peak. Since this figure is used to support the use of ThT as a membrane potential indicator, and since this is the only alternative measurement of membrane potential presented in text, the authors should discuss this discrepancy in more detail.

      (2) The comparison of single cells to microcolonies presented in figures 1B and D still needs revision:

      First, both reviewer 1 and I commented in our initial reviews that the ThT traces, here and elsewhere, should not be normalized- this will help with the interpretation of some of the claims throughout the manuscript.

      Second, the way these figures are shown with all traces overlaid at full opacity makes it very difficult to see what is being compared. Since the point of the comparison is the time to first peak (and the standard deviation thereof), histograms of the distributions of time to first peak in both cases should be plotted as a separate figure panel.

      Third, statistical significance tests ought to be used to evaluate the statistical strength of the comparisons between these curves. The authors compare both means and standard deviations of the time to first peak, and there are appropriate statistical tests for both types of comparisons.
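
      One possible way to test both comparisons, sketched here under the assumption that the times to first peak are available as two plain lists (single cells and microcolony cells), is a permutation test. This is only one of several appropriate choices (a t-test or Mann-Whitney test for location, or Levene's test for spread, would be alternatives); the function names are hypothetical.

```python
import random
import statistics


def permutation_p_value(sample_a, sample_b, statistic,
                        n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in `statistic`
    (e.g. statistics.mean or statistics.stdev) between two samples."""
    rng = random.Random(seed)
    observed = abs(statistic(sample_a) - statistic(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the two groups
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def centered(sample):
    """Subtract the sample median, so that a comparison of spread is not
    confounded by a difference in location between the two groups."""
    med = statistics.median(sample)
    return [x - med for x in sample]
```

      The means would then be compared with permutation_p_value(a, b, statistics.mean) and the standard deviations with permutation_p_value(centered(a), centered(b), statistics.stdev), where a and b are the two lists of times to first peak.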

      (3) The authors claim that the curve shown in Fig. S4B is similar to the simulation result shown in Fig. 7B. I remain unconvinced that this is so, particularly with respect to the kinetics of the second peak- at least it seems to me that the differences should be acknowledged and discussed. In any case, the best thing to do would be to move Fig. S4B to the main text alongside Fig. 7B so that the readers can make the comparison more easily.

      (4) As I wrote in my first review, in the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, these fluctuations cannot be distinguished from measurement noise. A no-light control could help clarify this.

      (5) In the lower irradiance conditions in Fig. 4A, the ThT dynamics are slower overall, and it looks like the ThT intensity is beginning to rise at the end of the measurement. The authors write that no second peak is observed below an irradiance threshold of 15.99 µW/mm2. However, could a more prominent second peak be observed in these cases if the measurement time was extended? Additionally, the end of these curves looks similar to the curve in Fig. S4B, in which the authors write that the slow rise is evidence of the presence of a second peak, in contrast to their interpretation here.

      Additional considerations:

      (1) The analysis and interpretation of the first peak, and particularly of the time-to-fire data, is challenging throughout the manuscript because the time resolution of the data set is quite limited. It seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution on this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance where the first spike time is increased (Figure 4A).

      (2) The authors suggest in the manuscript that "E. coli biofilms use electrical signalling to coordinate long-range responses to light stress." In addition to the technical caveats discussed above, I am missing a discussion about what these responses might be. What constitutes a long-range response to light stress, and are there known examples of such responses in bacteria?

      (3) The presence of long-range blue light responses can also be interrogated experimentally, for example, by repeating the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the ∆kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions. The CFU experiment I mentioned above could also implicate coordinated long-range responses specifically, if biofilms and liquid culture experiments can be compared (although I know that recovering cells from biofilms is challenging.)

      (4) At the end of the results section, the authors suggest a critical biofilm size of only 4 μm for wavefront propagation (not much larger than a single cell!) The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger (and this figure also does not contain wavefront information.) Are there data for cell clusters above and below this size that could support this claim more directly?

      (5) In Fig. 4C, the overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second peak of potassium are different than those observed by ThT (it rises minutes earlier)- is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also include the first ThT peak- is this surprising given that the Kch channel has no effect on this peak according to the model?

      Detailed comments:

      Why are Fig. 2A and Video S2 called a microcluster, whereas Video S3, which is smaller, is called a biofilm?

      "We observed a spontaneous rapid rise in spikes within cells in the center of the biofilm" (Line 140): What does "spontaneous" mean here?

      "This demonstrates that the ion-channel mediated membrane potential dynamics is a light stress relief process.", "E. coli cells employ ion-channel mediated dynamics to manage ROS-induced stress linked to light irradiation." (Line 268 and the second sentence of the Fig. 4F legend): This claim is not well-supported. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential but does not indicate that these membrane potential fluctuations help the cells respond to blue light stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no light controls I mention above.

      "The model also predicts... the external light stress" (Lines 338-341): Please clarify this section. Where does this prediction arise from in the modeling work? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      "We hypothesized that E. coli not only modulates the light-induced stress but also handles the increase of the ROS by adjusting the profile of the membrane potential dynamics" (Line 347): I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      "Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli." (Line 391): This is misleading- mechanosensitive ion channels totally ablate membrane potential dynamics, they don't have a specific effect on the first hyperpolarization event. The claim that mechanonsensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants (Fig. 6C-D)- is this expected? This seems to imply that these ion channels also have a blue light-independent effect.

      Throughout the paper, there are claims that the initial ThT spike is involved in "registering the presence of the light stress" and similar. What is the evidence for this claim?

      "We have presented much better quantitative agreement of our model with the propagating wavefronts in E. coli biofilms..." (Line 619): It is not evident to me that the agreement between model and prediction is "much better" in this work than in the cited work (reference 57, Hennes et al. 2023). The model in Figure 4 of ref. 57 seems to capture the key features of their data.

      In methods, "Only cells that are hyperpolarized were counted in the experiment as live" (Line 745): what percentage of cells did not hyperpolarize in these experiments?

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Video S8 is very confusing- why does the video play first forwards and then backwards? It is easy to misinterpret this as a rise in the intensity at the end of the experiment.

    1. eLife Assessment

      This is a fundamental study that provides a detailed single-cell transcriptomic and epigenomic map of the mouse trabecular meshwork, identifying three distinct trabecular meshwork subtypes with specific functional roles. It links the glaucoma-associated transcription factor LMX1B to mitochondrial regulation in TM3 cells and demonstrates that nicotinamide treatment prevents IOP elevation in Lmx1bV265D/+ mutant mice, highlighting a potential metabolic therapeutic strategy for glaucoma. This convincing work would be further supported by data that link the transcriptional data with mitochondrial functional assays.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction, specifically in one subtype (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.

      Strengths:

      The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.

      Weaknesses:

      Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped - for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.

      Overall, this is a compelling and carefully executed study that offers significant advances in our understanding of TM cell biology and its role in glaucoma. The integration of multimodal data, disease modeling, and therapeutic testing represents a valuable contribution to the field. With additional mechanistic depth, the study has the potential to become a foundational resource for future research into IOP regulation and glaucoma treatment.

    3. Reviewer #2 (Public review):

      Summary:

      This elegant study by Tolman and colleagues provides fundamental findings that substantially advance our knowledge of the major cell types within the limbus of the mouse eye, focusing on the aqueous humor outflow pathway. The authors used single-cell and single-nuclei RNAseq to very clearly identify 3 subtypes of the trabecular meshwork (TM) cells in the mouse eye, with each subtype having unique markers and proposed functions. The U. Columbia results are strengthened by an independent replication in a different mouse strain at a separate laboratory (Duke). Bioinformatics analyses of these expression data were used to identify cellular compartments, molecular functions, and biological processes. Although there were some common pathways among the 3 subtypes of TM cells (e.g., ECM metabolism), there also were distinct functions. For example:

      • TM1 cell expression supports heavy engagement in ECM metabolism and structure, as well as TGFβ2 signaling.

      • TM2 cells were enriched in laminin and pathways involved in phagocytosis, lysosomal function, and antigen expression, as well as End3/VEGF/angiopoietin signaling.

      • TM3 cells were enriched in actin binding and mitochondrial metabolism.

      They used high-resolution immunostaining and in situ hybridization to show that these 3 TM subtypes express distinct markers and occupy distinct locations within the TM tissue. The authors compared their expression data with other published scRNAseq studies of the mouse as well as the human aqueous outflow pathway. They used ATAC-seq to map open chromatin regions in order to predict transcription factor binding sites. Their results were also evaluated in the context of human IOP and glaucoma risk alleles from published GWAS data, with interesting and meaningful correlations. Although not discussed in their manuscript, their expression data support other signaling pathways/ proteins/ genes that have been implicated in glaucoma, including: TGFβ2, BMP signaling (including involvement of ID proteins), MYOC, actin cytoskeleton (CLANs), WNT signaling, etc.

      In addition to these very impressive data, the authors used scRNAseq to examine changes in TM cell gene expression in the mouse glaucoma model of mutant Lmx1b-induced ocular hypertension. In man, LMX1B is associated with Nail-Patella syndrome, which can include the development of glaucoma, demonstrating the clinical relevance of this mouse model. Among the gene expression changes detected, TM3 cells had altered expression of genes associated with mitochondrial metabolism. The authors used their previous experience using nicotinamide to metabolically protect DBA/2J mice from glaucomatous damage, and they hypothesized that nicotinamide supplementation of mutant Lmx1b mice would help restore normal mitochondrial metabolism in the TM and prevent Lmx1b-mediated ocular hypertension. Adding nicotinamide to the drinking water significantly prevented Lmx1b mutant mice from developing high intraocular pressure. This is a laudable example of dissecting the molecular pathogenic mechanisms responsible for a disease (glaucoma) and then discovering and testing a potential therapy that directly intervenes in the disease process and thereby protects from the disease.

      Strengths:

      There are numerous strengths in this comprehensive study including:

      • Deep scRNA sequencing that was confirmed by an independent dataset in another mouse strain at another university.

      • Identification and validation of molecular markers for each mouse TM cell subset along with localization of these subsets within the mouse aqueous outflow pathway.

      • Rigorous bioinformatics analysis of these data as well as comparison of the current data with previously published mouse and human scRNAseq data.

      • Correlating their current data with GWAS glaucoma and IOP "hits".

      • Discovering gene expression changes in the 3 TM subgroups in the mouse mutant Lmx1b model of glaucoma.

      • Further pursuing the indication of dysfunctional mitochondrial metabolism in TM3 cells from Lmx1b mutant mice to test the efficacy of dietary supplementation with nicotinamide. The authors nicely demonstrate the disease modifying efficacy of nicotinamide in preventing IOP elevation in these Lmx1b mutant mice, preventing the development of glaucoma. These results have clinical implications for new glaucoma therapies.

      Weaknesses:

      • Occasional over-interpretation of data. The authors have used changes in gene expression (RNAseq) to implicate functions and signaling pathways. For example: they have not directly measured "changes in metabolism", "mitochondrial dysfunction" or "activity of Lmx1b".

      • In their very thorough data set, there is enrichment of or changes in gene expression that support other pathways that have been previously reported to be associated with glaucoma (such as TGFβ2, BMP signaling, actin cytoskeletal organization (CLANs), WNT signaling, ossification, etc.). Leaving these undiscussed appears to be a lost opportunity to further enhance the significance of this work.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.

      This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.

      Strengths:

      (1) Comprehensive dataset with high single-cell resolution

      (2) Use of multiple bioinformatic and cross-comparative approaches

      (3) Integration of 3D imaging of TM and SC for anatomical context

      (4) Convincing identification and validation of three TM subtypes using molecular markers

      Weaknesses:

      (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Additional evidence is needed to clarify whether Lmx1b directly regulates mitochondrial genes (e.g., via ChIP-seq, motif analysis, or ATAC-seq), or whether mitochondrial changes are downstream effects.

      Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, the authors should consider revising their claim that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.

      (2) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.

      (3) Lack of direct evidence that LMX1B regulates mitochondrial genes: While transcriptomic and motif accessibility analyses suggest that LMX1B is enriched in TM3 cells and may influence mitochondrial function, no mechanistic data are provided to demonstrate direct regulation of mitochondrial genes. Including ChIP-seq data, motif enrichment at mitochondrial gene loci, or perturbation studies (e.g., Lmx1b knockout or overexpression in TM3 cells) would greatly strengthen this central claim.

      (4) Focus on LMX1B in Fig. 5F lacks broader context: Figure 5F shows that several transcription factors (TFs), including Tcf21, Foxs1, Arid3b, Myc, Gli2, Patz1, Plag1, Npas2, Nr1h4, and Nfatc2, exhibit stronger positive correlations or motif accessibility changes than LMX1B. Yet the manuscript focuses almost exclusively on LMX1B. The rationale for this focus should be clarified, especially given LMX1B's relatively lower ranking in the correlation analysis. Were the functions of these other highly ranked TFs examined or considered in the context of TM biology or glaucoma? Discussing their potential roles would enhance the interpretation of the transcriptional regulatory landscape and demonstrate the broader relevance of the findings.

      Other weaknesses:

      (1) In the Abstract, the authors report 9,394 wild-type TM cell transcriptomes, but the number of Lmx1bV265D/+ TM cell transcriptomes analyzed is not provided. This information is essential for evaluating the comparative analysis and should be clearly stated in the Abstract and again in the main text (e.g., lines 121-123). Including both wild-type and mutant cell counts will help readers assess the balance and robustness of the dataset.

      (2) Did the authors monitor mouse weight or other health parameters to assess potential systemic effects of treatment? It is known that the taste of compounds in drinking water can alter fluid or food intake, which may influence general health. Also, do Lmx1bV265D/+ mice exhibit non-ocular phenotypes, and if so, does nicotinamide confer protection in those tissues as well? Additionally, given that nicotinamide dosing started at postnatal day 2, how long were the mice treated with nicotinamide-containing water, after how many days or weeks was IOP reduced, and how long was the decrease in IOP sustained?

      (3) While the IOP reduction observed in NAM-treated Lmx1bV265D/+ mice appears statistically significant, it is unclear whether this reflects meaningful biological protection. Several untreated mice exhibit very high IOP values, which may skew the analysis. The authors should report the mean values for IOP in both untreated and NAM-treated groups to clarify the magnitude and variability of the response.

      (4) Additionally, since NAM has been shown to directly protect RGCs in other glaucoma models, the authors should assess whether RGCs are preserved in NAM-treated Lmx1bV265D/+ mice. Demonstrating RGC protection would support a synergistic effect of NAM through both IOP reduction and direct neuroprotection, strengthening the translational relevance of the treatment.

      (5) Can the authors add any other functional validation studies to explore the pathways enriched in each of the TM1, TM2, and TM3 subtypes, in addition to the IHC/IF/RNAscope validation?

      (6) The authors should include a representative image of the limbal dissection. While Figure S1 provides a schematic, mouse eyes are very small, and dissecting unfixed limbal tissue is technically challenging. It is also difficult to reconcile the claim that the majority of cells in the limbal region are TM and endothelium. As shown in Figure S6, DAPI staining suggests a much higher abundance of scleral cells compared to TM cells within the limbal strip. Additional clarification or visual evidence would help validate the dissection strategy and cellular composition of the captured region.

    1. eLife Assessment

      This is a valuable methodological contribution towards accurate characterization of viral genetic diversity using long-read sequencing and unique molecular identifiers (UMIs). However, the methods are currently incomplete and the sensitivity is not rigorously demonstrated. Addressing these gaps would strengthen the manuscript and make it a key addition to the field.

    2. Reviewer #1 (Public review):

      Tamao et al. aimed to quantify the diversity and mutation rate of influenza virus (PR8 strain) in order to establish a high-resolution method for studying intra-host viral evolution. To achieve this, the authors combined RNA sequencing with single-molecule unique molecular identifiers (UMIs) to minimize errors introduced during technical processing. They proposed an in vitro infection model with a single viral particle to represent biological genetic diversity, alongside a control model using in vitro transcribed RNA for two viral genes, PB2 and HA.

      Through this approach, the authors demonstrated that UMIs reduced technical errors by approximately tenfold. By analyzing four viral populations and comparing them to in vitro transcribed RNA controls, they estimated that ~98.1% of observed mutations originated from viral replication rather than technical artifacts. Their results further showed that most mutations were synonymous and introduced randomly. However, the distribution of mutations suggested selective pressures that favored certain variants. Additionally, comparison with a closely related influenza strain (A/Alaska/1935) revealed two positively selected mutations, though these were absent in the strain responsible for the most recent pandemic (CA01).

      Overall, the study is well-designed, and the interpretations are strongly supported by the data. However, the following clarifications are recommended:

      (1) The methods section is overly brief. Even if techniques are cited, more experimental details should be included. For example, since the study focuses heavily on methodology, details such as the number of PCR cycles in RT-PCR or the rationale for choosing HA and PB2 as representative in vitro transcripts should be provided.

      (2) Information on library preparation and sequencing metrics should be included. For example, the total number of reads, any filtering steps, and quality score distributions/cutoff for the analyzed reads.

      (3) In the Results section (line 115, "Quantification of error rate caused by RT"), the mutation rate attributed to viral replication is calculated. However, in line 138, it is unclear whether the reported value reflects PB2, HA, or both, and whether the comparison is based on the error rate of the same viral RNA or the mean of multiple values (as shown in Figure 3A). Please clarify whether this number applies universally to all influenza RNAs or provide the observed range.

      (4) Since errors introduced by T7 polymerase apply only to the in vitro transcription control, how were these accounted for when comparing mutation rates between transcribed RNA and cell-culture-derived virus?

      (5) Figure 2 shows that a UMI group size of 4 has an error rate of zero, but this group size is not mentioned in the text. Please clarify.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a technically oriented application of UMI-based long-read sequencing to study intra-host diversity in influenza virus populations. The authors aim to minimize sequencing artifacts and improve the detection of rare variants, proposing that this approach may inform predictive models of viral evolution. While the methodology appears robust and successfully reduces sequencing error rates, key experimental and analytical details are missing, and the biological insight is modest. The study includes only four samples, with no independent biological replicates or controls, which limits the generalizability of the findings. Claims related to rare variant detection and evolutionary selection are not fully supported by the data presented.

      Strengths:

      The study addresses an important technical challenge in viral genomics by implementing a UMI-based long-read sequencing approach to reduce amplification and sequencing errors. The methodological focus is well presented, and the work contributes to improving the resolution of low-frequency variant detection in complex viral populations.

      Weaknesses:

      The application of UMI-based error correction to viral population sequencing has been established in previous studies (e.g., in HIV), and this manuscript does not introduce a substantial methodological or conceptual advance beyond its use in the context of influenza.

      The study lacks independent biological replicates or additional viral systems that would strengthen the generalizability of the conclusions. Potential sources of technical error are not explored or explicitly controlled. Key methodological details are missing, including the number of PCR cycles, the input number of molecules, and UMI family size distributions. These are essential to support the claimed sensitivity of the method.

      The assertion that variants at ≥0.1% frequency can be reliably detected is based on total read count rather than the number of unique input molecules. Without information on UMI diversity and family sizes, the detection limit cannot be reliably assessed.
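
      The dependence of the detection limit on unique input molecules can be illustrated with a minimal sketch. It assumes simple binomial sampling of UMI-tagged molecules and a requirement that a variant be seen in at least `min_support` distinct UMI families; the function name, the threshold, and the molecule counts are hypothetical and are not taken from the manuscript or its pipeline.

```python
from math import comb


def detection_probability(n_molecules: int, variant_freq: float, min_support: int = 2) -> float:
    """Probability that at least `min_support` of `n_molecules` unique
    input molecules carry a variant present at `variant_freq`.

    Assumption: molecules are sampled independently (binomial model);
    total read depth does not enter, only the unique-molecule count.
    """
    # Probability of observing fewer than `min_support` variant molecules.
    p_below = sum(
        comb(n_molecules, k)
        * variant_freq ** k
        * (1 - variant_freq) ** (n_molecules - k)
        for k in range(min_support)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical unique-molecule counts, for illustration only.
    for n in (500, 1_000, 5_000, 20_000):
        print(n, round(detection_probability(n, 0.001), 3))
```

      Under these assumptions, the chance of capturing a 0.1% variant is governed by how many distinct molecules entered the library, which is why UMI diversity and family-size distributions are needed to evaluate the claimed sensitivity.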

      Although genetic variation is described, the functional relevance of observed mutations in HA and NA is not addressed or discussed in the context of known antigenic or evolutionary features of influenza. The manuscript is largely focused on technical performance, with limited exploration of the biological implications or mechanistic insights into influenza virus evolution.

      The experimental scale is small, with only four viral populations derived from single particles analyzed. This limited sample size restricts the ability to draw broader conclusions about quasispecies dynamics or evolutionary pressures.

    1. eLife Assessment

      This study provides important insights into the role of polyUbiquitination in neurodegenerative diseases, elucidating how pUb promotes neurodegeneration by affecting proteasomal function. The findings not only offer a new perspective on the pathophysiology of neurodegenerative diseases but also provide potential targets for developing new therapeutic strategies. The experiments in the revised submission provide solid evidence to support the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provides some interesting data, several important points should be addressed before further consideration.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Comments on revisions:

      This study, through a systematic experimental design, reveals the crucial role of pUb in forming a positive feedback loop by inhibiting proteasome activity in neurodegenerative diseases. The data are comprehensive and highly innovative. However, some of the results are not entirely convincing, particularly the staining results in Figure 1.

      In Figure 1A, the density of DAPI staining differs significantly between the control patient and the AD patient, making it difficult to conclusively demonstrate a clear increase in PINK1 in AD patients. Quantitative analysis is needed. In Figure 1C, the PINK1 staining in the mouse brain appears to resemble non-specific staining.

    1. eLife Assessment

      This manuscript presents an in-depth analysis of gene expression across multiple brown algal species with differing life histories, providing convincing evidence for the conservation of life cycle-specific gene expression. While largely descriptive, the study is an important step forward in understanding the core cellular processes that differ between life cycle phases, and its findings will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript by Ratchinski et al presents a comprehensive analysis of developmental and life history gene expression patterns in brown algal species. The manuscript shows that the degree of generation bias or generation-specific gene expression correlates with the degree of dimorphism. It also reports conservation of life cycle features within generations and marked changes in gene expression patterns in Ectocarpus in the transition between gamete and early sporophyte. The manuscript also reports considerable conservation of gene expression modules between two representative species, particularly in genes associated with conserved functional characteristics.

      Strengths:

      The manuscript represents a considerable "tour de force" dataset and analytical effort. While the data presented is largely descriptive, it is likely to provide a very useful resource for studies of brown algal development and for comparative studies with other developmental and life cycle systems.

      Comments on revisions:

      The authors have provided in their response (point 1) a good clarification for their rationale in excluding fucoid algae from the study, based on the diploid nature of the fucoid life cycle. Similarly, they have noted (point 2) that "the relationship between changes in gene expression during very early sporophyte development and during alternation of life cycle generations could be investigated further using a highly dimorphic kelp model system such as Saccharina latissima." For the benefit of readers who may not be familiar with the different life cycles in brown algae, I would recommend that these clarifications be included in the Discussion.

      Otherwise the authors have addressed my previous comments adequately.

    1. eLife Assessment

      In this preregistered study, Kunkel and colleagues set out to compare the magnitude and duration of placebo versus nocebo effects in healthy volunteers, and also to examine the different factors contributing to these effects. The authors follow a rigorous methodology in a within-subjects design, taking into consideration standard conventions for manipulation of expectations, and using an appropriate sham condition. They present compelling evidence of long-lasting placebo and nocebo effects, with nocebo responses demonstrating consistently greater strength. These valuable results have the potential for a great impact in the field of experimental and clinical pain.

    2. Reviewer #1 (Public review):

      Summary:

      The study aimed to: (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning, (2) examine the persistence of these effects one week later, and (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      Weaknesses:

      There are a limited number of trials per test condition (10) which means that the trajectory of responses to the manipulation may not be explored, which would be an interesting future study.

      The differences between the nocebo and control condition in pain ratings during conditioning could be explained by differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about the expectation effects here. A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80), although the authors accounted for this during analysis so it is not of major concern.

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Comments on revisions:

      I am satisfied with the authors' revisions to the manuscript and have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      Comments on revisions:

      The authors have addressed all of my concerns and comments - one point for them to verify is that indeed analyses that have not been preregistered will be flagged as such. The provided pre-registration link doesn't specify much about the analysis plans and specific tests used.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase. The trial number was chosen to ensure comparability with previous studies addressing similar research questions with similar designs (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We have now acknowledged this limitation and future direction in the revised manuscript.

      The paragraph reads as follows: “It is important to note that our study was designed in alignment with previous studies addressing similar questions (e.g., Colloca et al., 2010). Our primary aim was to directly compare placebo and nocebo effects in a within-subject design and assess the persistence of these effects one week following the first test session. One limitation of our approach is the relatively short duration of each session, which may have limited our ability to examine the trajectory of responses within a single session. Future studies could address this limitation by increasing the number of trials for a more comprehensive analysis.”

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify this point. Participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We have added this information to the revised version of the manuscript.

      The paragraph now reads as follows: “On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. Note that participants were informed that these pre-test stimuli were part of the recalibration and refamiliarization procedure conducted prior to the second test session.”

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We have addressed this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session.

      The paragraph now reads: “This asymmetry is noteworthy in and of itself because it occurred despite the equidistant stimulus calibration relative to the control condition prior to conditioning. It may be the result of different physiological effects of the stimuli over time or amplified learning in the nocebo condition, consistent with its heightened biological relevance, but it could also be a stronger effect of the verbal instructions in this condition.”

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that this is indeed unfortunate. However, we would like to point out that all analyses reported in the manuscript have been controlled for the VAS ratings in the conditioning session, i.e., potential effects of the conditioned placebo and nocebo stimuli. Moreover, we have now conducted additional analyses, presented here in our response to the reviewers, to demonstrate that this imbalance did not systematically bias the results. Importantly, the key findings observed during the test phase remain robust despite this issue.

      Specifically, when excluding these 25 participants from the analyses, the reported stronger nocebo compared to placebo effects in the test session on day 1 remain unchanged. Likewise, the comparison of placebo and nocebo effects between days 1 and 8 shows the same pattern when excluding the participants in question. The only exception is the interaction between effect (placebo vs nocebo) x session (day 1 vs day 8), which changed from a borderline significant result (p = .049) to non-significant (p = .24). However, post hoc tests continued to show the same pattern as originally reported: a significant reduction in the nocebo effect from day 1 to day 8 and no significant change in the placebo effect.

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the methodological rigor and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We thank the reviewer for pointing this out. We included a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for their helpful comment and agree that the Results section requires additional information that would typically be provided by the Methods section if it directly followed the Introduction. In response, we have moved the former Figure 4 from the Methods section to the beginning of the Results section as a new Figure 1, to improve clarity. Further, we have revised the Methods section to explicitly state that all trials during the conditioning phase were manipulated in the same way.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the authors are claiming (correctly) that there is only limited work comparing placebo/nocebo effects, there are some papers missing from their citations:

      Nocebo responses are stronger than placebo responses after subliminal pain conditioning - Jensen, K., Kirsch, I., Odmalm, S., Kaptchuk, T. J. & Ingvar, M. Classical conditioning of analgesic and hyperalgesic pain responses without conscious awareness. Proc. Natl. Acad. Sci. USA 112, 7863-7 (2015)

      We thank the reviewer and have now included this relevant publication into the introduction of the revised manuscript.

      Hird, E.J., Charalambous, C., El-Deredy, W. et al. Boundary effects of expectation in human pain perception. Sci Rep 9, 9443 (2019). https://doi.org/10.1038/s41598-019-45811-x

      We thank the reviewer for suggesting this relevant publication. We have now included it into the discussion of the revised manuscript by adding the following paragraph:

      “Recent work using a predictive coding framework further suggests that nocebo effects may be less susceptible to prediction error than placebo effects (Hird et al., 2019), which could contribute to their greater persistence and strength in our study.”

      (2) The trial-by-trial pain ratings could have been usefully modelled with a computational model, such as a Bayesian model (this is especially pertinent given the reference to Bayesian processing in the discussion). A multilevel model could also be used to increase the power of the analysis. This is a tentative suggestion, as I appreciate it would require a significant investment of time and work - alternatively, the authors could acknowledge it in the Discussion as a useful future avenue for investigation, if this is preferred.

      We thank the reviewer for this thoughtful suggestion. While we agree that computational modelling approaches could provide valuable insights into individual learning, our study was not designed with this in mind, and the relatively small number of trials per condition and the absence of trial-by-trial expectancy ratings limit the applicability of such models. We have therefore chosen not to pursue such an analysis but highlight it in the discussion as a promising direction for future research.

      “Notably, the most recent experience was the most predictive in all three analyses; for instance, the placebo effect on day 8 was predicted by the placebo effect on day 1, not by the initial conditioning. This finding supports the Bayesian inference framework, where recent experiences are weighted more heavily in the process of model updating because they are more likely to reflect the current state of the environment, providing the most relevant and immediate information needed to guide future actions and predictions24. Interestingly, while a change in pain predicted subsequent nocebo effects, it seemed less influential than for placebo effects. This aligns with findings that longer conditioning enhanced placebo effects, while it did not affect nocebo responses10 and the conclusion that nocebo instruction may be sufficient to trigger nocebo responses. Using Bayesian modeling, future studies could identify individual differences in the development of placebo and nocebo effects by integrating prior experiences and sensory inputs, providing a probabilistic framework for understanding the underlying mechanisms.”

      (3) The paper is missing any justification of sample size, i.e. power analysis - please include this.

      We apologize for the missing information on our a priori power analysis. As there is a lack of prior studies investigating within-subjects comparisons of placebo and nocebo effects that could inform precise effect size estimates for our research question, we based our calculation on the ability to detect small effects. Specifically, the study was powered to detect effect sizes in the range of d = 0.2 - 0.25 with α = .05 and power = .9, yielding a required sample size of N = 83-129. We have now added this information to the methods section of the revised manuscript.

      (4) "On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation."

      What were the instructions about this? Was it before the electrode was applied? This runs the risk of unblinding participants, as they only expect to feel changes in stimulus intensity due to the TENS stimulation.

      We thank the reviewer for pointing out the potential risk of unblinding participants due to the re-familiarization process prior to the second test session. We would like to clarify that we followed specific procedures to prevent participants from associating this process with the experimental manipulation. The re-familiarisation with the thermal stimuli was conducted after the electrode had been applied and re-tested to ensure that both stimulus modalities were re-introduced in a consistent and neutral context. Participants were explicitly informed that both procedures were standard checks prior to the actual test session (“We will check both once again before we begin the actual measurement.”). For the thermal stimuli, we informed participants that they would experience three different intensities to allow the skin to acclimate (e.g., “...we will test the heat stimuli in 3 trials with different temperatures, allowing your skin to acclimate to the stimuli. …”), without implying any connection to the experimental conditions.

      Importantly, this re-familiarization procedure mirrored what participants had already experienced during the initial calibration session on day 1. We therefore assume that participants interpreted it as a routine technical step rather than part of the experimental manipulation. We have now clarified this procedure in the methods section of the revised manuscript.

      (5) "For a comparison of pain intensity ratings between time-points, an ANOVA with the within-subject factors Condition (placebo, nocebo, control) and Session (day 1, day 8) was carried out. For the comparison of placebo and nocebo effects between the two test days, an ANOVA with the with-subject factors Effect (placebo effect, nocebo effect) and Session (day 1, day 8) was used."

      It seems that one ANOVA is looking at raw pain scores and one is looking at difference scores, but this is a bit confusing - please rephrase/clarify this, and explain why it is useful to include both.

      We thank the reviewer for highlighting this point. Our primary analyses focus on placebo and nocebo effects, which we define as the difference in pain intensity ratings between the control and the placebo condition (placebo effect) and the nocebo and the control condition (nocebo effect), respectively.

      To examine whether condition effects were present at each time-point, we first conducted two separate repeated measures ANOVAs - one for day 1 and one for day 8 - with the within-subject factor CONDITION (placebo, nocebo, control).

      To compare the magnitude and persistence of placebo and nocebo effects over time, we then calculated the above-mentioned difference scores and submitted these to a second ANOVA with within-subject factors EFFECT (placebo vs. nocebo effect) and SESSION (day 1 vs. day 8). We have now clarified this approach on page 19 of the revised manuscript. To avoid confusion, the Condition x Session ANOVA has been removed from the manuscript.
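      The difference scores described above can be sketched as follows. This is a minimal illustration only, assuming one mean VAS rating per participant and condition; the function name, dictionary keys, and rating values are hypothetical and are not study data.

```python
from statistics import mean


def effect_scores(ratings):
    """Compute per-participant placebo and nocebo effects.

    `ratings` maps a participant id to mean VAS ratings for the
    'placebo', 'control' and 'nocebo' conditions (hypothetical layout).
    Both effects are defined so that a positive value means an effect
    in the expected direction.
    """
    scores = {}
    for pid, r in ratings.items():
        scores[pid] = {
            # placebo effect: control minus placebo
            "placebo_effect": r["control"] - r["placebo"],
            # nocebo effect: nocebo minus control
            "nocebo_effect": r["nocebo"] - r["control"],
        }
    return scores


# Made-up ratings for two participants (illustration only)
example = {
    "p01": {"placebo": 40.0, "control": 50.0, "nocebo": 65.0},
    "p02": {"placebo": 35.0, "control": 45.0, "nocebo": 50.0},
}

scores = effect_scores(example)
mean_placebo = mean(s["placebo_effect"] for s in scores.values())
mean_nocebo = mean(s["nocebo_effect"] for s in scores.values())
```

      Computed once per session, scores of this kind would form the cells of the EFFECT x SESSION design.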

      (6) Please can the authors provide a figure illustrating trial-by-trial ratings during test trials as well as during conditioning trials?

      In response to the reviewer’s point, we now provide the trial-by-trial ratings of the test phases on days 1 and 8 as an additional figure in the Supplement (Figure S1) and would like to clarify that trial-by-trial pain intensity ratings of the conditioning phase are displayed in Figure 2C of the manuscript.

      (7) "Separate multiple linear regression analyses were performed to examine the influence of expectations (GEEE ratings) and experienced effects (VAS ratings) on subsequent placebo and nocebo effects. For day 1, the placebo effect was entered as the dependent variable and the following variables as potential predictors: (i) expected improvement with placebo before conditioning, (ii) placebo effect during conditioning and (iii) the expected improvement with placebo before the test session at day 1"

      The term "placebo effect during conditioning" is a bit confusing - I believe this is just the effect of varying stimulus intensities - please could the authors be more explicit on the terminology they use to describe this? NB changes in pain rating during the conditioning trials do not count as a placebo/nocebo effect, as most of the change in rating will reflect differences in stimulation intensity.

      We agree with the reviewer that the cited paragraph refers to the actual application of lower or higher pain stimuli during the conditioning session, rather than genuinely induced placebo or nocebo effects. We thank the reviewer for this helpful observation and have revised the terminology accordingly, now referring to these as “pain relief during conditioning” and “pain worsening during conditioning”.

      (8) Supplementary materials: "The three temperature levels were perceived as significantly different (VAS ratings; placebo condition: M= 32.90, SD= 16.17; nocebo condition: M= 56.62, SD= 17.09; control condition: M= 80.84, SD= 12.18"

      This suggests that the VAS rating for the control condition was higher than for the nocebo condition. Please could the authors clarify/correct this?

      We thank the reviewer for spotting this error. The values for the control and the nocebo condition had accidentally been swapped. This has now been corrected in the manuscript: control condition: M= 56.62, SD= 17.09; nocebo condition: M= 80.84, SD= 12.18.

      (9) "To predict placebo responses a week later (VAScontrol - VASplacebo at day 8), the same independent variables were entered as for day 1 but with the following additional variables (i) the placebo effect at day 1 and (ii) the expected improvement with placebo before the test session at day 8."

      Here it would be much clearer to say "pain ratings during test trials at day 1".

      We agree with the reviewer and have revised the manuscript as suggested.

      (10) For completeness, please present the pain intensity ratings during conditioning as well as calibration/test trials in the figure.

      Please see our answer to comment (6).

      (11) In Figure 1a, it looks like some participants had rated the control condition as zero by day 8. If so, it's inappropriate to include these participants in the analysis if they are not responding to the stimulus. Were these the participants who were excluded due to pain insensitivity?

      On day 8, the lowest pain intensity ratings observed were VAS 3 in the placebo condition and VAS 2 in the control condition, both from the same participant. All other participants reported minimum values of VAS 11 or higher (all on a scale from 0-100). Thus, no participant provided a pain rating of VAS 0, and all ratings indicated some level of pain perception in response to the stimulus. We did not define an exclusion criterion based on day 8 pain ratings in our preregistration, and we did not observe any technical issues with the stimulation procedure. To avoid post-hoc exclusions and maintain consistency with our preregistered analysis plan, we therefore decided to include all participants in the analysis.

      (12) "Comparison of day 1 and day 8. A direct comparison of placebo and nocebo effects on day 1 and day 8 pain intensity ratings showed a main effect of Effect with a stronger nocebo effect (F(1,97)= 53.93, p< .001, η2= .36) but no main effect of Day (F(1,97)= 2.94, p= .089, η2 = .029). The significant Effect x Session interaction indicated that the placebo effect and the nocebo effect developed differently over time (F(1,97)= 3.98, p= .049, η2 = .039)"

      This is confusing as it talks about a main effect of "day" and then interaction with "session" - are they two different models? The authors need to clarify.

      We thank the reviewer for pointing this out. In our analysis, “Session” is the correct term for the experimental factor, which has two factor levels, “day 1” and “day 8”. This has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) More information on how "size of the effect" in Figures 1b and 2b was calculated is needed; this can be in the legend. If these are differences between control and each condition, then they were reversed for one condition (nocebo?), which is ok - but this should be clearly explained.

      We agree with the reviewer and have now revised the figure legends to improve clarity. The legends now read:

      1b: “Figure 1. Pain intensity ratings and placebo and nocebo effects during calibration and test sessions. (A) Mean pain intensity ratings in the placebo, nocebo and control condition during calibration, and during the test sessions at day 1 and day 8. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) on day 1 and day 8. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001, **: p < .01, n.s.: non-significant.”

      2b: “Figure 2. Mean and trial-by-trial pain intensity ratings, placebo and nocebo effects during conditioning. (A) Mean pain intensity ratings of the placebo, nocebo and control condition during conditioning. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) during conditioning. (C) Trial-by-trial pain intensity ratings (with confidence intervals) during conditioning. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001.”

      (2) In the methods, I was missing a clear understanding of how many trials there were in the conditioning phase, and then how many in the other testing phases. Also, how long did the experiment last in total?

      We apologize that the exact number of trials in the testing phases was not clear in the original manuscript. We now indicate on page 18 of the revised manuscript that we used 10 trials per condition in the test sessions. We have also added information on the duration of each test day (i.e., three hours on day 1 and one hour on day 8) on page 15.

      (3) In expectancy ratings, line 186 - are improvement and worsening expectations different from expected pain relief? It is implied that these are two different constructs - it would be helpful to clarify that.

      We agree that this is indeed confusing and would like to clarify that both refer to the same construct. We used the Generic rating scale for previous treatment experiences, treatment expectations, and treatment effects (GEEE questionnaire, Rief et al. 2021) that discriminates between expected symptom improvement, expected symptom worsening, and expected side effects due to a treatment. We now use the terms “expected pain relief” and “expected pain worsening” throughout the whole manuscript.

      (4) In the last section of the Results, somatosensory amplification comes out of nowhere - and could be better introduced (see point 2 above).

      We agree with the reviewer that introducing the concept of somatosensory amplification and its potential link to placebo/nocebo effects only in the Methods is unhelpful, given that this section appears at the end of the manuscript. We therefore now introduce the relevant publication (Doering et al., 2015) before reporting our findings on this concept.

      (5) In line 169, if the authors want to specify what portion of the variance was explained by expectancy, they could conduct a hierarchical regression, where they first look at R2 without the expectancy entered, and only then enter it to obtain the R2 change.

      We fully agree that hierarchical regression can be a useful approach for isolating the contribution of variables. However, in our case, expectancy was assessed at different time points (e.g., before conditioning and before the test session on day 1), and there was no principled rationale for determining the order in which these different expectancy-related variables should be entered into a hierarchical model.

      That said, in response to the reviewer’s suggestion, we have now conducted hierarchical regression analyses in which all expectancy-related variables were entered together as a single block. These analyses largely confirmed the findings reported so far and are provided in this response below. Given the exploratory nature of this grouping and the lack of an a priori hierarchy, we feel that the standard multiple regression models remain the most appropriate for addressing our research question because they allow us to evaluate the total contribution of expectancy-related predictors while also examining the individual contribution of each variable within the block. We would therefore prefer to retain these as the primary analyses in the manuscript.

      Results of the hierarchical regression analyses:

      Day 1 - Placebo response: In step 1, we entered the difference in pain intensity ratings between the control and the placebo condition during conditioning as a predictor. In step 2, we added the two variables reflecting expectations (i.e., expected improvement with placebo (i) before conditioning and (ii) before the test session on day 1). This allowed us to assess whether expectation-related variables explained additional variance beyond the effect of conditioning.

      The overall regression model at step 1 was significant, F(1, 102) = 13.42, p < .001, explaining 11.6% of the variance in the dependent variable (R<sup>2</sup> = .116). Adding the expectancy-related predictors in step 2 did not lead to a significant increase in explained variance, ΔR<sup>2</sup> = .007, F(2, 100) = 0.384, p = .682. Thus, the conditioning response significantly predicted placebo-related pain reduction on day 1, but additional information on expectations did not account for further variance.

      Day 1 - Nocebo response: The equivalent analysis was run for the nocebo response on day 1. In step 1, the pain intensity difference between the nocebo and the control condition was entered as a predictor before adding the two expectancy ratings (i.e., expected worsening with nocebo (i) before conditioning and (ii) before the test session on day 1).

      In step 1, the regression model was not statistically significant, F(1, 102) = 2.63, p = .108, and explained only 2.5% of the variance in nocebo response (R<sup>2</sup> = .025). Adding the expectation-related predictors in Step 2 slightly increased the explained variance by ΔR<sup>2</sup> = .027, but this change was also non-significant, F(2, 100) = 1.41, p = .250. The overall variance explained by the full model remained low (R<sup>2</sup> = .052). These results suggest that neither conditioning nor expectation-related variables reliably predicted nocebo-related pain increases on day 1.

      Day 8 - Placebo response: For the prediction of the placebo effect on day 8, the following variables reflecting perceived effects were entered as predictors in step 1: the difference in pain intensity ratings between the control and the placebo condition (i) during conditioning and (ii) on day 1. In step 2, the variables reflecting expectations were added: the expected improvement with placebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8.

      In step 1, the model was statistically significant, F(3, 95) = 14.86, p < .001, explaining 23.8% of the variance in the placebo response (R<sup>2</sup> = .238, Adjusted R<sup>2</sup> = .222). In step 2, the addition of the expectation-related predictors resulted in a non-significant improvement in model fit, ΔR<sup>2</sup> = .051, F(3, 92) = 2.21, p = .092. The overall variance explained by the full model increased modestly to 29.0%.

      Day 8 - Nocebo response: For the equivalent analyses of nocebo responses on day 8, the following variables were included in step 1: the difference in pain intensity ratings between the nocebo and the control condition (i) during conditioning and (ii) on day 1. In step 2, we entered the variables reflecting nocebo expectations including expected worsening with nocebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8. In step 1, the model significantly predicted the day 8 nocebo response, F(3, 95) = 6.04, p = .003, accounting for 11.3% of the variance (R<sup>2</sup> = .113, Adjusted R<sup>2</sup> = .094). However, the addition of expectation-related predictors in Step 2 resulted in only a negligible and non-significant improvement, ΔR<sup>2</sup> = .006, F(3, 92) = 0.215, p = .886. The full model explained just 11.9% of the variance (R<sup>2</sup> = .119).
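      The step-wise comparisons above rest on the standard F test for a change in R<sup>2</sup> between nested regression models. A minimal sketch of that test statistic is given below; the function name and argument names are hypothetical, and the inputs in the usage line are the rounded values quoted for the Day 1 placebo model, so small deviations from the reported statistic are to be expected.

```python
def r2_change_f(r2_reduced, r2_full, n_added, df_residual):
    """F statistic for the R^2 change between two nested regression models.

    r2_reduced  : R^2 of the model without the added block (step 1)
    r2_full     : R^2 of the model including the added block (step 2)
    n_added     : number of predictors added in step 2 (numerator df)
    df_residual : residual degrees of freedom of the full model
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    if n_added <= 0 or df_residual <= 0:
        raise ValueError("degrees of freedom must be positive")
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_residual)


# Rounded inputs: R^2 = .116 at step 1, delta R^2 = .007, F(2, 100)
f_day1_placebo = r2_change_f(0.116, 0.123, n_added=2, df_residual=100)
```

      With these rounded inputs the function returns a value of about 0.4.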

      Typos:

      (6) Abstract - 104 heathy xxx (word missing).

      (7) Line 61 - reduce or decrease - I think you meant increase.

      Thank you, we have now corrected both sentences.

      References

      Colloca L, Petrovic P, Wager TD, Ingvar M, Benedetti F. How the number of learning trials affects placebo and nocebo responses. Pain. 2010

      Doering BK, Nestoriuc Y, Barsky AJ, Glaesmer H, Brähler E, Rief W. Is somatosensory amplification a risk factor for an increased report of side effects? Reference data from the German general population. J Psychosom Res. 2015

    1. eLife Assessment

      This work describes a highly complex automated algorithm for analyzing vascular imaging data from two-photon microscopy. This tool has the potential to be extremely valuable to the field and to fill gaps in knowledge of hemodynamic activity across a regional network. The solid biological application provides a demonstration of their pipeline's capabilities and suggests intriguing hypotheses around prolonged vascular tone changes, but will need to be followed up by further experiments to be conclusively demonstrated.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons.

      The work is interesting and the topic is quite relevant to better understand the hemodynamic response on the graph/network level.

      Strengths:

      The manuscript provides a pipeline that allows for the detection of changes in the vessel diameter as well as simultaneously allowing for the location of the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph-level mechanisms of regulating activity-dependent blood flow.

      The interesting findings include that vessel radius changes depend on depth from the cortical surface and that dilations on average happen closer to the activated neurons.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors develop a highly detailed pipeline to analyze hemodynamic signals from in vivo two-photon fluorescence microscopy. This includes motion correction, segmentation of the vascular network, diameter measurements across time, mapping neuronal position relative to the vascular network, and analyzing vascular network properties (interactions between different vascular segments). For the segmentation, the authors use a convolutional neural network to identify vessel (or neural) and background pixels and train it using ground truth images based on semi-automated mapping followed by human correction/annotation. Considerable processing was done on the segmented images to improve accuracy, extract vessel center lines, and compute frame-by-frame diameters. The model was tested with artificial diameter increases and Gaussian noise and proved robust to these manipulations.

      Network-level properties include Assortativity - a measure of how similar a vessel's response is to nearby vessels - and Efficiency - the ease of flow through the network (essentially, the combined resistance of a path based on diameter and vessel length between two points).
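      To make the Efficiency measure concrete, the sketch below computes a resistance-based global efficiency on a toy vessel graph. It is an illustration only and not the authors' implementation: it assumes each segment's resistance scales as length divided by the fourth power of radius (a Poiseuille-type scaling), that the resistance of a path is the sum of its segment resistances, and that efficiency is the mean of the inverse lowest path resistance over all node pairs. All function names and the example segments are hypothetical.

```python
import heapq


def segment_resistance(length, radius):
    # Assumed Poiseuille-type scaling: longer and narrower segments resist more.
    return length / radius ** 4


def lowest_resistances(graph, source):
    """Dijkstra search: lowest summed resistance from `source` to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, resistance in graph[node]:
            candidate = d + resistance
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def global_efficiency(segments):
    """segments: list of (node_a, node_b, length, radius) tuples."""
    graph = {}
    for a, b, length, radius in segments:
        r = segment_resistance(length, radius)
        graph.setdefault(a, []).append((b, r))
        graph.setdefault(b, []).append((a, r))
    nodes = sorted(graph)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for i, source in enumerate(nodes):
        dist = lowest_resistances(graph, source)
        for target in nodes[i + 1:]:
            if target in dist:  # unreachable pairs contribute zero
                total += 1.0 / dist[target]
    return total / (n * (n - 1) / 2)


# Toy chain of two segments (made-up lengths and radii)
baseline = global_efficiency([("a", "b", 10.0, 2.0), ("b", "c", 10.0, 2.0)])
# Same chain after one segment dilates
dilated = global_efficiency([("a", "b", 10.0, 2.5), ("b", "c", 10.0, 2.0)])
```

      Under these assumptions, dilating any segment lowers the resistance of the paths through it and so raises the efficiency value.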

      Strengths:

      This is a very powerful tool for cerebral vascular biologists as many of these tasks are labor intensive, prone to subjectivity, and often not performed due to the complexity of collecting and managing volumes of vascular signals. Modelling is not my specialty so I cannot speak too specifically, but the model appears to be well-designed and robust to perturbations. It has many clever features for processing the data.

      The authors rightly point out that there is a real lack in the field of knowledge of vascular network activity at single-vessel resolution. Network anatomy has been studied, but hemodynamics are typically studied either with coarse resolution or in only one or a few vessels at a time. This pipeline has the potential to change that.

      [Editors' note: this work has been through three rounds of revisions, and most recently the authors have added caveats to the discussion. This version of the paper has been assessed by the editors and the weaknesses identified previously remain with earlier versions of the work.]

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      In the manuscript the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network/graph level.

      Strengths:

      The manuscript provides a pipeline that allows changes in the vessel diameter to be detected while simultaneously allowing the neurons driven by stimulation to be located.

      The resulting data could provide interesting insights into the graph-level mechanisms of regulating activity-dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) does not seem to be a particularly useful comparison.

      We have now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      A comparison to VesselVio has now also been generated, and results are visualized in Supplementary Figure 11. VesselVio generates individual graphs for each time point, resulting in potential discrepancies in the structure of the graphs from different time points. Furthermore, VesselVio uses the distance transform to estimate the vascular radius, which renders the vessel radius estimates highly susceptible to variation in the user-selected methodology used to obtain segmentation results, whereas our approach uses intensity gradient-based boundary detection from centerlines in the image, mitigating this bias. We have added the following paragraph to the Discussion section on the comparisons with the two methods:

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. VesselVio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph.
      Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”
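      The idea of placing the vessel boundary from image intensity rather than from the segmentation mask can be illustrated with a one-dimensional sketch. This is not the pipeline's actual code: it assumes a bright (fluorescently labelled) lumen on a darker background, an intensity profile sampled at equal steps outward from a centerline point, and a boundary located at the steepest intensity drop. The function names, the sampling step, and the profile values are hypothetical.

```python
def boundary_index(profile):
    """Index of the steepest intensity drop along a radial profile.

    `profile` holds intensities sampled outward from the centerline;
    index i refers to the step between samples i and i + 1.
    """
    if len(profile) < 2:
        raise ValueError("profile needs at least two samples")
    drops = [profile[i] - profile[i + 1] for i in range(len(profile) - 1)]
    return max(range(len(drops)), key=drops.__getitem__)


def radius_from_profile(profile, step_um):
    """Radius estimate, placing the boundary midway across the steepest drop."""
    return (boundary_index(profile) + 0.5) * step_um


# Made-up profile: bright lumen, sharp fall-off, dim background
profile = [100.0, 98.0, 95.0, 60.0, 20.0, 18.0, 17.0]
radius_um = radius_from_profile(profile, step_um=0.5)
```

      Because the estimate is read from the raw intensities, shifting a mask boundary by a voxel would not change it, whereas a distance-transform radius would follow the mask.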

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      In response to the reviewer’s comment, 2D slices have been added in Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to the use of MATLAB. Also, the pipeline code was not made available during review, contrary to the authors' claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to Github and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, from the visualizations it is difficult to judge the performance of the pipeline, where the pipeline fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating roughly 20-40% of voxels do not overlap between inferred and ground truth. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary in my view to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added to Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) compares well with the range (0.56-0.86) reported for 3D segmentations of large datasets in microscopy [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set. We found that the raters had a mean inter-class correlation of 0.73 [7]. Our model outperformed this Dice score on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than this baseline.
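For readers weighing the Dice values discussed above, the following is a minimal sketch of how a Dice coefficient is computed between two binary masks. The function name `dice_score` and the toy volumes are illustrative assumptions, not code or data from the pipeline under review.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + truth.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total

# Toy 3D example (synthetic, not data from the study).
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True       # 8 ground-truth voxels
pred = np.zeros_like(truth)
pred[1:3, 1:3, 1:4] = True        # 12 predicted voxels, 8 of which overlap

print(dice_score(pred, truth))    # 2 * 8 / (8 + 12) = 0.8
```

In this toy case a Dice of 0.8 arises even though every ground-truth voxel is recovered, because the prediction over-segments by four voxels; the score alone does not say which kind of error dominates.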

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figure 1E,2C,2D). However, focal stimulations where the stimulation light is raster scanned over a small region focused in the field of view show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by the optogenetic stimulations used in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019, as opposed to the raster-scanned focal stimulation we used in this study and in Mester et al. 2019, where we observed focal photostimulation to elicit vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm<sup>2</sup> photostimulation. Further, even with focal photostimulation, we do see light intensity dependence of the duration of the vascular responses. Indeed, in Supplementary Figure 2, 1.1 mW/mm<sup>2</sup> photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm<sup>2</sup>; the 1.1 mW/mm<sup>2</sup> responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in the Supplementary Figure 2 that were collected over a limited FOV at 3s per stack), we have collected one stack every 42 seconds. The first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by the Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment in which we placed a special order for fluorescent beads with a known diameter of 7.32 ± 0.27um, imaged them following our imaging protocol, and subsequently used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameters, estimating the diameter to be 7.34 ± 0.32um. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known radius imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32um and a specified range of 7.32 ± 0.27um as determined by the manufacturer using a Coulter counter were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids calculated for individual spheres, and planes with a random normal vector extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27um), as tabulated below in Table 1.”
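The spoke-based boundary estimate described in the insert above can be illustrated with a short sketch. This is only one reading of "minimum intensity gradients computed along 36 radial spokes": it takes, on each spoke, the distance at which the radial intensity gradient is most negative and averages those distances. The helper names (`bilinear`, `estimate_radius`), the step size, and the synthetic disk are all assumptions made for illustration; the smoothing, deconvolution, and noise-matching steps of the actual pipeline are not reproduced.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample img at fractional (y, x) coordinates with bilinear interpolation."""
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    dy, dx = y - y0, x - x0
    return (img[y0, x0] * (1 - dy) * (1 - dx)
            + img[y0, x0 + 1] * (1 - dy) * dx
            + img[y0 + 1, x0] * dy * (1 - dx)
            + img[y0 + 1, x0 + 1] * dy * dx)

def estimate_radius(plane, center, n_spokes=36, max_r=20.0, step=0.25):
    """Average, over spokes, of the distance to the steepest intensity drop."""
    radii = np.arange(0.0, max_r, step)
    angles = np.linspace(0.0, 2.0 * np.pi, n_spokes, endpoint=False)
    edges = []
    for a in angles:
        # Intensity profile along one spoke leaving the center at angle a.
        profile = bilinear(plane,
                           center[0] + radii * np.sin(a),
                           center[1] + radii * np.cos(a))
        grad = np.gradient(profile, step)
        # Boundary = position of the most negative radial gradient.
        edges.append(radii[np.argmin(grad)])
    return float(np.mean(edges))

# Synthetic bright disk with a soft edge (hypothetical test image, in pixels).
yy, xx = np.mgrid[0:64, 0:64]
dist = np.hypot(yy - 32.0, xx - 32.0)
true_radius = 10.0
plane = 1.0 / (1.0 + np.exp((dist - true_radius) / 0.8))

radius = estimate_radius(plane, (32.0, 32.0))
print(radius)  # expected to land close to true_radius
```

Because each spoke locates the edge on a grid finer than one pixel and the result is averaged over many spokes, the estimate can in principle resolve changes smaller than the nominal pixel size, which is the point the bead experiment is meant to support.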

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” ArXiv191210003 Cs Eess Q-Bio, Dec. 2019, Accessed: Dec. 09, 2020. (Online). Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

    1. eLife Assessment

      This paper provides important insight into how early life experience shapes adult behavior in fruit bats. The authors raised juvenile bats either in an impoverished or enriched environment and studied their foraging behaviors. The evidence is convincing that bats raised in enriched environments are more active, bold, and exploratory. The work will be of interest to ethologists and developmental psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that the early life experience of juvenile bats shapes their outdoor foraging behaviors. They achieve this by raising juvenile bats either in an impoverished or enriched environment. They subsequently test the behavior of bats indoors and outdoors. The authors show that behavioral measures outdoors were more reliable in delineating the effect of early life experiences, as the bats raised in enriched environments were bolder, more active, and exhibited higher exploratory tendencies.

      Strengths:

      The major strength of the study is providing a quantitative study of animal "personality" and how it is likely shaped by innate and environmental conditions. The other major strength is the ability to do reliable long-term recording of bats outdoors, giving researchers the opportunity to study bats in their natural habitat. To this point, the study also shows that the behavioral variables measured indoors do not correlate with those measured outdoors, thus providing a key insight into the importance of testing animal behaviors in their natural habitat.

      Weaknesses identified in the first round of review:

      It is not clear from the analysis presented in the paper how persistent those environmentally induced changes are: do they remain with the bats until the end of their lives?

      Comments on revisions:

      The authors have addressed those weaknesses and the paper is much stronger.

    1. eLife Assessment

      This revised manuscript presents an important characterization of mouse auditory cortex receptive field organization, utilizing two-photon imaging of specific subpopulations. They demonstrate a degradation of tonotopic organization from the input to the output neurons. The strength of the evidence is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1: the thalamocortical recipient (TR) neurons and the corticothalamic (CT) neurons. They observed a clear tonotopic gradient among TR neurons but not among CT neurons. Moreover, CT neurons exhibited high heterogeneity in their frequency tuning and broader bandwidth, suggesting increased synaptic integration in these neurons. By parsing out different projection-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different frequency response-related topographic organization.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      - Perhaps as cited in the introduction, it is well known that tonotopic gradient is well preserved across all layers within A1, but I feel if the authors want to highlight the specificity of their virus tracing strategy and the populations that they imaged in L2/3 (TR neurons) and L6 (CT neurons), they should perform control groups where they image general excitatory neurons in the two depths and compare to TR and CT neurons, respectively. This will show that it's not their imaging/analysis or behavioral paradigms that are different from other labs.  

      - Fig 1D and G, the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness? 

      - For Fig. 2G and H, is each circle a neuron or an animal? Why are they staggered on top of each other on the x-axis? If the x-axis is the distance from caudal to rostral, each neuron should have a different distance? Also, it seems that because Fig. 2H has more circles, it has more variation and is thus not significant (for example, at 600 or 900um, 2G seems to have fewer circles than 2H).

      - Similarly in Fig 2J and L, why are the circles staggered on the y-axis now? And is each circle now a neuron or a trial? It seems they have many more circles than Fig 2G and 2H. Also, I don't think doing a correlation is the proper statistic for this type of plot (this point applies to Fig. 3H and 3J).

      - What does the inter-quartile range of BF (IQRBF, in octaves) imply? What's the interpretation of this analysis? I am confused as to why TR neurons showing high IQR in HF areas compared to LF areas would mean homogeneity among TR neurons (line 213 - 216). On the same note, how is this different from the BF variability? Isn't a higher IQR equivalent to higher variability?

      - Fig. 4A-B, there are no clear criteria for how the authors categorize V, I, and O shapes. The descriptions in the Methods (line 721 - 725) are also very vague.
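As background to the question about the inter-quartile range of BF raised in the list above, here is a minimal sketch of how an IQR in octaves can be computed from best frequencies. How the authors grouped neurons before taking the IQR is not stated in this review, so the function `iqr_bf_octaves`, the reference frequency, and the two toy populations are assumptions for illustration only.

```python
import numpy as np

def iqr_bf_octaves(bf_khz, ref_khz=2.0):
    """IQR of best frequencies after conversion to octaves re: ref_khz."""
    octaves = np.log2(np.asarray(bf_khz, dtype=float) / ref_khz)
    q1, q3 = np.percentile(octaves, [25, 75])
    return q3 - q1

# Hypothetical local populations (best frequencies in kHz).
homogeneous = [8.0, 8.0, 8.0, 8.0, 16.0]     # most neurons share one BF
heterogeneous = [2.0, 4.0, 8.0, 16.0, 32.0]  # BFs spread over four octaves

print(iqr_bf_octaves(homogeneous))    # 0.0
print(iqr_bf_octaves(heterogeneous))  # 2.0
```

The reference frequency cancels in the difference of quartiles, so it only fixes the zero of the octave scale. Under this definition a larger IQR corresponds to a broader spread of BFs, which is the reading the reviewer's question assumes.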

      Comments on revisions:

      The authors have addressed all my questions in the previous round.

    3. Reviewer #2 (Public review):

      Summary:

      Gu and Liang et al. investigated how auditory information is mapped and transformed as it enters and exits the auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1 and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording 1) CT neurons in upper layers 2) TR in deeper layers 3) non-CT in deeper layers and/or 4) non-TR in upper layers.

      (2) What percent of the neurons at the depths being imaged are CT neurons? Similarly, what percent are TR neurons?

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as other sensory areas. Auditory cortex references should be used specifically, and not sources reporting on S1, V1.

      Comments on revisions:

      The authors have fully addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogeneous in terms of their best frequency (BF), even when focusing on the low vs high frequency regions of primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses are outside of behavioral context or measured at single timepoints given the plasticity, context-dependence, and receptive field 'drift' that can occur in cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and report that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response type in the Methods need more quantitative detail. The authors state: ""Irregular" neurons exhibited spontaneous activity with highly variable responses to sound stimulation. "Tuned" neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. "Silent" neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels." The authors need to define what their thresholds are for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global max (even if another stimulus evokes a very close amplitude response), etc.?

      Comments on revisions:

      I think the authors misunderstood my point about sound level and characteristic frequency vs best frequency tonotopic maps. Yes, many studies of cortical responses present stimuli at higher intensities than the characteristic frequencies, but as tuning curves widen with sound level, the macroscopic tonotopic organization of primary auditory cortex breaks down at higher intensities. This is why most of the classic studies of tonotopy (e.g., from the Merzenich lab) generated maps of characteristic frequency. As I mentioned before, this isn't so much of an issue for the authors' comparisons of TR vs CT organization in their own study, but in general, this makes it difficult to compare aspects of spatially-organized tonotopy from imaging studies with the older electrophysiological 'truer' tonotopic maps. That said, this means that CT cells also might be tonotopically organized if the authors had been able to look at lower intensity tuning properties.

    1. eLife Assessment

      This study presents a valuable assessment of and solid evidence for increased similarity in visual appearance combined with increased chemical differences between two butterfly species in sympatry compared with differences between three populations of one of the two species in allopatry. The similarity in visual appearance hints at an evolutionary response to shared predators (but alternative explanations are possible). Thus, the difference in chemical signaling likely helps to avoid between-species mating in sympatry.

    2. Joint Public Review:

      Summary:

      Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. We appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Reviewing Editor comment:

      The authors have improved their submission after revisions and responded to the previous concerns of the reviewers.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. I appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Weaknesses:

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion may fail to convey the broader significance of our findings. In the first version of the manuscript, we framed our manuscript around the processes shaping reproductive isolation and co-existence in sympatry, but now realize that this question was too broad with regard to our results. We thus strictly focused on outlining the importance of ecological interactions in the evolution of traits in sympatric species. In the revised version of the manuscript, we rewrote the first paragraph of the introduction to introduce context regarding the effect of ecological interactions on trait evolution (lines 43-60). We then explicitly introduce the theoretical question investigated in our paper (i.e. “we investigate how ecological interactions in sympatry can constrain natural and sexual selection shaping trait evolution”, lines 62-63) and our predictions regarding the evolution of traits in sympatry vs. allopatry (lines 74-80). We also added predictions regarding our experiments on Morpho at the end of the introduction (lines 146-157). As a result, the discussion is now better aligned with the introduction, by discussing the putative effect of predation and mate choice on the evolution of wing iridescence in Morpho.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      We now clearly state in the introduction our motivation for studying visual signals and mate choice in allopatric populations (lines 74-80, lines 146-157). We argue that intraspecific comparisons help identify whether visual cues can be used in mate recognition between phylogenetically close subspecies, which are expected to resemble each other visually more than closely related species do (tetrad experiment and experiment 1). As M. h. bristowi and M. h. theodorus have different wing patterns, we also used this comparison to identify the traits involved in male mate preference within a species, testing the importance of iridescent color (experiment 2) or iridescent patterning (experiment 3). The results of those experiments can then be used to assess whether these traits are used in species recognition between sympatric species. See also our answers to recommendations 11 and 15 from reviewer #1.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified. Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim. Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      To make a stronger case for the use of the allopatric populations in our manuscript, we strengthened the justification for comparing intraspecific allopatric populations with interspecific sympatric populations: the iridescence measurements and the mate choice experiments in allopatric populations serve as a baseline for studying how species interactions shape the evolution of traits and mate recognition in sympatric populations. Following your major comment #1, we rewrote the introduction to justify the need for studying allopatric vs. sympatric populations (lines 74-80), and further highlighted in the discussion the need to study iridescence in sympatric species to fully understand their trait evolution (lines 339-343).

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that our study does not directly demonstrate that iridescence contributes to evasive mimicry. We did tone down the interpretation of the results in the discussion and state that predation is not the only selective pressure that could have promoted a convergent evolution of iridescence in sympatric species, as iridescence is a trait that could be involved in thermoregulation (lines 346-353) and camouflage (lines 363-369) for example. We made sure to mention that convergence in iridescent signals in sympatry is only an indirect support to the evasive mimicry hypothesis, and that further research is still needed, including direct predation experiments, to show that this convergence is indeed triggered by predation (lines 391-396).  

      Reviewer #2 (Public review):

      This study presents an investigation of the visual and chemical properties and mating behaviour in Morpho butterflies, aimed at addressing the nature of divergence between closely related species in sympatry. The study species consist of three subspecies of Morpho helenor (bristowi, theodorus, and helenor), and the congeneric Morpho achilles achilles. The authors postulate that whereas the iridescent blue signals of all (sub)species should function as a predator reduction signal (similar to aposematism) and therefore exhibit convergence, the same signals should indicate divergence if used as a mating signal, particularly in sympatric populations. They also assess chemical profiles among the species to assess the potential utility of scent in mediating species/sex discrimination.

      The authors first used reflectance spectrometry to calculate hue, brightness, and chroma, plus two measures of "iridescence" (perhaps better phrased as angular dependence) in each (sub)species. This indicated the ubiquitous presence of sexual dimorphism in brightness (males brighter), which also appears to be the case for iridescence (Figure 3A-B). Analysis of these data also indicated that whereas there is evidence for divergence among subspecies in allopatry, the same evidence is lacking for species in sympatry (P = 0.084). This was supported further by visual modelling, which showed that both conspecifics and birds should be (theoretically) capable of perceiving the colour difference among allopatric populations of M. helenor, whereas the same is not true for the sympatric species.

      The authors then conducted mate choice trials, first using live individuals and second using female dummies. The live experiments indicated the presence of assortative mating among the two subspecies of M. helenor (bristowi and theodorus). The dummy presentations indicated (a) bristowi males prefer conspecific wings, whereas theodorus have no preference, (b) bristowi males prefer the con(sub)specific colour pattern, (c) theodorus prefer the con(sub)specific iridescence when the pattern is manipulated to be similar among female dummies. A fourth experiment, using sympatric M. achilles and M. helenor, indicated no preference for conspecific female dummies. Finally, chemical analysis indicated substantial differences between these two species in putative pheromone compounds, and especially so in the males.

      The authors conclude that the similarity of iridescence among species in sympatry is suggestive of convergence upon a common anti-predation signal. Despite some behavioural evidence in favour of colour (iridescence)-based mate discrimination, chemical differences between M. achilles and M. helenor are posed as more likely to function for species isolation than visual differences.

      Overall, I enjoyed reading this manuscript, which presents a valiant attempt at studying visual, chemical and behavioural divergence in this iconic group of butterflies.

      Major comments

      My only major comment concerns the authors' favoured explanation for aposematism (or evasive mimicry) for convergence among species, which is based upon the you-can't-catch-me hypothesis first presented by Young 1971. Although there is supporting work showing that iridescent-like stimuli are more difficult to precisely localize by a range of viewers, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that there is widespread acceptance of this hypothesis. Given that the present study deals with closely-related (sub)species, one alternative explanation - a "null" hypothesis of sorts - is for a lack of divergence (from a common starting point) as opposed to evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless there is selection that causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but this seems a little premature to state as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024 which seems difficult to reconcile in light of this explanation. Whereas this paper is cited by the authors, a more nuanced discussion of their experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer’s concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. Indeed, the you-can't-catch-me hypothesis has not yet been empirically tested in Morpho; it is currently a working hypothesis supported only by indirect lines of evidence.

      Among the 30 known Morpho species, iridescence is most likely the ancestral character, notably because iridescence is a trait shared by a majority of Morpho (we now mention this in the introduction, lines 108-110). In this paper, we thus did not aim to identify the evolutionary forces involved in the appearance of iridescence in this group, but rather wanted to understand to what extent ecological interactions can impact the diversification (or not) of this trait. As such, the dorsal manipulations performed in Vieira-Silva et al. 2024, showing that iridescence in Morpho may have an effect similar to crypsis, do not affect our working hypothesis. Instead, we use Vieira-Silva et al. 2024 to discuss the potential anti-predator effect of iridescence, which could promote convergent evolution of iridescent patterns.

      In the main text, we now clearly mention our null hypothesis: under a scenario of neutral evolution of iridescence, we would expect that the divergence in wing coloration between two M. helenor subspecies would be lower than between two different Morpho species (M. helenor and M. achilles) and showed that our results sharply differ from this null expectation.

      We then improved the discussion by adding alternative hypotheses potentially explaining the convergent iridescent signal detected in sympatric species: we discussed the expected effect under neutral evolution (lines 339-343), but also added alternative hypotheses regarding the diversification of iridescence due to camouflage (lines 363-369), predator evasion (lines 373-377) and thermoregulation (lines 346-353).

      Reviewer #3 (Public review):

      The authors investigated differences in iridescence wing colouration of allopatric (geographically separated) and sympatric (coexisting) Morpho butterfly (sub)species. Their aim was to assess if iridescence wing colouration of Morpho (sub)species converged or diverged depending on coexistence and if iridescence wing colouration was involved in mating behaviour and reproductive isolation. The authors hypothesize that iridescence wing colouration of different (sub)species should converge in sympatry and diverge in allopatry. In sympatry, iridescence wing colouration can act as an effective antipredator defence with shared benefits if multiple (sub)species share the same colouration. However, shared wing colouration can have potential costs in terms of reproductive interference since wing colouration is often involved in mate recognition. If the benefits of a shared antipredator defence outweigh the costs of reproductive interference, iridescence wing colouration will show convergence and alternative mate recognition strategies might evolve, such as chemical mate recognition. In allopatry, iridescence wing colouration is expected to diverge due to adaptation to different local conditions and no alternative mate recognition is expected.

      Strengths:

      (1) Using allopatric and sympatric (sub)species that are closely related is a powerful way to test evolutionary hypotheses

      (2) By clearly defining iridescence and measuring colour spectra from a variety of angles, applying different methods, a very comprehensive dataset of iridescence wing colouration is achieved.

      (3) By experimentally manipulating wing coloration patterns, the authors show visual mate recognition for M. h. bristowi and could, in theory, separate different visual aspects of colouration (patterns VS iridescence strength).

      (4) Measurements of chemical profiles to investigate alternative mate recognition strategies in case of convergence of visual signals.

      Weaknesses:

      In my opinion, studies should be judged on the methods and data included, and not on additional measurements that could have been taken or additional treatments/species that should be included, since in most ecological and evolutionary studies, more measurements or treatments/species can always be included. However, studies do need to ensure appropriate replication and appropriate measurements to test their hypothesis AND support their conclusions. The current study failed to ensure appropriate replication, and in various cases, the results do not support the conclusions.

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, hence having poor (no) replication on the level of allopatric and sympatric (sub)species pairs.

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, one would require several allopatric and sympatric replicates to conclude on the effect of species interaction in trait evolution. Our study is a preliminary attempt at answering this question, covering a few Morpho populations but proposing a broad assessment of iridescence and mate preference for those populations. We clearly mentioned in the discussion that investigating multiple populations is needed to test whether the trend we observed in this paper can be generalized (line 388-392).

      Second, chemical profiles were only measured for sympatric species and not for allopatric (sub)species, which limits the interpretation of this data. The allopatric (sub)species could have been measured as non-coexistence "control". If coexistence and convergence in wing colouration drives the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge for allopatric (sub)species where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and what data (concentrations, ratios, proportions) were used to perform NMDS and generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric populations would have made a stronger case in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed at the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We made sure to mention this limitation in the discussion (lines 457-461). 

      We already stated in the methods that we compiled the area under the peak of each component found in the chromatograms of our samples and that we performed all the statistical analyses on this dataset. To make it clearer, we mention in the new version of the manuscript that the area under the peak of each component allows us to measure the concentration of the components (in the methods, lines 720, 723, 733). We also added some details to the legend of Figure 5.
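
      As a rough illustration of how peak areas could feed an ordination such as NMDS, the sketch below converts the area under each chromatogram peak into relative proportions and computes a Bray-Curtis dissimilarity between two profiles. The choice of Bray-Curtis, the function names, the label compound_B, and all numeric values are assumptions made for illustration; none of them is taken from the manuscript.

```python
def to_proportions(peak_areas):
    """Convert raw peak areas (compound -> area) to relative proportions."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("profile has no signal")
    return {compound: area / total for compound, area in peak_areas.items()}


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two compound -> abundance mappings.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no compound.
    """
    compounds = set(profile_a) | set(profile_b)
    diff = sum(abs(profile_a.get(c, 0.0) - profile_b.get(c, 0.0)) for c in compounds)
    total = sum(profile_a.get(c, 0.0) + profile_b.get(c, 0.0) for c in compounds)
    return diff / total


# Hypothetical peak areas (arbitrary units), NOT values from the study.
male_1 = to_proportions({"beta-ocimene": 30.0, "compound_B": 10.0})
male_2 = to_proportions({"beta-ocimene": 10.0, "compound_B": 30.0})
dissimilarity = bray_curtis(male_1, male_2)
```

      A matrix of such pairwise dissimilarities is the usual input to an NMDS routine; whether proportions or raw areas are used changes the result, which is why the reviewer's question about the data type matters.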

      Third, throughout the discussion, the authors mention that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other measure related to predation. It is unclear by what predators any of the butterfly species are predated on at this point.

      We made sure to mention in the introduction (lines 132-136) and in the discussion (lines 373-377) that previous predation experiments performed on Morpho and other butterflies showed evidence that birds are likely predators of these species. These observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly measuring predation rates. We made sure this information is transparent in the revised manuscript, and now specify that assessing wing convergence is only an indirect way of testing the escape mimicry hypothesis (lines 393-396).

      To continue on the interpretation of the data related to selection on specific traits by specific selection agents: This study did not measure any form of selection or any selection agent. Hence, it is not known if iridescent wing colouration is actually under selection by predators and/or mates, if maybe other selection agents are involved or if these traits converge due to genetic correlations with other traits under selection. For example, iridescent colouration in ground beetles has functions as antipredator defence but also thermo- and water regulation. None of these issues are recognized or discussed.

      The lack of discussion of alternative selective pressures involved in the evolution of iridescence was pointed out by all reviewers. We thus modified the text to account for this comment, and no longer limit our discussion to the putative effects of predation. We now specifically discuss alternative hypotheses, including crypsis (lines 362-369) and thermoregulation (lines 346-353).

      Finally, some of the results are weakly supported by statistics or questionable methodology.

      Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising interpretation issues concerning the chromatic distances of allopatric Morpho subspecies measured with a bird vision model. We made sure to be nuanced in the description of this graph in the results section (lines 208-212). Note that this addition does not change our main conclusion that Morpho and predator visual models better discriminate iridescence differences between allopatric subspecies than between sympatric species.

      We now also clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis, in addition to the mention of the nature of the error bars already mentioned in the methods (line 580).
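
      The reviewer's point about error bars overlapping the 1 JND line can be made concrete with a small sketch: a percentile bootstrap confidence interval on the mean chromatic distance, followed by a check of whether the lower bound clears the threshold. This is a simplified stand-in, assuming the bootstrap resamples a list of pairwise distances; the authors' actual bootstrap procedure is not described here, and the function names and example values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def exceeds_threshold(values, threshold=1.0, **kwargs):
    """True only if the whole interval lies above the threshold (e.g. 1 JND)."""
    lower, _ = bootstrap_ci(values, **kwargs)
    return lower > threshold


# Hypothetical chromatic distances in JND units, NOT values from the study.
clearly_above = [2.0, 2.5, 3.0, 2.2]
clearly_below = [0.2, 0.3, 0.4]
```

      Under this criterion, a mean slightly above 1 JND with an interval that straddles the line would not count as discriminable, which is the reading the reviewer recommends for the male comparison.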

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      We recognize that the tetrad experiment results are mainly driven by M. bristowi’s behavior, as already mentioned in the results (lines 231-232), but we now also mention it in the discussion (lines 401-402). This experiment would have benefited from more replicates, but the limited access to live males and virgin females for both subspecies was a limiting factor. Fisher’s exact test, used to assess assortative mating, is well suited to small sample sizes. We recognize that the sample size is not ideal; however, the data are still statistically testable.
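
      To show why Fisher's exact test remains usable with few matings, here is a minimal sketch of the two-sided test on a 2x2 table (female subspecies by male subspecies), computed directly from the hypergeometric distribution. The function name and the example counts are illustrative assumptions, not the tetrad data, whose full table is not given in this exchange.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    p_obs = prob(a)
    # Small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Illustrative counts only: 3 and 3 assortative matings, 1 and 1 disassortative.
p_value = fisher_exact_two_sided(3, 1, 1, 3)  # equals 34/70, about 0.486
```

      The test is exact for any sample size, but with so few matings its power is low, so a non-significant result for one subspecies is weak evidence of no preference.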

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      The use of controls to account for the effect of wing modification and of the odor of the permanent marker was already mentioned in the methods (lines 636-639). Following your recommendation and comments from the other reviewers, we now mention the use of this control in the results (lines 278-283). We also address a potential issue that could have led live males to reject these modified dummies: we cannot be sure whether butterflies perceive these modifications as equivalent to natural coloration (lines 281-282). An additional control could have been used, adding black ink on the black dorsal parts of the pattern to assess its potential visual effect. The constraints on sampling unfortunately did not allow us to add another treatment.

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data only gives us a partial assessment of species recognition in Morpho butterflies, thus requiring more precision in the interpretation and the discussion of our results. We made sure to add balanced interpretations in our discussion, by mentioning the lack of replicates for allopatric and sympatric populations (lines 391-392), and the lack of chemical characterization of allopatric species (lines 458361, see previous comments) and by being more transparent on methodological limitations that we failed to convey in the first version of our manuscript. We brought nuance to our discussion and also discussed alternative hypotheses to predation to explain the convergence of iridescence found in sympatry.

      Reviewing Editor Comments:

      While all reviewers acknowledge the value of your data, they converge in their recommendations to tone down the evolutionary interpretations. Ideally, to test your main hypothesis, you would need several species pairs, or if only one, as in your case, replicated sympatric and allopatric sites for both species. Furthermore, your more specific hypotheses about convergence (vs. nondivergence), response to predators (vs. other environmental variables), and avoiding interspecific mating in sympatry (vs. not avoiding it in allopatry) would require appropriate alternative treatments/controls. We therefore recommend that you focus on those statements that you can support with your experiments and data, and introduce these statements in the introduction with reference to the appropriate literature.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25: This stated aim seems a bit off. The authors did not sensu stricto quantify 'how shared adaptive traits may shape genetic divergence' in this study. I suggest rewriting or deleting this whole sentence altogether. The study's aim is already clear in lines 29-34.

      We deleted the mention of the characterization of genetic divergence, since this study did not focus on any genetic analysis.

      (2) Line 34: The authors here state that they compared allopatric vs sympatric populations. This is strictly not true for M. achilles. Further, the results after this sentence focus solely on divergence/convergence in sympatry, nothing at the intraspecific level and implications of the findings.

      We now mention that we tested allopatric vs. sympatric species of M. helenor only (lines 28-29). We also mention that the behavioral experiments were based on intraspecific comparisons, and discuss the implications of this result in the discussion.

      (3) Line 35: 'convergence driven by predation': this is a strong statement and cannot be directly inferred from the present set of experiments. Consider toning it down.

      We added nuance to this statement by rephrasing it as “suggesting that predation may favor local resemblance” (lines 32-33).

      (4) Line 36: Replace 'behavioral results' with 'behavioral experiments' or something similar.

      Corrected.

      (5) Line 45-49: These opening statements need some citations.

      We provided references for the first few lines, by citing terHorst et al 2018 (line 44) underlining the importance of species interactions in trait evolution, and Blomberg et al 2003 (line 45) showing that closely-related species tend to resemble each other by quantifying the phylogenetic signal of various traits.

      (6) Line 83, 165: 'visual effect', not sure what the authors are referring to. Please rewrite.

      We defined “visual effect” as the way wing color patterns could be perceived by predators or mates. We removed mentions of “visual effect” and directly used its definition instead.

      (7) Line 105 onwards: This section of the introduction could benefit from more concise writing. The authors might consider reducing the number of specific examples and instead offering broader general statements, supported by citations from multiple studies.

      We reduced the number of examples given in this paragraph and used general statements supported by multiple citations as examples (lines 102-119).

      (8) Line 108-110: This sentence seems to be redundant with the previous one.

      We merged this sentence with the previous one to improve clarity (lines 103-105).

      (9) Line 140: 'with chemical defenses': include citations here.

      We added citations of Joron et al 1999 and Merrill et al 2014, which document the evolution of convergent wing patterns (mimicry) in butterfly species with chemical defenses.

      (10) Line 149: This is a bit of a stretch. Note that genetic divergence could be influenced by many other things, not only the processes that the authors examined.

      We agree with the reviewer that the study of the convergent vs. divergent evolution of visual cues is not enough to fully understand the mechanisms allowing genetic divergence between species. Because this paper does not focus on characterizing genetic divergence, we removed it from the manuscript to avoid oversimplification.

      (11) Line 151: Again. Here, the author's primary focus seems to be at an interspecific level. One is left to wonder about the need for comparisons at the intraspecific level in M. helenor and the implications. Please clarify.

      At the end of the introduction (lines 146-157), we specifically highlighted the importance of intraspecific comparisons. While studying the effect of sympatry on the evolution of the iridescent color pattern, we use this intraspecific comparison as a baseline to assess convergence or divergence of iridescence in a sympatric interspecific pair of Morpho, because under neutral evolution two subspecies are expected to be more similar than two different species (this assumption has been clarified in lines 147-148). We also used intraspecific mate choice to test for the use of visual cues in mate recognition (experiment 1) and to test what type of signal could be perceived by Morpho butterflies (the iridescent coloration or the iridescent pattern, experiments 2 and 3). These results help contextualize the interspecific mate choice experiment, which focused on determining whether visual cues could also be used in species recognition. Since we show that iridescent coloration is important in mate recognition at the intraspecific scale, this helps explain why species recognition is low at the interspecific scale, given the wing color convergence between M. helenor and M. achilles.

      (12) Line 154: 'signals on mate preferences'.

      Corrected.

      (13) Line 189: 'At the intraspecific level', maybe in the brackets include 'allopatric populations' just so the results are in a similar format as in the color contrast section below.

      We added details to make clearer that the intraspecific level is studied between allopatric Morpho populations (line 189).

      (14) Line 189-192: Please rearrange the figure (current B as A and vice versa) or present the results in order as in the figure (interspecific first and then intraspecific level).

      We rearranged Figure 3 so that the intraspecific comparison (allopatric population) appears as A and the interspecific level (sympatric population) appears as B, to follow the order of presentation in the main text.

      (15) Line 232: The motivation behind experiments 1, 2, and 3 is unclear. The authors have not made a strong point in the introduction about the need for these comparisons at an intraspecific level. Given that the authors are focused on divergence/convergence at an interspecific level, this set of experiments seems to be irrelevant to the present study. The implications of these findings are also not discussed.

      We added motivation for experiments 1, 2, and 3 in the introduction (lines 151-154) by stating that those experiments were used to assess whether blue color could indeed be used as a mating cue in Morpho helenor (experiment 1) and to understand which part of the visual signal is important in mate choice in Morpho helenor: the wing pattern (experiment 2) or the iridescent coloration (experiment 3). Although the motivation for these experiments was not detailed in our original manuscript, we had already discussed the implications of the results of experiments 1, 2 and 3 in the discussion by stating that visual cues can take many forms and that considering both color AND pattern is important in understanding visual cues (lines 408-416). We carefully reworked this new version to make it more straightforward.

      (16) Line 260: Insert 'wild-type' before model to ensure similar wording as in the previous section.

      Corrected.

      (17) Line 286: Insert 'sympatric' after mimetic.

      Corrected.

      (18) Line 307: Include a reference to the figures or table where these results are presented.

      We now mention in the main text that the different proportions of beta-ocimene found in male M. helenor and M. achilles are shown in Table S2.

      (19) Line 343: These inferences are speculative. Add a line here, something like 'although this warrants further research in this species'.

      We detailed which additional experiments are needed at lines 388-396.

      (20) Line 357: The authors have not discussed their results on iridescence divergence in allopatric populations (line 190) and its implications.

      We now made clear in the beginning of the discussion that the divergence of iridescence in allopatric populations is used as a baseline to test for convergent iridescence between species (lines 339-343).

      (21) Line 361 onwards: This first paragraph is a bit confusing, as the results mainly focus on allopatry, while the title refers to sympatry.

      To avoid confusion between the title and the content of the discussion, we divided the last part of the discussion into two different parts. As the first paragraph mainly focuses on allopatry, we isolated it and titled it “Iridescent color patterns can be used as mate recognition cues in M. helenor” (line 498). The next paragraph of the discussion, focusing on the sympatric Morpho populations, has been titled “Evolution of visual and olfactory cues in mimetic sister-species living in sympatry” (line 418).

      (22) Line 383: visual cues 'as' poor species.

      Corrected.

      (23) Line 405: Why females here and not males? This is again confusing since the authors tested for male mate choice in the main experiments. Some background information on sex-specific mate choice in the methods might help.

      In this specific sentence, we talk about performing mate choice experiments to test for the discrimination of olfactory cues by females (and not males) because we found a high divergence in the chemical compounds found on male genitalia. Although female chemical compounds could also be used as a cue by males in mate recognition, olfactory mate choice is often driven by female choice in butterflies. We recognize that this perspective does not line up with the mate choice presented in our results section, which focused on male mate choice based on visual cues, for ecological reasons (Morpho males tend to be attracted to bright blue colorations but not females) and technical reasons (in cages, females tend to hide away from the males or male dummies, and this behavior is not compatible with experiments involving flying around false males). In the discussion, we made sure to specify that the perspective we cite here is about testing the implications of divergence in male olfactory cues (line 454). We also added motivation for why we chose to investigate male (and not female) mate choice based on visual cues in the methods (lines 613-618) and in the results (lines 219-223).

      (24) Line 417: This inference is speculative. Consider toning it down.

      We rewrote the sentence: “We find evidence of converging iridescent patterns in sympatry suggesting that predation could play a major role in the evolution of iridescence. Further work is nevertheless needed to directly test this hypothesis and establish the importance of evasive mimicry in Morpho” (lines 465-468).

      (25) Line 429: 'Convergent trait evolution leads to mutualistic interactions enhancing coexistence'. Careful here. It is not very evident how convergent trait evolution (iridescence) is mutualistic in this case, as there is no experimental evidence for evasive mimicry yet. Consider rewording or toning this sentence down.

      We agree with the reviewer and removed this statement, only keeping the end of the sentence: “Altogether, this study addresses how convergence in one trait as a result of biotic interactions may alter selection on traits in other sensory modalities, resulting in a complex mosaic of biodiversity.” (lines 479-481).

      (26) Line 442: Since the samples come from a breeding farm, I have a few questions. How are the authors sure about the location where the specimens were collected? How long have they been kept in captivity? Have they been subjected to any artificial selection? More details are needed here.

      Since M. helenor bristowi and M. helenor theodorus are only found in the wild in West and East Ecuador respectively, those M. helenor subspecies can only be collected in those two allopatric populations. Their phenotype is directly linked to their geographic distribution, which is how we confirmed their collection locality. The M. h. theodorus individuals we used in this study were caught in East Ecuador in Tena, and the M. h. bristowi individuals were caught in West Ecuador in Pedro Vicente Maldonado. We received pupae from the breeding farm, meaning that the Morpho used for the experiments were raised in captivity from their date of emergence. Upon emergence, they were transferred into cages for 4 to 5 days to wait for sexual maturity before performing the tetrad and mate choice experiments. This information was added to the methods (lines 490-496).

      (27) Line 476: Include some citations supporting this statement.

      We now cite Bennett and Théry (2007), reviewing avian color vision, and Briscoe (2008), characterizing the sensitivity of the photoreceptors found in the eyes of butterflies. Both citations show that the 300-700nm range is seen by avian and butterfly visual systems.

      (28) Line 480 onwards: Please clarify if the analysis used only one value (mean?) per species, sex, angle of measurement, and locality or included data from multiple individuals.

      The analyses of both colorimetric variables and global iridescence were performed using iridescence data from multiple individuals (10 males and 10 females from M. h. bristowi, M. h. theodorus, M. h. helenor and M. a. achilles), for which we measured iridescence at 21 angles of illumination. Sample sizes are given at lines 507, 515, 540-542.

      (29) Line 510: Is there a specific reason that authors did not investigate achromatic contrasts? Provide some justification here. Or include the results of achromatic contrasts in the supplement.

      We added the achromatic results in the supplement and in the results (lines 200-204). For both the avian visual model and the Morpho visual model, the confidence intervals always overlapped with the JND threshold, showing that neither birds nor butterflies could theoretically discriminate the wing reflectance brightness in allopatric and sympatric populations.

      (30) Line 552 onwards: I may have missed it. It is not entirely clear why the authors focused on male mate choice rather than female preference for visual cues. The authors should explicitly justify this choice and cite previous studies demonstrating that male mate choice, rather than female preference, is important in this species. This should be stated in the results section as well.

      We added a paragraph in the method (lines 613-618) to describe the ecological and technical reasons leading to testing only male mate choice using visual cues (also see our response to recommendation #23).

      (31) Line 537 onwards: What was the criterion used to score that mating had occurred? Why first mating and not how long they were mating? Please add these details.

      We stopped the experiment as soon as a male/female pair was formed by joining their genitalia (we added this information in the methods, lines 599-600). Since the tetrad experiment involves the interaction of two males and two females from different subspecies, we considered that mate choice happens before the formation of any couple and does not necessarily depend on how long the pair remains mated. For instance, we witnessed avoidance behaviors from females that systematically hid their genitalia and refused to join their abdomen to some males, while being very ‘open’ to others (but we did not quantify this).

      (32) Line 571: The authors used a black permanent marker to modify wing patterns but did not validate whether butterflies perceive these modifications as equivalent to natural coloration. It is possible that the alterations introduced unintended visual cues and may explain why most males rejected the dummies (line 267). The authors should acknowledge this limitation here.

      We now acknowledge this limitation in the method (lines 638-639) and in the results section (lines 278-283).

      (33) Line 591: Insert 'above' after protocol.

      Corrected.

      (34) Line 605: If the authors included random effects in their model, then it should be generalized linear mixed model (GLMM) and not GLM as they wrote.

      We did include random effects in our model, accounting for male ID and trial number; we therefore replaced “GLM” with “GLMM” in the manuscript.

      (35) Line 615: This set of analyses does not seem to account for pseudo-replication, as the data were recorded from the same male more than once (Line 583). Please clarify and redo the analysis with the GLMM framework

      We ran new analyses using the GLMM framework: we used a binomial GLMM to test whether individuals preferentially interacted with dummy 1 vs. dummy 2 while accounting for pseudoreplication. The previously detected tendencies hold true with these new analyses, except for the visual mate discrimination of M. achilles: we now find statistical evidence that M. achilles tends to approach its conspecifics more often during the mate choice experiment, although the signal is weak (lines 297-307). Indeed, while we previously concluded that both species in sympatry (M. helenor and M. achilles) could not discriminate their conspecific mates, we now emphasize that M. achilles is somewhat sensitive to some visual signals. However, its estimated probability of approaching a conspecific is only 0.54, which is low compared to the estimated probability of approaching (0.61) or touching (0.84) a con-subspecific for M. h. bristowi. We thus concluded that even though some visual cues could be relevant for mate recognition, they are less reliable for male choice in sympatric populations, where color patterns are more convergent, than in allopatric populations. We updated Figure 4 and Figures S8 and S9, which now show the probability of approaching or touching a conspecific or con-subspecific with the updated p-values retrieved from the GLMM analyses. We also updated the results (lines 297-307) and the discussion (lines 430-438) to bring nuance to our previous results.
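
      As a reading aid for the probabilities quoted above, the following minimal sketch shows how a binomial GLMM with a logit link relates log-odds to probabilities. It is not the authors' analysis code: the additive random intercepts for male ID and trial, and all function names, are assumptions made for illustration. Only the three probabilities (0.54, 0.61, 0.84) come from the response.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(fixed_intercept, male_effect=0.0, trial_effect=0.0):
    """Probability of choosing the conspecific dummy on the logit scale.

    ASSUMPTION: random intercepts for male ID and trial number enter
    additively on the log-odds scale; with both set to 0 this is the
    prediction for an 'average' male and trial.
    """
    return inv_logit(fixed_intercept + male_effect + trial_effect)


# Probabilities quoted in the response, converted to the log-odds scale.
reported = {
    "M. achilles, approach conspecific": 0.54,
    "M. h. bristowi, approach con-subspecific": 0.61,
    "M. h. bristowi, touch con-subspecific": 0.84,
}
log_odds = {label: logit(p) for label, p in reported.items()}

for label, eta in log_odds.items():
    # A log-odds of 0 (probability 0.5) would mean no preference.
    print(f"{label}: log-odds = {eta:.3f}, probability = {inv_logit(eta):.2f}")
```

      On this scale, a probability of 0.5 corresponds to a log-odds of 0 (no preference), which is why an estimate of 0.54 indicates only a weak preference compared with 0.61 or 0.84.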

      (36) Line 963: Figure 3D. Is there a particular reason for comparing allopatric populations only within Ecuador rather than between Ecuador and French Guiana for M. helenor? Please clarify.

      We aimed to compare the putative discrimination of blue coloration using visual models vs. what the butterflies actually discriminate using mate choice experiments. Since we only performed mate choice experiments involving M. h. bristowi x M. h. theodorus (allopatric populations within Ecuador) and M. h. helenor x M. a. achilles (sympatric population from Ecuador), we only looked at those comparisons using visual models. We added this clarification (lines 559-560).

      (37) Line 980: Are these predicted probabilities or just mean proportions as written in line 614? Then the label should be changed to 'Proportion of approaches' or something similar.

      Following our answer to recommendation #35, the points now represent the probability of touching a conspecific in the graph for each male, for every trial of every male tested. We corrected the legend of the figure. 

      Reviewer #2 (Recommendations for the authors):

      (1) Line 25: "...therefore facilitating co-existence in sympathy".

      Corrected.

      (2) Line 28: "contrasting" instead of contrasted.

      Corrected.

      (3) Line 33: begin a new sentence at the colon.

      Corrected.

      (4) Line 49: the phrase "habitat filtering" is unclear and should perhaps be defined or qualified.

      We replaced “habitat filtering” with its definition and cited Keddy (1992), which describes community assembly rules and defines habitat filtering (line 46).

      (5) Line 52: remove "even".

      Corrected.

      (6) Line 53: divergent suites may also result because traits are often constrained by genetic architecture (multivariate genetic covariances). This is discussed at length and specifically in relation to ornamental coloration by Kemp et al. 2023

      We rewrote the introduction and focused on only reviewing the ecological interactions promoting trait divergence in sympatric species, and did not mention genetics in this paper.

      (7) Line 87: (and throughout) refer to "colouration" or "colour pattern" rather than "colourations".

      Corrected.

      (8) Line 151: Remove "To do so,".

      Corrected.

      (9) Line 191: I would like to see the degrees of freedom for this test.

      We added the F-statistic=2.09 and the degrees of freedom df=1 of this test, and for all the following tests.

      (10) Line 201: (and throughout) replace "on" with "of".

      Corrected.

      (11) Line 205: modelling the visual properties of the wings allows one to infer what is theoretically visible/distinguishable. The modelling is useful but not necessarily definitive of vision/behaviour per se under different conditions in the wild. I therefore think it is appropriate to phrase the wording around the modelling approach more carefully. Perhaps refer to "theoretical" or "inferred" discriminability, or state (e.g.) that species should/should not be capable of perceiving differences based on the modelling data. You do this well in your wording of lines 207-209. This need not apply in the discussion because you're then dealing with the combination of modelling results and behaviour (mating trials).

      We agree with the reviewer that visual modelling only allows one to infer what is theoretically discriminated by the butterflies, and that the wording of our sentence was confusing. We therefore modified the sentence to reflect this: “Morpho butterflies and predators can theoretically visually perceive the difference in the blue coloration between different subspecies of M. helenor…… using both bird and Morpho visual models” (lines 206-209).

      (12) Line 222: Either the chi-square test or Fisher's exact test should be sufficient (why report both?)

      The Chi-square test relies on large-sample assumptions (expected counts>5), whereas Fisher’s exact test does not and is valid even with small or unbalanced sample sizes. Since the M. bristowi female/M. h. theodorus male pairing only occurred 3 times, we do not meet the primary assumptions to apply a Chi-square test, although it is significant. We used Fisher’s exact test to confirm the results. Using both and finding that both tests are significant shows that the results are robust, although they may appear redundant. To simplify, we removed the results of the Chi-square test and kept only Fisher’s exact test in the methodology and the results.
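
      The distinction drawn above can be illustrated with a minimal, self-contained sketch. It is not the authors' analysis: the function names are hypothetical, the two-sided p-value uses one common convention (summing all tables no more probable than the observed one), and the example table is a textbook-style 2x2 table rather than the mating counts from the study.

```python
from math import comb


def expected_counts(a, b, c, d):
    """Expected cell counts under independence for the table [[a, b], [c, d]].

    The Chi-square approximation is usually considered unreliable when
    any of these expected counts is small (the response cites > 5).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that
    are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# ILLUSTRATIVE table only (not data from the study).
table = (3, 1, 1, 3)
print("expected counts:", expected_counts(*table))
print("two-sided p-value:", fisher_exact_two_sided(*table))
```

      For this illustrative table every expected count is 2, below the usual threshold for the Chi-square approximation, while the exact test still yields a valid p-value (34/70, about 0.486).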

      (13) Line 224 (and throughout): Degrees of freedom should be provided for statistical tests.

      We reported the test statistic and the degrees of freedom for all statistical tests mentioned in the main text, except for Fisher’s exact test, which, being an exact test, does not rely on an asymptotic distribution such as the Chi-squared distribution.

      (14) Lines 266-267: This sentence has interest, but it is rather vague at present. Wouldn't your controls account for the effect of manipulation? This could be explained further.

      During our mate choice experiments, all Morpho female dummies used for the experiments were painted with black markers, either on their dorsal blue band to modify their blue iridescent phenotype, or on their ventral side, thus controlling for the effect of manipulation. However, we cannot rule out that the modification of the dorsal blue iridescence could have had a “repulsive” effect on males for several reasons. For example, depending on the visual discrimination of darker colors by Morphos, the painted black band could have a slightly different color compared to the dark “brown” usually surrounding their blue iridescent patterns. We now explain this in the results (lines 278-283) and in the methodology (lines 638-639).

      (15) Line 316: I'm not certain that the similarity is best described as "striking", given a P-value of 0.084 for this contrast

      We agree with the reviewer and removed this adjective for this line.

      (16) Lines 387-390: This sentence is puzzling because, theoretically speaking, we should expect selection on visual preference to be heightened (not relaxed) in sympatry if colouration is included among the traits used in mate selection. I'm not certain I have understood the meaning here.

      We would like to thank the reviewer for pointing out this typo. If shared predatory pressures favor convergent evolution of color pattern, then the visual signals become less reliable for species recognition. As a result, sexual selection on visual preference is heightened and becomes stronger, favoring the evolution of alternative cues used to discriminate conspecific mates. We changed the sentence and now write “the convergent evolution of iridescent wing patterns… may have negatively impacted visual discrimination and favored the evolution of divergent olfactory cues” (lines 457-458).

      (17) Line 529: Mating experiments. Given that these are quite large butterflies, I wondered whether a 3x3x2m cage would be sufficient in size to allow the expression of male courtship. A brief description of the courtship behaviour in these species or Morphos generally would be a useful addition to the paper.

      A cage this size was enough for the males to express a flight behavior similar to what can be seen in nature, while also being able to see the females (live females or dummies). We tried to perform mate choice experiments in a larger cage (7 m x 5 m x 3 m) but the trials were not conclusive because males did not find the dummies, depending on where they were flying in the cage. A 3 m x 3 m x 2 m cage is a good compromise, maximizing interactions while still allowing enough space to fly. We now describe Morpho male and female behavior in the methods (lines 613-618).

      (18) Line 546: Why are both tests needed (chi-square AND Fisher's exact)?

      As in our answer to recommendation #12, we used both tests to show the robustness of the statistical results. We kept only the Fisher’s exact test results to simplify the presentation.

    1. eLife Assessment

      This study presents important information about the role of mu opioid receptors in neurotransmission between the medial habenula and the interpeduncular nucleus. The authors provide convincing evidence that mu opioid receptor activation has differential effects on transmission from substance P neurons and cholinergic neurons, and that blockade of potassium channels can unmask a nicotinic cholinergic synaptic response. This work will be of high interest to those studying this brain region, and potentially to the larger neuroscience community studying motivated behavior.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors demonstrate for the first time that opioid signaling has opposing effects on the same target neuron depending on the source of the input. Further, the authors provide evidence to support the role of potassium channels in regulating a brake on glutamatergic and cholinergic signaling, with the latter finding being developmentally regulated and responsive to opioid treatment. This evidence solves a conundrum regarding cholinergic signaling in the interpeduncular nucleus that evaded elucidation for many years.

      Strengths:

      This manuscript provides 3 novel and important findings that significantly advance our understanding of the medial habenula-interpeduncular circuitry:

      (1) Mu opioid receptor activation (mOR) reduces postsynaptic glutamatergic currents elicited from substance P neurons while simultaneously enhancing postsynaptic glutamatergic currents from cholinergic neurons, with the latter being developmentally regulated.

      (2) Substance P neurons from the Mhb provide functional input to the rostral nucleus of the IPN, in addition to the previously characterized lateral nuclei.

      (3) Potassium channels (Kv1.2) provide a brake on neurotransmission in the IPN.

      The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. In the revised manuscript, the authors put forth significant effort to increase the n, thus increasing the confidence in the observations.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. In the revised manuscript, the authors increased the n, and provided data to the reviewers that no significant sex differences were apparent, although there was a trend. Future studies should examine sex differences in detail.

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons? In the revised manuscript, the authors directly tested this with new experiments in SST+ neurons in the IPN, demonstrating convincingly that mOR activation unsilences these neurons.

      With the revisions, the authors have addressed the reviewers' concerns and significantly improved the manuscript. I find no further weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Chittajallu and colleagues present compelling evidence that mu opioid receptor (MOR) activation can potentiate synaptic neurotransmission in a medial habenula to interpeduncular nucleus (mHb-IPN) subcircuit. While projections from mHb tachykinin 1 (Tac1) neurons onto lateral IPN neurons show a canonical opioid-induced synaptic depression in glutamate release, excitatory neurotransmission in mHb choline acetyltransferase (ChAT) projections to the rostral IPN is potentiated by opioids. This function emerges around age P27 in mice, when MOR expression in the IPN peaks.

      Strengths:

      Carefully executed electrophysiological experiments with appropriate controls. Interesting description of a neurodevelopmental change in the effects of opioids on mHb-IPN signaling.

      Weaknesses:

      A minor concern is that the genetic strategy used to target the mHb-IPN pathway (constitutive ChR2 expression in all ChAT+ and Tac1+ neurons) might not specifically target this projection. Future studies are needed to examine the precise mechanism whereby MOR signaling can potentiate glutamatergic neurotransmission in ChAT+ MHb-IPN projections.

    4. Reviewer #3 (Public review):

      Summary:

      Here the authors describe the role of mORs in synaptic glutamate release from substance P and cholinergic neurons in the medial habenula to interpeduncular nucleus (IPN) circuit in adult mice. They show that mOR activation reduces evoked glutamate release from substance P neurons yet increases evoked glutamate release and ACh release from cholinergic neurons. Unlike glutamate release, ACh release is only detected when potassium channels are blocked with 4-AP or dendrotoxin. The authors also report a previously unidentified glutamatergic input to IPR from SP neurons and describe the developmental timing of mOR facilitation in adolescent mice.

      Strengths:

      - The experiments provide new insight into the role of mORs in controlling evoked glutamate release in a circuit with high levels of mORs and established roles in relevant behaviors.

      - The experiments are rigorous, and the results are clear cut. The conclusions are supported by the data.

      - The findings will be of interest to those working in the field of synaptic transmission and those interested in the function of the medial habenula or interpeduncular nucleus, as well as those seeking to understand the role of opioids on normal and pathological behaviors.

      Weaknesses:

      - The mechanistic underpinnings of these interesting and novel results are not pursued.

    1. eLife Assessment

      This important study elucidates the role of the exocyst component EXOC6A at distinct stages of ciliogenesis, which advances our understanding of ciliary membrane remodeling and cilium formation. The authors provide solid evidence that EXOC6A interacts with myosin-Va and is dynamically recruited via dynein-, microtubule-, and actin-dependent mechanisms, to support proper formation of the ciliary membrane. The study will be of interest to cell biologists and other researchers interested in vesicular trafficking, organellar membrane dynamics, and ciliogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      Lin et al. study the role of EXOC6A in ciliogenesis and its relationship with the interactor myosin-Va using a range of approaches based on the RPE1 cell line model. They establish its spatio-temporal organization at centrioles, the forming ciliary vesicle and ciliary sheath using ExM, various super-resolution techniques, and EM, including correlative light and electron microscopy. They also perform live imaging analyses and functional studies using RNAi and knockout. They establish a role of EXOC6A together with myosin-Va in Golgi-derived, microtubule- and actin-based vesicle trafficking to and from the ciliary vesicle and sheath membranes. Defects in these functions impair robust ciliary shaft and axoneme formation due to defective transition zone assembly.

      Strengths:

      The study provides very high-quality data that support the conclusions. In particular, the imaging data is compelling. It also integrates all findings in a model that shows how EXOC6A participates in multiple stages of ciliogenesis and how it cooperates with other factors.

      Weaknesses:

      The precise role of EXOC6A remains somewhat unclear. While it is described as a component of the exocyst, the authors do not address its molecular functions and whether it indeed works as part of the exocyst complex during ciliogenesis.

    3. Reviewer #2 (Public review):

      Summary:

      The molecular mechanisms underlying ciliogenesis are not well understood. Previously, work from the same group (Wu et al., 2018) identified myosin-Va as an important protein in transporting preciliary vesicles to the mother vesicles, allowing for initiation of ciliogenesis. The exocyst complex has previously been implicated in ciliogenesis and protein trafficking to cilia. Here, Lin et al. investigate the role of exocyst complex protein EXOC6A in cilia formation. The authors find that EXOC6A localizes to preciliary vesicles, ciliary vesicles, and the ciliary sheath. EXOC6A colocalizes with Myo-Va in the ciliary vesicle and the ciliary sheath, and both proteins are removed from fully assembled cilia. EXOC6A is not required for Myo-Va localization, but Myo-Va and EHD1 are required for EXOC6A to localize in ciliary vesicles. The authors propose that EXOC6A vesicles continually remodel the cilium: FRAP analysis demonstrates that EXOC6A is a dynamic protein, and live imaging shows that EXOC6A fuses with and buds off from the ciliary membrane. Loss of EXOC6A reduces, but does not eliminate, the number of cilia formed in cells. Any cilia that are still present are structurally abnormal, with either bent morphologies or the absence of some transition zone proteins. Overall, the analyses and imaging are well done, and the conclusions are well supported by the data. The work will be of interest to cell biologists, especially those interested in centrosomes and cilia.

      Strengths:

      The TEM micrographs are of excellent quality. The quality of the imaging overall is very good, especially considering that these are dynamic processes occurring in a small region of the cell. The data analysis is well done and the quantifications are very helpful. The manuscript is well-written and the final figure is especially helpful in understanding the model.

      Weaknesses:

      Additional information about the functional and mechanistic roles of EXOC6A would improve the manuscript greatly.

    4. Reviewer #3 (Public review):

      Summary:

      Lin et al. report on the dynamic localization of EXOC6A and Myo-Va at pre-ciliary vesicles, ciliary vesicles, and ciliary sheath membrane during ciliogenesis using three-dimensional structured illumination microscopy and ultrastructure expansion microscopy. The authors further confirm the interaction of EXOC6A and Myo-Va by co-immunoprecipitation experiments and demonstrate the requirement of EHD1 for the formation of EXOC6A-labeled ciliary vesicles. Additional experiments using gene silencing by siRNA and pharmacological tools identified the involvement of dynein, microtubules, and actin in the transport mechanism of EXOC6A-labeled vesicles to the centriole, as they have previously reported for Myo-Va. Notably, loss of EXOC6A severely disrupts ciliogenesis, with the majority of cells becoming arrested at the ciliary vesicle (CV) stage, highlighting the involvement of EXOC6A at later stages of ciliogenesis. As the authors observe dynamic EXOC6A-positive vesicle release and fusion with the ciliary sheath, this suggests a role in membrane and potentially membrane protein delivery to the growing cilium past the ciliary vesicle stage. While CEP290 localization at the forming cilium appears normal, the recruitment of other transition zone components, exemplified by several MKS and NPHP module components, was also impaired in EXOC6A-deficient cells.

      Strengths:

      (1) By applying different microscopy approaches, the study provides deeper insight into the spatial and temporal localization of EXOC6A and Myo-Va during ciliogenesis.

      (2) The combination of complementary siRNA and pharmacological tools targeting different components strengthens the conclusions.

      (3) This study reveals a new function of EXOC6A in delivering membrane and membrane proteins during ciliogenesis, both to the ciliary vesicle as well as to the ciliary sheath.

      (4) The overall data quality is high. The investigation of EXOC6A at different time points during ciliogenesis is well schematized and explained.

      Weaknesses:

      (1) Since many conclusions are based on EXOC6A immunostaining, it would strengthen the study to validate antibody specificity by demonstrating the absence of staining in EXOC6A-deficient cells.

      (2) While the authors generated an EXOC6A-deficient cell line, off-target effects can be clone-specific. Validating key experiments in a second independent knockout clone or rescuing the phenotype of the existing clone by re-expressing EXOC6A would ensure that the observed phenotypes are due to EXOC6A loss rather than unintended off-target effects.

      (3) Some experimental details are lacking from the materials and methods section. No information on how the co-immunoprecipitation experiments have been performed can be found. The concentrations of pharmacological agents should be provided to allow proper interpretation of the results, as higher or lower doses can produce nonspecific effects. For example, the concentrations of ciliobrevin and nocodazole used to treat RPE1 cells are not specified and should be included. More precise settings for the FRAP experiments would help others reproduce the presented data. Some details for the siRNA-based knockdowns, such as incubation times, can only be found in the figure legends.

      Taken together, the authors achieved their goal of elucidating the role of EXOC6A in ciliogenesis, demonstrating its involvement in vesicle trafficking and membrane remodeling in both early and late stages of ciliogenesis. Their findings are supported by experimental evidence. This work is likely to have an impact on the field by expanding our understanding of the molecular machinery underlying cilia biogenesis, particularly the coordination between the exocyst complex and cytoskeletal transport systems. The methods and data presented offer valuable tools for dissecting vesicle dynamics and cilium formation, providing a foundation for future research into ciliary dysfunction and related diseases. By connecting vesicle trafficking to structural maturation of an organelle, the study adds important context to the broader description of cellular architecture and organelle biogenesis.

    1. eLife Assessment

      This valuable study investigates the role of HIF1a signaling in epicardial activation and neonatal heart regeneration in mice. Using a combination of genetic and pharmacological approaches, the authors demonstrate that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction. The main conclusion is well supported by solid data, although some minor concerns regarding experimental interpretation require further clarification to ensure accuracy.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness remains the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. The authors claimed that the EMT assays adopted in this study are based on similar previous studies. Surprisingly, two of the references provided correspond to their own research group (PMID: 17108969, PMID: 19235142), limiting the credit for such claims, and the other two (PMID: 27023710, PMID: 12297106) report an assessment of cell migration but not of EMT. Thus, EMT remains to be convincingly demonstrated.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. The authors identified hypoxic regions in the epicardium during development and demonstrated that genetic and pharmacological stabilization of HIF1a during neonatal heart injury prolonged epicardial activation, preserved myocardium, enhanced infarct resolution, and maintained cardiac function beyond the normal postnatal regenerative window.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      Some conclusions still need clarification.

      Comments on revisions:

      (1) The authors' comment on the partial overlap of HP1 and HIF1a IF signals (HIF1a is highly unstable ... broader regions of hypoxia) is reasonable and would help readers interpret the data if included in the text describing Fig. 1.

      (2) The conclusion regarding WT1+ cells in Fig. 2a and b remains unclear. Both panels display larger and smaller magenta cells, and when all are taken into account, the overall number does not appear substantially different. Additional clarification is needed on how the quantification was performed.

      (3) Regarding Figure 6-figure supplement 1c, it seems difficult to conclude the endothelial identity of WT1+ cells based on EMCN staining, as the markers do not overlap. The authors note that WT1 is upregulated in endothelial cells, but this has been reported in the context of injury, which differs from the context of the present study involving Molidustat.

    4. Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and the hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      Comments on revisions:

      In the manuscript revision, the authors address my comments. They suggest that differences between genetic disruption of Phd2 and chemical inactivation could be due to the dosing and drug half-life of Molidustat. The other comments are addressed by explaining that they have analyzed enough heart sections and hearts to come to their conclusions. The authors also state that they cannot generate more numbers for this study; therefore, I accept their conclusions as stated.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates the role of HIF1a signalling in epicardial activation and neonatal heart regeneration in mice. Through a combination of genetic and pharmacological approaches, the authors show that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction (MI). However, several aspects of the study remain incomplete and would benefit from further clarification and additional experimental support to solidify the conclusions.

      We reveal herein prolonged epicardial activation following myocardial infarction (MI) beyond post-natal days 1-7 (P1-P7) by genetic or pharmacological stabilisation of HIF-signalling. This extends the so-called “regenerative window” during an adult-like response to injury, leading to enhanced survived myocardium and functional improvement of the heart, even against a backdrop of persistent, albeit reduced, fibrosis. The epicardium is known to enhance cardiomyocyte proliferation and myocardial growth during heart development via trophic growth factor (for example, IGF-1, FGF, VEGF, TGFβ and BMP) signalling (reviewed in PMID:29592950) and epicardium-derived cell-conditioned medium reduces infarct size and improves heart function (PMID: 21505261). Further experiments, outside of the scope of the current study, are required to determine whether activated neonatal epicardium elicits similar paracrine support to sustain the myocardium and heart function after injury beyond P7 into adulthood.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness is the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. Additionally, novel experimental approaches should be performed to allow for the translation of these findings to the clinical arena.

      We respectfully disagree that we have not convincingly demonstrated a role for HIF-signalling in promoting epicardial EMT. We adopt epicardial explant assays utilising a well characterised ex vivo protocol previously described for studying EMT in embryonic, neonatal and adult epicardium (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142). These assays demonstrate in WT1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> explants enhanced cobblestone to spindle-like change in cell morphology, increased cell migration, appearance of stress fibres and an up-regulation of the mesenchymal marker alpha-smooth muscle actin (αSMA); all parameters associated with EMT. In addition, our in vivo analyses of Wt1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> hearts, in response to neonatal injury, reveal elevated numbers of WT1+ epicardial cells within the sub-epicardial region and underlying myocardium as is associated with active EMT and subsequent migration from the epicardium.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. They found that WT1<sup>+</sup> epicardial cells become hypoxic and begin expressing HIF1a from mid-gestation onward. During development, epicardial HIF1a signaling regulates WT1 expression and promotes coronary vasculature formation. In the postnatal heart, genetic and pharmacological upregulation of HIF1a sustained epicardial activation and improved regenerative outcomes.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      There appears to be a discrepancy between some of the conclusions and the provided histological data. Additionally, the study does not offer mechanistic insight into the functional recovery observed.

      We respectfully disagree with the comment that our histological data does not support our conclusions and expand on this in the response to specific reviewer comments. We agree that further mechanistic experiments outside of the scope of the current study are required to identify precisely how activated neonatal epicardium results in increased healthy myocardium after injury beyond post-natal day 7 (P7).

      Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart, and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart, and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction, and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      We thank the reviewer for this positive endorsement and recognition of the importance of mechanistic insight into how to extend the window of neonatal heart regeneration.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      We report an increase in healthy myocardium arising from prolonged activation of the epicardium during the neonatal window and following injury at post-natal day 7 (P7). We speculate this recapitulates the role of the epicardium during heart development which is known to be a source of trophic growth factors that can enhance myocardial growth. Further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      We believe the decreased fibrosis is a natural consequence of the increase in survived myocardium arising from the activated epicardium. There is strong precedent here following injury at post-natal day 1 (P1) in which fibrosis is evident early-on but is resolved over time with growth of the myocardium in the regenerating heart (PMID: 23248315).

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Address issues related to image quality, colocalization, sample labeling, appropriate controls, and quantification - particularly in Figures 1, 2, 6, and Supplementary Figure 9. Increase sample size as noted by reviewers.

      The issues of co-localisation and sample labelling have been addressed under response to reviewers. We are unable to increase sample numbers but have clarified the number of regions per section and numbers of sections per heart analysed where appropriate.

      (2) Clarify the effects of epicardial HIF1a activation on neovascularization.

      We have removed reference in the abstract to an effect on neovascularisation.

      (3) Extend assessments of epicardial hypoxia and HIF1a expression to earlier embryonic stages, when epicardial EMT is more active.

      Our earliest timepoint of E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445). In the same study, E11.5 lineage tracing of epicardial cells is restricted to the outer layer of the heart; thus, our timepoints are representative in capturing both the onset and progression of in vivo EMT.

      (4) Strengthen EMT assays and mechanistic modeling. Provide evidence from physiologically relevant models, as current 2D culture assays do not adequately support conclusions about EMT. Include additional EMT markers and quantification where appropriate.

      We respectfully disagree that epicardial explants are not a valid assay for assessing EMT. As noted under responses to reviewers, such primary explants have been widely described elsewhere (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142) and enable documentation of multiple parameters that are associated with active EMT, including an assessment of the extent of cell migration, cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We support our findings in explants by revealing reduced WT1+ epicardium-derived cells (EPDCs) in the sub-epicardial region and underlying myocardium of WT1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> embryonic hearts (data in Figure 2), indicative of impaired epicardial EMT and migration of EPDCs, and in vivo following neonatal MI with pharmacological inhibition of PHD2, where we observe the reciprocal phenotype of increased numbers of epicardium-derived cells emerging from the outer epicardial layer (data in Figure 6).

      (5) Strengthen mechanistic insights into the role of epicardial cells in the functional recovery observed in MI hearts.

      We agree that further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Reviewer #1 (Recommendations for the authors):

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts. The study is potentially interesting, but it presents several major caveats.

      (1) One of the critical points reported in the early stages of this study is the early co-localization of Wt1, the hypoxic reporter (HP1), and the master regulators of the HIF signaling pathway (i.e., HIF1a and HIF1b) during embryonic development. Figure 1 is meant to report such findings. However, unfortunately, I hardly see any co-localization at all in the Wt1+ epicardial cells for HP1, although some colocalization is seen for HIF1 and 2 alpha; none of these data are quantified. Thus, it is hard to believe such co-localization.

      We respectfully disagree with this comment. We highlight cells in Figure 1 that are co-stained for WT1 and HP1. In addition, we identify HIF1-α and HIF2-α positive cells which either reside within the epicardium, as the outer cell layer, or within the underlying sub-epicardial region, respectively.

      (2) The authors claimed that they have analyzed the expression of the hypoxic reporter, as well as Wt1 and the master regulators of the HIF signaling pathway (i.e., HIF1a and HIF1b), in the AV groove, as compared to the apex, in embryonic hearts ranging from E12.5 to E18.5 (Figure 1). Unfortunately, all images provided that are tagged as AV groove are rather misleading. They do not represent the AV groove but part of the right ventricular free wall. If the authors want to refer to the AV groove, AV cushions should be visible underneath.

      We have removed specific reference to the AV groove and refer to the highlighted regions as the “Base” of the heart.

      (3) The authors analyzed the hypoxic condition of the developing heart from E12.5 to E18.5. However, it remains unclear why the authors only explored the hypoxic conditions from E12.5 onwards, since epicardial EMT mainly occurs earlier than this time point, i.e., E10.5 onwards. Therefore, it would be needed to explore it already at this earlier time point.

      We respectfully disagree with the reviewer and refer to the comment above regarding the fact that E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445).

      (4) The authors reported a conditional mouse model of HIF1alpha deletion by using the Wt1CreERT2 driver. Curiously, Wt1 is dependent on hypoxia signaling (i.e., HIF1a). Therefore, it is unclear whether a negative feedback loop between the deletion of Hif1alpha and the activation of the Cre driver might have functional consequences. Convincing evidence should be provided that such crosstalk does not interfere with Hif1alpha inactivation, and therefore, appropriate controls should be run in parallel.

      We discount a negative feedback loop in this instance based on the fact we have utilised heterozygous mice for the WT1<sup>CreERT2/+</sup> line and observe a consistent and reproducible phenotype for the developing hearts on a Wt1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> background and following injury in Wt1<sup>CreERT2/+</sup>;Phd2<sup>fl/fl</sup> mice. Collectively, this indicates that the WT1-CreERT2 driver is active in the context of diminishing HIF-1α and Phd2, respectively. In addition, we have carried out parallel experiments using epicardial explants derived from R26R-CreERT2;Phd2<sup>fl/fl</sup> (Figure 3) to circumvent any potential confounding issues; the results of which are consistent with increased epicardial EMT in support of our overall hypothesis.

      (5) On Figure 2a-f the authors reported that epicardial cells are diminished in Wt1CreERT2Hif1alpha mice as compared to controls. I am very sorry, but I do not see any difference. Furthermore, it is unclear to me how the authors quantified such differences, i.e., what marker signal did they use and how it was performed (Figure 2c and d)?

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining in Figure 2, which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (comparing magenta cells in panel a versus panel b). Quantification was carried out for numbers of WT1+ cells residing within the PDPN-positive epicardium (and underlying PDPN-negative myocardium) across multiple images from multiple sections and multiple hearts.

      (6) On Figure 2g, the authors reported differences in total vessel length. Are they referring to impaired microvasculature development? Or is this analysis also including major coronary vessels? What about the major coronary vessels and trees, is there any affection?

      This analysis refers to the microvasculature and not the major coronary arteries or coronary trees.

      (7) The authors reported that there might be some differences in EMT markers, but unfortunately, all of them are analyzed on 2D cultures, where no substrate for EMT is present, i.e., an underlying ECM bed. Thus, the authors cannot claim that EMT is altered. Additional experiments using either collagen substrate and/or Matrigel are required to fully demonstrate that EMT is impaired. Furthermore, quantitative analyses of such differences should be provided.

      The 2D cultures are epicardial explants from mutant versus wild type hearts and represent a widely adopted, previously published ex vivo assay for investigating epicardial EMT across embryonic to adult stages (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142), including an assessment of the extent of migration and cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We do not understand the comment regarding an “underlying ECM bed” as the cells exhibit EMT routinely on tissue culture plastic and will deposit their own ECM during the culture time course and in response to EMT/cell migration. Quantification was carried out for scratch assay experiments, as a proxy for EMT and emergent mesenchymal cell migration, as presented in Figure 3i, j, with significantly enhanced scratch closure and cell migration following Molidustat treatment.

      (8) The description of data provided on Supplementary Figure 5 is spurious and should be removed. A note in the discussion might be sufficient.

      We respectfully disagree. The ChIP-seq data, in what is now Figure 2-figure supplement 3, highlights a HIF-1α binding site within the Wt1 locus suggesting putative upstream regulation of WT1 by HIF-1α. Thus, this provides a potential explanation as to how HIF-1α may activate the epicardium through up-regulation of Wt1/WT1.

      (9) On Figure 3, the authors further illustrate the change of EMT markers using ex vivo cardiac explants. They reported increased expression of Snai2 that, although statistically significant, is most likely of no biological relevance (increase of only 20% at transcript level). What about Snai1, Prrx1, and other EMT promoters? Are they also induced? As previously stated, these 2D cultures do not provide supporting evidence that EMT is occurring, thus 3D gel assays should be performed in which Z-axis analyses will provide evidence on the different migratory behaviour of those cells.

      We respectfully suggest that a 20% change in Snai2 expression is biologically meaningful with respect to EMT. This in turn is supported by associated cell migration, reduced ZO-1 expression, increased stress fibres and increased alpha-SMA as a mesenchymal marker; all properties associated with active EMT. Other suggested markers have not been validated as formally required for EMT, for example Snai1 (PMID: 23097346). The migratory capacity of targeted versus epicardial cells was assessed by combined explant and scratch assay experiments.

      (10) The description of single-cell analyses is very incomplete. Which mice were used for these analyses, wildtype control, or hypoxic mice? Please provide a clearer description of the samples used. Additionally, the entire rationale of these analyses is dubious. Doing single-cell analyses to analyze a couple or three markers in a very small cell population is rather ridiculous. qPCR might be far more appropriate and convincing, or a bulk RNAseq analysis of isolated epicardial cells.

      The single-cell analyses represent an unbiased assessment of different pathways in epicardial cells (identified bioinformatically) between intact P1 and P7 stages in wild type (control) hearts, with a focus on hypoxia-related gene expression and HIF-dependent pathways. The analysis was not designed to examine a small number of genes, but rather global differences in the hypoxic states between P1 and P7 hearts. Selected genes (Vegfa, Pdk3, Egln1 (Phd2)) were analysed to highlight the key differences in hypoxic signalling across the regenerative window. The fact the hearts were uninjured/intact is clarified in the text and legends for Figure 4 and now Figure 4-figure supplement 1.

      (11) The analyses provided in Figure 5 are very interesting and their findings are very relevant. However, I would think that the complementary experimental approach should also be done, i.e, MI followed by activation with tamoxifen, since that situation would be more realistic in the clinical setting.

      Tamoxifen causes respiratory failure in neonates with MI, so the two cannot be combined at the same time or soon after surgery. Moreover, tamoxifen takes significant time to take effect on targeted gene down-regulation which may negate sufficient activation of the epicardium following injury.

      The experiments in Figure 5 were designed to demonstrate that prolonged heart regeneration could be elicited in a cell-specific (epicardial-specific) manner via a genetic approach. The pharmacological experiments in Figure 6 are complementary in this regard by demonstrating equivalent effects with drug (Molidustat) delivery to reduce PHD2 and stabilise HIF post-MI.

      (12) In Figure 6, expression of Wt1 is highly prominent in P7 controls, mainly restricted to the epicardial lining while in the experimental setting, such Wt1 expression is broadly distributed on the subepicardial space, nicely demonstrating epicardial activation. However, it is very surprising to see such Wt1 expression in controls, something that is not expected, as compared to the data reported in Figure 4g. Could the authors please reconcile these findings?

      Figure 6 represents the injury setting and Figure 4g the intact setting (as clarified above, in the text and revised figure legends). Hence in the latter WT1 expression is significantly reduced in the P7 heart, as anticipated. With injury at P7 we anticipate activation of WT1 in control hearts, albeit restricted to the epicardial layer (as occurs in adult hearts, PMID: 21505261). In contrast, following Molidustat-treatment of P7 hearts post-MI we observe extensive epicardial expansion into the sub-epicardial region and EPDC migration into the underlying myocardium (Figure 6b).

      Reviewer #2 (Recommendations for the authors):

      The role of hypoxia and HIF1a signaling in epicardial activation is an important topic, and the genetic approaches employed in this study are appropriate. However, several aspects of the study remain unclear and would benefit from further clarification or explanation by the authors:

      (1) The authors detected hypoxic regions using an anti-pimonidazole fluorescence-conjugated monoclonal antibody (HP1). The data would become more compelling if negative and positive controls were provided.

      We believe the HP1 staining is compelling in the images shown and is consistent with hypoxic regions of the developing heart. We reveal HP1 staining at cellular resolution with neighbouring cells positive and negative for the HP1 signal in the apex of the heart and within the epicardium and sub-epicardial regions at E12.5 (Figure 1a) and diminished/altered hypoxic/HP1 regional signal through subsequent developmental stages at E14.5-18.5 (Figure 1a-d).

      (2) Many HIF1a-positive cells in the AV groove region do not appear to overlap with HP1 staining (Figure 1a). Providing a low-magnification image of HIF1α expression would be helpful to better assess the extent of overlap with HP1 staining.

      HIF-1 is highly unstable and hence detection of HIF-1+ cells will likely only sample of cells compared to HP1 which is a surrogate for broader regions of hypoxia.

      (3) Although the authors conclude that epicardial HIF1a deletion results in a significant reduction of WT1⁺ cells in both the epicardium and myocardium (Figure 2a-d), the provided images are not sufficiently clear to fully support this interpretation. Providing additional evidence to support this conclusion would be helpful.

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (Figure 2a versus 2b; magenta WT1+ staining).

      (4) Similar to the point raised above, the authors' conclusion regarding the increased expression of WT1 following Molidustat treatment does not appear to be fully supported by the provided images (Figure 6b-f). Immunofluorescence staining for WT1 does not clearly demonstrate epicardial expression in the remote zone of either the control or Molidustat-treated hearts. In addition, while an increase of WT1<sup>+</sup> cells is observed in the infarct zone of the Molidustat-treated heart, it is somewhat unexpected that such expansion is not evident in the corresponding region of the control heart, given that epicardial cells typically expand near the infarct area. Clarification on these points would be helpful.

      Figure 6b reveals WT1 expression in controls (upper panel set) that is reactivated proximal to the infarct region (given that WT1 is not expressed in adult epicardium) but restricted to the epicardial layer (as occurs in injured adult mouse hearts, PMID: 21505261). This contrasts with what is observed in the Molidustat-treated P7 hearts post-MI, where we observe epicardial expansion and migration of WT1+ cells into the underlying myocardium (Figure 6b, lower panel set, infarct zone).

      (5) The authors conclude that WT1<sup>+</sup> cells in the myocardial tissue exhibit endothelial identity based on the colocalization of WT1 and EMCN signals (Supplementary Figure 9c). However, this interpretation is difficult to assess, as WT1 is a nuclear marker and EMCN is a membrane protein, which makes precise colocalization challenging to confirm with confidence. Additional supporting evidence may be necessary to substantiate this conclusion.

      WT1 is known to be upregulated in endothelial cells in response to injury, as shown previously in several studies (for example, PMID: 25681586). Here we show clear co-localisation of nuclear WT1 and cytoplasmic Endomucin (EMCN) in what is now Figure 6-figure supplement 1c and would encourage the reviewer and readers to magnify the image by zooming in on the relevant co-stained panel.

      (6) The authors conclude that activation of epicardial HIF1a signaling has no effect on neovascularization in postnatal MI hearts (Figure 5c). However, the abstract states: "Finally, a combination of genetic and pharmacological stabilisation of HIF ... increased vascularisation, augmented infarct resolution and preserved function beyond the 7-day regenerative window" (Lines 38-41). Clarification regarding this apparent discrepancy would be appreciated.

      The abstract has been altered to remove the statement of increased vascularisation.

      (7) The study appears somewhat incomplete, as it lacks mechanistic insight into the functional recovery observed following epicardial Phd2 deletion and Molidustat treatment in postnatal MI hearts. Although the authors suggest a potential paracrine role of the epicardium in protecting cardiomyocytes from apoptosis, this hypothesis has not been experimentally addressed. Incorporating such analysis would help to reinforce the study's conclusions.

      Further experiments are required, which are out-of-scope of this study, to define a mechanistic link between the genetic or pharmacological stabilisation of HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Other points:

      (1) Providing single-channel images for Figures 1a-d and 6g would be helpful for clarity and interpretation.

      We believe the combined channel views of co-staining for two markers on a background of DAPI staining to pin-point cell nuclei are informative and support our conclusions.

      (2) Have the authors considered using AngioTool to quantify the number of vessels in Figure 5b-c?

      AngioTool™ was used to quantify the vessels, as we have done previously (PMID: 33462113), and this is now added to the methods and legend of Figure 2.

      Reviewer #3 (Recommendations for the authors):

      There are several areas where the manuscript can be improved, such that its conclusions can be solidified.

      (1) The authors highlight a point where blocking Phd2 can enhance survival of cardiac tissue, but did not report on survival markers. They surmised that apoptosis could be decreased in Phd2 mutant or Molidustat treatment but did not show this. The authors should determine if apoptosis is decreased in the myocardium and epicardium.

      We show evidence of increased levels of healthy myocardium in the genetic and pharmacological models of stabilised HIF-signalling. We exclude increased cardiac hypertrophy or increased cardiomyocyte proliferation as causative, so suggest as a reasonable alternative enhanced survival, albeit this need not necessarily be via an apoptotic pathway given the incidence of necrotic cell death during MI. We are unable to generate new surgeries and mutant/treated heart samples to analyse for apoptotic markers at this stage.

      (2) There appears to be no difference in cardiomyocyte proliferation in Molidustat-treated animals, but the experiment was only performed on 2 to 3 animals. This is too small a sample size to conclude from these results. The authors should increase the sample size to make this assertion.

      We respectfully disagree that we are unable to conclude no effect on cardiomyocyte proliferation. We analysed multiple heart regions per section, for EdU+/cTnT+ colocalised signals across several sections per heart, set against a consistency of effect on other parameters in hearts treated with Molidustat. We are unable to generate more P7 heart surgeries +/- Molidustat and +/- EdU at this stage.

      (3) It is curious as to how, after myocardial infarction, the fibrotic scar tissue is decreased in the Phd2 deletion but not as profound in Molidustat-treated mice at d21. Can the authors speculate why the difference exists and how this decrease arises? For example, are there decreased pro-inflammatory signals in Phd2 deleted mice? Is there decreased collagen deposition and ECM gene expression? Do macrophage recruitment into the infarct zone differ between mutant/treated vs WT?

      The representative images in Figure 6k reveal a trend towards reduced fibrosis with Molidustat treatment (Figure 6l), but across all hearts analysed this was not as significant as observed in the epicardial-specific deletion injured hearts (Figure 5g, h). This may be due to the relatively short half-life of Molidustat (approximately 4-10 hours, PMID: 32248614), the dosing regimen for the drug and/or the fact that it was not specifically delivered/targeted to the epicardium.

      (4) The magnified images in Figure 1 do not match the boxes in the whole heart images. It is unclear what the white boxes signify.

      The white boxes have been removed from Figure 1. The magnified image panels are from serial heart sections and this is now clarified in the Figure 1 legend.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of how the glycocalyx of cells provides a non-specific barrier for the interaction of viruses with cell-surface receptors. Using both in vitro experiments and in vivo manipulations, the authors provide compelling evidence that the glycocalyx acts as an energy barrier and that this is a main attribute of its mode of action. The work will be of broad interest to virologists and the cell biology community that studies host-pathogen interactions.

    2. Joint Public Review:

      This manuscript tests the notion that bulky membrane glycoproteins suppress viral infection through non-specific interactions. Using a suite of biochemical, biophysical, and computational methods in multiple contexts (ex vivo, in vitro, and in silico), the authors collect compelling evidence supporting the notion that (1) a wide range of surface glycoproteins erect an energy barrier that hinders the virus from forming the stable adhesive interface needed for fusion and uptake and (2) the total amount of glycan, independent of molecular identity, additively enhances the suppression.

      As a functional assay the authors focus on viral infection starting from the assumption that a physical boundary modulated by overexpressing a protein-of-interest could prevent viral entry and subsequent infection. Here they find that glycan content (measured using the PNA lectin) of the overexpressed protein and total molecular weight, that includes amino acid weight and the glycan weight, is negatively correlated with viral infection. They continue to demonstrate that it is in effect the total glycan content, using a variety of lectin labelling, that is responsible for reduced infection in cells. Because the authors do not find a loss in virus binding this allows them to hypothesize that the glycan content presents a barrier for the stable membrane-membrane contact between virus and cell. They subsequently set out to determine the effective radius of the proteins at the membrane and demonstrate that on a supported lipid bilayer the glycosylated proteins do not transition from the mushroom to the brush regime at the densities used. Finally, using Super Resolution microscopy they find that above an effective radius of 5 nm proteins are excluded from the virus-cell interface.

      The experimental design does not present major concerns and the results provide insight on a biophysical mechanism according to which, repulsion forces between branched glycan chains of highly glycosylated proteins exert a kinetic energy barrier that limits the formation of a membrane/viral interface required for infection.

      In their revised manuscript and rebuttal, the authors address several general and specific concerns that were raised about their first submission. With these revisions, the evidence supporting their claims is now compelling.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Review

      GENERAL QUESTIONS:

      (1) For many enveloped viruses, the attachment factors - paradoxically - are also surface glycoproteins, often complexed with a distinct fusion protein. The authors note here that the glycoproteins do not inhibit the initial binding, but only limit the stability of the adhesive interface needed for subsequent membrane fusion and viral uptake. How these antagonistic tendencies might play out should be discussed.

      When the surface density of receptor molecules for a virus with glycans increases, the density of free glycans not bound to the virus increases along with the amount of virus adsorbed. However, if the total amount of glycans is considered to be a function of the receptor density, the reaction may become more complicated. This complication may also be affected by the prolonged infection. If the receptor density on the cell surface is high, the infection inhibitory effect of glycans may not be obtained in a system in which a high concentration of virus is supplied from the outside world for a long time. This is because once viruses have entered the cell, they accumulate inside the cell, and viral infection is affected by the total accumulated amount, which is the integration of the number of viruses that have entered over time. This distinction indicates that the virus entry reaction and the total amount of infection in the cell must be considered separately. This is an important point, but it was not clearly mentioned in the original manuscript.

      Our experiments were conducted under conditions that clearly allowed us to detect the virus-inhibiting function of glycans without being affected by the above points. In order to clarify these points, we will revise this article as follows, referring to an experiment that is somewhat related to this discussion (the Adenovirus infection experiment into HEK293T cells shown in Figure S1F).

      (Page-3, Introduction)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (Page 20, Discussion)

      When the virus receptor is a glycoprotein or glycan itself, the inhibition of virus infection by glycans becomes more complex because the total amount of glycans is also a function of the receptor density. It is also important to note that the total amount of infection into a cell is the time integral of virus entry. Even if the probability of virus entry is significantly reduced by glycans, the cumulative number of virus entries may increase if high concentrations of virus continue to be supplied from outside the cell for a long period of time. In the case of Adenovirus, which continues to amplify in HEK293T cells after infection, we showed that MUC1 on the cell surface has an inhibitory effect on long-term cumulative infection (Supplementary Figure 1F). However, such an accumulation effect may be case-by-case depending on the virus-cell system, and may be more pronounced when the cell surface density of virus receptor molecules is high. As a result, if the virus receptor molecule is a glycan or glycoprotein and infection continues for a long period of time, the infection inhibition effect may not be observed despite an apparent increase in the total amount of glycans in the cell. In any case, our results clarified the factor of virus entry inhibition dependent on the total amount of glycans because appropriate conditions were set.
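
      The point that total infection is the time integral of virus entry can be illustrated with a minimal sketch. The encounter rate, the per-encounter entry probabilities, and the exposure times below are hypothetical values chosen only for illustration; they are not taken from the manuscript, and the constant-rate model is an assumption.

```python
def cumulative_entries(entry_rate, t_end, steps=1000):
    """Trapezoidal time integral of entry_rate(t) from t = 0 to t = t_end (hours)."""
    dt = t_end / steps
    total = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (entry_rate(t0) + entry_rate(t1)) * dt
    return total


# Hypothetical parameters (assumptions, not measured values).
ENCOUNTERS_PER_HOUR = 100.0  # virus-cell encounters per hour, assumed constant
P_ENTRY_LOW_GLYCAN = 0.10    # entry probability per encounter, few glycans
P_ENTRY_HIGH_GLYCAN = 0.01   # entry probability per encounter, many glycans

# Short exposure of a low-glycan cell versus long exposure of a glycan-rich cell.
short_low_glycan = cumulative_entries(
    lambda t: ENCOUNTERS_PER_HOUR * P_ENTRY_LOW_GLYCAN, t_end=1.0)
long_high_glycan = cumulative_entries(
    lambda t: ENCOUNTERS_PER_HOUR * P_ENTRY_HIGH_GLYCAN, t_end=24.0)

print(short_low_glycan, long_high_glycan)
```

      Under these assumed numbers, a tenfold lower entry probability is outweighed by a 24-fold longer exposure, which is why entry probability and cumulative infection have to be considered separately.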

      (2) Unlike polymers tethered to a solid surface undergoing mushroom-to-brush transition in a density-dependent manner, the glycoproteins at the cell surface are of course mobile (presumably in a density-dependent manner). They can thus redistribute in spatial patterns, which serve to minimize the free energy. I suggest the authors explicitly address how these considerations influence the in vitro reconstitution assays seeking to assess the glycosylation-dependent protein packing.

      We performed additional experiments using lipid bilayers that had lost fluidity, and found that there is no significant difference in protein binding between fluid and nonfluid bilayers. The redistribution of molecules due to molecular fluidity may play some roles but not in our experimental systems. It suggests that glycoproteins can generate intermolecular repulsion even in fluid conditions such as cell membranes, just as they do on the solid phase. This experiment was also very useful because it allowed us to compare our results in the fluid bilayer with solid-state measurements of saturation molecular density and the brush transition. This comparison gave us confidence that in the reconstituted membrane system, even at saturation density, the membrane proteins are not as stretched as they are in the condensed brush state. We have therefore added a new paragraph and a new figure (Supplementary Fig. 5B) to discuss this issue, as follows:

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      (3) The discussion of the role of excluded volume in steric repulsion between glycoprotein needs clarification. As presented, it's unclear what the role of "excluded volume" effects is in driving steric repulsion? Do the authors imply depletion forces? Or the volume unavailable due to stochastic configurations of gaussian chains? How does the formalism apply to branched membrane glycoproteins is not immediately obvious.

      Regarding the excluded volume due to steric repulsion between glycoproteins, we considered the volume that cannot be used by glycans as Gaussian chains branching from the main chain. We would like to expand on this point by adding several papers that make similar arguments. I'm glad you brought this up because we hadn't considered depletion forces - the excluded volume between glycoproteins should generate a depletion force, but in this case we believe this force will not have a significant effect on viruses that are larger than the glycoproteins. We also attempted to clarify the discussion in this section by focusing on intermolecular repulsion, and restructured paragraphs, which are also related to General Question 2 and Specific Question 2. The relevant part has been revised as follows (pages 15-16):

      To compare the packing of proteins with different molecular weights and R<sub>F</sub>, ……. These were smaller than the coverage of molecules at hexagonal close packing, which is ~90.7%. In contrast, the coverage of b-CD43 and b-MUC1 at saturated binding was estimated to be greater than 100% under this normalization standard, indicating that the mean projected sizes of these molecules in the surface direction were smaller than those expected from their R<sub>F</sub>. Thus, it is clear that glycosylation reduces the saturation density of membrane proteins, regardless of molecular size.

      Highly glycosylated proteins resisted densification, indicating that some intermolecular repulsion is occurring. In the framework of polymer brush theory, the intermolecular repulsion of densely packed highly glycosylated proteins is due to an increase in either f<sub>el</sub>, f<sub>int</sub> (d<R<sub>F</sub>), or both (Hansen et al., 2003; Wu et al., 2002). The term of intermolecular interaction, f<sub>int</sub>, is regulated by intermolecular steric repulsion, which occurs when neighboring molecules cannot approach the excluded volume created by the stochastic configuration of the polymer chain (Attili et al., 2012; Faivre et al., 2018; Kreussling and Ullman, 1954; Kuo et al., 2018; Paturej et al., 2016). The magnitude of this steric repulsion depends largely on R<sub>F</sub> in dilute solutions, but the molecular structure may also affect it when molecules are densified on a surface. In other words, the glycans protruding between molecules can cause steric inhibition between neighboring proteins (Figure 5D). Such intermolecular repulsion due to branched side chains occurs only when the molecules are in close proximity and sterically interact on a two-dimensional surface, but not in dilute solution, and does not occur in unbranched polymers such as underglycosylated proteins (Figure 5D). Based on the above, we propose the following model for membrane proteins: Only when the membrane proteins are glycosylated does strong steric repulsion occur between neighboring molecules during the densification process, suppressing densification.

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in the increase of total surface area to reduce the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……
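
      The two packing figures quoted in the revised text above (the ~90.7% hexagonal close-packing coverage and the 65000 µm<sup>-2</sup> brush transition density for a polymer with R<sub>F</sub> ~ 15 nm) can be checked with a short sketch. The footprint model here, a flat disc whose diameter equals R<sub>F</sub>, is an assumption made for illustration; the normalization standard actually used in the manuscript is not stated in this response, and the function names are hypothetical.

```python
import math


def hexagonal_close_packing_fraction():
    """Area fraction covered by equal discs in hexagonal close packing."""
    return math.pi / (2.0 * math.sqrt(3.0))


def surface_coverage(density_per_um2, footprint_diameter_nm):
    """Nominal coverage = number density x disc footprint area.

    Assumes each molecule occupies a disc of the given diameter.
    Values above 1.0 mean the discs would have to overlap or compress.
    """
    area_nm2 = math.pi * (footprint_diameter_nm / 2.0) ** 2
    density_per_nm2 = density_per_um2 / 1.0e6  # 1 um^2 = 1e6 nm^2
    return density_per_nm2 * area_nm2


hcp = hexagonal_close_packing_fraction()

# Brush transition density and R_F quoted from Wu et al., 2002 in the text;
# treating R_F as the disc diameter is an assumption.
brush_coverage = surface_coverage(65000, 15.0)

print(f"hexagonal close packing: {hcp:.3f}")
print(f"nominal coverage at brush transition: {brush_coverage:.1f}")
```

      Under this assumed footprint, the nominal coverage at the quoted brush transition density is well above the close-packing limit, in line with the statement that the transition to a brush occurs at densities higher than those needed to pack the surface with unperturbed coils.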

      (4) The authors showed that glycoprotein expression inversely correlated with viral infection and link viral entry inhibition to steric hindrance caused by the glycoprotein. Alternative explanations would be that the glycoprotein expression (a) reroutes endocytosed viral particles or (b) lowers cellular endocytic rates and via either mechanism reduce viral infection. The authors should provide evidence that these alternatives are not occurring in their system. They could for example experimentally test whether non-specific endocytosis is still operational at similar levels, measured with fluid-phase markers such as 10kDa dextrans.

      The results of the experiment suggested by the reviewer are shown in the new Supplementary Figure 3B. (This results in generation of a new Supplementary Figure 3, and previous Supplementary Figures 4-5 are now renumbered as Supplementary Figures 5-6). Endocytosis of 10 kDa dextran was attenuated by the expression of several large-sized molecules, but was not affected by the expression of many other glycoproteins that have the ability to inhibit infection. This pattern clearly differs from that of virus infection, which was inhibited more by the amount of glycan than by molecular weight. We therefore conclude that many glycoproteins inhibit virus infection through processes other than endocytosis. Based on the above, we have added the following to the manuscript (p9, new paragraph):

      We also investigated the effect of membrane glycoproteins on membrane trafficking, another process involved in viral infection. Expression of MUC1 with higher number of tandem repeats reduced the dextran transport in the fluid phase, while expression of multiple membrane glycoproteins that have infection inhibitory effects, including truncated MUC1 molecules, showed no effect on fluid phase endocytosis, indicating a molecular weight-dependent effect (Supplementary Figure 3B). The molecular weight-dependent inhibition of endocytosis may be due to factors such as steric inhibition of the approach of dextran molecules or a reduction in the transportable volume within the endosome. In any case, it is clear that many low molecular weight glycoproteins inhibit infection by disturbing processes other than endocytosis. Based on the above, we focus on the effect of glycoproteins on the formation of the interface between the virus and the cell membrane.

      (5) The authors approach their system with the goal of generalizing the cell membrane (the cumulative effect of all cell membrane molecules on viral entry), but what about the inverse? How does the nature of the molecule seeking entry affect the interface? For example, a lipid nanoparticle vs a virus with a short virus-cell distance vs a virus with a large virus-cell distance?

      Thank you for your interesting comment. If the molecular size of the ligand is large, it should affect virus adsorption and molecular exclusion from the interface. In lipid nanoparticle applications, controlling this parameter may contribute to efficiency. In addition, a related discussion is the influence of virus shell molecules that are not bound to the receptor. I will revise the text based on the above.

      Discussion (as a new paragraph after the paragraph added in Q1):

      In this study, we attempted to generalize the surface structure on the cell side, but the surface structure on the virus side may also have an effect. The efficiency of virus adsorption and the efficiency of cell membrane protein exclusion from the interface will change depending on the molecular length of the receptor-ligand, although receptor priming also has an effect. In addition, free ligands of the viral envelope or other coexisting glycoproteins may also have an effect, as they must also be excluded from the virus-cell interface. In fact, there are reports that expression of CD43 and PSGL-1 on the virus surface reduces virus infection efficiency (Murakami et al., 2020). Such interface structure may be one of the factors that determine the infection efficiency that differs depending on the virus strain. More generally, modification of the surface structure may be effective for designing materials such as lipid nanoparticles that construct the interface with cells.

      SPECIFIC QUESTIONS:

      (1) The proposed mechanism indicates that glycosylation status does not produce an effect in the "trapping" of virus, but in later stages of the formation of the virus/membrane interface due to the high energetic costs of displacing highly glycosylated molecules at the vicinity of the virus/membrane interface. It is suggested to present a correlation between the levels of glycans in the Calu-3 cell monolayers and the number of viral particles bound to cell surface at different pulse times. Results may be quantified following the same method as shown in Figure 2 for the correlation between glycosylation levels and viral infection (in this case the resulting output could be number of viral particles bound as a function of glycan content).

      The results of this experiment are now shown as Supplementary Figure 2F and 2G. We compared the amount of virus bound after incubation for 10 minutes or for 3 hours as in the infection experiment, but no negative correlation was found between the total amount of glycans on the surface of the Calu-3 monolayer and the amount of virus bound. Interestingly, a slight positive correlation was detected, which may be due to concentrated virus receptor expression in glycan-enriched cells. This result shows that glycoproteins do not strongly inhibit virus binding. We will amend the text as follows (see also Q6).

      (Page 10)

      Glycans could be one of the biochemical substances ……We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no negative correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). The slight positive correlation between bound virus and glycans may be due to higher expression levels of viral receptors in glycan-rich cells. ….

      (2) The use of the purified glycosylated and non-glycosylated ectodomains of MUC1 and CD-43 to establish a relationship between glycosylation and protein density into lipid bilayers on silica beads is an elegant approach. An assessment of the impact of glycosylation in the structural conformation of both proteins, for instance determining the Flory radius of the glycosylated and non-glycosylated ectodomains by the FRET-FLIM approach used in Figure 4 would serve to further support the hypothesis of the article.

      Unfortunately, the proposed experiment did not provide a strong enough FRET signal for analysis. This was due in part to the difficulty in constructing a bead-coated bilayer incorporating PlasMem Bright Red, which established a good FRET pair in cell experiments. We also tried other fluorescent molecules, but were unable to obtain a strong and stable FRET signal. Another reason may be that the curvature of the beads is larger than that of the cells, making it difficult to obtain a sufficient cumulative FRET effect from multiple membrane dyes. We plan to improve the experimental system in the future.

      On the other hand, we also found that in this system, the signal changes were very subtle, making it difficult to detect molecular conformational changes using FRET. After reconsidering general questions (2) and (3), we speculated that the molecular density in the experiment, even at saturation binding, was below or at most equivalent to the brush transition point. In other words, proteins on the bead-coated bilayer may not be significantly extended in the vertical direction. Therefore, the conformational changes of these proteins may not be large enough to be detected by the FRET assay. We updated Figure 3C and Figure 5D (model description) to better reflect the above discussion and introduced the following discussion in the manuscript.

      (page11)

      We introduced the framework of conventional polymer brush theory to study the structure of virus-cell interfaces containing proteins……. Numerous experimental measurements of the formation of polymer brushes have also been reported (Overney et al., 1996; Wu et al., 2002; Zhao and Brittain, 2000). In these measurements, the transition to a brush typically occurs at a density higher than that required to pack a surface with hemispherical polymers of diameter R<sub>F</sub>. This is the point at which the energy loss due to repulsive forces between adjacent molecules (f<sub>int</sub>) exceeds the energy required to stretch the polymer perpendicularly into a brush (f<sub>el</sub>).

      (page16)

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in an increase in total surface area that reduces the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……

      (3) The MUC1 glycoprotein is reported to have a dramatic effect in reducing viral infection shown in Fig 1F. On the contrary, in a different experiment shown in Fig2D and Fig2H MUC1 has almost no effect in reducing viral infection. It is not clear how these two findings can be compatible.

      The immunostaining results show that the density of MUC1 molecules is very low in the experimental system in Figure 2 (Figure 2C), which is supported by the SC-RNASeq data (as shown in Supplementary Figure 2A, MUC1 is not listed as a top molecule). In other words, the MUC1 expression level in this experiment is too low to affect virus infection inhibition. On the other hand, the Pearson correlation function represents the strength of the linear relationship between two variables, so it is not the most appropriate indicator for seeing the correlation with the MUC1 expression level, which has little change (Figure 2D, 2F). In fact, even TOS analysis, which can see the correlation by focusing on the cells with the highest expression level, cannot detect the correlation (Figure 2H). Therefore, the MUC1 data in Figure 2D, 2F, and 2H will be annotated and corrected in the figure legend.

      Figure2 Legend:

      MUC1 has a small mean expression level and variance, and is more affected by measurement noise than other molecules when calculating the Pearson correlation function (Figure 2C-2F). In addition, the number of cells in which expression can be detected is small, so no significant correlation was detected by TOS analysis (Figure 2H).

      (4) Why is there a shift in the use of the glycan marker? How does this affect the conclusions? For the infection correlation relating protein expression with glycan content the PNA-lectin was used together with flow cytometry. For imaging the infection and correlating with glycan content the SSA-lectin is used.

      For each cell line, we selected the lectin that could be measured over the widest dynamic range. This lectin is thought to recognize the predominant glycan species in the cell line (Fig. S1C, Fig. 2D). In our model, we believe that viral infection inhibition is not specific to the type of sugar, but is highly dependent on the total amount of glycans. If this hypothesis is correct, the reason we used different lectins in each experiment is simply to select the lectin that recognizes the most predominant glycan species that is most convenient for predicting the total amount of glycans in cells. This hypothesis is consistent with our observations, where the total amount of glycans estimated by different lectins could explain the infection inhibition in a similar way in the experiments in Figures 1 and 2, and the TOS analysis in Figure 2 showed that minor glycans also have an infection inhibitory effect. On the other hand, it is of course possible to predict the total amount of glycans more accurately by obtaining as much information on glycans as possible (related to Q5). Based on the above discussion, the manuscript will be revised as follows.

      Page5

      Using HEK293T cell lines exogenously expressing genes of these proteins tagged with fluorescent markers, their glycosylation was measured by binding of a lectin from Arachis hypogaea (PNA), and the number of these proteins in the cells was measured simultaneously. PNA was used for the measurement because it has a wider dynamic range than other lectins (Supplementary Figure 1C). This suggests that GalNAc recognized by PNA is predominantly present on glycans of HEK293T cells, especially on the termini of glycans that are amenable to lectin binding, compared to other saccharides. …

      page9  

      Our findings suggest that membrane glycoproteins nonspecifically inhibit viral infection, and we hypothesize that their inhibitory function is also nonspecific depending on the type of glycan. Our hypothesis is consistent with the observations in the TOS analysis. Although minor saccharide species in the system (such as GlcNAc and GalNAc recognized by DSA, WGA, or PNA) showed anticolocalization with infection, their scores were much lower than those of major saccharide species. This suggests that all major and minor saccharide species have an infection inhibitory effect, but cells enriched with minor type glycans are only partially present in the system, and the contribution of these cells to virus inhibition is also partial. It is also consistent with the observation that the amount of GalNAc recognized by PNA determines the virus infection inhibition in HEK 293T cells (Figure 1). Therefore, we believe that our assay using a single type of predominantly expressed lectin is still useful for estimating the total glycan content. Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of saccharide species would allow for more accurate estimation. It should be noted that the amount of bound lectin does not necessarily measure the overall glycan composition but likely reflects the sugar population at the free end of the glycan chain to which the lectin binds most.

      (5) The authors in several instances comment on the relevance and importance of the total glycan content. Nevertheless, these conclusions are often drawn when using only one glycan-binding lectin. In fact, the anti-correlation with viral infection is distinct for the various lectins (Fig 2D and Fig 2H). Would it make more sense to use a combination of lectins to get a full glycan spectrum?

      As stated in the answer to Q4, we believe that we were able to detect the infection-suppressing effect of the total glycan amount by using the measurement value of the major component glycan as an approximation. However, as you pointed out, if we could accurately measure the minor glycan components and add up their values, we believe that we could measure the total glycan amount more accurately. In order to measure multiple glycans simultaneously and with high accuracy, some kind of biochemical calibration may be necessary to compare the measurements of lectin-glycan pairs with different binding constants. We believe that these are very useful techniques, and would like to consider them as a future challenge. The corrections listed in Q4 are shown below.

      (Page 9)

      Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of glycans would allow for more accurate estimation. …….

      (6) Fig 3A shows virus binding to HEK cells upon MUC1 expression. Please provide the surface expression of the MUC1 so that the data can be compared to Fig 1F. Nevertheless, it is not clear why the authors used MUC expression as a parameter to assess virus binding. Alternatively, more conclusive data supporting the hypothesis would be the absence of a correlation between total glycan content and virus binding capacity.

      The relationship between the expression level of MUC1 in each cell and the amount of virus binding is shown in Supplementary Figure 3A. There is no correlation between the two. In HEK293T cells, many glycans are modified with MUC1, so MUC1 was used as the indicator for analysis (Supplementary Figure 1C). As you pointed out, it is better to use the amount of glycan as an indicator, so we analyzed the relationship between the amount of bound virus and the amount of glycan on the surface of the Calu-3 monolayer (Supplementary Figure 2F, 2G, introduced in the answer to Specific (Q1)). In any case, no correlation was found between virus binding and surface glycans. We will correct the manuscript as follows.

      (page 9)

      Glycans could be one of the biochemical substances that link the intracellular molecular composition and macroscopic steric forces at the cell surface. To clarify this connection, we further investigated the mechanism by which membrane glycoproteins inhibit viral infection. First, we measured viral binding to cells to determine which step of infection is inhibited. We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). These results indicate that glycoproteins do not inhibit virus binding to cells, but rather inhibit the steps required for subsequent virus internalization.

      (7) While the use of the Flory model could provide a simplification for a (disordered) flexible structure such as MUC1, where the number of amino acids equals N in the Flory model, this generalisation will not hold for all the proteins. Because folding will dramatically change the effective polypeptide chain-length and reduce available positioning of the amino acids, something the authors clearly measured (Fig 4G), this generalisation is not correct. In fact, the generalisation does not seem to be required because the authors provide an estimation for the effective Flory radius using their FRET approach

      Current theories generalizing the Flory model to proteins are incomplete, and it is certainly not possible to accurately estimate the size of individual molecules undergoing different folding. However, we found such a generalized model to be useful in understanding the overall properties of membrane proteins. In our experiments, we were indeed able to obtain the R<sub>F</sub> values of some individual molecules by FRET measurements. However, this modeling made it possible to estimate the distribution range of R<sub>F</sub>, including for larger proteins that cannot be measured by FRET. For example, from our results, we can estimate that the upper limit of R<sub>F</sub> for the longest membrane proteins is about 10.5 nm, assuming that the proteins follow the Flory model in all respects except for the shortening of the effective length due to folding. These analyses are useful for physical modeling of nonspecific phenomena, as in our case.

      In order to discuss the balance between such theoretical validity and the convenience of practical handling, we revise the manuscript as follows.

      (page 13) 

      This shift in ν indicates that glycosylation increases the size of the protein at equilibrium, but the change in R<sub>F</sub> is slight, e.g., a 1.3-fold increase for one of the longest ectodomains with N = 4000 when these values of ν are applied. This calculation also gives a rough estimate of the upper limit of the R<sub>F</sub> of the extracellular domains of all membrane proteins in the human genome (approximately 10.5 nm). Physically, this change in ν by glycosylation may be caused by the increased intramolecular exclusion induced sterically between glycan chains. These estimated values of ν are much smaller than the value of 0.6 for polymers in good solvents, possibly due to protein folding or anchoring effects on the membrane. In fact, the ν of an intrinsically disordered protein in solution has been reported to be close to 0.6 (Riback et al., 2019; Tesei et al., 2024). Overall, these analyses using the Flory model provide information on the size distribution of membrane proteins and the influence of glycans, although the model cannot predict the exact size of each protein due to its specific folding.
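
To make the scaling argument concrete, the sketch below uses the Flory relation R<sub>F</sub> = a·N<sup>ν</sup> to back out how large a shift in ν corresponds to the quoted 1.3-fold size increase at N = 4000. It assumes that the segment length a is unchanged by glycosylation, so that the size ratio depends only on the exponent difference. The function names are illustrative, and the fitted ν values themselves are not reproduced here.

```python
import math


def flory_radius_ratio(n_residues: int, nu_a: float, nu_b: float) -> float:
    """R_F(nu_b) / R_F(nu_a) for R_F = a * N**nu, assuming the same prefactor a."""
    return n_residues ** (nu_b - nu_a)


def exponent_shift_for_ratio(n_residues: int, ratio: float) -> float:
    """Shift in nu that produces the given fold change in R_F at chain length N."""
    return math.log(ratio) / math.log(n_residues)


N = 4000                                   # ectodomain length quoted in the text
delta_nu = exponent_shift_for_ratio(N, 1.3)
print(f"A 1.3-fold change in R_F at N = {N} corresponds to a shift in nu of ~{delta_nu:.3f}")
```

Because R<sub>F</sub> depends on N only through a power law, even for the longest ectodomains a shift in ν of a few hundredths changes the size by only tens of percent, which is why the effect of glycosylation on R<sub>F</sub> is described as slight.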

      MINOR COMMENTS/EDITS:

      (1) In Figures 2A and 2C, as well as Supplemental Figure 2C, the fluorescent images indicate that GFP expression differs among the various groups. Ideally, these should be at the same GFP expression level, as the glycan and antibody staining occurred post-viral infection. For instance, ACE2 is a well-known positive control and should enhance SARS-CoV-2 infection. Yet, based on the findings presented in Supplemental Figure 2C, ACE2 appears to correlate with the lowest infection rate. The relationship between the infection rate and key glycoproteins needs clearer quantification.

      We measured the virus inhibition effect specific to each molecule using a cell line expressing low levels of viral receptors and glycoproteins (Fig. 1). On the other hand, the system in Fig. 2 contains diverse viral receptors and glycoproteins and has not been genetically manipulated. (We apologize that there was a typo in our description of experiment, which will be corrected, as shown below). The variation in infection rate between samples was caused by multiple factors but was not related to the molecule for which the correlation was measured. The receptor-based normalization used in the experiment in Fig. 1 cannot be applied in this system in Fig.2 due to the complexity of the gene expression profile. Therefore, instead of such parameter-based normalization, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, in both cases of variation in overall infection rate and variation in the distribution of infected populations, samples with large variations can be reasonably compared by Pearson correlation and TOS analysis, respectively. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      To test this hypothesis, we infected a monolayer of epithelial cells endogenously expressing highly heterogeneous populations of glycoproteins with SARS-CoV-2-PP, and measured viral infection from cell to cell visually by microscope imaging. …

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). This is one statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2GH). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific. 
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.
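
The difference between the two statistics can be illustrated with a small, self-contained sketch. It computes a Pearson correlation and a simplified threshold-overlap ratio: the number of cells falling in the top fraction of both signals, divided by the number expected if the two signals were independent. This is not the rescaled TOS defined by Sheng et al. (2016); it only reproduces the observed-versus-expected idea described above. The synthetic data (uniform glycan signal, heavy-tailed infection signal suppressed at high glycan levels) and all names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_fraction_overlap_ratio(x, y, frac_x, frac_y):
    """Observed / expected number of items in the top frac_x of x AND the top
    frac_y of y. The expectation assumes independent signals; values below 1
    indicate anti-localization, values above 1 indicate co-localization."""
    n = len(x)
    kx = max(1, round(frac_x * n))
    ky = max(1, round(frac_y * n))
    top_x = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:kx])
    top_y = set(sorted(range(n), key=lambda i: y[i], reverse=True)[:ky])
    observed = len(top_x & top_y)
    expected = kx * ky / n
    return observed / expected


# Synthetic, hypothetical data: infection is heavy-tailed and suppressed
# in cells with a high glycan signal.
random.seed(0)
glycan = [random.random() for _ in range(1000)]
infection = [random.expovariate(1.0) * (1.0 - g) ** 4 for g in glycan]

r = pearson(glycan, infection)
ratio = top_fraction_overlap_ratio(glycan, infection, 0.1, 0.1)
print(f"Pearson r = {r:.2f}; top-10% overlap ratio = {ratio:.2f}")
```

In this toy case the overlap ratio reports directly that highly infected cells are essentially absent from the high-glycan population, whereas the Pearson coefficient summarizes the same data as a single linear trend that is sensitive to the non-normal shape of the infection distribution.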

      (2) For clarity, the authors should consider separating introductory and interpretive remarks from the presentation of results. These seem to get mixed up. The introduction section could be expanded to include more details about glycoproteins, their relevance to viral infection, and explanations of N- and O-glycosylation.

      Following the suggestion, (1) we added an explanation of the relationship between glycoproteins and viral infection, and N-glycosylation and O-glycosylation to the Introduction section, and (2) moved the introductory parts in the Results section to the Introduction section, as follows.

      (1; page3)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. These glycoprotein groups have no common amino acid sequences or domains. The glycans modified by these proteins include both the N-type, which binds to asparagine, and the O-type, which binds to serine and threonine. Furthermore, there have been no reports of infection-suppressing effects according to the specific monosaccharide type in the glycan. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (2 : Page 4-5)

      To confirm that glycans are a general chemical factor of steric repulsion, an extensive list of glycoproteins on the cell membrane surface would be useful. The wider the range of proteins to be measured, the better. Therefore, we collect information on glycoproteins on the genome and compile them into a list that is easy to use for various purposes. Then, by analyzing sample molecules selected from this list, it may be possible to infer the effect of the entire glycoprotein population on the steric inhibition of virus infection, despite the complexity and diversity of the Glycome (Dworkin et al., 2022; Huang et al., 2021; Moremen et al., 2012; Rademacher et al., 1988). Elucidation of the mechanism of how glycans regulate steric repulsion will also be useful to quantitatively discuss the relationship between steric repulsion and intracellular molecular composition. For this purpose, we apply the theories of polymer physics and interface chemistry.

      Results

      List of membrane glycoproteins in human genome and their inhibitory effect on virus infection

      To test the hypothesis that glycans contribute to steric repulsion at the cell surface, we first generate a list of glycoproteins in the human genome and then measure the glycan content and inhibitory effect on viral infection of test proteins selected from the list (Figure 1A). To compile the list of glycoproteins, we ….

      (3) In the sentence, "glycoproteins expressed lower than CD44 or other membrane proteins including ERBB2 did not exhibit any such correlation, although ERBB2 expressed ~4 folds higher amount than CD44 and shared ~7% among all membrane proteins," it is unclear which protein has a higher expression level: CD44 or ERBB2? Furthermore, the use of the word "although" needs clarification.

      Corrected as follows:

      (page 8)

      ……showed a weak inverse correlation with viral infection; even such a weak correlation was not observed with other proteins, including ERBB2, which is approximately four-fold more highly expressed than CD44

      (4) In Supplementary Figure 5, please provide an explanation of the data in the figure legend, particularly what the green and red signals represent.

      Corrected as follows:

      STORM images of all analyzed cells, expressing designated proteins. The detected spots of SNAP-Surface Alexa 647 bound to each membrane protein are shown in red, and the spots of CF568-conjugated anti-mouse IgG secondary antibody that recognizes Spike on SARS-CoV2-PP are shown in green. For each cell, a pair of images is shown: a two-color composite image and a CF568-only image. Numbers on axes are coordinates in nanometers.

      (5) It would be good to see a comprehensive demonstration of the exact method for estimation of membrane protein density (in the SI), since this is an integral part of many of the analyses in this paper. The method is detailed in the Methods section in text and is generally acceptable, but this methodology can vary quite widely and would be more convincing with calibration data provided.

      We added flow cytometry and fluorometer data for calibration (Supplementary Figure 1L,M) and introduced a sentence explaining the procedure for obtaining the values used for calibration as follows:

      (page 54)

      …….Liposome standards containing fluorescent molecules (0.01–0.75 mol% perylene (Sigma), 0.1–1.25 mol% Bodipy FL (Thermo), and 0.005–0.1% DiD) as well as DOPC (Avanti polar lipids) were measured in flow cytometry (Supplementary Figure 1L). Meanwhile, by fluorometer, fluorescence signals of these liposomes and known concentrations of recombinant mTagBFP2, AcGFP and TagRFP-657 proteins and SNAP-Surface 488 and Alexa 647 dyes (New England Biolabs) were measured in the same excitation and emission ranges as in flow cytometry assays (Supplementary Figure 1M). Ratios of the integrated fluorescence intensities in this range between two dyes of interest are used for converting the signals measured in flow cytometry. Additional information needed for calibration is the size difference between liposomes and cells. The average diameter of liposomes is measured to be 130 nm, and the diameter of HEK 293T cells is estimated to be 13 µm (Furlan et al., 2014; Kaizuka et al., 2021b; Ushiyama et al., 2015). From these data, the signal from cells acquired by flow cytometry can be calibrated to molecular surface density. For example, the Alexa 647 signal acquired by flow cytometry can be converted to the signal of the same concentration of DiD dye using fluorometer data, but the density of the dye is unknown at this point. This converted DiD signal can then be calibrated to the density on liposomes rather than cells using liposome flow cytometry data. Finally, adjusted for the size difference between liposomes and cells, the surface molecular density on cells is determined. By going through one cycle of these procedures, we could obtain a calibration unit, such as 1 flow cytometry signal for a cell in the designated illumination and detection setting = 0.0272 mTagBFP2 µm<sup>-2</sup> on the cell surface.
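
The three-step conversion described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that liposomes and cells are spheres, that the flow cytometry signal scales with the total number of fluorophores on the particle, and that the liposome standard curve is linear. The diameters come from the text; the brightness ratio, the standard-curve slope, and the function and parameter names are hypothetical placeholders.

```python
LIPOSOME_DIAMETER_UM = 0.130   # average liposome diameter from the text
CELL_DIAMETER_UM = 13.0        # estimated HEK 293T diameter from the text


def cell_surface_density(cell_signal: float,
                         brightness_ratio: float,
                         liposome_signal_per_density: float) -> float:
    """Convert a per-cell flow cytometry signal to a surface density (um^-2).

    cell_signal: flow cytometry signal of the label on one cell.
    brightness_ratio: reference membrane dye signal divided by label signal at
        equal concentration, from fluorometer spectra (hypothetical value).
    liposome_signal_per_density: slope of the liposome standard curve, i.e.
        flow cytometry signal per (molecule um^-2) on a liposome (hypothetical).
    """
    # Step 1: express the cell signal as an equivalent reference-dye signal.
    reference_signal = cell_signal * brightness_ratio
    # Step 2: density that would give this signal on a liposome.
    liposome_equivalent_density = reference_signal / liposome_signal_per_density
    # Step 3: the same number of molecules spread over the larger cell surface;
    # sphere areas scale with the square of the diameter.
    area_ratio = (LIPOSOME_DIAMETER_UM / CELL_DIAMETER_UM) ** 2
    return liposome_equivalent_density * area_ratio


# Hypothetical inputs, chosen only to show the arithmetic.
density = cell_surface_density(cell_signal=1000.0,
                               brightness_ratio=2.0,
                               liposome_signal_per_density=0.5)
print(f"estimated surface density: {density:.3f} molecules per um^2")
```

With the stated diameters the cell surface is about 10<sup>4</sup> times larger than the liposome surface, so the size correction in the last step dominates the conversion.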

      (Figure legend, Supporting Figure 1: )

      … L. Flow cytometry measurements for liposomes containing serially diluted dye-conjugated lipids and fluorescent membrane-incorporating molecules (Bodipy-FL, perylene, and DiD) with indicated mol%. Linear fitting shown was used for calibration.  M. Fluorescence emission spectrum for equimolar molecules (50 µM for green and far-red channels, and 100 µM for blue channel), excited at 405 nm, 488 nm, and 638 nm, respectively. Membrane dyes were measured as incorporated in liposomes. Purified recombinant mTagBFP2 was used.

      (6) Fig 2A: The figure legend should describe the microscopy method for a quick and easy reference.

      Corrected as follows:

      (Figure legend, Figure 2)

      A. Maximum projection of Z-stack images at 1 µm intervals taken with a confocal microscope. SARS-CoV2-PP-infected, air-liquid interface (ALI)-cultured Calu-3 cell monolayers were chemically fixed and imaged by binding of Alexa Fluor 647-labeled Neu5AC-specific lectin from Sambucus sieboldiana (SSA) and GFP expression from the infecting virus.

      (7) Fig 2B: what is the color bar supposed to represent? Is it the pixel density per a particular value? Units and additional description are required. In addition, these are "arbitrary units" of fluorescence, but you should tell us if they've been normalized and, if so, how. They must have been normalized, since the values are between 0 and 1, but then why does the scale bar for SSA only go to 0.5?

      The color bar shows the number of pixels for each dot, resulting in the scale for density scatter plot. The scale on the X-axis was incorrect. All these issues have been fixed in this revision, in the figure and in the legend as follows.

      (Figure legend, Figure 2)

      B. Density scatter plot of normalized fluorescence intensities in all pixels in Figure 2A in both GFP and SSA channels. Color indicates the pixel density.  

      (8) Fig 3D has a typo: this should most likely be "grafted polymer."

      (9) Fig 3E has a suspected typo: in the text, the author uses the word "exclusion" instead of "extrusion." The former makes more sense in this context.

      (10) Fig 5A has a typo: "Suppoorted" instead of Supported Lipid Bilayer.

      (11) Fig 7E-F has a suspected typo: Again, this should most likely be the word "exclusion" instead of "extrusion."

      Thank you so much for pointing out these mistakes; we have corrected them all as suggested.

      (12) Which other molecules are referred to, on page 6 (middle), that do not have an inhibitory effect? Please specify.

      We specified the molecules that have inhibitory effects, and revised as follows: 

      These proteins include those previously reported (MUC1, CD43) as well as those not yet reported (CD44, SDC1, CD164, F174B, CD24, PODXL) (Delaveris et al., 2020; Murakami et al., 2020). In contrast, other molecules (VCAM-1, EPHB1, TMEM123, etc.) showed little inhibitory effect on infection within the density range we used.

      (13) Fig 2 B: the color LUT is not labelled nor explained.

      Corrected as described in (7)

      (14) Please provide the scale bars for figures Fig 2A, C, E and Suppl Fig 2C, D.

      Corrected. 

      (15) Please provide the name for the example of a 200 aa protein that is meant to inhibit viral infection but is not bigger than ACE2. Also providing the densities in Fig 3A would help to correlate the data to Fig 1F.

      Corrected as follows: 

      (page 10)

      We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein (mean density ~50 µm<sup>-2</sup>) that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). …..

      In our measurements, a protein with an extracellular domain of ~200 amino acids (e.g., CD164 (138 aa)) at a density of ~100 µm<sup>-2</sup> showed significant inhibition of viral infection. This molecule is shorter than the receptor ACE2 (722 aa),

      (16) In the experiments conducted in HEK cells expressing the different glycoproteins studied, it is mentioned that the infection results were normalised by the amount of ACE2 expression. Is the expression of the receptor homogeneous in the experiments conducted in Figure 2? Clarify in the methods whether the expression of the receptor has been quantified and used to correct the GFP intensity values used to determine infection.

      As also explained for Q1, the system in Fig. 2 contains diverse viral receptors and glycoproteins, and the receptor-based normalization used in the experiment in Fig. 1 cannot be applied. Instead, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, in both cases of variation in overall infection rate and variation in the distribution of infected populations, samples with large variations can be reasonably compared by Pearson correlation and TOS analysis, respectively. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). This is one statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2GH). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific. 
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.

      (17) Can you provide additional details about the method of thresholding to eliminate "background" localisations in STORM?

      Method section was corrected as follows: 

      (page 59)

      …Viral protein spots not close to cell membranes were eliminated by thresholding with nearby spot density for cell protein. Specifically, the entire image was pixelated with a 0.5 µm square box and all viral protein signals within the box that had no membrane protein signals were removed. Also, viral protein spots only sparsely located were eliminated by thresholding with nearby spot density for viral protein. This thresholding process removed any detected viral protein spot that did not have more than 100 other viral protein spots within 1 µm.
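
The two thresholding steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: localizations are taken as (x, y) coordinates in micrometers, the grid is aligned to the coordinate origin, neighbor counts use all detected viral spots, and the two criteria are applied independently. The function name and the brute-force neighbor search are choices made for clarity; a spatial index would be preferable for real STORM data sets.

```python
import math


def filter_viral_spots(viral, membrane, box_um=0.5, radius_um=1.0, min_neighbors=100):
    """Keep viral localizations that (1) fall in a box_um x box_um grid cell
    containing at least one membrane protein localization and (2) have more
    than min_neighbors other viral localizations within radius_um."""

    def cell_of(x, y):
        return (math.floor(x / box_um), math.floor(y / box_um))

    # Grid cells that contain at least one membrane protein localization.
    occupied = {cell_of(x, y) for x, y in membrane}
    r2 = radius_um ** 2
    kept = []
    for i, (xi, yi) in enumerate(viral):
        if cell_of(xi, yi) not in occupied:
            continue  # no membrane protein signal in this box
        neighbors = sum(
            1
            for j, (xj, yj) in enumerate(viral)
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2
        )
        if neighbors > min_neighbors:
            kept.append((xi, yi))
    return kept


# Hypothetical test data: one dense cluster on a membrane, one dense cluster
# away from any membrane signal, and one isolated spot on a membrane.
on_membrane = [(0.1 + 0.002 * k, 0.1 + 0.001 * k) for k in range(150)]
off_membrane = [(10.1 + 0.002 * k, 10.1 + 0.001 * k) for k in range(150)]
isolated = [(5.25, 5.25)]
membrane_spots = [(0.25, 0.25), (5.3, 5.3)]

kept_spots = filter_viral_spots(on_membrane + off_membrane + isolated, membrane_spots)
print(len(kept_spots))
```

In this toy data set only the dense cluster that shares a grid box with a membrane protein localization survives both criteria.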

      (18) The article says "It was shown that the number of bound lectins correlated with the amount of glycans, not with number of proteins (Figure 1E)". Figure 1E correlates experimental PNA/mol with predicted glycosylation sites, not with the number of expressed proteins. Correct sentence with the right Figure reference.

      As you pointed out, the meaning of this sentence was not clear. We have amended it as follows to clarify our intention:

      (page 8)

      Since a wide range of glycoproteins inhibit viral infection, it is possible that all types of glycoproteins have an additive effect for this function. ……. In this cell line, this inverse correlation was most pronounced when quantifying N-acetylneuraminic acid (Neu5AC, recognized by lectins SSA and MAL) compared to the various types of glycans, while some other glycans also showed weak correlations (Supplementary Figure 2C). These results showed that the amount of virus infection in cell anticorrelated with the amount of total glycans on the cell surface. As amount of glycans is determined by the total population of glycocalyx, infection inhibitory effect can be additive by glycoprotein populations as we hypothesized.

      If the inhibitory effect is nonspecific and additive, the contribution of each protein is likely to be less significant. To confirm this, we also measured the correlation between the density of each glycoprotein and viral infection. CD44, which was shown to…….. Our results demonstrate that total glycan content is a superior indicator than individual glycoprotein expression for assessing infection inhibition effect generated by cell membrane glycocalyx. These results are consistent with our hypothesis regarding the additive nature of the nonspecific inhibitory effects of each glycoprotein.

    1. eLife Assessment

      Endothelial cell-specific loss of TGF-beta signaling in mice leads to CNS vascular defects, specifically impairing retinal development and promoting immune cell infiltration. The data are solid, showing that loss of TGF-beta signaling triggers vascular inflammation and attracts immune cells specific to CNS vasculature. These findings are important, highlighting TGF-beta's role in maintaining vascular-immune homeostasis and its therapeutic potential in neurovascular inflammatory diseases.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript analyses the effects of deleting the TgfbR1 and TgfbR2 receptors from endothelial cells at postnatal stages on vascular development and blood-retina barrier maturation in the retina. The authors find that deletion of these receptors affects vascular development in the retina but importantly it affects the infiltration of immune cells across the vessels in the retina. The findings demonstrate that Tgf-beta signaling through TgfbR1/R2 heterodimers regulates primarily the immune phenotypes of endothelial cells in addition to regulating vascular development, but has minor effects on the BRB maturation. The data provided by the authors provide solid support for their conclusions.

      Strengths:

      (1) The manuscript uses a variety of elegant genetic studies in mice to analyze the role of TgfbR1 and TgfbR2 receptors in endothelial cells at postnatal stages of vascular development and blood-retina barrier maturation in the retina.

      (2) The authors provide a nice comparison of the vascular phenotypes in endothelial-specific knockout of TgfbR1 and TgfbR2 in the retina (and to a lesser degree in the brain) with those from Ndp KO mice (loss of Ndp/Fzd4 signaling) or loss of VEGF-A signaling to dissect the specific roles of Tgf-beta signaling for vascular development in the retina.

      (3) The snRNAseq data of vessel segments from the brains of WT versus TgfbR1 -iECKO mice provides a nice analysis of pathways and transcripts that are regulated by Tgf-beta signaling in endothelial cells.

      Weaknesses (Original Submission):

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbR1 KO and TgfbR2 KO mice. However, the phenotypes look more severe in the TgfbR1 KO rather than TgfbR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes?

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not seem like Sulfo-NHS-biotin is leaking outside the vessels. Therefore, it cannot be increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      (3) The immune cell phenotyping by snRNAseq seems premature as the number of cells is very small. The authors should sort for CD45+ cells and perform single cell RNA sequencing.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include some tracers in addition to serum IgG leakage.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbR1 KO mice. Are D tip cells lost in these mutants by snRNAseq?

      Comments on revisions:

      The authors have addressed the major weaknesses that I raised with the original submission adequately in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected.

      Strengths:

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain.

      Comments on revisions:

      The authors have revised the manuscript and addressed all my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbR1 KO and TgfbR2 KO mice. However, the phenotypes look more severe in the TgfbR1 KO rather than TgfbR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes?

      Thank you for asking about this.  Each VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retina exhibits multiple zones of choroidal neovascularization.  The examples in Figure 1 and Figure 1 – Figure supplements 1 and 2 are mostly from retinas with loss of TGFBR1, but we could have chosen similar examples from retinas with loss of TGFBR2.  The quantification in the original version of Figure 1 – Figure supplement 1 panel C had a labeling error.  It actually showed the quantification of choroidal neovascularization (CNV) in the sum of both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not only in VE-cad-CreER;TGFBR1 CKO/- retinas as originally labeled.  The point that it made is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling.  We have now updated that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other.  The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-.  We think it likely that a more extensive sampling would show little or no difference between these two genotypes – but the data is what it is.  This is now described in the Results section.

      We have also added a panel D to Figure 1- Figure supplement 1, which shows a retina flatmount analysis of CNV.  This is done by mounting the retina with the photoreceptor side up so that the outer retina can be optimally imaged. 

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not seem like Sulfo-NHS-biotin is leaking outside the vessels. Therefore, it cannot be increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing. 

      Thank you for raising this point.  For the revised manuscript, we have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data.  We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociate the tissue and FACS sort for CD45+ cells because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present after tissue homogenization.  By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells since some cells may not express CD45, may express CD45 at low level, or may be tightly adherent to other cells, such as vascular endothelial cells.  Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset.  In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei).  The new analysis comes to the same conclusions as the original analysis: the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      As described in our response to query 2, we have conducted additional experiments to look at vascular leakage in control, VE-cad-CreER;TGFBR1 CKO/-, and NdpKO retinas.  We have also looked at Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain, and it is indistinguishable from WT controls.  Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules.  Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB.  Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction.  For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier.  Nat Commun. 16:4143.  This is now described in greater detail in the Results section.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbR1 KO mice. Are D-tip cells lost in these mutants by snRNAseq?

      Please note: Alk5 is another name for TGFBR1.  This is noted in the second sentence of paragraph 4 of the Introduction.  The reviewer is correct: there are a lot of similarities because these are exactly the same KO mice.  Also, Zarkada and we used the same VEcadCreER to recombine the CKO allele.  The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published in Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251.  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.  Finally, we have no reason to doubt the results of Zarkada et al.

      Reviewer #2 (Public review): 

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected. 

      Strengths: 

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain. 

      Weaknesses: 

      The causal link between TGF-β loss, vascular inflammation, and immune infiltration remains unresolved. The authors' model posits that EC-specific TGF-β loss directly causes inflammation, which recruits immune cells. However, an alternative explanation is plausible: Tgfbr1/2 KO-induced developmental defects (e.g., leaky vessels) permit immune extravasation, subsequently triggering inflammation. The vein-specific upregulation of ICAM1 staining and the lack of immune infiltration phenotypes in the non-CNS tissues support the alternative model. Late-stage induction of Tgfbr1/2 KO (avoiding developmental confounders) could clarify TGF-β's role in retinal angiogenesis versus anti-inflammation.

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      In the revised manuscript, we have expanded the Discussion section to address the two alternative hypotheses raised by the reviewer.  Here are the relevant data in a nutshell: (1) vascular leakage into the parenchyma, as measured with sulfo-NHS-biotin, in TGFBR1 endothelial CKO retinas is far less than in NdpKO retinas, where nearly all ECs convert to a fenestration+ (PLVAP+) phenotype and there is leakage of sulfo-NHS-biotin, (2) ICAM1 in ECs in TGFBR1 endothelial CKO retinas increases several-fold more than in NdpKO or Frizzled4KO retinas, (3) TGFBR1 endothelial CKO retinas have more infiltrating immune cells than NdpKO or Frizzled4KO retinas, and (4) in TGFBR1 endothelial CKO retinas large numbers of immune cells are observed within and adjacent to blood vessels.  We think that the simplest explanation for these data is that loss of TGF-beta signaling in ECs causes an endothelial inflammatory state with enhanced immune cell extravasation.  That said, the case for this model is not water-tight, and there could be less direct mechanisms at play.  In particular, this model does not explain why the inflammatory phenotype is limited to CNS (and especially retinal) vasculature.

      Regarding the last sentence of the reviewer’s comment (“Late stage induction…”), we have tried activating CreER recombination at different ages and we observe a large reduction in the inflammatory phenotype when recombination is initiated after vascular development is complete.   This observation suggests that the vascular developmental/anatomic defect – and perhaps the resulting retinal hypoxia response – is required for the inflammatory phenotype.  In the revised manuscript we have expanded the Results and Discussion sections to describe this observation.

      Reviewer #1 (Recommendations for the authors): 

      Suggestions for experiments: 

      (1) The authors need to show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes (TgfbR1 and TgfbR2 KO mice).

      Thank you for raising this point.  The quantification in the original version of Figure 1 – Figure supplement 1 panel C was mis-labeled.  It quantifies choroidal neovascularization (CNV) in both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not VE-cad-CreER;TGFBR1 CKO/- retinas only as originally labeled.  The point it makes is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling.  We have now corrected that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other.  The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-.  This is now described in the Results section.

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors should provide a detailed quantification of the leakage phenotype outside the vessels into the CNS parenchyma, both in the retina and brain, in TgfbR1 KO mice.

      Thank you for raising this point.  There is no detectable Sulfo-NHS-biotin leakage into the brain parenchyma in VE-cad-CreER;TGFBR1 CKO/- mice.  We have expanded Figure 2 to show and quantify the data for retinal vascular leakage (Figure 2C and D).  The data show that in VE-cad-CreER;TGFBR1 CKO/- mice there is accumulation of Sulfo-NHS-biotin in the vascular tufts but minimal accumulation elsewhere in the retinal vasculature and minimal leakage of Sulfo-NHS-biotin into the retinal parenchyma.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing to ascertain these preliminary data. 

      Thank you for raising this point.  We have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data to increase the numbers of cells.  We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociating the cells and FACS sorting for CD45+ cells because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present.  By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells, since some cells may not express CD45, may express CD45 at low level, or may be tightly adherent to other cells, such as vascular endothelial cells.  Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset.  In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei).  The new analysis comes to the same conclusion as in the original submission, namely that the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.  The Results section has been expanded to describe this new data and analysis.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain is minimal, and it is indistinguishable from WT controls.  Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules.  Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB.  Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction.  For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier.  Nat Commun. 16:4143.  This is now described in greater detail in the Results section.

      (5) The authors should perform a more detailed RNAseq analysis of tip and stalk cells in TgfbR1 KO mice to determine whether D tip cells are lost in these mutants by snRNAseq.

      The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published by Zarkada et al, who analyzed the same VE-cad-CreER;TGFBR1 CKO/- mutant mice, although they refer to the TGFBR1 gene by its alternate name ALK5 [Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251].  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.

      Suggestions for improving the manuscript:  

      (6) The statement that ECs acquire properties of immune cells (Page 2, Line 90) is incorrect. Endothelial cells may acquire characteristics of antigen presenting cells. 

      Thank you for that correction.  Based on the review from Amersfoort et al (2022) (Amersfoort J, Eelen G, Carmeliet P. (2022) Immunomodulation by endothelial cells - partnering up with the immune system? Nat Rev Immunol 22:576-588) and the articles cited in it, we have changed the sentence to “Although vascular endothelial cells (ECs) are not generally considered to be part of the immune system, in some locations and under some conditions they acquire properties characteristic of immune cells, including secretion of cytokines, surface display of co-stimulatory or co-inhibitory receptors, and antigen presentation in association with MHC class II proteins (Pober and Sessa, 2014; Amersfoort et al., 2022).”  

      (7) The statement in Page 3, Line 100-101 [In CNS ECs, quiescence is maintained in part by the actions of astrocyte-derived Sonic Hedgehog, with the result that few immune cells other than resident microglia are found within the CNS (Alvarez et al., 2011).] is incomplete. Wnt signaling also suppresses the expression of leukocyte adhesion molecules from endothelial cells and therefore helps with immune cell quiescence. 

      Thank you for raising that point.  We have expanded that sentence to include Wnt signaling in CNS endothelial cells, as described in the following reference: Lengfeld JE, Lutz SE, Smith JR, Diaconu C, Scott C, Kofman SB, Choi C, Walsh CM, Raine CS, Agalliu I, Agalliu D. (2017) Endothelial Wnt/beta-catenin signaling reduces immune cell infiltration in multiple sclerosis. Proc Natl Acad Sci USA 114:E1168-E1177.

      (8) It may be beneficial for the reader to separate the results of the vascular phenotypes related to choroidal neovascularization compared to retinal vascular development. 

      Thank you for this suggestion.  The two topics are partly overlapping: choroidal neovascularization is described in Figure 1, and retinal development is described in Figures 1 and 2.  The challenge is that some of the same images illustrate both phenotypes, as in Figure 1, so the topics cannot be easily separated.

      (9) In addition to comparing the phenotypes in Tgfb signaling mutant mice with Wnt signaling and VEGF-A signaling mutants, the authors should compare and contrast their data with those found in Alk5 KO mice, as there are a lot of similarities. 

      The reviewer has alerted us to a nomenclature challenge which we will try to resolve in the introduction: Alk5 is just another name for TGFBR1.  The reviewer is correct: there are a lot of similarities between the present study and that of Zarkada et al (2021) because both use the same TGFBR1(=Alk5) CKO mice.

      Reviewer #2 (Recommendations for the authors): 

      Figure 2 

      For 2B, the authors should clarify whether the two regions shown in the Tgfbr1 KO retina (P14) represent central vs. peripheral areas, as phenotype severity varies. 

      For 2C, does the uneven biotin accumulation reflect developmental gradients (e.g., central-peripheral maturation timing)? 

      Thank you for raising these points.  Regarding Figure 2B, these images are all from the mid-peripheral retina, where the phenotype is moderately severe.  This is now noted in the figure legend.

      Regarding Figure 2C, the reviewer is correct that the pattern of Sulfo-NHS-biotin is uneven in VEcadCreER;Tgfbr1CKO/- retinas – it accumulates only in the tufts.  We have expanded Figure 2C to show a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO retinas, and we have expanded the associated part of the Results section.  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.  The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is not leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      Figure 6 

      The claim that PECAM1+ rings on veins reflect EC-immune cell binding is uncertain, as PECAM1 is also known to be expressed by immune cells. The complete correlation of PECAM1 and CD45 staining signals suggests that a subset of immune cells upregulates PECAM1. The VEcadCreER;Tgfbr1 flox/-; SUN1:GFP reporter would be helpful to delineate EC-immune cell proximity. Super-resolution imaging with Z-stacks could also resolve spatial relationships (luminal vs. abluminal immune cell adhesion).

      Thank you for this comment.  The reviewer is correct that, at the resolution of these images, we cannot determine whether the PECAM1 immunostaining signal is derived from ECs, from leukocytes, or from both.  This is now stated in the Results section.  The PECAM1-rich endothelial ring structure associated with leukocyte extravasation has been characterized in various publications, for example in (1) Carman CV, Springer TA. (2004) A transmigratory cup in leukocyte diapedesis both through individual vascular endothelial cells and between them. J Cell Biol 167:377-388 and (2) Mamdouh Z, Mikhailov A, Muller WA. (2009) Transcellular migration of leukocytes is mediated by the endothelial lateral border recycling compartment. J Exp Med 206:2795-2808.  The ring structures visualized in Figure 6D by PECAM1 immunostaining conform to the ring structures described in these and other papers.  In showing these structures, our point is simply that they likely represent sites of leukocyte extravasation.  This is now clarified in the text.  We have also added some additional references on leukocyte extravasation and the ring structures.

      Figure 7 

      A time-course analysis of ICAM1 would strengthen the mechanistic model. Does ICAM1 upregulation precede immune infiltration (supporting inflammation as the primary defect)? Given that immune cells appear by P14 (per snRNA-seq), is ICAM1 elevated earlier? 

      This is an interesting idea, but based on what is known about leukocyte adhesion and extravasation we predict that there will not be a clean temporal separation between ICAM1 induction and leukocyte adhesion/infiltration.  That is, if the pro-inflammatory state causes an increase in the number of leukocytes, then as ICAM1 levels increase, leukocyte adhesion would also increase.  Similarly, if the presence of leukocytes increases the pro-inflammatory state, then as the number of leukocytes increases, the levels of ICAM1 would be predicted to increase.  Thus, we think that a time course analysis is unlikely to provide a definitive conclusion.

      Figure 8-SF1 

      In brain slices, a transient pan-IgG accumulation suggests a self-resolving defect in the BBB. However, this BBB impairment appears to be spatiotemporally distinct from ICAM1 upregulation. ICAM1 staining is restricted to the lesion site, aligning with immune cell-driven inflammation. 

      Thank you for raising these points.  The reviewer is correct that these observations don’t fit together in a clear way.  There does not appear to be a general increase in brain vascular permeability in VE-cad-CreER;TGFBR1 CKO/- mice, as shown by sulfo-NHS-biotin.  However, there is a large and transient increase in IgG in the brain parenchyma, suggestive of a general vascular alteration, and – as the reviewer correctly notes – it is not accompanied by a generalized increase in ICAM1 vascular immunostaining.  At this point, we don’t have any real insight into the mechanistic basis of the transient IgG increase.

      Thank you for handling this manuscript.

    1. eLife Assessment

      This cleverly designed and potentially important work supports our understanding regarding how and whether social behaviours promoting egalitarianism can be learned, even when implementing these norms entails a cost for oneself. However, the evidence supporting the major claims is currently incomplete, with the major limitation being whether participants truly learn egalitarianism from a teacher or instead exhibit reduced guilt across time that is reduced when observing others behaving more selfishly. With a strengthening of the supporting evidence, this work will be of interest to a wide range of fields, including cognitive psychology/neuroscience, neuroeconomics, and social psychology, as well as policy making.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether advantageous and disadvantageous inequality aversion can be vicariously learned and generalized. Using an adapted version of the ultimatum game (UG), in three phases, participants first gave their own preference (baseline phase), then interacted with a "teacher" to learn their preference (learning phase), and finally were tested again on their own (transfer phase). The key measure is whether participants' choice preferences (i.e., rejection rate and fairness rating) were influenced by the learning phase, assessed by contrasting their transfer phase with their baseline phase. Through a series of statistical and computational modeling analyses, the authors reported that both advantageous and disadvantageous inequality aversion can indeed be learned (Study 1), and even be generalised (Study 2).

      Strengths:

      This study is very interesting: it directly adapts the lab's previous work on the observational learning effect on disadvantageous inequality aversion to test both advantageous and disadvantageous inequality aversion. Social transmission of action, emotion, and attitude has started to be examined recently, hence this research is timely. The use of computational modeling is mostly appropriate and motivated. Study 2, which examined vicarious inequality aversion in conditions where feedback was never provided, is interesting and important for strengthening the reported effects. Both studies properly justify their sample sizes.

      Weaknesses:

      Despite the strengths, a few conceptual aspects and analytical decisions have to be explained, justified, or clarified.

      INTRODUCTION/CONCEPTUALIZATION

      (1) Two terms seem to be used interchangeably in this work, although they should not be: vicarious/observational learning vs preference learning. For vicarious learning, individuals observe others' actions (and optionally also the corresponding consequences resulting directly from their own actions), whereas, for preference learning, individuals predict, or act on behalf of, the others' actions, and then receive feedback on whether that prediction is correct. For the current work, it seems that the experiment is more about preference learning and prediction, and less so about vicarious learning. But the intro and setup are heavily framed around vicarious learning, and later the use of vicarious learning and preference learning is rather mixed in the text. I think the authors should either tone down the focus on vicarious learning, or discuss how the two differ. Some of the references here may be helpful: Charpentier et al., Neuron, 2020; Olsson et al., Nature Reviews Neuroscience, 2020; Zhang & Glascher, Science Advances, 2020

      EXPERIMENTAL DESIGN

      (2) For each offer type, the experiment "added a uniformly distributed noise in the range of (-10 ,10)". I wonder what this looks like. Are the offers only integers, such as 25:75, or can they have decimal points? More importantly, is it possible for either the 70:30 or the 90:10 option, after adding the noise, to have generated an 80:20 split shown to the participants? If so, for the analyses later, when participants saw the 80:20 split, which condition did this trial belong to? 70:30 or 90:10? And is such noise added only to the learning phase, or also to the baseline/transfer phases? This requires some clarification.

      (3) For the offer conditions (90:10, 70:30, 50:50, 30:70, 10:90) - are they randomized? If so, how is it done? Is it randomized within each participant, and/or also across participants (such that each participant experienced different trial sequences)? This is important, as the order, especially for the learning phase, can largely impact the preference learning of the participants.
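
      The ambiguity raised in comment (2) can be made concrete with a minimal sketch. It assumes, hypothetically, that the noise is an integer added to the proposer's share while the total stays at 100; the function names and the `max_abs` parameter are illustrative and not taken from the manuscript. Under these assumptions, an 80:20 split is shared between the 90:10 and 70:30 conditions only if the endpoints of the noise range are included.

```python
# Hypothetical illustration: which base offers can produce the same displayed split?
BASE_OFFERS = [90, 70, 50, 30, 10]  # proposer's share; the responder receives 100 - share


def reachable_splits(base, max_abs):
    """Proposer shares that can be displayed for one base offer,
    assuming integer noise drawn from -max_abs..max_abs (an assumption)."""
    return {base + noise for noise in range(-max_abs, max_abs + 1)}


def ambiguous_splits(max_abs):
    """Map each displayed share to the base offers that could have generated it,
    keeping only shares reachable from more than one base offer."""
    sources = {}
    for base in BASE_OFFERS:
        for shown in reachable_splits(base, max_abs):
            sources.setdefault(shown, []).append(base)
    return {shown: bases for shown, bases in sources.items() if len(bases) > 1}


if __name__ == "__main__":
    for max_abs in (9, 10):  # 9: open interval (-10, 10); 10: endpoints included
        overlaps = ambiguous_splits(max_abs)
        print(f"max |noise| = {max_abs}: {len(overlaps)} ambiguous displayed splits")
        for shown in sorted(overlaps, reverse=True):
            print(f"  {shown}:{100 - shown} could come from base offers {overlaps[shown]}")
```

      With strictly open bounds and integer noise, no displayed split is shared between conditions in this sketch, so the answer to the reviewer's question depends on how the bounds and any rounding were handled.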

      STATISTICAL ANALYSIS & COMPUTATIONAL MODELING

      (4) In Study 1 DI offer types (90:10, 70:30), the rejection rate for DI-AI averse looks consistently higher than that for DI averse (ie, blue line is above the yellow line). Is this significant? If so, how come? Since this is a between-subject design, I would not anticipate such a result (especially for the baseline). Also, for the LME results (eg, Table S3), only interactions were reported but not the main results.

      (5) I do not particularly find this analysis appealing: "we examined whether participants' changes in rejection rates between Transfer and Baseline, could be explained by the degree to which they vicariously learned, defined as the change in punishment rates between the first and last 5 trials of the Learning phase." Naturally, participants' behavior in the first 5 trials in the learning phase will be similar to those in the baseline; and their behavior in the last 5 trials in the learning phase would echo those at the transfer phase. I think it would be stronger to link the preference learning results to the change between baseline and transfer phase, eg, by looking at the difference between alpha (beta) at the end of the learning phase and the initial alpha (beta).

      (6) I wonder if data from the baseline and transfer phases can also be modeled, using a simple Fehr-Schmidt model? This way, the change in alpha/beta can also be examined between the baseline and transfer phase.

      (7) I quite liked Study 2 that tests the generalization effect, and I expected to see an adapted computational model that directly reflects this idea. Indeed, the authors wrote "[...] given that this model [...] assumes the sort of generalization of preferences between offer types [...]". But where exactly did the preference learning model assume the generalization? In the methods, the modeling seems to be only about Study 1; did the authors adapt their model to accommodate Study 2? The authors also ran simulations for the learning phase in Study 2 (Figure 6), so how were the preferences updated (if at all) for offers (90:10 and 10:90) where feedback was not given? Extending/unpacking the computational modeling results for Study 2 will be very helpful for the paper.
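
      For readers unfamiliar with the model named in comment (6), the following is a minimal sketch of the standard Fehr-Schmidt inequity-aversion utility from the responder's point of view, combined with a logistic choice rule. The choice rule, the `inv_temp` parameter and the function names are assumptions made for illustration; they are not the authors' implementation. Payoffs are written as (own, other), so a 90:10 offer corresponds to own = 10 and other = 90 for the responder.

```python
import math


def fs_utility(own, other, alpha, beta):
    """Fehr-Schmidt utility of accepting a split.

    alpha weights disadvantageous inequity (the other party gets more),
    beta weights advantageous inequity (the responder gets more).
    """
    disadvantageous = max(other - own, 0)
    advantageous = max(own - other, 0)
    return own - alpha * disadvantageous - beta * advantageous


def p_reject(own, other, alpha, beta, inv_temp=0.1):
    """Probability of rejecting, assuming rejection yields zero utility for both
    parties and a logistic (softmax) choice rule with inverse temperature inv_temp."""
    u_accept = fs_utility(own, other, alpha, beta)
    return 1.0 / (1.0 + math.exp(inv_temp * u_accept))


if __name__ == "__main__":
    # Illustrative parameter values only; not estimates from the study.
    for own, other in [(10, 90), (30, 70), (50, 50), (70, 30), (90, 10)]:
        print(own, other, round(p_reject(own, other, alpha=0.8, beta=0.4), 3))
```

      Fitting alpha and beta separately to the baseline and transfer choices would then allow the change in each parameter to be compared across phases, which is the analysis the comment proposes.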

      Comments on revisions:

      I kept my original public review, so that future readers can see the progress and development of the manuscript.

      The authors have largely addressed my original questions/concerns, and I have two outstanding comments.

      (a) This relates to my original comment #6, where I suggested applying the F-S model also to the baseline and transfer phases. The authors were inclined not to do it, but in fact later in comment #7 and in the manuscript they opted to use a more complex F-S-based model for their learning phase. I agree that the rejection rate is indeed a clear indication, but for completeness, it'd be more consistent and compelling if the paper followed a model-free (model-agnostic) and model-based approach in all phases of the experiment.

      (b) Related to my original comment #4, I appreciate that the authors have provided more details of their LMMs. But I don't think the specification is accurate regardless. First, the offer levels (50:50, 30:70, 10:90) should not be coded as pure categorical levels. They have an ordinal meaning, so a single ordinal predictor with three levels should be used. This also avoids the excessive number of interactions the authors have pointed out.

      Second, running a model with only interactions without main effects is flawed. All textbooks on stats emphasize that without the presence of the main effects, the interpretation of interaction only is biased.

      So these LMMs need to be revised before the manuscript eventually gets to a version of record.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates whether individuals can learn to adopt egalitarian norms that incur a personal monetary cost, such as rejecting offers that benefit them more than the giver (advantageous inequitable offers). While these behaviors are uncommon, two experiments aim to demonstrate that individuals can learn to reject such offers by observing a "teacher" who follows these norms. The authors use computational modelling to argue that learners adopt these norms through a sophisticated process, inferring the latent structure of the teacher's preferences, akin to theory of mind.

      Strengths:

      This paper is well-written and tackles an important topic relevant to social norms, morality, and justice. The findings are promising (though further control conditions are necessary to support the conclusions). The study is well-situated in the literature, with a clever experimental design and a computational approach that may offer insights into latent cognitive processes. In the revision, the authors clarified some questions related to the initial submission.

      Weaknesses:

      Despite these strengths, I remain unconvinced that the current evidence supports the paper's central claims. Below, I outline several issues that, in my view, limit the strength of the conclusions.

      (1) Experimental Design and Missing Control Condition:

      The authors set out to test whether observing a "teacher" who is averse to advantageous inequity (Adv-I) will affect observers' own rejection of Adv-I offers. However, I think the design of the task lacks an important control condition needed to address this question. At present, participants are assigned to one of two teachers: DIS or DIS+ADV. Behavioral differences between these groups can only reveal relative differences in influence; they cannot establish whether (and how) either teacher independently affects participants' own behavior. For example, a significant difference between conditions can emerge even if participants are only affected by the DIS teacher and are not affected at all by the DIS+ADV teacher. What is crucially missing here is a no-teacher control condition, which can then be compared with each teacher condition separately. This control condition would also control for pure temporal effects unrelated to teacher influence (e.g., increasing Adv-I rejections due to guilt build-up).

      While this criticism applies to both experiments, it is especially apparent in Experiment 2. As shown in Figure 4, the interaction for 10:90 offers reflects a decrease in rejection rates following the DIS teacher, with no significant change following the DIS+ADV teacher. Ignoring temporal effects, this pattern suggests that participants may be learning NOT to reject from the DIS teacher, rather than learning to reject from the DIS+ADV teacher. On this basis, I do not see convincing evidence that participants' own choices were shaped by observing Adv-I rejections.

      In the Discussion, the authors write that "We found that participants' own Adv-I-averse preferences shifted towards the preferences of the Teacher they just observed, and the strength of these contagion effects related to the degree of behavior change participants exhibited on behalf of the Teachers, suggesting that they internalized, at least somewhat, these inequity preferences." However, there is no evidence that directly links the degree of behaviour change (on the teacher's behalf) to contagion effects (own behavioural change). I think there was a relevant analysis in the original version, but it was removed from the current version.

      (2) Modelling Efforts: The modelling approach is underdeveloped. The identification of the "best model" lacks transparency, as no model-recovery results are provided. Additionally, behavioural fits for the losing models are not shown, leaving readers in the dark about where these models fail. Readers would benefit from seeing qualitative/behavioural patterns that favour the winning model. Moreover, the reinforcement learning (RL) models used are overly simplistic, treating actions as independent when they are likely inversely related. For example, the feedback that the teacher would have rejected an offer provides evidence that rejection is "correct" but also that acceptance is "an error," and the latter is not incorporated into the modelling. In other words, offers are modelled as two-armed bandits (where separate values are learned for reject and accept actions), but the situation is effectively a one-armed bandit (if one action is correct, the other is mistaken). It is unclear to what extent this limitation affects the current RL formulations. Can the authors justify/explain their reasoning for including these specific variants? The manuscript only states Q-values for reject actions, but what are the Q-values for accept actions? This is unclear.

      In Experiment 2, only the preferred model is capable of generalization, so it is perhaps unsurprising that this model "wins." However, this does not strongly support the proposed learning mechanism, lacking a comparison with simpler generalizing mechanisms (see following comments).

      (3) Conceptual Leap in Modelling Interpretation: The distinction between simple RL models and preference-inference models seems to hinge on the ability to generalize learning from one offer to another. Whereas in the RL models, learning occurs independently for each offer (hence no cross-offer generalization), preference inference allows for generalization between different offers. However, the paper does not explore "model-free" RL models that allow generalization based on the similarity of features of the offers (e.g., payment for the receiver, payment for the offer-giver, who benefits more). Such models are more parsimonious and could explain the results without invoking a theory of mind or any modelling of the teacher. In such model versions, a learner acquires a functional form that allows prediction of the teacher's feedback based on offer features (e.g., linear or quadratic weighting). Because feedback for an offer modulates the parameters of this function (feature weights), generalization occurs without necessarily evoking any sophisticated model of the other person. This leaves open the possibility that RL models could perform just as well or even outperform the preference learning model, casting doubt on the authors' conclusions.

      Of note: even the behaviourists knew that when Little Albert was taught to fear rats, this fear generalized to rabbits. This could occur simply because rabbits are somewhat similar to rats. But this doesn't mean Little Albert had a sophisticated model of animals that he used to infer how they behave.

      In their rebuttal letter, the authors acknowledge these possibilities, but the manuscript still does not explore or address alternative mechanisms.

      (4) Limitations of the Preference-Inference Model: The preference-inference model struggles to capture key aspects of the data, such as the increase in rejection rates for 70:30 DI offers during the learning phase (e.g., Fig. 3A, AI+DI blue group). This is puzzling. Thinking about this, I realized the model makes quite strong, unintuitive predictions which are not examined. For example, if a subject begins the learning phase rejecting the 70:30 offer more than 50% of the time (meaning the starting guilt parameter is higher than 1.5), then, over learning, the tendency to reject will decrease to below 50% (the guilt parameter will be pulled down below 1.5). This is despite the fact that the teacher rejects 75% of the offers. In other words, as learning continues, learners will diverge from the teacher. On the other hand, if a participant begins learning by tending to accept this offer (guilt < 1.5), then during learning, they can increase their rejection rate but never above 50%. Thus, one can never fully converge on the teacher. I think this relates to the model's failure in accounting for the pattern mentioned above. I wonder if individuals actually abide by these strict predictions. In any case, these issues raise questions about the validity of the model as a representation of how individuals learn to align with a teacher's preferences (given that the model doesn't really allow for such an alignment).

      In their rebuttal letter, the authors acknowledged these anomalies and stated that they were able to build a better model (where anomalies are mitigated, though not fully eliminated). But they still report the current model and do not develop/discuss alternatives. A more principled model may be a Bayesian model where participants learn a belief distribution (rather than point estimates) regarding the teacher's parameters.

      (5) Statistical Analysis: The authors state in their rebuttal letter that they used the most flexible random effect structure in mixed-effects models. But this seems not to be the case in the model reported in Table SI3 (the very same model was used for other analyses too). Indeed, here it seems only intercepts are random effects. This left me confused about which models were used.
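
      The alternative raised in comments (2) and (3) above can be sketched as follows. This is a hypothetical feature-based learner, not a model from the manuscript: it predicts the teacher's rejection from two offer features with a delta-rule (logistic regression) update. The features, learning rate, number of epochs and the deterministic teacher are all assumptions chosen for illustration. Because a single rejection probability is learned, evidence for "reject" is automatically evidence against "accept", and because weights are shared across offers, learning transfers to offers that never received feedback.

```python
import math

# Offers are (own, other) payoffs from the responder's point of view.
# Hypothetical teacher feedback: 1 = teacher rejects, 0 = teacher accepts.
TRAINING = [(30, 70, 1), (70, 30, 1), (50, 50, 0)]


def features(own, other):
    """Bias term plus disadvantageous and advantageous inequity, scaled to 0..1."""
    return [1.0, max(other - own, 0) / 100.0, max(own - other, 0) / 100.0]


def p_reject(weights, own, other):
    """Predicted probability that the teacher rejects; P(accept) is 1 minus this."""
    logit = sum(w * x for w, x in zip(weights, features(own, other)))
    return 1.0 / (1.0 + math.exp(-logit))


def train(epochs=500, lr=0.5):
    """Delta-rule updates of shared feature weights from trial-by-trial feedback."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for own, other, rejected in TRAINING:
            error = rejected - p_reject(weights, own, other)
            x = features(own, other)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


if __name__ == "__main__":
    w = train()
    # 90:10 and 10:90 offers were never trained, yet predictions generalize to them.
    for own, other in [(10, 90), (30, 70), (50, 50), (70, 30), (90, 10)]:
        print(own, other, round(p_reject(w, own, other), 3))
```

      In this sketch the learner predicts rejection of the untrained 90:10 and 10:90 offers without representing the teacher's preferences explicitly, which is the kind of simpler generalizing mechanism the comments ask the authors to rule out.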

    1. eLife Assessment

      This important study provides solid evidence for new insights into the role of Type-1 nNOS interneurons in driving neuronal network activity and controlling vascular network dynamics in awake, head-fixed mice. The authors use an original strategy based on the ablation of Type-1 nNOS interneurons with local injection of saporin conjugated to a substance P analogue into the somatosensory cortex. They show that ablation of type I nNOS neurons has surprisingly little effect on neurovascular coupling, although it alters neural activity and vascular dynamics.

    2. Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using either chemogenetic, optogenetic or local pharmacology. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90% of Type-1 nNOS interneurons without affecting microglia, astrocytes and neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium signal imaging and wide field optical imaging, they observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of the classification of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that Type-1 nNOS interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses that can be easily addressed:

      Comments on revisions:

      As I mentioned in my initial review, this study is important. In my opinion, it could be published as is. Nonetheless, I am still somewhat dissatisfied with the authors' responses to my earlier comments. I understand that the same animals were not used for both stimulation paradigms, which is unfortunate. Nonetheless, I would have appreciated it if the authors had provided a couple of experiments illustrating GCaMP7 signals during brief stimulation in their reply to the reviewers. I am still unconvinced by the authors' suggestion that the GCaMP7 signal would remain stable during removal of the vascular undershoot. Since the absence of the undershoot is notable, I anticipate that a significant part of the initial response to prolonged stimulation is influenced by processes that occur during the 0.1-second stimulation, processes that may involve a change in the bulk neuronal response.

      In short, the data could support or refute the following statement: "Loss of type-I nNOS neurons drove minimal changes in the vasodilation elicited by brief stimulation..."

    3. Reviewer #2 (Public review):

      Summary:

      This important study by Turner et al. examines the functional role of a sparse but unique population of neurons in the cortex that express nitric oxide synthase (Nos1). To do this, they pharmacologically ablate these neurons in a focal region of whisker-related primary somatosensory (S1) cortex using a saporin-Substance P conjugate. Using widefield and 2-photon microscopy, as well as field recordings, they examine the impact of this cell-specific lesion on blood flow dynamics and neuronal population activity. Within primary somatosensory cortex after Nos1 ablation, they find changes in neural activity patterns, decreased delta band power, reduced sensory-evoked changes in blood flow (specifically, elimination of the sustained blood flow change after stimulation) and decreased vasomotion.

      Strengths:

      This was a technically challenging study and the experiments were executed in an expert manner. The manuscript was well written and I appreciated the cartoon summary diagrams included in each figure. The analysis was rigorous and appropriate. Their discovery that Nos1 neurons can have significant effects on blood flow dynamics and neural activity is quite novel and should seed many follow-up mechanistic experiments to explain this phenomenon. The conclusions were justified by the convincing data presented.

      Weaknesses:

      I did not find any major flaws with the study. I originally noted some potential issues with the authors' characterization of the lesion and its extent, but that has been resolved in the revised manuscript.

      Comments on revisions:

      The authors have thoughtfully addressed the relatively minor concerns I had originally raised. Congratulations to the authors for producing this important paper.

    1. eLife Assessment

      This paper addresses a significant question regarding the low overlap between genetic discoveries for human complex diseases and those for gene expression by emphasizing the contribution of cell-type-specific chromatin accessibility QTLs. The analyses supporting the main claims are convincing, and the key conclusions are valuable and of interest to readers in the fields of human genetics and functional genomics.

    2. Reviewer #1 (Public review):

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation".

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity.

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types.

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al.

    3. Reviewer #2 (Public review):

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to actually lead to disease phenotypes, they must be accessible in the chromatin.

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study.

      (1) Reproducibility / Methods. The concrete numbers provided in the authors' response (e.g., 20 YRI LCL ATAC‑seq samples used only for peak discovery; caQTL mapping restricted to 100 GBR LCLs; 99,320 ATAC peaks tested vs 14,872 genes for eQTL; 373 European RNA‑seq samples, with clarification of overlap) do not appear to be reflected in the Methods. These specifics should be incorporated directly into the Methods sections.

      (2) Experimental evidence demonstrating transcription factor binding at representative caQTL peaks would strengthen causal interpretation of these loci.

      (3) Tissue/cell‑type specificity of caQTLs: Prior work supports that chromatin‑level effects are broadly shared across cellular states, whereas expression effects are more context‑specific; thus, caQTLs are generally less "state‑specific" than eQTLs. However, this does not imply equivalence across distinct cell types: caQTLs derived from different cell types may yield different results, particularly where accessibility is cell‑type restricted.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation". 

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity. 

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types. 

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits. 

      We thank the reviewer for their accurate summary of our study and positive review of our findings for immune-related diseases.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al. 

      We thank the reviewer for their positive review of our results and manuscript. As Reviewer #1 noted, whether our and others' observation extends to other diseases or traits is an open question. For instance, Figure 2b in Mostafavi et al., Nat. Genet. (2023) demonstrated that there was a spectrum of depletion of eQTLs and enrichment of GWAS signals in constrained genes across various tissues and traits, respectively. Therefore, gene expression constraint may play a larger or smaller role in different diseases or traits. That immune cell types and cell states are extremely diverse (Schmiedel et al., Cell (2018) and Calderon et al., Nat. Genet. (2019), just to name a few) likely adds to the complexity of gene regulation that contributes to immune-mediated disease.

      Reviewer #2 (Public Review): 

      Summary: 

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to actually lead to disease phenotypes, they must be accessible in the chromatin.

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study. 

      We thank the reviewer for their accurate summary of our results.

      (1) For reproducibility, details are necessary in the method section.

      Details about adding YRI samples in ATAC-seq: For example, how many samples are there, and what is used among public data? There is LCL-derived iPSC and differentiated iPSC (cardiomyocytes) data, not LCL itself. How does this differ from LCL, and what is the rationale for including this data despite the differences?

      Banovich et al., Genome Research (2018) (PMID: 29208628), who generated data using LCL-derived iPSCs and differentiated iPSCs (cardiomyocytes), also generated ATAC-seq data from 20 YRI LCL samples. We analyzed those data to identify open chromatin regions (i.e., ATAC-seq peaks) in LCLs and merged the regions with open chromatin regions identified with 100 GBR LCL samples from two studies by Kumasaka et al. (Nature Genetics (2016) PMID: 26656845 and Nature Genetics (2019) PMID: 30478436). However, we restricted the caQTL analysis to only the 100 GBR samples because of possible ancestry effects and batch effects. We attempted caQTL analysis with the 20 YRI samples as well, but the result was noisy, likely due to smaller sample size and lower read depth of the ATAC-seq data.
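
      As an illustration only, merging open chromatin regions from two peak sets can be sketched as an interval union. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `merge_peaks`, the half-open coordinate convention, and the toy coordinates are all hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end); half-open coordinates are an assumption of this sketch.
Interval = Tuple[str, int, int]


def merge_peaks(*peak_sets: List[Interval]) -> List[Interval]:
    """Union overlapping or touching intervals pooled from any number of peak sets."""
    all_peaks = sorted(p for peaks in peak_sets for p in peaks)
    merged: List[Interval] = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region when the new peak overlaps or touches it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy coordinates for illustration, not real ATAC-seq peaks.
yri_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
gbr_peaks = [("chr1", 150, 260), ("chr2", 10, 50)]
merged = merge_peaks(yri_peaks, gbr_peaks)
```

      In this toy case the two overlapping chr1 peaks collapse into a single region, while the non-overlapping peaks are kept unchanged.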

      caQTL is described as having better power than eQTL despite having fewer samples. How does the number of ATAC peaks used in caQTL compare to the number of gene expressions used in eQTL?

      The number of ATAC peaks used in caQTL (99,320) is ~6.7 times greater than the number of genes (14,872) used in the eQTL analysis. Therefore, there is a higher chance of detecting a significant caQTL signal and a significant colocalization signal than there is for eQTLs. However, we reasoned that since distal eQTLs are more easily detected as caQTLs and since increasing the sample size of eQTLs through meta-analysis uncovered additional eQTL colocalization at loci with caQTL colocalization only, colocalized caQTLs are likely capturing disease-relevant regulatory effects.

      Details about RNA expression data: In the method section, it states that raw data (ERP001942) was accessed, and in data availability, processed data (E-GEUV-1) was used. These need to be consistent.

      Thank you for pointing this out. We used the processed data from Expression Atlas (https://www.ebi.ac.uk/gxa/experiments/E-GEUV-1/Results), and that's what we meant by "We downloaded RNA expression level data of the LCL samples from the Expression Atlas." We have revised the “RNA expression data preparation” section in our manuscript to make the text clearer.

      How many samples were used (the text states 373, but how was it reduced from the original 465, and the total genotype is said to be 493 samples while ATAC has n=100; what are the 20 others?), and it mentions European samples, but does this exclude YRI?

      We thank the reviewer for pointing out these points of confusion. Our reported count of 493 samples included YRI samples with RNA-seq data or ATAC-seq data that we ultimately did not use for QTL analyses. There were 373 European samples with RNA-seq data that we used for eQTL analysis, and 100 GBR samples (including some that overlap with the 373 European samples) that we used for caQTL analysis. We have revised the text to clarify these points.

      (2) Experimental results determining which TFs might bind to the representative signals of caQTL are required.

      We agree that caQTL colocalization is just the start of elucidating the regulatory mechanism of a GWAS locus. Determining which TFs are bound and which TFs' binding is altered would be necessary to describe the causal regulatory mechanism. For this, we utilized the Cistrome database to search for TFs whose binding overlaps the colocalized caQTL peaks. We present the results of this analysis in Supplementary Table 3 and Supplementary Figure 4, both of which we have added in our revised manuscript. Overall, protein factors associated with active transcription, such as POLR2A, and several immune cell TFs, including RUNX3, SPI1, and RELA, were frequently detected in those peaks. Detecting these factors in most peaks supports the likelihood that the colocalized caQTL peaks are active cis-regulatory elements. These results are consistent with our observation of enriched caQTL-mediated heritability in regions with active histone marks (Figure 1).

      (3) It is stated that caQTL is less tissue-specific compared to eQTL; would caQTL performed with ATAC-seq results from different cell types, yield similar results?

      We thank the reviewer for the question. Calderon et al. (PMID: 31570894) observed that "most effects on allelic imbalance (of ATAC-seq) were shared regardless of lineage or condition". Yet, there were regions where a different cell type or state would show inaccessibility (Figure 4d in Calderon et al.). Thus, we expect that ATAC-seq results from different cell types (e.g., T cells, B cells, monocytes, etc.) would lead to additional caQTLs showing colocalization at cell-type-specific open chromatin. However, if a region is accessible in both cell types, caQTL may be detected in both. Moreover, Alasoo et al., Nature Genetics (2018) (PMID: 29379200) observed that “many disease-risk variants affect chromatin structure in a broad range of cellular states, but their effects on expression are highly context specific.” In both studies, the authors investigated immune cell types, and there could be different observations in non-immune cell types and other diseases and traits.

      Reviewer #1 (Recommendations For The Authors): 

      I think it would strengthen the paper to explore gene-level differences in the discovery of caQTLs and eQTLs. For example, complex disease-relevant genes, on average, have more/longer regulatory domains (as shown by Wang and Goldstein, AJHG 2020; Mostafavi et al., Nat. Genet. 2023). Therefore, it is plausible that for such genes, caQTLs are much more easily discoverable than eQTLs due to (i) a larger mutational target size for caQTLs, and (ii) dispersion of expression heritability across multiple domains, which hampers the discovery of eQTLs but not caQTLs, which are studied independently of other domains in the region. In other words, discovered caQTLs and eQTLs likely vary in terms of their distance to genes (as the authors report), as well as their target genes.

      We thank the reviewer for the suggestion to explore gene-level differences. We expect that complex disease-relevant genes having, on average, more or longer regulatory domains helps to explain our observations. We agree with both of your points: there are many more regulatory elements captured as accessible regions than there are expressed genes, and genes often have multiple independent eQTLs, leading to dispersion of heritability. The gene-level trend that we described was the distance of the regulatory element from the gene. Additional analyses would be a relevant future direction.

      Also considering gene-level analysis, Mostafavi et al. show that the types of biases they report for eQTLs also apply to other molecular QTLs. It would be valuable to compare GWAS hits with versus without caQTL colocalization. Similarly, it would be insightful to compare GWAS hits with both colocalized caQTLs and eQTLs to GWAS hits with colocalized caQTLs but no eQTLs in any of the cell types. 

      We thank the reviewer for the comment. Investigating potential biases in the colocalized caQTLs would be useful, but we considered it beyond the scope of this work. In terms of biological factors, we demonstrated through mediated heritability analyses that more accessible chromatin (based on ATAC-seq read coverage) and regions with active histone marks were enriched for autoimmune disease associations (Figure 1). Furthermore, as greater distance of the regulatory variant from the transcription start site significantly reduced the cis-heritability, we would expect that distance would play a major role, similar to Mostafavi et al.’s conclusions.

      I don't think the argument for the role of natural selection contributing to the "missing regulation" is presented accurately. Specifically, large eQTLs acting on top trait-relevant genes are under stronger selection and thus, on average, segregate at lower frequencies. This makes them difficult to discover in eQTL assays. However, if not lost, they contribute as much, if not more, to trait heritability than weaker eQTLs at the same gene because their larger effects compensate for their lower frequency. At the most extreme, selection should have a "flattening" effect (e.g., see Simons et al., PLOS Biol 2018; O'Connor et al., AJHG 2019): weak and strong eQTLs at the same gene are expected to contribute equally to heritability. Therefore, the statement "Consequently, only weak eQTL variants, often in regions distal to the gene's promoter, may remain and affect traits" is not correct. If this turns out to be empirically true, other models, such as pleiotropic selection, need to explain it. 

      We thank the reviewer for the correction. We agree with the comment and have revised the sentences in the introduction accordingly.

      It is worth speculating why caQTLs may be more consistent across cell types than cis-eQTLs. Additionally, readers may infer from the paper that the focus should shift from eQTLs to caQTLs, which may not be the authors' intention. Perhaps these approaches are complementary: caQTLs can help with TSS-distal disease variants, while finding the target gene and regulatory context is more straightforward with eQTL colocalization. Addressing these points in the discussion will be helpful.

      We appreciate the reviewer's suggestion to clarify the advantages of incorporating cis-eQTLs and caQTLs. Our argument is exactly as you put it, and we added a paragraph on this in the Discussion.

      I believe the authors could do more to contextualize their findings within the existing literature on the subject, particularly Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; and Mostafavi et al., Nat. Genet. 2023. For instance, Umans et al. suggest that "if most standard eQTLs are generally benign, increasing sample size and adding more tissue types in an effort to identify even more standard eQTLs may not help us to explain many more disease risk mutations". Conversely, Mostafavi et al. argue for a multipronged approach, which appears more aligned with the authors' conclusions.

      We followed the reviewer’s suggestion to place our work in the context of existing literature on this topic. Moreover, we clarified what our recommendations for future data generation are.

      I thought Figures 1C-D were unclear. 

      We added a sentence in the figure legend describing that stronger and more significant enrichment indicates that mediated heritability is concentrated in that subset.

      Reviewer #2 (Recommendations For The Authors): 

      Complete workflow figures for caQTL calling and eQTL calling are required. 

      To improve clarity of the caQTL and eQTL calling workflow, we added Supplementary Figure 1.

    1. eLife Assessment

      This study reports important findings about the nature of feedback to primary visual cortex (V1) during object recognition. The state-of-the-art functional MRI evidence for the main claims is solid, and the combination of high-resolution fMRI with MEG yields significant insight into neural mechanisms. The findings presented here are relevant to a number of scientific fields such as object recognition, categorisation and predictive coding.

    2. Reviewer #1 (Public review):

      This study examines the spatiotemporal properties of feedback signals in the human brain during an object discrimination task. Using 7T fMRI and MEG, the authors show that task-relevant object category information can be decoded from both deep and superficial layers of V1, originating from occipito-temporal and posterior parietal cortices. In contrast, task-irrelevant category feedback does not appear in V1, even when the same objects are foveally presented. Low-level orientation information, however, is decodable from V1 regardless of task relevance and is supported by recurrence with occipito-temporal regions. These findings suggest that category decoding in V1 depends on task-driven feedback rather than feedforward visual features.

      Strengths

      This study leverages two advanced neuroimaging modalities attempting to connect object recognition across cortical layer and whole-brain levels. The revised manuscript strengthens the connection between the fMRI and MEG components.

      It also demonstrates that a peripheral object discrimination task is effective for isolating feedforward and feedback signals using 7T fMRI.

      It is particularly notable that no low-level features were fed back to V1's superficial layers in the peripheral object discrimination task. The authors further show that high- and low-level feedback to the foveal V1 are comparable in strength, supporting the idea that the superficial layer in V1 selectively represents task-relevant content.

      Weaknesses

      One alternative explanation for the absence of task-irrelevant category decoding in the foveal task could be that feedback enhancement may be required to decode complex features from V1 (compared to a coarse orientation feature). It would be informative to test whether the findings hold if the categorical boundary were defined through a low-level feature other than orientation, such as frequency (e.g., Ester, Sprague and Serences, 2020).

      I would like to echo the concerns raised by the other reviewer regarding multiple comparisons correction. It is important to apply correction procedures, especially given the number of statistical tests performed across brain regions where strict a priori hypotheses are unlikely. In the case of cluster-based statistics, the manuscript should clearly specify both the cluster-forming threshold and the significance threshold used for comparing true cluster masses to the shuffled distribution.

      Conclusion

      Overall, the results support the study's conclusions. This work addresses a timely question in object categorization and predictive coding, specifically how feedback signals vary in content and timing across cortical layers.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports high-resolution functional MRI data and MEG data revealing additional mechanistic information about an established paradigm studying how foveal regions of primary visual cortex (V1) are involved in processing peripheral visual stimuli. Because of the retinotopic organization of V1, peripheral stimuli should not evoke responses in the regions of V1 that represent stimuli in the center of the visual field (the fovea). However, functional MRI responses in foveal regions do reflect the characteristics of peripheral visual stimuli - this is a surprising finding first reported in 2008. The present study uses fMRI data with sub-millimeter resolution to study how responses at different depths in the foveal gray matter do or don't reflect peripheral object characteristics during 2 different tasks: one in which observers needed to make detailed judgments about object identity, and one in which observers needed to make more coarse judgments about object orientation. fMRI results reveal interesting and informative patterns in these two conditions. A follow-on MEG study yields information about the timing of these responses. Put together, the findings settle some questions in the field and add new information about the nature of visual feedback to V1.

      Strengths:

      (1) Rigorous and appropriate use of "laminar fMRI" techniques.

      (2) The introduction does an excellent job of contextualizing the work.

      (3) Control experiments and analyses are designed and implemented well.

      Weaknesses:

      (1) The use of the term "low order" to describe object orientation is potentially confusing. During review, the authors considered this issue and responded that they would continue with the use of the term low-order to describe object orientation because a low-pass spatial frequency filter would provide object orientation information. This is certainly a reasonable perspective; nonetheless, this reviewer thinks spatial frequencies that low are not readily represented by neurons in early visual cortex and it is common to use "low-order" to refer to features extracted in early visual areas, so I think this causes confusion.

      (2) The methods contain a nice description of the methods for "correcting the vascular-related signals". I'm guessing this is the method that removed, e.g., 22% of foveal voxels (previous paragraph), but it's not entirely clear whether the voxel selection methods described in the "correcting the vascular-related signals" are describing the same processing step referred to in the previous paragraph as "a portion of voxels was removed based on large vein distribution".

      (3) It is quite difficult to perform laminar analyses across multiple visual areas because distortion compensation is not perfect and registration of functional to anatomical data will always be a bit better in some places and a bit worse in others. An ideal manuscript would include some images showing registration quality in V1, LOC, and IPS regions for a few different participants, or include some kind of quality metric indicating the confidence in depth assignments in different regions.

      (4) For the decoding analysis, it would be helpful to have more information about how samples were defined for each condition -- were the beta values for entire blocks used as samples for each condition, or were separate timepoints during a block used in the SVM as repeated samples for each condition?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1.1) The authors argue that low-level features in a feedback format could be decoded only from deep layers of V1 (and not superficial layers) during a perceptual categorization task. However, previous studies (Bergmann et al., 2024; Iamshchinina et al., 2021) demonstrated that low-level features in the form of feedback can be decoded from both superficial and deep layers. While this result could be due to the perceptual task or the highly predictable orientation feature (orientation was kept the same throughout the experimental block), an alternative explanation is a weaker representation of orientation in the feedback (even before splitting by layers there is only a trend towards significance; also, Granger causality for orientation information in the MEG part is lower than that for category in the peripheral categorization task), because it is orthogonal to the task demand. It would be helpful if the authors added a statistical comparison of the strength of category and orientation representations in each layer and across the layers.

      We agree that the strength of feedback information is related to task demand. Specifically, we would like to highlight the relationship between task demand and feedback information in the superficial layer. Previous studies have shown that foveal feedback information is observed only when the task requires the identity information of the peripheral objects (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In this study, we found that the deep layer represented both orientation and categorical feedback information, while the superficial layer only represented categorical information. This suggests that feedback information in the superficial layer may be related to (or enhanced by) the task demands. In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. This assumption is consistent with the anatomical connections of the superficial layer, which not only receives feedback connections but also sends outputs to higher-level regions for further processing. This is also consistent with Iamshchinina et al.’s observation that, when orientation information had to be mentally rotated and reported (i.e., task-relevant), it was observed in both the superficial and deep layers of V1. Bergmann et al. observed illusory color information in the superficial layer of V1, which may reflect a combination of lateral propagation and feedback mechanisms in the superficial layer that support visual filling-in phenomena. 
      We have revised the discussion in the manuscript: In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. Recent studies (Iamshchinina et al., 2021; Bergmann et al., 2024) have also highlighted the relationship between feedback information and neural representations in V1 superficial layer.

      To further demonstrate the laminar profiles of low- and high-order information, we have re-analyzed the data and added more fine-scale laminar profiles with statistical comparisons in the revised manuscript. The results again showed significant decoding of both category and orientation information in the deep layer, and significant decoding of only category information in the superficial layer.

      (1.2) The authors argue that category feedback is not driven by low-level confounding features embedded in the stimuli. They demonstrate the ability to decode orientations, particularly well represented by V1, in the absence of category discrimination. However, the orientation is not a category-discriminating feature in this task. It could be that the category-discriminating features cannot be as well decoded from V1 activity patterns as orientations. Also, there are a number of these category discriminating features and it is unclear if it is a variation in their representational strength or merely the absence of the task-driven enhancement that preempts category decoding in V1 during the foveal task. In other words, I am not sure whether, if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding.

      The low-order features mentioned in the manuscript refer to visual information encoded intrinsically in V1, independent of task demands. In the foveal experiment, the task is to discriminate the color of fixation, which is unrelated to the category or orientation of the object stimuli. The results showed that only orientation information could be decoded from foveal V1. This indicates that low-order information, such as orientation, is strongly and automatically encoded in V1, even when it is irrelevant to the task. Meanwhile, category information could not be decoded, indicating that category information relies on feedback signals driven by attention to the objects or by the task, both of which are absent in the fixation task. Other evidence indicates that category feedback is not driven by low-level features intrinsically encoded in V1. First, the laminar profiles of these two types of feedback information differ considerably (see response to 1.1). Second, only category feedback information was correlated with behavioral performance (MEG experiment). These findings demonstrate that category feedback information is task-driven and differs from the automatically encoded low-order information in foveal V1. The reviewer asked whether, “if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding”. Our data showed that orientation could be automatically decoded in V1, regardless of task demand. Thus, if orientation was a category-specific feature in the foveal task (i.e., sharpies are always horizontal and smoothies are always vertical), category decoding would be successful in V1. However, in this scenario, the orientation and other shape features are not independent, thus preventing us from determining whether non-orientation shape features could be decoded in V1.

      Reviewer #2 (Public review):

      (2.1) While not necessarily a weakness, I do not fully agree with the description of the 2 kinds of feedback information as "low-order" and "high-order". I understand the motivation to do this - orientation is typically considered a low-level visual feature. But when it's the orientation of an entire object, not a single edge, orientation can only be defined after the elements of the object are grouped. Also, the discrimination between spikies and smoothies requires detecting the orientations of particular edges that form the identifying features. To my mind, it would make more sense to refer to discrimination of object orientation as "coarse" feature discrimination, and discrimination of object identity as "fine" feature discrimination. Thus, the sentence on line 83, for example, would read "Interestingly, feedback with fine and coarse feature information exhibits different laminar profiles.".

      We agree that the object orientation (invariant to object category or identity) is defined on a larger spatial scale than local orientation features such as local edges; in this sense, the object orientation is a coarse feature. In contrast, the category-defining information is mainly contributed by the local shape information (i.e., little cubes vs. bumps), which is more fine-scale information. One way to look at this difference is that the object orientation information is mainly carried by low-spatial frequency information and will survive low-pass filtering, hence “coarse”; while the object category information would largely be lost if the objects underwent low-pass spatial filtering.

      We believe the labeling words “low-order” and “high-order” are consistent with the typical use of these terms in the literature, referring to features intrinsically encoded in early visual cortex vs. in high-level object-sensitive cortical regions. The more important aspects of our results are in their differential engagement in feedforward vs. feedback processing, with low-order features automatically represented in the early visual cortex during feedforward processing while high-order features are represented due to feedback processing. Results from the foveal fMRI experiment (Exp. 2) strongly support this assumption: when objects were presented at the fovea and the task was a fixation color task irrelevant to object information, foveal V1 could only represent orientation information, not category information. Notably, there was a dramatic difference in decoding performance in foveal V1 between Exp. 1 and Exp. 2, which ruled out the argument that both orientation and category information were driven by local edge information represented in V1.

      (2.2) Figure 2 and text on lines 185, and 186: it is difficult to interpret/understand the findings in foveal ROIs for the foveal control task without knowing how big the ROI was. Foveal regions of V1 are grossly expanded by cortical magnification, such that the central half-degree can occupy several centimeters across the cortical surface. Without information on the spatial extent of the foveal ROI compared to the object size, we can't know whether the ROI included voxels whose population receptive fields were expected to include the edges of the objects.

      The ROI of foveal V1 was defined using data from independent localizer runs. In each localizer run, flashing checkerboards of the same size as the objects in the task runs were presented at the fovea or in the periphery. The ROI of foveal V1 was identified as the voxels responsive to the foveal checkerboards. In other words, the ROI of foveal V1 included the voxels whose population receptive fields covered the entire object in the foveal visual field.

      We included a figure in the revised manuscript comparing the activation maps induced by the foveal object stimulus in the task runs with the ROI coverage defined by the localizer runs. 

      (2.3) Line 143 and ROI section of the methods: in order for the reader to understand how robust the responses and analyses are, voxel counts should be provided for the ROIs that were defined, as well as for the number (fraction) of voxels excluded due to either high beta weights or low signal intensity (lines 505-511).

      In the revised manuscript, we have included the number of voxels in each ROI and the criteria for voxel selection:

      For each ROI, the number of voxels depended on the size of the activated region, as estimated from the localizer data. The numbers are as follows: foveal V1, 2185 ± 389; peripheral V1, 1294 ± 215; LOC, 3451 ± 863; and pIPS, 5154 ± 1517. To avoid the signals of large vessels, a portion of voxels was removed based on the distribution of large vessels: foveal V1, 22.5% ± 6.6%; peripheral V1, 6.8% ± 3.9%; LOC, 16.1% ± 8.1%; and pIPS, 5.1% ± 3.2%. For the decoding analysis, the top 500 responsive voxels in each ROI were selected to balance the voxel numbers across different ROIs for training and testing the decoder.
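
      The voxel selection described above can be illustrated with a short sketch. It assumes, hypothetically, that each voxel has a single responsiveness score (for example, a localizer statistic) and that vessel voxels are excluded before the top 500 are taken; the function name `select_voxels`, the order of the two steps, and the simulated values are assumptions of this sketch rather than details from the manuscript.

```python
import numpy as np


def select_voxels(response, vessel_mask, n_keep=500):
    """Return indices of the n_keep most responsive voxels outside the vessel mask."""
    response = np.asarray(response, dtype=float)
    # Candidate voxels are those not flagged as lying on a large vessel.
    candidates = np.flatnonzero(~np.asarray(vessel_mask, dtype=bool))
    # Sort candidates from most to least responsive and keep the first n_keep.
    order = np.argsort(response[candidates])[::-1]
    return candidates[order[:n_keep]]


# Simulated ROI roughly the size reported for foveal V1 (values are made up).
rng = np.random.default_rng(0)
n_voxels = 2185
response = rng.normal(size=n_voxels)       # hypothetical responsiveness score
vessel_mask = rng.random(n_voxels) < 0.225  # hypothetical large-vessel voxels
selected = select_voxels(response, vessel_mask)
```

      Fixing the number of selected voxels per ROI keeps the decoder's input dimensionality the same across ROIs of very different sizes.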

      (2.4) I wasn't able to find mention of how multiple-comparisons corrections were performed for either the MEG or fMRI data (except for one Holm-Bonferroni correction in Figure S1), so it's unclear whether the reported p-values are corrected.

      For the fMRI results, there is strong evidence showing that feedback information is sent to the foveal V1 during a peripheral object task (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In addition, anatomical and functional evidence shows that the superficial and deep layers of V1 receive feedback information during visual processing. Therefore, in the current study, we specifically examined two types of feedback information in the superficial and deep layers of foveal V1, and did not apply multiple-comparison correction to the decoding results.

      Regarding the MEG results, since we did not have a strong prior about when feedback information would arrive in the foveal V1, a cluster-based permutation method was used to correct for multiple comparisons in each time course. Specifically, the sign of the effect for each participant was randomly flipped 50,000 times to obtain the null hypothesis distribution at each time point. Clusters were defined as runs of contiguous significant time points in the real and flipped time series, and the effects in each cluster were summed to create a cluster-based effect. The largest cluster-based effect in each flipped time series was then used to generate the corrected null hypothesis distribution.
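
      The general logic of a sign-flip cluster-based permutation test can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: clusters are formed with a fixed t threshold (a simplification swapped in for the pointwise permutation significance described above), each participant's whole time course is flipped together, far fewer permutations are run than the 50,000 reported, and all function names, parameters, and simulated data are hypothetical.

```python
import numpy as np


def t_values(x):
    """One-sample t statistic across participants (axis 0) at each time point."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))


def cluster_masses(t, threshold):
    """Sum t over each run of contiguous time points where t exceeds threshold."""
    masses, current = [], None
    for value in t:
        if value > threshold:
            current = value if current is None else current + value
        elif current is not None:
            masses.append(float(current))
            current = None
    if current is not None:
        masses.append(float(current))
    return masses


def cluster_permutation_test(effects, threshold=2.0, n_perm=1000, seed=0):
    """Return observed cluster masses and their corrected p-values.

    effects: array of shape (n_participants, n_times), e.g. accuracy minus chance.
    """
    rng = np.random.default_rng(seed)
    observed = cluster_masses(t_values(effects), threshold)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        # Flip the sign of each participant's whole time course at random.
        signs = rng.choice([-1.0, 1.0], size=(effects.shape[0], 1))
        masses = cluster_masses(t_values(effects * signs), threshold)
        # Keep only the largest cluster mass of this permutation.
        null_max[i] = max(masses, default=0.0)
    p_values = [(np.sum(null_max >= m) + 1) / (n_perm + 1) for m in observed]
    return observed, p_values


# Simulated data: 20 participants, 100 time points, an effect in samples 40-59.
rng = np.random.default_rng(1)
effects = rng.normal(0.0, 1.0, size=(20, 100))
effects[:, 40:60] += 1.0
observed, p_values = cluster_permutation_test(effects)
```

      Comparing each observed cluster against the distribution of the maximum cluster mass under sign flipping is what controls the family-wise error rate across time points.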

      We included these clarifications in the Significance testing section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      It would be helpful if the authors could elaborate more on the fMRI decoding results in higher-order visual areas in the Discussion (there are recent studies also investigating higher-order visual areas (Carricarte et al., 2024) and associative areas (Degutis et al., 2024)) and relate it to the MEG information transmission results between the areas overlapping with the regions recorded in the fMRI part of the study.

      We have discussed the fMRI decoding results in the LOC and IPS in the revised manuscript: 

      In the current study, fMRI signals from early visual cortex and two high-level brain regions (LOC and pIPS) were recorded. Neural dynamics of these regions were extracted from MEG signals. Decoding analyses based on fMRI and MEG signals consistently showed that object category information could be decoded from both regions. These findings raise an important question:  Further Granger causality analysis indicates that the feedback information in foveal V1 was mainly driven by signals from the LOC. Layer-specific analysis showed that category information could be decoded in the middle and superficial layers of the LOC. A reasonable interpretation of this result is that feedforward information from the early visual cortex was received by the LOC’s middle layer, then the category information was generated and fed back to foveal V1 through the LOC’s superficial layer. A recent study (Carricarte et al., 2024) found that, in object selective regions in temporal cortex, the deep layer showed the strongest fMRI responses during an imagery task. Together, the results suggest that the deep and superficial layers correspond to different feedback mechanisms. It is worth noting that other cortical regions may also generate feedback signals to the early visual cortex. The current study did not have simultaneously recorded fMRI signals from the prefrontal cortex, but it has been shown that feedback signals can be traced back to the prefrontal cortex during complex cognitive tasks, such as working memory (Finn et al., 2019; Degutis et al., 2024). Further fMRI studies with submillimeter resolution and whole-brain coverage are needed to test other potential feedback pathways during object processing.

      The behavioral performance seems quite low (67%), could authors explain the reasons for it?

      We designed the object stimuli to be difficult to distinguish on purpose. Some of our pilot data showed that the more involved the participants were in the peripheral object task, the easier the foveal feedback information was to decode. It is reasonable to assume that if the peripheral objects were easily distinguishable, the feedback mechanism might not be fully recruited during object processing. Furthermore, since we were decoding category and orientation information rather than identity information, the difficulty of distinguishing two objects from the same category and with the same orientation would not affect the decoding of category and orientation information in the neural signals.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 52: the meaning of the sentence starting with "However, ..." is not entirely clear. Maybe the word "while" is missing after the first comma?

      (2) Line 224. If I'm understanding the rationale for the MEG analysis correctly, it was not possible to localize foveal regions, but the cross-location decoding analysis was used to approximate the strength and timing of feedback information. If this is the case, "neural representations in the foveal region" were not extracted.

      (3) Figure 4. The key information is too small to see. The lines indicating where decoding performance was significant are quite thin but very important, and the text next to them indicating onset times of significant decoding is in such a small font size I needed to zoom in to 300% to read it (yes, my eyes are getting old and tired). Increasing the font size used to represent key information would be nice.

      (4) Figure 4 caption. Line 270 describes the line color in the plots as yellow, but that color is decidedly orange to my eye.

      (5) Line 340/341: Papers that define and describe feedback-receptive fields seem important to cite here:

      Keller, A. J., Roth, M. M., & Scanziani, M. (2020). Feedback generates a second receptive field in neurons of the visual cortex. Nature, 582(7813), 545-549.

      Kirchberger, L., Mukherjee, S., Self, M. W., & Roelfsema, P. R. (2023). Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science advances, 9(3), eadd2498.

      (6) Lines 346-350: this sentence seems to have some missing or misused words, because the syntax isn't intact.

      (7) Line 367: supports should be support.

      We thank the reviewers for the comments and have made the corresponding corrections in the manuscript.

    1. eLife Assessment

      This important study identifies a plant-derived metabolite, betulin, as an effective natural insecticide against aphids and uncovers its specific molecular target. The evidence is compelling, combining greenhouse and field efficacy trials with rigorous molecular, genetic, and electrophysiological approaches that converge on a conserved binding site in the aphid GABA receptor. While additional work is needed to fully assess potential off-target effects and ecological safety, the study provides a strong mechanistic foundation. These findings will be of interest to researchers in plant biology, chemical ecology, and sustainable pest management.

    2. Reviewer #1 (Public review):

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second by MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously-expressed receptor in E. coli and by CRISPR-genome editing in Drosophila.

      Comments on revisions:

      All of my review comments have been addressed, and the manuscript has been revised accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a betulin-binding single threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Comments on revisions:

      The revision satisfactorily addresses my concerns on evolutionary context, methodological clarity, and ecological risk.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second by MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously-expressed receptor in E. coli and by CRISPR-genome editing in Drosophila.

      Although the manuscript has clear strengths, it also has weaknesses: it would benefit from more comprehensive analyses to fully support its key claims. In particular:

      (1) The Western blotting results in Figure 5A & B appear to support the claim that betulin inhibits GABR gene expression (L26), as a decrease in target protein levels is often indicative of suppressed gene expression. The result description for Figure 5A & B is found in L312-L316, within Section 3.6 ("Responses of MpGABR to betulin"), where MST and voltage-clamp assays are also presented. It seems the observed decrease in MpGABR protein content is due to gene downregulation, rather than a direct receptor protein-betulin interaction. However, this interpretation lacks discussion or analysis in either the corresponding results section or the Discussion. In contrast, Figures 5C-F are specifically designed to illustrate protein-betulin interactions. Presenting Figure 5A & B alongside these panels might lead to confusion, as they support distinct claims (gene expression vs. protein binding/inhibition). Therefore, I recommend moving Figure 5A & B either to the end of Figure 3 or to a separate figure altogether to improve clarity and logical flow. A minor point in the Western blotting experiment is that although GAPDH was used as a reference protein, there is no explanation in the corresponding M&M section.

      We thank the reviewer for the concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses.

      (a) According to your suggestion, the original Figure 5A and B have been inserted into Figure 3, following Figure 3D. The original Figure 3E-I has been saved as a new figure, to illustrate the RNAi assay.

      (b) “GAPDH was used as a reference protein” has been supplied in the M&M section (see Line 209).

      (2) The description of the electrophysiological recording experiment is unclear regarding the use of GABA. I didn't realize that GABA, the true ligand of the GABA receptor, was used in this inhibition experiment until I reached the Results section (L321), which states, "In the presence of only GABA, a fast inward current was generated." Crucially, no details are provided on the experiment itself, including how GABA was applied (e.g., concentration, duration, whether GABA was treated, followed by betulin, or vice versa). This information is essential for reproducibility. Please ensure these details are thoroughly described in the corresponding M&M section.

      We thank the reviewer for the valuable comments.

      (a) Detailed information on how to apply GABA has been added to the corresponding M&M section (Lines 260-263): After 3 days of incubation, the oocytes were used for electrophysiological recording. GABA was dissolved in 1 × Ringer's solution to prepare 100 µM GABA solution. Subsequently, the 100 µM GABA solutions containing different concentrations of betulin (0, 5, 10, 20, 40, 80, 160, 320 µM) were used to perfuse the oocytes.

      (b) Additionally, we also checked other contents of M&M section to ensure that sufficient detail has been supplied.
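
The perfusion series in response (a) can be written out explicitly. The following is a minimal sketch that uses only the concentrations quoted above (100 µM GABA; betulin at 0 to 320 µM); the function names `is_twofold_series` and `perfusion_plan` are hypothetical, and the list order simply follows the response, since the order of application is not stated.

```python
# Concentrations quoted in response (a); all values in µM.
GABA_UM = 100
BETULIN_UM = [0, 5, 10, 20, 40, 80, 160, 320]


def is_twofold_series(values):
    """Return True if each nonzero value is double the previous nonzero value."""
    nonzero = [v for v in values if v > 0]
    return all(b == 2 * a for a, b in zip(nonzero, nonzero[1:]))


def perfusion_plan(gaba_um, betulin_series_um):
    """Pair the fixed GABA concentration with each betulin concentration.

    Hypothetical helper: the order of the returned list mirrors the input
    list and is not a claim about the order used in the experiment.
    """
    return [{"gaba_uM": gaba_um, "betulin_uM": b} for b in betulin_series_um]


plan = perfusion_plan(GABA_UM, BETULIN_UM)
for solution in plan:
    print(solution)
print("two-fold series:", is_twofold_series(BETULIN_UM))
```

Listing the solutions this way makes it easy to see that the betulin-free solution serves as the GABA-only reference and that the remaining concentrations double at each step.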

      (3) The phylogenetic analysis, particularly concerning Figures 4 and 6B, needs significant attention for clarity and representativeness. First, your claim that MpGABR is only closely related to CAI6365831.1 (L305-L310) is inconsistent with the provided phylogenetic tree, which shows MpGABR as equally close to Metopolophium dirhodum (XP_060864885.1) and Acyrthosiphon pisum (XP_008183008.2). Therefore, singling out only Macrosiphum euphorbiae (CAI6365831.1) is not supported by the data. Second, the representation of various insect orders is insufficient. All 11 sequences in the Hemiptera category (in both Figure 4 and Figure 6B) are exclusively from the Aphididae family. This small subset cannot represent the highly diverse Order Hemiptera. Consequently, statements like "only THR228 was conserved in Hemiptera" (L338), "The results of the sequence alignment revealed that only THR228 was conserved in Hemiptera" (L430), or "THR228... is highly conserved in Hemiptera" (L486) are not adequately supported. Third, similar concerns apply to the Diptera order, which includes 10 Drosophila and 2 mosquito samples (not diverse or representative enough), and likely to other orders as well. Thereby, the Figure 6B alignment should be revised accordingly to reflect a more accurate representation or to clarify the scope of the analysis. Fourth, there's a discrepancy in the phylogenetic method used: the M&M section (L156) states that MEGA7, ClustalW, and the neighbor-joining method were used, while the Figure 4 caption mentions that MEGA X, MUSCLE, and the Maximum likelihood method were employed. This inconsistency needs to be clarified and made consistent throughout the manuscript. Fifth, I have significant concerns about the phylogenetic tree itself (Figure 4). A small glitch was observed at the Danaus plexippus node, which raises suspicion regarding potential manipulation after tree construction. 
More critically, the tree, especially within Coleoptera, does not appear to be clearly resolved. I am highly concerned about whether all included sequences are true GABR orthologs or if the dataset includes partial or related sequences that could distort the phylogeny. Finally, for Figure 6B, protein (XP_) and nucleotide (XM_) sequences were mixed. I recommend using the protein sequences instead of nucleotide sequences in this figure panel, as protein sequences are more directly informative.

      We thank the reviewer for the careful reading and valuable comments.

      (a) First, following your comments, the phylogenetic analysis has been re-performed with more representative species from each Order (Fig. 5 and Fig. 7B). The results revealed that only THR228 was conserved across 11 species in the Aphididae family of Hemiptera. Therefore, expressions like "only THR228 was conserved in Hemiptera" have been revised to “among the four residues, only THR228 was conserved across 11 species in the Aphididae family of Hemiptera” (Line 106, Line 369, Line 477, and Lines 563-564).

      (b) We have modified the description of Fig. 5 (the original Fig. 4): MpGABR (XP_022173711.1) was found to be genetically closely related to CAI6365831.1 from Macrosiphum euphorbiae, XP_008183008.2 from Acyrthosiphon pisum, and XP_060864885.1 from Metopolophium dirhodum (Fig. 5 and Table S6). See Lines 342-346.

      (c) Phylogenetic analysis was performed using MEGA7 with multiple amino acid sequence alignment (ClustalW) and the neighbor-joining method. We have revised the Fig. 5 (the original Fig. 4) caption to make it accurate and consistent throughout the manuscript.

      (d) We are sorry about the small glitch at the Danaus plexippus node. After the phylogenetic tree was constructed, it was imported into Adobe Illustrator for coloring and classification annotation. Operational errors while resizing the image may have produced the small glitch. In addition, the unclear clustering of Coleoptera may be due to improper adjustment of the distance (in pixels) of branches from nodes. Again, thanks for your careful reading. We have rebuilt the phylogenetic tree.

      (e) Based on your suggestion, the sequence IDs have been unified as the protein sequence IDs (Fig. 5, Fig. 7B and Table S6)

      (4) The Discussion section requires significant revision to provide a more insightful and interpretative analysis of the results. Currently, much of the section primarily restates findings rather than offering deeper discussion. For instance, L409-L419 restate the results, followed by the short sentence "Collectively, these results suggest that betulin may have insecticidal effects on aphids by inhibiting MpGABR expression". It could be further expanded to make it beneficial to elaborate on proposed mechanisms by which gene expression might be suppressed, including any potential transcription factors involved. In contrast, while L422-L442 also initially summarize results, the subsequent paragraph (L445-L472) effectively discusses the potential mechanisms of inhibitory action and how mortality is triggered, which is a good model for other parts of the section. However, all the discussion ends up with a short statement, "implying that betulin acts as a CA of MpGABR" (L472), which appears to be a leap. The inference that betulin acts as a competitive antagonist (CA) is solely based on the location of its extracellular binding site, which does not exactly overlap with the GABA binding site. It needs stronger justification or actually requires further experimental validation. The authors should consider rephrasing this statement to acknowledge the need for additional studies to definitively confirm this mechanism of action.

      We appreciate the reviewer's careful reading and valuable feedback, which will certainly enhance the quality of our manuscript.

      (a) Possible reasons for the effect of betulin on MpGABR expression have been discussed in our manuscript (Lines 455-466): The regulation of gene expression is sophisticated and delicate (Pope and Medzhitov 2018). The regulatory network controlling GABR expression remains unclear. In adult rats, epileptic seizures have been reported to increase the levels of brain-derived neurotrophic factor (BDNF), which in turn prompted the transcription factors CREB and ICER to reduce the gene expression of the GABR α1 subunit (Lund et al. 2008). In Drosophila, it has been demonstrated that WIDE AWAKE, which regulates the onset of sleep, interacts with the GABR and upregulates its expression level (Liu et al. 2014). In the Drosophila brain, the circular RNA circ_sxc was found to inhibit the expression of miR-87-3p through sponge adsorption, thereby regulating the expression of neurotransmitter receptor ligand proteins, including GABR, and ensuring the normal function of synaptic signal transmission in brain neurons (Li et al. 2024). However, it remains unclear how betulin reduces the expression of MpGABR, and further research is needed.

      (b) In the Discussion section, we acknowledged the need for further research to ultimately confirm the mechanism by which betulin competes with GABA for binding to MpGABR (Lines 532-535): Although the mechanism by which betulin competes with GABA for binding to MpGABR requires further experimental validation, our work may have provided a novel target for developing insecticides.

      (c) In addition, we have added a discussion of the sensitivity of the GABA receptor to betulin to the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicated that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Further verification is needed to determine whether betulin is toxic to other insect species.

      (d) Furthermore, the discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should not only focus on the toxic effects of active substances on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off-target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a betulin-binding single threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Weaknesses:

      There are several important limitations that need to be addressed. The manuscript does not explore whether the observed sensitivity to betulin reflects a broadly conserved feature of GABA receptors across animal lineages or a more lineage-specific adaptation. This evolutionary context is crucial for understanding the broader significance of the findings.

      In addition, while the compound's aphicidal effect is well established, the potential for off-target effects in non-target organisms - especially vertebrates - remains unaddressed, despite prior evidence that betulin interacts with mammalian GABAa receptors. There is little discussion on the ecological or environmental safety of exogenous betulin application, such as persistence, degradation, or exposure risks.

      We sincerely thank the reviewer for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): A previous study proposed that vertebrate and human GABR genes maintain a broad and conserved gene clustering pattern, whereas this pattern is missing in invertebrates, indicating that these gene clusters formed early in vertebrate evolution and were established after divergence from invertebrates. Notably, invertebrates each possess a unique GABR gene pair, homologous to the human GABR α and β subunits, suggesting that the existing GABR gene cluster evolved from an ancestral α-β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, duplications and amino acid substitutions in GABR may be beneficial for adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) We have added a discussion of the sensitivity of the GABA receptor to betulin to the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicated that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Further verification is needed to determine whether betulin is toxic to other insect species.

      (c) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should not only focus on the toxic effects of active substances on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off-target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #1 (Recommendations for the authors):

      (1) L28 Provide the full name of MST.

      Thanks for your suggestion. The full name of MST, microscale thermophoresis, has been supplied.

      (2) L87 in the Order Hemiptera.

      Thanks for your suggestion. Corrected.

      (3) L99 "Leaf bioassay" would be better to differentiate the greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (4) L104 It should be 7 doses, including the "0 mg/mL" control.

      Thanks for your suggestion. Corrected.

      (5) L104 Since the LC50 of pymetrozine is 1.0612 mg/mL, a wider range of doses should have been tested compared to the dose range of betulin.

      Thanks for your comment.

      (a) First, seven doses (0, 0.0625, 0.125, 0.25, 0.5, 1, and 2 mg mL<sup>-1</sup>) were set to calculate the LC50 of betulin and pymetrozine. The LC50 values of betulin and pymetrozine are 0.1641 and 1.0612 mg mL<sup>-1</sup>, respectively. Both fall within the set range, indicating that the dose range is reasonable and that the LC50 values are reliable.

      (b) To compare the control effects of betulin and pymetrozine against M. persicae, the LC50 of betulin (0.1641 mg mL<sup>-1</sup>) and of pymetrozine (1.0612 mg mL<sup>-1</sup>) were used to treat M. persicae.
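
The range argument in (a) can be checked directly from the numbers quoted there. The sketch below is only an illustration, not the authors' LC50 estimation procedure, which is not described in this response; the helper name `bracketing_doses` is hypothetical.

```python
# Doses and LC50 values quoted in response (a); all values in mg/mL.
DOSES = [0, 0.0625, 0.125, 0.25, 0.5, 1, 2]
LC50 = {"betulin": 0.1641, "pymetrozine": 1.0612}


def bracketing_doses(lc50, doses):
    """Return the pair of tested nonzero doses that bracket lc50.

    Returns None if lc50 lies outside the tested nonzero dose range.
    Hypothetical helper written for this illustration only.
    """
    nonzero = sorted(d for d in doses if d > 0)
    if not nonzero[0] <= lc50 <= nonzero[-1]:
        return None
    lower = max(d for d in nonzero if d <= lc50)
    upper = min(d for d in nonzero if d >= lc50)
    return lower, upper


for compound, value in LC50.items():
    print(compound, value, bracketing_doses(value, DOSES))
```

Under these numbers, the betulin LC50 sits between the 0.125 and 0.25 mg/mL doses and the pymetrozine LC50 between the 1 and 2 mg/mL doses, so the pymetrozine estimate lies in the top interval of the tested range. That is consistent with the reviewer's point that a wider range would have been informative for pymetrozine.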

      (6) L109 Greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (7) L112 Tween-80 and acetone in L103. Keep the order consistent throughout the manuscript.

      Thanks for your suggestion. Corrected.

      (8) L122 Mortality was recorded at 1, 5, 9, and 14 days after treatment. Revise the other similar mistakes throughout the manuscript (e.g. L250, L254, L255, L256, L259, etc.).

      Thanks for your suggestion. Corrected.

      (9) L126 apterous instead of wingless (keep a consistent expression).

      Thanks for your suggestion. Corrected.

      (10) L138 Primer Premier?

      Thanks for your comment. Corrected.

      (11) L141 Add RPS18 primers in Table S2.

      Thanks for your comment. Corrected.

      (12) L155 MEGA7 vs. MEGAX (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (13) L156 NJ method vs. ML method (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (14) L157 2.7. RNAi assay (Remove "In vitro" and re-number the following M&M sections accordingly).

      Thanks for your comment. Corrected.

      (15) L163 Add dsGFP primers in Table S2.

      Thanks for your comment. Corrected.

      (16) L166 apterous instead of wingless (keep a consistent expression).

      Thanks for your comment. Corrected.

      (17) L172 Add the source of pET-B2M vector.

      pET-B2M vector was obtained from BGI (Shenzhen, China), which has been added in our manuscript (Line 194).

      (18) L195 coding sequence instead of cDNA.

      Thanks for your comment. Corrected.

      (19) L198 the mutations of R224A ...

      Thanks for your comment. Corrected.

      (20) L199 TYR), or T228R ...

      Thanks for your comment. Corrected.

      (21) L211 and 90 ng.

      Thanks for your comment. Corrected.

      (22) L213 genomic DNA instead of gDNA, because gDNA may be confused in the context of sgRNA.

      Thanks for your suggestion. Corrected.

      (23) L253 (Fig. 1A-B).

      Thanks for your comment. Corrected.

      (24) L268 Explain why these 15 DEGs were selected for qRT-PCR.

      Thanks for your comment. These 15 DEGs were randomly selected to serve as representative DEGs with different expression levels. The reason for selecting these 15 DEGs was added to the manuscript (Lines 295-296).

      (25) L287 What about GABRB? It has a TM domain.

      GABRB refers to “gamma-aminobutyric acid receptor subunit beta-like” annotated on NCBI. Theoretically, it should contain four transmembrane structural domains, while it has only one, indicating that it is incomplete.

      (26) L297 Add dsGFP as another control group.

      Thanks for your comment. Corrected.

      (27) L299 increased by 30.44% (Remove a comma).

      Thanks for your comment. Corrected.

      (28) L308 XM_022318019.1 (or protein accession number with XP_).

      Thanks for your comment. Corrected.

      (29) L338 that THR228 was conserved only in Hemiptera.

      Thanks for your comment. Our original intention was to emphasize that THR228 is the only conserved residue among the four key amino acid residues. After careful consideration, we therefore retained the expression "only THR228".

      (30) L342 or T228R.

      Thanks for your comment. Corrected.

      (31) L382 Is pyrhidone a general name for pymetrozine?

      Thanks for your comment. Corrected.

      (32) L450 Remove "and so on".

      Thanks for your comment. Corrected.

      (33) Figure 1D: Remove "Environment friendly". Replace the plant pot image on the right side with the one sprayed with pymetrozine, like the one in Figure 1F.

      Thanks for your comment. 

      (a) "Environment friendly" in Figure 1D has been removed.

      (b) We have attempted to modify the Figure 1D according to your suggestion. However, the modified Figure 1D is similar to Figure 1F and appears monotonous. Therefore, we have retained the original framework of Figure 1D.

      (34) Figure 2E 111036117 and 111041856 are in different IDs (XM_). I suggest keeping GeneID in Figure 2E and Table S2, as shown in Table S4.

      Thanks for your comment. Corrected.

      (35) Figure 2H: Add unit of the heatmap values. Or just add the title (e.g., expression level) on top of the bar.

      Thanks for your comment. Corrected.

      (36) Figure 3A: Add "aa" next to 700.

      Thanks for your comment. Corrected.

      (37) Figure 3E-G: Revise the tick marks on Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (38) Figure 5C: Remove "1" and move "WT" up to the position where "1" was.

      Thanks for your comment. Corrected.

      (39) Figure 5D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (40) Figure 5E: Remove the decimal. (e.g. 5 uM, 10 uM, 20 uM, etc.).

      Thanks for your comment. Corrected.

      (41) Figure 6B: What are the numbers next to the amino acid sequences? Provide the information in the figure caption.

      Thanks for your comment. The numbers next to the amino acid sequences indicate the position of the last residue of the key amino acids; this has been added to the figure caption.

      (42) Figure 6D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5. The X-axis title should be betulin (see Figure 5D). In the figure caption at the 5th row from the top, R244A should be R224A.

      Thanks for your comment. Corrected.

      (43) Figure 7E: R122T (not R1272T).

      Thanks for your comment. Corrected.

      (44) Supplementary Figure 1: It should be Figure S1. Add dsGFP in the figure caption.

      Thanks for your comment. Corrected.

      (45) Figure S2: What are the two pink bars and the other bars in brown or blue? Add an appropriate explanation in the figure caption.

      Thanks for your comment. Corrected.

      (46) Table S1: r square?

      Thanks for your comment. It is “r square”, and this has been corrected.

      (47) Table S2: (a) Add horizontal lines to separate qPCR, RNAi, cloning, and heterologous expression from each other (b) Replace XM_022318017.1 and XM_022318019.1 with their corresponding GeneIDs, as shown in Table S4. (c) AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum)-Revise it. (d) In the cloning primers, place MpGABR first, followed by MpGABRAP and MpGABRB, as shown in the manuscript and Table S5. (e) Also, in the cloning primers, MpGABRB and MpGABRAP use reverse primers without stop codon, while MpGABR uses stop codon (TCA = TGA in reverse)-Revise it accordingly. Otherwise, provide the reason.

      Thanks for your comment. Corrected.
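
      Point (e) of comment (47) relies on the fact that a reverse primer is written as the reverse complement of the coding strand, so a TGA stop codon shows up as TCA in the primer. The check can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and it assumes the primer begins directly at the 3' end of the coding sequence with no 5' adapter or restriction site.

```python
# Map each base to its complement (uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_primer_keeps_stop(primer: str) -> bool:
    """True if the first three bases of a reverse primer correspond to a
    stop codon on the coding strand (assumes no 5' adapter on the primer)."""
    return reverse_complement(primer[:3]) in STOP_CODONS


print(reverse_complement("TCA"))  # TGA
```

      Applied to each reverse cloning primer, such a check would show at a glance which constructs retain the stop codon and which do not.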

      (48) Table S3: (a) Add "Drosophila melanogaster" and the target sequence ID in the table caption. Is it KF881792.1, as shown in Table S6? (b) Align the sequences to the left side. 

      Thanks for your comment. 

      (a) The GenBank number of target sequence is KF881792.1 (Drosophila melanogaster). We have added this information in the Table S3 note.

      (b) It has been adjusted according to your suggestion.

      (49) Table S5: (a) Replace the accession numbers with GeneID, as shown in Table S4. K340444.1 is a sequence from another aphid (Acyrthosiphon pisum), (b) Coding sequences with stop codon are 2082, 357, and 753, respectively, while the sequences without stop codon are 2079, 354, and 750, respectively. The lengths of the deduced amino acids are 693, 118, and 250. Revise accordingly.

      Thanks for your comment. Corrected.
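
      The lengths quoted in comment (49) follow from simple codon arithmetic: removing the stop codon shortens the coding sequence by 3 nt, and the stop codon itself encodes no residue. A minimal sketch of that consistency check is given below; the function name is hypothetical and only the numbers come from the comment.

```python
def protein_length(cds_len_with_stop: int) -> int:
    """Number of amino acids encoded by a coding sequence that includes its stop codon."""
    if cds_len_with_stop % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds_len_with_stop // 3 - 1  # the stop codon encodes no residue


# (length with stop, length without stop, deduced amino acids) as listed in comment (49)
reported = [(2082, 2079, 693), (357, 354, 118), (753, 750, 250)]
consistent = all(
    with_stop - 3 == without_stop and protein_length(with_stop) == aa
    for with_stop, without_stop, aa in reported
)
print(consistent)  # True
```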

      (50) Table S6: (a) Use GenBank No for protein sequences. There is no Gene ID in this table. (b) Order (instead of Class). (c) See my comment on the phylogenetic analysis above.

      Thanks for your comment. Corrected.

      (51) Table S7 (a) Add unit under "Binding Energy". (b) There are two ALA226 [Alkyl] with two different distances. (c) PHE227 at the bottom should be THR228?

      Thanks for your comment.

      (a) The unit of "Binding Energy" is kcal mol<sup>–1</sup>; it has been added to the table caption.

      (b) As shown in Figure 6A, there are two Alkyl interactions between ALA226 and betulin. Therefore, ALA226 [Alkyl] appears twice, with two different distances.

      (c) Similarly, there are two Pi-Alkyl interactions between PHE227 and betulin. Thus, PHE227 appears in two rows of the table.

      (52) Table S9 (a) R117T should be R122T. (b) r square?

      Thanks for your comment. Both (a) and (b) have been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction

      (a) It lacks a deeper biological and evolutionary framing of the GABA receptor system. As GABA receptors are highly conserved across animal taxa, the observed interaction between betulin and the aphid GABA receptor could have broader implications. This possibility is not addressed in the current version, which limits the reader's appreciation of the relevance of this mode of action.

      (b) Previous reports of betulin activity in mammalian systems are not mentioned in the introduction, even though they are directly relevant to concerns about off-target toxicity. Therefore, the introduction should be revised to (i) briefly introduce the evolutionary conservation of GABA receptors, and (ii) acknowledge that betulin may affect a broader range of organisms, which sets up the need for caution in its application.

      Thanks for your important suggestions.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): A previous study proposed that vertebrate and human GABR genes maintain a broad and conserved gene clustering pattern, whereas this pattern is missing in invertebrates, indicating that these gene clusters formed early in vertebrate evolution and were established after divergence from invertebrates. Notably, invertebrates each possess a unique GABR gene pair that is homologous to the human GABR α and β subunits, suggesting that the existing GABR gene clusters evolved from an ancestral α-β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, duplications and amino acid substitutions in GABR may have been beneficial for adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) The possible effects of betulin on a broader range of organisms have been acknowledged in the Introduction section (Lines 68-77): An immune stimulant, Ir-Bet, was prepared using iridium complex and betulin, which evoked ferritinophagy-enhanced ferroptosis, thereby activating anti-tumor immunity (Lv 2023). The anti-inflammatory effect of betulin has been reported in macrophages at lymphoma site in mice (Szlasa et al. 2023). Betulin has been found to improve hyperlipidemia and insulin resistance and decrease atherosclerotic plaques by inhibiting the maturation of sterol regulatory element-binding protein (Tang et al. 2011). Besides, betulin and its derivatives have been found to exhibit insecticidal activity against Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024).

      (c) At the end of the Introduction, we note that betulin should be used with caution (Lines 111-112): However, given that betulin may affect a wider range of organisms, it should be used with caution.

      (2) Method

      Number of biological replicates in all assays and justification of thresholds used for significance in RNAi and survival experiments are not addressed in the manuscript.

      Thanks for your careful reading. We have checked the Materials and Methods section and added the number of biological replicates for all assays. In addition, the p-values for the significance analyses of the RNAi and survival experiments have been added to the manuscript.

      (3) Discussion

      (a) Consistent with the comments on the Introduction, the absence of discussion on (i) the evolutionary conservation of GABA receptor sensitivity to betulin, (ii) potential off-target effects in non-target insects and vertebrates (if so, this cannot be used as an "eco-friendly pesticide" as the authors stated in the manuscript), and (iii) ecological risks associated with the exogenous application of betulin limits both the interpretive depth and applied relevance of the study.

      (b) To strengthen the Discussion, the authors should consider addressing: (i) whether the observed sensitivity reflects a conserved pharmacological vulnerability across animal taxa or a lineage-specific adaptation; (ii) the potential ecological risks of deploying betulin as a bioinsecticide, and (iii) the need for future research into the environmental fate, degradation, and safety profile of betulin prior to any field-level application.

      Thank you for your valuable comments.

      (a) We have added a discussion of the sensitivity of the GABA receptor to betulin in the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicated that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (b) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-551): The development of bioinsecticides should not only focus on the toxic effects of active substances on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin had specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off-target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also need further experimental verification.

      (c) Additionally, at the end of the Discussion, we note that more research is needed before any field application of betulin (Lines 551-553): In summary, before any field application, further research on the environmental behavior, degradation process, and safety of betulin is needed.

      References

      Amiri S, Dastghaib S, Ahmadi M, Mehrbod P, Khadem F, Behrouj H, Aghanoori M, Machaj F, Ghamsari M, Rosik J, Hudecki A, Afkhami A, Hashemi M, Los M, Mokarram P, Madrakian T, Ghavami S. 2020. Betulin and its derivatives as novel compounds with different pharmacological effects. Biotechnology Advances 38: 107409.

      de Almeida Teles AC, dos Santos BO, Santana EC, Durço AO, Conceição LSR, Roman Campos D, de Holanda Cavalcanti SC, de Souza Araujo AA, dos Santos MRV. 2024. Larvicidal activity of terpenes and their derivatives against Aedes aegypti: a systematic review and meta-analysis. Environmental Science and Pollution Research 31: 64703-64718.

      Guo L, Qiao X, Haji D, Zhou T, Liu Z, Whiteman NK, Huang J. 2023. Convergent resistance to GABA receptor neurotoxins through plant–insect coevolution. Nature Ecology & Evolution 7: 1444-1456.

      Haddi K, Turchen LM, Viteri Jumbo LO, Guedes RN, Pereira EJ, Aguiar RW, Oliveira EE. 2020. Rethinking biorational insecticides for pest management: unintended effects and consequences. Pest Management Science 76: 2286-2293.

      Huang X, Hao N, Shu L, Wei Z, Shi J, Tian Y, Chen G, Yang X, Che Z. 2025. Preparation and insecticidal activities of betulin-cinnamic acid-related hybrid compounds and insights into the stress response of Plutella xylostella L. Pest Management Science 81: 4243-4255.

      Lee HY, Min KJ. 2024. Betulinic acid increases the lifespan of Drosophila melanogaster via Sir2 and FoxO activation. Nutrients 16: 441.

      Li Q, Wang L, Tang C, Wang X, Yu Z, Ping X, Ding M, Zheng L. 2024. Adipose tissue exosome circ_sxc mediates the modulatory of adiposomes on brain aging by inhibiting brain dme-miR-87-3p. Molecular Neurobiology 61: 224-238.

      Li Y, Wang Y, Gao L, Tan Y, Cai J, Ye Z, Chen A, Xu Y, Zhao L, Tong S, Sun Q, Liu B, Zhang S, Tian D, Deng G, Zhou J, Chen Q. 2022. Betulinic acid self-assembled nanoparticles for effective treatment of glioblastoma. Journal of Nanobiotechnology 20: 39.

      Liu S, Lamaze A, Liu Q, Tabuchi M, Yang Y, Fowler M, Bharadwaj R, Zhang J, Bedont J, Blackshaw S, Lloyd Thomas E, Montell C, Sehgal A, Koh K, Wu Mark N. 2014. WIDE AWAKE mediates the circadian timing of sleep onset. Neuron 82: 151-166.

      Lund IV, Hu Y, Raol YH, Benham RS, Faris R, Russek SJ, Brooks Kayal AR. 2008. BDNF selectively regulates GABAA receptor transcription by activation of the JAK/STAT pathway. Science Signaling 1: ra9.

      Lv M, Zheng Y, Wu J, Shen Z, Guo B, Hu G, Huang Y, Zhao J, Qian Y, Su Z, Wu C, Xue X, Liu H, Mao Z. 2023. Evoking ferroptosis by synergistic enhancement of a cyclopentadienyl iridium-betulin immune agonist. Angewandte Chemie International Edition 62: e202312897.

      Nakao T, Banba S. 2021. Important amino acids for function of the insect Rdl GABA receptor. Pest Management Science 77: 3753-3762.

      Pope SD, Medzhitov R. 2018. Emerging principles of gene expression programs and their regulation. Molecular Cell 71: 389-397.

      Szlasa W, Ślusarczyk S, Nawrot Hadzik I, Abel R, Zalesińska A, Szewczyk A, Sauer N, Preissner R, Saczko J, Drąg M, Poręba M, Daczewska M, Kulbacka J, Drąg Zalesińska M. 2023. Betulin and its derivatives reduce inflammation and COX-2 activity in macrophages. Inflammation 46: 573-583.

      Tang JJ, Li JG, Qi W, Qiu WW, Li PS, Li BL, Song BL. 2011. Inhibition of SREBP by a small molecule, betulin, improves hyperlipidemia and insulin resistance and reduces atherosclerotic plaques. Cell Metabolism 13: 44-56.

      Tsang SY, Ng SK, Xu Z, Xue H. 2006. The evolution of GABAA receptor–like genes. Molecular Biology and Evolution 24: 599-610.

    1. eLife Assessment

      This study presents a valuable finding about how receptor-ligand binding pathways with multi-site phosphorylation can show non-monotonic responses to increasing ligand affinity and to kinase activity. The authors provide compelling evidence through a simple ordinary differential equation model of such signaling networks with the key new ingredient of ligand-induced receptor degradation. The work will be of interest to physicists and biologists working on signal transduction and biological information processing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to non-equilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity to ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      The novel benefit of the model introduced by the authors is that it also achieves non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      UPDATE: The authors have now clarified the significance of the model in elucidating how known motifs (multisite phosphorylation and active receptor degradation) could explain the behavior, including non-monotonicity. The authors have also provided compelling arguments for the biological significance of achieving non-monotonic kinase activity response.

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the author is missing (See line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      UPDATE: this issue has been resolved.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      UPDATE: This was a non-issue. The potential misunderstanding has been mitigated by clarifications in the text.

      Comments on revisions:

      All issues previously identified were convincingly addressed. I have no additional suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degrade. This particular type of kinetic discrimination allows the system to overcome equilibrium constraints.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of multiple variables as activity without precise definitions may be confusing to readers.

      Comments on revisions:

      I thank the authors for addressing my comments. One point remains regarding the naming of multiple variables as activity. Besides using other words, the authors may consider giving precise definitions of terms, e.g. by writing "We define kinase activity as the phosphorylation rate $\omega=k_p\tau$." This connection appears only at line 204 in the present manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy-dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to non-equilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity in ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from the interplay between two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59).

      The novel benefit of the model introduced by the authors is that it also achieves a non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      We thank the reviewer for this comment. We agree that the ability of our model to reproduce non-monotonic dependence on kinase/phosphatase activity was not sufficiently motivated in the original submission. We have now added a brief mention of the biological motivation for non-monotonic kinase activity dependence in the discussion (lines 229-247) to describe the potential biological significance of this behavior. In particular, non-monotonic kinase/phosphatase dependence may act as a safeguard, filtering out signaling cells with abnormally elevated kinase activity or suppressed phosphatase activity. In the presence of non-monotonic dependence on network activity, downstream signaling would remain contingent on extracellular cues, and cells with extreme kinase/phosphatase imbalances would fail to signal. This could prevent persistent, cue-independent activation, an especially important protective mechanism in pathways regulating metabolically taxing functions such as growth, proliferation, or mounting immune responses. Although direct experimental evidence for the widespread use of this mechanism is currently scarce, our motivation is supported both by the presence of similar regulatory behaviors of phosphatases, which arise through distinct mechanisms (such as CD45 in T-cell receptor signaling; Weiss, 2019) but highlight the potential biological use of this strategy, and by theoretical work on phosphorylation-dephosphorylation cycles, which demonstrates a similar effect in more general settings (Swain, 2013).

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (see line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      We thank the reviewer for identifying this oversight; it has been corrected. See Figure 3 in the new text.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      We thank the reviewer for this comment. We in fact study all sites (Figure 5A in the resubmitted manuscript). Notably, as suggested by the reviewer, the concentration of the first site is indeed represented by the sum of concentrations of all phosphorylated species. The concentration of the 2<sup>nd</sup> site is represented by the sum of concentrations of all species except for the first one and so on (lines 153-155). 
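
      The bookkeeping described in this response can be sketched in a few lines. The sketch assumes sequential phosphorylation, so that a receptor carrying exactly k phosphates has sites 1 through k occupied; the function name and the example concentrations are hypothetical and are not taken from the manuscript.

```python
from itertools import accumulate


def site_activities(species_conc):
    """Occupancy of each phosphorylation site.

    species_conc[k-1] is the concentration of receptors carrying exactly
    k phosphates (k = 1..N). Site i is occupied in every species with
    k >= i, so its occupancy is the tail sum of species_conc from index i-1.
    """
    tail_sums = list(accumulate(reversed(species_conc)))
    return tail_sums[::-1]


# Three species with 1, 2 and 3 phosphates (arbitrary units).
print(site_activities([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```

      Under this assumption the first site reports the total phosphorylated pool, and each later site reports a progressively smaller subset.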

      Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degrade. This particular type of kinetic discrimination allows for overcoming equilibrium constraints.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of certain variables may be confusing to readers.

      We apologize for the confusion due to unclear presentation. We have clarified our definitions throughout the manuscript. 

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract and introduction present the problem as if this model is solving the fundamental problem of non-monotonic dependence on ligand affinity. However, as the authors noted in their results, this problem has already been solved by a previous phosphorylation model with N-state degradation. What the authors' new model achieves is the additional experimentally observed non-monotonicity of kinase activity dependence. The abstract and introduction should be changed to reflect the actual novel contributions and also to motivate the biological significance of non-monotonic kinase activity dependence.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59). We have also provided biological motivation behind non-monotonic kinase activity dependence (lines 229-247).

      (2) It is important to show (in the supplemental materials if needed) that the closest equilibrium analog to the model (for example, reversible rate constants from each of the activated states to an inactive state) does not achieve non-monotonicity with ligand affinity.

      We have added a model in the supplementary materials that represents a Markov chain obeying detailed balance. In the model, we imagine that ligand-bound receptors undergo a series of equilibrium transitions, all characterized by the same activation and inactivation rate. We show that at saturating ligand levels, the signaling output only depends on the ratio of the activation to the inactivation rate (i.e., the thermodynamic stability of the active site) (lines 466-488).
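
      The property claimed here can be illustrated with a small numerical sketch. The topology below is an assumption made for illustration (a linear chain of states with one forward rate a and one backward rate b on every link), not the authors' supplementary model, and the function names are hypothetical. The steady state is obtained from the master-equation balance at each state, using exact rational arithmetic.

```python
from fractions import Fraction


def steady_state(a, b, n_states):
    """Stationary distribution of the chain 0 <-> 1 <-> ... <-> n_states-1,
    with forward rate a and backward rate b on every link, solved from the
    steady-state master (global balance) equations."""
    a, b = Fraction(a), Fraction(b)
    p = [Fraction(1)]                      # unnormalized weight of state 0
    if n_states > 1:
        p.append(a / b * p[0])             # balance at state 0: a*p0 = b*p1
    for i in range(1, n_states - 1):
        # balance at interior state i: (a + b)*p_i = a*p_{i-1} + b*p_{i+1}
        p.append(((a + b) * p[i] - a * p[i - 1]) / b)
    total = sum(p)
    return [x / total for x in p]


def output(a, b, n_states):
    """Fraction of receptors in the last (fully activated) state."""
    return steady_state(a, b, n_states)[-1]


print(output(1, 2, 4), output(10, 20, 4))  # 1/15 1/15
```

      Scaling both rates by the same factor leaves the output unchanged, and in this sketch increasing the ratio a/b only ever increases the output, which is consistent with the statement that such an equilibrium scheme depends on the rate ratio alone.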

      (3) Schematics for earlier models are described in Figure 1. However, no schematic for the actual model proposed by the authors is shown. This should be added as a subpanel to Figure 1.

      We thank the reviewer for identifying our omission of our model schematic. We have included our model schematic as its own figure (Figure 3).

      (4) Minor: Figure 1 is referred to as "Figure ??" in line 97 of page 3.

      We thank the reviewer for identifying this error; it has been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) There is an inconsistency between Figure 2(a) and Equation (1): it suggests that $p_N$ is $\omega^N/(\omega+\delta)^N$. This makes more sense with the model defined in the supplementary material.

      We thank the reviewer for identifying this error. Equation (1) has been updated to reflect the correct relationship.

      (2) The figure presenting the model of the authors appears to be missing.

      We thank the reviewer for identifying this error; it has been corrected (Figure 3 in the new manuscript).

      (3) The authors describe phosphorylation as irreversible in the intro, but then consider reversible phosphorylation in their model, which may be confusing to readers.

      We thank the reviewer for identifying this source of possible confusion. We have clarified that dephosphorylation is taken to be a distinct irreversible reaction, see lines 105 - 112.

      (4) The authors reuse similar names, e.g., network activity, kinase activity, signaling activity, activity. This is confusing.

      We apologize for the confusion. We note that, within the context of our model, there are important distinctions between signaling activity (the amount of signaling competent receptors) and kinase activity (value corresponding to the phosphorylation rate). We have attempted to use these different terms correctly and are happy to make clarifying corrections if there are any places where a term is misused.  

      (5) Several parameters are defined only in the captions of the figures, such as $\beta$ and $\rho$.

      We thank the reviewer for identifying this omission; we have added the definitions of $\beta$ and $\rho$ to the main text (see line 129).

      (6) The sentence at line 137 lacks some words: "Below, we kinetic...".

      We thank the reviewer for identifying this error; we have added the missing words (“Below, we show how kinetic…”).

      (7) The sentence at line 183 lacks some words: "When kinase activity...".

      We thank the reviewer for identifying this error. We have now corrected it. 

      (8) Figure 5 is very small.

      We will work with the production team to increase the size of this figure.

    1. eLife Assessment

      This important study characterizes and validates a new activity marker - fast labelling of engram neurons (FLEN) - which is transiently active and driven by cFos, allowing the monitoring of intrinsic and synaptic properties of engram neurons shortly after the learning experience. The results convincingly demonstrate the utility of this novel viral tool for studying early changes in the properties of engram cells. FLEN will provide a beneficial tool for the neuroscience community once it is made available at a plasmid repository.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Cupollilo et al describes the development, characterization and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist but this study adds additional capability by leveraging an activity marker that is destabilized (and thus temporally active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial contextual fear conditioning. In a series of ex vivo experiments the authors perform patch clamp analysis of labeled neurons to determine if these putative engram neurons differ from non-labelled neurons using both the FLEN system as well as the previously characterized RAM system. Interestingly the early labelled neurons at 3 h post CFC (FLEN+) demonstrated no differences in excitability whereas the RAM labeled neurons at 24 h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies as well as those for sIPSCs and mIPSCs which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall the data is of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

    3. Reviewer #2 (Public review):

      Summary:

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare ex-vivo the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: Their new virally delivered tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but are not different in both cFos+ groups of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths:

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses:

      The new tool FLEN is not quantitatively compared to e.g. the TetTag reporter mouse. Nevertheless, the fluorescent images of FLEN+ neurons are quite convincing.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary:

      The manuscript by Cupollilo et al describes the development, characterization, and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist but this study adds additional capability by leveraging an activity marker that is destabilized (and thus temporally active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial of contextual fear conditioning. In a series of ex vivo experiments, the authors perform patch clamp analysis of labeled neurons to determine if these putative engram neurons differ from non-labelled neurons using both the FLEN system as well as the previously characterized RAM system. Interestingly the early labelled neurons at 3 h post CFC (FLEN+) demonstrated no differences in excitability whereas the RAM-labelled neurons at 24 h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies as well as those for sIPSCs and mIPSCs which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall the data is of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

      Reviewer #2 (Public review): 

      Summary: 

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare ex-vivo the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: Their new tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but are not different in both cFos+ groups of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths: 

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses: 

      With regard to the new viral tool, a direct comparison between the new tool FLEN and existing cFos reporters is missing. 

      Reviewer #1 (Recommendations for the authors):

      I have only minor suggestions for the authors to consider. 

      (1) In the in vitro characterization, the percentage of labelled neurons seems very low after a powerful and prolonged activation. It was somewhat surprising and raised the question of how accurately the FLEN construct reflects endogenous cFOS activity. Could the authors speak to this?

      The reviewer is correct that the proportion of FLEN-positive neurons, relative to mCherry-positive neurons, is low compared with studies using viral infection with RAM vectors in neuronal cultures (Sorensen et al, 2016, Sun et al, 2020), in which it is around 70-80% following chemical stimulation. The authors of those studies, however, do not provide evidence for a comparison with endogenous c-Fos activity in cell cultures. The reason for the discrepancy in the effect of chemical stimulation of cultured neurons is not clear, but it may depend on culture conditions, which may vary between labs.

      FLEN was constructed using a mouse c-Fos promoter (-355 to +109) (Cen et al, 2003). To answer the reviewer’s question we performed an additional experiment in cultured neurons in which we found that 77.1% of FLEN-positive neurons were also c-Fos-positive (using immunocytochemistry).

      (2) The authors compare the two labelling strategies and interpret their data with the presumption that both label a similar set of active neurons. This is particularly relevant when they suggest there might be a progressive increase in the excitability of active neurons with time. This is certainly a possibility, but the authors should also consider other possibilities that the two markers might label different populations of neurons. For example, if they require different thresholds for activation, it is possible that one is more sensitive to activity than the other. As these are unknown variables the authors should temper the interpretation accordingly.

      Indeed, the reviewer is correct that this limitation should be discussed. We have added this as a point of discussion in the text (lines 355-358). In the article describing the RAM strategy (Sorensen et al, 2016) the authors use RAM to label DG neurons activated during an experience in a context A (Figure 4). Exploiting the fact that engram cells are re-activated when the animal is re-exposed to the training environment (memory recall), they performed c-Fos staining 90 minutes following either context A or context B re-exposure. The RAM-c-Fos overlap percentage was higher in A-A than in A-B (A-A was a bit more than 20%). This means that RAM has captured a group of cells during training that, at least in part, were re-activated during recall. This could in part support the assumption that RAM and c-Fos share a certain overlap. Of course, this was done in DG, while we worked in CA3. In addition, both strategies label in their great majority c-Fos+ neurons (see above answer to point #1). However, this cannot completely rule out the possibility that FLEN and RAM label partly distinct populations of activated cells.

      (3) An increase in the frequency of synaptic events is observed in neurons labelled with both markers. The authors propose that this may be due to an increase in synaptic contacts based on prior studies. However, as this is the first functional assessment why not consider changes in release probability as a mechanism for this finding? 

      We have added this as a possibility in the text (lines 362-363).

      (4) It would be useful to include plots of the average frequency of m/sEPSCs and m/sIPSCs in Figures 4 and 5. These figures could also be combined into a single figure.

      We agree with the reviewer that Figures 4 and 5 could be merged into a single figure. In the revised version, Figure 5A becomes panel C in Figure 4. Text and figure descriptions were adjusted accordingly.

      Reviewer #2 (Recommendations for the authors): 

      (1) Abstract, line 24: "In contrast, FLEN+ CA3 neurons show an increased number of excitatory inputs." RAM+ neurons also show an increased number of excitatory inputs, so this is not "in contrast". Also, not just excitatory, but also inhibitory synaptic inputs are more numerous in cFos+ neurons. Please improve the summary of your findings.

      “In contrast” referred to the fact that FLEN+ neurons do not show differences in excitability as compared to FLEN- neurons, as mentioned in the previous sentence. We now provide a more explicit sentence to explain this point: “On the other hand, like RAM+ neurons, FLEN+ CA3 neurons show an increased number of excitatory inputs.”

      (2) Novel tool: Destabilized cFos reporters were introduced 23 years ago and are also part of the TetTag mouse. I am not sure that changing the green fluorescent protein to a different version merits a new acronym (FLEN). To convince the readers that this is more than a branding exercise, the authors should compare the properties (brightness, folding time, stability) of FLEN to e.g. the d2EGFP reporter introduced by Bi et al. 2002 (J Biotechnol. 93(3):231) and show significant improvements.

      We thank the reviewer for this comment, which compelled us to evaluate the features of other tools used to label neurons activated following contextual fear conditioning. The key properties of FLEN as compared to other tools used to label engrams are that: (i) it is a viral tool, as opposed to transgenic mice, (ii) a c-fos promoter drives the expression of a brightly fluorescent protein allowing their identification ex vivo for functional analysis, (iii) the fluorescent protein is rapidly destabilized, providing the possibility to label neurons only a few hours after their activation by a behavioural task.

      We did not find any viral tools providing the possibility to label c-fos activated neurons for functional assessment. We have not been able to find references for the use of the d2EGFP reporter introduced by Bi et al. 2002 in a behavioural context. One major difference and improvement is certainly the brightness of ZsGreen. In cell cultures, ZsGreen1 showed an 8.6-fold increase in fluorescence intensity as compared with EGFP (Bell et al, 2007).

      Amongst tools with comparable properties, eSARE was developed based on a synthetic Arc promoter driving the expression of a destabilized GFP (dEGFP) (Kawashima et al 2013). We initially used eSARE-dGFP but unfortunately, in our experimental conditions, we found that the signal-to-noise ratio was not satisfactory (number of cells labelled in the home cage vs. following contextual fear conditioning).

      We developed a viral tool to avoid the use of transgenic reporter lines, which require laborious breeding and are experimentally less flexible. Nevertheless, many transgenic mice based on the expression of fluorescent proteins under the control of IEG promoters have been developed and used. Some of these mice show a time course of expression of the transgene which is comparable to FLEN. For instance, in organotypic slices from Tet-Tag mice, the time course of EGFP expression follows endogenous cFOS expression with a small delay, and starts decaying after 4 hours (Lamothe-Molina et al, 2022). However, the fluorescence was too weak to visualize neurons in the slice (Christine Gee, personal communication), and imaging is performed after immunocytochemistry against GFP.

      Therefore, we feel that the name given to the FLEN strategy is legitimate. The features of the FLEN strategy were summarized in the discussion (Lines 318-322).

      (3) Line 214: "...FLEN+ CA3 PNs do not show differences in [...] patterns of bursting activity as compared to control neurons." It looks quite different to me (Figure 3E). Just because low n precludes meaningful statistical analysis, I would not conclude there is no difference.

      We agree with the reviewer that the data in Figure 3E are not conclusive due to small sample size, which limits the reliability of statistical comparison. Additionally, the classification of bursting neurons is highly dependent on the specific criteria used, which vary considerably across the literature. To avoid overinterpretation or misleading conclusions, we decided to remove panel E of Figure 3 showing the fraction of bursting neurons. Nevertheless, we draw attention to the more robust and interpretable results: RAM⁺ neurons exhibit an increase in firing frequency and a distinct action potential discharge pattern, data which we believe are informative of altered excitability.

      (4) Line 304: Remove the time stamp.

      This was done.

      (5) Line 334: "...results may be explained by an overall increased activity of CA1 neurons..." I don't understand - isn't CA1 downstream of CA3? 

      The reviewer is correct that the sentence was misleading. We removed the reference to CA1, as it was more of a general principle about neuronal activity.

      (6) Line 381: "resolutive", better use "sensitive". 

      This was changed.

      (7) Figure S3: Fear-conditioned animals were 3 days off Dox, controls only 2 days. As RAM expression accumulates over time off Dox, this is not a fair comparison.

      We thank the reviewer for pointing out the incorrect reporting of the experimental design in Figure S3 panel A (bottom), which could lead to misinterpretation of results. In fact, the two groups of mice (CFC vs. HC) underwent all experimental steps in parallel. Specifically, both groups were maintained on and off Doxycycline for the same duration and received viral injection on the same day. 48 hours after Dox withdrawal, the CFC group was trained for contextual conditioning, while the HC group remained in the home cage in the holding room. All animals were thus sacrificed 72 hours after Dox removal. We have corrected the figure to accurately reflect this timeline.

      (8) Please provide sequence information for c-cFos-ZsGreen1-DR. Which regulatory elements of the cFos promoter are included, is the 5' NTR included? This information is very important.

      The information is now provided in the Methods section.

      (9) Please provide the temperature during pharmacological treatments (TTX etc.) before fixation.

      The pharmacological treatment was performed in the incubator at 37°C; this is now indicated in the Methods.

    1. eLife Assessment

      This work derives a valuable general theory unifying theories of efficient information transmission in the brain with population homeostasis. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. Applying this theory to the primary visual cortex, the authors present solid evidence that accounts for stimulus-specific and neuron-specific adaptation. Reviewers have provided additional suggestions for improving the readability of the manuscript, as well as discussing previous results on adapting coding as well as those aspects of experimental data that are not fully explained by the present theory.

    2. Reviewer #1 (Public review):

      This work derives a general theory of optimal gain modulation in neural populations. It demonstrates that population homeostasis is a consequence of optimal modulation for information maximization with noisy neurons. The developed theory is then applied to the distributed distributional code (DDC) model of the primary visual cortex to demonstrate that homeostatic DDCs can account for stimulus specific adaptation.

      Strengths:

      The theory of gain modulation proposed in the paper is rigorous and the analysis is thorough. It does address the issue in an interesting, general setting. The proposed approach separates the question of which bits of sensory information are transmitted (as defined by a specific computation and tuning curve shapes) and how well are they transmitted (as defined by the tuning curve gain optimized to combat noise). This separation permits the application of the developed theory to different neural systems.

      Weaknesses:

      The manuscript effectively consists of two parts: a general theory of optimal gain modulation and a DDC model of the visual cortex. From my perspective it is not entirely clear which components of the developed theory and the model it is applied to are essential to explain the experimental phenomena in the visual cortex (Fig. 12). This "separation" into two parts makes this work, in my view, somewhat diffused.

      Overall, I think this is an interesting contribution and I assess it positively. It has the potential of deepening our understanding of efficient neural representations beyond sensory periphery.

    3. Reviewer #2 (Public review):

      Summary:

      Using the theory of efficient coding, the authors study how neural gains may be adjusted to optimize information transmission by noisy neural populations while minimizing metabolic cost, under the assumption that other aspects of neural activity (i.e. tuning) are determined by the computation performed by the network.

      The manuscript first presents mathematical results for the general case where the computational goals of the neural population are not specified (the computation is implicit in the assumed tuning curves). It then develops the theory for a specific probabilistic coding scheme. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. The specific application further explains stimulus-specific adaptation in visual cortex.

      The mathematical derivations, simulations and application to visual cortex data are solid as far as I can tell.

      This remains a highly technical manuscript although the authors have improved the clarity of presentation of the general theory (which is the bulk of the work presented) and better motivated/explained modeling assumptions and choices. In the second part, the manuscript focuses on a specific code (homeostatic DDC) showing that this can be implemented by divisive normalization and can explain stimulus-specific adaptation.

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main assumption, and insight, is that computational goals and efficiency can be in some sense factorized: tuning curve shapes are determined by the computational goal, whereas gains can be adjusted to optimize transmission of information.

      One key result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed a close to optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific adaptation in V1.

      The novelty and significance of the work are presented clearly in the newly extended Introduction and Discussion.

      Weaknesses:

      The manuscript remains hard to read. The general theory occupies most of the manuscript, as needed to convey it fully. But as a result the second part on homeostatic DDC and adaptation is somewhat underdeveloped and risks having less visibility than it might deserve.

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the 'adapter' is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here. The authors now acknowledge this limitation in the Discussion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments:

      (1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz which is described as “independent of the computational goal” (e.g. p. 17). While information theory indeed is not concerned with what quantity is being encoded (e.g. whether it is sensory periphery or hippocampus), the goal of the studied system is to *transmit* the largest amount of bits about the input in the presence of noise. In my view, this does not make the proposed framework “independent of the computational goal”. Furthermore, the derived theory is then applied to a DDC model which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that the authors have good answers to these questions - but they should be explained much better.

      We are very thankful to the reviewer for highlighting the potential confusion surrounding these issues, in particular the relationship between the two halves of the paper – which was previously exacerbated by the length of the paper. We have now added further explanations at different points within the manuscript to better disentangle these issues and clarify our key assumptions. We have also significantly cut the length of the paper by moving more technical discussions to the Methods or Appendices. We will summarise these changes here and also clarify the rationale for our approach and point out potential disagreements with the reviewer.

      Key to our approach is that we indeed do not assume the entire goal of the studied neural system (whether part of the sensory system or not) is to transmit the largest amount of information about the stimulus input (in the presence of noise). In fact, general computations, including the inference of latent causes of inputs, often require filtering out or ignoring some information in the sensory input. It is thus not plausible that tuning curves in general (i.e. in an arbitrary part of the nervous system) are optimised solely with regard to the criterion of information transmission. Accordingly we do not assume they are entirely optimised for that purpose. However, we do make a key assumption or hypothesis (which like any hypothesis might turn out to be partly or entirely wrong): that (1) a minimal feature of the tuning curve (its scale or gain) is entirely free to be optimised for the aim of information transmission (or more precisely the goal of combating the detrimental effect of neural noise on coding fidelity), (2) other aspects of the population tuning curve structure (i.e. the shape of individual tuning curves and their arrangement across the population) are determined by (other) computational goals beyond efficient coding. (Conceptually, this is akin to the modularization between indispensable error correction and general computations in a digital computer, and the need for the former to be performed in a manner that is agnostic as to the computations performed.) We have added two paragraphs in the manuscript which present the above rationale and our key hypothesis or assumption. The first of these was added to the (second paragraph of the) Introduction section, and the second is a new paragraph following Eq. 1 (which is about the gain-shape decomposition of the tuning curves, and the optimisation of the former based on efficient coding) of Results.

      Our paper can be divided into two parts. In the first part, we develop a general, computationally agnostic (in the above sense, just as in the digital computer example), efficient coding theory. In the second part, we apply that theory to a specific form of computation, namely the DDC framework for Bayesian inference. The latter theory now determines the tuning curve shapes. When combined with the results of the first part (which dictate the tuning curve scale or gain according to efficient coding theory), this “homeostatic DDC” model makes full predictions for the tuning curves (i.e., both scale and shape) and how they should adapt to stimulus statistics.

      So to summarise, it is not the case that the problem of information transmission (or rather mitigating the effect of noise on coding fidelity under metabolic constraints), dealt with in the first part, has become a problem of Bayesian inference. But rather, the dictates of efficient coding for optimal gains for coding fidelity (under constraints) have been applied to and combined with a computational theory of inference.
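
      The gain-shape separation described above can be sketched numerically. In this minimal illustration each tuning curve is written as a gain multiplied by a unit-peak shape; the Gaussian shape, the function names and all numerical values are assumptions made for illustration and are not taken from the manuscript. Changing the gain rescales the firing rates but leaves the normalised shape untouched, which is the sense in which the gain could be set by efficient coding while the shape is set by the computation.

```python
import math


def shape(stimulus: float, preferred: float, width: float = 20.0) -> float:
    """Unit-peak Gaussian tuning-curve shape (an assumed, illustrative form)."""
    return math.exp(-0.5 * ((stimulus - preferred) / width) ** 2)


def tuning_curve(stimulus: float, preferred: float, gain: float) -> float:
    """Firing rate written as gain times shape (hypothetical decomposition)."""
    return gain * shape(stimulus, preferred)


def normalised(curve: list) -> list:
    """Divide a sampled curve by its peak so that only the shape remains."""
    peak = max(curve)
    return [rate / peak for rate in curve]


# Sample one neuron's curve on a grid of stimulus values (degrees, illustrative).
stimuli = list(range(0, 181, 15))
before = [tuning_curve(s, preferred=90.0, gain=10.0) for s in stimuli]
after = [tuning_curve(s, preferred=90.0, gain=4.0) for s in stimuli]

# The gain change rescales the rates; the normalised shapes coincide.
same_shape = all(
    abs(a - b) < 1e-12 for a, b in zip(normalised(before), normalised(after))
)
```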

      We have added new expository text before and after Eq. 17 in Sec. 2.7 (at the beginning of the second part of the paper on homeostatic DDCs) to again make the connection with the first part and the rationale for its combination with the original DDC framework more clear.

      With the changes outlined above, we believe and hope the connection between the two parts (which we agree with the reviewer, was indeed rather obscure previously) has been adequately clarified.

      (2) Clarity of writing for an interdisciplinary audience. I do not believe that in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of this study.

      We thank the reviewer for this comment. We have taken several steps to improve the accessibility of this work for an interdisciplinary audience. Firstly, several sections containing dense, mathematical writing have now been moved into appendices or the Methods section, out from the main text; in their place we have made efforts to convey the core of the results, and to provide intuitions, without going into unnecessary technical detail. Secondly, we have added additional figures to help illustrate key concepts or assumptions (see Fig. 1B clarifying the conceptual approach to efficient coding and homeostatic adaptation, and Fig. 8A describing the clustered population). Lastly, we have made sure to refer back to the names of symbols more often, so as to make the analysis easier to follow for a reader with an experimental background.

      (3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits multiple closely related results which in my view should be discussed in relationship to the current work. In particular, a number of recent studies propose normative criteria for gain modulation in populations:

      • Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems

      • Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.

      • Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology

      • Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications

      The Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:

      • Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology

      • Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 28(12),

      We thank the reviewer again for bringing these works to our attention. For each, we explain whether we chose to include them in our Discussion section, and why.

      (1) Duong et al. (2024): We decided not to discuss this manuscript, as our assessment is that it is not very relevant to our work. That study starts with the assumption that the goal of the sensory system under study is to whiten the signal covariance matrix, which is not the assumption we start with. A mechanistic ingredient (but not the only one) in their approach is gain modulation. However, in their case it is the gains of computationally auxiliary inhibitory neurons that are modulated, and not (as in our case) the gains of the (excitatory) coding neurons (i.e. those which encode information about the stimulus and whose response covariance is whitened). These key distinctions make the connection with our work quite loose, and we therefore did not discuss this work.

      (2) Tring et al. (2023): We have added a discussion of the results of this paper and its relationship to the results of our work and that of Benucci et al. This appears in the 7th paragraph of the Discussion. This study is indeed highly relevant to our paper, as it essentially replicates the Benucci et al. experiment, this time in awake mice (rather than anesthetised cats). However, in contrast to the results of Benucci et al., Tring et al. do not find firing rate homeostasis in mouse V1. A second, remarkable finding of Tring et al. is that adaptation mainly changes the scale of the population response vector, and only minimally affects its direction. While Tring et al. do not portray it as such, this behaviour amounts to pure stimulus-specific adaptation without the neuron-specific factor found in the Benucci et al. results (see Eq. 24 of our manuscript). As we discuss in our manuscript, when our homeostatic DDC model is based on an ideal-observer generative model, it also displays pure stimulus-specific adaptation with no neuronal factor. Our final model for Benucci’s data did contain a neural factor, because we used a non-ideal observer DDC (in particular, we assumed a smoother prior distribution over orientations compared to the distribution used in the experiment - which has a very sharp peak – as it is more natural given the inductive biases we expect in the brain). The resultant neural factor suppresses the tuning curves tuned to the adaptor stimulus. Interestingly, when gain adaptation is incomplete, and happens to a weaker degree compared to what is necessary for firing rate homeostasis, an additional neural factor emerges that is greater than one for neurons tuned to the adaptor stimulus. These two multiplicative neural factors can approximately cancel each other; such a theory would thus predict both deviation from homeostasis and approximately pure stimulus-specific adaptation. We plan to explore this possibility in future work.

      (3) Młynarski and Tkačik (2022): We are now citing and discussing this work in the Discussion (penultimate paragraph), in the context of a possible future direction, namely extending our framework to cover the dynamics of adaptation (via a dynamic efficient gain modulation and dynamic inference). We have noted there that Młynarski and Tkačik have used such a framework (which, while similar, has key technical differences with our approach) based on a task-dependent efficient coding objective to model top-down attentional modulation. By contrast, we have studied bottom-up and task-independent adaptation, and it would be interesting to extend our framework and develop a model to make predictions for the temporal dynamics of such adaptation.

      (4) Haimerl et al. (2023): We have elected not to include this work within our discussion either, as we do not believe it is sufficiently relevant to our work to warrant inclusion. Although this paper also considers gain modulation of neural activity, the setting and the aims of the theoretical work and the empirical phenomena it is applied to are very different from our case in various ways. Most importantly, this paper is not offering a normative account of gain modulation; rather, gain modulation is used as a mechanism for enabling fast adaptive readouts of task relevant information.

      (5) Yerxa et al. (2020): We have now included a discussion of this paper in our Discussion section. Note that, even though this study generalises the Ganguli and Simoncelli framework to higher dimensions, just like that paper it still places strict requirements (which are arguably even more stringent in higher dimensions) on the form of the tuning curves in the population, viz. that there exists a differentiable transform of the stimulus space which renders these unimodal curves completely homogeneous (i.e., of the same shape, and placed regularly and with uniform density).

      (6) Wang et al. (2016): We have included this paper in our discussion as well. As above, this paper does not consider general tuning curves, and places the same constraint on their shape and arrangement as in the Ganguli and Simoncelli paper.

      More detailed comments and feedback:

      (1) I believe that this work offers the possibility to address an important question about novelty responses in the cortex (e.g. Homann et al, 2021 PNAS). Are they encoding novelty per-se, or are they inefficient responses of a not-yet-adapted population? Perhaps it’s worth speculating about.

      We are not sure why the relatively large responses to “novel” or odd-ball stimuli should be considered inefficient or unadapted: in the context in which those stimuli are infrequent odd-balls (and thus novel or surprising when occurring), efficient coding theory would indeed typically predict a large response compared to the (relatively suppressed) responses to frequently occurring stimuli. Of course, if the statistics change and the odd-ball stimulus now becomes frequent, adaptation should occur and would be expected to suppress responses to this stimulus. As to the question of whether (large) responses to infrequent stimuli can or should be characterised as novelty responses: this is partly an interpretational or semantic issue – unless it is grounded in knowledge of how downstream populations use this type of coding in V1, which could then provide a basis for solidly linking them to detection of novelty per se. In short, our theory could be applied to Homann et al.’s data, but we consider that beyond the scope of the current paper.

      (2) Clustering in populations - typically in efficient coding studies, tuning curve distributions are a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed curves for each cluster - how to interpret that in light of the efficient coding theory? This links to a more general aspect of this work - it does not specify how to find optimal tuning curves, just how to modulate them (already addressed in the discussion).

      We begin by addressing the reviewer’s more general concern regarding the fact that our theory does not address the problem of finding optimal tuning curves, only that of modulating them optimally. As we expound within the updated version of the paper (see the newly expanded 3rd paragraph in Sec. 2.1 and the expanded 2nd paragraph in Introduction), it is not plausible that the sole function of sensory systems, and neural circuits more generally, is the transmission of information. There are many other computational tasks which must be performed by the system, such as the inference of the latent causes of sensory inputs. For many such tasks, it is not even desirable to have complete transmission of information about the external stimulus, since a substantial portion of that information is not important for the task at hand, and must be discarded. For example, such discarding of information is the basis of invariant representations that occur, e.g., in higher visual areas. So we recognise that tuning curve shapes are in general dictated and shaped by computational goals beyond transmission of information or error correction. As such, we have remained agnostic as to the computational goals of neural systems and therefore the shape of the tuning curves. We have adopted the postulate that those computational goals determine the shape of the tuning curves, leaving the gains to be adjusted freely for the purpose of mitigating the effect of noise on coding fidelity (this is similar to how error correction is done in computers independently of the computations performed). By assuming that those computational goals are captured adequately by the shape of the tuning curves, we are free to optimise the gains of those curves for purely information theoretic objectives. Finally, we note that the case where the tuning curve shapes are additionally optimised for information transmission is a special case of our more general approach. For further discussion, see the updated version of our introduction.

      We now turn to our choice to model clusters using random perturbations. This is, of course, a toy model for clustering tuning curves within a population. With this toy model we are attempting to capture the important aspects of tuning curve clusters within the population while not over-complicating the simulations. Within any neural population, there will be tuning curves that are similar; however, such curves will inevitably be heterogeneous, as opposed to completely identical. Thus, when we cluster together similar curves there will be an “average” cluster tuning curve (found by, e.g., normalising all individual curves and taking the average), which all other tuning curves within the cluster are deviations from. The random perturbations we apply are our attempt to capture these deviations. However, note that the perturbations are not fully random, but instead have an “effective dimensionality” which we vary over. By giving the perturbations an effective dimensionality, we aim to capture the fact that deviations from the average cluster tuning curve may not be fully random, and may display some structure.
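      As a rough illustration of this toy model, the sketch below builds one cluster as an average tuning curve plus additive perturbations confined to a random low-dimensional subspace. It is not the code used in the manuscript: the function name `make_cluster`, the Gaussian-bump mean curve, the additive form of the perturbation, and all parameter values are assumptions made purely for illustration, and non-negativity of the curves is not enforced.

```python
import numpy as np

def make_cluster(mean_curve, n_neurons, effective_dim, scale=0.05, seed=0):
    """Hypothetical helper: a cluster of tuning curves that deviate from a
    shared mean curve only within an `effective_dim`-dimensional subspace."""
    rng = np.random.default_rng(seed)
    n_stimuli = mean_curve.shape[0]
    # Orthonormal basis (columns) spanning the subspace of allowed deviations.
    basis, _ = np.linalg.qr(rng.standard_normal((n_stimuli, effective_dim)))
    # Each neuron gets its own random coefficients within that subspace.
    coeffs = scale * rng.standard_normal((n_neurons, effective_dim))
    return mean_curve[None, :] + coeffs @ basis.T

# Assumed mean curve: a Gaussian bump on a baseline, over 50 stimulus values.
stimuli = np.linspace(-np.pi / 2, np.pi / 2, 50)
mean_curve = 1.0 + np.exp(-stimuli**2 / (2 * 0.3**2))

curves = make_cluster(mean_curve, n_neurons=20, effective_dim=3)

# The deviations from the mean curve span only `effective_dim` directions.
singular_values = np.linalg.svd(curves - mean_curve, compute_uv=False)
deviation_rank = int((singular_values > 1e-8).sum())
```

      Varying `effective_dim` between 1 and the number of neurons interpolates between highly structured and fully unstructured deviations, which is the sense in which the perturbations are "not fully random".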

      (3) Figure 8 - where do Hz come from as physical units? As I understand there are no physical units in simulations.

      We have clarified this within the figure caption. The within-cluster optimisation problem requires maximising a quadratic objective subject to a constraint on the total mean spike count of the cluster. This objective is, however, mathematically homogeneous, so we can consistently rescale the variables and parameters to be in units of Hz – i.e., turn them into mean firing rates, instead of mean spike counts, with an assumption on the length of the coding time interval. We fix this cluster firing rate to be k × 5 Hz, so that the average single-neuron firing rate is 5 Hz (based on empirical estimates – see our Sec. 2.5). This agrees with our choice of µ in our simulations (i.e., µ = 10) if we assume a coding interval of 0.1 seconds.

      (4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?

      A shortcoming of our theory, in its current form, is that it applies only to the system in “steady-state”, without specifying the dynamics of how adaptation temporally evolves (we assume the environment has periods of relative stability that are of relatively long duration compared to the dynamical timescales of adaptation, and consider the properties of the well-adapted steady state population). Thus our efficient coding theory (which predicts homeostatic adaptation under the outlined conditions) is silent on the time-course over which homeostasis occurs. Likewise, the DDC theory (in its original formulation in Vertes & Sahani) is silent on dynamic updating of posteriors and considers only static inference with a fixed internal model. We now discuss a new future direction in the Discussion (where we cite the work of Mlynarski and Tkacik) to point out that our theory can in principle be extended (based on dynamic inference and efficient coding) to account for the dynamics of attention, but this is beyond the scope of the current work.

      (5) Page 6 - ”We did this in such a way that, for all , the correlation matrices, (), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex.” This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?

      Our efficient coding framework has been formulated without relying on any specific assumptions about the form of the (signal or noise) correlation matrices in cortex. The homeostatic solution to this efficient coding problem, however, emerges only under certain conditions. As we demonstrate in our discussion of the analytic solutions to our efficient coding objective and the conditions necessary for the validity of the homeostatic solution, we expect homeostasis to arise whenever the signal geometry is sufficiently high-dimensional (among other conditions). By this we mean that the fall-off of the eigenvalues of the signal correlation matrix must be sufficiently slow. Thus, a fall-off in the eigenvalue spectrum slower than 1/n would favor homeostasis even more than our results. If the fall-off were faster, then whether or not (and to what degree) firing rate homeostasis becomes suboptimal would depend on factors such as the rate of the fall-off and the size of the population. Thus (1) rate homeostasis does not require the specific 1/n spectrum, but that spectrum is consistent with the conditions for optimality of rate homeostasis, and (2) in our simulations we had to make a specific choice, and relying on empirical observations in V1 was of course a well-justified choice (moreover, as far as we are aware, there have been no other studies that have characterised the spectrum of the signal covariance matrix in response to natural stimuli, based on large population recordings).
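      To make the notion of a power-law eigenspectrum concrete, the following sketch constructs a covariance matrix whose ranked eigenvalues fall off as 1/n (or with another exponent) and normalises it to a correlation matrix. This is an illustrative construction under our own assumptions, not the procedure used in the manuscript: the function names, the random choice of eigenvectors, and the use of the participation ratio as a generic summary of dimensionality are all hypothetical choices. In particular, the participation ratio is not the condition for homeostasis derived in the paper.

```python
import numpy as np

def power_law_covariance(n_units, exponent=1.0, seed=0):
    """Covariance matrix whose k-th largest eigenvalue equals k**(-exponent)."""
    rng = np.random.default_rng(seed)
    # Random orthonormal eigenvectors from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((n_units, n_units)))
    eigenvalues = np.arange(1, n_units + 1, dtype=float) ** (-exponent)
    # Q diag(eigenvalues) Q^T
    return (q * eigenvalues) @ q.T

def to_correlation(cov):
    """Rescale a covariance matrix so that its diagonal is one."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def participation_ratio(matrix):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    lam = np.linalg.eigvalsh(matrix)
    return lam.sum() ** 2 / (lam**2).sum()

cov = power_law_covariance(200, exponent=1.0)
corr = to_correlation(cov)

# A slower fall-off spreads variance over more dimensions than a faster one.
pr_slow = participation_ratio(power_law_covariance(200, exponent=0.5))
pr_fast = participation_ratio(power_law_covariance(200, exponent=2.0))
```

      Comparing `pr_slow` with `pr_fast` gives a quick numerical sense of the statement that a slower eigenvalue fall-off corresponds to a higher-dimensional signal geometry.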

      Reviewer #2 (Public Review):

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough as far as I can tell. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in the cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed close to the optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific and neuron-specific adaptation in V1.

      We thank the reviewer for these assessments.

      Weaknesses:

      The novelty and significance of the work are not presented clearly. The relation to other theoretical work, particularly Ganguli and Simoncelli and other efficient coding theories, is explained in the Discussion but perhaps would be better placed in the Introduction, to motivate some of the many choices of the mathematical models used here.

      We thank the reviewer for this comment; we have updated our introduction to make clearer the relationship between this work and previous works within efficient coding theory. Please see the expanded 2nd paragraph of Introduction which gives a short account of previous efficient coding theories and now situates our work and differentiates it more clearly from past work.

      The manuscript is very hard to read as is, it almost feels like this could be two different papers. The first half seems like a standalone document, detailing the general theory with interesting results on homeostasis and optimal coding. The second half, from Section 2.7 on, presents a series of specific applications that appear somewhat disconnected, are not very clearly motivated nor pursued in-depth, and require ad-hoc assumptions.

      We thank the reviewer for this suggestion. The reviewer is right to note that our paper contains both the exposition of a general efficient coding theory framework and applications of that framework. Following your advice we have implemented the following changes. (1) We have significantly shortened some of the less central results in the second half of Results, or moved them entirely to the Methods or appendices (this includes the entire former section 2.7 and significant shortening of the section on implementation of Bayes ratio coding by divisive normalisation). (2) We have added a new figure (Fig 1B) and two long pieces of text to the (2nd paragraph of) Introduction, after Eq. (1), and in Sec. 2.7 (introducing homeostatic DDCs) to more clearly explain and clarify the assumptions underlying our efficient coding theory, and its connection with the second half of the Results (i.e. application to DDC theory of Bayesian inference), and better motivate why we consider the homeostatic DDC.

      For instance, it is unclear if the main significant finding is the role of homeostasis in the general theory or the demonstration that homeostatic DDC with Bayes Ratio coding captures V1 adaptation phenomena. It would be helpful to clarify if this is being proposed as a new/better computational model of V1 compared to other existing models.

      We see the central contribution of our work as not just that homeostasis arises as a result of an efficient coding objective, but also that this homeostasis is sufficient to explain V1 adaptation phenomena - in particular, stimulus specific adaptation (SSA) - when paired with an existing theory of neural representation, the DDC (itself applied to orientation coding in V1). Homeostatic adaptation alone does not explain SSA; nor do DDCs. However, when the two are combined they provide an explanation for SSA. This finding is significant, as it unifies two forms of adaptation (SSA and homeostatic adaptation) whose relationship was not previously appreciated. Our field does not currently have a standard model of V1, and we do not claim to have provided one either; rather, different models have captured different phenomena in V1, and we have done so for homeostatic SSA in V1.

      Early on in the manuscript (Section 2.1), the theory is presented as general in terms of the stimulus dimensionality and brain area, but then it is only demonstrated for orientation coding in V1.

      The efficient coding theory developed in Section 2 is indeed general throughout; we make no assumptions regarding the shape of the tuning curves or the dimensionality of the stimulus. Further, our demonstrations of the efficient coding theory through numerical simulations make assumptions only about the form of the signal and noise covariance matrices. When we later turn our attention away from the general case, our choice to focus on orientation coding in V1 was motivated by empirical results demonstrating a co-occurrence of neural homeostasis and stimulus specific adaptation in V1.

      The manuscript relies on a specific response noise model, with arbitrary tuning curves. Using a population model with arbitrary tuning curves and noise covariance matrix, as the basis for a study of coding optimality, is problematic because not all combinations of tuning curves and covariances are achievable by neural circuits (e.g. https://pubmed.ncbi.nlm.nih.gov/27145916/ )

      First, to clarify, our theory allows for complete generality of neural tuning curve shapes, and assumes a broad family of noise models (which, while not completely arbitrary, includes cases of biological relevance and/or models commonly used in the theoretical literature). Within this class of noise covariance models, we have shown numerical results for different values of the parameters of the noise covariance model, but more importantly, have analytically outlined the general properties and requirements on noise strength and structure (and its relationship to tuning curves and signal structure) under which homeostatic adaptation would be optimal. Regarding the point that not all combinations of tuning curves and noise covariances occur in biology or are achievable by neural circuits: (1) If we are guessing correctly the specific point of the reviewer’s reference to the review paper by Kohn et al. 2016, we have in fact prominently discussed the case of information limiting noise which corresponds to a specific relationship between signal structure (as determined by tuning curves) and noise structure (as specified by the noise covariance matrix). Our family of noise models includes that biologically relevant case and we have indeed paid it particular attention in our simulations and discussions (see discussion of Fig. 7 in Sec. 2.3, and that of aligned noise in Sec. 2.5). (2) As for the more general or abstract point that not all combinations of noise covariance and tuning curve structures are achievable by neural circuits, we can make the following comments. First, in lieu of a full theoretical or empirical understanding of the achievable combinations (which does not exist), we have outlined conditions for homeostatic adaptations under a broad class of noise models and arbitrary tuning curves. If some combinations within this class are not realised in biology, that does not invalidate the theoretical results, as the latter have been derived under more general conditions, which nevertheless include combinations that do occur in biology and are achievable by neural circuits (which, as pointed out, include the important case of aligned noise and signal structure – as reviewed in Kohn et al. – to which we have paid particular attention).

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the ’adapter’ is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here.

      The theory we provide predicts that, under certain (specified) conditions, we ought to see deviation from exact homeostatic results; indeed, we provide a first order approximation to the optimal gains in this case which quantifies such deviations when they are small. However, unfortunately the form of this deviation depends on a precise choice of stimulus statistics (e.g. the signal correlation matrix, the noise correlation matrix averaged over all stimulus space, and other stimulus statistics), in contrast to the universality of the homeostatic solution, when it is a valid approximation. In our model of Benucci et al.’s experiment, we restrict to a simple one-dimensional stimulus space (corresponding to oriented gratings), without specifying neural responses to all stimuli; as such, we are not immediately able to make predictions about whether the homeostatic failure can be predicted using the specific form of deviation from homeostasis. However, we acknowledge that this is a weakness of our analysis, and that a more complete investigation would address this question. For reasons of space, we elected not to pursue this further. We have added a paragraph to our Discussion (8th paragraph) explaining this.

      Reviewer #1 (Recommendations for the authors):

      (1) To make the article more accessible I would suggest the following:

      (a) Include a few more illustrations or diagrams that demonstrate key concepts: adaptation of an entire population, clustering within a population, different sources of noise, inference with homeostatic DDCs, etc.

      We thank the reviewer for this suggestion - we have added an additional figure panel (Figure 8, Panel A) to explain the concept of clustering within a population. We also added a new panel to Figure 1 (Figure 1B) which we hope will clarify the conceptual postulate underlying our efficient coding framework and its link to the second half of the paper.

      (b) Within the text refer to names of quantities much more often, rather than relying only on mathematical symbols (e.g. w, r, Ω, etc).

      We thank the reviewer for the suggestion; we have updated the text accordingly and believe this has improved the clarity of the exposition.

      (2) It is hard to distill which components of the considered theory are crucial to reproducing the experimental observations in Figure 12. Is it the homeostatic modulation, efficient coding, DDCs, or any combination of those or all of them necessary to reproduce the experiment? I believe this could be explained much better, also with an audience of experimentalists in mind.

      We have updated the text to provide additional clarity on this matter (see the pointers to these changes and additions in the revised manuscript, given above in response to your first comment). In particular, reproducing the experimental results requires combining DDCs with homeostatic modulation – with the latter a consequence of our efficient coding theory, and not an independent ingredient or assumption.

      (3) It would be good to comment on how sensitive the results are to the assumptions made, parameter values, etc. For example: do conclusions depend on statistics of neural responses in simulated environments? Do they generalize for different values of the constraint µ? This could be addressed in the discussion / supplementary material.

      This issue is already discussed extensively within the text - see Sec. 2.4, Analytical insight on the optimality of homeostasis, and Sec. 2.5, Conditions for the validity of the homeostatic solution to hold in cortex. In these sections, we outline that - provided a certain parameter combination is small - we expect the homeostatic result to hold. Accordingly, we anticipate that our numerical results will generalise to any settings in which that parameter combination remains small.

      (4) How many neurons/units were used for simulations?

      We apologise for omitting this detail; we used 10,000 units for our simulations. We have edited both the main text and the methods section to reflect this.

      (5) Typos etc: a) Figure 5 caption - the order of panels B and C is switched. b) Figure 6A - I suggest adding a colorbar.

      Thank you. We have relabelled the panels B and C in the appropriate figures so that the ordering in the figure caption is correct. We feel that a colourbar in figure 6A would be unnecessary, since we are only trying to convey the concept of uniform correlations, rather than any particular value for the correlations; as such we have elected not to add a colourbar. We have, however, added a more explicit explanation of this cartoon matrix in the figure caption, by referring to the colors of diagonal vs off-diagonal elements.

      Reviewer #2 (Recommendations for the authors):

      The text on page 10, with the perturbation analysis, could be moved to a supplement, leaving here only the intuition.

      We thank the reviewer for this suggestion; we have moved much of the argument into the appendix so as to not distract the reader with unnecessary technical details.

      Text before eq. 12 “...in cluster a maximize the objective...” should be ‘minimize’?

      The cluster objective as written is indeed maximised, as stated in the text. Note that, in the revised manuscript, this argument has been moved to an appendix to reduce the density of mathematics in the main text.

      Top of page 25 “S<sub>0</sub> and S<sub>0</sub>” should be “S<sub>0</sub> and S<sub>1</sub>”?

      Thank you, we have corrected the manuscript accordingly.

    1. eLife Assessment

      This important study investigates nerve-injury-induced allodynia by studying the role of a subpopulation of excitatory dorsal horn CCK+ neurons that express the estrogen receptor GPR30 and potentially modulate nociceptive sensitivity via direct inputs from primary somatosensory cortex. In this revised version, the authors addressed many of the critiques raised through added analyses that convincingly support the notion that spinal GPR30 neurons are indeed an excitatory subpopulation of CCK+ neurons that contribute to neuropathic pain. While evidence of a direct functional corticospinal projection to CCK+/GPR30+ neurons is not fully demonstrated, this work will be of broad interest to researchers interested in the neural circuitry of pain.

    2. Reviewer #1 (Public review):

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of the neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of data that is currently presented.

      Strengths:

      The authors present convincing evidence for expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate a role for the receptor in driving nerve injury-induced pain in rodent models.

      Weaknesses:

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Revised Manuscript Update:

      In their revised manuscript, Chen et al. have added data that establish GPR30 spinal neurons as a population of excitatory neurons, half of which express CCK. These data help to position GPR30 neurons in the existing framework of spinal neuron populations that contribute to neuropathic pain, strengthening the authors' findings.

    3. Reviewer #3 (Public review):

      Summary:

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile, and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. They propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI using slice electrophysiology which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from the primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported.

      Strengths:

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis.

      Weaknesses:

      The primary weakness in this manuscript involves overextending the interpretations of the data to still propose a role for corticospinal descending facilitation. While the viral tracing demonstrates a potential connection between S1 and CCK+ or GPR30+ spinal neurons, no direct evidence is provided for S1 in facilitating any activity of these neurons in the dorsal horn.

      Comments on the latest version:

      The authors did an excellent job addressing many of the critiques raised. Despite acknowledging that a direct functional corticospinal projection to CCK/GPR30+ neurons is not supported by the data and revising the title, these claims still persist throughout the manuscript. Manipulating gene expression or the activity of postsynaptic neurons through a trans-synaptic labeling strategy does not directly support any claim that those upstream neurons are directly modulating spinal neurons through the proposed pathway. Indeed they might, but that is not demonstrated here.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of the previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of the neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of the data that are currently presented.

      Strengths: 

      The authors present convincing evidence for the expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate the role of the receptor in driving nerve injury-induced pain in rodent models. 

      Weaknesses: 

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Thank you very much for your constructive feedback.

      In response to your suggestion, we have used more specific markers to distinguish excitatory (VGLUT2) and inhibitory (VGAT) neurons via in situ hybridization. These analyses revealed that GPR30 is predominantly expressed in excitatory neurons of the superficial dorsal horn (SDH), as presented in the Results section (lines 117-120) and in Figure 2A-B.

      Additionally, we performed a quantitative analysis to determine the extent of co-localization between GPR30+ and CCK+ neurons. The data were included in the Results (lines 131–132) and Figure 2G.

      Reviewer #2 (Public review):

      Using a variety of experimental manipulations, the authors show that the membrane estrogen receptor G protein-coupled estrogen receptor (GPER/GPR30) expressed in CCK+ excitatory spinal interneurons plays a major role in the pain symptoms observed in the chronic constriction injury (CCI) model of neuropathic pain. Intrathecal application of the selective GPR30 agonist G-1 induced mechanical allodynia and thermal hyperalgesia in male and female mice. Downregulation of GPR30 in CCK+ interneurons prevented the development of mechanical and thermal hypersensitivity during CCI. They also show the upregulation of AMPA receptor expression by GPR30.

      Generally, the conclusions are supported by the experimental results. I also would like to see significant improvements in the writing and the description of results. 

      Methodological details for some of the techniques are rather sparse. For example, when examining the co-localization of various markers, the authors do not indicate the number of animals/sections examined. Similarly, when examining the effect of shGper1, it is unclear how many cells/sections/animals were counted and analyzed. 

      In other sections, there is no description of the concentration of drugs used (for example, Figure 4H). In Figures 4C-E, there is no indication of the duration of the recordings, the ionic conditions, the effect of glutamate receptor blockers, etc.

      Some results appear anecdotal in the way they are described. For example, in Figure 5, it is unclear how many times this experiment was repeated. 

      We sincerely appreciate your valuable feedback and thoughtful recommendations.

      To address your concerns regarding methodological transparency, we have added the following details to the revised manuscript:

      The number of animals and sections analyzed in co-localization studies.

      The number of cells/sections/animals used in each quantification following shGper1 treatment.

      The concentrations of drugs administered (e.g., in Figure 4H).

      Detailed recording conditions, including duration, ionic composition, and pharmacological conditions (Figures 4C-E).

      In addition, we have thoroughly revised the writing throughout the manuscript to enhance clarity and precision in the description of our findings.

      Reviewer #3 (Public review): 

      Summary: 

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile, and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. They propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI using slice electrophysiology which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from the primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported. 

      Strengths: 

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis. 

      Weaknesses: 

      The primary weakness in this manuscript involves overextending the interpretations of the data to propose a direct link between corticospinal projections signaling through GPR30 on this CCK+ population of spinal dorsal horn neurons. For example, even in the cropped images presented, GPR30 is present in many other CCK-negative neurons. Only about a quarter of the cells labeled by the anterograde viral tracing experiment from S1 are CCK+. Since no direct evidence is provided for S1 signaling through GPR30, this conclusion should be revised. 

      Thank you for your encouraging comments and critical insights.

      We fully acknowledge the concern regarding the proposed direct involvement of corticospinal projections in modulating nociceptive behavior via GPR30 in CCK+ neurons. While our anterograde tracing experiments suggest anatomical overlap, we agree that definitive evidence of functional connectivity is lacking.

      Accordingly, we have revised the Abstract, Discussion, and Graphical Abstract to present our findings more cautiously. We now describe our observations as indicating that S1 projections potentially interact with GPR30⁺ spinal neurons, rather than asserting a definitive functional link.

      To support this revised interpretation, we performed additional quantitative analyses examining the co-localization among S1 projections, CCK+, and GPR30+ neurons. Furthermore, we clarified that the chemogenetic activation studies targeted a mixed neuronal population and did not exclusively manipulate CCK+ neurons.

      These changes aim to better align our conclusions with the presented data and provide a more nuanced framework for future investigations.

      Reviewer #1 (Recommendations for the authors): 

      Major corrections 

      (1) Figure 2: The authors conclude that GPR30 is mainly expressed in excitatory spinal neurons because they are labeled by a virus with a Camk2 promoter. While there is evidence that Camk2 is specific to excitatory neurons in the brain, based on RNAseq datasets (e.g. Linnarsson Lab, http://mousebrain.org/adolescent/genesearch.html ) this is less clear cut within the spinal cord. A more direct way to assess the relative expression of GPR30 in excitatory versus inhibitory neurons would be to perform immunohistochemistry or FISH with GPR30/Vglut2/Vgat.

      Alternatively, if this observation is not crucial for the overall arc of the story, I recommend the authors eliminate these data, as they do not support the idea that GPR30 is mainly in excitatory neurons.

      We thank the reviewer for highlighting this important limitation. To strengthen our conclusion regarding the neuronal identity of GPR30-expressing cells, we performed fluorescent in situ hybridization (FISH) using vGluT2 (marker for excitatory neurons) and VGAT (marker for inhibitory neurons). The results confirmed that GPR30 is predominantly expressed in vGluT2-positive excitatory neurons within the spinal cord. These new data are presented in the revised manuscript (lines 117-120) and shown in Figure 2A-B.

      (2) (2a) Figure 2: The authors also report that GPR30 is expressed in most CCK+ spinal neurons. A more rigorous way to present the data would be to perform quantification and report the % of CCK neurons that are GPR30. 

      (2b) More importantly, it is unclear what % of GPR30 neurons are CCK+. These types of quantifications would provide useful insights into the heterogeneity of CCK and GPR30 neuron populations, and help align the findings of the behavioral pharmacology experiments using GPR30 antagonists with those of the knockdown of Gper1 in CCK spinal neurons. For instance, does a population of GPR30+/CCK- neurons exist? If so, it would be worth discussing what role (if any) that population might play in nerve injury-induced mechanical allodynia.

      Understanding the breakdown of GPR30 populations becomes even more relevant when the authors characterize which cell types are targeted by descending projections from S1. It is clear that the vast majority of CCK+ neurons that receive descending input from S1 neurons are GPR30+, but there are many other GPR30+ neurons that do not receive input from S1 neurons presented in Figure 5M. Is this simply because only a small fraction of CCK+/GPR30+ neurons are targeted by descending S1 projections, or could they represent a distinct population of GPR30 neurons?

      (2a) We appreciate the suggestion. Quantification showed that approximately 90% of CCK⁺ neurons express GPR30, and about 50% of GPR30⁺ neurons co-express CCK. These data are now provided in the revised Results (lines 131-132) and in Figure 2F-G.

      (2b) Indeed, our data reveal that a substantial portion of GPR30⁺ neurons do not co-express CCK. While this study focuses on GPR30 function in CCK⁺ neurons, we recognize the potential relevance of GPR30⁺/CCK⁻ populations. We have addressed this point in the Discussion (lines 303-306):

      “However, it should be noted that half of GPR30⁺ neurons are not co-localized with CCK⁺ neurons, and further studies are needed to explore the function of these GPR30⁺/CCK⁻ neurons in neuropathic pain.”

      Regarding descending input, our data in Figure 5 show that S1 projections selectively innervate a subset (~30%) of CCK⁺ neurons, most of which co-express GPR30. This suggests that S1-targeted CCK⁺/GPR30⁺ neurons may represent a functionally distinct population. We have added clarification to the revised manuscript, while acknowledging that further studies are needed to elucidate the roles of non-targeted GPR30⁺ neurons.

      (3) Throughout the manuscript both male and female mice were used in experiments. Rather than referring to male and female mice as different genders, it would be more appropriate to describe them as different sexes. 

      As suggested, we have replaced all instances of “gender” with “sex” throughout the revised manuscript.

      (4) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have revised Figures 5J and 5N to include clear labels identifying the markers used in double-labeling analyses, to improve interpretability.

      Minor corrections: 

      (1) Line 36, I believe the authors mean to say "GPER/GPR30 in spinal neurons", rather than just "spinal". 

      Corrected as suggested. The sentence now reads (line 34):

      “Here we showed that the membrane estrogen receptor G-protein coupled estrogen receptor (GPER/GPR30) in spinal neurons was significantly upregulated in chronic constriction injury (CCI) mice…”

      (2) There are minor grammatical errors throughout the manuscript that interfere with comprehension. Proofreading/editing of the English language use may be beneficial. 

      We have thoroughly revised the manuscript for clarity and corrected grammatical and syntactic errors to improve readability.

      (3) Line 169-170, reads "Known that EPSCs are mediated by glutamatergic receptors like AMPA receptors and several studies have been reported the relationship between GPR30 and AMPA receptor25,29". Rewriting the sentence such that it better describes what the known relationship is between GPR30 and AMPA would be helpful in setting up the rationale of the experiment in Figure 4. 

      We have rewritten this section to better clarify the rationale behind the electrophysiological experiments (lines 161-164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors, and emerging evidence suggesting that GPR30 enhances excitatory transmission by promoting clustering of glutamatergic receptor subunits, we examined whether GPR30 modulates EPSCs via AMPA receptor-dependent mechanisms.”

      (4) Line 198-199 "Then we explored the possible connections among GPR30, S1-SDH projections and CCK+ neuron." In the context of spinal circuitry, "connections" may raise the expectation that synaptic connectivity will be evaluated. What I think best describes what the authors investigated in Figure 5 is the "relationship" between GPR30, S1-SDH projections, and CCK+ neurons. 

      We have revised the sentence accordingly (lines 184-186):

      “Building on previous findings suggesting a functional interaction between S1-SDH projections and spinal CCK⁺ neurons, our current study aimed to further elucidate the structural relationship among GPR30, S1-SDH projections, and CCK⁺ neurons.”

      (5) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have added direct labels to figure panels to clarify double-labeled analyses in the revised Figure 5J and 5N.

      Reviewer #2 (Recommendations for the authors): 

      (1) Can the authors provide more detail about the distribution of CCK+ cells in the spinal cord and, in particular, the localization of double-stained (CCK/cfos) neurons? 

      We thank the reviewer for this suggestion. To better characterize the distribution of CCK⁺ neurons within the spinal dorsal horn (SDH), we performed immunostaining in CCK-tdTomato mice using lamina-specific markers: CGRP (lamina I), IB4 (lamina II), and NF200 (lamina III–V). Our results demonstrate that CCK⁺ neurons are primarily localized in the deeper laminae of the SDH. These findings are now described in the revised Results (lines 126–129) and shown in Figure 2E.

      In addition, we conducted c-Fos immunostaining in CCK-Ai14 mice and found increased activation of CCK⁺ neurons following CCI. This supports the involvement of CCK⁺ neurons in neuropathic pain. These data are included in the Results (lines 129–131) and Supplementary Figure S4.

      (2) Figure 2A. There is no formal quantification of the percentage of TdTomato+ neurons that are also CCK+. The description of these results is insufficient. 

      We appreciate this point and have revised the description of Figure 2A accordingly. To strengthen our analysis, we conducted additional FISH experiments with vGluT2 and VGAT probes. Quantification revealed that GPR30 is predominantly expressed in excitatory neurons (approximately 60%). These data are shown in the revised Results (lines 117-119) and Figures 2A-B and S3. This supports our conclusion that GPR30 is largely localized to excitatory spinal interneurons.

      (3) Figure 4H. What is the evidence that these are AMPA-mediated currents? This is not explained in the text. 

      Thank you for raising this point. We now provide detailed experimental procedures to clarify that the recorded EPSCs are AMPA receptor–mediated. Specifically, spinal slices from CCK-Cre mice were used, and excitatory postsynaptic currents were recorded in the presence of APV (100 μM, NMDA receptor blocker), bicuculline (20 μM, GABA_A receptor blocker), and strychnine (0.5 μM, glycine receptor blocker), ensuring that the observed currents were AMPA-dependent. These methodological details are now clearly described in the revised Results (lines 165–173) and supported by prior literature (Zhang et al., J Biol Chem 2012; Hughes et al., J Neurosci 2010).

      (1) Yan Zhang, Xiao Xiao, Xiao-Meng Zhang, Zhi-Qi Zhao, Yu-Qiu Zhang (2012). Estrogen facilitates spinal cord synaptic transmission via membrane-bound estrogen receptors: implications for pain hypersensitivity. J Biol Chem. Sep 28;287(40):33268-81.

      (2) Ethan G Hughes, Xiaoyu Peng, Amy J Gleichman, Meizan Lai, Lei Zhou, Ryan Tsou, Thomas D Parsons, David R Lynch, Josep Dalmau, Rita J Balice-Gordon (2010). Cellular and synaptic mechanisms of anti-NMDA receptor encephalitis. J Neurosci. 2010 Apr 28;30(17):5866-75.

      (4) What is the signaling mechanism leading to a larger amplitude of currents after G-1 infusion? 

      We thank the reviewer for this important question. G-1 is a selective agonist for GPR30. Based on previous studies by Luo et al. (2016), we speculate that activation of GPR30 may increase the clustering of glutamatergic receptor subunits at postsynaptic sites, thereby enhancing AMPA receptor-mediated currents. While our current study did not directly address the intracellular signaling cascade, we have incorporated this mechanistic speculation in the Discussion.

      Jie Luo, X.H., Yali Li, Yang Li, Xueqin Xu, Yan Gao, Ruoshi Shi, Wanjun Yao, Juying Liu, Changbin Ke (2016). GPR30 disrupts the balance of GABAergic and glutamatergic transmission in the spinal cord driving to the development of bone cancer pain. Oncotarget 7, 73462-73472. 10.18632/oncotarget.11867.

      (5) Figure 4I. Please include error bars. 

      We have revised Figure 4I to include error bars, as requested.

      (6) Line 198. What is the evidence that AAV2/1 EF1α FLP is an anterograde transsynaptic (monosynaptic) marker?

      We thank you for this request. AAV2/1 has been widely used for anterograde monosynaptic tracing based on its properties (Wang et al., Nat Neurosci 2024; Wu et al., Neurosci Bull 2021): (1) it infects neurons at the injection site and undergoes active anterograde transport; (2) newly assembled viral particles are released at synapses and infect postsynaptic partners; (3) in the absence of helper viruses, the spread halts at the first synapse, ensuring monosynaptic restriction. We have elaborated on this in the revised manuscript (line 198), citing Wang et al. (Nat Neurosci 2024) and Wu et al. (Neurosci Bull 2021).

      (1) Hao Wang, Qin Wang, Liuzhe Cui, Xiaoyang Feng, Ping Dong, Liheng Tan, Lin Lin, Hong Lian, Shuxia Cao, Huiqian Huang, Peng Cao, Xiao-Ming Li (2024). A molecularly defined amygdala-independent tetra-synaptic forebrain-to-hindbrain pathway for odor-driven innate fear and anxiety. Nat Neurosci. 2024 Mar;27(3):514-526.

      (2) Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (7) Figure 5G. I do not understand the logic of this experiment. A Cre AAV is injected in the S1 cortex. Why should this lead to the expression of tdTomato on a downstream (postsynaptic?) neuron? The authors should quote the literature that supports this anterograde transsynaptic transport.

      We appreciate this question. As described in previous studies (e.g., Wu et al., Neurosci Bull 2021), AAV2/1-Cre injected into the S1 cortex leads to Cre expression in projection targets due to transsynaptic anterograde transport. Subsequent injection of a Cre-dependent AAV (AAV2/9-DIO-mCherry) into the spinal cord enables specific labeling of postsynaptic neurons that receive input from S1. We have clarified this mechanism in line 206 and provided the appropriate citation.

      Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (8) The same question arises when interpreting the results obtained in Figure 6.

      We thank the reviewer for the question, and we have addressed it in point (7).

      (9) Line 257. How do the authors envision that estrogen would change its modulation of GPR30 under basal and neuropathic conditions? Is there any evidence for this speculation? 

      We thank the reviewer for raising this thoughtful question. In the current study, we focused on pharmacologically manipulating GPR30 activity via its selective agonist and antagonist. We did not directly investigate how endogenous estrogen regulates GPR30 under physiological and neuropathic states. We have recognized this limitation and highlighted the need for future research to investigate this regulatory mechanism.

      (10-20) In my opinion, the entire manuscript needs a careful revision of the English language. While one can follow the text, it contains numerous grammatical and syntactic errors that make the reading far from enjoyable. I am highlighting just a few of the many errors. 

      We appreciate the reviewer’s honest assessment. The manuscript has undergone thorough language editing by a native English speaker to correct grammatical errors, improve clarity, and enhance overall readability. We also restructured several sections, particularly the Discussion, to improve logical flow.

      (21) The discussion of results is a bit disorganized, with disconnected sentences and statements, and somewhat repetitive. For example, lines 303 to 306 lack adequate flow. It is also quite long and includes general statements that add little to the discussion of the new findings (lines 326-333). 

      We agree and have revised the Discussion extensively. Disconnected or repetitive sentences (e.g., lines 303-306, 326-333) have been removed or rewritten. For instance, we added a new transitional paragraph (lines 307-311) to improve flow:

      “Abnormal activation of neurons in the SDH is a key contributor to hyperalgesia, and enhanced excitatory synaptic transmission is a major mechanism driving increased neuronal excitability. Therefore, we evaluated excitatory postsynaptic currents (EPSCs) and observed increased amplitudes in CCK⁺ neurons following CCI, suggesting elevated excitability in these neurons.”

      We also removed redundant generalizations to maintain a focused discussion of our novel findings.

      Reviewer #3 (Recommendations for the authors): 

      (1) What is the distribution of GPR30 throughout the spinal cord and DRG? The authors demonstrate that this can overlap with a CCK+ population, but there are many GPR30+ and CCK negative neurons, even in the cropped images presented. It would be helpful to quantify the colocalization with CCK. 

      We thank the reviewer for this important point. As shown in the revised manuscript, GPR30 is expressed in both the spinal cord and dorsal root ganglia (DRG). However, our updated data (Figure 1B) demonstrate that Gper1 mRNA levels in the DRG are not significantly altered after CCI, suggesting a limited involvement of DRG GPR30 in neuropathic pain. These results are described in the revised Results (line 94).

      Regarding spinal co-expression, we performed a detailed quantification. Approximately 90% of CCK⁺ neurons express GPR30, while about 50% of GPR30⁺ neurons are CCK⁺. These co-localization results are now included in the revised Results and presented in Figure 2G.

      (2) It is clear that CCI and GPR30 influence excitatory synaptic transmission in CCK+ neurons. However, these experiments do not fully support the authors' claims of a postsynaptic upregulation of AMPARs. Comparing amplitudes and frequencies of spontaneous EPSCs cannot necessarily distinguish a pre- vs postsynaptic change since some of these EPSCs can arise from spontaneous action potential firing. I suggest revising this conclusion. 

      We appreciate these insightful comments. We fully agree that our data from spontaneous EPSC recordings (sEPSCs) in CCK⁺ neurons are not sufficient to distinguish between pre- and postsynaptic mechanisms, as sEPSCs may include spontaneous presynaptic activity. Therefore, we have revised the text throughout the manuscript to avoid overstating conclusions related to postsynaptic AMPA receptor upregulation.

      (3) What is the rationale for the evoked EPSC experiments from electrical stimulation in "the deep laminae of SDH?" I do not think that this experiment can rule out a presynaptic contribution of GPR30 to the evoked responses, particularly if these are Gs-coupled at presynaptic terminals. Paired-pulse stimulations could help answer this question, otherwise, alternative interpretations, also related to the point above, should be provided. 

      We thank the reviewer for this thoughtful critique. Indeed, electrical stimulation of the deep SDH laminae does not exclude presynaptic involvement, especially considering that GPR30 is a G protein–coupled receptor (GPCR) and could act presynaptically. We agree that paired-pulse ratio (PPR) analysis would be more informative in distinguishing pre- from postsynaptic effects, but this was not performed due to technical limitations in our current experimental setup.

      Accordingly, we have revised our interpretations in both the Results and Discussion to acknowledge that our data do not rule out presynaptic contributions. We now state that GPR30 activation enhances EPSCs in CCK⁺ neurons, while further studies are needed to dissect the precise site of action.

      (4) I appreciate the challenging nature of the trans-synaptic viral labeling approaches, but the chemogenetic and Gper knockdown experiments do not selectively target this CCK+ population of deep dorsal horn neurons. The data are clear that each of these components (descending corticospinal projections, CCK neurons, and GPR30) can modulate nociceptive hypersensitivity, but I do not agree with the overall conclusion that these components are directly linked as the authors propose. I recommend revising the overall conclusion and title to reflect the convincing data presented.

      We thank the reviewer for this critical observation. We agree that while our data show functional roles for descending cortical input, CCK⁺ neurons, and GPR30 in modulating pain hypersensitivity, the evidence does not establish a definitive direct circuit integrating all three components.

      In response, we have revised our conclusions to reflect this limitation. Specifically, we avoided claiming a direct functional link among S1 projections, CCK⁺ neurons, and GPR30. Instead, we now propose that GPR30 modulates neuropathic pain primarily through its action in CCK⁺ spinal neurons, with potential involvement of descending facilitation from the somatosensory cortex.

      Additionally, we have revised the manuscript title to better reflect our mechanistic focus: “GPR30 in spinal CCK-positive neurons modulates neuropathic pain.”

      Minor Corrections

      (1) The authors should refer to mice by sex, not gender. 

      Corrected throughout the manuscript.

      (2) Page 9, line 195: "significantly" is used to refer to co-localization of 28.1%. What is this significant to? 

      We have revised the sentence to accurately describe the observed percentage, without implying statistical significance:

      “Our co-staining results revealed that a high proportion of CCK⁺ S1-SDH postsynaptic neurons expressed GPR30” (line 198-199).

      (3) I recommend modifying some of the transition phrases like "by the way," "what's more," and "besides". 

      All informal expressions have been replaced with academic alternatives including “Furthermore,” “Additionally,” and “Moreover.”

      (4) Additional guides to mark specific laminae in the dorsal horn would be useful. 

      We added immunostaining with laminar markers (CGRP for lamina I and NF200 for lamina III–V), and these data are now shown in Figure 2E and described in the Results (lines 126-129).

      (5) Page 5, line 115: immunochemistry should be immunohistochemistry. 

      Corrected as suggested.

      (6) Page 6, line 136: "Confirming the structural connnections" was not demonstrated here. Perhaps co-localization between GPR30 and CCK+. 

      The text was revised to “To functionally interrogate GPR30 and CCK⁺ neurons in neuropathic pain...” (line 133).

      (7) Page 8, line 166: unsure what "took and important role" means. 

      This phrasing was corrected for clarity and replaced with an accurate scientific description.

      (8) Page 8, line 168: "IPSCs of spinal CCK+ neurons" implies that they are sending inhibitory inputs. 

      We revised the term to “EPSCs” to correctly reflect excitatory synaptic currents in CCK⁺ neurons.

      (9) Page 8, line 169: "Known that EPSCs" is missing an introductory phrase. 

      The sentence was rewritten to include an appropriate introductory clause (lines 161–164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors...”

      (10) Page 10, line 227 and 228: "adequately" and "sufficiently" should be adequate and sufficient. 

      We corrected these terms to the proper adjective forms: “adequate” and “sufficient” (lines 224-225).

    1. eLife Assessment

      This study presents a valuable finding regarding the role of oxytocin neurons in thermogenesis and behavioral thermoregulation. The use of numerous converging methods, including behavior, fiber photometry, optogenetics, thermal recordings, metabolic analyses, and more, produces a multi-dimensional dataset delivering findings that provide solid support for the conclusions. Conclusions would be strengthened with validation of the approaches, inclusion of a loss-of-function experiment, and further investigation of the social nature of the behavior. The maternal findings are, at present, somewhat disconnected from the conclusions. The findings are novel and open new doors for understanding the role of the PVN and oxytocin in thermoregulation; the work will be of strong interest to the thermoregulation, social behavior, and oxytocin signaling communities.

    2. Reviewer #1 (Public review):

      Summary:

      The authors identify and investigate a specific population of PVNOT neurons (oxytocin neurons of the paraventricular hypothalamus) that seem to be involved in both behavioral and autonomic thermoregulation. These cells are activated by social thermoregulatory behaviors, but can influence thermoregulation in both social and nonsocial contexts, specifically during transitions and when mice are at low core body temperature (Tb).

      Strengths:

      The manuscript has many strengths.

      This is a novel study, with a clear question that is addressed using an array of well-designed experiments employing integrative methods. Most of the figures are well-developed, and the analysis is generally rigorous and well-detailed. The authors are clearly very experienced in this field, and indeed, their scholarly introduction and discussion sections are to their credit.

      The link between thermoregulation and the oxytocin system is well established, as is the link between social behavior and the same broad system. However, the link between these three things is novel, if it can be well substantiated. I am not persuaded that was achieved here, but I do think this manuscript has many novel and useful offerings.

      The authors use a cooling floor, and only go down to 10 degrees Celsius. This is fine, but I would like to see the effects using ambient temperature also. This is not a crucial issue, as it is not necessary for the authors' interpretations, but it could improve measurement sensitivity.

      Through an elegant behavioral experiment in Figure 1, the authors identify c-Fos patterns in the PVN that are activated by active social huddling, and they show that at the RNA level these cells overlap with oxytocin, indicating that they are oxytocin-producing cells. But this is not well discussed or indeed quantified.

      The authors engage in a deep analysis of fiber photometry experiments, first by observing PVNOT neuron overall activity during a variety of different behaviors in the context of three different temperatures. Activity was associated with nesting, quiescence, and both types of huddling (when social opportunities exist). Social situations did not strongly affect this, nor did temperature conditions. These analyses indicate that the PVNOT neurons are involved in mediating specific behavioral outputs.

      With more detailed analysis, the authors investigated how PVNOT neuronal activity relates to behavioral state transition. They found that the probability of peak PVNOT neural activity strongly predicts the offset of quiescence or quiescent huddling, and therefore can be argued to signal an increase in physical activity, and as such, increased metabolism. However, the opposite pattern was observed for huddling and nesting (onset being associated with PVNOT activity), again arguing for increased thermogenesis as a function.

      What is particularly compelling is that these peaks of activity tend to occur during low Tb, again arguing for the function in increasing body warmth.

      The authors then employ an impressive setup where they image brown adipose tissue (BAT) in tandem with DeepLabCut (DLC) based animal tracking. Crucially, BAT activity and surface temperature correlated with the calcium peak of PVNOT neurons.

      Lastly, optogenetic activation of PVNOT neurons increased Tb when it was in the lower range, but not when in the higher range. It also affected BAT and rump temperature, again at low Tb. However, there is no real effect on behavior, except a trend in activity.

      The authors do some interesting tracing work at the end, though this is not functionally explored. That is not a criticism, as it does seem like this would be a whole follow-up study.

      Weaknesses:

      While novel and valuable, the manuscript feels incomplete in its current form.

      The main evidence lacking is a loss-of-function experiment. Ideally, the authors would chronically and/or acutely inhibit PVNOT neurons to establish their necessity. I know this seems obvious, but I think it is important.

      The relative lack of behavioral analysis following optogenetic activation of PVNOT neurons is puzzling. The authors must surely want to study what this intervention does to behavioral state transitions. I feel that the current level of analysis limits the overall conclusions of this study to a large extent.

      A broader criticism is that the social dimension of this manuscript seems overplayed. Naturally, oxytocin signalling can be implicated in social behavior based on a large literature. However, the focus on social thermogenesis seems like a crude integration of social behavior and thermogenesis. Given that the authors see their effects in both social and nonsocial cases of thermoregulation, I am not sure the attempts at integrating social functions and thermogenic functions of PVNOT neurons are warranted. That is, unless the authors have further experiments or analysis that can convincingly justify this link.

      In addition, the analysis of virgin females and lactating mothers seems out of place in Figure 4.

      The c-Fos/oxytocin overlap needs to be quantified.

      The methods section could be improved by explaining how the authors exclude animals that exhibit both types of huddling, if they occur within a 90-minute time window. This seems like it could cause significant confounds.

      The computer vision model is not well-explained. The authors need to be far more explicit here about how it was validated.

      The authors should cite and consider this preprint: https://www.biorxiv.org/content/10.1101/2024.09.17.613378v1

    3. Reviewer #2 (Public review):

      Summary:

      This is a very interesting study from Vandendoren and colleagues examining the role of PVN oxytocin neurons during thermoregulatory behaviors, in particular during thermoregulatory huddling. The findings are important and compelling, and have implications for the thermoregulation field as well as the social/naturalistic behavior field.

      Strengths:

      The study is very creative and tackles a challenging task to examine how natural and social behavior influences neural circuits for a homeostatic system such as thermoregulation. The authors use a combination of state-of-the-art tools (photometry, optogenetics, automated behavior tracking, thermal imaging, and core body temperature measurement), often in combination with each other, to produce a rigorous and high-dimensional dataset. Carrying out tightly temperature-controlled experiments and examining natural behavior, neural activity, and body physiology simultaneously is quite a feat. I applaud the authors for taking this on in a rigorous and detailed manner. This paper will be valuable for both the thermoregulation field as well as for researchers interested in naturalistic social behaviors. The conclusions are supported by the data.

      Weaknesses:

      I have a number of questions and suggestions for clarification that would help improve the interpretation of the findings.

      (1) Figure 1D-F: It would be helpful to include representative images of cFos expression in the PVN, LS, and DMH during both quiescent and solo huddling conditions, to better illustrate the reported differences.

      (2) Figure 1C: The data suggest a general suppression of neural activity during sleep-associated quiescent huddling, which somewhat complicates the interpretation of what specifically the active huddling cells are responding to. A more informative control might have been a comparison between huddling and a more generic form of social engagement (e.g., dyadic sniffing) to assess whether huddling-responsive neurons are broadly tuned to social stimuli. While it may not be feasible to add this experimentally at this time, a brief discussion of this limitation in the main text would be valuable.

      (3) Figure 2H-J vs. Figure 1: The fiber photometry data suggest increased PVN activity during quiescent huddling vs active huddling, which appears to contrast with the cFos results from Figure 1. It would be helpful for the authors to comment on possible reasons for this discrepancy (e.g., methodological differences, temporal resolution, or cell-type specificity).

      (4) Figure 2O: A comparable linear regression for active huddling would be informative to assess whether the observed relationships extend across behavioral states.

      (5) Temperature manipulation: The use of floor temperature changes presents a distinct physiological and sensory experience from, for example, manipulation of ambient temperature. A discussion of how this choice may affect neural circuit engagement or interpretation of thermoregulatory responses would be beneficial.

      (6) Correlations with behavior: Across the manuscript, it would be informative to see correlations between huddle duration and neural activity (e.g., cFos expression, calcium signal magnitude). Similarly, do longer huddles produce greater thermogenic effects?

      (7) Lactating vs. virgin mothers: The inclusion of maternal data is intriguing but feels somewhat disconnected from the central huddling-thermoregulation narrative. If these experiments are to remain, additional explanation of their rationale and how they fit into the broader story would help clarify their relevance.

      (8) Optogenetic manipulation: Have the authors tested the effect of PVN OT neuron stimulation or inhibition during huddling? Even a negative result would be of interest to the field. If these data exist (main or supplementary), I apologize for missing them. If not, the authors might consider including them or commenting briefly on any attempts or challenges in carrying out these experiments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the relationship between physiological state (i.e., behavioral status and thermogenic sympathetic activity) and the activity of hypothalamic paraventricular oxytocin (PVNOT) neurons in female mice. They studied this by combining automated classification of mouse behavior via video-based analysis with calcium imaging of PVNOT neuron activity. Sympathetic thermogenesis was inferred from surface temperature changes captured by infrared thermography, and the authors provided their custom analysis scripts in the manuscript. Notably, they found that a strong, pulsatile activation of PVNOT neurons was "occasionally" observed immediately before the animals transitioned from a resting to an active state. This pulsatile activity was observed in both pair-housed and individually housed animals. While PVNOT neurons are often associated with social behaviors, this finding suggests that the oxytocinergic system is also engaged during naturalistic behaviors, even in the absence of social interactions. If experiments were more convincingly performed and presented, the results would point to a broader physiological role of central oxytocin, including in the regulation of fundamental brain states and homeostatic processes, and offer a new perspective on the functional significance of central oxytocin signaling.

      Strengths:

      The oxytocinergic neural system is believed to subserve a wide range of physiological functions, and elucidating these roles requires monitoring PVNOT neuronal activity under various behavioral contexts, as well as manipulating this activity to establish causal links. In the present study, the authors show a technically sound experimental framework that integrates behavioral tracking in both individually and group-housed mice with the observation and manipulation of PVNOT neuron activity. This experimental setup represents a valuable methodological resource for researchers investigating the physiological functions of oxytocin.

      Weaknesses:

      While this study successfully established a new experimental setup for simultaneous analyses of behavior and PVNOT neuronal activity, there are several concerns regarding the interpretation of the results and the robustness of the conclusions, which should be more thoroughly addressed.

      (1) The study relies on the assumption that calcium imaging and optogenetic manipulation were restricted only to PVNOT neurons. However, the specificity of AAV-mediated gene expression was not verified quantitatively. A fair number of cell bodies in the PVN expressed GCaMP8s, but not OT, indicating potential off-target expression (see Figure S2A, B). The lack of quantitative validation weakens confidence in the causal interpretation of the results.

      (2) The study focuses on the transition from rest to active states following pulsatile activity of PVNOT neurons. However, the physiological significance of this pulsatile activity remains unclear. According to the authors, pulsatile activity occurred with an approximately 20% probability within 100 seconds prior to the end of the resting state. This implies that, in the remaining 80% of rest-to-active transitions, pulsatile PVNOT activity did not occur, suggesting that it is not essential for initiating the transition. A comparative analysis of behavioral and thermogenic changes between transitions with and without pulsatile PVNOT activity would help to further clarify the functional relevance of this phenomenon and strengthen the authors' interpretation of the findings.

      (3) The study identifies a correlation between pulsatile activity of PVNOT neurons and rest-to-active transitions, and tests for a causal relationship using optogenetic stimulation. However, since PVNOT neurons are known to co-release other neurotransmitters such as glutamate, it remains unclear whether the observed effects are mediated specifically through oxytocin receptor signaling. To address this question, functional intervention experiments using oxytocin receptor antagonists or receptor knockout mice are necessary.

      (4) The authors attempted to detect BAT thermogenesis and skin vasomotion using infrared thermography. This technique measures only skin hair temperatures (since the skin was not shaved), but does not measure "BAT temperature" or "vasomotor tone". As seen in Figure 5E, the temperatures of the body surface areas ("BAT", "Rump", and "Dorsal surface") mostly changed in parallel, indicating that these temperatures are strongly affected by body core temperature. Therefore, the thermographic measurements in this study did not provide convincing information on BAT thermogenesis or skin vasomotion. To avoid misleading reports, the authors need to use other techniques to directly measure temperatures, such as telemetry.

      (5) Photostimulation of PVNOT neurons increased Tb after 400 sec (6.6 min) (Figure 5). This latency is too long to conclude that the neuronal stimulation elicited BAT thermogenesis. A more reasonable explanation is that the increase in Tb was caused by the induction of physical activity (Figure S4C), which slowly generates heat and contributes to the elevation of Tb. However, this view contradicts the authors' claim. To address this concern, the authors should directly measure BAT thermogenesis and compare it with the rate of Tb elevation. If BAT thermogenesis occurs, the rate at which the BAT temperature increases must exceed the rate at which Tb rises.
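
      The rate comparison requested in point (5) can be sketched as follows. This is a minimal illustration only: the time points and temperature traces are hypothetical placeholder values, not data from the study, and `slope` is a helper defined here for the example.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Hypothetical traces after stimulation onset (seconds, degrees Celsius).
times = [0, 100, 200, 300, 400]
tb = [36.0, 36.1, 36.2, 36.3, 36.4]    # core body temperature (assumed values)
bat = [35.0, 35.3, 35.6, 35.9, 36.2]   # directly measured BAT temperature (assumed values)

tb_rate = slope(times, tb)     # degrees Celsius per second
bat_rate = slope(times, bat)

# Under the reviewer's criterion, BAT thermogenesis is plausible only if
# BAT temperature rises faster than core temperature.
bat_leads = bat_rate > tb_rate
```

      With real data, the same comparison would be made per animal and per stimulation epoch, ideally with telemetry-based BAT temperature as the reviewer suggests.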

    5. Author response:

      (1) Maternal lactation assay and PVN oxytocin neuron identity

      Reviewers and editors noted that the maternal lactation assay felt out of place (Editors, R1, R2) and asked for clearer validation of AAV specificity in the PVN (R3). These issues are linked: the primary purpose of the lactation assay was to physiologically validate that the recorded neurons are oxytocinergic, as PVNOT neurons exhibit well-established pulsatile activity during lactation.

      In response, we will (i) explicitly frame the lactation assay as a validation experiment, (ii) streamline its presentation to sit naturally with our identity-validation rationale, and (iii) clarify our AAV targeting and expression controls; we will also address our oxytocin immunohistochemistry quantification and its limitations (we observed notable intra-individual and technical variability in oxytocin immunoreactivity), which motivated the complementary physiological approach.

      (2) Clarifications and analyses.

      The reviewers pointed to several analyses, inferences, and conclusions that should be clarified. We will clarify (i) the oxytocin histology in Figure 1 (marker definitions and quantification), (ii) the roles of floor versus ambient temperature, (iii) the quantitative links among behavioral state, neural activity, and body temperature (e.g., behavior bout duration vs. neural responses and Tb), and (iv) the computer vision methodology. These additions will address the reviewers’ requests for clearer inferences and presentation.

      (3) Optogenetic inhibition. 

      We appreciate the suggestion to include an inhibition experiment (Editors, R1, R2). While interesting, this is beyond the scope of the current revision. Our stimulation experiments were designed to functionally test a specific observation from calcium imaging, namely, that PVNOT neurons show bursts of heightened activity at transitions from quiescence to arousal/thermogenesis, and to assess causal sufficiency for thermogenic/arousal-related readouts. We will make this rationale explicit, discuss the scope limits of the current dataset, and note inhibition as an important direction for future work.

    1. eLife Assessment

      This valuable study identifies a brown adipose tissue-specific heat shock factor 1-alcohol dehydrogenase 5 (ADH5) molecular cascade as a regulator of systemic aging, showing that ADH5 deficiency contributes to BAT dysfunction and health decline in aged mice. While there is evidence to support this mechanism, the conclusions remain incomplete, particularly regarding statistical rigor and clarity in data presentation.

    2. Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and metabolic disarrangements associated with it. This is a follow-up study after the authors' demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging. However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of the enzyme Alcohol Dehydrogenase 5 (ADH5) in brown adipose tissue (BAT) during aging. BAT is crucial for thermogenesis and energy balance, but its function and mass diminish with age, contributing to metabolic dysfunction and age-related diseases. ADH5, also known as S-nitrosoglutathione reductase, regulates nitric oxide (NO) signaling by removing S-nitrosylation modifications from proteins. The authors show that aging in mice leads to increased protein S-nitrosylation but reduced ADH5 expression in BAT, resulting in impaired metabolic and cognitive functions. Deletion of ADH5 in BAT accelerates tissue senescence and systemic metabolic decline.

      Mechanistically, aging suppresses ADH5 via downregulation of heat shock factor 1 (HSF1), a master regulator of protein homeostasis. Importantly, pharmacologically boosting HSF1 improves BAT function and mitigates both metabolic and cognitive declines in aged mice. The findings highlight a critical HSF1-ADH5 pathway in BAT that protects against aging-related dysfunction, suggesting that targeting this pathway may offer new therapeutic strategies for improving metabolic health and cognition during aging.

      Strengths:

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function.

      Weaknesses:

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. The only mention of sex I could find is that the authors reported the general protein SNO status in BAT is increased with age in male C57Bl/6J mice. Is this also true in female mice? For all of the ADH5 knockout mouse data, are these also male mice? Do female ADH5 knockout mice have a consistent phenotype, or are there sex differences?

      (2) It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B).

      (3) For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Also, there's an unexpected thing where all the values for the Adh5 flox mice are exactly the same - how is this possible? Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo?

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured. I assume it's HSF1A, and maybe it's the part in the methods with the Metabolomic Analysis, but this wasn't clear. It would also help if release from the NC-Vehicle formulation could be included as a negative control.

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice?

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice?

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?

      (8) Figure 3B looks a bit odd since 7 of the 12 total mice seem to have an IL-1β level of exactly 5. I was a bit unclear about why arbitrary units were used for IL-1β levels since it says an ELISA was used to quantify IL-1β; however, in the methods the authors describe a Bio-Rad Laboratories Bio-plex Pro Mouse Cytokine 23-Plex approach, which I don't think is an ELISA. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels?

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? It's also confusing why the heat fold change is 1.0 in the light and the dark for the floxed animal. I bet this is because the knockout is normalized to the floxed animal for light and then normalized again for the dark period, but since both are on the same graph, readers could be confused into thinking there is no difference in the heat production or VO2 between light and dark, which would be surprising. This could all just be solved if absolute units were used.
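
      The normalization concern in point (9) can be illustrated with a short sketch. The numbers below are hypothetical heat-production values chosen for illustration, not data from the manuscript, and the function name is invented for this example. It shows how normalizing each genotype to the floxed control within each phase forces the control to 1.0 in both light and dark, which hides any real light/dark difference.

```python
# Hypothetical heat-production values (kcal/h); illustrative only, not study data.
heat = {
    "light": {"flox": 0.40, "ko": 0.36},
    "dark": {"flox": 0.55, "ko": 0.44},
}

def fold_change_within_phase(values):
    """Normalize every genotype to the floxed control of the same phase."""
    return {
        phase: {genotype: v / groups["flox"] for genotype, v in groups.items()}
        for phase, groups in values.items()
    }

fold = fold_change_within_phase(heat)

# The floxed control is 1.0 in both phases by construction, even though its
# absolute heat production is higher in the dark phase in this example.
flox_fold_light = fold["light"]["flox"]
flox_fold_dark = fold["dark"]["flox"]
dark_exceeds_light = heat["dark"]["flox"] > heat["light"]["flox"]
```

      Plotting the absolute values in `heat` instead of `fold` would preserve both the genotype effect and the light/dark difference on the same graph.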

    4. Author response:

      Reviewer #1 (Public review):

      The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging.  However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution. 

      We greatly appreciate the reviewer’s encouragement. Our team is fully committed to maintaining clarity and rigor in the design, execution, and reporting of this study. We are grateful to the reviewers for bringing these issues to our attention. We also acknowledge that several statistical analyses could be reperformed to better emphasize our focus on the genetic effect of ADH5 deletion in mice of the same age, and we are working on this.

      Reviewer #2 (Public review):

      Strengths: 

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function. 

      We greatly appreciate the reviewer’s encouragement. 

      Weaknesses: 

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. 

      We thank the reviewer for the insightful remark, and we agree with the reviewer that sex needs to be considered as a biological variable. We will assess ADH5 expression in aged female mice.

      (2)  It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B). 

      We thank the reviewer for the comment/suggestion. Indeed, we have measured the ADH5 expression in both brown adipose tissue (BAT) and inguinal adipose tissue (iWAT). We regret that we did not include our results in the first submission and will provide these results in the revised manuscript.

      (3)  For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo? 

      We thank the reviewer for their thoughtful comment and will provide detailed information in the revised manuscript.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the first submission. In the revised manuscript we will include, in detail, the logistics of the experiments in the materials and methods section, figure annotation and figure legends.  

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice? 

      We thank the reviewer for the insightful remark, and we will measure general protein S-nitrosylation status in the BAT of HSF1A-treated mice.

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice? 

      We regret that we did not describe our results clearly in the first submission and will provide detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?  

      We regret that we did not present results clearly in the first submission and will provide detailed information in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels? 

      We will provide detailed information in the revised manuscript.

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? 

      We thank the reviewer for the insightful comment. We will present these results as suggested in the revised manuscript.

    1. eLife Assessment

      This modelling study tests several hypotheses describing how seasonality and migration drive the epidemiology of Rift Valley Fever Virus among transhumant cattle in The Gambia. The work is methodologically solid, and findings offer valuable insights into how the movement of cattle in and out of the Gambia River and Sahel ecoregions could lead to source-sink transmission dynamics among cattle subpopulations, sustaining endemic transmission.

    2. Joint Public Review:

      Summary:

      This study uses data from a recent RVFV serosurvey among transhumant cattle in The Gambia to inform the development of an RVFV transmission model. The model incorporates several hypotheses that capture the seasonal nature of both vector-borne RVFV transmission and cattle migration. These natural phenomena are driven by contrasting wet and dry seasons in The Gambia's two main ecoregions and are purported to drive cyclical source-sink transmission dynamics. Although the Sahel is hypothesized to be unsuitable for year-long RVFV transmission, findings suggest that cattle returning from the Gambia River to the Sahel at the beginning of the wet season could drive repeated RVFV introductions and ensuing seasonal outbreaks. The model is also used to evaluate the potential impacts of cattle movement bans on transmission dynamics, although there is doubt about the certainty of these latter findings in light of various simplifying assumptions.

      Strengths:

      Like most infectious diseases in animal systems in low- and middle-income countries, the transmission dynamics of RVFV in cattle in The Gambia are poorly understood. This study harnesses important data on RVFV seroepidemiology to develop and parameterize a novel transmission model, providing plausible estimates of several epidemiological parameters and transmission dynamic patterns.

      This study is well written and easy to follow.

      The authors consider both deterministic and stochastic formulations of their model, demonstrating potential impacts of random events (e.g., extinctions) and providing confidence regarding model robustness.

      The authors use well-established Bayesian estimation techniques for model fitting and confront their transmission model with a seroepidemiological model to assess model fit.

      Elasticity analyses help to understand the relative importance of competing demographic and epidemiological drivers of transmission in this system.

      Weaknesses:

      The model predicts relatively stable annual dynamics reminiscent of a seasonal endemic pathogen, but RVF in sub-Saharan Africa is often characterized as causing periodic epizootics with sustained lulls in between outbreaks. Do the authors believe this conventional wisdom regarding RVF epidemiology is wrong, and that their results better support that transmission patterns are seasonal but truly relatively stable year-over-year, at least in the Gambia? The authors should discuss whether these predicted dynamics could be an artefact of the model's structure, and what ramifications this could have for their conclusions.

      It is unclear how the network analysis is used to inform the model. The network (Figure S2) suggests a highly fragmented population, which could better support, for example, a herd metapopulation approach. The first results section highlights that transhumant movements cover large distances (perhaps to justify the assumption of homogenous mixing within each ecoregion?), but the median (13.5km) is quite short.

      The model does not include an impact of infection on cattle birth rates, but the authors highlight the well-known impacts of RVF epizootics on cattle abortion and neonatal death.

      ODEs for M herds in the dry season are missing from the appendix. Even in the absence of transmission among this subpopulation in this season, demographic turnover should influence its SIR population dynamics. Were these not included in the model or simply omitted from the text?

      The importance of the RVFV seropositivity decay rate is highlighted, but the loss of immunity is not considered in the SIR model. The authors do discuss uncertainty regarding model structure, but could better justify their choice. Is there evidence of reduced infection risk among previously infected seronegatives, and why was an SIRS model not considered? How might findings be expected to differ under an SIRS model?

      Shouldn't disease-induced host death be included in the serocatalytic model? A high RVF mortality rate has been estimated, and FOI is relatively high, suggesting a non-negligible impact of RVF death on seroprevalence dynamics, and indeed possibly a greater impact than seroreversion.

      It is helpful that the authors have described findings from the previously conducted household survey, which is a key foundation for the model, but it needs to be made clearer what work was already conducted as part of the previous study, in particular in the Methods sections "RVFV seroprevalence & household survey data" and "Epidemiological setting & cattle population structure". The same applies to the sections "Study Area" and "Data Collection" in the appendix.

      The study limitations paragraph is vague. What modelling assumptions have introduced the greatest uncertainty, and what implications could this have for study conclusions?

      Two main issues with the simulations of a ban on transhumant movement:

      The introduction rightly highlights the importance of pastoral lifestyles for subsistence farmers in the Gambia. It therefore seems likely that transhumant movement bans would have great socioeconomic and ethical challenges in addition to obvious practical challenges. Is such an intervention even a remote possibility?

      The model's structure, including homogenous mixing within each ecoregion and step-change seasonality, allows for estimation of generalized transmission rates at a macro scale. However, it greatly simplifies the movement process itself and assumes that transhumant cattle movement is the only mechanism for RVF reintroduction into the Sahel region. The model is therefore likely to misrepresent the potential impacts of movement bans on transmission. As studies, for example, in healthcare settings have shown, where fine-scaled contact data are available, incorporating the specific and complex nature of inter-individual contact can change not only the magnitude but the direction of intervention impacts relative to predictions from a model with homogenous mixing assumptions. Conclusions from this work regarding the impacts of movement bans, therefore, seem poorly supported.

      This model seems perhaps better suited to exploring, for example, cattle vaccination, and potential differential efficiency when targeting T herds relative to M or L.

    3. Author response:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In the revision, we will:

      Clarify that while epidemics occur in other parts of sub-Saharan Africa, our results may indicate a different epidemiological narrative in The Gambia, with sustained but low-level circulation (hyperendemicity).

      Discuss how model assumptions (e.g. seasonality, homogenous mixing) may bias results toward stable dynamics.

      Highlight the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In revisions we will:

      Clarify this distinction in the manuscript to avoid overinterpretation.

      Emphasize the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause abortions and neonatal deaths, these occur during relatively rare epidemics. In the Gambian context, where we’re not observing such large episodic outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process; it is reasonable to assume a stable population since we are not explicitly modelling outbreak-scale reproductive losses. This is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we will acknowledge this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for M herds in the dry season were not included in the appendix due to an oversight, though demographic turnover was incorporated in the model code. We will add the missing equations to the appendix.
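
      As an illustration only, the point about demographic turnover without transmission can be sketched with a generic textbook SIR system in which the force of infection is set to zero. These are not the equations from the manuscript or its code, which are not reproduced here; the function name, the assumption that births balance deaths, and all parameter values are hypothetical.

```python
def simulate_no_transmission(s0, i0, r0, mu, gamma, days, dt=0.1):
    """Euler integration of an SIR system with turnover and no transmission.

    Assumed (illustrative) equations, with births equal to deaths:
        dS/dt = mu * N - mu * S
        dI/dt = -(gamma + mu) * I
        dR/dt = gamma * I - mu * R
    mu is the per-day turnover rate, gamma the per-day recovery rate.
    """
    s, i, r = float(s0), float(i0), float(r0)
    steps = int(round(days / dt))
    for _ in range(steps):
        n = s + i + r
        ds = mu * n - mu * s        # births enter S, deaths leave S
        di = -(gamma + mu) * i      # no new infections in this season
        dr = gamma * i - mu * r     # recoveries enter R, deaths leave R
        s += ds * dt
        i += di * dt
        r += dr * dt
    return s, i, r


if __name__ == "__main__":
    # Hypothetical values: 5-year mean lifespan, 7-day infectious period.
    s, i, r = simulate_no_transmission(500, 50, 450, mu=1 / (5 * 365),
                                       gamma=1 / 7, days=180)
    print(f"S={s:.1f} I={i:.3g} R={r:.1f} N={s + i + r:.1f}")
```

      Under these assumptions the total herd size stays constant while susceptible animals are replenished by births, which is why the compartments still change during a season with no transmission.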

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay/seroreversion) is an important consideration in RVFV serology, but whether this reflects true loss of protective immunity after natural infection remains unknown. Biologically, it is plausible that infected cattle develop long-lasting protection, as suggested by studies in humans, but there is an absence of longitudinal field data. From a modelling perspective, our aim was to predict the age-seroprevalence curve from FOI estimates and assess how well the model reproduces observed cross-sectional seroprevalence patterns. We therefore adopted a parsimonious SIR framework, treating loss of seropositivity as a potential explanation for the observed age disparity rather than modelling it as loss of immunity. In revisions we will:

      Clarify this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      Further discuss the seropositivity decay rates predicted in our survey and their possible relation to test sensitivity.

      Highlight that while a SIRS structure could generate different long-term dynamics, evaluating this requires stronger evidence for true immunity loss; we consider this an important future modelling direction.
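
      To make the distinction between seropositivity decay and loss of immunity concrete, a standard reversible catalytic model can be sketched as follows. It assumes a constant force of infection and a constant seroreversion rate, which may not match the serocatalytic model fitted in the manuscript; the function name and parameter values are hypothetical.

```python
import math


def seroprevalence(age, foi, seroreversion=0.0):
    """Expected seroprevalence at a given age under a reversible catalytic model.

    Solves dP/da = foi * (1 - P) - seroreversion * P with P(0) = 0, giving
    P(a) = foi / (foi + seroreversion) * (1 - exp(-(foi + seroreversion) * a)).
    Rates are per year and age is in years (illustrative units).
    """
    total = foi + seroreversion
    if total == 0:
        return 0.0
    return foi / total * (1.0 - math.exp(-total * age))


if __name__ == "__main__":
    for age in (1, 3, 5, 10):
        without_decay = seroprevalence(age, foi=0.2)
        with_decay = seroprevalence(age, foi=0.2, seroreversion=0.1)
        print(age, round(without_decay, 3), round(with_decay, 3))
```

      With seroreversion, the curve plateaus at foi / (foi + seroreversion) instead of approaching 1 in older animals. The sketch describes antibody detectability only and says nothing about whether seroreverted animals remain protected.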

      (6) RVFV-induced mortality in the serocatalytic model

      We thank the reviewer for this comment. Disease-induced mortality was included in the serocatalytic model through the mortality parameter (γ), but we recognise that this might not have been sufficiently clear in the text. In revisions we will clarify this in the Methods and Appendix.

      (7) Clarifying previous vs. current study components

      We will revise the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We will expand the limitations section to specifically identify the assumptions contributing most to uncertainty. We will then outline how these may bias transmission dynamics and intervention estimates.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulations. On reassessment, we agree that our model structure might not be ideally suited to exploring such interventions. In the revised manuscript, we will remove this analysis and emphasize how our modelling framework is more suited to exploring cattle vaccination scenarios, including targeting of specific herd types (e.g. T vs. M vs. L). We note that we are currently developing separate work focused on vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

    1. eLife Assessment

      This important study identifies a putative iron and zinc transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.